datafu.pig.bags
Class BagLeftOuterJoin

java.lang.Object
  extended by org.apache.pig.EvalFunc<T>
      extended by datafu.pig.util.ContextualEvalFunc<T>
          extended by datafu.pig.util.AliasableEvalFunc<org.apache.pig.data.DataBag>
              extended by datafu.pig.bags.BagLeftOuterJoin

public class BagLeftOuterJoin
extends AliasableEvalFunc<org.apache.pig.data.DataBag>

Performs an in-memory left outer join across multiple bags.

The invocation format is BagLeftOuterJoin(bag, 'key', ...): each bag argument is followed by the alias of its join key. This UDF expects every bag to be non-null and to have a corresponding key. Each key is the alias of the key field inside the bag that precedes it.

Example:

    define BagLeftOuterJoin datafu.pig.bags.BagLeftOuterJoin();

    -- describe data:
    -- data: {bag1: {(key1: chararray,value1: chararray)},bag2: {(key2: chararray,value2: int)}}

    bag_joined = FOREACH data GENERATE BagLeftOuterJoin(bag1, 'key1', bag2, 'key2') as joined;

    -- describe bag_joined:
    -- bag_joined: {joined: {(bag1::key1: chararray, bag1::value1: chararray, bag2::key2: chararray, bag2::value2: int)}}
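
To make the join semantics concrete, here is a minimal sketch of an in-memory left outer join over two bags, written with plain Java collections. It is not DataFu's implementation: the class LeftOuterJoinSketch, the method leftOuterJoin, and the positional key and rightWidth parameters are hypothetical, and tuples are modeled as List<Object> rather than Pig Tuple objects. It assumes the usual left outer join behavior, where every tuple of the first bag is kept and unmatched tuples are padded with nulls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not DataFu's implementation.
class LeftOuterJoinSketch {

    // Joins two "bags" (lists of tuples) on the fields at positions leftKey and rightKey.
    // Every left tuple appears in the output at least once; when no right tuple
    // matches, the right-hand fields are filled with rightWidth nulls.
    static List<List<Object>> leftOuterJoin(List<List<Object>> left, int leftKey,
                                            List<List<Object>> right, int rightKey,
                                            int rightWidth) {
        // Index the right bag by key so each left tuple is matched by lookup.
        Map<Object, List<List<Object>>> index = new HashMap<>();
        for (List<Object> r : right) {
            index.computeIfAbsent(r.get(rightKey), k -> new ArrayList<>()).add(r);
        }

        List<Object> nullPadding = Collections.nCopies(rightWidth, null);
        List<List<Object>> joined = new ArrayList<>();
        for (List<Object> l : left) {
            List<List<Object>> matches = index.get(l.get(leftKey));
            if (matches == null) {
                // No match: keep the left tuple and pad the right side with nulls.
                matches = Collections.singletonList(nullPadding);
            }
            for (List<Object> r : matches) {
                List<Object> row = new ArrayList<>(l);
                row.addAll(r);
                joined.add(row);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<List<Object>> bag1 = Arrays.asList(
                Arrays.<Object>asList("a", "x"),
                Arrays.<Object>asList("b", "y"));
        List<List<Object>> bag2 = Arrays.asList(
                Arrays.<Object>asList("a", 1));

        // Prints [[a, x, a, 1], [b, y, null, null]]
        System.out.println(leftOuterJoin(bag1, 0, bag2, 0, 2));
    }
}
```

In this sketch the tuple keyed "b" has no partner in the second bag, so it survives with null values in the bag2 positions. The UDF itself takes key aliases such as 'key1' rather than field positions.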

Author:
wvaughan

Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
BagLeftOuterJoin()
           
 
Method Summary
 org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
           
 org.apache.pig.impl.logicalLayer.schema.Schema getOutputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
          Specify the output schema as in EvalFunc.outputSchema(Schema).
 
Methods inherited from class datafu.pig.util.AliasableEvalFunc
getBag, getBoolean, getDouble, getDouble, getFieldAliases, getFloat, getFloat, getInteger, getInteger, getLong, getLong, getObject, getPosition, getPosition, getPrefixedAliasName, getString, getString, outputSchema
 
Methods inherited from class datafu.pig.util.ContextualEvalFunc
getContextProperties, getInstanceName, getInstanceProperties, setUDFContextSignature
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BagLeftOuterJoin

public BagLeftOuterJoin()
Method Detail

exec

public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
                                 throws java.io.IOException
Specified by:
exec in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
Throws:
java.io.IOException

getOutputSchema

public org.apache.pig.impl.logicalLayer.schema.Schema getOutputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Description copied from class: AliasableEvalFunc
Specify the output schema as in EvalFunc.outputSchema(Schema).

Specified by:
getOutputSchema in class AliasableEvalFunc<org.apache.pig.data.DataBag>
Returns:
outputSchema
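
The describe output in the class example shows each joined field named as the source bag alias, then "::", then the field alias (for example bag1::key1). The following sketch reproduces only that naming pattern. The class JoinedAliasSketch and the method prefixedAliases are hypothetical helpers for illustration and do not use the Pig Schema API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the field naming seen in the describe output.
class JoinedAliasSketch {

    // Prefixes each field alias with its bag alias, e.g. "bag1" + "::" + "key1".
    static List<String> prefixedAliases(String bagAlias, List<String> fieldAliases) {
        List<String> names = new ArrayList<>();
        for (String field : fieldAliases) {
            names.add(bagAlias + "::" + field);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> joined = new ArrayList<>();
        joined.addAll(prefixedAliases("bag1", Arrays.asList("key1", "value1")));
        joined.addAll(prefixedAliases("bag2", Arrays.asList("key2", "value2")));

        // Prints [bag1::key1, bag1::value1, bag2::key2, bag2::value2]
        System.out.println(joined);
    }
}
```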


Matthew Hayes, Sam Shah