datafu.pig.bags
Class BagLeftOuterJoin
java.lang.Object
org.apache.pig.EvalFunc<T>
datafu.pig.util.ContextualEvalFunc<T>
datafu.pig.util.AliasableEvalFunc<org.apache.pig.data.DataBag>
datafu.pig.bags.BagLeftOuterJoin
public class BagLeftOuterJoin
- extends AliasableEvalFunc<org.apache.pig.data.DataBag>
Performs an in-memory left outer join across multiple bags.
Invoke it as BagLeftOuterJoin(bag1, 'key1', bag2, 'key2', ...), passing each bag followed by its key.
This UDF expects all bags to be non-null and each bag to have a corresponding key.
Each key is given as the alias of the key field inside the bag that precedes it.
Example:
define BagLeftOuterJoin datafu.pig.bags.BagLeftOuterJoin();
-- describe data:
-- data: {bag1: {(key1: chararray,value1: chararray)},bag2: {(key2: chararray,value2: int)}}
bag_joined = FOREACH data GENERATE BagLeftOuterJoin(bag1, 'key1', bag2, 'key2') as joined;
-- describe bag_joined:
-- bag_joined: {joined: {(bag1::key1: chararray, bag1::value1: chararray, bag2::key2: chararray, bag2::value2: int)}}
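To make the join semantics concrete, here is a minimal sketch in plain Java. It is not the DataFu implementation: the class LeftOuterJoinSketch and its methods are hypothetical, plain lists stand in for Pig bags and tuples, only two bags are joined, keys are identified by position rather than by alias, and it assumes the usual left-outer-join convention that left rows without a match are padded with nulls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain Java lists stand in for Pig bags and tuples.
final class LeftOuterJoinSketch {

    // Convenience helper to build one "tuple" as a list of field values.
    static List<Object> row(Object... values) {
        return Arrays.asList(values);
    }

    // Joins every left row with all right rows that share its key.
    // Left rows with no match are kept and padded with rightWidth nulls.
    // Note: a null key is treated as an ordinary value in this sketch.
    static List<List<Object>> leftOuterJoin(List<List<Object>> left, int leftKey,
                                            List<List<Object>> right, int rightKey,
                                            int rightWidth) {
        // Build an in-memory index of the right-hand rows, grouped by key.
        Map<Object, List<List<Object>>> index = new HashMap<>();
        for (List<Object> r : right) {
            index.computeIfAbsent(r.get(rightKey), k -> new ArrayList<>()).add(r);
        }

        List<Object> padding = Collections.nCopies(rightWidth, null);
        List<List<Object>> joined = new ArrayList<>();
        for (List<Object> l : left) {
            List<List<Object>> matches = index.get(l.get(leftKey));
            if (matches == null) {
                // No match: emit the left row once, followed by nulls.
                List<Object> out = new ArrayList<>(l);
                out.addAll(padding);
                joined.add(out);
            } else {
                // One output row per matching right row.
                for (List<Object> m : matches) {
                    List<Object> out = new ArrayList<>(l);
                    out.addAll(m);
                    joined.add(out);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<List<Object>> bag1 = Arrays.asList(row("a", "x"), row("b", "y"));
        List<List<Object>> bag2 = Arrays.asList(row("a", 1), row("a", 2));
        // prints [[a, x, a, 1], [a, x, a, 2], [b, y, null, null]]
        System.out.println(leftOuterJoin(bag1, 0, bag2, 0, 2));
    }
}
```

The right-hand rows are indexed in a hash map so that each left row is matched with a single lookup, which is one common way to do a join entirely in memory.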
- Author:
- wvaughan
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
Method Summary
org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
org.apache.pig.impl.logicalLayer.schema.Schema getOutputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Specify the output schema as in EvalFunc.outputSchema(Schema).
Methods inherited from class datafu.pig.util.AliasableEvalFunc
getBag, getBoolean, getDouble, getDouble, getFieldAliases, getFloat, getFloat, getInteger, getInteger, getLong, getLong, getObject, getPosition, getPosition, getPrefixedAliasName, getString, getString, outputSchema
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
BagLeftOuterJoin
public BagLeftOuterJoin()
exec
public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
throws java.io.IOException
- Specified by:
exec
in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
- Throws:
java.io.IOException
getOutputSchema
public org.apache.pig.impl.logicalLayer.schema.Schema getOutputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
- Description copied from class:
AliasableEvalFunc
- Specify the output schema as in EvalFunc.outputSchema(Schema).
- Specified by:
getOutputSchema
in class AliasableEvalFunc<org.apache.pig.data.DataBag>
- Returns:
- outputSchema
Matthew Hayes, Sam Shah