Class BagGroup

  extended by org.apache.pig.EvalFunc<T>
      extended by datafu.pig.util.ContextualEvalFunc<T>
          extended by datafu.pig.util.AliasableEvalFunc<org.apache.pig.data.DataBag>
              extended by datafu.pig.bags.BagGroup

public class BagGroup
extends AliasableEvalFunc<org.apache.pig.data.DataBag>

Performs an in-memory group operation on a bag. The first argument is the bag to group. The second argument is a projection of that same bag onto the keys to group by.

The following example groups input_bag by k. The output is a bag having tuples consisting of the group key k and a bag with the corresponding (k,v) tuples from input_bag for that key.

 define BagGroup datafu.pig.bags.BagGroup();

 data = LOAD 'input' AS (input_bag: bag {T: tuple(k: int, v: chararray)});
 -- ({(1,A),(1,B),(2,A),(2,B),(2,C),(3,A)})

 -- Group input_bag by k
 data2 = FOREACH data GENERATE BagGroup(input_bag, input_bag.(k)) as grouped;
 -- data2: {grouped: {(group: int,input_bag: {T: (k: int,v: chararray)})}}
 -- ({(1,{(1,A),(1,B)}),(2,{(2,A),(2,B),(2,C)}),(3,{(3,A)})})
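
The grouping shown above can be pictured outside Pig as a single pass over the tuples that collects them into a map keyed by k. The following is a minimal plain-Java sketch of that idea only, not BagGroup's actual implementation: the class BagGroupSketch and the method groupByKey are hypothetical names, and (k,v) tuples are modeled as Map.Entry values as an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of DataFu or Pig.
class BagGroupSketch {

    // Groups (k, v) pairs by k. Each group keeps the full (k, v) pairs,
    // mirroring how the output bag keeps the original tuples.
    static Map<Integer, List<Map.Entry<Integer, String>>> groupByKey(
            List<Map.Entry<Integer, String>> inputBag) {
        // LinkedHashMap keeps keys in first-seen order; this is a choice of
        // the sketch, since Pig bags carry no ordering guarantee.
        Map<Integer, List<Map.Entry<Integer, String>>> groups = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> tuple : inputBag) {
            groups.computeIfAbsent(tuple.getKey(), k -> new ArrayList<>()).add(tuple);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Same contents as input_bag in the Pig example.
        List<Map.Entry<Integer, String>> inputBag = List.of(
                Map.entry(1, "A"), Map.entry(1, "B"),
                Map.entry(2, "A"), Map.entry(2, "B"), Map.entry(2, "C"),
                Map.entry(3, "A"));
        System.out.println(groupByKey(inputBag));
    }
}
```

Because every group is held in memory at once, the sketch, like the in-memory operation described above, is suited to bags that fit in memory.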

If the key k is not needed within the input_bag for the output, it can be projected out like so:

 data3 = FOREACH data2 {
   -- project only the value
   projected = FOREACH grouped GENERATE group, input_bag.(v);
   GENERATE projected as grouped;
 }

 -- data3: {grouped: {(group: int,input_bag: {(v: chararray)})}}
 -- ({(1,{(A),(B)}),(2,{(A),(B),(C)}),(3,{(A)})})
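
The projection step can likewise be sketched in plain Java: for each group, keep the group key and replace every (k,v) tuple with its value v. This is an illustration under stated assumptions rather than anything Pig or DataFu provides: ProjectValuesSketch and projectValues are hypothetical names, and tuples are again modeled as Map.Entry values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of DataFu or Pig.
class ProjectValuesSketch {

    // Drops k from every (k, v) pair inside each group, leaving only v.
    static Map<Integer, List<String>> projectValues(
            Map<Integer, List<Map.Entry<Integer, String>>> grouped) {
        Map<Integer, List<String>> projected = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<Map.Entry<Integer, String>>> group : grouped.entrySet()) {
            List<String> values = new ArrayList<>();
            for (Map.Entry<Integer, String> tuple : group.getValue()) {
                values.add(tuple.getValue());
            }
            projected.put(group.getKey(), values);
        }
        return projected;
    }

    public static void main(String[] args) {
        // Same contents as the grouped bag in data2.
        Map<Integer, List<Map.Entry<Integer, String>>> grouped = new LinkedHashMap<>();
        grouped.put(1, List.of(Map.entry(1, "A"), Map.entry(1, "B")));
        grouped.put(2, List.of(Map.entry(2, "A"), Map.entry(2, "B"), Map.entry(2, "C")));
        grouped.put(3, List.of(Map.entry(3, "A")));
        System.out.println(projectValues(grouped));
    }
}
```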

Field Summary
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
Constructor Summary
BagGroup()
Method Summary
 org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
 org.apache.pig.impl.logicalLayer.schema.Schema getOutputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
          Specify the output schema as in EvalFunc.outputSchema(Schema).
Methods inherited from class datafu.pig.util.AliasableEvalFunc
getBag, getBoolean, getDouble, getDouble, getFieldAliases, getFloat, getFloat, getInteger, getInteger, getLong, getLong, getObject, getPosition, getPosition, getPrefixedAliasName, getString, getString, outputSchema
Methods inherited from class datafu.pig.util.ContextualEvalFunc
getContextProperties, getInstanceName, getInstanceProperties, setUDFContextSignature
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public BagGroup()
Method Detail


public org.apache.pig.impl.logicalLayer.schema.Schema getOutputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Description copied from class: AliasableEvalFunc
Specify the output schema as in EvalFunc.outputSchema(Schema).

Specified by:
getOutputSchema in class AliasableEvalFunc<org.apache.pig.data.DataBag>


public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
Specified by:
exec in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>