Class BagGroup

  extended by org.apache.pig.EvalFunc<T>
      extended by datafu.pig.util.ContextualEvalFunc<T>
          extended by datafu.pig.util.AliasableEvalFunc<org.apache.pig.data.DataBag>
              extended by datafu.pig.bags.BagGroup

public class BagGroup
extends AliasableEvalFunc<org.apache.pig.data.DataBag>

Performs an in-memory group operation on a bag. The first argument is the bag. The second argument is a projection of that bag onto the fields to use as group keys.

The following example groups input_bag by k. The output is a bag of tuples, each consisting of the group key k and a bag of the corresponding (k,v) tuples from input_bag.

 define BagGroup datafu.pig.bags.BagGroup();

 data = LOAD 'input' AS (input_bag: bag {T: tuple(k: int, v: chararray)});
 -- ({(1,A),(1,B),(2,A),(2,B),(2,C),(3,A)})

 -- Group input_bag by k
 data2 = FOREACH data GENERATE BagGroup(input_bag, input_bag.(k)) as grouped;
 -- data2: {grouped: {(group: int,input_bag: {T: (k: int,v: chararray)})}}
 -- ({(1,{(1,A),(1,B)}),(2,{(2,A),(2,B),(2,C)}),(3,{(3,A)})})
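
The grouping semantics shown above can be sketched in plain Java, without any Pig dependency. This is only an illustration of the result shape, not the DataFu implementation: the class BagGroupSketch and its methods are hypothetical names, tuples are modeled as Map.Entry pairs, and groups are kept in first-seen key order as an assumption (a Pig bag does not promise any order).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of DataFu or Pig.
final class BagGroupSketch {

    // Models one (k, v) tuple of input_bag.
    static Map.Entry<Integer, String> tuple(int k, String v) {
        return new SimpleEntry<>(k, v);
    }

    // Groups (k, v) tuples by k. Each group keeps the full (k, v) tuples,
    // mirroring the (group, input_bag) shape of the Pig example.
    static Map<Integer, List<Map.Entry<Integer, String>>> groupByKey(
            List<Map.Entry<Integer, String>> inputBag) {
        Map<Integer, List<Map.Entry<Integer, String>>> groups = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> t : inputBag) {
            groups.computeIfAbsent(t.getKey(), k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Same rows as the sample input_bag above.
        List<Map.Entry<Integer, String>> inputBag = Arrays.asList(
            tuple(1, "A"), tuple(1, "B"),
            tuple(2, "A"), tuple(2, "B"), tuple(2, "C"),
            tuple(3, "A"));
        System.out.println(groupByKey(inputBag));
    }
}
```

Because the whole bag is held in a map, memory use grows with the size of the bag, which is the trade-off implied by "in-memory" in the description.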

If the key k is not needed within the input_bag for the output, it can be projected out like so:

 data3 = FOREACH data2 {
   -- project only the value
   projected = FOREACH grouped GENERATE group, input_bag.(v);
   GENERATE projected as grouped;
 }

 -- data3: {grouped: {(group: int,input_bag: {(v: chararray)})}}
 -- ({(1,{(A),(B)}),(2,{(A),(B),(C)}),(3,{(A)})})
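
The effect of projecting out the key can likewise be sketched in plain Java. Again this is an assumption-laden illustration rather than how Pig evaluates the nested FOREACH: BagGroupProjectionSketch and groupValuesByKey are hypothetical names, and tuples are modeled as Map.Entry pairs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of DataFu or Pig.
final class BagGroupProjectionSketch {

    // Models one (k, v) tuple of input_bag.
    static Map.Entry<Integer, String> tuple(int k, String v) {
        return new SimpleEntry<>(k, v);
    }

    // Groups by k but keeps only v inside each group, the analogue of
    // "FOREACH grouped GENERATE group, input_bag.(v)".
    static Map<Integer, List<String>> groupValuesByKey(
            List<Map.Entry<Integer, String>> inputBag) {
        return inputBag.stream().collect(Collectors.groupingBy(
            Map.Entry::getKey,
            LinkedHashMap::new, // keeps first-seen key order (an assumption, see lead-in)
            Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> inputBag = Arrays.asList(
            tuple(1, "A"), tuple(1, "B"),
            tuple(2, "A"), tuple(2, "B"), tuple(2, "C"),
            tuple(3, "A"));
        System.out.println(groupValuesByKey(inputBag));
    }
}
```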

Constructor Summary
BagGroup()

Method Summary
 org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
 org.apache.pig.impl.logicalLayer.schema.Schema getOutputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
          Specify the output schema as in EvalFunc.outputSchema(Schema).
Constructor Detail


public BagGroup()
Method Detail


public org.apache.pig.impl.logicalLayer.schema.Schema getOutputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Description copied from class: AliasableEvalFunc
Specify the output schema as in EvalFunc.outputSchema(Schema).

Specified by:
getOutputSchema in class AliasableEvalFunc<org.apache.pig.data.DataBag>


public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
Specified by:
exec in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>