datafu.pig.bags
Class BagSplit

java.lang.Object
  extended by org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
      extended by datafu.pig.bags.BagSplit

public class BagSplit
extends org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>

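Splits a bag of tuples into a bag of smaller bags, each holding at most a given number of tuples. Together, the inner bags contain every tuple of the original bag. The maximum inner-bag size is passed as the first argument and the bag to split as the second, as in BagSplit(5,B) in the example.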

Example:

 define BagSplit datafu.pig.bags.BagSplit();
 
 -- input:
 -- ({(1),(2),(3),(4),(5),(6),(7)})
 -- ({(1),(2),(3),(4),(5)})
 -- ({(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)})
 input = LOAD 'input' AS (B:bag{T:tuple(val:INT)});
 
 -- output:
 -- ({{(1),(2),(3),(4),(5)},{(6),(7)}})
 -- ({{(1),(2),(3),(4),(5)}})
 -- ({{(1),(2),(3),(4),(5)},{(6),(7),(8),(9),(10)},{(11)}})
 output = FOREACH input GENERATE BagSplit(5,B);
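
The splitting behavior shown in the example can be illustrated outside Pig with plain Java collections. The following is a minimal sketch, not the DataFu implementation: the class name BagSplitSketch and its split method are hypothetical, a java.util.List stands in for a Pig DataBag, and the sketch assumes tuples are assigned to inner bags in iteration order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of DataFu or Pig.
public class BagSplitSketch {

    /**
     * Splits items into consecutive chunks of at most maxSize elements,
     * mirroring the grouping shown in the BagSplit(5,B) example.
     */
    public static <T> List<List<T>> split(int maxSize, List<T> items) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxSize) {
            // The last chunk may be shorter than maxSize.
            int end = Math.min(start + maxSize, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> seven = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(split(5, seven)); // prints [[1, 2, 3, 4, 5], [6, 7]]
    }
}
```

In this sketch only the final chunk can hold fewer than maxSize elements, which matches the {(6),(7)} and {(11)} inner bags in the example output.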
 
 


Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
BagSplit()
           
BagSplit(java.lang.String appendBagNum)
           
 
Method Summary
 org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple arg0)
           
 org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
           
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BagSplit

public BagSplit()

BagSplit

public BagSplit(java.lang.String appendBagNum)

Method Detail

exec

public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple arg0)
                                 throws java.io.IOException
Specified by:
exec in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
Throws:
java.io.IOException

outputSchema

public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Overrides:
outputSchema in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>


Author:
Matthew Hayes, Sam Shah