datafu.pig.bags
Class ReverseEnumerate

java.lang.Object
  extended by org.apache.pig.EvalFunc<T>
      extended by datafu.pig.util.SimpleEvalFunc<org.apache.pig.data.DataBag>
          extended by datafu.pig.bags.ReverseEnumerate

public class ReverseEnumerate
extends SimpleEvalFunc<org.apache.pig.data.DataBag>

Enumerates a bag, appending to each tuple its index within the bag, with indices assigned in descending order: the first tuple receives the highest index and the last tuple receives the starting index.

For example:

   {(A),(B),(C),(D)} => {(A,3),(B,2),(C,1),(D,0)}
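As a rough illustration of the indexing rule, the following plain-Java sketch models a bag as a list of tuples, each tuple being a list of values. The class ReverseEnumerateSketch and its method are hypothetical and do not use the Pig API; the sketch only assumes that the tuple at position i in a bag of size n receives index start + (n - 1 - i), which reproduces the example above when start is 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of reverse enumeration (not the DataFu source).
// A bag is a List of tuples; each tuple is a List of values.
class ReverseEnumerateSketch {

    // Returns a new list in which each tuple has its reverse index appended.
    // The first tuple gets start + size - 1 and the last tuple gets start.
    static List<List<Object>> reverseEnumerate(List<? extends List<?>> bag, int start) {
        int size = bag.size();
        List<List<Object>> result = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            List<Object> tuple = new ArrayList<>(bag.get(i)); // copy; input is left untouched
            tuple.add(start + (size - 1 - i));
            result.add(tuple);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> bag = List.of(List.of("A"), List.of("B"), List.of("C"), List.of("D"));
        // start = 0, matching {(A),(B),(C),(D)} => {(A,3),(B,2),(C,1),(D,0)}
        System.out.println(reverseEnumerate(bag, 0)); // [[A, 3], [B, 2], [C, 1], [D, 0]]
    }
}
```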
 
The optional constructor parameter sets the starting index, that is, the index assigned to the last tuple in the bag (0 by default). Because reverse counting requires the size of the bag, this UDF does not implement Pig's Accumulator interface and incurs the slight performance penalty of materializing the entire DataBag.
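To show why materialization is needed, this hypothetical sketch (plain Java, not Pig or DataFu code) contrasts forward enumeration, which can label items one at a time as they stream past, with reverse enumeration, which must buffer the whole input before it can assign the first index. The names EnumerationCost, forward, and reverse are illustrative only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch contrasting forward and reverse enumeration costs.
class EnumerationCost {

    // Forward: each (item, index) pair is emitted as soon as the item arrives,
    // so only a running counter is kept.
    static <T> void forward(Iterator<T> items, int start, BiConsumer<T, Integer> emit) {
        int index = start;
        while (items.hasNext()) {
            emit.accept(items.next(), index++);
        }
    }

    // Reverse: the first index depends on the total count, so every item is
    // buffered (materialized) before anything is emitted.
    static <T> void reverse(Iterator<T> items, int start, BiConsumer<T, Integer> emit) {
        List<T> buffer = new ArrayList<>();
        items.forEachRemaining(buffer::add); // full materialization
        int size = buffer.size();
        for (int i = 0; i < size; i++) {
            emit.accept(buffer.get(i), start + size - 1 - i);
        }
    }

    public static void main(String[] args) {
        List<String> bag = List.of("A", "B", "C", "D");
        StringBuilder fwd = new StringBuilder();
        StringBuilder rev = new StringBuilder();
        forward(bag.iterator(), 0, (item, i) -> fwd.append("(").append(item).append(",").append(i).append(")"));
        reverse(bag.iterator(), 0, (item, i) -> rev.append("(").append(item).append(",").append(i).append(")"));
        System.out.println(fwd); // (A,0)(B,1)(C,2)(D,3)
        System.out.println(rev); // (A,3)(B,2)(C,1)(D,0)
    }
}
```

The streaming style roughly corresponds to what an Accumulator implementation relies on, since Pig may hand such a UDF the bag in chunks; the buffering step corresponds to the DataBag materialization this UDF accepts instead.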

Example:

 define ReverseEnumerate datafu.pig.bags.ReverseEnumerate('1');

 -- input:
 -- ({(100),(200),(300),(400)})
 data = LOAD 'input' as (B: bag{T: tuple(v2:INT)});

 -- output:
 -- ({(100,4),(200,3),(300,2),(400,1)})
 result = FOREACH data GENERATE ReverseEnumerate(B);
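The DEFINE statement passes '1' to the String constructor. The following hypothetical model (not the DataFu source) mirrors both constructors and reproduces the example output in Pig's bag notation. It assumes the argument is a base-10 integer parsed with Integer.parseInt, and it represents the bag as a List<Integer> of single-field tuples; the class name ReverseEnumerateModel is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two constructors. Pig hands DEFINE arguments to a
// UDF constructor as strings, so '1' arrives here as the text "1".
class ReverseEnumerateModel {
    private final int start; // index given to the last tuple in the bag

    ReverseEnumerateModel() {             // like ReverseEnumerate(): counts down to 0
        this.start = 0;
    }

    ReverseEnumerateModel(String start) { // like ReverseEnumerate(String start)
        // Assumption: a base-10 int; parseInt throws NumberFormatException otherwise.
        this.start = Integer.parseInt(start.trim());
    }

    // Tags each value with its reverse index and renders the result as a Pig bag literal.
    String call(List<Integer> values) {
        int n = values.size();
        List<String> tuples = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            tuples.add("(" + values.get(i) + "," + (start + n - 1 - i) + ")");
        }
        return "{" + String.join(",", tuples) + "}";
    }

    public static void main(String[] args) {
        List<Integer> b = List.of(100, 200, 300, 400);
        System.out.println(new ReverseEnumerateModel("1").call(b)); // {(100,4),(200,3),(300,2),(400,1)}
        System.out.println(new ReverseEnumerateModel().call(b));    // {(100,3),(200,2),(300,1),(400,0)}
    }
}
```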
 
 


Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
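ReverseEnumerate()
           Creates an instance whose counting ends at index 0.
ReverseEnumerate(java.lang.String start)
           Creates an instance whose counting ends at the given starting index.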
 
Method Summary
 org.apache.pig.data.DataBag call(org.apache.pig.data.DataBag inputBag)
           Returns a bag containing each tuple of inputBag with its reverse index appended.
 org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
          Overrides outputSchema so that the input schema is verified when Pig compiles the script, instead of at runtime.
 
Methods inherited from class datafu.pig.util.SimpleEvalFunc
exec, getReturnType
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ReverseEnumerate

public ReverseEnumerate()

ReverseEnumerate

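public ReverseEnumerate(java.lang.String start)
Parameters:
start - the index assigned to the last tuple in the bag, given as a string (for example '1' in a DEFINE statement)

Method Detail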

call

public org.apache.pig.data.DataBag call(org.apache.pig.data.DataBag inputBag)
                                 throws java.io.IOException
Throws:
java.io.IOException

outputSchema

public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Description copied from class: SimpleEvalFunc
Overrides outputSchema so that the input schema is verified when Pig compiles the script, instead of at runtime.

Overrides:
outputSchema in class SimpleEvalFunc<org.apache.pig.data.DataBag>
Parameters:
input - input schema
Returns:
the result of calling super.outputSchema, in case the schema was defined elsewhere
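The compile-time check can be pictured with a Pig-free sketch. Because call takes a single DataBag, the sketch assumes the check amounts to accepting exactly one bag-typed input field. The FieldType enum and validateInput method are hypothetical stand-ins for Pig's Schema classes, not their API.

```java
import java.util.List;

// Hypothetical, Pig-free model of validating a UDF's input schema.
class SchemaCheckSketch {

    // Assumed subset of field types, for illustration only.
    enum FieldType { BAG, TUPLE, INT, CHARARRAY }

    // Throws if the input does not match a call(DataBag) signature.
    // In the UDF the analogous check runs when Pig compiles the script;
    // here it is an ordinary method call.
    static void validateInput(List<FieldType> inputFields) {
        if (inputFields.size() != 1) {
            throw new IllegalArgumentException("Expected exactly one argument, got " + inputFields.size());
        }
        if (inputFields.get(0) != FieldType.BAG) {
            throw new IllegalArgumentException("Expected a bag, got " + inputFields.get(0));
        }
    }

    public static void main(String[] args) {
        validateInput(List.of(FieldType.BAG)); // accepted, no exception
        try {
            validateInput(List.of(FieldType.INT));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Expected a bag, got INT
        }
    }
}
```

Failing at this stage stops a misconfigured script before any data is read, rather than partway through a job.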


Matthew Hayes, Sam Shah