public class CountDistinctUpTo
extends org.apache.pig.AccumulatorEvalFunc<java.lang.Integer>
implements org.apache.pig.Algebraic
DEFINE CountDistinctUpTo10 datafu.pig.bags.CountDistinctUpTo('10');
DEFINE CountDistinctUpTo3 datafu.pig.bags.CountDistinctUpTo('3');
-- input:
-- {(A),(B),(D),(A),(C),(E),(A),(B),(A),(B)}
input = LOAD 'input' AS (B: bag {T: tuple(alpha:CHARARRAY)});
-- output:
-- (5)
output = FOREACH input GENERATE CountDistinctUpTo10(B);
-- output2:
-- (3)
output2 = FOREACH input GENERATE CountDistinctUpTo3(B);
Modifier and Type | Class and Description |
---|---|
static class |
CountDistinctUpTo.Final
Receives output either from initial results or intermediate
Outputs an integer with the number of distinct tuples, up to the maximum desired.
|
static class |
CountDistinctUpTo.Initial
Outputs a tuple containing a DataBag containing a single tuple T (the original schema) or an empty bag
T -> ({T})
|
static class |
CountDistinctUpTo.Intermediate
Receives a bag of bags, each containing a single tuple with the original input schema T
Outputs a bag of distinct tuples each with the original schema T: {({T}),({T}),({T})} -> ({T, T, T})
or if the maximum is reached, null: {({T}),({T}),({T}) ..} -> (null)
|
Constructor and Description |
---|
CountDistinctUpTo(java.lang.String maxAmount) |
Modifier and Type | Method and Description |
---|---|
void |
accumulate(org.apache.pig.data.Tuple tuple) |
void |
cleanup() |
java.lang.String |
getFinal() |
java.lang.String |
getInitial() |
java.lang.String |
getIntermed() |
java.lang.Integer |
getValue() |
org.apache.pig.impl.logicalLayer.schema.Schema |
outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input) |
allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
public void accumulate(org.apache.pig.data.Tuple tuple) throws java.io.IOException
accumulate
in interface org.apache.pig.Accumulator<java.lang.Integer>
accumulate
in class org.apache.pig.AccumulatorEvalFunc<java.lang.Integer>
java.io.IOException
public void cleanup()
cleanup
in interface org.apache.pig.Accumulator<java.lang.Integer>
cleanup
in class org.apache.pig.AccumulatorEvalFunc<java.lang.Integer>
public java.lang.Integer getValue()
getValue
in interface org.apache.pig.Accumulator<java.lang.Integer>
getValue
in class org.apache.pig.AccumulatorEvalFunc<java.lang.Integer>
public java.lang.String getInitial()
getInitial
in interface org.apache.pig.Algebraic
public java.lang.String getIntermed()
getIntermed
in interface org.apache.pig.Algebraic
public java.lang.String getFinal()
getFinal
in interface org.apache.pig.Algebraic
public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
outputSchema
in class org.apache.pig.EvalFunc<java.lang.Integer>