datafu.pig.bags
Class DistinctBy
java.lang.Object
  org.apache.pig.EvalFunc<T>
    org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
      datafu.pig.bags.DistinctBy
- All Implemented Interfaces: org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
public class DistinctBy
extends org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
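Gets the distinct tuples in a bag, where distinctness is determined by the fields at a given set of positions.
The field positions are passed to the constructor as strings; in the example below, '0' selects the first field.
The output schema is identical to the input schema.
For each distinct combination of values in these fields, only the first tuple encountered in the input is kept.
This operation preserves order: if both A and B appear in the output,
and A appears before B in the input, then A appears before B in the output.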
Example:
define DistinctBy datafu.pig.bags.DistinctBy('0');
-- input:
-- ({(a,1),(a,1),(b,2),(b,22),(c,3),(d,4)})
data = LOAD 'input' AS (B: bag {T: tuple(alpha:CHARARRAY, numeric:INT)});
result = FOREACH data GENERATE DistinctBy(B);
-- output:
-- ({(a,1),(b,2),(c,3),(d,4)})
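The semantics described above (first tuple wins, input order preserved) can be illustrated with a small standalone Java sketch. This is not the DataFu source: the class name `DistinctBySketch` and the method `distinctBy` are hypothetical, tuples are modeled as `List<Object>`, and a bag is modeled as a list of tuples so that the example runs without Pig on the classpath.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the behavior documented for DistinctBy.
// Assumption: a tuple is a List<Object> and a bag is a List of tuples.
class DistinctBySketch {

    // Returns the first tuple seen for each distinct combination of the
    // values at the given zero-based field positions, in input order.
    static List<List<Object>> distinctBy(List<List<Object>> bag, int... positions) {
        Set<List<Object>> seenKeys = new HashSet<>();
        List<List<Object>> result = new ArrayList<>();
        for (List<Object> tuple : bag) {
            // Build the comparison key from the selected fields only.
            List<Object> key = new ArrayList<>(positions.length);
            for (int p : positions) {
                key.add(tuple.get(p));
            }
            // Set.add returns false if the key was already present,
            // so later tuples with the same key are skipped.
            if (seenKeys.add(key)) {
                result.add(tuple);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Object>> bag = List.of(
            List.<Object>of("a", 1), List.<Object>of("a", 1),
            List.<Object>of("b", 2), List.<Object>of("b", 22),
            List.<Object>of("c", 3), List.<Object>of("d", 4));
        // Distinct by field 0 only, as in the Pig example.
        System.out.println(distinctBy(bag, 0)); // prints [[a, 1], [b, 2], [c, 3], [d, 4]]
    }
}
```

Passing more than one position, for example `distinctBy(bag, 0, 1)`, makes the key the combination of both fields, so `(b,2)` and `(b,22)` would both be kept.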
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
Constructor Summary
DistinctBy(java.lang.String... fields)
Method Summary
void accumulate(org.apache.pig.data.Tuple input)
void cleanup()
org.apache.pig.data.DataBag getValue()
org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Methods inherited from class org.apache.pig.AccumulatorEvalFunc
exec
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DistinctBy
public DistinctBy(java.lang.String... fields)
accumulate
public void accumulate(org.apache.pig.data.Tuple input) throws java.io.IOException
- Specified by: accumulate in interface org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
- Specified by: accumulate in class org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
- Throws: java.io.IOException
cleanup
public void cleanup()
- Specified by: cleanup in interface org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
- Specified by: cleanup in class org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
getValue
public org.apache.pig.data.DataBag getValue()
- Specified by: getValue in interface org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
- Specified by: getValue in class org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
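The three Accumulator methods above work together: Pig may call accumulate several times with successive batches of tuples for one key, then getValue once to fetch the result, then cleanup to reset state before the next key. The following Java sketch models that lifecycle with plain collections. It is an assumption-laden illustration, not the DataFu implementation: the class `DistinctByAccumulatorSketch` is hypothetical, batches are `List<List<Object>>` rather than Pig `Tuple`/`DataBag` objects, and parsing the constructor strings with `Integer.parseInt` is a guess at how string field positions might be interpreted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the accumulate / getValue / cleanup lifecycle.
// Assumption: tuples are List<Object>; a batch is a List of tuples.
class DistinctByAccumulatorSketch {
    private final int[] positions;
    // State carried across accumulate() calls for the current key.
    private final Set<List<Object>> seenKeys = new HashSet<>();
    private List<List<Object>> output = new ArrayList<>();

    // Field positions arrive as strings, mirroring DistinctBy('0').
    DistinctByAccumulatorSketch(String... fields) {
        positions = new int[fields.length];
        for (int i = 0; i < fields.length; i++) {
            positions[i] = Integer.parseInt(fields[i]);
        }
    }

    // Called once per batch; duplicates are detected across batches
    // because seenKeys persists until cleanup().
    void accumulate(List<List<Object>> batch) {
        for (List<Object> tuple : batch) {
            List<Object> key = new ArrayList<>(positions.length);
            for (int p : positions) {
                key.add(tuple.get(p));
            }
            if (seenKeys.add(key)) {
                output.add(tuple);
            }
        }
    }

    // Returns everything kept since the last cleanup().
    List<List<Object>> getValue() {
        return output;
    }

    // Resets state so the instance can be reused for the next key.
    void cleanup() {
        seenKeys.clear();
        output = new ArrayList<>();
    }

    public static void main(String[] args) {
        DistinctByAccumulatorSketch udf = new DistinctByAccumulatorSketch("0");
        udf.accumulate(List.of(
            List.<Object>of("a", 1), List.<Object>of("a", 1), List.<Object>of("b", 2)));
        udf.accumulate(List.of(
            List.<Object>of("b", 22), List.<Object>of("c", 3)));
        System.out.println(udf.getValue()); // prints [[a, 1], [b, 2], [c, 3]]
        udf.cleanup();
    }
}
```

In this sketch the second batch's `(b,22)` is dropped because the key `b` was already seen in the first batch, which is why the seen-key set has to outlive a single accumulate call.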
outputSchema
public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
- Overrides: outputSchema in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
Matthew Hayes, Sam Shah