datafu.pig.stats
Class HyperLogLogPlusPlus

java.lang.Object
  extended by org.apache.pig.EvalFunc<T>
      extended by org.apache.pig.AccumulatorEvalFunc<java.lang.Long>
          extended by datafu.pig.stats.HyperLogLogPlusPlus
All Implemented Interfaces:
org.apache.pig.Accumulator<java.lang.Long>

public class HyperLogLogPlusPlus
extends org.apache.pig.AccumulatorEvalFunc<java.lang.Long>

A UDF that applies the HyperLogLog++ cardinality estimation algorithm.

This uses the HyperLogLog++ implementation from stream-lib. HyperLogLog++ is an enhanced version of the HyperLogLog algorithm, as described here.

This is a streaming implementation, and therefore the input data does not need to be sorted.
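To illustrate why a streaming estimator of this kind does not need sorted input, here is a minimal sketch of plain HyperLogLog in Java. It is not the stream-lib code this UDF uses, and it leaves out the HyperLogLog++ refinements such as the sparse representation and bias correction. The class name `SimpleHyperLogLog`, the hash function, and the allowed precision range are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified plain HyperLogLog, for illustration only. This is NOT stream-lib's
// implementation and it omits the ++ refinements (sparse mode, bias correction).
final class SimpleHyperLogLog {
    private final int p;            // number of hash bits used as the register index
    private final int m;            // number of registers, 2^p
    private final byte[] registers; // largest rank observed per register

    SimpleHyperLogLog(int p) {
        // Range chosen for this sketch so the alpha formula below applies.
        if (p < 7 || p > 16) {
            throw new IllegalArgumentException("p must be between 7 and 16");
        }
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // Adds one item. Each register only keeps a maximum, so neither the order
    // of calls nor repeated items change the final state.
    void offer(String item) {
        long h = hash64(item);
        int index = (int) (h >>> (64 - p));   // top p bits select a register
        long rest = h << p;                   // remaining bits, left-aligned
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    long cardinality() {
        double sum = 0.0;
        int emptyRegisters = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                emptyRegisters++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);   // constant for m >= 128
        double raw = alpha * m * m / sum;
        if (raw <= 2.5 * m && emptyRegisters > 0) {
            // Small-range correction: fall back to linear counting.
            return Math.round(m * Math.log((double) m / emptyRegisters));
        }
        return Math.round(raw);
    }

    // FNV-1a over the UTF-8 bytes, followed by the MurmurHash3 64-bit finalizer.
    private static long hash64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Convenience helper: feed every item, then read the estimate.
    static long estimate(List<String> items, int p) {
        SimpleHyperLogLog hll = new SimpleHyperLogLog(p);
        for (String item : items) {
            hll.offer(item);
        }
        return hll.cardinality();
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            ids.add("member-" + i);
        }
        List<String> shuffled = new ArrayList<>(ids);
        Collections.shuffle(shuffled);
        // Both lines print the same estimate, close to 100000.
        System.out.println("in order: " + estimate(ids, 12));
        System.out.println("shuffled: " + estimate(shuffled, 12));
    }
}
```

Because every update is a per-register maximum, the final state depends only on the set of distinct items seen, not on their order or multiplicity.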

Author:
mhayes

Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
HyperLogLogPlusPlus()
          Constructs a HyperLogLog++ estimator with the default precision.
HyperLogLogPlusPlus(java.lang.String p)
          Constructs a HyperLogLog++ estimator with the given precision value.
 
Method Summary
 void accumulate(org.apache.pig.data.Tuple arg0)
           
 void cleanup()
           
 java.lang.Long getValue()
           
 org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
           
 
Methods inherited from class org.apache.pig.AccumulatorEvalFunc
exec
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HyperLogLogPlusPlus

public HyperLogLogPlusPlus()
Constructs a HyperLogLog++ estimator with the default precision.


HyperLogLogPlusPlus

public HyperLogLogPlusPlus(java.lang.String p)
Constructs a HyperLogLog++ estimator with the given precision value.

Parameters:
p - precision value
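
As a rough guide to choosing p, the sketch below assumes that the precision value is the number of hash bits used to index registers, as in HyperLogLog-family estimators generally. That gives m = 2^p registers and a relative standard error of roughly 1.04 / sqrt(m). Whether this UDF passes p to stream-lib unchanged is an assumption here. The class `PrecisionMath` and its bounds check are hypothetical and not part of DataFu. The value is parsed from a string because the constructor takes its argument as a string.

```java
// Hypothetical helper for reasoning about the precision parameter.
// Assumes p is the number of register-index bits, so m = 2^p.
final class PrecisionMath {

    // Parses the string-valued precision and returns the register count.
    static int registerCount(String p) {
        int bits = Integer.parseInt(p.trim());
        // Bounds are illustrative only; the real library may accept a different range.
        if (bits < 4 || bits > 30) {
            throw new IllegalArgumentException("precision out of range: " + bits);
        }
        return 1 << bits;
    }

    // Approximate relative standard error of a HyperLogLog estimate.
    static double relativeStandardError(String p) {
        return 1.04 / Math.sqrt(registerCount(p));
    }

    public static void main(String[] args) {
        for (String p : new String[] {"10", "14", "20"}) {
            System.out.println("p=" + p
                + " registers=" + registerCount(p)
                + " approxError=" + relativeStandardError(p));
        }
    }
}
```

Each increment of p doubles the register count and shrinks the expected error by a factor of about 1.41, so higher precision trades memory for accuracy.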
Method Detail

accumulate

public void accumulate(org.apache.pig.data.Tuple arg0)
                throws java.io.IOException
Specified by:
accumulate in interface org.apache.pig.Accumulator<java.lang.Long>
Specified by:
accumulate in class org.apache.pig.AccumulatorEvalFunc<java.lang.Long>
Throws:
java.io.IOException

cleanup

public void cleanup()
Specified by:
cleanup in interface org.apache.pig.Accumulator<java.lang.Long>
Specified by:
cleanup in class org.apache.pig.AccumulatorEvalFunc<java.lang.Long>

getValue

public java.lang.Long getValue()
Specified by:
getValue in interface org.apache.pig.Accumulator<java.lang.Long>
Specified by:
getValue in class org.apache.pig.AccumulatorEvalFunc<java.lang.Long>
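
The three methods above follow Pig's Accumulator contract. `accumulate` may be called several times with successive batches of records for one key, `getValue` is called once all records for that key have been passed, and `cleanup` resets state before the next key. The sketch below mimics that call order with a hypothetical stand-in interface, `MiniAccumulator`, which takes a `List<String>` instead of a Pig `Tuple`. It uses an exact `HashSet` count in place of HyperLogLog++ to stay short.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the accumulate/getValue/cleanup contract.
// Real Pig UDFs receive org.apache.pig.data.Tuple, not List<String>.
interface MiniAccumulator<T> {
    void accumulate(List<String> batch);
    T getValue();
    void cleanup();
}

// Exact distinct count, used only to keep the example small; the real UDF
// keeps an approximate HyperLogLog++ estimator instead of a set.
final class ExactDistinctCount implements MiniAccumulator<Long> {
    private final Set<String> seen = new HashSet<>();

    @Override
    public void accumulate(List<String> batch) {
        seen.addAll(batch);          // called once per batch for the current key
    }

    @Override
    public Long getValue() {
        return (long) seen.size();   // result after all batches were passed
    }

    @Override
    public void cleanup() {
        seen.clear();                // reset before the next key
    }

    public static void main(String[] args) {
        ExactDistinctCount counter = new ExactDistinctCount();
        counter.accumulate(Arrays.asList("a", "b"));
        counter.accumulate(Arrays.asList("b", "c"));
        System.out.println(counter.getValue()); // prints 3
        counter.cleanup();
        System.out.println(counter.getValue()); // prints 0
    }
}
```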

outputSchema

public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Overrides:
outputSchema in class org.apache.pig.EvalFunc<java.lang.Long>


Matthew Hayes, Sam Shah