datafu.pig.stats
Class StreamingMedian

java.lang.Object
  extended by org.apache.pig.EvalFunc<T>
      extended by org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.Tuple>
          extended by datafu.pig.stats.StreamingQuantile
              extended by datafu.pig.stats.StreamingMedian
All Implemented Interfaces:
org.apache.pig.Accumulator<org.apache.pig.data.Tuple>

public class StreamingMedian
extends StreamingQuantile

Computes the approximate median for a (not necessarily sorted) input bag, using the Munro-Paterson algorithm. This is a convenience wrapper around StreamingQuantile.

N.B.: all the data for a given key is sent to a single reducer, so partition the input (e.g., group by 'day') if it is too large for one reducer to handle. In other words, this is not a distributed median.
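
The partitioning advice above can be illustrated with a minimal, stand-alone Java sketch. It does not use Pig or DataFu. The class PerKeyMedianSketch and its methods are hypothetical names introduced only for this illustration, and the median is computed exactly by sorting, as a stand-in for the approximate Munro-Paterson estimate. The point is only that each key's values are collected and processed in one place, so smaller groups mean less data per worker.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of DataFu or Pig.
public class PerKeyMedianSketch {

    // Exact median by sorting. This is a stand-in for the approximate
    // streaming estimate and is used here only to keep the sketch short.
    public static double median(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("empty group");
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        if (n % 2 == 1) {
            return sorted.get(n / 2);
        }
        return (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    // Groups (key, value) records by key and computes one median per group,
    // mirroring a "group by key" followed by a per-group aggregate.
    public static Map<String, Double> medianPerKey(List<String> keys, List<Double> values) {
        Map<String, List<Double>> groups = new TreeMap<>();
        for (int i = 0; i < keys.size(); i++) {
            groups.computeIfAbsent(keys.get(i), k -> new ArrayList<>()).add(values.get(i));
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : groups.entrySet()) {
            // All values for one key are handled together, like a single reducer.
            result.put(e.getKey(), median(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> days = List.of("mon", "mon", "mon", "tue", "tue");
        List<Double> vals = List.of(3.0, 1.0, 2.0, 10.0, 20.0);
        System.out.println(medianPerKey(days, vals));
    }
}
```

Grouping by a finer key (such as 'day' rather than a single global group) shrinks each per-key list, which is the effect the note above recommends when the data is large.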

See Also:
StreamingQuantile

Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
StreamingMedian()
           
 
Method Summary
 
Methods inherited from class datafu.pig.stats.StreamingQuantile
accumulate, cleanup, getValue, outputSchema
 
Methods inherited from class org.apache.pig.AccumulatorEvalFunc
exec
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StreamingMedian

public StreamingMedian()


Matthew Hayes, Sam Shah