Apache DataFu Pig - Guide


In some cases you don't need exact results. An estimate may be sufficient if it can be computed more efficiently. With this in mind, Apache DataFu has UDFs for computing estimates of certain quantities.

Median and Quantiles

StreamingMedian and StreamingQuantile compute estimates of the median and quantiles of a bag. Their advantage is that they do not require the input bag to be sorted. See Statistics for more details.
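
As a minimal sketch of how these UDFs might be used, the script below estimates the median and five evenly spaced quantiles of a single integer column. It assumes the UDFs are in the datafu.pig.stats package and that StreamingQuantile takes the desired quantiles as constructor arguments. The jar file name, the input path 'input', and the field name val are placeholders, not values from this guide.

```pig
-- Placeholder jar name: substitute the DataFu jar you actually built or downloaded.
REGISTER datafu-pig.jar;

-- Assumed class names from the datafu.pig.stats package.
DEFINE Median datafu.pig.stats.StreamingMedian();
DEFINE Quantile datafu.pig.stats.StreamingQuantile('0.0','0.25','0.5','0.75','1.0');

-- Hypothetical input: one integer value per line.
data = LOAD 'input' AS (val:int);

-- Collect every value into a single bag. No ORDER step is applied,
-- since the streaming UDFs do not require sorted input.
grouped = GROUP data ALL;

-- Each UDF consumes the unsorted bag and returns a tuple of estimates.
estimates = FOREACH grouped GENERATE Median(data.val), Quantile(data.val);

DUMP estimates;
```

Because the values are estimates, the output may differ slightly from the exact median and quantiles of the same data.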