In some cases you don't need exact results. An estimate may be sufficient if it can be computed more efficiently. With this in mind, Apache DataFu provides UDFs for computing estimates of certain quantities.
StreamingMedian and StreamingQuantile compute estimates of the median and quantiles of bags. The advantage of these UDFs is that they do not require the input bags to be sorted. See Statistics for more details.
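To make the idea of estimating quantiles from unsorted input concrete, here is a minimal Python sketch. It is not DataFu's implementation and does not use DataFu's API: the function name `estimate_quantiles`, its `capacity` and `seed` parameters, and the choice of reservoir sampling are all assumptions made for illustration. It reads the values once in whatever order they arrive, keeps a fixed-size random sample, and sorts only that sample.

```python
import random


def estimate_quantiles(stream, quantiles, capacity=1000, seed=0):
    """Estimate quantiles of an unsorted stream from a fixed-size random sample.

    Hypothetical helper for illustration only; not part of Apache DataFu.
    """
    rng = random.Random(seed)
    sample = []
    for count, value in enumerate(stream, start=1):
        if len(sample) < capacity:
            # The first `capacity` values are always kept.
            sample.append(value)
        else:
            # Reservoir sampling: keep the new value with
            # probability capacity / count.
            slot = rng.randrange(count)
            if slot < capacity:
                sample[slot] = value
    if not sample:
        raise ValueError("stream is empty")
    # Only the small sample is sorted, never the full input.
    sample.sort()
    last = len(sample) - 1
    # Pick the sample element nearest to each requested quantile.
    return [sample[int(q * last + 0.5)] for q in quantiles]


# Usage: 10,000 values in random order, summarized by a 1,000-value sample.
values = list(range(10_000))
random.Random(42).shuffle(values)
low, median, high = estimate_quantiles(values, [0.0, 0.5, 1.0])
print(low, median, high)  # approximate; the median should be near 5000
```

The trade-off in this sketch is that memory use is bounded by `capacity` and no full sort of the input is needed, at the cost of results that are approximate whenever the input is larger than the sample.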