Package datafu.pig.sampling

Sampling UDFs for Pig, including weighted sampling, reservoir sampling, simple random sampling, and sampling by key.


Class Summary
ReservoirSample Performs a simple random sample using an in-memory reservoir to produce a uniformly random sample of a given size.
ReservoirSample.Final  
ReservoirSample.Initial  
ReservoirSample.Intermediate  
SampleByKey Provides a way of sampling tuples based on certain fields.
SimpleRandomSample Scalable simple random sampling.
SimpleRandomSample.Final  
SimpleRandomSample.Initial  
SimpleRandomSample.Intermediate  
WeightedSample Performs weighted Bernoulli sampling on a bag.
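
The reservoir technique named in the ReservoirSample summary can be illustrated with a short, standalone sketch. This is not DataFu's implementation: the class name ReservoirSketch and the method sample are hypothetical, and the code shows only the textbook single-pass algorithm (often called Algorithm R), assuming the input fits a plain Java Iterable rather than a Pig bag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: a hypothetical helper, not part of datafu.pig.sampling.
public class ReservoirSketch {

    // Returns up to k items drawn uniformly at random from a single pass over items.
    static <T> List<T> sample(Iterable<T> items, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0; // number of items read so far
        for (T item : items) {
            seen++;
            if (reservoir.size() < k) {
                // Fill phase: the first k items always enter the reservoir.
                reservoir.add(item);
            } else {
                // Replace phase: pick a slot index uniformly in [0, seen);
                // the new item is kept only if that index falls inside the reservoir,
                // which happens with probability k / seen.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        // A fixed seed makes the run repeatable; the sample size is always min(k, input size).
        List<Integer> picked = sample(data, 10, new Random(42));
        System.out.println(picked.size());
    }
}
```

Memory use is bounded by the sample size k regardless of input length, which is why a reservoir suits bags whose size is not known in advance.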
 




Authors: Matthew Hayes, Sam Shah