Package datafu.pig.sampling

Sampling UDFs, including weighted sample, reservoir sampling, sampling by key, etc.

Class Summary
ReservoirSample: Draws a simple random sample of a given size using an in-memory reservoir.
ReservoirSample.Final
ReservoirSample.Initial
ReservoirSample.Intermediate
SampleByKey: Samples tuples based on specified fields.
SimpleRandomSample: Scalable simple random sampling.
SimpleRandomSample.Final
SimpleRandomSample.Initial
SimpleRandomSample.Intermediate
SimpleRandomSampleWithReplacementElect: Selects the candidate with the smallest score for each position from the candidates proposed by SimpleRandomSampleWithReplacementVote.
SimpleRandomSampleWithReplacementElect.Final
SimpleRandomSampleWithReplacementElect.Initial
SimpleRandomSampleWithReplacementElect.Intermediate
SimpleRandomSampleWithReplacementVote: Scalable simple random sampling with replacement (ScaSRSWR).
WeightedSample: Performs weighted Bernoulli sampling on a bag.
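
For readers unfamiliar with the reservoir technique named in the ReservoirSample entry, the following is a minimal Java sketch of classic single-pass reservoir sampling (Algorithm R). It is an illustration only: the class ReservoirSketch and its sample method are hypothetical names invented here, they are not part of datafu.pig.sampling, and the actual UDF may be implemented differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative only: a hypothetical helper, not part of datafu.pig.sampling. */
final class ReservoirSketch {
    private ReservoirSketch() {}

    /**
     * Returns up to k items drawn uniformly at random, without replacement,
     * from the input, using one pass and O(k) memory (Algorithm R).
     */
    static <T> List<T> sample(Iterable<T> items, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0; // number of items read so far
        for (T item : items) {
            seen++;
            if (reservoir.size() < k) {
                // Fill phase: keep the first k items unconditionally.
                reservoir.add(item);
            } else {
                // Replace phase: the new item enters with probability k / seen,
                // evicting a uniformly chosen current member.
                long j = (long) (rng.nextDouble() * seen); // uniform in [0, seen)
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }
}
```

If the input holds fewer than k items, the sketch returns all of them in their original order. Passing a seeded Random makes a run repeatable.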
 


Author: Matthew Hayes, Sam Shah