datafu.pig.sets
Class SetDifference

java.lang.Object
  extended by org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
      extended by datafu.pig.sets.SetOperationsBase
          extended by datafu.pig.sets.SetDifference

public class SetDifference
extends datafu.pig.sets.SetOperationsBase

Computes the set difference of two or more bags. Duplicates are eliminated from the result. Each input bag must already be sorted, for example with a nested ORDER as in the example below.

If bags A and B are provided, this computes A-B, i.e. all elements of A that are not in B. If bags A, B, and C are provided, this computes A-B-C, i.e. all elements of A that are in neither B nor C.

Example:

 define SetDifference datafu.pig.sets.SetDifference();

 -- input:
 -- ({(1),(2),(3),(4),(5),(6)},{(3),(4)})
 data = LOAD 'input' AS (B1:bag{T:tuple(val:int)},B2:bag{T:tuple(val:int)});

 data = FOREACH data {
   B1 = ORDER B1 BY val ASC;
   B2 = ORDER B2 BY val ASC;

   -- output:
   -- ({(1),(2),(5),(6)})
   GENERATE SetDifference(B1,B2);
 }
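
The semantics described above can be illustrated outside of Pig. The following is a minimal sketch in plain Java, not the DataFu implementation: the class name SetDifferenceSemantics and its difference method are hypothetical, and integers stand in for Pig tuples. It only shows what "A-B-C with duplicates eliminated" means.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of DataFu.
final class SetDifferenceSemantics {
    // Returns the distinct elements of the first list that occur in none of
    // the remaining lists, in ascending order.
    @SafeVarargs
    static List<Integer> difference(List<Integer> first, List<Integer>... others) {
        // A TreeSet sorts the elements and drops duplicates.
        TreeSet<Integer> result = new TreeSet<>(first);
        for (List<Integer> other : others) {
            result.removeAll(other);
        }
        return new ArrayList<>(result);
    }
}
```

Calling difference with the lists 1..6 and 3,4 yields 1, 2, 5, 6, matching the output shown in the Pig example. Unlike the UDF, this sketch does not need sorted input, because the TreeSet does the sorting.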
 


Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
SetDifference()
           
 
Method Summary
 int countMatches(java.util.PriorityQueue<datafu.pig.sets.SetDifference.Pair> pq)
          Counts how many elements in the priority queue match the element at the front of the queue, which should be from the first bag.
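 org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
          Computes the set difference of the sorted bags in the input tuple: the elements of the first bag that appear in none of the other bags.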
 
Methods inherited from class datafu.pig.sets.SetOperationsBase
outputSchema
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SetDifference

public SetDifference()

Method Detail

countMatches

public int countMatches(java.util.PriorityQueue<datafu.pig.sets.SetDifference.Pair> pq)
Counts how many elements in the priority queue match the element at the front of the queue, which should be from the first bag.

Parameters:
pq - priority queue
Returns:
number of matches
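
As an illustration of the counting described above, here is a minimal sketch in plain Java. It is not the DataFu source: the Pair class below is a hypothetical stand-in for SetDifference.Pair (whose fields are not documented here), holding an integer value and the index of the bag it came from, and the class name CountMatchesSketch is likewise an assumption.

```java
import java.util.PriorityQueue;

// Hypothetical sketch; not the DataFu implementation.
final class CountMatchesSketch {
    // Assumed shape of a queue entry: a value plus the index of its source bag.
    static final class Pair implements Comparable<Pair> {
        final int value;
        final int bagIndex;

        Pair(int value, int bagIndex) {
            this.value = value;
            this.bagIndex = bagIndex;
        }

        // Order by value, then by bag index, so that on ties the entry from
        // the first bag (index 0) reaches the front of the queue.
        @Override
        public int compareTo(Pair other) {
            int byValue = Integer.compare(value, other.value);
            return byValue != 0 ? byValue : Integer.compare(bagIndex, other.bagIndex);
        }
    }

    // Counts the other queue entries whose value equals the front entry's value.
    static int countMatches(PriorityQueue<Pair> pq) {
        Pair head = pq.peek();
        if (head == null) {
            return 0;
        }
        if (head.bagIndex != 0) {
            throw new IllegalStateException("front of the queue should come from the first bag");
        }
        int matches = 0;
        for (Pair p : pq) { // iteration visits every entry, in no particular order
            if (p != head && p.value == head.value) {
                matches++;
            }
        }
        return matches;
    }
}
```

In this sketch a result of zero means the front element occurs only in the first bag, which is the condition for it to belong to the difference.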

exec

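public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
                                 throws java.io.IOException
Computes the set difference of the sorted bags in the input tuple: the elements of the first bag that appear in none of the other bags.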
Specified by:
exec in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
Throws:
java.io.IOException
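
The sorted-input requirement suggests a single-pass merge. The following is a sketch of one way to compute a set difference over sorted inputs with a priority queue; it is an assumption about the general technique, not the DataFu source. The names SortedDifferenceSketch and Entry are hypothetical, integers stand in for Pig tuples, and ascending order is assumed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a k-way merge; not the DataFu implementation.
final class SortedDifferenceSketch {
    // Queue entry: the current value of one input plus that input's index.
    static final class Entry implements Comparable<Entry> {
        final int value;
        final int source;

        Entry(int value, int source) {
            this.value = value;
            this.source = source;
        }

        @Override
        public int compareTo(Entry other) {
            int byValue = Integer.compare(value, other.value);
            return byValue != 0 ? byValue : Integer.compare(source, other.source);
        }
    }

    // Each inner list must be sorted ascending; index 0 is the bag to subtract from.
    static List<Integer> difference(List<List<Integer>> sortedInputs) {
        List<Iterator<Integer>> iterators = new ArrayList<>();
        PriorityQueue<Entry> pq = new PriorityQueue<>();
        for (int i = 0; i < sortedInputs.size(); i++) {
            Iterator<Integer> it = sortedInputs.get(i).iterator();
            iterators.add(it);
            if (it.hasNext()) {
                pq.add(new Entry(it.next(), i));
            }
        }

        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int value = pq.peek().value;
            boolean inFirst = false;
            boolean inOther = false;
            // Drain every entry equal to the smallest value, refilling the
            // queue from the input each drained entry came from.
            while (!pq.isEmpty() && pq.peek().value == value) {
                Entry e = pq.poll();
                if (e.source == 0) {
                    inFirst = true;
                } else {
                    inOther = true;
                }
                Iterator<Integer> it = iterators.get(e.source);
                if (it.hasNext()) {
                    pq.add(new Entry(it.next(), e.source));
                }
            }
            // Emit each value once, and only if it was seen solely in the first input.
            if (inFirst && !inOther) {
                out.add(value);
            }
        }
        return out;
    }
}
```

Because every input is consumed in order, the queue never holds more than one entry per input, so memory stays proportional to the number of bags rather than their sizes.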


Author:
Matthew Hayes, Sam Shah