datafu.pig.sets
Class SetDifference
java.lang.Object
  org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
    datafu.pig.sets.SetOperationsBase
      datafu.pig.sets.SetDifference
public class SetDifference
extends datafu.pig.sets.SetOperationsBase
Computes the set difference of two or more bags. Duplicates are eliminated. The input bags must be sorted.
If bags A and B are provided, then this computes A-B, i.e. all elements in A that are not in B.
If bags A, B and C are provided, then this computes A-B-C, i.e. all elements in A that are in neither B nor C.
Example:
define SetDifference datafu.pig.sets.SetDifference();

-- input:
-- ({(1),(2),(3),(4),(5),(6)},{(3),(4)})
input = LOAD 'input' AS (B1:bag{T:tuple(val:int)},B2:bag{T:tuple(val:int)});

input = FOREACH input {
  B1 = ORDER B1 BY val ASC;
  B2 = ORDER B2 BY val ASC;

  -- output:
  -- ({(1),(2),(5),(6)})
  GENERATE SetDifference(B1,B2);
}
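As a rough illustration of the semantics described above, the following Java sketch computes A-B-C over sorted lists of integers by advancing one cursor per subtracted list. It is not DataFu's implementation: the class name `SortedSetDifferenceSketch` and the method `difference` are hypothetical, plain `List<Integer>` stands in for Pig's `DataBag`, and ascending order with non-null elements is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of DataFu.
class SortedSetDifferenceSketch {

    // Returns the distinct elements of `first` that appear in none of `others`.
    // Assumption: every list is sorted ascending and contains no nulls.
    @SafeVarargs
    static List<Integer> difference(List<Integer> first, List<Integer>... others) {
        List<Integer> result = new ArrayList<>();
        int[] cursors = new int[others.length]; // one read position per subtracted list
        Integer previous = null;

        for (Integer value : first) {
            // Sorted input puts duplicates next to each other, so skipping
            // repeats of the previous value eliminates them.
            if (previous != null && previous.equals(value)) {
                continue;
            }
            previous = value;

            boolean found = false;
            for (int i = 0; i < others.length; i++) {
                List<Integer> other = others[i];
                // Advance past elements smaller than the current value.
                while (cursors[i] < other.size() && other.get(cursors[i]) < value) {
                    cursors[i]++;
                }
                if (cursors[i] < other.size() && other.get(cursors[i]).equals(value)) {
                    found = true;
                }
            }
            if (!found) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3, 4, 5, 6);
        List<Integer> b = Arrays.asList(3, 4);
        System.out.println(difference(a, b)); // prints [1, 2, 5, 6]
    }
}
```

Because each cursor only moves forward, every list is read once. That single pass is only possible when the inputs are sorted, which is one plausible reason for the sorted-input requirement.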
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
Method Summary
int countMatches(java.util.PriorityQueue<datafu.pig.sets.SetDifference.Pair> pq)
  Counts how many elements in the priority queue match the element at the front of the queue, which should be from the first bag.
org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
Methods inherited from class datafu.pig.sets.SetOperationsBase
outputSchema
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SetDifference
public SetDifference()
countMatches
public int countMatches(java.util.PriorityQueue<datafu.pig.sets.SetDifference.Pair> pq)
- Counts how many elements in the priority queue match the element at the front of the queue, which should be from the first bag.
- Parameters:
  pq - priority queue
- Returns:
  number of matches
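The counting step described above might look roughly like the sketch below. The `Pair` type is not documented on this page, so the sketch uses a hypothetical `Entry` class with an integer value and a bag index (0 for the first bag). Both fields, the comparator, and the choice to count only entries from the other bags are assumptions made for illustration, not DataFu's actual code.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of DataFu.
class CountMatchesSketch {

    // Assumed stand-in for the undocumented SetDifference.Pair type.
    static final class Entry {
        final int value;    // the element itself
        final int bagIndex; // 0 for the first bag, 1 and up for subtracted bags

        Entry(int value, int bagIndex) {
            this.value = value;
            this.bagIndex = bagIndex;
        }
    }

    // Orders by value, then by bag index, so that a first-bag entry
    // comes ahead of equal values from the other bags.
    static final Comparator<Entry> ORDER =
        Comparator.<Entry>comparingInt(e -> e.value).thenComparingInt(e -> e.bagIndex);

    // Counts entries from the other bags whose value equals the head's value.
    // The queue is inspected, not modified.
    static int countMatches(PriorityQueue<Entry> pq) {
        Entry head = pq.peek();
        if (head == null || head.bagIndex != 0) {
            throw new IllegalStateException("front of queue must come from the first bag");
        }
        int matches = 0;
        for (Entry e : pq) { // iteration order is unspecified, which is fine for counting
            if (e.bagIndex > 0 && e.value == head.value) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        PriorityQueue<Entry> pq = new PriorityQueue<>(ORDER);
        pq.add(new Entry(3, 0)); // current element of the first bag
        pq.add(new Entry(3, 1));
        pq.add(new Entry(3, 2));
        pq.add(new Entry(4, 3));
        System.out.println(countMatches(pq)); // prints 2
    }
}
```

Under these assumptions, a count of zero would mean the front element appears in none of the other bags and so belongs in the difference.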
exec
public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
throws java.io.IOException
- Specified by:
  exec in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
- Throws:
  java.io.IOException
Matthew Hayes, Sam Shah