datafu.pig.sets
Class SetIntersect
java.lang.Object
org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
datafu.pig.sets.SetOperationsBase
datafu.pig.sets.SetIntersect
public class SetIntersect
- extends datafu.pig.sets.SetOperationsBase
Computes the set intersection of two or more bags: the output bag contains each tuple that appears in every input bag. Duplicates are eliminated. Each input bag must already be sorted.
Example:
define SetIntersect datafu.pig.sets.SetIntersect();
-- input:
-- ({(1,10),(2,20),(3,30),(4,40)},{(2,20),(4,40),(8,80)})
input = LOAD 'input' AS (B1:bag{T:tuple(val1:int,val2:int)},B2:bag{T:tuple(val1:int,val2:int)});
input = FOREACH input {
  B1 = ORDER B1 BY val1 ASC, val2 ASC;
  B2 = ORDER B2 BY val1 ASC, val2 ASC;
  -- output:
  -- ({(2,20),(4,40)})
  GENERATE SetIntersect(B1,B2);
}
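The sorted-input requirement suggests a merge-style intersection. The page does not describe the actual implementation, so the following is only a minimal sketch of one common approach, assuming plain Java lists of integers in place of Pig bags of tuples. The class name SortedIntersectSketch and the method intersect are hypothetical and are not part of DataFu.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the DataFu implementation.
public class SortedIntersectSketch {

    // Intersects k lists that are each sorted ascending.
    // Each common element is emitted once, even if it repeats in the inputs.
    public static <T extends Comparable<T>> List<T> intersect(List<List<T>> sorted) {
        List<T> out = new ArrayList<>();
        if (sorted.isEmpty()) {
            return out;
        }
        int k = sorted.size();
        int[] pos = new int[k]; // current read position in each list

        while (true) {
            // Find the largest of the current heads; stop once any list is exhausted.
            T max = null;
            for (int i = 0; i < k; i++) {
                List<T> list = sorted.get(i);
                if (pos[i] >= list.size()) {
                    return out;
                }
                T head = list.get(pos[i]);
                if (max == null || head.compareTo(max) > 0) {
                    max = head;
                }
            }

            // Advance every list past elements smaller than that maximum.
            boolean allEqual = true;
            for (int i = 0; i < k; i++) {
                List<T> list = sorted.get(i);
                while (pos[i] < list.size() && list.get(pos[i]).compareTo(max) < 0) {
                    pos[i]++;
                }
                if (pos[i] >= list.size()) {
                    return out;
                }
                if (list.get(pos[i]).compareTo(max) != 0) {
                    allEqual = false;
                }
            }

            // If every head now equals the maximum, it belongs to the intersection.
            if (allEqual) {
                out.add(max);
                // Skip all copies of the emitted value so it is output only once.
                for (int i = 0; i < k; i++) {
                    List<T> list = sorted.get(i);
                    while (pos[i] < list.size() && list.get(pos[i]).compareTo(max) == 0) {
                        pos[i]++;
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> b1 = Arrays.asList(1, 2, 3, 4);
        List<Integer> b2 = Arrays.asList(2, 4, 8);
        System.out.println(intersect(Arrays.asList(b1, b2))); // prints [2, 4]
    }
}
```

Because every input is sorted, each list is read front to back exactly once, which is presumably why the UDF asks callers to ORDER each bag first. With unsorted inputs this sketch could miss common elements.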
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
Method Summary
boolean all_equal(java.util.PriorityQueue<datafu.pig.sets.SetIntersect.pair> pq)
org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
Methods inherited from class datafu.pig.sets.SetOperationsBase
outputSchema
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SetIntersect
public SetIntersect()
all_equal
public boolean all_equal(java.util.PriorityQueue<datafu.pig.sets.SetIntersect.pair> pq)
exec
public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input) throws java.io.IOException
- Specified by: exec in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
- Throws: java.io.IOException
Matthew Hayes, Sam Shah