datafu.pig.bags
Class BagConcat

java.lang.Object
  extended by org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
      extended by datafu.pig.bags.BagConcat

public class BagConcat
extends org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>

Concatenates all input bags into a single bag that contains every tuple from every input bag. Duplicate tuples are retained.

This UDF accepts two forms of input:

  1. A tuple of two or more elements, where each element is a bag and all of the bags have the same schema.
  2. A single bag, where each element of that bag is itself a bag and all of the inner bags have the same schema.
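
The two input forms listed above can be modeled in plain Java. The following is a minimal sketch, not DataFu's implementation: it assumes `java.util.List` as a stand-in for a Pig bag (real bags are unordered, lists are not), and the class and method names `BagConcatSketch`, `concatTupleOfBags`, and `concatBagOfBags` are hypothetical. It only illustrates that both forms produce the same concatenated result, with duplicates kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two input forms; it does not use the Pig API.
public class BagConcatSketch {

    // Form 1 stand-in: the argument list plays the role of a tuple whose
    // elements are bags. Every bag is appended in turn, duplicates kept.
    @SafeVarargs
    public static <T> List<T> concatTupleOfBags(List<T>... bags) {
        List<T> out = new ArrayList<>();
        for (List<T> bag : bags) {
            out.addAll(bag);
        }
        return out;
    }

    // Form 2 stand-in: a single outer bag whose elements are themselves bags.
    public static <T> List<T> concatBagOfBags(List<List<T>> outer) {
        List<T> out = new ArrayList<>();
        for (List<T> inner : outer) {
            out.addAll(inner);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same values as the first input row of the Pig examples on this page.
        List<Integer> a = Arrays.asList(1, 2, 3);
        List<Integer> b = Arrays.asList(3, 4, 5);

        System.out.println(concatTupleOfBags(a, b));              // [1, 2, 3, 3, 4, 5]
        System.out.println(concatBagOfBags(Arrays.asList(a, b))); // [1, 2, 3, 3, 4, 5]
    }
}
```

The value 3 appears twice in the result because concatenation, unlike a set union, does not remove duplicates.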

Example 1:

 define BagConcat datafu.pig.bags.BagConcat();
 -- This example illustrates the use on a tuple of bags
 
 -- input:
 -- ({(1),(2),(3)},{(3),(4),(5)})
 -- ({(20),(25)},{(40),(50)})
 data = LOAD 'input' AS (A: bag{T: tuple(v:INT)}, B: bag{T: tuple(v:INT)});
 
 -- output:
 -- ({(1),(2),(3),(3),(4),(5)})
 -- ({(20),(25),(40),(50)})
 result = FOREACH data GENERATE BagConcat(A,B);
 
 

Example 2:

 define BagConcat datafu.pig.bags.BagConcat();
 -- This example illustrates the use on a bag of bags
 
 -- input:
 -- ({({(1),(2),(3)}),({(3),(4),(5)})})
 -- ({({(20),(25)}),({(40),(50)})})
 data = LOAD 'input' AS (A: bag{T: tuple(bag{T2: tuple(v:INT)})});
 
 -- output:
 -- ({(1),(2),(3),(3),(4),(5)})
 -- ({(20),(25),(40),(50)})
 result = FOREACH data GENERATE BagConcat(A);
 
 

Author:
wvaughan

Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
BagConcat()
           
 
Method Summary
 org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
           
 org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
           
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BagConcat

public BagConcat()

Method Detail

exec

public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
                                 throws java.io.IOException
Specified by:
exec in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
Throws:
java.io.IOException

outputSchema

public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Overrides:
outputSchema in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>


Matthew Hayes, Sam Shah