datafu.pig.util
Class SimpleEvalFunc<T>
java.lang.Object
org.apache.pig.EvalFunc<T>
datafu.pig.util.SimpleEvalFunc<T>
- Direct Known Subclasses:
- AppendToBag, BoolToInt, FirstTupleFromBag, HaversineDistInMiles, IntToBool, MD5, PrependToBag, Quantile, RandInt, ReverseEnumerate, SHA, UserAgentClassify, WilsonBinConf
public abstract class SimpleEvalFunc<T>
- extends org.apache.pig.EvalFunc<T>
Uses reflection to make writing simple wrapper Pig UDFs easier.
For example, writing a simple string-trimming UDF by extending EvalFunc directly might look like this:
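import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;

public class TRIM extends EvalFunc<String>
{
  public String exec(Tuple input) throws IOException
  {
    // Check the argument count by hand
    if (input == null || input.size() != 1)
      throw new IllegalArgumentException("requires exactly one parameter");
    try {
      Object o = input.get(0);
      // Check the argument type by hand; a null field is allowed through
      if (o != null && !(o instanceof String))
        throw new IllegalArgumentException("expected a string");
      String str = (String)o;
      return (str == null) ? null : str.trim();
    }
    catch (Exception e) {
      // exec() may only throw IOException, so wrap everything else
      throw WrappedIOException.wrap("error...", e);
    }
  }
}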
There is a lot of boilerplate to check the number of arguments and the parameter types in the tuple.
Instead, with this class, you can derive from SimpleEvalFunc and implement a call() method (not exec!) whose arguments are declared just like those of a regular Java method. The class handles all the argument checking and exception wrapping for you, so your code would be:
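import datafu.pig.util.SimpleEvalFunc;

public class TRIM2 extends SimpleEvalFunc<String>
{
  // No exec(): SimpleEvalFunc checks the arguments against this signature
  public String call(String s)
  {
    return (s != null) ? s.trim() : null;
  }
}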
An example of this UDF in action with Pig:
grunt> a = load 'test' as (x:chararray, y:chararray); dump a;
(1 , 2)
grunt> b = foreach a generate TRIM2(x); dump b;
(1)
grunt> c = foreach a generate TRIM2((int)x); dump c;
datafu.pig.util.TRIM2(java.lang.String): argument type mismatch [#1]; expected java.lang.String, got java.lang.Integer
grunt> d = foreach a generate TRIM2(x, y); dump d;
datafu.pig.util.TRIM2(java.lang.String): got 2 arguments, expected 1.
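The argument checks seen in the session above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not DataFu's implementation: the class CallArgumentChecker and its methods are hypothetical, runtime arguments are passed as an Object[] rather than a Pig Tuple so the sketch runs without Pig on the classpath, and call() parameters are assumed to use boxed types (Integer rather than int). The error messages imitate the ones printed by grunt.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of reflection-based argument checking; not DataFu's code.
public class CallArgumentChecker {

  // Find the single public method named "call" declared or inherited by the UDF class.
  public static Method findCallMethod(Class<?> udfClass) {
    Method found = null;
    for (Method m : udfClass.getMethods()) {
      if (m.getName().equals("call")) {
        if (found != null) {
          throw new IllegalStateException(udfClass.getName() + ": more than one call() method");
        }
        found = m;
      }
    }
    if (found == null) {
      throw new IllegalStateException(udfClass.getName() + ": no public call() method");
    }
    return found;
  }

  // Compare runtime arguments with call()'s declared parameter types.
  // Assumes boxed parameter types: int.class.isInstance(...) is always false.
  public static void checkArguments(String signature, Method call, Object[] args) {
    Class<?>[] params = call.getParameterTypes();
    if (args.length != params.length) {
      throw new IllegalArgumentException(String.format(
          "%s: got %d arguments, expected %d.", signature, args.length, params.length));
    }
    for (int i = 0; i < params.length; i++) {
      Object arg = args[i];
      // Nulls pass through; call() decides how to handle them.
      if (arg != null && !params[i].isInstance(arg)) {
        throw new IllegalArgumentException(String.format(
            "%s: argument type mismatch [#%d]; expected %s, got %s",
            signature, i + 1, params[i].getName(), arg.getClass().getName()));
      }
    }
  }

  // A UDF body shaped like TRIM2.
  public static class Trim {
    public String call(String s) {
      return (s != null) ? s.trim() : null;
    }
  }

  public static void main(String[] args) throws Exception {
    Method call = findCallMethod(Trim.class);
    String sig = "Trim(java.lang.String)";
    Object[] ok = {" 1 "};
    checkArguments(sig, call, ok);
    System.out.println(call.invoke(new Trim(), ok)); // prints 1
    try {
      checkArguments(sig, call, new Object[] {2, "x"});
    } catch (IllegalArgumentException e) {
      // prints: Trim(java.lang.String): got 2 arguments, expected 1.
      System.out.println(e.getMessage());
    }
  }
}
```

Locating call() once and checking each incoming row against its parameter types is what lets a subclass such as TRIM2 omit all of the manual checks in TRIM.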
Fields inherited from class org.apache.pig.EvalFunc:
log, pigLogger, reporter, returnType
Method Summary
Modifier and Type | Method and Description
T | exec(org.apache.pig.data.Tuple input)
java.lang.reflect.Type | getReturnType()
org.apache.pig.impl.logicalLayer.schema.Schema | outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema inputSchema): Overrides outputSchema so that the input schema can be verified at Pig compile time instead of at runtime
Methods inherited from class org.apache.pig.EvalFunc:
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SimpleEvalFunc
public SimpleEvalFunc()
getReturnType
public java.lang.reflect.Type getReturnType()
- Overrides:
getReturnType
in class org.apache.pig.EvalFunc<T>
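getReturnType() presumably reports the Java type the UDF produces, the T in SimpleEvalFunc<T>. This page does not say how that type is computed; one common reflection technique, shown here as an assumption, reads T from the concrete subclass's generic superclass. TypedFunc is a hypothetical stand-in for SimpleEvalFunc, and the technique only works when the UDF extends the base class directly with a concrete type argument. Reading the generic return type of the call() method would be an alternative.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical sketch of resolving T at runtime; not DataFu's code.
public class ReturnTypeSketch {

  // Stand-in for a generic base class such as SimpleEvalFunc<T>.
  public abstract static class TypedFunc<T> {
    // Read T from the concrete subclass's "extends TypedFunc<...>" clause.
    public Type getReturnType() {
      Type superType = getClass().getGenericSuperclass();
      if (!(superType instanceof ParameterizedType)) {
        throw new IllegalStateException(
            getClass().getName() + " must directly extend TypedFunc<T> with a concrete T");
      }
      return ((ParameterizedType) superType).getActualTypeArguments()[0];
    }
  }

  public static class Trim extends TypedFunc<String> {
    public String call(String s) {
      return (s != null) ? s.trim() : null;
    }
  }

  public static void main(String[] args) {
    // prints "class java.lang.String"
    System.out.println(new Trim().getReturnType());
  }
}
```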
exec
public T exec(org.apache.pig.data.Tuple input)
throws java.io.IOException
- Specified by:
exec
in class org.apache.pig.EvalFunc<T>
- Throws:
java.io.IOException
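Because exec() is declared to throw only java.io.IOException, anything that goes wrong while invoking call() has to be converted into one. The sketch below shows that wrapping with standard reflection; ExecWrappingSketch and invokeCall are hypothetical names, a List<Object> stands in for the Pig Tuple, and whether SimpleEvalFunc unwraps InvocationTargetException exactly this way is an assumption.

```java
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of exec()-style exception wrapping; not DataFu's code.
public class ExecWrappingSketch {

  // Invoke call() reflectively, turning every failure into an IOException,
  // the only checked exception EvalFunc.exec declares.
  public static Object invokeCall(Object udf, Method call, List<Object> fields) throws IOException {
    try {
      return call.invoke(udf, fields.toArray());
    } catch (InvocationTargetException e) {
      // Unwrap so callers see the exception thrown inside call() itself.
      throw new IOException("call() failed", e.getCause());
    } catch (IllegalAccessException | IllegalArgumentException e) {
      // Reflection-level problems, e.g. a wrong argument count or type.
      throw new IOException("could not invoke call()", e);
    }
  }

  public static class Upper {
    public String call(String s) {
      return s.toUpperCase();
    }
  }

  public static void main(String[] args) throws Exception {
    Method call = Upper.class.getMethod("call", String.class);
    System.out.println(invokeCall(new Upper(), call, Arrays.<Object>asList("abc"))); // prints ABC
    try {
      invokeCall(new Upper(), call, Arrays.<Object>asList((Object) null));
    } catch (IOException e) {
      // prints NullPointerException, the exception thrown inside call()
      System.out.println(e.getCause().getClass().getSimpleName());
    }
  }
}
```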
outputSchema
public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema inputSchema)
- Overrides outputSchema so that the input schema can be verified at Pig compile time instead of at runtime
- Overrides:
outputSchema
in class org.apache.pig.EvalFunc<T>
- Parameters:
inputSchema
- input schema
- Returns:
- the result of calling super.outputSchema, in case the schema was defined elsewhere
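The compile-time check that outputSchema enables can be illustrated by comparing declared field types with the call() signature once, before any rows are processed. This sketch uses plain strings for Pig type names instead of org.apache.pig.impl.logicalLayer.schema.Schema so that it runs without Pig; SchemaCheckSketch, verifyInputSchema and the partial PIG_TO_JAVA table are hypothetical, and types outside that small table (bytearray, bag, tuple, map, and so on) are simply rejected here.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a one-time, up-front schema check; not DataFu's code.
public class SchemaCheckSketch {

  // A subset of Pig's type-to-Java mapping; other Pig types are rejected here.
  private static final Map<String, Class<?>> PIG_TO_JAVA = new HashMap<>();
  static {
    PIG_TO_JAVA.put("chararray", String.class);
    PIG_TO_JAVA.put("int", Integer.class);
    PIG_TO_JAVA.put("long", Long.class);
    PIG_TO_JAVA.put("float", Float.class);
    PIG_TO_JAVA.put("double", Double.class);
  }

  // Compare declared Pig field types with call()'s parameter types.
  public static void verifyInputSchema(Method call, List<String> pigFieldTypes) {
    Class<?>[] params = call.getParameterTypes();
    if (pigFieldTypes.size() != params.length) {
      throw new IllegalArgumentException("got " + pigFieldTypes.size()
          + " arguments, expected " + params.length + ".");
    }
    for (int i = 0; i < params.length; i++) {
      String pigType = pigFieldTypes.get(i);
      Class<?> javaType = PIG_TO_JAVA.get(pigType);
      if (javaType == null || !params[i].isAssignableFrom(javaType)) {
        throw new IllegalArgumentException("argument type mismatch [#" + (i + 1)
            + "]; expected " + params[i].getName() + ", got " + pigType);
      }
    }
  }

  public static class Trim {
    public String call(String s) {
      return (s != null) ? s.trim() : null;
    }
  }

  public static void main(String[] args) throws Exception {
    Method call = Trim.class.getMethod("call", String.class);
    verifyInputSchema(call, Arrays.asList("chararray")); // passes
    try {
      verifyInputSchema(call, Arrays.asList("int"));
    } catch (IllegalArgumentException e) {
      // prints: argument type mismatch [#1]; expected java.lang.String, got int
      System.out.println(e.getMessage());
    }
  }
}
```

Failing at this stage means a bad script such as TRIM2((int)x) can be rejected when Pig builds the plan, rather than after a job has started processing data.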
- Author:
- Matthew Hayes, Sam Shah