Class SimpleEvalFunc<T>

  extended by org.apache.pig.EvalFunc<T>
      extended by datafu.pig.util.SimpleEvalFunc<T>
Direct Known Subclasses:
AppendToBag, BoolToInt, FirstTupleFromBag, HaversineDistInMiles, IntToBool, MD5, PrependToBag, Quantile, RandInt, ReverseEnumerate, SHA, UserAgentClassify, WilsonBinConf

public abstract class SimpleEvalFunc<T>
extends org.apache.pig.EvalFunc<T>

Uses reflection to make writing simple wrapper Pig UDFs easier. For example, a simple string-trimming UDF written directly against EvalFunc might look like this:

  public class TRIM extends EvalFunc<String> {
    public String exec(Tuple input) throws IOException {
      if (input.size() != 1)
        throw new IllegalArgumentException("requires a parameter");

      try {
        Object o = input.get(0);
        if (!(o instanceof String))
          throw new IllegalArgumentException("expected a string");

        String str = (String)o;
        return (str == null) ? null : str.trim();
      } catch (Exception e) {
        throw WrappedIOException.wrap("error...", e);
      }
    }
  }

This is a lot of boilerplate just to check the number of arguments and the types of the parameters in the tuple. With this class you can instead derive from SimpleEvalFunc and define a call() method (not exec!), declaring its arguments as you would for a regular function. The class handles all of the argument checking and exception wrapping for you, so your code becomes:

  public class TRIM2 extends SimpleEvalFunc<String> {
    public String call(String s) {
      return (s != null) ? s.trim() : null;
    }
  }

An example of this UDF in action with Pig:
  grunt> a = load 'test' as (x:chararray, y:chararray); dump a;
    (1 , 2)

  grunt> b = foreach a generate TRIM2(x); dump b;

  grunt> c = foreach a generate TRIM2((int)x); dump c;
    datafu.pig.util.TRIM2(java.lang.String): argument type 
    mismatch [#1]; expected java.lang.String, got java.lang.Integer

  grunt> d = foreach a generate TRIM2(x, y); dump d;
    datafu.pig.util.TRIM2(java.lang.String): got 2 arguments, expected 1.
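
The argument checking behind errors like the two above can be sketched in plain Java. This is a minimal illustration, not the DataFu implementation. ReflectiveFunc and its exec(List<Object>) signature are hypothetical stand-ins: a List<Object> plays the role of the Pig Tuple so that the example runs without Pig on the classpath. The message formats are copied from the session above. The sketch assumes that a subclass declares exactly one public call() method with reference-typed parameters, and that null arguments are passed through without a type check.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for SimpleEvalFunc; a List<Object> replaces the Pig Tuple.
abstract class ReflectiveFunc<T> {
  private final Method call;

  protected ReflectiveFunc() {
    Method found = null;
    // Locate the single public method named "call" on the concrete subclass.
    for (Method m : getClass().getMethods()) {
      if (m.getName().equals("call")) {
        if (found != null)
          throw new IllegalStateException("expected exactly one call() method");
        found = m;
      }
    }
    if (found == null)
      throw new IllegalStateException("no call() method found");
    found.setAccessible(true);
    this.call = found;
  }

  @SuppressWarnings("unchecked")
  public T exec(List<Object> input) {
    Class<?>[] expected = call.getParameterTypes();
    // Check the argument count against the call() signature.
    if (input.size() != expected.length)
      throw new IllegalArgumentException(String.format(
          "%s: got %d arguments, expected %d.",
          signature(), input.size(), expected.length));
    // Check each non-null argument against the declared parameter type.
    for (int i = 0; i < expected.length; i++) {
      Object arg = input.get(i);
      if (arg != null && !expected[i].isInstance(arg))
        throw new IllegalArgumentException(String.format(
            "%s: argument type mismatch [#%d]; expected %s, got %s",
            signature(), i + 1, expected[i].getName(), arg.getClass().getName()));
    }
    try {
      return (T) call.invoke(this, input.toArray());
    } catch (IllegalAccessException | InvocationTargetException e) {
      // Wrap reflection failures, much as the hand-written exec() wraps exceptions.
      throw new RuntimeException("error invoking " + signature(), e);
    }
  }

  // Builds "ClassName(param.Type, ...)" for use in error messages.
  private String signature() {
    StringBuilder sb = new StringBuilder(getClass().getName()).append('(');
    Class<?>[] types = call.getParameterTypes();
    for (int i = 0; i < types.length; i++) {
      if (i > 0) sb.append(", ");
      sb.append(types[i].getName());
    }
    return sb.append(')').toString();
  }
}

// Same body as TRIM2 above, but built on the stand-in base class.
class Trim2 extends ReflectiveFunc<String> {
  public String call(String s) {
    return (s != null) ? s.trim() : null;
  }
}

class ReflectiveFuncDemo {
  public static void main(String[] args) {
    ReflectiveFunc<String> trim = new Trim2();
    System.out.println(trim.exec(Arrays.<Object>asList(" 1 "))); // prints "1"
    try {
      trim.exec(Arrays.<Object>asList(1)); // wrong type: Integer instead of String
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Looking up the method once in the constructor keeps the per-record work down to the checks and a single reflective invoke. Whether the real class caches the method in the same way is not stated here.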

Field Summary
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
Constructor Summary
Method Summary
 T exec(org.apache.pig.data.Tuple input)
 java.lang.reflect.Type getReturnType()
 org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema inputSchema)
          Overrides outputSchema so that the input schema can be verified at Pig compile time instead of at run time.
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public SimpleEvalFunc()
Method Detail


public java.lang.reflect.Type getReturnType()
Overrides:
getReturnType in class org.apache.pig.EvalFunc<T>


public T exec(org.apache.pig.data.Tuple input)
Specified by:
exec in class org.apache.pig.EvalFunc<T>


public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema inputSchema)
Overrides outputSchema so that the input schema can be verified at Pig compile time instead of at run time.

Overrides:
outputSchema in class org.apache.pig.EvalFunc<T>
Parameters:
inputSchema - input schema
Returns:
call to super.outputSchema in case schema was defined elsewhere
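
The idea of checking a script's declared input types against the call() signature before any data is read can be sketched as follows. This is an illustration under stated assumptions, not the actual outputSchema logic. CallSignatureCheck and firstMismatch are hypothetical names, and a List<Class<?>> stands in for the field types of a Pig Schema so that the example runs without Pig. It assumes the UDF class has a single public call() method with reference-typed parameters.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; a List<Class<?>> stands in for a Pig input Schema.
final class CallSignatureCheck {
  private CallSignatureCheck() {}

  // Returns null when the declared types fit call(), otherwise the first problem found.
  static String firstMismatch(Class<?> udfClass, List<Class<?>> declaredTypes) {
    Method call = null;
    for (Method m : udfClass.getMethods()) {
      if (m.getName().equals("call")) {
        call = m;
        break;
      }
    }
    if (call == null)
      return udfClass.getName() + ": no call() method found";

    Class<?>[] expected = call.getParameterTypes();
    if (declaredTypes.size() != expected.length)
      return String.format("%s: got %d arguments, expected %d.",
          udfClass.getName(), declaredTypes.size(), expected.length);

    for (int i = 0; i < expected.length; i++) {
      // Only types are compared; no values exist yet at this stage.
      if (!expected[i].isAssignableFrom(declaredTypes.get(i)))
        return String.format("%s: argument type mismatch [#%d]; expected %s, got %s",
            udfClass.getName(), i + 1, expected[i].getName(),
            declaredTypes.get(i).getName());
    }
    return null;
  }
}

// Illustrative UDF-like class with a call() method; not a Pig class.
class TrimCall {
  public String call(String s) {
    return (s != null) ? s.trim() : null;
  }
}

class CallSignatureCheckDemo {
  public static void main(String[] args) {
    // Matching declaration: one String column.
    System.out.println(
        CallSignatureCheck.firstMismatch(TrimCall.class, Arrays.<Class<?>>asList(String.class)));
    // Mismatching declaration: an Integer column where a String is expected.
    System.out.println(
        CallSignatureCheck.firstMismatch(TrimCall.class, Arrays.<Class<?>>asList(Integer.class)));
  }
}
```

Because only the declared types are inspected, a check of this kind can reject a bad script before a job is launched, which is the benefit the description above points to.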

Author:
Matthew Hayes, Sam Shah