datafu.pig.util
Class SimpleEvalFunc<T>

java.lang.Object
  extended by org.apache.pig.EvalFunc<T>
      extended by datafu.pig.util.SimpleEvalFunc<T>
Direct Known Subclasses:
AppendToBag, BoolToInt, FirstTupleFromBag, HaversineDistInMiles, IntToBool, MD5, PrependToBag, Quantile, RandInt, ReverseEnumerate, SHA, UserAgentClassify, WilsonBinConf

public abstract class SimpleEvalFunc<T>
extends org.apache.pig.EvalFunc<T>

Uses reflection to make writing simple wrapper Pig UDFs easier. For example, a simple string-trimming UDF written directly against EvalFunc might look like this:

  public class TRIM extends EvalFunc<String> 
  {
    public String exec(Tuple input) throws IOException 
    {
      if (input.size() != 1)
        throw new IllegalArgumentException("requires a parameter");

      try {
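        Object o = input.get(0);
        // Check for null first: instanceof is false for null, so the
        // original null test after the type check could never succeed.
        if (o == null)
          return null;
        if (!(o instanceof String))
          throw new IllegalArgumentException("expected a string");

        return ((String)o).trim();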
      } 
      catch (Exception e) {
        throw WrappedIOException.wrap("error...", e);
      }
    }
  }
  
  
That is a lot of boilerplate just to check the number of arguments and the parameter types in the tuple. With this class you can instead derive from SimpleEvalFunc and define a call() method (not exec!), declaring the arguments as you would for a regular method. The class handles the argument checking and exception wrapping for you, so the code becomes:

  public class TRIM2 extends SimpleEvalFunc<String> 
  {
    public String call(String s)
    {
      return (s != null) ? s.trim() : null;
    }
  }
  
  
An example of this UDF in action with Pig:
  grunt> a = load 'test' as (x:chararray, y:chararray); dump a;
    (1 , 2)

  grunt> b = foreach a generate TRIM2(x); dump b;
    (1)

  grunt> c = foreach a generate TRIM2((int)x); dump c;
    datafu.pig.util.TRIM2(java.lang.String): argument type 
    mismatch [#1]; expected java.lang.String, got java.lang.Integer

  grunt> d = foreach a generate TRIM2(x, y); dump d;
    datafu.pig.util.TRIM2(java.lang.String): got 2 arguments, expected 1.
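
The argument checks behind the error messages above can be illustrated without Pig. The following is a minimal sketch of the general reflective-dispatch idea, not DataFu's actual implementation: ReflectiveFunc, Trim and TrimDemo are hypothetical names, plain Object arguments stand in for a Pig Tuple, and the message wording only imitates the output shown above. It also assumes that call() declares reference (boxed) parameter types.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical stand-in for the dispatch idea; NOT DataFu's actual code.
abstract class ReflectiveFunc<T> {
    private final Method call;

    protected ReflectiveFunc() {
        Method found = null;
        // Locate the one public method named "call" on the concrete subclass.
        for (Method m : getClass().getMethods()) {
            if (!m.getName().equals("call")) continue;
            if (found != null)
                throw new IllegalStateException("more than one call() method");
            found = m;
        }
        if (found == null)
            throw new IllegalStateException("no public call() method");
        found.setAccessible(true); // the subclass itself may be non-public
        call = found;
    }

    // Plays the role of exec(Tuple): checks the arguments, then delegates.
    @SuppressWarnings("unchecked")
    public T exec(Object... args) {
        Class<?>[] expected = call.getParameterTypes();
        if (args.length != expected.length)
            throw new IllegalArgumentException(
                "got " + args.length + " arguments, expected " + expected.length);
        for (int i = 0; i < args.length; i++) {
            // null is accepted for any reference-typed parameter
            if (args[i] != null && !expected[i].isInstance(args[i]))
                throw new IllegalArgumentException(
                    "argument type mismatch [#" + (i + 1) + "]; expected "
                    + expected[i].getName() + ", got " + args[i].getClass().getName());
        }
        try {
            return (T) call.invoke(this, args);
        } catch (InvocationTargetException e) {
            // Surface the exception thrown inside call() as the cause.
            throw new RuntimeException("error in call()", e.getCause());
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        }
    }
}

// Mirrors TRIM2: only the typed call() method is written by hand.
class Trim extends ReflectiveFunc<String> {
    public String call(String s) {
        return (s != null) ? s.trim() : null;
    }
}

class TrimDemo {
    public static void main(String[] args) {
        Trim trim = new Trim();
        System.out.println("[" + trim.exec(" 1 ") + "]");
        try {
            trim.exec(1); // an Integer where a String is expected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the checks are driven by the signature of call(), a subclass taking two arguments would need no extra validation code; it would only declare call(A a, B b).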
  
  


Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
SimpleEvalFunc()
           
 
Method Summary
 T exec(org.apache.pig.data.Tuple input)
           
 java.lang.reflect.Type getReturnType()
           
 org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema inputSchema)
          Overrides outputSchema so that the input schema can be verified at Pig compile time instead of at runtime.
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimpleEvalFunc

public SimpleEvalFunc()
Method Detail

getReturnType

public java.lang.reflect.Type getReturnType()
Overrides:
getReturnType in class org.apache.pig.EvalFunc<T>

exec

public T exec(org.apache.pig.data.Tuple input)
       throws java.io.IOException
Specified by:
exec in class org.apache.pig.EvalFunc<T>
Throws:
java.io.IOException

outputSchema

public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema inputSchema)
Overrides outputSchema so that the input schema can be verified at Pig compile time instead of at runtime.

Overrides:
outputSchema in class org.apache.pig.EvalFunc<T>
Parameters:
inputSchema - input schema
Returns:
the result of calling super.outputSchema, in case the schema was defined elsewhere


Matthew Hayes, Sam Shah