datafu.pig.util
Class AliasableEvalFunc<T>

java.lang.Object
  extended by org.apache.pig.EvalFunc<T>
      extended by datafu.pig.util.ContextualEvalFunc<T>
          extended by datafu.pig.util.AliasableEvalFunc<T>
Type Parameters:
T - the type returned by exec
Direct Known Subclasses:
BagGroup, BagLeftOuterJoin, Coalesce, TransposeTupleToBag

public abstract class AliasableEvalFunc<T>
extends ContextualEvalFunc<T>

Makes implementing and using UDFs easier by enabling named parameters.

This works by capturing the schema of the input tuple on the front-end and storing it in the UDFContext. On the back-end, the UDF can then reference its parameters by alias, which makes schema-based UDFs easier to write.
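As a rough illustration of the front-end/back-end hand-off (not DataFu's actual implementation), the following plain-Java sketch uses a java.util.Properties object as a stand-in for the per-UDF properties that Pig's UDFContext carries from planning to task execution. The property key, the "alias=position" encoding and the class name are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: Properties stands in for the UDFContext properties.
// The key name and the "alias=position,..." encoding are assumptions,
// not DataFu's actual storage format.
class SchemaCaptureSketch {
  static final String ALIAS_KEY = "hypothetical.field.aliases";

  // Front-end: record each alias with its position, e.g. "principal=0,num_payments=1".
  static void captureSchema(List<String> aliases, Properties context) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < aliases.size(); i++) {
      if (i > 0) sb.append(',');
      sb.append(aliases.get(i)).append('=').append(i);
    }
    context.setProperty(ALIAS_KEY, sb.toString());
  }

  // Back-end: decode the stored string back into an alias -> position map.
  static Map<String, Integer> restoreAliases(Properties context) {
    Map<String, Integer> aliases = new HashMap<>();
    String encoded = context.getProperty(ALIAS_KEY, "");
    if (encoded.isEmpty()) return aliases;
    for (String entry : encoded.split(",")) {
      String[] parts = entry.split("=");
      aliases.put(parts[0], Integer.parseInt(parts[1]));
    }
    return aliases;
  }

  public static void main(String[] args) {
    Properties context = new Properties();
    captureSchema(Arrays.asList("principal", "num_payments", "interest_rates"), context);

    // Later, on the back-end, a field is read by alias instead of by index.
    Map<String, Integer> aliases = restoreAliases(context);
    List<Object> tuple = Arrays.<Object>asList(200000.0, 360, null);
    System.out.println(tuple.get(aliases.get("num_payments"))); // prints 360
  }
}
```

The point of the split is that the schema is only known while the script is being planned, so it has to be saved somewhere the task-side code can read it back.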

A related class is SimpleEvalFunc, but the two are actually fairly different. The primary purpose of SimpleEvalFunc is to eliminate boilerplate, on the assumption that the arguments in and out are, well, simple. It also assumes that these arguments follow a well-defined positional order.

AliasableEvalFunc lets the UDF writer avoid positional assumptions entirely and instead reference fields by their aliases. This makes the code more readable, since alias names carry more meaning for the reader than positions. It is also less error-prone: it creates a more explicit contract for the input the UDF expects, and it catches simple mistakes that position-based UDFs cannot easily detect, such as transposing two fields of the same type. If this contract is violated, for example by referencing a field that is not present, an exception with a meaningful error message may be thrown.
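The "explicit contract" can be pictured with a small stand-alone sketch (hypothetical names, not DataFu's code): fields are read by alias, and an unknown alias fails fast with a message naming the bad alias and the known ones, instead of silently reading whatever sits at some position.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of alias-based access with a descriptive failure.
// A tuple is modeled as a List<Object>; this is not DataFu's actual code.
class NamedFieldSketch {
  private final Map<String, Integer> aliasToPosition = new LinkedHashMap<>();

  NamedFieldSketch(String... aliases) {
    for (int i = 0; i < aliases.length; i++) {
      aliasToPosition.put(aliases[i], i);
    }
  }

  Object get(List<Object> tuple, String alias) {
    Integer position = aliasToPosition.get(alias);
    if (position == null) {
      throw new IllegalArgumentException(
          "Unknown alias '" + alias + "'; known aliases: " + aliasToPosition.keySet());
    }
    return tuple.get(position);
  }

  public static void main(String[] args) {
    NamedFieldSketch udf = new NamedFieldSketch("principal", "num_payments");
    List<Object> tuple = Arrays.<Object>asList(200000.0, 360);
    System.out.println(udf.get(tuple, "principal")); // prints 200000.0
    try {
      udf.get(tuple, "num_payment"); // typo in the alias
    } catch (IllegalArgumentException e) {
      // prints: Unknown alias 'num_payment'; known aliases: [principal, num_payments]
      System.out.println(e.getMessage());
    }
  }
}
```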

Example: This UDF computes the monthly payment for a mortgage at each of several interest rates.

 import java.io.IOException;

 import org.apache.pig.data.BagFactory;
 import org.apache.pig.data.DataBag;
 import org.apache.pig.data.Tuple;
 import org.apache.pig.data.TupleFactory;

 import datafu.pig.util.AliasableEvalFunc;

 public class MortgagePayment extends AliasableEvalFunc<DataBag> {
   // ... getOutputSchema(Schema) and computeMonthlyPayment(...) omitted

   @Override
   public DataBag exec(Tuple input) throws IOException {
     DataBag output = BagFactory.getInstance().newDefaultBag();

     Double principal = getDouble(input, "principal"); // get a value from the input tuple by alias
     Integer numPayments = getInteger(input, "num_payments");
     DataBag interestRates = getBag(input, "interest_rates");

     for (Tuple interestTuple : interestRates) {
       // get a value from a tuple in the inner bag by its prefixed alias
       Double interest = getDouble(interestTuple, getPrefixedAliasName("interest_rates", "interest_rate"));
       double monthlyPayment = computeMonthlyPayment(principal, numPayments, interest);
       output.add(TupleFactory.getInstance().newTuple(monthlyPayment));
     }
     return output;
   }
 }

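The example calls computeMonthlyPayment without defining it. One possible helper, shown here as a hypothetical sketch, uses the standard fixed-rate amortization formula M = P·r / (1 − (1 + r)^−n). It assumes the interest value is an annual rate expressed as a decimal fraction (0.06 for 6%) compounded monthly; the original does not say which convention the bag uses.

```java
// Hypothetical helper for the MortgagePayment example. Assumes annualRate
// is a decimal fraction (0.06 means 6%) and that interest compounds monthly.
class MortgageMath {
  static double computeMonthlyPayment(double principal, int numPayments, double annualRate) {
    double r = annualRate / 12.0; // monthly rate
    if (r == 0.0) {
      return principal / numPayments; // no interest: equal principal installments
    }
    // Standard fixed-rate amortization: P * r / (1 - (1 + r)^-n)
    return principal * r / (1.0 - Math.pow(1.0 + r, -numPayments));
  }

  public static void main(String[] args) {
    // 200,000 borrowed over 360 monthly payments at 6% per year
    System.out.printf("%.2f%n", computeMonthlyPayment(200000.0, 360, 0.06)); // prints 1199.10
  }
}
```

Because the parameters are primitives, the Double and Integer values returned by getDouble and getInteger in the example are auto-unboxed at the call site (a null value would throw a NullPointerException there).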
Author:
wvaughan

Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
AliasableEvalFunc()

Method Summary
 org.apache.pig.data.DataBag getBag(org.apache.pig.data.Tuple tuple, java.lang.String alias)
 java.lang.Boolean getBoolean(org.apache.pig.data.Tuple tuple, java.lang.String alias)
 java.lang.Double getDouble(org.apache.pig.data.Tuple tuple, java.lang.String alias)
 java.lang.Double getDouble(org.apache.pig.data.Tuple tuple, java.lang.String alias, java.lang.Double defaultValue)
 java.util.Map<java.lang.String,java.lang.Integer> getFieldAliases()
          Field aliases are generated from the input schema. Each alias maps to the position of its field. Fields of inner bags and tuples have nested aliases of the form outer.inner.foo.
 java.lang.Float getFloat(org.apache.pig.data.Tuple tuple, java.lang.String alias)
 java.lang.Float getFloat(org.apache.pig.data.Tuple tuple, java.lang.String alias, java.lang.Float defaultValue)
 java.lang.Integer getInteger(org.apache.pig.data.Tuple tuple, java.lang.String alias)
 java.lang.Integer getInteger(org.apache.pig.data.Tuple tuple, java.lang.String alias, java.lang.Integer defaultValue)
 java.lang.Long getLong(org.apache.pig.data.Tuple tuple, java.lang.String alias)
 java.lang.Long getLong(org.apache.pig.data.Tuple tuple, java.lang.String alias, java.lang.Long defaultValue)
 java.lang.Object getObject(org.apache.pig.data.Tuple tuple, java.lang.String alias)
 abstract org.apache.pig.impl.logicalLayer.schema.Schema getOutputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
          Specify the output schema, as in EvalFunc.outputSchema(Schema).
 java.lang.Integer getPosition(java.lang.String alias)
 java.lang.Integer getPosition(java.lang.String prefix, java.lang.String alias)
 java.lang.String getPrefixedAliasName(java.lang.String prefix, java.lang.String alias)
 java.lang.String getString(org.apache.pig.data.Tuple tuple, java.lang.String alias)
 java.lang.String getString(org.apache.pig.data.Tuple tuple, java.lang.String alias, java.lang.String defaultValue)
 org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
          A wrapper method that captures the input schema and then calls getOutputSchema.
 
Methods inherited from class datafu.pig.util.ContextualEvalFunc
getContextProperties, getInstanceName, getInstanceProperties, setUDFContextSignature
 
Methods inherited from class org.apache.pig.EvalFunc
exec, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AliasableEvalFunc

public AliasableEvalFunc()
Method Detail

outputSchema

public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
A wrapper method that captures the input schema and then calls getOutputSchema.

Overrides:
outputSchema in class org.apache.pig.EvalFunc<T>

getOutputSchema

public abstract org.apache.pig.impl.logicalLayer.schema.Schema getOutputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Specify the output schema, as in EvalFunc.outputSchema(Schema).

Parameters:
input - the input schema
Returns:
the output schema
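The outputSchema/getOutputSchema pair follows the template-method pattern: subclasses implement the abstract getOutputSchema, while the outputSchema wrapper records the input schema before delegating. The plain-Java sketch below is hypothetical; a List<String> of aliases stands in for Pig's Schema class and a field stands in for the UDFContext.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the wrapper pattern. List<String> stands in for
// Pig's Schema; the capturedInput field stands in for the UDFContext.
abstract class SchemaCapturingFunc {
  private List<String> capturedInput;

  // Wrapper: capture the input schema first, then let the subclass decide the output.
  final List<String> outputSchema(List<String> input) {
    capturedInput = new ArrayList<>(input);
    return getOutputSchema(input);
  }

  // Subclasses supply the output schema here instead of overriding outputSchema.
  abstract List<String> getOutputSchema(List<String> input);

  List<String> capturedInput() {
    return capturedInput;
  }
}

class MonthlyPaymentFunc extends SchemaCapturingFunc {
  @Override
  List<String> getOutputSchema(List<String> input) {
    return new ArrayList<>(Arrays.asList("monthly_payment"));
  }

  public static void main(String[] args) {
    MonthlyPaymentFunc f = new MonthlyPaymentFunc();
    List<String> out = f.outputSchema(Arrays.asList("principal", "num_payments"));
    // prints [monthly_payment] from [principal, num_payments]
    System.out.println(out + " from " + f.capturedInput());
  }
}
```

Keeping the capture in the wrapper means a subclass cannot forget to record the schema that the alias-based getters depend on.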

getPrefixedAliasName

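public java.lang.String getPrefixedAliasName(java.lang.String prefix,
                                             java.lang.String alias)
Returns the name under which the field alias, nested inside the bag or tuple named prefix, appears in getFieldAliases(). In the class example, getPrefixedAliasName("interest_rates", "interest_rate") names the interest_rate field of the tuples in the interest_rates bag.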

getFieldAliases

public java.util.Map<java.lang.String,java.lang.Integer> getFieldAliases()
Field aliases are generated from the input schema. Each alias maps to the position of its field. Fields of inner bags and tuples have nested aliases of the form outer.inner.foo.

Returns:
A map of field alias to field position
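To make the nested naming concrete, here is a simplified, hypothetical model of how such a map could be built: each Field stands in for a Pig FieldSchema, a non-empty children list stands in for an inner bag or tuple, and inner aliases are joined to the enclosing alias with "." following the documented outer.inner.foo form. Real Pig bag schemas wrap their fields in an inner tuple schema; this sketch skips that level for brevity and is not DataFu's code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified schema model. Every alias maps to the field's
// position within its own (possibly inner) tuple.
class FieldAliasSketch {
  static final class Field {
    final String alias;
    final List<Field> children; // empty for scalar fields

    Field(String alias, Field... children) {
      this.alias = alias;
      this.children = Arrays.asList(children);
    }
  }

  static Map<String, Integer> fieldAliases(List<Field> schema) {
    Map<String, Integer> aliases = new LinkedHashMap<>();
    collect(schema, null, aliases);
    return aliases;
  }

  private static void collect(List<Field> fields, String prefix, Map<String, Integer> out) {
    for (int i = 0; i < fields.size(); i++) {
      Field f = fields.get(i);
      String name = (prefix == null) ? f.alias : prefix + "." + f.alias;
      out.put(name, i);
      collect(f.children, name, out); // recurse into inner bag/tuple fields
    }
  }

  public static void main(String[] args) {
    List<Field> schema = Arrays.asList(
        new Field("principal"),
        new Field("num_payments"),
        new Field("interest_rates", new Field("interest_rate")));
    // prints {principal=0, num_payments=1, interest_rates=2, interest_rates.interest_rate=0}
    System.out.println(fieldAliases(schema));
  }
}
```

Note that a nested alias maps to a position inside the inner tuple, which is why the class example passes interestTuple, not input, when reading interest_rate.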

getPosition

public java.lang.Integer getPosition(java.lang.String alias)

getPosition

public java.lang.Integer getPosition(java.lang.String prefix,
                                     java.lang.String alias)
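The two getPosition overloads can be sketched as follows. This is a hypothetical illustration that assumes the prefixed form builds the nested alias name (with ".", per the documented outer.inner.foo naming) and looks it up in the same alias-to-position map, returning null for an unknown alias; neither behavior is confirmed by this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the getPosition overloads; not DataFu's code.
class PositionLookupSketch {
  private final Map<String, Integer> aliasToPosition = new HashMap<>();

  PositionLookupSketch(Map<String, Integer> aliases) {
    aliasToPosition.putAll(aliases);
  }

  // Assumed naming rule: outer alias + "." + inner alias.
  String getPrefixedAliasName(String prefix, String alias) {
    return prefix + "." + alias;
  }

  // Returns null when the alias is unknown (an assumption of this sketch).
  Integer getPosition(String alias) {
    return aliasToPosition.get(alias);
  }

  // The prefixed overload simply resolves the nested name first.
  Integer getPosition(String prefix, String alias) {
    return getPosition(getPrefixedAliasName(prefix, alias));
  }

  public static void main(String[] args) {
    Map<String, Integer> aliases = new HashMap<>();
    aliases.put("interest_rates", 2);
    aliases.put("interest_rates.interest_rate", 0);
    PositionLookupSketch udf = new PositionLookupSketch(aliases);
    System.out.println(udf.getPosition("interest_rates"));                  // prints 2
    System.out.println(udf.getPosition("interest_rates", "interest_rate")); // prints 0
  }
}
```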

getInteger

public java.lang.Integer getInteger(org.apache.pig.data.Tuple tuple,
                                    java.lang.String alias)
                             throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException

getInteger

public java.lang.Integer getInteger(org.apache.pig.data.Tuple tuple,
                                    java.lang.String alias,
                                    java.lang.Integer defaultValue)
                             throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException
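Several getters (getInteger, getLong, getFloat, getDouble, getString) have an overload that takes a defaultValue. The sketch below illustrates one plausible reading, using hypothetical names: it ASSUMES the default is returned when the aliased field holds null. This page does not state the exact rule, so treat the behavior as an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a defaultValue overload. ASSUMPTION: the default
// replaces a null field value. Unknown aliases are not handled here.
class DefaultValueSketch {
  private final Map<String, Integer> aliasToPosition = new HashMap<>();

  DefaultValueSketch(String... aliases) {
    for (int i = 0; i < aliases.length; i++) {
      aliasToPosition.put(aliases[i], i);
    }
  }

  Integer getInteger(List<Object> tuple, String alias) {
    return (Integer) tuple.get(aliasToPosition.get(alias));
  }

  Integer getInteger(List<Object> tuple, String alias, Integer defaultValue) {
    Integer value = getInteger(tuple, alias);
    return value == null ? defaultValue : value;
  }

  public static void main(String[] args) {
    DefaultValueSketch udf = new DefaultValueSketch("principal", "num_payments");
    List<Object> tuple = Arrays.<Object>asList(200000.0, null);
    System.out.println(udf.getInteger(tuple, "num_payments", 360)); // prints 360
  }
}
```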

getLong

public java.lang.Long getLong(org.apache.pig.data.Tuple tuple,
                              java.lang.String alias)
                       throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException

getLong

public java.lang.Long getLong(org.apache.pig.data.Tuple tuple,
                              java.lang.String alias,
                              java.lang.Long defaultValue)
                       throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException

getFloat

public java.lang.Float getFloat(org.apache.pig.data.Tuple tuple,
                                java.lang.String alias)
                         throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException

getFloat

public java.lang.Float getFloat(org.apache.pig.data.Tuple tuple,
                                java.lang.String alias,
                                java.lang.Float defaultValue)
                         throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException

getDouble

public java.lang.Double getDouble(org.apache.pig.data.Tuple tuple,
                                  java.lang.String alias)
                           throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException

getDouble

public java.lang.Double getDouble(org.apache.pig.data.Tuple tuple,
                                  java.lang.String alias,
                                  java.lang.Double defaultValue)
                           throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException

getString

public java.lang.String getString(org.apache.pig.data.Tuple tuple,
                                  java.lang.String alias)
                           throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException

getString

public java.lang.String getString(org.apache.pig.data.Tuple tuple,
                                  java.lang.String alias,
                                  java.lang.String defaultValue)
                           throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException

getBoolean

public java.lang.Boolean getBoolean(org.apache.pig.data.Tuple tuple,
                                    java.lang.String alias)
                             throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException

getBag

public org.apache.pig.data.DataBag getBag(org.apache.pig.data.Tuple tuple,
                                          java.lang.String alias)
                                   throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException

getObject

public java.lang.Object getObject(org.apache.pig.data.Tuple tuple,
                                  java.lang.String alias)
                           throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException


Matthew Hayes, Sam Shah