java.lang.Object
  org.apache.hadoop.conf.Configured
    datafu.hourglass.jobs.AbstractJob
      datafu.hourglass.jobs.TimeBasedJob
        datafu.hourglass.jobs.IncrementalJob
public abstract class IncrementalJob
Base class for incremental jobs. Incremental jobs consume day-partitioned input data.
Implementations of this class must provide key, intermediate value, and output value schemas. The key and intermediate value schemas define the output for the mapper and combiner. The key and output value schemas define the output for the reducer.
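The division of roles between the three schemas can be sketched in plain Java. This is only an illustration under stated assumptions: `SketchJob` and `CountPerMemberJob` are hypothetical stand-ins rather than DataFu classes, and the schemas are held as Avro-style JSON strings instead of `org.apache.avro.Schema` objects so the example runs without extra dependencies.

```java
import java.util.Arrays;
import java.util.List;

class SchemaRolesSketch {

    // Hypothetical stand-in for IncrementalJob. Schemas are plain JSON
    // strings here, not org.apache.avro.Schema objects.
    abstract static class SketchJob {
        protected abstract String getKeySchema();
        protected abstract String getIntermediateValueSchema();
        protected abstract String getOutputValueSchema();

        // Mapper and combiner output: key plus intermediate value.
        List<String> mapOutputSchemas() {
            return Arrays.asList(getKeySchema(), getIntermediateValueSchema());
        }

        // Reducer output: key plus output value.
        List<String> reduceOutputSchemas() {
            return Arrays.asList(getKeySchema(), getOutputValueSchema());
        }
    }

    // Hypothetical job that counts events per member.
    static class CountPerMemberJob extends SketchJob {
        @Override
        protected String getKeySchema() {
            return "{\"type\":\"record\",\"name\":\"Key\",\"fields\":"
                 + "[{\"name\":\"member_id\",\"type\":\"long\"}]}";
        }

        @Override
        protected String getIntermediateValueSchema() {
            return "{\"type\":\"record\",\"name\":\"PartialCount\",\"fields\":"
                 + "[{\"name\":\"count\",\"type\":\"long\"}]}";
        }

        @Override
        protected String getOutputValueSchema() {
            return "{\"type\":\"record\",\"name\":\"TotalCount\",\"fields\":"
                 + "[{\"name\":\"total\",\"type\":\"long\"}]}";
        }
    }

    public static void main(String[] args) {
        SketchJob job = new CountPerMemberJob();
        System.out.println("map/combine output: " + job.mapOutputSchemas());
        System.out.println("reduce output:      " + job.reduceOutputSchemas());
    }
}
```

The key schema appears in both pairs, which is why a subclass supplies it only once.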
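This class has the same configuration and methods as TimeBasedJob.
In addition, it recognizes properties that control the maximum number of iterations, the maximum number of days of input data to process in a single run, and whether the job fails when input data is missing.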
Constructor Summary | |
---|---|
IncrementalJob() | Initializes the job. |
IncrementalJob(java.lang.String name, java.util.Properties props) | Initializes the job with a job name and properties. |
Method Summary | | |
---|---|---|
protected abstract org.apache.avro.Schema | getIntermediateValueSchema() | Gets the Avro schema for the intermediate value. |
protected abstract org.apache.avro.Schema | getKeySchema() | Gets the Avro schema for the key. |
java.lang.Integer | getMaxIterations() | Gets the maximum number of iterations for the job. |
java.lang.Integer | getMaxToProcess() | Gets the maximum number of days of input data to process in a single run. |
protected abstract org.apache.avro.Schema | getOutputValueSchema() | Gets the Avro schema for the output data. |
protected TaskSchemas | getSchemas() | Gets the schemas. |
protected void | initialize() | Initialization required before running job. |
boolean | isFailOnMissing() | Gets whether the job should fail if input data within the desired range is missing. |
void | setFailOnMissing(boolean failOnMissing) | Sets whether the job should fail if input data within the desired range is missing. |
void | setMaxIterations(java.lang.Integer maxIterations) | Sets the maximum number of iterations for the job. |
void | setMaxToProcess(java.lang.Integer maxToProcess) | Sets the maximum number of days of input data to process in a single run. |
void | setProperties(java.util.Properties props) | Sets the configuration properties. |
Methods inherited from class datafu.hourglass.jobs.TimeBasedJob |
---|
getDaysAgo, getEndDate, getNumDays, getStartDate, setDaysAgo, setEndDate, setNumDays, setStartDate, validate |
Methods inherited from class datafu.hourglass.jobs.AbstractJob |
---|
config, createRandomTempPath, ensurePath, getCountersParentPath, getFileSystem, getInputPaths, getName, getNumReducers, getOutputPath, getProperties, getRetentionCount, getTempPath, isUseCombiner, randomTempPath, run, setCountersParentPath, setInputPaths, setName, setNumReducers, setOutputPath, setRetentionCount, setTempPath, setUseCombiner |
Methods inherited from class org.apache.hadoop.conf.Configured |
---|
getConf, setConf |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|

public IncrementalJob()
Initializes the job.

public IncrementalJob(java.lang.String name, java.util.Properties props)
Initializes the job with a job name and properties.
Parameters:
name - job name
props - configuration properties

Method Detail |
---|
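public void setProperties(java.util.Properties props)
Description copied from class: AbstractJob
Sets the configuration properties.
Overrides:
setProperties in class TimeBasedJob
Parameters:
props - Properties

protected void initialize()
Description copied from class: AbstractJob
Initialization required before running job.
Overrides:
initialize in class AbstractJob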
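How setProperties might map string properties onto the typed setters can be sketched as follows. This is a stand-in, not the DataFu implementation: the class `IncrementalPropsSketch` is hypothetical, and the three property keys are assumptions chosen for illustration, since the recognized keys are not listed on this page.

```java
import java.util.Properties;

class IncrementalPropsSketch {
    // Assumed property keys, for illustration only.
    static final String MAX_ITERATIONS = "max.iterations";
    static final String MAX_DAYS_TO_PROCESS = "max.days.to.process";
    static final String FAIL_ON_MISSING = "fail.on.missing";

    private Integer maxIterations;   // null means no limit was configured
    private Integer maxToProcess;    // null means no limit was configured
    private boolean failOnMissing;   // false unless configured (in this sketch)

    // Copies any recognized properties into the typed fields.
    void setProperties(Properties props) {
        String iterations = props.getProperty(MAX_ITERATIONS);
        if (iterations != null) {
            setMaxIterations(Integer.valueOf(iterations.trim()));
        }
        String days = props.getProperty(MAX_DAYS_TO_PROCESS);
        if (days != null) {
            setMaxToProcess(Integer.valueOf(days.trim()));
        }
        String fail = props.getProperty(FAIL_ON_MISSING);
        if (fail != null) {
            setFailOnMissing(Boolean.parseBoolean(fail.trim()));
        }
    }

    Integer getMaxIterations() { return maxIterations; }
    void setMaxIterations(Integer maxIterations) { this.maxIterations = maxIterations; }

    Integer getMaxToProcess() { return maxToProcess; }
    void setMaxToProcess(Integer maxToProcess) { this.maxToProcess = maxToProcess; }

    boolean isFailOnMissing() { return failOnMissing; }
    void setFailOnMissing(boolean failOnMissing) { this.failOnMissing = failOnMissing; }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(MAX_DAYS_TO_PROCESS, "7");
        props.setProperty(FAIL_ON_MISSING, "true");

        IncrementalPropsSketch job = new IncrementalPropsSketch();
        job.setProperties(props);

        System.out.println("maxToProcess  = " + job.getMaxToProcess());
        System.out.println("maxIterations = " + job.getMaxIterations());
        System.out.println("failOnMissing = " + job.isFailOnMissing());
    }
}
```

Using `Integer` rather than `int` for the two limits mirrors the getter signatures on this page and lets an unset value be represented as `null`.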
protected abstract org.apache.avro.Schema getKeySchema()
Gets the Avro schema for the key. This is also used as the key for the map output.

protected abstract org.apache.avro.Schema getIntermediateValueSchema()
Gets the Avro schema for the intermediate value. This is also used as the value for the map output.

protected abstract org.apache.avro.Schema getOutputValueSchema()
Gets the Avro schema for the output data.

protected TaskSchemas getSchemas()
Gets the schemas.
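
public java.lang.Integer getMaxToProcess()
Gets the maximum number of days of input data to process in a single run.

public void setMaxToProcess(java.lang.Integer maxToProcess)
Sets the maximum number of days of input data to process in a single run.
Parameters:
maxToProcess - maximum number of days to process

public java.lang.Integer getMaxIterations()
Gets the maximum number of iterations for the job.

public void setMaxIterations(java.lang.Integer maxIterations)
Sets the maximum number of iterations for the job.
Parameters:
maxIterations - maximum number of iterations

public boolean isFailOnMissing()
Gets whether the job should fail if input data within the desired range is missing.

public void setFailOnMissing(boolean failOnMissing)
Sets whether the job should fail if input data within the desired range is missing.
Parameters:
failOnMissing - true if the job should fail on missing data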
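
One plausible way the three settings interact is sketched below in plain Java. The semantics are assumptions, not taken from the DataFu source: each iteration is assumed to consume at most maxToProcess days, iteration stops once maxIterations batches have been planned, and a missing day either aborts planning or is skipped depending on failOnMissing. The class `IncrementalPlanSketch` and its `plan` method are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class IncrementalPlanSketch {

    // Splits the available days in [start, end] into per-iteration batches.
    // A null limit means "unbounded" in this sketch.
    static List<List<LocalDate>> plan(LocalDate start, LocalDate end,
                                      Set<LocalDate> available,
                                      Integer maxToProcess, Integer maxIterations,
                                      boolean failOnMissing) {
        if (maxToProcess != null && maxToProcess < 1) {
            throw new IllegalArgumentException("maxToProcess must be positive");
        }
        List<LocalDate> days = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            if (available.contains(d)) {
                days.add(d);
            } else if (failOnMissing) {
                throw new IllegalStateException("Missing input for " + d);
            }
            // Otherwise the missing day is silently skipped.
        }
        int batchSize = (maxToProcess == null) ? Math.max(days.size(), 1) : maxToProcess;
        List<List<LocalDate>> batches = new ArrayList<>();
        for (int i = 0; i < days.size(); i += batchSize) {
            if (maxIterations != null && batches.size() >= maxIterations) {
                break; // iteration budget used up; remaining days wait for a later run
            }
            int to = Math.min(i + batchSize, days.size());
            batches.add(new ArrayList<>(days.subList(i, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2013, 6, 1);
        LocalDate end = LocalDate.of(2013, 6, 5);
        Set<LocalDate> available = new TreeSet<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            available.add(d);
        }
        available.remove(LocalDate.of(2013, 6, 3)); // simulate a missing day

        // Two days per iteration, at most two iterations, tolerate the gap.
        System.out.println(plan(start, end, available, 2, 2, false));
    }
}
```

Capping both values bounds how much work a single invocation can take on, which matters when a job has fallen far behind its input.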