IncrementalJob (DataFu Hourglass)

datafu.hourglass.jobs
Class IncrementalJob

java.lang.Object
  org.apache.hadoop.conf.Configured
      datafu.hourglass.jobs.AbstractJob
          datafu.hourglass.jobs.TimeBasedJob
              datafu.hourglass.jobs.IncrementalJob

All Implemented Interfaces: org.apache.hadoop.conf.Configurable

Direct Known Subclasses: AbstractPartitionCollapsingIncrementalJob, AbstractPartitionPreservingIncrementalJob

public abstract class IncrementalJob
extends TimeBasedJob

Base class for incremental jobs. Incremental jobs consume day-partitioned input data.

Implementations of this class must provide key, intermediate value, and output value schemas. The key and intermediate value schemas define the output for the mapper and combiner. The key and output value schemas define the output for the reducer.
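To make the three schema roles concrete, here is a minimal sketch for a hypothetical job that counts events per member. The record names, field names, and the class name CountSchemas are assumptions made for illustration and do not come from Hourglass. The schemas are held as Avro JSON strings so the example runs without extra dependencies; a subclass would typically turn such strings into org.apache.avro.Schema objects, for example with Avro's Schema.Parser, and return them from getKeySchema(), getIntermediateValueSchema(), and getOutputValueSchema().

```java
final class CountSchemas {
    // Key schema: identifies what is being counted. Shared by the mapper,
    // combiner, and reducer outputs.
    static final String KEY_SCHEMA =
        "{\"type\":\"record\",\"name\":\"Key\",\"fields\":["
        + "{\"name\":\"member_id\",\"type\":\"long\"}]}";

    // Intermediate value schema: the partial count emitted by the mapper and combiner.
    static final String INTERMEDIATE_VALUE_SCHEMA =
        "{\"type\":\"record\",\"name\":\"IntermediateValue\",\"fields\":["
        + "{\"name\":\"count\",\"type\":\"long\"}]}";

    // Output value schema: the final count emitted by the reducer.
    static final String OUTPUT_VALUE_SCHEMA =
        "{\"type\":\"record\",\"name\":\"OutputValue\",\"fields\":["
        + "{\"name\":\"count\",\"type\":\"long\"}]}";

    public static void main(String[] args) {
        System.out.println(KEY_SCHEMA);
        System.out.println(INTERMEDIATE_VALUE_SCHEMA);
        System.out.println(OUTPUT_VALUE_SCHEMA);
    }
}
```

In this sketch the intermediate and output values happen to have the same shape; they are kept as separate schemas because the class asks for them separately and they may differ in other jobs.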

This class has the same configuration and methods as TimeBasedJob. In addition, it recognizes the following properties:

max.iterations - maximum number of iterations for the job
max.days.to.process - maximum number of days of input data to process in a single run
fail.on.missing - whether the job should fail if input data within the desired range is missing
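As a minimal sketch, the three properties above could be collected in a standard java.util.Properties object of the kind accepted by the IncrementalJob(String, Properties) constructor and by setProperties. The class name IncrementalJobProps and the values 5, 30, and true are illustrative assumptions. The example only builds the properties and does not call Hourglass; passing the values as strings is also an assumption about how the job reads them.

```java
import java.util.Properties;

final class IncrementalJobProps {
    // Builds a Properties object carrying the three IncrementalJob-specific settings.
    // Values are stored as strings (assumed to be what the job expects).
    static Properties build(int maxIterations, int maxDaysToProcess, boolean failOnMissing) {
        Properties props = new Properties();
        props.setProperty("max.iterations", Integer.toString(maxIterations));
        props.setProperty("max.days.to.process", Integer.toString(maxDaysToProcess));
        props.setProperty("fail.on.missing", Boolean.toString(failOnMissing));
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical settings: at most 5 iterations of at most 30 days each,
        // failing if any day in the range has no input.
        Properties props = build(5, 30, true);
        props.list(System.out);
    }
}
```

The same settings can also be applied after construction through setMaxIterations, setMaxToProcess, and setFailOnMissing.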

Author: Matthew Hayes

Constructor Summary
`IncrementalJob()` Initializes the job.
`IncrementalJob(java.lang.String name, java.util.Properties props)` Initializes the job with a job name and properties.

Method Summary
`protected abstract org.apache.avro.Schema`	`getIntermediateValueSchema()` Gets the Avro schema for the intermediate value.
`protected abstract org.apache.avro.Schema`	`getKeySchema()` Gets the Avro schema for the key.
`java.lang.Integer`	`getMaxIterations()` Gets the maximum number of iterations for the job.
`java.lang.Integer`	`getMaxToProcess()` Gets the maximum number of days of input data to process in a single run.
`protected abstract org.apache.avro.Schema`	`getOutputValueSchema()` Gets the Avro schema for the output data.
`protected TaskSchemas`	`getSchemas()` Gets the schemas.
`protected void`	`initialize()` Initialization required before running job.
`boolean`	`isFailOnMissing()` Gets whether the job should fail if input data within the desired range is missing.
`void`	`setFailOnMissing(boolean failOnMissing)` Sets whether the job should fail if input data within the desired range is missing.
`void`	`setMaxIterations(java.lang.Integer maxIterations)` Sets the maximum number of iterations for the job.
`void`	`setMaxToProcess(java.lang.Integer maxToProcess)` Sets the maximum number of days of input data to process in a single run.
`void`	`setProperties(java.util.Properties props)` Sets the configuration properties.

Methods inherited from class datafu.hourglass.jobs.TimeBasedJob
`getDaysAgo, getEndDate, getNumDays, getStartDate, setDaysAgo, setEndDate, setNumDays, setStartDate, validate`

Methods inherited from class datafu.hourglass.jobs.AbstractJob
`config, createRandomTempPath, ensurePath, getCountersParentPath, getFileSystem, getInputPaths, getName, getNumReducers, getOutputPath, getProperties, getRetentionCount, getTempPath, isUseCombiner, randomTempPath, run, setCountersParentPath, setInputPaths, setName, setNumReducers, setOutputPath, setRetentionCount, setTempPath, setUseCombiner`

Methods inherited from class org.apache.hadoop.conf.Configured
`getConf, setConf`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

IncrementalJob

public IncrementalJob()

Initializes the job.

IncrementalJob

public IncrementalJob(java.lang.String name,
                      java.util.Properties props)

Initializes the job with a job name and properties.

Parameters: name - job name; props - configuration properties

Method Detail

setProperties

public void setProperties(java.util.Properties props)

Description copied from class: AbstractJob

Sets the configuration properties.

Overrides: setProperties in class TimeBasedJob

Parameters: props - configuration properties

initialize

protected void initialize()

Description copied from class: AbstractJob

Initialization required before running job.

Overrides: initialize in class AbstractJob

getKeySchema

protected abstract org.apache.avro.Schema getKeySchema()

Gets the Avro schema for the key.

This is also used as the key for the map output.

Returns: key schema

getIntermediateValueSchema

protected abstract org.apache.avro.Schema getIntermediateValueSchema()

Gets the Avro schema for the intermediate value.

This is also used as the value for the map output.

Returns: intermediate value schema

getOutputValueSchema

protected abstract org.apache.avro.Schema getOutputValueSchema()

Gets the Avro schema for the output data.

Returns: output data schema

getSchemas

protected TaskSchemas getSchemas()

Gets the schemas.

Returns: schemas

getMaxToProcess

public java.lang.Integer getMaxToProcess()

Gets the maximum number of days of input data to process in a single run.

Returns: maximum number of days to process

setMaxToProcess

public void setMaxToProcess(java.lang.Integer maxToProcess)

Sets the maximum number of days of input data to process in a single run.

Parameters: maxToProcess - maximum number of days to process

getMaxIterations

public java.lang.Integer getMaxIterations()

Gets the maximum number of iterations for the job. Multiple iterations occur only when a maximum is set for the number of days to process in a single run. An error should be thrown if this number would be exceeded.

Returns: maximum number of iterations
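The relationship between the per-run day limit and the iteration limit can be sketched with simple arithmetic. The following is an illustration only, not the Hourglass implementation: the class IterationPlanner, its method names, and the choice of IllegalStateException are all assumptions. It treats a null limit as "no limit", mirroring the Integer-typed getters.

```java
final class IterationPlanner {
    // Number of runs needed to cover totalDays when each run handles at most
    // maxDaysPerRun days. Without a per-run limit, a single run covers everything.
    static int iterationsNeeded(int totalDays, Integer maxDaysPerRun) {
        if (totalDays < 0) {
            throw new IllegalArgumentException("totalDays must not be negative");
        }
        if (totalDays == 0) {
            return 0;
        }
        if (maxDaysPerRun == null) {
            return 1;
        }
        if (maxDaysPerRun <= 0) {
            throw new IllegalArgumentException("maxDaysPerRun must be positive");
        }
        // Ceiling division.
        return (totalDays + maxDaysPerRun - 1) / maxDaysPerRun;
    }

    // Returns the number of iterations, or fails up front if the limit would be exceeded.
    static int plan(int totalDays, Integer maxDaysPerRun, Integer maxIterations) {
        int needed = iterationsNeeded(totalDays, maxDaysPerRun);
        if (maxIterations != null && needed > maxIterations) {
            throw new IllegalStateException(
                needed + " iterations needed but at most " + maxIterations + " allowed");
        }
        return needed;
    }

    public static void main(String[] args) {
        // 90 days of input at 30 days per run needs 3 iterations.
        System.out.println(plan(90, 30, 5));
    }
}
```

Under these assumptions, 91 days of input with a 30-day limit needs four iterations, so a limit of three iterations would cause a failure.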

setMaxIterations

public void setMaxIterations(java.lang.Integer maxIterations)

Sets the maximum number of iterations for the job. Multiple iterations occur only when a maximum is set for the number of days to process in a single run. An error should be thrown if this number would be exceeded.

Parameters: maxIterations - maximum number of iterations

isFailOnMissing

public boolean isFailOnMissing()

Gets whether the job should fail if input data within the desired range is missing.

Returns: true if the job should fail on missing data
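A minimal sketch of what a fail-on-missing check over day-partitioned input might look like follows. It is not taken from Hourglass: the class MissingDayCheck, the method selectDays, and the exception type are hypothetical, and skipping missing days when the flag is false is an assumption about the lenient behavior.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MissingDayCheck {
    // Walks the inclusive range [start, end] one day at a time and returns the days
    // that have input. A missing day either aborts the walk (failOnMissing = true)
    // or is skipped (failOnMissing = false; assumed behavior).
    static List<LocalDate> selectDays(LocalDate start, LocalDate end,
                                      Set<LocalDate> available, boolean failOnMissing) {
        List<LocalDate> selected = new ArrayList<>();
        for (LocalDate day = start; !day.isAfter(end); day = day.plusDays(1)) {
            if (available.contains(day)) {
                selected.add(day);
            } else if (failOnMissing) {
                throw new IllegalStateException("Missing input data for " + day);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical input: January 2 is missing from a three-day range.
        Set<LocalDate> available = new HashSet<>(Arrays.asList(
            LocalDate.of(2013, 1, 1), LocalDate.of(2013, 1, 3)));
        System.out.println(selectDays(
            LocalDate.of(2013, 1, 1), LocalDate.of(2013, 1, 3), available, false));
    }
}
```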

setFailOnMissing

public void setFailOnMissing(boolean failOnMissing)

Sets whether the job should fail if input data within the desired range is missing.

Parameters: failOnMissing - true if the job should fail on missing data
