java.lang.Object
  datafu.hourglass.jobs.ExecutionPlanner
    datafu.hourglass.jobs.PartitionPreservingExecutionPlanner

public class PartitionPreservingExecutionPlanner
extends ExecutionPlanner
Execution planner used by AbstractPartitionPreservingIncrementalJob and its derived classes. It creates a plan to process partitioned input data and produce partitioned output data.

To use this class, specify the input and output paths. Optionally, restrict the input date range with the inherited setters, such as setStartDate, setEndDate, setNumDays, and setDaysAgo. Then call createPlan() to create the execution plan. Once the plan exists, the inputs to process are available from getInputsToProcess(), the number of reducers to use from getNumReducers(), and the input schemas from getInputSchemas().
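The call order described above (configure, call createPlan(), then read the results) can be sketched as follows. This is an illustration only: `PlannerSketch` and `fakePlanner` are hypothetical stand-ins declared here so the example compiles without Hadoop or DataFu on the classpath, and the guard that rejects reads before planning is an assumption. Only the method names createPlan(), getNumReducers(), and getNeedsAnotherPass() come from this page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class PlannerCallOrderSketch {

    /** Hypothetical stand-in; only the method names are taken from this page. */
    public interface PlannerSketch {
        void createPlan() throws IOException;
        int getNumReducers();
        boolean getNeedsAnotherPass();
    }

    /** Creates the plan first, then reads values that are only meaningful afterwards. */
    public static String describePlan(PlannerSketch planner) {
        try {
            planner.createPlan(); // must run before any of the getters below
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return "reducers=" + planner.getNumReducers()
            + ", needsAnotherPass=" + planner.getNeedsAnotherPass();
    }

    /** Fake planner with fixed results, used only to make the sketch executable. */
    public static PlannerSketch fakePlanner(final int reducers, final boolean anotherPass) {
        return new PlannerSketch() {
            private boolean planned = false;

            @Override public void createPlan() { planned = true; }

            @Override public int getNumReducers() { requirePlanned(); return reducers; }

            @Override public boolean getNeedsAnotherPass() { requirePlanned(); return anotherPass; }

            // Assumed guard: how the real class behaves before createPlan() is not stated here.
            private void requirePlanned() {
                if (!planned) {
                    throw new IllegalStateException("createPlan() has not been called");
                }
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(describePlan(fakePlanner(4, false)));
    }
}
```

With the real class, the same sequence would start from the constructor documented on this page and the inherited setters for the input paths, output path, and date range.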

Configuration properties are used to configure a ReduceEstimator instance, which calculates how many reducers should be used. The number of reducers is based on the input data size and the num.reducers.bytes.per.reducer property. See ReduceEstimator for more details on how the properties are used.
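As a rough illustration of the sizing rule above, the sketch below divides the total input size by a bytes-per-reducer target and rounds up. The ceiling division and the minimum of one reducer are assumptions made for this example, not the documented behavior of ReduceEstimator. Only the property name num.reducers.bytes.per.reducer comes from this page; the class and method names are hypothetical.

```java
import java.util.Properties;

public class ReducerEstimateSketch {

    // Property name taken from the text above.
    static final String BYTES_PER_REDUCER = "num.reducers.bytes.per.reducer";

    /**
     * Assumed rule: ceil(totalInputBytes / bytesPerReducer), with at least one reducer.
     * The real ReduceEstimator may use additional properties or limits.
     */
    public static int estimateReducers(long totalInputBytes, Properties props) {
        String value = props.getProperty(BYTES_PER_REDUCER);
        if (value == null) {
            throw new IllegalArgumentException("missing property " + BYTES_PER_REDUCER);
        }
        long bytesPerReducer = Long.parseLong(value.trim());
        if (bytesPerReducer <= 0 || totalInputBytes < 0) {
            throw new IllegalArgumentException("sizes must be non-negative and the target positive");
        }
        // Ceiling division written to avoid overflow for very large inputs.
        long estimate = totalInputBytes / bytesPerReducer
            + (totalInputBytes % bytesPerReducer == 0 ? 0 : 1);
        return (int) Math.max(1L, Math.min(estimate, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // 256 MiB per reducer, an arbitrary value chosen for the example.
        props.setProperty(BYTES_PER_REDUCER, String.valueOf(256L * 1024 * 1024));
        long inputBytes = 1_000_000_000L; // about 1 GB of input
        System.out.println(estimateReducers(inputBytes, props)); // prints 4
    }
}
```

Under this assumed rule, lowering the property value increases the reducer count for the same input size.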
Constructor Summary | |
---|---|
PartitionPreservingExecutionPlanner(org.apache.hadoop.fs.FileSystem fs, java.util.Properties props) | Initializes the execution planner. |
Method Summary | |
---|---|
void | createPlan(): Creates the execution plan. |
java.util.List<java.util.Date> | getDatesToProcess(): Gets the input dates which are to be processed. |
java.util.List<org.apache.avro.Schema> | getInputSchemas(): Gets the input schemas. |
java.util.Map<java.lang.String,org.apache.avro.Schema> | getInputSchemasByPath(): Gets a map from input path to schema. |
java.util.List<DatePath> | getInputsToProcess(): Gets the inputs which are to be processed. |
boolean | getNeedsAnotherPass(): Gets whether another pass will be required. |
int | getNumReducers(): Gets the number of reducers to use based on the input data size. |
Methods inherited from class datafu.hourglass.jobs.ExecutionPlanner |
---|
determineAvailableInputDates, determineDateRange, getAvailableInputsByDate, getDailyData, getDatedData, getDateRange, getDaysAgo, getEndDate, getFileSystem, getInputPaths, getMaxToProcess, getNumDays, getOutputPath, getProps, getStartDate, isFailOnMissing, loadInputData, setDaysAgo, setEndDate, setFailOnMissing, setInputPaths, setMaxToProcess, setNumDays, setOutputPath, setStartDate |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public PartitionPreservingExecutionPlanner(org.apache.hadoop.fs.FileSystem fs, java.util.Properties props)
Initializes the execution planner.

Parameters:
fs - file system
props - configuration properties

Method Detail |
---|
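public void createPlan() throws java.io.IOException
Creates the execution plan.
Throws:
java.io.IOException

public int getNumReducers()
Gets the number of reducers to use based on the input data size. Must call createPlan() first.

public java.util.List<org.apache.avro.Schema> getInputSchemas()
Gets the input schemas. Must call createPlan() first.

public java.util.Map<java.lang.String,org.apache.avro.Schema> getInputSchemasByPath()
Gets a map from input path to schema. Must call createPlan() first.

public boolean getNeedsAnotherPass()
Gets whether another pass will be required. Must call createPlan() first.

public java.util.List<DatePath> getInputsToProcess()
Gets the inputs which are to be processed. Must call createPlan() first.

public java.util.List<java.util.Date> getDatesToProcess()
Gets the input dates which are to be processed. Must call createPlan() first.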