org.kiji.mapreduce.produce
Class KijiProduceJobBuilder

java.lang.Object
  extended by org.kiji.mapreduce.framework.MapReduceJobBuilder<T>
      extended by org.kiji.mapreduce.framework.KijiTableInputJobBuilder<KijiProduceJobBuilder>
          extended by org.kiji.mapreduce.produce.KijiProduceJobBuilder

@ApiAudience.Public
@ApiStability.Stable
public final class KijiProduceJobBuilder
extends KijiTableInputJobBuilder<KijiProduceJobBuilder>

Builds jobs that run a producer over a Kiji table.


Field Summary
 
Fields inherited from class org.kiji.mapreduce.framework.MapReduceJobBuilder
ADD_CLASSPATH_TO_JOB_DCACHE_PROPERTY, JOB_LIB_PROPERTY
 
Method Summary
protected  KijiMapReduceJob build(org.apache.hadoop.mapreduce.Job job)
          Wraps a Hadoop MR job in a MapReduceJob.
protected  void configureJob(org.apache.hadoop.mapreduce.Job job)
          Configures a Hadoop MR job.
protected  void configureMapper(org.apache.hadoop.mapreduce.Job job)
          Configures the MapReduce mapper for the job.
static KijiProduceJobBuilder create()
          Creates a new builder for Kiji produce jobs.
protected  KijiReducer<?,?,?,?> getCombiner()
          Gets an instance of the MapReduce combiner to be used for this job.
protected  org.kiji.schema.KijiDataRequest getDataRequest()
          Subclasses must override this to provide a Kiji data request for the input table.
protected  Class<?> getJarClass()
          Returns a class that should be used to determine which Java jar archive will be automatically included on the classpath of the MR tasks.
protected  KijiMapper<?,?,?,?> getMapper()
          Gets an instance of the MapReduce mapper to be used for this job.
protected  KijiReducer<?,?,?,?> getReducer()
          Gets an instance of the MapReduce reducer to be used for this job.
protected  Map<String,KeyValueStore<?,?>> getRequiredStores()
          Method for job components to declare required KeyValueStore entries (and their default implementations).
protected  void validateInputTable(org.kiji.schema.KijiTable inputTable)
          Validates the input table.
 KijiProduceJobBuilder withNumThreads(int numThreads)
          Sets the number of threads to use for running the producer in parallel.
 KijiProduceJobBuilder withOutput(KijiTableMapReduceJobOutput jobOutput)
          Configures the producer output.
 KijiProduceJobBuilder withOutput(MapReduceJobOutput jobOutput)
          Configures the job output.
 KijiProduceJobBuilder withProducer(Class<? extends KijiProducer> producerClass)
          Configures the job with the Kiji producer to run.
 
Methods inherited from class org.kiji.mapreduce.framework.KijiTableInputJobBuilder
getInputTableURI, getJobInput, withFilter, withInputTable, withJobInput, withLimitRow, withStartRow
 
Methods inherited from class org.kiji.mapreduce.framework.MapReduceJobBuilder
addJarDirectory, addJarDirectory, addJarDirectory, build, configureAvro, configureCombiner, configureHTableInput, configureInput, configureJars, configureOutput, configureReducer, configureStores, getConf, getJobOutput, mergeStores, withConf, withStore, withStoreBindings, withStoreBindingsFile
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

create

public static KijiProduceJobBuilder create()
Creates a new builder for Kiji produce jobs.

Returns:
a new Kiji produce job builder.

withProducer

public KijiProduceJobBuilder withProducer(Class<? extends KijiProducer> producerClass)
Configures the job with the Kiji producer to run.

Parameters:
producerClass - The producer class.
Returns:
This builder instance so you may chain configuration method calls.

withOutput

public KijiProduceJobBuilder withOutput(KijiTableMapReduceJobOutput jobOutput)
Configures the producer output.

Parameters:
jobOutput - Output table of the producer must match the input table.
Returns:
this builder instance so you may chain configuration method calls.

withOutput

public KijiProduceJobBuilder withOutput(MapReduceJobOutput jobOutput)
Configures the job output.

Classes that override this method should call super.getJobOutput or override getJobOutput also.

Overrides:
withOutput in class MapReduceJobBuilder<KijiProduceJobBuilder>
Parameters:
jobOutput - Output table of the producer must match the input table. Must be an instance of KijiTableMapReduceJobOutput or a subclass.
Returns:
This builder instance so you may chain configuration method calls.

withNumThreads

public KijiProduceJobBuilder withNumThreads(int numThreads)
Sets the number of threads to use for running the producer in parallel.

You may use this setting to run multiple instances of your producer in parallel within each map task of the job. This may useful for increasing throughput when your producer is not CPU bound.

Parameters:
numThreads - The number of produce-runner threads to use per mapper.
Returns:
This build instance so you may chain configuration method calls.

configureJob

protected void configureJob(org.apache.hadoop.mapreduce.Job job)
                     throws IOException
Configures a Hadoop MR job.

Overrides:
configureJob in class KijiTableInputJobBuilder<KijiProduceJobBuilder>
Parameters:
job - The Hadoop MR job.
Throws:
IOException - If there is an error.

configureMapper

protected void configureMapper(org.apache.hadoop.mapreduce.Job job)
                        throws IOException
Configures the MapReduce mapper for the job.

Overrides:
configureMapper in class MapReduceJobBuilder<KijiProduceJobBuilder>
Parameters:
job - The Hadoop MR job.
Throws:
IOException - If there is an error.

getRequiredStores

protected Map<String,KeyValueStore<?,?>> getRequiredStores()
                                                    throws IOException
Method for job components to declare required KeyValueStore entries (and their default implementations).

Classes inheriting from MapReduceJobBuilder should override this method.

Overrides:
getRequiredStores in class MapReduceJobBuilder<KijiProduceJobBuilder>
Returns:
a map of required names to default KeyValueStore implementations to add. These will be used to augment the existing map if any names are not defined by withStore().
Throws:
IOException - if there is an error configuring stores.

build

protected KijiMapReduceJob build(org.apache.hadoop.mapreduce.Job job)
Wraps a Hadoop MR job in a MapReduceJob.

Specified by:
build in class MapReduceJobBuilder<KijiProduceJobBuilder>
Parameters:
job - The Hadoop MR job.
Returns:
The built MapReduceJob.

getDataRequest

protected org.kiji.schema.KijiDataRequest getDataRequest()
Subclasses must override this to provide a Kiji data request for the input table.

Specified by:
getDataRequest in class KijiTableInputJobBuilder<KijiProduceJobBuilder>
Returns:
the Kiji data request to configure the input table scanner with.

validateInputTable

protected void validateInputTable(org.kiji.schema.KijiTable inputTable)
                           throws IOException
Validates the input table. Sub-classes may override this method to perform additional validation requiring an active connection to the input table.

Overrides:
validateInputTable in class KijiTableInputJobBuilder<KijiProduceJobBuilder>
Parameters:
inputTable - Input table.
Throws:
IOException - on I/O error.

getMapper

protected KijiMapper<?,?,?,?> getMapper()
Gets an instance of the MapReduce mapper to be used for this job.

Specified by:
getMapper in class MapReduceJobBuilder<KijiProduceJobBuilder>
Returns:
An instance of the mapper.

getCombiner

protected KijiReducer<?,?,?,?> getCombiner()
Gets an instance of the MapReduce combiner to be used for this job.

Specified by:
getCombiner in class MapReduceJobBuilder<KijiProduceJobBuilder>
Returns:
An instance of the combiner, or null if this job should not use a combiner.

getReducer

protected KijiReducer<?,?,?,?> getReducer()
Gets an instance of the MapReduce reducer to be used for this job.

Specified by:
getReducer in class MapReduceJobBuilder<KijiProduceJobBuilder>
Returns:
An instance of the reducer, or null if this job should be map-only.

getJarClass

protected Class<?> getJarClass()
Returns a class that should be used to determine which Java jar archive will be automatically included on the classpath of the MR tasks.

Specified by:
getJarClass in class MapReduceJobBuilder<KijiProduceJobBuilder>
Returns:
A class contained in the jar to be included on the MR classpath.


Copyright © 2012-2014 WibiData, Inc.. All Rights Reserved.