org.apache.pig.backend.hadoop.executionengine.mapreduceExec
Class PigMapReduce

java.lang.Object
  extended by org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce
All Implemented Interfaces:
Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.MapRunnable<org.apache.hadoop.io.WritableComparable,Tuple,org.apache.hadoop.io.WritableComparable,org.apache.hadoop.io.Writable>, org.apache.hadoop.mapred.Reducer<Tuple,IndexedTuple,org.apache.hadoop.io.WritableComparable,org.apache.hadoop.io.Writable>

public class PigMapReduce
extends Object
implements org.apache.hadoop.mapred.MapRunnable<org.apache.hadoop.io.WritableComparable,Tuple,org.apache.hadoop.io.WritableComparable,org.apache.hadoop.io.Writable>, org.apache.hadoop.mapred.Reducer<Tuple,IndexedTuple,org.apache.hadoop.io.WritableComparable,org.apache.hadoop.io.Writable>

This class wraps Pig Map/Reduce jobs: it implements both the Mapper and the Reducer. Its methods are driven by the following job configuration variables:

pig.inputs
A semicolon-separated list of inputs. If an input uses a special parser, the parser is specified by appending a colon and the name of the parser to the input. For example, /tmp/names.txt;/tmp/logs.dat:com.yahoo.research.pig.parser.LogParser parses /tmp/names.txt using the default parser and /tmp/logs.dat using com.yahoo.research.pig.parser.LogParser.
pig.mapFuncs
A semicolon-separated list of function specifications to be applied to the inputs in the Map phase. This list must have the same number of items as pig.inputs because each function specification is matched to the corresponding input.
pig.groupFuncs
A semicolon-separated list of group functions. As with pig.mapFuncs, this list must have the same number of items as pig.inputs because each group function is matched to the corresponding input.
pig.reduceFuncs
The function specification to be applied to the tuples passed into the Reduce phase.
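
The pig.inputs format and the same-length rule for pig.mapFuncs and pig.groupFuncs can be illustrated with a small sketch. This is not Pig's own parsing code: the class PigInputsSketch, the type InputSpec, and both helper methods are hypothetical names introduced here. The sketch assumes that the first colon in an entry separates the path from the parser name, which would not hold for paths that themselves contain a colon, and that list items never contain a semicolon.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Pig: illustrates the pig.inputs value format. */
final class PigInputsSketch {

    /** One pig.inputs entry: a path plus an optional parser name. */
    static final class InputSpec {
        final String path;
        final String parser; // null means "use the default parser"

        InputSpec(String path, String parser) {
            this.path = path;
            this.parser = parser;
        }
    }

    /** Splits the value on ';' and then splits each entry on its first ':' (an assumption). */
    static List<InputSpec> parseInputs(String value) {
        List<InputSpec> specs = new ArrayList<>();
        for (String entry : value.split(";")) {
            int colon = entry.indexOf(':');
            if (colon < 0) {
                specs.add(new InputSpec(entry, null));
            } else {
                specs.add(new InputSpec(entry.substring(0, colon), entry.substring(colon + 1)));
            }
        }
        return specs;
    }

    /** Rejects a per-input list (pig.mapFuncs or pig.groupFuncs) whose item count differs from pig.inputs. */
    static void requireSameLength(String name, String value, List<InputSpec> inputs) {
        int count = value.split(";").length;
        if (count != inputs.size()) {
            throw new IllegalArgumentException(
                name + " has " + count + " items but pig.inputs has " + inputs.size());
        }
    }

    public static void main(String[] args) {
        List<InputSpec> inputs = parseInputs(
            "/tmp/names.txt;/tmp/logs.dat:com.yahoo.research.pig.parser.LogParser");
        for (InputSpec spec : inputs) {
            String parser = (spec.parser == null) ? "default parser" : spec.parser;
            System.out.println(spec.path + " parsed with " + parser);
        }
    }
}
```

Keeping the parser as a nullable field mirrors the description above, where an entry without a colon falls back to the default parser.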

Author:
breed

Field Summary
static org.apache.hadoop.mapred.Reporter reporter
           
 
Constructor Summary
PigMapReduce()
           
 
Method Summary
 void close()
          Nothing happens here.
 void closeSideFiles()
           
 void configure(org.apache.hadoop.mapred.JobConf jobConf)
           
static PigContext getPigContext()
           
 void reduce(Tuple key, Iterator<IndexedTuple> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.WritableComparable,org.apache.hadoop.io.Writable> output, org.apache.hadoop.mapred.Reporter reporter)
           
 void run(org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.WritableComparable,Tuple> input, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.WritableComparable,org.apache.hadoop.io.Writable> output, org.apache.hadoop.mapred.Reporter reporter)
          Hadoop calls this method in MapTask as the map-side entry point (MapRunnable.run()).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

reporter

public static org.apache.hadoop.mapred.Reporter reporter
Constructor Detail

PigMapReduce

public PigMapReduce()
Method Detail

run

public void run(org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.WritableComparable,Tuple> input,
                org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.WritableComparable,org.apache.hadoop.io.Writable> output,
                org.apache.hadoop.mapred.Reporter reporter)
         throws IOException
Hadoop calls this method in MapTask as the map-side entry point (MapRunnable.run()). It pulls the tuples from the PigRecordReader (see the ugly ThreadLocal hack), pipes them through the function pipeline, and then closes the writer.

Specified by:
run in interface org.apache.hadoop.mapred.MapRunnable<org.apache.hadoop.io.WritableComparable,Tuple,org.apache.hadoop.io.WritableComparable,org.apache.hadoop.io.Writable>
Throws:
IOException
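
The pull, pipe, and close sequence described for run can be sketched in plain Java without any Hadoop types. Everything in this sketch is an assumption made for illustration: MapRunSketch and TupleWriter are hypothetical names, a java.util.Iterator stands in for the record reader, a java.util.function.Function stands in for the function pipeline, and tuples are modelled as lists of strings. Closing the writer in a finally block is a choice of this sketch, not something the description above states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of a map-side run loop; not Pig's implementation. */
final class MapRunSketch {

    /** Stand-in for the output side of the pipeline. */
    interface TupleWriter extends AutoCloseable {
        void write(List<String> tuple);

        @Override
        void close();
    }

    /** Pulls every tuple from the input, pipes it through the pipeline, then closes the writer. */
    static void run(Iterator<List<String>> input,
                    Function<List<String>, List<String>> pipeline,
                    TupleWriter writer) {
        try {
            while (input.hasNext()) {
                writer.write(pipeline.apply(input.next()));
            }
        } finally {
            writer.close(); // sketch choice: close even if the pipeline throws
        }
    }

    public static void main(String[] args) {
        List<List<String>> collected = new ArrayList<>();
        TupleWriter writer = new TupleWriter() {
            @Override
            public void write(List<String> tuple) {
                collected.add(tuple);
            }

            @Override
            public void close() {
                System.out.println("writer closed after " + collected.size() + " tuples");
            }
        };
        List<List<String>> input = Arrays.asList(
            Arrays.asList("alice", "30"),
            Arrays.asList("bob", "25"));
        // The pipeline here keeps only the first field of each tuple.
        run(input.iterator(), tuple -> tuple.subList(0, 1), writer);
        System.out.println(collected);
    }
}
```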

reduce

public void reduce(Tuple key,
                   Iterator<IndexedTuple> values,
                   org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.WritableComparable,org.apache.hadoop.io.Writable> output,
                   org.apache.hadoop.mapred.Reporter reporter)
            throws IOException
Specified by:
reduce in interface org.apache.hadoop.mapred.Reducer<Tuple,IndexedTuple,org.apache.hadoop.io.WritableComparable,org.apache.hadoop.io.Writable>
Throws:
IOException

configure

public void configure(org.apache.hadoop.mapred.JobConf jobConf)
Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable

close

public void close()
           throws IOException
Nothing happens here.

Specified by:
close in interface Closeable
Throws:
IOException

getPigContext

public static PigContext getPigContext()

closeSideFiles

public void closeSideFiles()


Copyright © The Apache Software Foundation