org.apache.pig.impl.builtin
Class RandomSampleLoader

java.lang.Object
  extended by org.apache.pig.builtin.BinStorage
      extended by org.apache.pig.impl.builtin.RandomSampleLoader
All Implemented Interfaces:
LoadFunc, ReversibleLoadStoreFunc, StoreFunc

public class RandomSampleLoader
extends BinStorage


Field Summary
static int defaultNumSamples
           
 
Fields inherited from class org.apache.pig.builtin.BinStorage
end, in
 
Constructor Summary
RandomSampleLoader()
           
 
Method Summary
 void bindTo(OutputStream os)
          Specifies the OutputStream to write to.
 void bindTo(String fileName, BufferedPositionedInputStream is, long offset, long end)
          Specifies a portion of an InputStream to read tuples from.
 Tuple getNext()
          Retrieves the next tuple to be processed.
 
Methods inherited from class org.apache.pig.builtin.BinStorage
equals, finish, putNext
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

defaultNumSamples

public static int defaultNumSamples
Constructor Detail

RandomSampleLoader

public RandomSampleLoader()
Method Detail

bindTo

public void bindTo(String fileName,
                   BufferedPositionedInputStream is,
                   long offset,
                   long end)
            throws IOException
Description copied from interface: LoadFunc
Specifies a portion of an InputStream to read tuples from. Because the starting and ending offsets may not fall on record boundaries, the implementor must work out the actual starting and ending offsets so that a file sliced at arbitrary points is still processed in its entirety.

A common way of handling slices in the middle of records is to start at the given offset and, if the offset is not zero, skip to the end of the first record (which may be a partial record) before reading tuples. Reading continues until a tuple has been read that ends at an offset past the ending offset.

The load function should not do any buffering on the input stream. Buffering will cause the offsets returned by is.getPos() to be unreliable.
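
The slice-handling convention described above can be illustrated with a minimal sketch. It is not the RandomSampleLoader or BinStorage implementation: it assumes newline-terminated text records held in a byte array, and the class name SliceReaderSketch and the method readSlice are hypothetical names introduced only for this example. A reader whose offset is not zero skips the first (possibly partial) record, and every reader keeps going until it has read a record that ends past its ending offset.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Pig.
class SliceReaderSketch {

    /** Returns the newline-terminated records owned by the slice [offset, end). */
    static List<String> readSlice(byte[] data, long offset, long end) {
        int pos = (int) offset;
        if (offset != 0) {
            // Skip to the end of the first record, which may be partial.
            // The reader of the previous slice is responsible for it.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step over the record terminator
        }
        List<String> records = new ArrayList<>();
        // Keep reading while a record starts at or before the ending offset,
        // so the last record read is the one that ends past 'end'.
        while (pos < data.length && pos <= end) {
            int start = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            records.add(new String(data, start, pos - start, StandardCharsets.UTF_8));
            pos++; // step over the record terminator
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8);
        // Arbitrary cut points, none chosen to line up with record boundaries.
        long[] cuts = {0, 3, 10, 11, data.length};
        List<String> all = new ArrayList<>();
        for (int i = 0; i + 1 < cuts.length; i++) {
            all.addAll(readSlice(data, cuts[i], cuts[i + 1]));
        }
        System.out.println(all);
    }
}
```

Because each reader both skips its first record and reads one record past its end, every record is picked up by exactly one slice regardless of where the cuts fall.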

Specified by:
bindTo in interface LoadFunc
Overrides:
bindTo in class BinStorage
Parameters:
fileName - the name of the file to be read
is - the stream representing the file to be processed, which can also report its current position.
offset - the offset at which to start reading tuples.
end - the ending offset for reading.
Throws:
IOException

getNext

public Tuple getNext()
              throws IOException
Description copied from interface: LoadFunc
Retrieves the next tuple to be processed.

Specified by:
getNext in interface LoadFunc
Overrides:
getNext in class BinStorage
Returns:
the next tuple to be processed, or null if there are no more tuples to be processed.
Throws:
IOException
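
The null-terminated contract of getNext() suggests a simple read loop on the caller's side. The sketch below is an assumption-laden illustration: TupleSource is a hypothetical stand-in for the getNext() part of LoadFunc, tuples are modelled as List<Object> rather than Pig's Tuple, and the class name GetNextLoopSketch is invented for this example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not part of Pig.
class GetNextLoopSketch {

    /** Stand-in for a loader: returns the next tuple, or null when exhausted. */
    interface TupleSource {
        List<Object> getNext() throws IOException;
    }

    /** Collects every tuple by calling getNext() until it returns null. */
    static List<List<Object>> drain(TupleSource source) throws IOException {
        List<List<Object>> tuples = new ArrayList<>();
        List<Object> tuple;
        while ((tuple = source.getNext()) != null) {
            tuples.add(tuple);
        }
        return tuples;
    }

    public static void main(String[] args) throws IOException {
        Iterator<List<Object>> it = Arrays.asList(
                Arrays.<Object>asList("a", 1),
                Arrays.<Object>asList("b", 2)).iterator();
        // A source backed by an in-memory iterator; null signals the end.
        TupleSource source = () -> it.hasNext() ? it.next() : null;
        System.out.println(drain(source).size());
    }
}
```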

bindTo

public void bindTo(OutputStream os)
            throws IOException
Description copied from interface: StoreFunc
Specifies the OutputStream to write to. This will be called before putNext(Tuple) is invoked.

Specified by:
bindTo in interface StoreFunc
Overrides:
bindTo in class BinStorage
Parameters:
os - The stream to write tuples to.
Throws:
IOException


Copyright © The Apache Software Foundation