org.apache.pig.data
Class SortedDataBag

java.lang.Object
  extended by org.apache.pig.data.Datum
      extended by org.apache.pig.data.DataBag
          extended by org.apache.pig.data.SortedDataBag
All Implemented Interfaces:
Comparable, Iterable<Tuple>, Spillable

public class SortedDataBag
extends DataBag

An ordered collection of Tuples, possibly containing duplicates. Data is stored unsorted in an ArrayList as it comes in, and is sorted only when it is time to dump it to a file or when the first iterator is requested. Experimentation found this to be faster than keeping the data sorted from the start. A user-defined comparator is allowed; a default comparator is used when the user does not specify one.
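The deferred-sort strategy described above can be sketched as follows. This is a minimal illustration, not the Pig implementation: the class name LazySortedBag, its fields, and the use of natural ordering as the fallback comparator are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for illustration only; not the SortedDataBag source.
class LazySortedBag<T extends Comparable<T>> implements Iterable<T> {
    private final List<T> contents = new ArrayList<>();
    private final Comparator<T> comp;
    private boolean sorted = true; // an empty list is trivially sorted

    // Assumption: a null comparator falls back to natural ordering,
    // loosely mirroring the default-comparator fallback described above.
    LazySortedBag(Comparator<T> comp) {
        if (comp != null) {
            this.comp = comp;
        } else {
            this.comp = Comparator.naturalOrder();
        }
    }

    // Appending does no ordering work; it only marks the contents as unsorted.
    void add(T item) {
        contents.add(item);
        sorted = false;
    }

    // The sort happens here, at most once per batch of additions.
    @Override
    public Iterator<T> iterator() {
        if (!sorted) {
            contents.sort(comp);
            sorted = true;
        }
        return contents.iterator();
    }
}
```

In this sketch each add is a constant-time append and the cost of ordering is paid in a single sort when iteration starts, rather than on every insertion. Duplicates are kept, as expected of a bag.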


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.pig.data.DataBag
DataBag.BagDelimiterTuple, DataBag.EndBag, DataBag.StartBag
 
Field Summary
 
Fields inherited from class org.apache.pig.data.DataBag
endBag, MAX_SPILL_FILES, mContents, mMemSize, mMemSizeChanged, mSize, mSpillFiles, startBag
 
Fields inherited from class org.apache.pig.data.Datum
ATOM, BAG, MAP, OBJECT_SIZE, RECORD_1, RECORD_2, RECORD_3, REF_SIZE, TUPLE
 
Constructor Summary
SortedDataBag(EvalSpec spec)
           
 
Method Summary
 boolean isDistinct()
          Find out if the bag is distinct.
 boolean isSorted()
          Find out if the bag is sorted.
 Iterator<Tuple> iterator()
          Get an iterator to the bag.
 long spill()
          Instructs an object to spill whatever it can to disk and release references to any data structures it spills.
 
Methods inherited from class org.apache.pig.data.DataBag
add, addAll, cardinality, clear, compareTo, content, equals, finalize, getMemorySize, getSpillFile, hashCode, markStale, reportProgress, size, toString, write
 
Methods inherited from class java.lang.Object
clone, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SortedDataBag

public SortedDataBag(EvalSpec spec)
Parameters:
spec - EvalSpec used for sorting. spec.getComparator() is called to populate the mComp field. If null, DefaultComparator is used.
Method Detail

isSorted

public boolean isSorted()
Description copied from class: DataBag
Find out if the bag is sorted.

Specified by:
isSorted in class DataBag

isDistinct

public boolean isDistinct()
Description copied from class: DataBag
Find out if the bag is distinct.

Specified by:
isDistinct in class DataBag

iterator

public Iterator<Tuple> iterator()
Description copied from class: DataBag
Get an iterator to the bag. For default and distinct bags, no particular order is guaranteed. For sorted bags the order is guaranteed to be sorted according to the provided comparator.

Specified by:
iterator in interface Iterable<Tuple>
Specified by:
iterator in class DataBag

spill

public long spill()
Description copied from interface: Spillable
Instructs an object to spill whatever it can to disk and release references to any data structures it spills.
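As a rough illustration of the spill contract for a sorted bag, the sketch below sorts the in-memory contents, writes them to a temporary file, and then drops the in-memory references. Everything here is hypothetical: the class SpillSketch, its use of String elements and text files, and in particular the choice to return the number of elements written, since this page does not say what the returned long represents.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of a spill step; not the Pig implementation.
class SpillSketch {
    private final List<String> contents = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();

    void add(String s) {
        contents.add(s);
    }

    // Assumption: the return value is the number of elements written to disk.
    long spill() {
        if (contents.isEmpty()) {
            return 0;
        }
        Collections.sort(contents); // sort only now, when dumping to a file
        long spilled = contents.size();
        try {
            Path file = Files.createTempFile("bag-spill", ".txt");
            file.toFile().deleteOnExit();
            try (BufferedWriter w = Files.newBufferedWriter(file)) {
                for (String s : contents) {
                    w.write(s);
                    w.newLine();
                }
            }
            spillFiles.add(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        contents.clear(); // release references to the spilled data
        return spilled;
    }

    List<Path> spillFiles() {
        return spillFiles;
    }

    int inMemory() {
        return contents.size();
    }
}
```

Because each spill file in this sketch is written in sorted order, a later iterator could merge the files instead of re-sorting everything, though whether SortedDataBag does so is not stated on this page.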



Copyright © ${year} The Apache Software Foundation