org.apache.pig.data
Class SortedDataBag
java.lang.Object
org.apache.pig.data.Datum
org.apache.pig.data.DataBag
org.apache.pig.data.SortedDataBag
- All Implemented Interfaces:
- Comparable, Iterable<Tuple>, Spillable
public class SortedDataBag
- extends DataBag
An ordered collection of Tuples, possibly containing duplicates. Data is
stored unsorted in an ArrayList as it arrives and is sorted only when it
is time to dump it to a file or when the first iterator is requested.
Experimentation found this to be faster than keeping the data sorted
from the start.
A user-defined comparator is allowed; a default comparator is used
when the user does not specify one.
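The deferred-sort strategy described above can be sketched in a few lines. This is a minimal illustration only, not the actual SortedDataBag source: the class name LazySortedBag, its fields, and the choice to re-sort after later additions are all assumptions made for the example, and natural ordering stands in for the default comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for illustration; this is NOT Pig's SortedDataBag.
class LazySortedBag<T extends Comparable<T>> implements Iterable<T> {
    private final List<T> contents = new ArrayList<>();
    private final Comparator<T> comparator;
    private boolean sorted = true; // an empty bag is trivially sorted

    LazySortedBag(Comparator<T> comparator) {
        // Fall back to natural ordering when no comparator is supplied
        // (assumed analogue of a default comparator).
        this.comparator = (comparator != null)
                ? comparator
                : Comparator.<T>naturalOrder();
    }

    void add(T item) {
        contents.add(item); // plain append, no ordering work on insert
        sorted = false;
    }

    @Override
    public Iterator<T> iterator() {
        // Sort only when an iterator is actually requested.
        // Re-sorting after later adds is an assumption of this sketch.
        if (!sorted) {
            contents.sort(comparator);
            sorted = true;
        }
        return contents.iterator();
    }
}
```

Appending is cheap and the full sort is paid once per batch of additions, which is the trade-off the description reports as faster in practice. Passing null to the constructor selects the fallback ordering; passing, for example, Comparator.reverseOrder() reverses the iteration order.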
Method Summary
Return type | Method | Description
boolean | isDistinct() | Find out if the bag is distinct.
boolean | isSorted() | Find out if the bag is sorted.
Iterator<Tuple> | iterator() | Get an iterator to the bag.
long | spill() | Instructs an object to spill whatever it can to disk and release references to any data structures it spills.
Methods inherited from class org.apache.pig.data.DataBag
add, addAll, cardinality, clear, compareTo, content, equals, finalize, getMemorySize, getSpillFile, hashCode, markStale, reportProgress, size, toString, write
SortedDataBag
public SortedDataBag(EvalSpec spec)
- Parameters:
spec - EvalSpec to use for the sorting. spec.getComparator()
will be called to populate the mComp field. If null,
DefaultComparator will be used.
isSorted
public boolean isSorted()
- Description copied from class:
DataBag
- Find out if the bag is sorted.
- Specified by:
isSorted
in class DataBag
isDistinct
public boolean isDistinct()
- Description copied from class:
DataBag
- Find out if the bag is distinct.
- Specified by:
isDistinct
in class DataBag
iterator
public Iterator<Tuple> iterator()
- Description copied from class:
DataBag
- Get an iterator to the bag. For default and distinct bags,
no particular order is guaranteed. For sorted bags, iteration
follows the order defined by the provided comparator.
- Specified by:
iterator
in interface Iterable<Tuple>
- Specified by:
iterator
in class DataBag
spill
public long spill()
- Description copied from interface:
Spillable
- Instructs an object to spill whatever it can to disk and release
references to any data structures it spills.
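As a rough illustration of the spill contract described above, the following sketch sorts its in-memory contents, writes them to a temporary file, and then drops its reference to the in-memory list. It is not Pig's implementation: the class SpillingStringBag, its helper methods, the one-item-per-line file format, and the interpretation of the returned long as the number of items written are all assumptions made for the example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; NOT Pig's Spillable or SortedDataBag.
class SpillingStringBag {
    private List<String> inMemory = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();

    void add(String item) {
        inMemory.add(item);
    }

    // Assumption: the returned long is the number of items written to disk.
    long spill() {
        if (inMemory.isEmpty()) {
            return 0L; // nothing to spill
        }
        Collections.sort(inMemory); // sort only at dump time
        long written = 0;
        try {
            Path file = Files.createTempFile("bag-spill", ".txt");
            file.toFile().deleteOnExit();
            try (BufferedWriter out =
                     Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                // One item per line; assumes items contain no line breaks.
                for (String item : inMemory) {
                    out.write(item);
                    out.newLine();
                    written++;
                }
            }
            spillFiles.add(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Release the reference so the spilled data can be garbage collected.
        inMemory = new ArrayList<>();
        return written;
    }

    int inMemorySize() {
        return inMemory.size();
    }

    List<Path> spillFiles() {
        return spillFiles;
    }
}
```

Replacing the list rather than clearing it makes the release of the spilled data structure explicit; a real bag would also need to merge its spill files back in when iterated, which this sketch leaves out.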
Copyright © ${year} The Apache Software Foundation