java.lang.Object
  org.apache.pig.data.Datum
    org.apache.pig.data.DataBag
      org.apache.pig.data.DistinctDataBag
public class DistinctDataBag
An unordered collection of Tuples with no duplicates. Tuples are deduplicated as they are added. When the bag spills, the data is sorted and written to disk. The data must also be sorted on the first read; otherwise, if a spill happened after that read, open iterators would have no way to find their place in the new file. In memory, the data is held in a HashSet. When it needs to be sorted, it is copied into an ArrayList and sorted there. Despite these extra steps, this was found to be faster than storing the data in a TreeSet.
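The storage strategy described above can be illustrated with a minimal sketch. This is not the Pig implementation: the class `DistinctBagSketch`, its use of `String` elements in place of Tuples, and the method `sortedSnapshot` are all hypothetical names chosen for illustration. It only shows the general idea of deduplicating in a HashSet on insert and sorting once, in an ArrayList, when ordered output is needed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the bag's in-memory state; not the Pig class.
class DistinctBagSketch {
    // HashSet ignores elements equal to ones it already holds,
    // so duplicates are dropped as they come in.
    private final Set<String> contents = new HashSet<>();

    // Assumption: Strings stand in for Tuples to keep the sketch self-contained.
    public void add(String tuple) {
        contents.add(tuple);
    }

    // Number of distinct elements currently held in memory.
    public long size() {
        return contents.size();
    }

    // What a spill or a first read might do: copy into a list and sort once.
    // A fixed sorted order gives every reader a well-defined position.
    public List<String> sortedSnapshot() {
        List<String> sorted = new ArrayList<>(contents);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        DistinctBagSketch bag = new DistinctBagSketch();
        for (String t : new String[] {"c", "a", "b", "a", "c"}) {
            bag.add(t);
        }
        System.out.println(bag.size());           // prints 3
        System.out.println(bag.sortedSnapshot()); // prints [a, b, c]
    }
}
```

The sort cost is paid only when ordered output is needed, whereas a TreeSet would keep the elements ordered on every insert. That is one plausible reading of why the HashSet approach was measured to be faster, though the page does not give details.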
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.apache.pig.data.DataBag |
---|
DataBag.BagDelimiterTuple, DataBag.EndBag, DataBag.StartBag |
Field Summary |
---|
Fields inherited from class org.apache.pig.data.DataBag |
---|
endBag, MAX_SPILL_FILES, mContents, mMemSize, mMemSizeChanged, mSize, mSpillFiles, startBag |
Fields inherited from class org.apache.pig.data.Datum |
---|
ATOM, BAG, MAP, OBJECT_SIZE, RECORD_1, RECORD_2, RECORD_3, REF_SIZE, TUPLE |
Constructor Summary | |
---|---|
DistinctDataBag() | |
Method Summary | |
---|---|
void | add(Tuple t): Add a tuple to the bag. |
void | addAll(DataBag b): Add contents of a bag to the bag. |
boolean | isDistinct(): Find out if the bag is distinct. |
boolean | isSorted(): Find out if the bag is sorted. |
Iterator<Tuple> | iterator(): Get an iterator to the bag. |
long | size(): Get the number of elements in the bag, both in memory and on disk. |
long | spill(): Instructs an object to spill whatever it can to disk and release references to any data structures it spills. |
Methods inherited from class org.apache.pig.data.DataBag |
---|
cardinality, clear, compareTo, content, equals, finalize, getMemorySize, getSpillFile, hashCode, markStale, reportProgress, toString, write |
Methods inherited from class java.lang.Object |
---|
clone, getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public DistinctDataBag()
Method Detail |
---|
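public boolean isSorted()
Description copied from class: DataBag
Find out if the bag is sorted.
isSorted in class DataBag

public boolean isDistinct()
Description copied from class: DataBag
Find out if the bag is distinct.
isDistinct in class DataBag

public long size()
Description copied from class: DataBag
Get the number of elements in the bag, both in memory and on disk.
size in class DataBag

public Iterator<Tuple> iterator()
Description copied from class: DataBag
Get an iterator to the bag.
iterator in interface Iterable<Tuple>
iterator in class DataBag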
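
public void add(Tuple t)
Description copied from class: DataBag
Add a tuple to the bag.
add in class DataBag
t - tuple to add.

public void addAll(DataBag b)
Description copied from class: DataBag
Add contents of a bag to the bag.
addAll in class DataBag
b - bag to add contents of.

public long spill()
Description copied from interface: Spillable
Instructs an object to spill whatever it can to disk and release references to any data structures it spills.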