Intra-row scanning can be done using ColumnRangeFilter. Other filters such as ColumnPrefixFilter or MultipleColumnPrefixFilter can also be handy for this. What all three filters have in common is that they can provide scanners (see scanning in HBase) with what I will call "seek hints". These hints allow a scanner to seek directly to the next column, the next row, or an arbitrary next cell determined by the filter. This is far more efficient than a dumb filter that is passed each cell and decides whether that cell is included in the result.
Many other filters also provide these "seek hints". The exceptions are filters that filter on column values: there is no inherent ordering between column values, so these filters need to look at the value of each column.
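To make the difference concrete, here is a minimal sketch that uses only the Java standard library, not the HBase API. It models one wide row as a sorted set of column qualifiers and counts how many qualifiers have to be examined for a column range, with and without seeking. The class name SeekHintSketch, its methods, and the "c0000"-style qualifiers are all made up for illustration; a real scanner seeks through store files rather than an in-memory set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical stand-in for a single wide row; not part of HBase.
class SeekHintSketch {

    // Builds a row with qualifiers "c0000", "c0001", ... in sorted order.
    static NavigableSet<String> buildRow(int columns) {
        NavigableSet<String> row = new TreeSet<>();
        for (int i = 0; i < columns; i++) {
            row.add(String.format("c%04d", i));
        }
        return row;
    }

    // "Dumb" filtering: every qualifier in the row is passed to the check.
    // Returns the number of qualifiers examined; matches are added to out.
    static int examinedWithoutHints(NavigableSet<String> row, String min,
                                    String max, List<String> out) {
        int examined = 0;
        for (String q : row) {
            examined++;
            if (q.compareTo(min) >= 0 && q.compareTo(max) <= 0) {
                out.add(q);
            }
        }
        return examined;
    }

    // Filtering with seeks: jump straight to min, then stop at the first
    // qualifier past max (the analogue of moving on to the next row).
    static int examinedWithHints(NavigableSet<String> row, String min,
                                 String max, List<String> out) {
        int examined = 0;
        for (String q : row.tailSet(min, true)) {
            examined++;
            if (q.compareTo(max) > 0) {
                break; // nothing further in this row can match
            }
            out.add(q);
        }
        return examined;
    }

    public static void main(String[] args) {
        NavigableSet<String> row = buildRow(10000);
        List<String> plain = new ArrayList<>();
        List<String> seeking = new ArrayList<>();
        System.out.println(examinedWithoutHints(row, "c5000", "c5009", plain));
        System.out.println(examinedWithHints(row, "c5000", "c5009", seeking));
        System.out.println(plain.equals(seeking));
    }
}
```

For a row of 10,000 columns and a range of 10 columns, the first method examines all 10,000 qualifiers while the second examines 11 (the 10 matches plus the first qualifier past the range), and both return the same matches.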
For example, check out this code in MultipleColumnPrefixFilter (ASF 2.0 license):
    TreeSet<byte []> lesserOrEqualPrefixes =
      (TreeSet<byte []>) sortedPrefixes.headSet(qualifier, true);
    if (lesserOrEqualPrefixes.size() != 0) {
      byte [] largestPrefixSmallerThanQualifier = lesserOrEqualPrefixes.last();
      if (Bytes.startsWith(qualifier, largestPrefixSmallerThanQualifier)) {
        return ReturnCode.INCLUDE;
      }
      if (lesserOrEqualPrefixes.size() == sortedPrefixes.size()) {
        return ReturnCode.NEXT_ROW;
      } else {
        hint = sortedPrefixes.higher(largestPrefixSmallerThanQualifier);
        return ReturnCode.SEEK_NEXT_USING_HINT;
      }
    } else {
      hint = sortedPrefixes.first();
      return ReturnCode.SEEK_NEXT_USING_HINT;
    }
(the <hint> is used later to skip ahead to that column prefix)
See how this code snippet allows the filter to:
- seek to the next row if all prefixes are known to be less than or equal to the current qualifier (and the largest didn't match the passed column qualifier). Note that a single seek to the next row can potentially skip millions of columns.
- seek to the next larger prefix if there are more prefixes, but the current one does not match the qualifier.
- seek to the first (smallest) prefix if none of the prefixes are less than or equal to the current qualifier.
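The three cases above can be tried out in isolation with a small standalone sketch. It mirrors the decision logic of the snippet but makes two simplifying assumptions: prefixes and qualifiers are Strings rather than byte arrays (so ordering is String's natural ordering, not HBase's byte comparator), and the hint is returned in a small Decision object instead of being stored in a field. PrefixSeekSketch, Decision, and decide are hypothetical names; only the ReturnCode values are taken from the snippet.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical, String-based model of the prefix decision logic.
class PrefixSeekSketch {

    // The three return codes used by the snippet above.
    enum ReturnCode { INCLUDE, NEXT_ROW, SEEK_NEXT_USING_HINT }

    // Return code plus the seek hint (null when no hint applies).
    static final class Decision {
        final ReturnCode code;
        final String hint;

        Decision(ReturnCode code, String hint) {
            this.code = code;
            this.hint = hint;
        }
    }

    static Decision decide(TreeSet<String> sortedPrefixes, String qualifier) {
        // All prefixes that sort at or before the current qualifier.
        NavigableSet<String> lesserOrEqual = sortedPrefixes.headSet(qualifier, true);
        if (lesserOrEqual.isEmpty()) {
            // Qualifier sorts before every prefix: seek to the smallest prefix.
            return new Decision(ReturnCode.SEEK_NEXT_USING_HINT, sortedPrefixes.first());
        }
        String largest = lesserOrEqual.last();
        if (qualifier.startsWith(largest)) {
            return new Decision(ReturnCode.INCLUDE, null);
        }
        if (lesserOrEqual.size() == sortedPrefixes.size()) {
            // Qualifier sorts after every prefix: nothing left in this row.
            return new Decision(ReturnCode.NEXT_ROW, null);
        }
        // Skip ahead to the next larger prefix.
        return new Decision(ReturnCode.SEEK_NEXT_USING_HINT, sortedPrefixes.higher(largest));
    }

    public static void main(String[] args) {
        TreeSet<String> prefixes = new TreeSet<>(Arrays.asList("b", "d"));
        for (String q : new String[] {"a1", "b7", "c3", "e9"}) {
            Decision d = decide(prefixes, q);
            System.out.println(q + " -> " + d.code + " " + d.hint);
        }
    }
}
```

With the prefixes "b" and "d", the qualifier "a1" yields a seek to "b", "b7" is included, "c3" yields a seek to "d", and "e9" moves on to the next row.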
I'm in the process of adding more information for these filters to the HBase