Thursday, December 22, 2011

HBase "Raw" scans

In HBase, Scans (and by extension Gets) do not retrieve deleted cells or the tombstone markers that mark them as deleted.
Sometimes it is useful for troubleshooting (or backup - there will be a separate blog post about that soon) to see all cells, including deleted cells and the tombstone markers.

HBASE-4536 introduces "raw" Scans (only available in HBase trunk - not the upcoming 0.92). In the Java client these are enabled by Scan.setRaw(true).

The HBase shell also supports this by adding RAW=>true to a scan.

Once raw mode is enabled, the returned result contains not only the standard KeyValues but also the KeyValues for deleted cells and for the tombstone markers (which are just special types of KeyValues; more on delete markers can be found here).

Here's an example of what it would look like in the shell:
hbase(main):001:0> scan 'x2', {RAW=>true, VERSIONS=>10}
ROW                   COLUMN+CELL                                               
 r1                   column=f:c, timestamp=1323323611106, value=v3             
 r1                   column=f:c, timestamp=1323323609988, type=DeleteColumn    
 r1                   column=f:c, timestamp=1323323609988, value=v2             
 r1                   column=f:c, timestamp=1323323608554, value=v1             
 r2                   column=f:c, timestamp=1323323617759, value=v3             
 r2                   column=f:c, timestamp=1323323616226, value=v2             
 r2                   column=f:c, timestamp=1323323614496, value=v1             
2 row(s) in 0.6380 seconds

In the above example, values 'v2' and 'v1' for row key 'r1' have been deleted with a column delete marker.

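To make the column marker rule concrete, here is a minimal Java sketch. It is a hypothetical model, not HBase code: the `Cell` class and the `visibleValues`/`rawRowR1` methods are made up for illustration. It applies the rule "a DeleteColumn marker with timestamp T hides every version of the same column whose timestamp is less than or equal to T" to the raw cells of row 'r1' listed above, and drops the marker itself, the way a normal (non-raw) scan would.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the raw cells for row 'r1' (not HBase's KeyValue API).
public class ColumnDeleteDemo {
    static final class Cell {
        final String column;           // family:qualifier, e.g. "f:c"
        final long timestamp;
        final boolean isDeleteColumn;  // true for a DeleteColumn marker
        final String value;            // null for delete markers

        Cell(String column, long timestamp, boolean isDeleteColumn, String value) {
            this.column = column;
            this.timestamp = timestamp;
            this.isDeleteColumn = isDeleteColumn;
            this.value = value;
        }
    }

    // Values a normal scan would show: markers are dropped, and a Put is dropped
    // if a DeleteColumn marker for the same column has a timestamp >= the Put's.
    static List<String> visibleValues(List<Cell> raw) {
        List<String> visible = new ArrayList<>();
        for (Cell put : raw) {
            if (put.isDeleteColumn) continue;
            boolean deleted = false;
            for (Cell marker : raw) {
                if (marker.isDeleteColumn && marker.column.equals(put.column)
                        && put.timestamp <= marker.timestamp) {
                    deleted = true;
                    break;
                }
            }
            if (!deleted) visible.add(put.value);
        }
        return visible;
    }

    // The four KeyValues for 'r1' from the raw scan of 'x2'.
    static List<Cell> rawRowR1() {
        List<Cell> raw = new ArrayList<>();
        raw.add(new Cell("f:c", 1323323611106L, false, "v3"));
        raw.add(new Cell("f:c", 1323323609988L, true, null));
        raw.add(new Cell("f:c", 1323323609988L, false, "v2"));
        raw.add(new Cell("f:c", 1323323608554L, false, "v1"));
        return raw;
    }

    public static void main(String[] args) {
        System.out.println(visibleValues(rawRowR1())); // prints [v3]
    }
}
```

Only the Put for 'v3' (timestamp 1323323611106) is newer than the marker (timestamp 1323323609988). 'v2' shares the marker's timestamp and is still hidden, because the rule is "less than or equal".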
hbase(main):005:0> scan 'x1', {RAW=>true, VERSIONS=>10}
ROW                   COLUMN+CELL                                               
 r2                   column=f:, timestamp=1323323616226, type=DeleteFamily     
 r2                   column=f:c, timestamp=1323323617759, value=v3             
 r2                   column=f:c, timestamp=1323323616226, value=v2             
 r2                   column=f:c, timestamp=1323323614496, value=v1             
2 row(s) in 0.0500 seconds

Here 'v2' and 'v1' of row key 'r2' have been deleted with a family delete marker.

Notice how the column marker is sorted in line with the cells it affects (it is sorted after the cell for value 'v3'), but the family marker is sorted before all cells of the affected row key.
The sort order was carefully designed to allow HBase to identify all cells affected by a delete marker in a single forward scan through the store file(s).
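The ordering and the single-pass idea can be sketched in Java. This is a simplified, hypothetical model rather than HBase's actual comparator or scanner code: it compares keys as Java strings instead of raw bytes, handles only Put, DeleteColumn and DeleteFamily (using HBase's type codes 4, 12 and 14), and, for brevity, puts the rows from both shell examples into one hypothetical store. All class and method names are made up. Cells are ordered by row, family and qualifier ascending, then timestamp descending, then type code descending. A family marker has an empty qualifier, so it sorts ahead of every column in its family, and a column marker sorts ahead of any Put with the same or an older timestamp. The scan therefore only has to remember the newest marker seen for the current family and for the current column.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of HBase's KeyValue ordering and delete handling.
public class DeleteMarkerScan {
    // Type codes as in HBase's KeyValue.Type.
    static final int PUT = 4, DELETE_COLUMN = 12, DELETE_FAMILY = 14;

    static final class Cell {
        final String row, family, qualifier; // a DeleteFamily marker has an empty qualifier
        final long ts;
        final int type;
        final String value;                  // null for delete markers

        Cell(String row, String family, String qualifier, long ts, int type, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.ts = ts;
            this.type = type;
            this.value = value;
        }

        @Override
        public String toString() {
            String kind = type == PUT ? "value=" + value
                    : type == DELETE_COLUMN ? "type=DeleteColumn" : "type=DeleteFamily";
            return row + " column=" + family + ":" + qualifier + ", timestamp=" + ts + ", " + kind;
        }
    }

    // Row, family, qualifier ascending; timestamp descending; type code descending,
    // so a delete marker sorts before a Put with the same timestamp.
    static int compareKeys(Cell a, Cell b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        c = Long.compare(b.ts, a.ts);
        if (c != 0) return c;
        return Integer.compare(b.type, a.type);
    }

    // One forward pass over sorted cells. Every marker is seen before the cells it
    // covers, so remembering the newest marker per family and per column is enough.
    static List<Cell> normalScan(List<Cell> sorted) {
        List<Cell> visible = new ArrayList<>();
        String row = null, family = null, qualifier = null;
        long familyDeleteTs = -1; // newest DeleteFamily for the current row/family
        long columnDeleteTs = -1; // newest DeleteColumn for the current column
        for (Cell c : sorted) {
            if (!c.row.equals(row) || !c.family.equals(family)) {
                row = c.row;
                family = c.family;
                qualifier = null;
                familyDeleteTs = -1;
            }
            if (!c.qualifier.equals(qualifier)) {
                qualifier = c.qualifier;
                columnDeleteTs = -1;
            }
            if (c.type == DELETE_FAMILY) {
                familyDeleteTs = Math.max(familyDeleteTs, c.ts);
            } else if (c.type == DELETE_COLUMN) {
                columnDeleteTs = Math.max(columnDeleteTs, c.ts);
            } else if (c.ts > familyDeleteTs && c.ts > columnDeleteTs) {
                visible.add(c); // a Put not covered by any marker seen so far
            }
        }
        return visible;
    }

    // Cells from both shell examples, in deliberately scrambled order.
    static List<Cell> exampleCells() {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("r2", "f", "c", 1323323614496L, PUT, "v1"));
        cells.add(new Cell("r1", "f", "c", 1323323609988L, PUT, "v2"));
        cells.add(new Cell("r2", "f", "", 1323323616226L, DELETE_FAMILY, null));
        cells.add(new Cell("r1", "f", "c", 1323323611106L, PUT, "v3"));
        cells.add(new Cell("r2", "f", "c", 1323323617759L, PUT, "v3"));
        cells.add(new Cell("r1", "f", "c", 1323323609988L, DELETE_COLUMN, null));
        cells.add(new Cell("r2", "f", "c", 1323323616226L, PUT, "v2"));
        cells.add(new Cell("r1", "f", "c", 1323323608554L, PUT, "v1"));
        return cells;
    }

    public static void main(String[] args) {
        List<Cell> cells = exampleCells();
        cells.sort(DeleteMarkerScan::compareKeys);
        System.out.println("raw scan:");
        cells.forEach(System.out::println);
        System.out.println("normal scan:");
        normalScan(cells).forEach(System.out::println);
    }
}
```

Sorting the scrambled input reproduces the order of the raw shell output above, and the single pass leaves only 'v3' for 'r1' and 'v3' for 'r2'.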
