Wednesday, December 21, 2011

Deletion in HBase

When a Delete command is issued through the HBase client, no data is actually deleted. Instead, a tombstone marker is set, making the deleted cells effectively invisible.

Scans and Gets issued by users automatically filter out deleted cells until those cells are physically removed.
HBase periodically removes deleted cells during compactions.
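To make the idea concrete, here is a minimal Java sketch of a store where a delete only appends a marker and reads filter out masked cells. `ToyStore` and its methods are hypothetical and are not part of the HBase client API. Only single-version markers are modeled, and there is no compaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of tombstone deletion, not the HBase client API.
final class ToyStore {
    // One stored entry: either a cell version or a version tombstone for (column, ts).
    record Entry(String column, long ts, String value, boolean tombstone) {}

    private final List<Entry> entries = new ArrayList<>();

    void put(String column, long ts, String value) {
        entries.add(new Entry(column, ts, value, false));
    }

    // A delete only appends a marker; the original cell is left in place.
    void deleteVersion(String column, long ts) {
        entries.add(new Entry(column, ts, null, true));
    }

    // Reads filter out any version masked by a tombstone for the same column and timestamp.
    String get(String column, long ts) {
        boolean masked = entries.stream()
                .anyMatch(e -> e.tombstone() && e.column().equals(column) && e.ts() == ts);
        if (masked) {
            return null;
        }
        for (Entry e : entries) {
            if (!e.tombstone() && e.column().equals(column) && e.ts() == ts) {
                return e.value();
            }
        }
        return null;
    }

    // Number of physically stored entries, cells and markers alike.
    int storedEntries() {
        return entries.size();
    }
}
```

After a put followed by a delete of the same version, `get` returns null while `storedEntries()` is still 2: the data is hidden, not gone.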

The tombstone markers themselves are only deleted during major compactions (which compact all store files into a single one). To prove that a tombstone marker no longer has any effect, HBase needs to look at all cells: a cell masked by the marker could sit in a store file that a minor compaction did not include.
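The following hypothetical sketch (again not HBase code) shows why a compaction over only some store files has to keep the marker, while a major compaction over all of them can drop both the marker and the cells it masks. Files are modeled as plain lists, and only version markers that match on column and timestamp are considered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of compaction, not HBase internals.
final class ToyCompaction {
    record Entry(String column, long ts, boolean tombstone) {}

    // Merges the selected files. Cells masked by a tombstone in the selection are dropped.
    // Tombstones survive unless 'major' is true, i.e. the caller passed every store file,
    // because a file outside the selection could still hold a cell the marker masks.
    static List<Entry> compact(List<List<Entry>> selected, boolean major) {
        List<Entry> merged = new ArrayList<>();
        for (List<Entry> file : selected) {
            merged.addAll(file);
        }
        List<Entry> out = new ArrayList<>();
        for (Entry e : merged) {
            if (e.tombstone()) {
                if (!major) {
                    out.add(e);
                }
            } else if (!masked(e, merged)) {
                out.add(e);
            }
        }
        return out;
    }

    // A version tombstone masks the cell with the same column and timestamp.
    static boolean masked(Entry cell, List<Entry> entries) {
        for (Entry e : entries) {
            if (e.tombstone() && e.column().equals(cell.column()) && e.ts() == cell.ts()) {
                return true;
            }
        }
        return false;
    }
}
```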

There are three types of tombstone markers:
  1. version delete marker
    Marks a single version of a column for deletion
  2. column delete marker
    Marks all versions of a column for deletion
  3. family delete marker
    Marks all versions of all columns of a column family for deletion
It is also possible to add a maximum timestamp to column and family delete markers, in which case only versions with a timestamp lower than or equal to that maximum are affected by the delete marker.
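The coverage rules of the three marker types can be written as a small predicate. This is a hypothetical Java model, not HBase's classes. For column and family markers the field `ts` holds the maximum timestamp, treated as inclusive (as the HBase `Delete` documentation describes). A marker without an explicit maximum can be modeled with `Long.MAX_VALUE`.

```java
// Hypothetical model of the three tombstone marker types; not HBase classes.
final class ToyMarkers {
    enum Kind { VERSION, COLUMN, FAMILY }

    // For VERSION, 'ts' is the exact version; for COLUMN and FAMILY it is the maximum timestamp.
    // 'column' is unused (may be null) for FAMILY markers.
    record Marker(Kind kind, String family, String column, long ts) {}

    record Cell(String family, String column, long ts) {}

    // Returns true if the marker masks the given cell version.
    static boolean covers(Marker m, Cell c) {
        if (!m.family().equals(c.family())) {
            return false;
        }
        return switch (m.kind()) {
            case VERSION -> m.column().equals(c.column()) && c.ts() == m.ts();
            case COLUMN -> m.column().equals(c.column()) && c.ts() <= m.ts();
            case FAMILY -> c.ts() <= m.ts();
        };
    }
}
```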

HBase allows time range queries, which return only the versions within a specified range of time. For example, to see the data "as of time T" the range would be set to [0, T+1) (T+1 because in HBase the end time is exclusive).
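The half-open range and the "as of time T" trick can be sketched over the versions of a single column. In the real client the range is set with `Scan.setTimeRange(minStamp, maxStamp)` or `Get.setTimeRange(minStamp, maxStamp)`. The `ToyTimeRange` class below is only a hypothetical in-memory stand-in.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical stand-in: the versions of one column, keyed by timestamp.
final class ToyTimeRange {
    final TreeMap<Long, String> versions = new TreeMap<>();

    // Newest version with min <= ts < max (end exclusive, as in HBase time ranges).
    Optional<String> newestIn(long min, long max) {
        Map.Entry<Long, String> e = versions.lowerEntry(max); // greatest ts strictly below max
        return (e != null && e.getKey() >= min) ? Optional.of(e.getValue()) : Optional.empty();
    }

    // "As of time T" is the range [0, T + 1), so a version written exactly at T is included.
    Optional<String> asOf(long t) {
        return newestIn(0, t + 1);
    }
}
```

With versions at 10 and 20, `asOf(20)` returns the version written at 20, whereas `newestIn(0, 20)` still returns the one at 10, because the end of the range is exclusive.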

There is one snag, though. Once a delete marker is set, all cells affected by that marker are no longer visible, whatever time range a query asks for. If a Put for a column C was issued at time T and is followed by a column delete at time T+X, a time range scan for [0, T+1) will return no data, as deleted cells are never shown.

HBASE-4536 addresses that issue. It is now possible to instruct a column family to retain deleted cells and treat them exactly like ordinary, undeleted cells (which means they still count toward the number of versions kept and can expire if a TTL is set for the column family). This can be done in the Java client by calling HColumnDescriptor.setKeepDeletedCells(true) or through the HBase shell by setting KEEP_DELETED_CELLS=>true for a column family.

When this setting is enabled for a column family, deleted cells are visible to time range Scans and Gets as long as the requested range does not include the delete marker.

So in the case above, a Scan or Get for [0, T+1) will return the Put that was marked as deleted. A Scan or Get for the range [0, T+X+1) will not return the Put, as that range does include the delete marker.
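The visibility rule for a single column can be sketched as follows, with and without the setting. `ToyKeepDeleted` is a hypothetical model, not HBase's implementation. It tracks only Put and column delete timestamps, and a marker hides every Put at or below its own timestamp.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-column model of KEEP_DELETED_CELLS; not HBase's implementation.
final class ToyKeepDeleted {
    final boolean keepDeletedCells;
    final List<Long> puts = new ArrayList<>();          // timestamps of Puts
    final List<Long> columnDeletes = new ArrayList<>(); // timestamps of column delete markers

    ToyKeepDeleted(boolean keepDeletedCells) {
        this.keepDeletedCells = keepDeletedCells;
    }

    // Timestamps of Puts visible to a Scan or Get over [min, max).
    List<Long> scan(long min, long max) {
        List<Long> visible = new ArrayList<>();
        for (long p : puts) {
            if (p < min || p >= max) {
                continue; // outside the requested time range
            }
            if (!isDeleted(p, min, max)) {
                visible.add(p);
            }
        }
        return visible;
    }

    private boolean isDeleted(long putTs, long min, long max) {
        for (long d : columnDeletes) {
            boolean covers = putTs <= d; // a column marker masks versions at or below its timestamp
            // Default: a covering marker always hides the cell.
            // With keepDeletedCells: it only counts when the marker itself lies inside [min, max).
            boolean applies = !keepDeletedCells || (d >= min && d < max);
            if (covers && applies) {
                return true;
            }
        }
        return false;
    }
}
```

With T = 100 and X = 5, the default model returns nothing for [0, 101), while the model with keepDeletedCells set returns the Put at 100 for [0, 101) and hides it for [0, 106).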

Keeping deleted cells makes full "as-of time" queries possible. This is very useful, for example, on backup replicas of production data in case a user accidentally deletes some data.
