HBase: More HBase GC tuning

By Lars Hofhansl

My article on hbase-gc-tuning-observations explores how to configure the garbage collector for HBase.

There is actually a bit more to it, especially when block encoding is enabled for a column family and the predominant access is via the scan API with row caching.

Block encoding currently requires HBase to materialize each KeyValue after decoding during scanning, and hence this has the potential to produce a lot of garbage for each scan RPC, especially when the scan response is large as might be the case when scanner caching is set to larger value (see Scan.getCaching())

My experiments show that in that case it is better to run with a larger young gen of 512MB (-Xmn512m) and - crucially - make sure that all per RPC garbage across all handlers actively performing scans fits into the survivor space. (Note that this statement is true whether or not block encoding is used. Block encoding just produces a lot more garbage).

HBase actually has a way to limit the size of an individual scan response by setting hbase.client.scanner.max.result.size.

Quick recap:

The Hotspot JVM divides the heap into PermGen, Tenured Gen, and the YoungGen. YoungGen itself is divided into Eden and two survivor spaces.

By default the survivor ratio is 8 (i.e. each survivor space is 1/8 of Eden; together their sizes add up to the configured young gen size)

What to do?

With -Xmn512m this comes to ~51MB for each of the two survivor spaces.

Now you want to set hbase.client.scanner.max.result.size such that the expected number of a handler threads times the max.result.size is less than each of the survivor spaces.

With 30 handlers (default in HBase as of 0.98) this comes to 1.7MB, since not all handlers will always scan using the full buffer 2MB is probably a good setting.

Makes sense, doesn't? If per scan results across all active handlers cannot fit into the survivor space the collector has no choice but to promote to the tenured generation. That is exactly the scenario one would like to avoid as we would slowly polute the tenured gen with per PRC garbage, eventually requiring a full GC to defragment.

A 2MB response also happens to be a good size for 1ge networks. 2MB will take at least 16ms to be transmitted, which is higher than typical intra-datacenter latency of 0.1-1ms. With 10ge this would need to reviewed as 2MB can be send over 10ge is 1.6ms.

TL;DR:

When using block encoding make sure #handlers * max.results.size < survivor space, and use a slightly larger young generation:

-Xmn512mb (in hbase-env.sh)

hbase.client.scanner.max.result.size = 2097152 (in hbase-site.xml)

Update, January 31st, 2015
Since HBase versions 0.98 and later produce a little bit more garbage than 0.94 due to using protobuf, I am now generally recommending a young gen of 512mb for those versions of HBase.

And the same reasoning goes for writes, when batching writes make sure the batch sizes are around 2MB, so that they can temporarily fir into the survivor generation.

HBase

Monday, January 12, 2015

More HBase GC tuning

Quick recap:

What to do?

TL;DR:

No comments:

Post a Comment