Wednesday, September 12, 2012

HBase client timeouts

The HBase client is a somewhat jumbled mess of layers, with unintended nested retries, nested connection pools, and the like. Mixed in are connections to the ZooKeeper ensemble.

It is important to realize that the client handles all communication with the RegionServers directly; there is no proxy on the server side. Consequently the client needs to do the service discovery and caching, as well as the necessary connection and thread management, itself. Hence some of the complexity is understandable: the client is part of the cluster.

See also this blog post. Before HBASE-5682 a client would potentially never recover when it could not reach the cluster. And before HBASE-4805 and HBASE-6326, a client could not - in good conscience - be used in a long-running ApplicationServer.

An important aspect of any client library is what I like to call "time to exception". If things go wrong the client should (at least as an option) fail fast and let the calling application - which has the necessary semantic context - decide how to handle this situation.

Unfortunately the HBase and ZooKeeper clients were not designed with this in mind.

Among the various timeouts are:
  • ZK session timeout (zookeeper.session.timeout)
  • RPC timeout (hbase.rpc.timeout)
  • RecoverableZookeeper retry count and retry wait (zookeeper.recovery.retry, zookeeper.recovery.retry.intervalmill)
  • Client retry count and wait (hbase.client.retries.number, hbase.client.pause)
In some error paths these retry loops are nested, so that with the default settings, if both ZK and HBase are down, a client will throw an exception only after a whopping 20 minutes! The application has no chance to react to outages in any meaningful way.
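
To see which values are actually in effect for a given client, one can simply dump these keys from the client's Configuration. A minimal sketch in Java (values come from hbase-default.xml and hbase-site.xml and differ between versions):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ShowClientTimeouts {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            // The keys discussed above. A null value means the key is not
            // set in the XML files and a hard-coded default applies.
            String[] keys = {
                "zookeeper.session.timeout",
                "hbase.rpc.timeout",
                "zookeeper.recovery.retry",
                "zookeeper.recovery.retry.intervalmill",
                "hbase.client.retries.number",
                "hbase.client.pause"
            };
            for (String key : keys) {
                System.out.println(key + " = " + conf.get(key));
            }
        }
    }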

HBASE-6326 fixes one issue, where .META. and -ROOT- lookups were nested, causing the ZK timeout to be incurred N^2 times (N being the client retry count, 10 by default), each of which was in turn retried by RecoverableZookeeper (3 times by default) - with the defaults that is up to 10 × 10 × 3 = 300 ZK timeouts.

The defaults for some of these settings are optimized for the various server-side components. If the network "blips" for five seconds, the RegionServers should not abort themselves, so a session timeout of 180s makes sense there.

For clients running inside a stateless ApplicationServer the design goals are different. Short timeouts of five seconds seem reasonable. A failure is quickly detected and the application can react (potentially by controlled retrying).

With the fixes in the various jiras mentioned above, it is now possible (in HBase 0.94+) to set the various retry counts and timeouts to low values and get a reasonably short timespan after which the client reports a connection error to the calling application thread.
And this is in fact what should be done when the HBaseClient (HTable, etc.) is used inside an ApplicationServer for HBase requests that are synchronous in the calling thread (for example a web server serving data from HBase).
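
A minimal sketch of such a fail-fast client setup (assuming 0.94-era APIs; the values are illustrative, not prescriptive, and "mytable" is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class FailFastClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Fail fast: few retries, short pauses, short timeouts.
            conf.setInt("hbase.client.retries.number", 3);
            conf.setInt("hbase.client.pause", 1000);        // ms between retries
            conf.setInt("zookeeper.recovery.retry", 1);     // effectively no ZK retry
            conf.setInt("hbase.rpc.timeout", 5000);         // 5s per RPC
            conf.setInt("zookeeper.session.timeout", 5000); // short ZK session

            HTable table = new HTable(conf, "mytable");     // placeholder table
            // ... issue synchronous Gets/Puts from the application thread ...
            table.close();
        }
    }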

7 comments:

  1. Good day! Great article, but can you publish an example configuration that sets these timeouts? I use HBase 0.94.2 and we still have these huge connection periods...

  2. We have the following settings for some of our clients:

    hbase.client.retries.number = 3
    hbase.client.pause = 1000
    zookeeper.recovery.retry = 1 (i.e. no retry)

    We want the client to fail fast when the cluster is down, but still be able to ride over moved regions.
    Lately we also switched to a circuit breaker design, where we asynchronously check the availability of the HBase cluster and fail a client immediately when it is down.

    Replies
    1. Hey Lars! Can you expand on how you implemented the circuit breaker design?

    2. Bit belated. Here's an article about this: http://doc.akka.io/docs/akka/snapshot/common/circuitbreaker.html
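
      In rough outline it looks like this (a sketch of the idea, not our exact production code):

        import java.util.concurrent.Executors;
        import java.util.concurrent.ScheduledExecutorService;
        import java.util.concurrent.TimeUnit;
        import java.util.concurrent.atomic.AtomicBoolean;

        // A background probe periodically checks the cluster; while it
        // reports the cluster down, callers are failed immediately.
        public class HBaseCircuitBreaker {
            private final AtomicBoolean up = new AtomicBoolean(true);
            private final ScheduledExecutorService probe =
                    Executors.newSingleThreadScheduledExecutor();

            public HBaseCircuitBreaker() {
                probe.scheduleWithFixedDelay(new Runnable() {
                    public void run() {
                        up.set(checkCluster());
                    }
                }, 0, 5, TimeUnit.SECONDS);
            }

            // Placeholder: e.g. a cheap Get against a known row, using the
            // short timeouts above, returning false on any exception.
            private boolean checkCluster() {
                return true;
            }

            // Called before each synchronous HBase request.
            public void ensureUp() {
                if (!up.get()) {
                    throw new RuntimeException("HBase down - failing fast");
                }
            }
        }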

  3. What about HTable.{get,set}OperationTimeout()?
    How does this per-table setting interact with the global settings discussed above?
    I can find very little documentation on {get,set}OperationTimeout() beyond a listing of their signatures in the javadoc API documentation.

  4. The operation timeout can be set globally through hbase.client.operation.timeout, or on a per-table basis via HTable.{get,set}OperationTimeout() (as you wrote).
    It allows one to override hbase.rpc.timeout selectively.

    Looking at the HTable code... It is not used for all operations, which is very confusing.
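
    For illustration, a sketch ("mytable" is a placeholder):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.HTable;

      public class OperationTimeoutDemo {
          public static void main(String[] args) throws Exception {
              Configuration conf = HBaseConfiguration.create();
              // Global default, picked up by HTables created from this conf:
              conf.setInt("hbase.client.operation.timeout", 5000); // 5s

              HTable table = new HTable(conf, "mytable");
              table.setOperationTimeout(3000); // per-table override, 3s
              System.out.println(table.getOperationTimeout());
              table.close();
          }
      }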

  5. Good article. I set up my HBase cluster with 3 slave nodes + the master; when I run ./start-hbase.sh on one of these nodes (slave 2) I get a connection timeout.
    Can you please help to solve this issue? Thanks.
