Thursday, October 23, 2014

Branching - Managing Open Source and Corporate goals (HBase)

By Lars Hofhansl

Company and Open Source goals are often at odds or at least not completely aligned.
Here's how we do things for HBase (and dependent projects) at Salesforce.

  1. We do not fork any of the projects. A "fork" here means a departure from the open source repository significant enough to prevent us from contributing patches back to the open source branches or from applying open source updates to our repository.
  2. We do (almost) all work against the open source branches (0.98 currently).
  3. We have internal copies of the HBase repository and all dependent projects (Hadoop, ZooKeeper, etc.).
  4. We have minimal patches in our own repositories. These are mostly pom changes to define where to pull dependencies from - for example, we want to build our HBase against our build of Hadoop.
    Sometimes we have an odd patch or two that has not yet made it back to open source.
  5. We attach internal version numbers to our builds such as 0.98.4-sfdc-2.2.1, to indicate the exact version of what we're running in production.
  6. Everything we run in production is built automatically (via Jenkins jobs) from source against these internal repositories. This allows us to be agile in case of emergencies.
  7. Updates to the internal repository are manual (by design). We do not track the open source branches automatically. At our own pace, when we are ready, we move to a new upstream version, which most of the time allows us to remove some of the one-off patches we had applied locally. For example, we stayed at 0.98.4 for a while with some patches on top, and recently moved to 0.98.7, to which we had contributed all of the patches.
  8. All internal patches are eventually cleaned up and contributed back to open source, so that we can follow along the release train of minor versions (0.98.4, 0.98.5, etc.).
  9. Of course we keep an eye on and spend a lot of time with the open source releases to make sure they are stable and suitable for us to use in a future internal release.
With this simple model we avoid forking, track along with the open source releases, remain agile, and remain in full control over what exactly is deployed, completely at our own pace. Open source and corporate goals do not have to be at odds.
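
To make the version numbering and the patch cleanup from the list above concrete, here is a minimal Python sketch. It is not our actual tooling. The layout `<upstream>-sfdc-<internal>` is an assumption inferred from the single example 0.98.4-sfdc-2.2.1, and the function names, the patch names, and the "first upstream release that contains the patch" bookkeeping are all hypothetical.

```python
import re
from typing import NamedTuple


class InternalVersion(NamedTuple):
    upstream: tuple  # upstream release, e.g. (0, 98, 4)
    internal: tuple  # internal build number, e.g. (2, 2, 1)


# Assumed layout: <upstream>-sfdc-<internal>, inferred from one example only.
_PATTERN = re.compile(r"^(\d+(?:\.\d+)*)-sfdc-(\d+(?:\.\d+)*)$")


def parse_internal_version(text):
    """Split an internal build version into its upstream and internal parts."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not an internal version string: {text!r}")
    upstream, internal = (
        tuple(int(part) for part in group.split(".")) for group in match.groups()
    )
    return InternalVersion(upstream, internal)


def patches_still_needed(local_patches, new_upstream):
    """Return the local patches that must be carried onto new_upstream.

    local_patches maps a patch name to the first upstream version that
    contains it, or to None if it has not been contributed back yet.
    """
    return sorted(
        name
        for name, fixed_in in local_patches.items()
        if fixed_in is None or fixed_in > new_upstream
    )


if __name__ == "__main__":
    version = parse_internal_version("0.98.4-sfdc-2.2.1")
    print(version.upstream, version.internal)  # (0, 98, 4) (2, 2, 1)

    # Hypothetical patch list; only the pom change stays local after the move.
    patches = {
        "hypothetical-fix-a": (0, 98, 5),
        "hypothetical-fix-b": (0, 98, 7),
        "pom-internal-repos": None,
    }
    print(patches_still_needed(patches, (0, 98, 7)))  # ['pom-internal-repos']
```

Comparing versions as integer tuples keeps the ordering numeric, so 0.98.10 sorts after 0.98.7, which a plain string comparison would get wrong.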

This might all be obvious; a bit of diligence is required to support both the open source goals for a project as well as the specific corporate goals.
