What to look for in a Git management system that won’t limit your DevOps ambitions

Take into consideration performance, Git sprawl and repository hosting options

This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter’s approach.

DevOps adoption has exploded with the changes in product development lifecycles, but the expansion has not been without collateral damage for companies using Git, the popular code management system. Here are some specific challenges Git poses for DevOps, along with suggestions for addressing them.

But let’s start by giving Git its due. Git has become ubiquitous for good reason. Developers love its flexibility, speed, and ability to branch locally and in-place for juggling lots of tasks. But things are a bit different when you stir in DevOps, as we’ll see:

* Performance. Git’s protocol for transferring data over the network is very efficient, so initial DevOps impressions can be highly favorable. But a number of performance-related issues tend to bubble up from the details of Git’s internal implementation over time, especially when handling large files or large numbers of files.

For example, working with content in Git alone is an all-or-nothing proposition—that is, every clone of a repository includes every file in that repository. Many users have long desired support for narrow cloning, the ability to limit the files and folders included in a clone/fetch operation, but Git still doesn’t support this feature. This issue isn’t typically a problem for small projects, but larger projects can easily chew through disk space and time as a result, especially with frequent builds.

Second, Git doesn’t handle large binary assets well. Such content can quickly bloat the size of a repository. Worse, because some important Git functions require calculating and/or recalculating content hash values, large binary assets can cause unexpected and even pathological slowdowns with certain commands while others execute almost instantly.
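One way to see how exposed a project is to this problem is to scan a working tree for oversized files before they are ever committed. The following is a minimal sketch, not a feature of Git itself: the function name `find_large_files` and the 50 MB default threshold are assumptions chosen for illustration.

```python
import os


def find_large_files(root, threshold_bytes=50 * 1024 * 1024):
    """Return (relative_path, size) pairs for files at or above the threshold.

    Results are sorted largest first. The 50 MB default is an arbitrary
    illustrative value, not a limit defined by Git.
    """
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks; their targets are counted elsewhere
            size = os.path.getsize(path)
            if size >= threshold_bytes:
                large.append((os.path.relpath(path, root), size))
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Files reported by a scan like this are natural candidates for an external store rather than the repository itself.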

Best Practices:

  • Standardize on a Git management system that offers narrow cloning
  • Use shallow cloning as much as possible to limit file revisions
  • Employ an external store, such as Git LFS, for your large binary assets
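As an illustration of the shallow-cloning practice, a build script might assemble its clone command along these lines. This is a sketch under stated assumptions: the helper names `shallow_clone_command` and `shallow_clone` are hypothetical, while `--depth`, `--single-branch` and `--branch` are standard `git clone` options.

```python
import subprocess


def shallow_clone_command(url, dest, branch=None, depth=1):
    """Build the argument list for a shallow, single-branch git clone."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # --depth truncates history to the most recent `depth` commits.
    cmd = ["git", "clone", "--depth", str(depth), "--single-branch"]
    if branch is not None:
        cmd += ["--branch", branch]
    cmd += [url, dest]
    return cmd


def shallow_clone(url, dest, branch=None, depth=1):
    """Run the clone; raises CalledProcessError if git exits non-zero."""
    subprocess.run(shallow_clone_command(url, dest, branch, depth), check=True)
```

Separating command construction from execution keeps the first function easy to check without network access. A shallow clone limits how many revisions are transferred, but each revision still contains every file in the repository.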

* Organization. The performance problems just mentioned often lead to “Git sprawl,” which refers to the proliferation of small repositories in order to maintain acceptable performance. Again, this factor generally isn’t a problem for small projects, but medium-size or large projects can require dozens, hundreds, or even thousands of repositories.

Git sprawl may not complicate developers’ lives all that much at first, particularly if care is taken to divide content along logical project boundaries. But its burden falls immediately on DevOps’ shoulders because having everything in the right place at the right time is essential for builds, testing and the rest of the development pipeline. Getting a fresh copy of hundreds or thousands of repositories for a single build can require a lot of data transfer, even under the best of circumstances.

Organizations that practice component-based development will also “enjoy” the additional complexity of handling component versions. It’s one thing to stitch together many repositories, but the burden of making sure you subsequently consume only the correct component versions in every case is potentially worse. Git’s ability to define sub-repositories is interesting and can be useful in simple cases, but it leaves a lot to be desired for substantive projects, so use it with care (if at all).
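To make the version-consistency burden concrete, a build could compare the commit each component is pinned to against what actually landed in the workspace. The sketch below assumes a simple mapping from component name to commit ID. The function name, the manifest shape and the sample IDs are all hypothetical, not part of Git or any particular tool.

```python
def find_version_mismatches(pinned, checked_out):
    """Compare pinned commit IDs with those found in the build workspace.

    Both arguments map component name -> commit ID. Returns a dict of
    component -> (expected, actual); actual is None when the component
    is missing from the workspace entirely.
    """
    mismatches = {}
    for component, expected in pinned.items():
        actual = checked_out.get(component)
        if actual != expected:
            mismatches[component] = (expected, actual)
    return mismatches


# Hypothetical example data: "core" drifted and "docs" was never fetched.
pinned = {"ui": "a1b2c3", "core": "d4e5f6", "docs": "0f9e8d"}
checked_out = {"ui": "a1b2c3", "core": "777777"}
problems = find_version_mismatches(pinned, checked_out)
```

Failing the build whenever `problems` is non-empty is one way to keep a many-repository project from silently consuming the wrong component version.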

There also remains the question of overall branching strategy given the need for consistency across all of those projects, components and repositories. Git by itself is the unadulterated Wild West when it comes to branching. But what makes sense to individual developers, or even separate teams, can drive DevOps crazy trying to keep up with different branching structures across so many repositories. It is therefore crucial to define a branching strategy and workflow and stick with them.
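Once a strategy is agreed on, it can be checked mechanically across repositories. The following sketch assumes a naming convention of `master`, `release/<major>.<minor>` and `feature/<name>`. Those patterns and the function name are illustrative assumptions; a real team would substitute its own rules.

```python
import re

# Hypothetical convention; replace with the patterns your teams agree on.
ALLOWED_BRANCH_PATTERNS = [
    r"master",
    r"release/\d+\.\d+",
    r"feature/[a-z0-9][a-z0-9-]*",
]


def nonconforming_branches(branches_by_repo, patterns=ALLOWED_BRANCH_PATTERNS):
    """Return {repo: [branch, ...]} for branches matching none of the patterns."""
    compiled = [re.compile(p) for p in patterns]
    report = {}
    for repo, branches in branches_by_repo.items():
        bad = sorted(
            b for b in branches
            if not any(rx.fullmatch(b) for rx in compiled)
        )
        if bad:
            report[repo] = bad
    return report
```

The branch lists themselves would come from whatever inventory the hosting system provides. The check only needs names, so it stays independent of any one Git management product.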

Best Practices:

  • Plan ahead for handling large numbers of small (usually < 1 GB) repositories
  • Build a branching strategy around the shape of your teams and internal processes
  • Use build artifact management systems to simplify component referencing issues

* Hosting. One of the key questions any organization embracing Git must face is how to host and manage repositories, and the resulting answers greatly affect DevOps and information security. Developers want hosting that makes it easy to create new projects, clone or fork and push their work, preferably with great review tools and a simple process to deliver units of work to the master branch.

But this desire for simplicity is often at odds with DevOps’ need for scalable hosting. Shrinking continuous delivery cycles, in conjunction with Git sprawl, can easily lead to servers failing under the load. And Git management systems vary widely in this regard, some offering only single-server topology and others providing clustering and even high availability. A robust DevOps pipeline requires considering all of these factors as well as a plan to handle disaster recovery.

Developers’ wishes may also contradict the need to secure the intellectual property they create. Git’s design offers only authentication, not authorization. That is, Git offers a mechanism to ensure that the person committing is who he or she claims to be, but it leaves the question of what that person can do entirely to the file system. That’s great for teams building open source software—unsurprising, given that’s the task for which Git was designed—but it doesn’t work so well for the enterprise.

You’ll want to consider how to structure your security model to accommodate all the necessary roles and permissions. This is especially true if you’re relying on third-party teams for outsourcing, because it can be critical to ensure that teams access only what they’re supposed to see. Those in regulated industries should consider carefully the ramifications of choosing Git in light of its ability to destroy history, for example through force pushes that overwrite previously published commits.

Git hosting grows even more difficult for organizations spanning the globe. Although Git’s protocol works well over the Internet, the tendency toward Git sprawl more than nullifies that advantage. Be sure your Git hosting solution makes it easy to synchronize work across servers in different locations around the world; anything else is just setting up your DevOps team to fail.

Best Practices:

  • Prefer a Git hosting solution with broad scalability options and high availability
  • Don’t treat disaster recovery as an option; have a solid plan and exercise it often
  • Make sure your Git hosting solution gives you all the security and flexibility you need

Git management solutions are increasingly popular and plentiful, but they are not all created equal. These differences may be most apparent when it comes to DevOps’ needs. Developers often have the luxury of working with just a small set of the overall content that goes into many modern projects, but DevOps has to manage everything to produce all the intermediary and final results. Failing to consider DevOps’ needs when adopting Git can only lead to trouble.