Is that code really yours?

Black Duck's protexIP analyses open source code to mitigate intellectual property risks

As open source software pushes further into the enterprise, it brings a new set of risks concerning IP (intellectual property). Developers happily borrow code from open source projects to avoid reinventing it. That practice is fine as long as the resulting software complies with the licenses of the donor projects. The trouble is that managers cannot know which parts of their code base come from open source projects. A code snippet reused from a newsgroup posting could actually have come from a copyrighted open source project, and using it could legally require the company to open source its entire product. If the company is an ISV, that could undermine its ability to charge for the product at all.

Until recently, managers had to rely on their developers to avoid this problem. Now they can automate the checking of their code with protexIP 4.0 from Black Duck Software. The product compares in-house code against a large body of existing code, including open source projects, and reports any matches it finds. A supporting component lets managers and legal counsel approve specific open source licenses for borrowed code. The solution manages the license agreements and produces a bill of materials that shows the company's obligations for the open source code it uses. The license management portion of the product is robust and well designed. The code analysis and identification, however, leave much to be desired.

Prepping for flight

Black Duck offers protexIP in two basic flavours. The first, protexIP/developer, is installed at the customer site and comes in two editions, Enterprise and Professional, which differ in functionality. I reviewed Enterprise, the higher-end edition. The second, protexIP/on-demand, is a hosted version of the same software. It is typically used by IP attorneys and acquisition specialists who need to verify the provenance of software they're examining. Because the open source database is so large, the on-site installation generally needs a dedicated server, and that server runs only on Linux.

In all cases, the client software is Java-based, so it runs on many platforms. To evaluate a code base, you gather the code into a directory and point protexIP at it. Unfortunately, protexIP does not integrate with source-code management systems, although an SDK lets developers write their own integrations should they wish to.

The software can analyse code in many programming languages and can even compare binaries with known open source products. It does this by creating fingerprints of the source code and comparing them against the database of code prints the company has developed over the years. It then returns a summary of its findings (see screen image) in which each file is marked green (no problem), yellow (awaiting identification), blue (pending approval), or red (definite problem). The colors reflect how acceptable the applicable licensing terms are to a given site. For example, many sites might accept the Apache license, whereas the viral provisions of the GPL (General Public License) might lead some companies to rule out its use. A screen for managers or legal counsel lets them set approvals for each type of open source license, so that protexIP can, for example, raise a warning when it finds a match on GPL code.
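The review does not say how protexIP computes its fingerprints. As a rough illustration of the general idea only, here is a minimal sketch of one common approach. It hashes overlapping windows of normalized tokens and measures how many of a file's prints appear in a database of known project prints. The window size `K`, the crude tokenizer, the functions `fingerprints` and `best_match`, and the project name `example-oss-lib` are all hypothetical, and this is not Black Duck's method.

```python
import hashlib
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical window size: number of consecutive tokens hashed per print.
K = 5

def tokens(source: str) -> List[str]:
    # Crude tokenizer: words/numbers plus single punctuation characters.
    # Whitespace and line breaks are discarded, so reformatting is ignored.
    return re.findall(r"\w+|[^\w\s]", source)

def fingerprints(source: str, k: int = K) -> Set[str]:
    toks = tokens(source)
    # Hash every window of k consecutive tokens; empty if fewer than k tokens.
    return {
        hashlib.sha1(" ".join(toks[i:i + k]).encode()).hexdigest()
        for i in range(len(toks) - k + 1)
    }

def best_match(source: str,
               known: Dict[str, Set[str]]) -> Tuple[Optional[str], float]:
    """Return the known project sharing the largest fraction of prints."""
    prints = fingerprints(source)
    if not prints:
        return None, 0.0
    best_name, best_score = None, 0.0
    for name, project_prints in known.items():
        score = len(prints & project_prints) / len(prints)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Usage: a reformatted copy of known code still matches completely.
known = {"example-oss-lib": fingerprints("int add(int a, int b) { return a + b; }")}
snippet = "int add(int a,\n        int b)\n{\n    return a + b;\n}"
print(best_match(snippet, known))  # ('example-oss-lib', 1.0)
```

Because whitespace is discarded before hashing, simple reformatting does not defeat the match. Renaming identifiers would, which is one reason real tools normalize more aggressively.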

The solution also flags situations in which licenses require conflicting actions from the user. To do this, it relies on a database of more than 650 open source licenses in which every requirement of each license's terms is logged. This license management works well and will certainly help managers who rely on open source components understand their responsibilities.
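The review doesn't describe how protexIP stores license requirements or applies a site's approvals internally. The following minimal sketch shows one way a site policy table and a license-obligation table could drive both the color status and the conflict check described above. The license IDs (`permissive-x`, `copyleft-y`, `restrictive-z`), the obligation names, the `POLICY` table, and the `status` and `conflicts` functions are all hypothetical. The mapping from scan results to colors is one interpretation of the review's description, not protexIP's documented behavior, and the obligation entries are illustrative, not legal analysis.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical, heavily simplified obligation model: each license requires
# or forbids abstract actions. Illustrative only, not legal analysis.
LICENSE_TERMS: Dict[str, Dict[str, FrozenSet[str]]] = {
    "permissive-x": {"requires": frozenset({"keep_notice"}),
                     "forbids": frozenset()},
    "copyleft-y": {"requires": frozenset({"keep_notice", "release_source"}),
                   "forbids": frozenset({"extra_restrictions"})},
    "restrictive-z": {"requires": frozenset({"extra_restrictions"}),
                      "forbids": frozenset({"release_source"})},
}

# Hypothetical site policy, as set on an approval screen.
POLICY: Dict[str, str] = {"permissive-x": "approved", "copyleft-y": "forbidden"}

def status(match: Optional[str], license_id: Optional[str]) -> str:
    """Map one file's scan result to a color (assumed semantics)."""
    if match is None:
        return "green"   # no open source match found
    if license_id is None:
        return "yellow"  # match found, license not yet identified
    decision = POLICY.get(license_id)
    if decision is None:
        return "blue"    # license identified but not yet approved or rejected
    return "green" if decision == "approved" else "red"

def conflicts(licenses: List[str]) -> List[Tuple[str, str, str]]:
    """List (a, b, action) wherever license a requires what license b forbids."""
    found = []
    for a in licenses:
        for b in licenses:
            if a == b:
                continue
            clash = LICENSE_TERMS[a]["requires"] & LICENSE_TERMS[b]["forbids"]
            for action in sorted(clash):
                found.append((a, b, action))
    return found

print(status("some-project", "copyleft-y"))  # red
print(conflicts(["permissive-x", "copyleft-y", "restrictive-z"]))
# [('copyleft-y', 'restrictive-z', 'release_source'),
#  ('restrictive-z', 'copyleft-y', 'extra_restrictions')]
```

Each pair is checked in both directions because a conflict is asymmetric: one license's requirement clashes only with another license that forbids the same action.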

