Open source projects have produced some of the most sophisticated pieces of software while defying conventional wisdom about collaborative projects, according to researchers at the University of California, Davis. The interdisciplinary research team, consisting of academics from the fields of computer science, mechanical and aeronautical engineering, and management, has recently been awarded a three-year, US$750,000 grant from the US National Science Foundation (NSF) to investigate the open source phenomenon.
The researchers suspect that the structure of open source software will affect the way its developers are organized, and vice versa. The study will focus on the operation of developers of the Apache Web server, the PostgreSQL database and the Python scripting language, through information from message boards, bug reports and e-mail discussions.
Liz Tay speaks with professor of computer science and principal investigator of the study, Premkumar Devanbu.
How would you describe open source projects?
Open source projects adopt the approach of making the source code freely available to anyone who wants to read it. This flies in the face of most commercial software development, which regards the source code as the "family jewels" that must be protected at all costs from the prying eyes of competitors. While this approach may prima facie seem subversive or "socialistic", in fact it follows a long tradition in scientific research of freely publishing and discussing ideas. If you regard software as nothing more than "hard-coded ideas", it seems completely natural. I'm certainly simplifying a lot here; I would refer people to books by Professor Weber of the University of California, Berkeley (an economist) and Professor Benkler (a law professor), who have much more authoritative words on the matter. This is just my personal perspective as a computer scientist.
What is most interesting about how open source software is written?
To me, the most interesting part of it is how well they [open source projects] work. They are immensely successful - look at Apache, MySQL, Linux, Perl, etc. - they are basically taking over the Web on the server side. I think this success is attributable to the same reasons that most software succeeds: they deliver the features customers want, with great speed and high quality (and of course at low cost). The reason for this, according to Eric Raymond (and others who have taken up this issue since), is the free flow of information about the system, via the source code, to the members of the community. There is a belief that this leads to rapid isolation, diagnosis and remedy of defects that would take traditional projects much longer to fix. Beyond defect isolation, it also makes it possible for users to become developers: if you want a feature, you can figure out how to add it, and put it in.
From a researcher's perspective, OSS projects, by their nature, expose comprehensive longitudinal narratives of artifact evolution, social structure of artifact creators, and interactions with general users. This narrative is a valuable source of data for testing hypotheses relevant to software engineering practice.
What prompted you and your team to research this topic? Have you any personal interest in open source?
I have been writing software, teaching software engineering and researching software engineering tools and processes for more than 30 years now, and the phenomenon of open source confounds so many things that we've learned and taught over the years. One striking phenomenon in (traditional) software engineering projects is what is called "Conway's Law": essentially, it states that artifact structure recapitulates social structure. Thus, if you give an organization with two sub-teams the task of writing a compiler, they'll produce a two-pass compiler; if there are three teams, they'll produce a three-pass compiler, and so on. My colleagues and I are eager to see how this phenomenon plays out in open source projects.
First, in open source, the organization is not created by fiat, but evolves organically; second, whatever organization exists is more fully observable via the e-mail archives and IRC archives. In traditional projects, people always find ways of doing an end-run around the organizational structures that exist, in order to get their job done. In open source, the interaction between organizational structure and social structure is explicitly observable; the longitudinal study of this is the goal of our project.
How do you plan on carrying out the research?
It's fairly traditional empirical software engineering - formulate hypotheses, extract data, and test them. The one difference is that we have a very high-powered team, with expertise in complexity physics, bio-informatics, and statistics. We hope to use novel methods in network theory, linear algebra, and the physics of complex systems (methods that have yielded fruit in other areas like biology and social science) to study open source software systems.
What do you expect to find?
We hope to understand how and why some open source projects succeed and others don't; we hope to understand why some open source systems are highly innovative and dynamic while others are not; we hope to understand the process by which people are attracted to, and retained by, open source systems; we hope to understand how the social structure influences the redesign of the system, and vice versa.
It should be noted that while these phenomena are more easily observed and studied in open source projects, the lessons learned are universally applicable to software projects, and perhaps more broadly to complex human endeavours. Efforts are under way to bring the OSS approach to news creation, knowledge creation, etc.; our results would be relevant to these endeavours.
How do you feel about being awarded the grant?
Delighted. Funding for NSF and other research programs in the US has been dwindling, while the competition for grants has increased. Many colleagues have become discouraged, and I know of several who have even left the US and moved to Canada and Europe. We are therefore very grateful to the NSF for its support of our work in these constrained times.