Simon Phipps
Columnist

GitHub needs to take open source seriously

analysis
Nov 30, 2012

The legal details of copyright licensing are complex and off-putting, but that doesn't mean they should be ignored

Some of the would-be cool kids of software say we are in the “post open source” world. Several weeks ago, James Governor, founder of analyst firm RedMonk, put it this way on Twitter: “younger devs today are about POSS – Post open source software. f*** the license and governance, just commit to github.”

But as Outercurve Foundation’s CTO Stephen Walli replied, “promiscuous sharing w/out a license leads to software transmitted diseases.” Since then, I have heard more and more people mention this trend of regarding the copyright and collaboration terms of a project as irrelevant bureaucracy. Appealing as it may be to treat the wisdom of the years as pointless, doing so creates a problem for the future.


I’ve seen this devil-may-care attitude crop up in many contexts, but one of the most important places it shows up is GitHub, the popular source-code hosting site built on the Git version control system that Linus Torvalds created for Linux kernel development. While GitHub is a commercial service, it offers unlimited free-of-charge hosting for public projects. But under what legal terms are those public projects made available?

Whose code is it, anyway?

A casual survey of the projects on GitHub by a specialized analyst revealed that as many as half include no easily identifiable copyright licensing information. About 30 percent include some sort of licensing information in the source files, and around 20 percent have a clear license or notice file that makes it obvious under what terms the code is made available.

You don’t have to include a copyright statement for your creative work to be under copyright. In any country that’s a signatory to the Berne Convention, copyright — or stronger — is the default as soon as something is created. If you completely ignore the subject, all your work is copyrighted to you (or to your employer in many cases), and anyone who copies it to use or improve it is in breach of your copyright.

What are the terms under which the code in all those GitHub projects is made available? A precise answer depends on your jurisdiction and would require a lawyer’s advice, but for most people the answer is likely “all rights reserved”: in other words, you have no rights to use the code. GitHub’s terms of service include no useful default licensing terms, so the most likely outcome is that any use of the copyrighted material in one of those unlicensed projects is formally a breach of copyright.

Brian Doll, GitHub’s VP of Marketing, confirmed this arrangement is intentional:

Code without an explicit license is protected by copyright and is by default All Rights Reserved. The person or people who wrote the code are protected as such. Any time you’re using software you didn’t write, licensing should be considered and abided.

Ironically, this situation exists because the founders of GitHub want to ease code sharing. They were worried that selecting a license for a new project was so difficult that requiring project initiators to choose one would be a barrier to the adoption of GitHub. But completely ignoring the issue is just as bad, because it exposes every participant in an unlicensed GitHub project to the risk that license terms they don’t like, and would not have accepted at the project’s inception, will be imposed later. Worse, it introduces the risk that using the code in the project could lead to litigation for copyright infringement in the future.

This is exactly the problem open source was invented to solve. Open source licenses grant everyone the freedom to make copies of the code so that they can use, study, improve, and share it without asking permission. When code is released under an open source license, the situation for those collaborating with you on your source code is as simple as it can possibly be. You don’t have to be familiar with open source licenses; you can rest assured that if an OSI-approved license has been applied to the code, you may legally and freely use and improve it for any purpose.

Risky business

Who is at risk? People who fork projects on GitHub believing that forking gives them the right to use the code; it doesn’t. People who accept pull requests believing that doing so gives them the right to use the contributed code; it probably doesn’t. In both cases, a user with little care for or understanding of copyright licensing may well believe that sharing is fine, then discover down the road that they’ve been violating someone’s “all rights reserved” copyright.

Doll pointed out some default language in the terms of service: “People put code on GitHub in public repositories because they want to share them with the world. That is what GitHub is for, collaboration around software projects. The expectation, then, further clarified by our terms of service, is that by placing code on GitHub they are allowing anyone to view and fork those repositories.”

However, since GitHub has not defined the terms “view” and “fork,” its users can have no certainty about their use of the code. Can they use it to start a business? Can they publish it in a book? Can they give training courses based on it? The questions are endless, and the language doesn’t give answers strong enough to rely on without expensive case-by-case legal advice. The result is a severe imbalance that empowers the initial project copyright owner at the expense of collaborators.

The terms also appear to say nothing about the status of pull requests. Once again, without either an open source license covering each pull request or some form of certification of ownership and originality, there’s massive uncertainty that one day will “blow someone’s leg off.”

As long as everyone is friendly, things appear to be fine. But one day something will go badly wrong. When Oracle sued SAP for copyright infringement over materials that had been assumed to be reasonably shared between two other companies, PeopleSoft and TomorrowNow, the cause was exactly this sort of ill-defined sharing between friends. Once acquisitions turned that relationship into a conflict between competitors, it cost SAP an enormous financial and market penalty.

By “hiding” copyright licensing issues and delivering capabilities in ways that make people believe a concern for licensing is outdated, GitHub has encouraged platform growth by appealing to younger developers’ “licensing is for losers” sensibilities, at the expense of the long-term consequences those users face from potential copyright infringement.

The open source solution

There are several levels of potential solution to this problem. At the most basic, GitHub could modify its terms of service so that all materials made available through the service are licensed by default under the likes of the broad and permissive Apache license or the Creative Commons Attribution license (preferably both). Text like “you agree your copyrights are licensed to all users under the Apache v2 license unless you assert otherwise” would set a safe baseline, while allowing the default to be easily overridden by any project.

At a more advanced level, new projects could be asked to pick from three preferred licenses (Apache v2, MPLv2 and GPLv3) with the option to write in a different choice or select it from a list structured using the OSI’s antiproliferation sorting order. This is the approach taken by most open source “forges.” Thirdly and optimally, all this information would also be encoded in a machine-readable way using a standard like SPDX with the project data so that third-party systems like Ohloh could automatically evaluate project terms and governance.

Any of these three would fix the problem; the third would serve the wider open source community well. I made these suggestions to Brian Doll, who told me, “We’re always improving GitHub, so it’s entirely possible that some day we may make license selection more prominent within the GitHub experience.”
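To make the third, machine-readable option more concrete, here is a minimal Python sketch of the kind of check a third-party system might run over a checked-out project. It assumes projects mark source files with SPDX short-form identifier comments (for example `# SPDX-License-Identifier: Apache-2.0`) and treats a top-level file named LICENSE, LICENSE.txt, LICENSE.md or COPYING as a license notice. Those file names, the five-line header window, and the function names `spdx_ids_in_text` and `summarize_project` are illustrative assumptions, not a description of how Ohloh or any other real service works.

```python
import re
from pathlib import Path

# Matches an SPDX short-form identifier comment such as
# "# SPDX-License-Identifier: Apache-2.0" or "/* SPDX-License-Identifier: MPL-2.0 */",
# capturing the license expression and ignoring a trailing "*/" or "-->".
SPDX_RE = re.compile(
    r"SPDX-License-Identifier:\s*([A-Za-z0-9.+\-() ]+?)\s*(?:\*/|-->)?\s*$"
)

# Hypothetical set of file names treated as a project-level license notice.
LICENSE_FILE_NAMES = {"LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING"}

# Assumed number of leading lines to inspect, where SPDX headers conventionally sit.
HEADER_LINES = 5


def spdx_ids_in_text(text: str) -> set[str]:
    """Return every SPDX license expression declared in the given text."""
    ids = set()
    for line in text.splitlines():
        match = SPDX_RE.search(line)
        if match:
            ids.add(match.group(1).strip())
    return ids


def summarize_project(root: Path) -> dict:
    """Report whether a license file exists and which SPDX ids appear in file headers."""
    has_license_file = any((root / name).is_file() for name in LICENSE_FILE_NAMES)
    ids = set()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than fail the whole scan
        head = "\n".join(text.splitlines()[:HEADER_LINES])
        ids |= spdx_ids_in_text(head)
    return {"license_file": has_license_file, "spdx_ids": sorted(ids)}
```

Called on a project directory, `summarize_project(Path("."))` reports whether a license file was found and the sorted list of SPDX identifiers seen in file headers. A project with neither would fall into the “no easily identifiable licensing information” group from the survey described earlier.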

Meanwhile, several of the people I’ve interviewed suggested a temporary grassroots fix. If you want to participate in a project on GitHub and discover it has no copyright license, simply make a pull request adding one. The Apache License v2 could be a good choice, offering maximum flexibility while also ensuring mutual patent safety for participants. If the omission is a simple oversight on the part of the project owner, they will probably accept the request, solving the problem for everyone. If they don’t, stay well clear.

It’s good to simplify the process of sharing on the Internet, and GitHub has created an enormously important resource. But ignoring a significant and serious aspect of code curation — the copyright license under which it is shared — is the wrong answer. It’s certain to end in serious problems for someone, and I hope GitHub will rapidly take action to address this gap in its system. Until then, tread carefully and avoid projects with no license terms.

This article, “GitHub needs to take open source seriously,” was originally published at InfoWorld.com. Read more of the Open Sources blog and follow the latest developments in open source at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.


Simon Phipps is a well-known and respected leader in the free software community, having been involved at a strategic level in some of the world's leading technology companies and open source communities. He worked with open standards in the 1980s and on the first commercial collaborative conferencing software in the 1990s, helped introduce both Java and XML at IBM, and, as head of open source at Sun Microsystems, opened its whole software portfolio, including Java. Today he's managing director of Meshed Insights Ltd, president of the Open Source Initiative, and a director of the Open Rights Group and the Document Foundation. All opinions expressed are his own.
