"Do you have experience with open source projects?"

In 2017, the Flemish government was searching for someone to get involved in a European open-source project called UnionVMS. This is a software product which supports the government of the fishery sector. I thought this could be an exciting project, so I took the chance to apply for the job.

During the interview, the project manager asked me:

Do you have any experience with open source projects?

At first, I thought that was a strange question. I have experience with "closed source projects"; is there a big difference with open source projects? The code is made public, yeah, but what else? At the time of writing, we are half a year into the project. And man, was I wrong! There is quite a difference between a "traditional" project and an open source project.

A lot of best practices that are nice to have in a regular project become a lot more important, even obligatory, in an open source project. Ignoring these practices causes confusion, or even frustration.

This is a lessons-learned dump from my early experiences with working on open source software.

What is an open source project?

In an open source project, the code is publicly available, to be read and adapted.

Open source projects can be backed by individuals as well as by companies. This is possible in several ways: allocating time of your developers to work on it, paying external developers to develop features, writing documentation, ...

Open source projects are alive and kicking. Some great examples are Angular, WildFly, Docker, ...

Why should open source projects be managed differently?

Starting and maintaining a project is hard. In the case of open source projects, this becomes even more difficult, especially from a project management view.

Let's compare a typical traditional project with a typical open source project.

Typical traditional project	Typical open source project
The project manager composes the team.	The team consists of multiple teams, and independent developers working in their free time.
Team members are, most of the time, involved for a longer period. The team size rarely changes. Their availability is assured and predictable.	External team members come and go as they wish. Their availability is not assured.
A lead dev knows who writes code, and often knows the entire code base.	Everyone can add code. Chances are that nobody knows the entire code base.
A lead functional analyst decides on the functionality. He or she knows the functionality of the whole project.	Functionality can be added by anyone. Possibly, nobody has the complete functional picture.
The project manager decides on the agenda and deadlines. The team has a common goal.	Multiple teams/developers are contributing to the project to serve their own agenda. Some do it for fun, some do it to replace internal software. Every team has their own deadline.

How can you collaborate when you don’t have any guarantees about the members of the team, their availability, their goals or their deadlines?

Being dependent on open source software is a risk. You can lower the risk by following best practices.

Goal 1: lower barriers to contribute

Detailed documentation

I don’t even know where to start?

What’s the architecture that we need to respect? How does this product work? Where can I add functionality?

Outside the codebase

When multiple people create and edit documentation, a wiki-like solution is needed. If your code is shared on GitHub, creating a GitHub Wiki is only a click away.

If you need some more functionality (like attachments, video, or a more complete WYSIWYG editor), a strong and popular product is Atlassian's Confluence. Just like GitHub, Atlassian offers its tool for free for open source projects.

Inside the codebase

Some argue, however, that the technical documentation belongs within the codebase. After all, that's the place where the developers will be. If you choose to do that, the documentation is ideally kept in a non-binary format, so that changes made in multiple branches can be merged.

My personal preference is to keep the documentation in a Markdown or AsciiDoc format. These are simple syntaxes that can be converted to all kinds of documents: HTML, PDF, Word, ... As a plus, GitHub understands these syntaxes and renders them in a nice format.

Imagine having a short README.md file, which gives a short introduction to the purpose of your software and how to use it. This is the first page that is seen on your GitHub page, right underneath your code. From that first page, the user can click through to more detailed documentation pages.

The only thing that the technical writers/developers need to do, is to write content in a Markdown syntax. These Markdown files can be treated just like code: they are in the same place, can be checked in, merged, ...

Sounds ideal, right?

As a plus, generating a PDF/Word version of your documentation can be made a build step (Maven plugin for Markdown, Maven plugin for Asciidoc) of your project.

Then, the documentation will always be a part of the delivered software. That's the ideal place for an installation guide, is it not?

Have a clear installation guide

I have no idea how to set up this thing. Why doesn’t it just run!?

Installing the software is the very first step for everyone who is going to contribute to this software. If that’s already hard, who is going to take the effort to start coding?

Besides describing the necessary installation steps and the software to be installed, you could go the extra mile to make the installation super easy: have your project Dockerized, or provide a Vagrant VM for it. These tools allow you to script the necessary project set-up. That script can be run with one simple command, in any (supported) environment.

Have as few dependencies as possible

Integrations with external systems

"Cannot deploy: web service not reachable." We don’t have access to that web service, let alone use it. I cannot use this software!?

Let's say you want to integrate the project with an (internal) system. What is the impact for users that don't have access to, or don't want to use, that system?

An architectural solution could be to support "modules", or "plugins". The core of the project can be used by anyone. If you want integrations with other systems, you can install the necessary plugin.
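As a rough illustration of that idea, here is a minimal Python sketch of a core that works on its own and only calls integrations that have been registered as plugins. The names `Core`, `register`, `handle` and `internal_registry_plugin` are hypothetical and are not taken from UnionVMS or any other real project.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Plugin = Callable[[Record], Record]


class Core:
    """Hypothetical core: usable by anyone, with optional plugins for integrations."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, name: str, plugin: Plugin) -> None:
        # An installation that wants an integration registers its plugin here.
        self._plugins[name] = plugin

    def installed_plugins(self) -> List[str]:
        return sorted(self._plugins)

    def handle(self, record: Record) -> Record:
        # The core logic never depends on an external system being reachable.
        result = dict(record, handled_by="core")
        # Plugins, if any, get a chance to enrich the result afterwards.
        for name in self.installed_plugins():
            result = self._plugins[name](result)
        return result


def internal_registry_plugin(record: Record) -> Record:
    # Stand-in for a call to an internal system that only some users can reach.
    return dict(record, registry_status="checked")
```

A user without access to the internal system simply never registers the plugin, and `Core().handle(...)` still works.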

Choose your technologies wisely

What is that script doing here… selenium.py? Do I need Python to run this? Hmm then I need to install some software, figure out how to run it. Meh… these tests will not be broken after my changes, will they? Someone will tell me if they are.

Be picky about the technologies you use. A developer needs to have all of them installed. This also applies to build and test tools. Don't use one for module A and another for module B.

Not everyone has a C: drive

"Cannot deploy: path C:/temp is invalid." Yeah, duuuh, I’m not running on Windows. Why would anyone even hardcode that, stupid Windows users. Now I need to adapt this code to get it simply running, ugh. (Fancy Mac-fanboy and developer)

Don’t add Windows-specific stuff, like hardcoded references to C:. Yes, a lot of developers use Windows as their operating system, especially in a professional context. But there are UNIX (OS X, Linux) developers out there too. Don’t shut them out!

Disclaimer: written by a fervent Mac-user :)
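To make the point about hardcoded paths concrete, here is a small Python sketch that asks the platform for its temporary directory instead of assuming C:/temp. The function name `work_dir` and the default folder name `myapp` are made up for this example.

```python
import tempfile
from pathlib import Path


def work_dir(app_name: str = "myapp") -> Path:
    """Return a scratch directory below the platform's own temp location."""
    # tempfile.gettempdir() resolves to a suitable directory on Windows,
    # macOS and Linux, so no drive letter needs to be hardcoded.
    path = Path(tempfile.gettempdir()) / app_name
    path.mkdir(parents=True, exist_ok=True)
    return path
```

The same idea applies in other languages: most standard libraries expose the temp directory and a path-joining helper, so the drive letter never has to appear in the code.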

Goal 2: keep an eye on quality

Invest in deep unit test and end-to-end test coverage

Aaaaand… the other team has broken our code once again. Congratulations, you egoistic piece of ****

Since developers don’t know the entire codebase or all functionality, it’s easy to break something. And since the effort is voluntary, bugs may not get fixed if contributors have no time, or if fixing them is not in their interest.

Therefore, it pays off to build a good safety net. This net consists of unit, integration, and end-to-end tests, which any developer can run effortlessly, before and after modifying the code.

Run tests (automatically) before and after merging code

There are failing tests! Is this code correct? Is my environment set up properly? Is it ok to change this code?

Writing tests is great, but know that tests have zero value if nobody runs them.

At least before and after changing code, tests should run. I would go even further: make it impossible to merge code without running all the tests successfully!

In order to do that, you first need a continuous integration server, like Jenkins or Travis. This is the machine that will build the code and run the tests.

The positive effects of a great friendship

The next step is to introduce your version control system to the continuous integration server.

GitHub, meet Jenkins. Jenkins, this is GitHub.

Imagine that your source control and your build server become great friends: they call each other up to go to a movie, build new code, run some tests. Awesome!

Whenever code is being merged to one of the main branches (dev, master), the continuous integration server gets a call from his friend source control to build the new code and run the tests!

What if source control wants to buy a shirt that really doesn't suit him? Or wants to merge code that's broken? His friend, the build server, warns him not to do it! What a team!

Not convinced of the need for a continuous integration server? Please have a look at my blog post "Dear manager, we need a continuous integration server".

Release carefully and properly

From which codebase should I start? Is this code stable?

What I mean by this is:

having clear version numbers, and making a distinction between finished and unfinished functionality. Users need to feel safe depending on a release.
making it simple to go to the codebase of a certain version.
keeping the latest stable release on the master branch.
keeping the code of the next release, which you are working on, on the dev branch.
having clear tags, to easily jump from the codebase of one version to another.
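As a sketch of what "clear version numbers" can buy you, the Python snippet below picks the latest stable release out of a list of tags. It assumes, purely for illustration, that stable tags look like `v1.2.3` (or `1.2.3`) and that anything with a suffix such as `-SNAPSHOT` is unfinished work; the tag names are invented, and your project may agree on a different scheme.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed convention: a stable tag is exactly MAJOR.MINOR.PATCH, optionally prefixed with "v".
_STABLE_TAG = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def parse_stable(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) for a stable tag, or None for anything else."""
    match = _STABLE_TAG.fullmatch(tag)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)), int(match.group(3)))


def latest_stable(tags: Iterable[str]) -> Optional[str]:
    """Return the tag with the highest stable version, or None if there is none."""
    stable = [(parse_stable(tag), tag) for tag in tags if parse_stable(tag) is not None]
    if not stable:
        return None
    # Compare numerically, so that 1.10.0 is newer than 1.9.3.
    return max(stable, key=lambda pair: pair[0])[1]
```

With a strict naming scheme like this, both users and tooling can tell at a glance which codebase is safe to depend on.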

Agree on, and document, the expectations

The other team is really lazy. They don’t document anything! They don’t write enough tests!

What coding style needs to be respected? What documentation should I write? Which are the minimal requirements regarding metrics (test coverage etc.)?

An example:

the code should compile and deploy all the time
all tests should pass at all times*
all business logic should be thoroughly unit tested
unit test coverage should be at least 60%
for all main features, end-to-end tests should exist for all sunny-day and rainy-day cases
at least one code review should happen before accepting a pull request
when Sonar reports critical issues, these need to be fixed
all methods of interfaces should be explained with comments
for every module, an installation guide and a list of dependencies should be provided in a README.md file
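Some of these expectations can be checked by a machine instead of by an annoyed colleague. The Python sketch below shows one possible shape of such a check. The `BuildReport` fields are hypothetical: in practice the numbers would come from your test runner, coverage tool, Sonar and pull request system, and the code that collects them is not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BuildReport:
    """Hypothetical summary of one build; real values would come from your tooling."""
    failed_tests: int
    unit_coverage_percent: float
    critical_issues: int
    approved_reviews: int


def check_expectations(report: BuildReport, min_coverage: float = 60.0) -> List[str]:
    """Return a list of violated expectations; an empty list means the change may be merged."""
    problems: List[str] = []
    if report.failed_tests > 0:
        problems.append(f"{report.failed_tests} test(s) failing")
    if report.unit_coverage_percent < min_coverage:
        problems.append(
            f"unit test coverage {report.unit_coverage_percent:.1f}% "
            f"is below {min_coverage:.1f}%"
        )
    if report.critical_issues > 0:
        problems.append(f"{report.critical_issues} critical issue(s) reported")
    if report.approved_reviews < 1:
        problems.append("no code review has approved this change")
    return problems
```

Returning readable messages rather than a bare pass/fail tells a contributor exactly which agreement was not met.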

A nice way to brag about the quality of your project is with badges. With shields.io, you can create badges for all kinds of tools, from the build status in Travis to the number of stars you get on the Google Play Store. This tutorial from Egghead.io is a great start.

Goal 3: open communication

The most important tool for collaboration is understanding each other. How many bad decisions could have been avoided if people just took the time to sit down and listen to each other's needs?

Keep an overview of what has been implemented, what’s being implemented and what we would like to have implemented. A bug list.

Can we trust this software? Is it stable? Does it meet our needs? (Project manager)

Who is working on what? What does the future hold? Have any known bugs been reported? These are all important questions for someone who wants to use your software.

Don't leave your users in the dark. A good open source project has some kind of overview of the planned features and known bugs.

If your code is on GitHub, the first place where people will log bugs and change requests is the Issues page.

When your project is being built and maintained by a full-time team, you will probably want to manage work in iterations. In that case, having a public Jira can help. Bugs and change requests can be reported on the backlog, and your team picks them up when the time is right, at the start of a sprint.

If your project is on GitHub and you choose to use another system to track work items, don't forget to disable the GitHub Issues page, to avoid having bug reports and change requests spread around.

Create an overview of deadlines. Even better, make and keep commitments!

What is that other team doing! We shouldn’t focus on that! This feature needs to be implemented earlier, we won’t meet our deadline otherwise! (Project manager)

Can we trust the other party to deliver in time? (Project manager)

Collaboration means: being dependent on each other. In order to do that, trust is needed.

Lead by example: voice your commitments loud and clear, and keep them! Your collaborators will always welcome a regular update on how things are going.

Isn’t anyone else developing the same feature?

We will use someone else’s module, but how should we use it? What’s the contract?

This gives a clear indication of where you are going, what to expect in the next release.

To be able to do this, you need a clear distinction between stable and fresh-out-of-the-oven, possibly unstable, code. In other words: agree on a good branching model, like this one.

Support!

I have no clue what this thing does. Can we reuse anything? Does someone else also have these problems with this dependency?

Can you dedicate a part of your time to providing support for the code that you have written?

Be clear about what kind of support, and how much, people can expect of you. How can they reach the team members?

If you can, be available on a chat service, like Slack or Gitter. That's real service!

Formalize agreements

What was that stupid-$*#@ developer even thinking. LEARN TO PROGRAM PROPERLY!!!!

The best way to tackle frustration is to support understanding. Why have things been built this way? What was discussed and decided on?

Best practices in open source projects