Building Monorepos

Feb 25, 2019 07:07 · 732 words · 4 minute read Services Tools Practices

For my day job, I work on a brand-new monorepo-thing, where we have decided to shove all the different stuffs. This brings particular challenges to the surface, one of the first ones being “how do you build only the things you need to build?”. I foolishly decided to go the way of the Giant Shell Script of Doom for reasons I do not fathom to this day. Well. Okay, that’s a bit harsh; the approach comes with a bunch of advantages. First up though, strategy and description of said approach.

Directories all the way down

So let’s say you have a hypothetical monorepo project that has a structure that looks like this:

root/
     services/
              service-a/
              service-b/
              service-c/
     clients/
             cli/
             gui/

You have a CI thing that is going to build that directory structure. Thing is, you only want to build the portions that were changed (however unwise that might sound (believe me, it should sound unwise, I’ll cover that later)).

Git allows you to check that fairly easily. Running git diff --name-only $(git merge-base origin/$(git rev-parse --abbrev-ref HEAD) origin/master) should give you a list of the files changed between your branch and its most recent common ancestor with master. This is what you want if you’re in CI; if you diff against master directly, you run the risk of having files that were merged there after you branched show up in your list.
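To make that a bit more concrete, here is a minimal sketch that wraps the command above in a function and collapses the result to project directories. The function names are made up for the example, and the two-levels-deep assumption only holds for a layout like the hypothetical one above.

```shell
#!/usr/bin/env bash

# Print the files changed on the current branch since it diverged from master.
# Assumes a remote named "origin" and a checked-out named branch: on a detached
# HEAD, `git rev-parse --abbrev-ref HEAD` prints "HEAD", so origin/HEAD would
# no longer mean your branch.
changed_files() {
  local branch base
  branch="$(git rev-parse --abbrev-ref HEAD)"
  base="$(git merge-base "origin/${branch}" origin/master)" || return 1
  git diff --name-only "${base}"
}

# Collapse changed paths (stdin, one per line) to unique project directories.
# Assumes projects live exactly two levels deep (services/service-a, clients/cli).
# Paths with fewer components, such as a file at the repo root, pass through as-is.
changed_projects() {
  cut -d/ -f1-2 | sort -u
}
```

Piping one into the other (changed_files | changed_projects) gives one line per touched project, which is usually easier to reason about than a raw file list.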

Build files

That is all well and good, but how do you build specifically the subprojects? There are many possible answers to that; what I picked for the sake of consistency was “more shell scripts”, because I wanted maximum flexibility. There’s a fair chance that “standardized makefiles” is a better choice. You can always leave it to the implementer to extract their stuff from the Makefile into a separate shell script if they want to. ANYWAY.

I ended up with predetermined shell scripts named “build.sh” and “deploy.sh” in all the places where I wanted those things to happen:

root/
     services/
              service-a/
                        build.sh
                        deploy.sh
              service-b/
                        build.sh
                        deploy.sh
              service-c/
                        build.sh
                        deploy.sh
     clients/
             cli/
                 build.sh
                 deploy.sh
             gui/
                 build.sh
                 deploy.sh

The “master builder script” is configured through an env var to look for a given file (by default build.sh), and it filters the files it found against the list of changes we generated earlier. It then just runs the build files it found.
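For illustration, here is roughly what such a master script could look like. This is a sketch under assumptions, not the actual script: the BUILD_FILE variable name, the function names, and the idea of passing the change list as a file are all invented for the example, and it assumes it is run from the repo root so that paths line up with what git diff --name-only prints.

```shell
#!/usr/bin/env bash

# Name of the script to look for in each project; hypothetical variable name.
BUILD_FILE="${BUILD_FILE:-build.sh}"

# Print every directory that contains $BUILD_FILE and has at least one
# changed file beneath it. $1 is a file listing changed paths, one per line.
affected_dirs() {
  local changes="$1" script dir changed
  find . -type f -name "${BUILD_FILE}" | sort | while IFS= read -r script; do
    dir="$(dirname "${script}")"
    dir="${dir#./}"
    while IFS= read -r changed; do
      case "${changed}" in
        # Quoted, so the directory name is matched literally as a prefix.
        "${dir}"/*) printf '%s\n' "${dir}"; break ;;
      esac
    done < "${changes}"
  done
}

# Run $BUILD_FILE in each affected directory, stopping at the first failure.
run_affected() {
  local changes="$1" dir
  affected_dirs "${changes}" | while IFS= read -r dir; do
    echo "==> ${dir}/${BUILD_FILE}"
    # stdin comes from /dev/null so a script cannot swallow the directory list.
    (cd "${dir}" && bash "./${BUILD_FILE}" < /dev/null) || exit 1
  done
}
```

In CI this would be called with the change list written to a file first, for example run_affected changes.txt. Each script runs from inside its own directory, which is why the script has to live at the root of the thing it builds.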

The good thing about this is I can arbitrarily build things destined for essentially anything. Cloud functions, projects in different languages, an Android/iOS project, anything I can think up can potentially be built by the master script.

When I need to deploy, I reconfigure the build file I’m looking for to be the deploy.sh one. Fairly easy. It’s actually extremely flexible, too, on the condition that your script lives in the root of the directory of the stuff you’re trying to build.

Caveat Emptor

  • Shell scripts, even under 100 LoC, will scare people.

The implementors of things that live in your monorepo might be afraid of dealing with shell scripts. Most people I know are afraid of complex shell scripts. People will pick the path of least resistance, and if they’re afraid, the path of least resistance will not include “understanding the shell script”, however small it might be.

  • It’s a bad idea to not build everything every time.

One of the cool things about monorepos is that you have all the code in the same place. For services, this is cool, because the clients you maintain for your service probably live alongside it in your monorepo. This means you have the unbelievable luxury of building everything against the latest versions you just committed and detecting early if you have a breaking change. The cost of this, potentially, is more expensive builds, in terms of either computation power or time. At the end of the day, it’s all tradeoffs; if you could manage a generalized way of building only the stuff you changed plus the stuff that depends on it, that would probably be the best, but you probably need some declarative way to say “if this guy builds, also build me.” I do not have that right now, and when doing things like that, I start to wish for a more powerful programming language than Bash.

  • You probably don’t really need a monorepo.

Think about it hard; there might be ways around this that do not require you to pull a Frankenstein. Simplicity is probably preferable. I prefer simplicity.
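As a footnote to the second caveat, the “if this guy builds, also build me” idea can be approximated in Bash, though it gets clunky quickly. The sketch below assumes a purely hypothetical convention: a project may contain a file named DEPENDS that lists, one per line, the project directories (relative to the repo root) it depends on. Neither the file name nor the function exists in the setup described above.

```shell
#!/usr/bin/env bash

# Read directly affected project directories on stdin and print them together
# with every project that depends on them, directly or transitively.
# Must be run from the repo root. DEPENDS is a hypothetical file name.
with_dependents() {
  local current next deps proj
  current="$(sort -u)"
  # Nothing changed, nothing to expand.
  [ -n "${current}" ] || return 0
  while :; do
    next="$(
      {
        printf '%s\n' "${current}"
        find . -type f -name DEPENDS | while IFS= read -r deps; do
          proj="$(dirname "${deps}")"
          proj="${proj#./}"
          # If any listed dependency is already in the set, add this project.
          if grep -Fxq -f "${deps}" <<< "${current}"; then
            printf '%s\n' "${proj}"
          fi
        done
      } | sort -u
    )"
    # Stop once a full pass adds nothing new.
    [ "${next}" = "${current}" ] && break
    current="${next}"
  done
  printf '%s\n' "${current}"
}
```

It repeats full passes over every DEPENDS file until the set stops growing, which is fine for a handful of projects and exactly the kind of thing that makes a real build tool or a more powerful language look attractive.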