
Leandro Latorre España

A citizen from a pale blue dot

Leandro Latorre

9-Minute Read

Filing Cabinets

Introduction

When creating a new project, or migrating an existing one as part of an architectural roadmap change, a question that often causes doubts and confusion among teams is whether to use a monorepo or a multirepo. The decision is often driven by a belief that one option is more modern than the other, or by an assumed link, which does not actually exist, between the project's architecture and the way its code is organized. It is also shaped by the team's previous experience with repositories in continuous integration pipelines.

Hey, didn’t you know that Facebook uses a monorepo?
What? How can they manage that much code in a single repository? Don’t they know how problematic and outdated that approach is? They must be the only ones!
Google does it too, as do countless other companies that handle huge amounts of code.
Well, we’ll use a multirepo: we want to move to SaaS and we have multiple teams.
Those alone are not sufficient reasons to make that decision.

With the adoption of SOLID and DRY principles, improvements in continuous integration tools and Git hosting, and methodologies for working in large, distributed teams, the number of projects that have moved to multiple repositories has grown rapidly. This approach involves granting people permissions on each repository according to the layer, technology or area of responsibility it covers. The rise of microservices and SaaS has also encouraged splitting code into separate repositories. But does using a monorepo imply having a monolith?

Monolith vs Components

An application with a monolithic structure is one in which there is no separation of components or layers. In contrast, in a structure based on components (such as microservices), there is a separation between components, services or layers of the application.

However, not everything is black or white. We can have applications that are not a single monolith, or that are not completely divided between layers and components, but something in between. For example, a monolith of the frontend layer and a microservices application in the backend layer, or a frontend application with web components and a monolithic backend application.

Organizing Your Project

Repositories are the way we organize our code. Depending on the specific situation of our application, we can decide whether it leans towards a monolith or towards separate components, and host our code in one of these combinations:

monorepo -> monolith
monorepo -> components
multirepo -> components

Some examples of how to divide our application could be:

[Figure: example ways of splitting an application into repositories]

Let’s look at the advantages and disadvantages of using multi-repositories, and then the implications of using monorepositories.

Advantages of multirepo

Independent cloning

There is no need to download the entire project to work, only the parts needed for each task. This keeps version control operations on a large project simpler and lighter.

Permission management and atomization of responsibility

Permissions can be assigned by repository. Only the people involved in each repository will have the corresponding permissions to view or modify the code. In this way, you only have permissions to act on your area of responsibility.

Independent evolution in versioning

Each component can evolve independently, giving teams the freedom to work in a more isolated way and to make decisions about their own library without waiting for, or agreeing with, other teams.

Each repository can also be built by its own continuous integration pipeline, with its own development and testing lifecycle.

Code reuse

Independent libraries are easy to reuse, both by components of the same project and by other projects.

For example, a project could have a repository for an architecture component or a utility that is reused by several other projects, and whose code could even be moved to another repository.

Multirepo disadvantages

Loss of global vision

If your scope covers only some parts of the project, or a small part of it, you lose sight of its global goals and of the functional knowledge of the rest of the application. Your ability to prioritize tasks also suffers, since you only know the part you work on.

Difficulty getting started on the project

For someone new to the project, having to set up all the necessary pieces to work locally can be a pain compared to the option of having everything downloaded, visible, and functional through a single download.

Division of teams

The more divided the teams are, the more each one will pursue its own objectives, to the detriment of the project's global task prioritization.

Dependency hell

Because libraries are versioned and evolve independently, components must constantly keep up with new releases. This often leaves components stuck on old dependencies or, worse, leads to conflicts between dependency versions and the diamond problem. In general, maintaining a system with many versioned artifacts adds a lot of complexity across layers.

Cross-project changes

Teams need to constantly update their dependencies and keep track of the state of the other repositories, which adds complexity because they have no direct visibility into them. Most of the time this involves complicated manual coordination between teams and/or scripts that report new versions, which still does not solve the underlying problems. Since each team can advance at a different speed, there is a risk of ending up on outdated versions that should have been updated.
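The kind of reporting script mentioned above can be sketched as follows. This is a minimal illustration, assuming the versions each repository pins and the latest version each library has published have already been collected into dictionaries; the repository and library names (`billing-service`, `auth-lib`, `utils`, etc.) are hypothetical, and versions are assumed to be plain numeric `MAJOR.MINOR.PATCH` strings.

```python
# Hypothetical data gathered from each repository: the version of every
# shared library it pins, and the latest version each library has released.
pinned = {
    "billing-service": {"auth-lib": "2.1.0", "utils": "1.4.0"},
    "web-frontend": {"auth-lib": "1.9.3"},
}
latest = {"auth-lib": "2.1.0", "utils": "1.5.2"}


def parse(version: str) -> tuple:
    """Turn a numeric 'MAJOR.MINOR.PATCH' string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def outdated(pinned: dict, latest: dict) -> list:
    """List (repository, library, pinned version, latest version) for every stale pin."""
    report = []
    for repo, deps in sorted(pinned.items()):
        for lib, version in sorted(deps.items()):
            newest = latest[lib]
            # Compare numerically so that 1.10.0 counts as newer than 1.9.3.
            if parse(version) < parse(newest):
                report.append((repo, lib, version, newest))
    return report


for repo, lib, old, new in outdated(pinned, latest):
    print(f"{repo}: {lib} {old} -> {new} available")
```

Such a script only reports the drift; someone in each team still has to update every repository, which is exactly the coordination cost described above.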

Monorepo advantages

From the disadvantages of using multirepo, the advantages of using monorepo can be inferred.

Unified versioning and single source of truth

A centralized organization makes it easier for teams to get started and set up the project, and gives them a broader functional understanding of it. A single repository provides unified version control and a single source of truth. There is no confusion about which repository hosts the authorized version of a file. If one team wants to depend on another team’s code, they can depend on it directly.

Code reuse

Having a considerable number of useful libraries in one place leads to more knowledge sharing as well as more code reuse.

Simplified dependency management

Dependency management can be simplified by having artifacts or projects share a common version.

Updates that span several parts of the project are also easier: you immediately see what needs to change in your own library as other teams make progress, and there are tools that can trigger a rebuild of the code that depends on a change. The diamond dependency problem is avoided as well: with a single source of truth, dependencies are no longer versioned independently, so the problem disappears.
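As a rough idea of how such tools decide what to rebuild, here is a minimal sketch, assuming the monorepo's projects and their direct dependencies are known as a simple dictionary (the project names are hypothetical). It inverts the dependency graph and walks it breadth-first to collect every project that transitively depends on the changed one. Dedicated monorepo build tools do this with much richer metadata; this only shows the core idea.

```python
from collections import deque

# Hypothetical monorepo: each project lists the projects it depends on directly.
DEPENDS_ON = {
    "app-web": ["ui-kit", "api-client"],
    "app-mobile": ["api-client"],
    "api-client": ["core"],
    "ui-kit": ["core"],
    "core": [],
    "docs": [],
}


def dependents_to_rebuild(changed: str, depends_on: dict) -> list:
    """Return `changed` plus every project that transitively depends on it."""
    # Invert the graph: for each project, record who depends on it directly.
    reverse = {name: [] for name in depends_on}
    for project, deps in depends_on.items():
        for dep in deps:
            reverse[dep].append(project)
    # Breadth-first walk over the reversed edges.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in reverse[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


print(dependents_to_rebuild("core", DEPENDS_ON))
```

A change to `core` selects everything except `docs`, while a change to `docs` selects only `docs` itself, so the pipeline never rebuilds the whole tree unnecessarily.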

[Figure: the diamond dependency problem]

The diamond problem occurs when A depends on B and C, and both B and C depend on D, but B requires version D.1 and C requires version D.2. In most cases, A can then no longer be built. For the base library D, releasing a new version without causing breakages can be very difficult, as all components that use it must be updated at the same time. This is especially hard when the components that call the library are hosted in different repositories.
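The scenario just described can be reproduced in a few lines. This minimal sketch models the graph with a hypothetical dictionary in which each component states the exact version of each dependency it needs (B needs D version 1, C needs D version 2), and assumes that only one version of a library can end up in A's build. It reports every library that is requested in more than one version.

```python
from collections import defaultdict

# Hypothetical graph: component -> {dependency: exact version it requires}.
REQUIRES = {
    "A": {"B": "1", "C": "1"},
    "B": {"D": "1"},  # B needs D.1
    "C": {"D": "2"},  # C needs D.2
    "D": {},
}


def version_conflicts(root: str, requires: dict) -> dict:
    """Return {library: [versions]} for every library requested in several versions under `root`."""
    requested = defaultdict(set)
    stack, visited = [root], set()
    while stack:
        component = stack.pop()
        if component in visited:
            continue
        visited.add(component)
        for dep, version in requires[component].items():
            requested[dep].add(version)
            stack.append(dep)
    # A library requested in more than one version is a diamond conflict.
    return {lib: sorted(vs) for lib, vs in requested.items() if len(vs) > 1}


print(version_conflicts("A", REQUIRES))
```

In a monorepo this situation cannot arise in the same way, because there is only one D in the tree and both B and C are built against it.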

Tip! Dependency mediation in Maven: when Maven encounters multiple versions of the same dependency, it uses the version closest to your project in the dependency tree. You can guarantee a version by declaring it explicitly in your project’s POM. If two versions of a transitive dependency are at the same depth in the tree, Maven uses the one declared first.
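The mediation rule can be illustrated with a simplified model. The sketch below is not Maven's actual implementation; it assumes a hypothetical tree of `(artifact, version)` pairs whose children are listed in declaration order, walks it breadth-first so that nearer dependencies are seen first, and keeps the first version found for each artifact (the dependencies of a losing version are ignored).

```python
from collections import deque

# Hypothetical dependency tree: (artifact, version) -> direct dependencies, in declaration order.
TREE = {
    ("my-app", "1.0"): [("lib-x", "1.0"), ("lib-y", "1.0")],
    ("lib-x", "1.0"): [("commons", "2.0")],
    ("lib-y", "1.0"): [("commons", "3.0"), ("json", "1.1")],
    ("commons", "2.0"): [],
    ("commons", "3.0"): [],
    ("json", "1.1"): [],
}

# Same tree, but my-app now declares commons 3.0 explicitly (depth 1).
PINNED = dict(TREE)
PINNED[("my-app", "1.0")] = TREE[("my-app", "1.0")] + [("commons", "3.0")]


def mediate(root: tuple, tree: dict) -> dict:
    """Pick one version per artifact: nearest to the root wins, ties go to the first declared."""
    chosen = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for artifact, version in tree[node]:
            if artifact in chosen:
                continue  # a nearer or earlier-declared version already won
            chosen[artifact] = version
            queue.append((artifact, version))
    return chosen


print(mediate(("my-app", "1.0"), TREE))
print(mediate(("my-app", "1.0"), PINNED))
```

In `TREE`, both versions of `commons` sit at depth 2, so the one reached through `lib-x` (declared first) wins. In `PINNED`, the explicit declaration puts `commons` 3.0 at depth 1, which mirrors declaring a version directly in your own POM.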

With a monorepo, changes to a dependency immediately propagate through the tree to every product that uses it, which makes it much easier for the person in charge to update all the affected code and avoids technical debt that would otherwise surface later.

Atomic Changes

The ability to make atomic changes is another very powerful feature of the monolithic model. In a large-scale refactoring, a developer can make a significant change touching hundreds or thousands of files in the repository in a single operation. For example, a developer can rename a class or function in a single commit without breaking any build or test.
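As an illustration of such a repository-wide rename, here is a minimal sketch, assuming a monorepo of Java sources; the function `rename_identifier` and the class names are hypothetical. It rewrites whole-word occurrences in every matching file under the repository root, so that the result can then be committed as a single atomic commit. Real refactorings are better done with IDE or compiler-aware tools, since a plain regular expression cannot tell a class name apart from the same word inside a string literal or comment.

```python
import re
import tempfile
from pathlib import Path


def rename_identifier(root: Path, old: str, new: str, suffix: str = ".java") -> list:
    """Replace whole-word occurrences of `old` with `new` in every `suffix` file under `root`."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    changed = []
    for path in sorted(root.rglob(f"*{suffix}")):
        text = path.read_text(encoding="utf-8")
        updated = pattern.sub(lambda _match: new, text)
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed


# Usage on a throwaway directory standing in for two teams' folders in one repository.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "billing").mkdir()
    (root / "web").mkdir()
    (root / "billing" / "Invoice.java").write_text("class InvoiceCalc {}\n", encoding="utf-8")
    (root / "web" / "Page.java").write_text("InvoiceCalc calc = new InvoiceCalc();\n", encoding="utf-8")
    # A different identifier that merely starts with the old name must stay untouched.
    (root / "web" / "Helper.java").write_text("class InvoiceCalcHelper {}\n", encoding="utf-8")
    changed = rename_identifier(root, "InvoiceCalc", "InvoiceCalculator")
    changed_paths = [p.relative_to(root).as_posix() for p in changed]
    web_source = (root / "web" / "Page.java").read_text(encoding="utf-8")
    helper_source = (root / "web" / "Helper.java").read_text(encoding="utf-8")

print(changed_paths)
```

Because both teams' code lives in the same tree, the definition and all of its callers change together, and nothing is left referring to the old name.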

For larger changes, such as upgrading a framework version, a person or team can also carry out the refactoring more effectively: with the complete tree in the repository, the code that breaks and needs changing is easy to locate.

Code Visibility and Collaboration Between Teams

Another attribute of a monolithic repository is that the design of the code base is better understood, as it is organized in a single tree. People involved in each layer or component of the code have an overview of the project, know where they are, and can more comfortably collaborate, change the boundaries of their scope of access to layers/components, or move between different teams.

Monorepo Disadvantages

Tooling

A monorepo requires tooling to manage it. The best-known tools can run the continuous integration pipeline, but they need special configuration, since you don’t want to build and deploy every part of the project on each change. Loading a large project into your IDE can also be heavy, although there are specific tools to handle this.
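Much of that special configuration comes down to working out which parts of the project a change touches. A minimal sketch, assuming the list of changed file paths comes from version control (for example, the output of `git diff --name-only` for a pull request) and that each project lives under its own root directory; the project names and layout are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical layout: project name -> root directory inside the monorepo.
PROJECTS = {
    "core": "libs/core",
    "ui-kit": "libs/ui-kit",
    "app-web": "apps/web",
    "app-mobile": "apps/mobile",
}


def affected_projects(changed_files: list, projects: dict) -> list:
    """Map each changed file to the project whose root directory contains it."""
    affected = set()
    for file in changed_files:
        path = PurePosixPath(file)
        for name, root in projects.items():
            if PurePosixPath(root) in path.parents:
                affected.add(name)
    # Files outside every project root (e.g. a top-level README) select nothing.
    return sorted(affected)


print(affected_projects(["libs/core/src/Money.java", "apps/web/index.ts", "README.md"], PROJECTS))
```

In practice the pipeline would then build, test and deploy only these projects, plus the projects that depend on them, instead of the whole repository.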

Code Maintenance

While it is easier to detect required updates and address technical debt, keeping the entire project up to date demands a greater and more constant effort, even if it pays off later. Long-lived branches also require more effort to merge back into the main branch, given the many changes it may have undergone in the meantime.

Conclusion

When choosing which type of repository to use, the decision should be agreed upon by the project’s architecture team. Large companies such as Google, Facebook, Microsoft, Uber, Airbnb and Twitter use monorepos, but they have adopted the necessary tooling and trained their teams for it.

In Google’s own words:

“The monolithic model of source code management is not for everyone. It is better suited for organizations like Google, with an open and collaborative culture. It would not work well for organizations where a large portion of the codebase is private or hidden among groups. At Google, we have found that, with some investment, the monolithic model of source management can successfully scale to a codebase with over a billion files, 35 million commits, and thousands of users worldwide.”
