Fast and stable MongoDB-based tests in Spring

Piotr Kubowicz - December 7, 2020

If your Spring application uses MongoDB, there is one question you will have to answer at some point: how to set up a database instance for your tests. Until early 2020 there was not much choice and the default was to use ‘embedded Mongo’ – my team was no exception. At some point, however, we realised that embedded Mongo was turning our builds into a nightmare. This ultimately prompted us to migrate our code to a tool getting much hype recently – Testcontainers.

In this article I would like to share some of the problems connected with Flapdoodle Embedded Mongo and talk briefly about Spring tests to explain what causes those problems. I will also guide you through the setup process of MongoDB using Testcontainers, and offer some tricks to improve Testcontainers performance. Finally, I will talk about how sar and other Linux tools can be used to measure the load on your machine.

Introducing Flapdoodle Embedded Mongo

If you search the web for ‘test spring boot mongo’ or similar terms, you will very likely end up with instructions which may harm your codebase. Flapdoodle Embedded Mongo was immensely popular at some point and many tutorials were written about using it. Those articles still rank surprisingly high in search engines – things don’t just disappear from the Internet when they cease to be useful. In fact, Embedded Mongo is still recommended by the recent Spring Boot documentation.

At first glance it looks fantastic: you don’t need to modify your production code, you don’t even need to configure anything, just add a dependency:

testRuntimeOnly('de.flapdoodle.embed:de.flapdoodle.embed.mongo:2.2.0')

…And voilà! Your tests connect to a database that is started just for the duration of the test run.

Under the hood, Spring Boot has logic detecting the presence of this library on the classpath, and modifies MongoDB connections to point at an instance started by the library. The library itself handles downloading and caching MongoDB binaries suitable for your machine, and starting/stopping the server.

Our problems with Embedded Mongo

We had been following the setup procedure described in the Spring documentation for some time. At some point, however, developers started complaining that when they executed unit tests, their computer fans started roaring and the machine became painfully slow to use. It wasn’t just a matter of perception: we found out that during tests multiple MongoDB instances were running, reaching a peak of 8 instances for a standard build or even 15 when gradle --parallel was used. Very often, after tests finished successfully, we found that embedded MongoDB processes were still running and consuming lots of CPU. It’s not fun to work when 75% of your CPU is consumed by ‘zombie’ MongoDBs.

We started to experience random CI build failures: a pipeline that usually took around 15 minutes timed out after 40 minutes, and it looked as if the unit tests never finished.

Finally, we found ourselves at a dead end. Adding a dependency on the AWS S3 client library broke our tests. They never finished, and not just on the CI server – they never finished on any machine. There was some kind of classpath conflict, preventing us from implementing a business feature. Everything was broken by adding a single dependency:

implementation('software.amazon.awssdk:netty-nio-client:2.6.5')

and one Spring bean:

@Bean
fun asyncHttpClient() = NettyNioAsyncHttpClient.builder().build()

What is wrong with Embedded Mongo

At the beginning of 2020 it became clear that the library was dead. You could not configure it to run a version of MongoDB newer than 4.0.2. Hardly anything happened with the library source code. Only recently, in October 2020, something started to change, but the newly released library version conflicts with all stable releases of Spring Boot, so you are still not able to test your code with MongoDB 4.2.

Another serious issue is that the library can make your tests extremely resource-hungry by starting a horde of MongoDB instances. And it’s a consequence of library design, not a mistake in the implementation.

The library integration with Spring Boot works by starting an embedded MongoDB instance whenever a Spring context is started. Because it would take ages to execute tests with a context started for each test, Spring offers a caching mechanism. A started context is kept between tests and re-used. However, a cached context cannot simply be used in all situations – when a test expects a context with a different configuration, a fresh context has to be started. Now, the problem is that there are lots of situations where Spring considers that a fresh context is needed, and these include things you do very often when writing tests:

  • you use a @MockBean
  • you enable a profile with @ActiveProfiles
  • you override some properties using @TestPropertySource

You can find a more comprehensive list in the Spring Framework documentation.
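To make the list above concrete, here is a minimal sketch of four test classes that each end up with their own cached context, assuming Spring Boot 2.x with spring-boot-starter-test on the classpath. The class names, the PriceService interface, the profile name and the property are all hypothetical and exist only for illustration; test methods are omitted.

```kotlin
import org.springframework.boot.test.context.SpringBootTest
import org.springframework.boot.test.mock.mockito.MockBean
import org.springframework.test.context.ActiveProfiles
import org.springframework.test.context.TestPropertySource

// Hypothetical collaborator, present only so that there is something to mock.
interface PriceService

// Context A: plain configuration.
@SpringBootTest
class PlainTest

// Context B: the mocked bean becomes part of the context cache key.
@SpringBootTest
class MockingTest {
    @MockBean
    private lateinit var priceService: PriceService
}

// Context C: a different set of active profiles.
@SpringBootTest
@ActiveProfiles("integration")
class ProfileTest

// Context D: different inlined test properties.
@SpringBootTest
@TestPropertySource(properties = ["feature.enabled=true"])
class PropertyTest
```

With Embedded Mongo auto-configured, running these four classes in one test worker would be expected to start four contexts and therefore four MongoDB instances.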

All in all, there will be many opportunities in your tests to start a fresh context, each holding its own MongoDB instance. What about ‘old’ contexts? They aren’t closed, Spring keeps them in case there is a test that will need them. If you don’t touch the spring.test.context.cache.maxSize property, Spring will keep up to 32 contexts. This means that potentially you can have 32 instances of MongoDB running simultaneously when you use Flapdoodle Embedded Mongo. Moreover, each test worker is an independent JVM, so if you have 2 subprojects and run gradle --parallel you can have up to 64 MongoDB instances, or 96 instances for 3 subprojects and so on.
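The worst-case numbers above are plain multiplication. The sketch below spells out the assumption behind them: every cached context in every test worker JVM holds its own MongoDB instance, with one worker per subproject. The function and constant names are hypothetical.

```kotlin
// Default upper bound of Spring's test context cache (spring.test.context.cache.maxSize).
const val DEFAULT_CONTEXT_CACHE_SIZE = 32

/**
 * Worst case for Embedded Mongo: each cached context in each test worker JVM
 * keeps its own MongoDB instance alive.
 */
fun maxEmbeddedMongoInstances(
    parallelTestWorkers: Int,
    cacheSize: Int = DEFAULT_CONTEXT_CACHE_SIZE
): Int {
    require(parallelTestWorkers > 0) { "at least one test worker is needed" }
    require(cacheSize > 0) { "cache size must be positive" }
    return parallelTestWorkers * cacheSize
}

fun main() {
    // One worker per subproject when running gradle --parallel.
    for (workers in 1..3) {
        println("$workers worker JVM(s): up to ${maxEmbeddedMongoInstances(workers)} MongoDB instances")
    }
}
```

Lowering spring.test.context.cache.maxSize reduces this bound, at the cost of contexts being evicted and restarted more often.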

MongoDB with Testcontainers

Testcontainers is a Java library that lets you start various services for use in tests. It uses Docker to fetch, start and stop those services. In our experience, this approach is way more stable than the one from Flapdoodle Embedded Mongo – no more zombie processes eating CPU.

The library is a generic tool rather than a plug-in that does everything for you under the hood, as is the case with the Flapdoodle one. You will need a bit of code to connect Spring tests and Testcontainers. Here is how we approached this:

We have a context initializer that starts a Docker container with MongoDB when the context starts and replaces the connection URL with one pointing at the container. To avoid starting the database for each test, we cache the container in a lazy Kotlin property (following advice from Best Practices for Unit Testing in Kotlin by Philipp Hauer).

We also defined an annotation, which we use instead of @SpringBootTest in each test class.
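A minimal sketch of how these pieces might fit together, assuming Spring Boot 2.x, JUnit 5 and the Testcontainers mongodb module on the test classpath. Only the name MongoContainerSingleton comes from this article; MongoInitializer, MongoSpringBootTest, PersonRepositoryTest, the mongo:4.2 image tag and the people collection are assumptions made for illustration.

```kotlin
import org.bson.Document
import org.junit.jupiter.api.Assertions.assertTrue
import org.junit.jupiter.api.Test
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.boot.test.context.SpringBootTest
import org.springframework.boot.test.util.TestPropertyValues
import org.springframework.context.ApplicationContextInitializer
import org.springframework.context.ConfigurableApplicationContext
import org.springframework.data.mongodb.core.MongoTemplate
import org.springframework.test.context.ContextConfiguration
import org.testcontainers.containers.MongoDBContainer
import org.testcontainers.utility.DockerImageName

// One container per JVM: started lazily on first access, then shared by all contexts.
object MongoContainerSingleton {
    val instance: MongoDBContainer by lazy {
        MongoDBContainer(DockerImageName.parse("mongo:4.2")) // image tag is an assumption
            .withReuse(true) // has an effect only if reuse is also enabled on the machine
            .apply { start() }
    }
}

// Points Spring Data MongoDB at the container while the context is being prepared.
class MongoInitializer : ApplicationContextInitializer<ConfigurableApplicationContext> {
    override fun initialize(context: ConfigurableApplicationContext) {
        TestPropertyValues
            .of("spring.data.mongodb.uri=${MongoContainerSingleton.instance.replicaSetUrl}")
            .applyTo(context)
    }
}

// Composed annotation used in place of @SpringBootTest.
@Target(AnnotationTarget.CLASS)
@Retention(AnnotationRetention.RUNTIME)
@SpringBootTest
@ContextConfiguration(initializers = [MongoInitializer::class])
annotation class MongoSpringBootTest

// Hypothetical test class showing the annotation in use.
@MongoSpringBootTest
class PersonRepositoryTest {
    @Autowired
    private lateinit var mongoTemplate: MongoTemplate

    @Test
    fun `writes to the MongoDB running in the container`() {
        val people = mongoTemplate.getCollection("people")
        people.insertOne(Document("name", "Ada"))
        assertTrue(people.countDocuments() >= 1)
    }
}
```

Because the container lives in a singleton rather than in the Spring context, a fresh context only re-applies the connection property; it does not start another database.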

Testcontainers performance

Simply replacing Flapdoodle Embedded Mongo with Testcontainers makes tests execute much faster. In a small test project this makes a 30% difference.

Figure: Test execution time

We wanted to see if we could go even faster. One optimization opportunity appears in start time. By default, even if you cache the container between tests, the container is stopped once all tests finish. So if you execute just one test from your IDE, each time you do so, you have to wait for the container to start. In this scenario Testcontainers becomes slower than Flapdoodle Embedded Mongo by about 2 seconds. It’s painful if you run your unit tests very often, as you do when practising TDD.

The situation can be improved by enabling container reuse. This mechanism marks a container as eligible to be picked by subsequent test runs and does not stop it after a test run finishes. If you go back to the code of MongoContainerSingleton, you will see the reuse feature is enabled there. This is however not enough: you have to additionally set testcontainers.reuse.enable=true in ~/.testcontainers.properties. We encourage our developers to enable it, but keep this option disabled on CI servers.

As you can see in the graph above, it improves overall execution time a bit (results with “-r” have container reuse enabled). Startup time of a single test becomes roughly the same as with Flapdoodle Embedded Mongo. If you want to learn more about container reuse, check this article by Paweł Pluta.

One challenge when enabling container reuse is a parallel build with multiple subprojects. If a test from project1 is running and a test from project2 is started, it will connect to the same reuse-allowing MongoDB container that project1 uses. There is a danger that those two tests will simultaneously write to the same collections. We solve this by making sure each subproject gets a separate database inside the same container.

We are very happy with the improvement in build stability. After moving the tests to Testcontainers, developers no longer have zombie MongoDB instances on their machines. CI builds fail much less often (well, they still do at times, when Testcontainers fails to connect to Docker, breaking the build). Additionally, the CI build time is much more consistent now.
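One way to give each subproject its own database is to derive the database name from the subproject name and put it into the connection URI. The sketch below is only an illustration of that idea, not necessarily the mechanism we use; both function names are hypothetical, and the host and port would normally come from the running container.

```kotlin
/**
 * Derives a MongoDB database name from a subproject name (hypothetical helper).
 * Characters outside letters, digits, '_' and '-' are replaced with '_',
 * and the result is kept under MongoDB's 64-character limit.
 */
fun databaseNameFor(subproject: String): String {
    require(subproject.isNotBlank()) { "subproject name must not be blank" }
    return subproject.replace(Regex("[^A-Za-z0-9_-]"), "_").take(63)
}

/** Builds a connection URI pointing at the subproject's own database on a shared server. */
fun mongoUriFor(host: String, port: Int, subproject: String): String =
    "mongodb://$host:$port/${databaseNameFor(subproject)}"

fun main() {
    // Both subprojects share one container but never touch the same collections.
    println(mongoUriFor("localhost", 27017, "project1"))
    println(mongoUriFor("localhost", 27017, "project2"))
}
```

The resulting URI can be passed as spring.data.mongodb.uri. Recent Testcontainers versions also offer a getReplicaSetUrl overload that accepts a database name, which may make the second helper unnecessary if your version provides it.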

Measuring the operating system load

When deciding whether to move to Testcontainers or not, we wanted to understand the consequences of switching. Some of the team members used Linux and others used macOS. We knew that test execution time and the subjectively perceived machine load differed from person to person. There were multiple open questions:

  • Is the problem with too many Mongo instances repeatable?
  • What if Testcontainers is faster on Linux, where Docker is a ‘native’ mechanism, but slower on macOS?
  • What if Testcontainers makes tests execute faster but eats all CPU and memory, so developers aren’t able to do anything while executing tests?

We decided to create a test script, ensuring that performance is measured in the same way on different machines. Finding out how fast a build executes is easy: you can do it using the built-in time shell command.

However, it is not trivial to compare CPU and memory usage during test execution as there are multiple processes interacting with each other. Firstly, Gradle has a separate process executed from the command line, another one running tasks (Gradle daemon) and a separate test execution worker. Then there are processes for embedded Mongo. Testcontainers run not only MongoDB Docker containers, but also a separate container called ryuk, which is responsible for cleaning. We decided to measure the load of the whole operating system to have a big picture of the impact of the changes on the build process.

Linux has a great tool for this purpose called sar, available in the sysstat package. It can collect a wide range of system performance parameters in a reliable way, without adding much overhead. You can start it in the background:

sar -o sar.binary 2 15 &

Here it will measure performance every 2 seconds, 15 times, and save results to a binary file. Then, after measurement is finished, you can extract data on CPU usage:

% sar -f sar.binary -u
12:36:05        CPU     %user     %nice   %system   %iowait    %steal     %idle
12:36:07        all      0,19      0,00      0,25      0,00      0,00     99,56
12:36:09        all     70,38      0,00      3,28      0,00      0,00     26,34
12:36:11        all     85,47      0,00      4,09      0,57      0,00      9,87

or memory usage:

% sar -f sar.binary -r | awk '{print $1 "\t" $4}'
12:36:05        kbmemused
12:36:07        1587988
12:36:09        1990692

With built-in shell tools it’s easy to quickly summarize data, for example by counting min, max and average:

cat memory.dat | awk 'BEGIN { min=99999999 } { total += $2; count++; if($2<min) min=$2; if ($2>max) max=$2; } END { print "Max\t" max "\tMin\t" min "\tAverage\t" int(total/count) }'

However, it’s hard to understand the big picture looking at the numbers alone. Visualisation can be a great help. We wanted to make the whole benchmark process fully automated, without manual pasting of data to spreadsheets. A ‘good enough’ approach was to make use of the venerable gnuplot.

Figure: CPU usage plotted by gnuplot

If you have plain text data as columns separated by whitespace (as you can see in the awk call, we used tabs), it’s easy to feed the data into gnuplot.

cat cpu.dat | gnuplot -e "set yrange [0:100]; set terminal png size 800,600; set output 'cpu.png'" base.gnuplot

To avoid repeating the same gnuplot options when creating CPU graphs and memory graphs, we extracted the common part to a separate file (base.gnuplot):

set timefmt "%H:%M:%S"
set xdata time
set style data lines
plot "/dev/stdin" using 1:2 with lines notitle

Here we define the time format, declare that the x axis shows time, use a solid line for plotting and finally use columns 1 and 2 from the input as the x and y coordinates of each point.

Unfortunately, sar makes use of Linux kernel features and is not available on macOS. We were able to work around this by measuring memory usage with the vm_stat command, but we think the results are very inaccurate and only show general trends. We could not find a reliable way to measure global CPU usage on a Mac.

Summary

Flapdoodle Embedded Mongo is a very popular library for running MongoDB for tests. While still recommended by many tutorials, it is known to cause various performance and maintainability problems. It prevents testing your code against modern versions of MongoDB, and wastes computer resources by starting too many database instances and leaving zombie processes. In some circumstances, it can block your build indefinitely if you add dependencies that clash with it in some cryptic way.

Testcontainers is a newer tool with a similar purpose that is quickly gaining popularity. It is actively maintained and can be used not just for MongoDB but for a wide range of systems, like different SQL and NoSQL databases and even message queues. We recommend learning it as its broad usage means it may be useful not only in your current project, but also in future ones, even if they use a different persistence mechanism. It manages test databases reliably and does not leave zombie processes.

You can check my repository comparing the performance of Flapdoodle Embedded Mongo and Testcontainers. The two projects written in Kotlin along with shell scripts automatically measuring performance are available on GitHub: https://github.com/pkubowicz/embed-vs-testcontainers.

In general, we found that Testcontainers is faster by 30% on Linux and even by 45% on macOS. You can run the tests on your own machine or see the visualization of our results: https://pkubowicz.github.io/embed-vs-testcontainers/.

About the author

Piotr Kubowicz

Software Engineer

Piotr is a polyglot developer who has been coding in Java for over ten years. He also tried many other languages, from C and Perl to Ruby.
For the past few years, Piotr's primary focus has been on nexocode's evolving team culture and codebase, building automation and developing systems' architecture to ensure delivery is smooth even as project codebases get bigger and more complex. He is an active member of the developer community and speaks at various meetups and conferences.
