If your Spring application uses MongoDB, there is one question you will have to answer at some point: how to set up a database instance for your tests. Until early 2020 there was not much choice and the default was to use ’embedded Mongo’ – my team was no exception. At some point, however, we realised that embedded Mongo was turning our builds into a nightmare. This ultimately prompted us to migrate our code to a tool that has been getting a lot of hype recently: Testcontainers.
In this article I would like to share some of the problems connected with Flapdoodle Embedded Mongo and talk briefly about Spring tests to explain what causes those problems. I will also guide you through the setup process of MongoDB using Testcontainers, and offer some tricks to improve Testcontainers performance. Finally, I will talk about how sar and other Linux tools can be used to measure the load on your machine.
Introducing Flapdoodle Embedded Mongo
If you search the web for ’test spring boot mongo’ or similar terms, you will very likely end up with instructions which may harm your codebase. Flapdoodle Embedded Mongo was immensely popular at some point and many tutorials were written about using it. Those articles still rank surprisingly high in search engines – things don’t just disappear from the Internet when they cease to be useful. In fact, Embedded Mongo is still recommended by the recent Spring Boot documentation.
At first glance it looks fantastic: you don’t need to modify your production code, you don’t even need to configure anything, just add the de.flapdoodle.embed:de.flapdoodle.embed.mongo library as a test dependency.
…And voilà! Your tests connect to a database started just for the time of your tests being executed.
Under the hood, Spring Boot has logic detecting the presence of this library on classpath, and modifies MongoDB connections to point at an instance started by the library. The library itself handles downloading and caching MongoDB binaries suitable for your machine, and starting/stopping the server.
Our problems with Embedded Mongo
We had been following the setup procedure described in the Spring documentation for some time. At some point, however, developers started complaining that when they executed unit tests, their computer fans started roaring and the machine became painfully slow to use. It wasn’t just a matter of perception: we found out that during tests multiple MongoDB instances were running, reaching a peak of 8 instances for a standard build or even 15 when gradle --parallel was used. Very often, after tests finished successfully, we found that embedded MongoDB processes were still running and consuming lots of CPU. It’s not fun to work when 75% of your CPU is consumed by ‘zombie’ MongoDBs.
We also started to experience random CI build failures: a pipeline that usually took around 15 minutes timed out after 40 minutes, and it looked as if the unit tests never finished.
Finally, we found ourselves at a dead end. Adding a dependency on the AWS S3 client library broke our tests. They never finished, and not just on the CI server: they never finished on any machine. There was some kind of classpath conflict, preventing us from implementing a business feature. Everything was broken by adding a single dependency and a bean that uses it:
@Bean
fun asyncHttpClient() = NettyNioAsyncHttpClient.builder().build()
What is wrong with Embedded Mongo
At the beginning of 2020 it became clear that the library was dead. You could not configure it to run a version of MongoDB newer than 4.0.2. Hardly anything happened with the library source code. Only recently, in October 2020, something started to change, but the newly released library version conflicts with all stable releases of Spring Boot, so you are still not able to test your code with MongoDB 4.2.
Another serious issue is that the library can make your tests extremely resource-hungry by starting a horde of MongoDB instances. This is a consequence of the library design, not a mistake in the implementation.
The library integration with Spring Boot works by starting an embedded MongoDB instance when the Spring context is started. Because it would take ages to execute tests with a context started for each test, Spring offers a caching mechanism. A started context is kept between tests and re-used. However, a cached context cannot simply be used in all situations: when a test expects a context with a different configuration, a fresh context has to be started. Now, the problem is that there are lots of situations where Spring decides a fresh context is needed, and these include things you do very often when writing tests:
you use a @MockBean
you enable a profile with @ActiveProfiles
you override some properties using @TestPropertySource
All in all, there will be many opportunities in your tests to start a fresh context, each holding its own MongoDB instance. What about ‘old’ contexts? They aren’t closed, Spring keeps them in case there is a test that will need them. If you don’t touch the spring.test.context.cache.maxSize property, Spring will keep up to 32 contexts. This means that potentially you can have 32 instances of MongoDB running simultaneously when you use Flapdoodle Embedded Mongo. Moreover, each test worker is an independent JVM, so if you have 2 subprojects and run gradle --parallel you can have up to 64 MongoDB instances, or 96 instances for 3 subprojects and so on.
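The upper bound quoted above is plain multiplication. Here is a minimal shell sketch of it, assuming one test worker JVM per subproject and Spring's default cache size of 32 (the function name is made up for illustration):

```shell
# Hypothetical helper: worst-case number of embedded MongoDB instances.
# $1 = Spring context cache size per JVM (spring.test.context.cache.maxSize, default 32)
# $2 = number of test worker JVMs running at the same time
max_mongo_instances() {
  echo $(( $1 * $2 ))
}

max_mongo_instances 32 2   # prints 64
max_mongo_instances 32 3   # prints 96
```

Lowering spring.test.context.cache.maxSize shrinks this bound, at the price of Spring having to start contexts more often.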
MongoDB with Testcontainers
Testcontainers is a Java library that lets you start different services for use in tests. It uses Docker to fetch, start and stop those services. In our experience, this approach is far more stable than the one from Flapdoodle Embedded Mongo: no more zombie processes eating CPU.
The library is a generic tool rather than a plug-in that will do everything for you under the hood, as is the case with the Flapdoodle one. You will need a bit of code to connect Spring tests and Testcontainers. Here is how we approached this:
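import org.springframework.boot.test.util.TestPropertyValues
import org.springframework.context.ApplicationContextInitializer
import org.springframework.context.ConfigurableApplicationContext
import org.testcontainers.containers.MongoDBContainer

// Points Spring Data MongoDB at the container before the context is refreshed.
class MongoInitializer : ApplicationContextInitializer<ConfigurableApplicationContext> {
    override fun initialize(context: ConfigurableApplicationContext) {
        val addedProperties = listOf(
            "spring.data.mongodb.uri=${MongoContainerSingleton.instance.replicaSetUrl}"
        )
        TestPropertyValues.of(addedProperties).applyTo(context.environment)
    }
}

// One container per JVM: started lazily on first access and shared by all contexts.
object MongoContainerSingleton {
    val instance: MongoDBContainer by lazy { startMongoContainer() }

    private fun startMongoContainer(): MongoDBContainer =
        MongoDBContainer("mongo:4.2.11")
            .withReuse(true)
            .apply { start() }
}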
We have a context initializer that starts a Docker container with MongoDB when the context starts and replaces the connection URL with one pointing at the container. To avoid starting the database for each test, we cache the container in a lazy Kotlin property (following advice from Best Practices for Unit Testing in Kotlin by Philipp Hauer).
We also defined an annotation:
@Target(AnnotationTarget.CLASS)
@SpringBootTest
@ContextConfiguration(initializers = [MongoInitializer::class])
annotation class MongoSpringBootTest
which we use instead of @SpringBootTest in each test class:
@MongoSpringBootTest
class MyTest {
    @Test
    fun test1() {
        // ...
    }
}
Testcontainers performance
Simply replacing Flapdoodle Embedded Mongo with Testcontainers makes tests execute much faster. In a small test project this makes a 30% difference.
We wanted to see if we could go even faster. One optimization opportunity is start time. By default, even if you cache the container between tests, the container is stopped once all tests finish. So if you execute just one test from your IDE, you have to wait for the container to start each time you do so. In this scenario Testcontainers becomes slower than Flapdoodle Embedded Mongo by 2 seconds. That is painful if you run your unit tests very often, as you do when practising TDD.
The situation can be improved by enabling container reuse. This mechanism marks a container as eligible to be picked by subsequent test runs and does not stop it after a test run finishes. If you go back to the code of MongoContainerSingleton, you will see the reuse feature is enabled there. This is however not enough: you have to additionally set testcontainers.reuse.enable=true in ~/.testcontainers.properties. We encourage our developers to enable it, but keep this option disabled on CI servers.
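Setting the property can be scripted so that every developer machine ends up with the same configuration. The following is a minimal sketch, not part of our original setup: the function name is hypothetical, and it only assumes the file location and property name mentioned above.

```shell
# Hypothetical helper: make sure testcontainers.reuse.enable=true is set,
# without duplicating the line when it is already present.
# $1 = properties file (defaults to ~/.testcontainers.properties)
enable_testcontainers_reuse() {
  local props="${1:-$HOME/.testcontainers.properties}"
  touch "$props"
  if grep -q '^testcontainers\.reuse\.enable=' "$props"; then
    # Rewrite the existing entry via a temporary file (portable across sed versions).
    sed 's/^testcontainers\.reuse\.enable=.*/testcontainers.reuse.enable=true/' \
      "$props" > "$props.tmp" && mv "$props.tmp" "$props"
  else
    # Add a newline first if the file does not end with one.
    if [ -s "$props" ] && [ -n "$(tail -c 1 "$props")" ]; then
      echo >> "$props"
    fi
    echo 'testcontainers.reuse.enable=true' >> "$props"
  fi
}

# Usage on a developer machine (not on CI):
#   enable_testcontainers_reuse
```

Calling the function repeatedly is safe: an existing entry is overwritten in place rather than appended again.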
As you can see in the graph above, it improves overall execution time a bit (results with “-r” have container reuse enabled). Startup time of a single test becomes roughly the same as with Flapdoodle Embedded Mongo. If you want to learn more about container reuse, check this article by Paweł Pluta.
One challenge when enabling container reuse is parallel build with multiple subprojects. If a test from project1 runs and a test from project2 is started, it will connect to the same reuse-allowing MongoDB container project1 uses. There is a danger those two tests will simultaneously write to the same collections. We solve this by making sure each subproject gets a separate database inside the same container.
We are very happy with the improvement in build stability. After moving the tests to Testcontainers, developers no longer have zombie MongoDB instances on their machines. CI builds fail much less often (well, they still do at times, when Testcontainers fails to connect to Docker and breaks the build). Additionally, the CI build time is much more consistent now.
Measuring the operating system load
When deciding whether to move to Testcontainers or not, we wanted to understand the consequences of switching. Some of the team members used Linux and others used macOS. We knew that test execution time and the subjectively perceived machine load differed from person to person. There were multiple open questions:
Is the problem with too many Mongo instances repeatable?
What if Testcontainers are faster on Linux where Docker is a ’native’ mechanism, but slower on macOS?
What if Testcontainers make tests execute faster but eat all CPU and memory, so developers aren’t able to do anything while executing tests?
We decided to create a test script, ensuring that performance is measured in the same way on different machines. Finding out how fast a build executes is easy: you can do it using the built-in time shell command.
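A single timing is noisy, so it helps to repeat the measurement. The sketch below is an illustration rather than our actual script: it swaps the time built-in for date +%s (whole-second resolution) so that the result is easy to capture, and the function name and the Gradle invocation in the usage comment are assumptions.

```shell
# Hypothetical helper: run a command several times and print
# "run<TAB>elapsed seconds" for each run.
# $1 = number of runs, remaining arguments = command to measure
measure_runs() {
  local runs=$1
  shift
  local i=1 start end
  while [ "$i" -le "$runs" ]; do
    start=$(date +%s)
    # Output and exit status of the command are ignored; only duration is recorded.
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    printf '%s\t%s\n' "$i" "$(( end - start ))"
    i=$(( i + 1 ))
  done
}

# Usage (assumes a Gradle wrapper in the current directory):
#   measure_runs 3 ./gradlew cleanTest test
```

Because failures of the measured command are swallowed, check separately that the build passes before trusting the numbers.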
However, it is not trivial to compare CPU and memory usage during test execution as there are multiple processes interacting with each other. Firstly, Gradle has a separate process executed from the command line, another one running tasks (Gradle daemon) and a separate test execution worker. Then there are processes for embedded Mongo. Testcontainers run not only MongoDB Docker containers, but also a separate container called ryuk, which is responsible for cleaning up. We decided to measure the load of the whole operating system to get the big picture of the impact of the changes on the build process.
Linux has a great tool for such a purpose called sar, available in the sysstat package. It can collect a wide range of parameters on system performance in a reliable way, without adding much overhead. You can start it in the background:
sar -o sar.binary 2 15 &
Here it will measure performance every 2 seconds, 15 times, and save results to a binary file. Then, after measurement is finished, you can extract data on CPU usage:
% sar -f sar.binary -u
12:36:05        CPU     %user     %nice   %system   %iowait    %steal     %idle
12:36:07        all      0,19      0,00      0,25      0,00      0,00     99,56
12:36:09        all     70,38      0,00      3,28      0,00      0,00     26,34
12:36:11        all     85,47      0,00      4,09      0,57      0,00      9,87
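For summarizing and plotting, this output has to be reduced to two columns: a timestamp and a value. The sketch below shows one way to do that for CPU usage. It is an assumption about the conversion step, not a copy of our scripts: the function name is hypothetical, and it expects 24-hour timestamps and tolerates the decimal commas seen above. Memory data can be extracted in a similar fashion from sar -r.

```shell
# Hypothetical helper: turn `sar -u` text output into "time<TAB>busy%" rows,
# where busy% is 100 minus the %idle column.
sar_cpu_to_dat() {
  LC_ALL=C awk '
    # Keep only per-interval rows; this skips the header and the "Average:" line.
    $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $2 == "all" {
      idle = $NF
      sub(/,/, ".", idle)          # decimal comma -> decimal point
      printf "%s\t%.2f\n", $1, 100 - idle
    }'
}

# Usage: sar -f sar.binary -u | sar_cpu_to_dat > cpu.dat
```

The output is tab-separated, which matches what the awk and gnuplot commands in this section expect.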
With built-in shell tools it’s easy to quickly summarize data, for example by counting min, max and average:
cat memory.dat | awk 'BEGIN { min=99999999 } { total += $2; count++; if($2<min) min=$2; if ($2>max) max=$2; } END { print "Max\t" max "\tMin\t" min "\tAverage\t" int(total/count) }'
However, it’s hard to understand the big picture looking at the numbers alone. Visualisation can be a great help. We wanted to make the whole benchmark process fully automated, without manual pasting of data to spreadsheets. A ‘good enough’ approach was to make use of the venerable gnuplot.
If you have plain text data as columns separated by whitespace (as you can see in the awk call, we used tabs), it’s easy to feed the data into gnuplot.
cat cpu.dat | gnuplot -e "set yrange [0:100]; set terminal png size 800,600; set output 'cpu.png'" base.gnuplot
To avoid repeating the same gnuplot options when creating CPU graphs and memory graphs, we extracted the common part to a separate file (base.gnuplot):
set timefmt "%H:%M:%S"
set xdata time
set style data lines
plot "/dev/stdin" using 1:2 with lines notitle
Here we define the time format, declare that the x axis shows time, use a solid line for plotting and finally use columns 1 and 2 from the input as x and y coordinates of each point.
Unfortunately, sar makes use of Linux kernel features and is not available on macOS. We were able to work around this by measuring memory usage with the vm_stat command, but we think the results are very inaccurate and only show general trends. We could not find a reliable way to measure global CPU usage on a Mac.
Summary
Flapdoodle Embedded Mongo is a very popular library for running MongoDB for tests. While still recommended by many tutorials, it is known to cause various performance and maintainability problems. It prevents testing your code against modern versions of MongoDB, and wastes computer resources by starting too many database instances and leaving zombie processes. In some circumstances, it can block your build indefinitely if you add dependencies that clash with it in some cryptic way.
Testcontainers is a newer tool with a similar purpose that is quickly gaining popularity. It is actively maintained and can be used not just for MongoDB but for a wide range of systems, like different SQL and NoSQL databases and even message queues. We recommend learning it as its broad usage means it may be useful not only in your current project, but also in future ones, even if they use a different persistence mechanism. It manages test databases reliably and does not leave zombie processes.
You can check my repository comparing the performance of Flapdoodle Embedded Mongo and Testcontainers. The two projects written in Kotlin, along with shell scripts that measure performance automatically, are available on GitHub: https://github.com/pkubowicz/embed-vs-testcontainers.
In general, we found that Testcontainers are faster by 30% on Linux and even by 45% on macOS. You can run tests on your own machine or see the visualization of results we had: https://pkubowicz.github.io/embed-vs-testcontainers/.
Piotr is a polyglot developer who has been coding in Java for over ten years. He also tried many other languages, from C and Perl to Ruby. During his time at nexocode, Piotr's primary focus has been on evolving team culture and ongoing projects by developing build automation and systems architecture to ensure delivery is smooth even as project codebases get bigger and more complex. As an active developer in the community, you can notice him speaking at various meetups and conferences.