Services and resources monitoring with Prometheus and Grafana running on Docker

Wojciech Gębiś - September 13, 2018

I have extensive experience with resource monitoring solutions. In the past, I have worked with many popular tools: Nagios, which is well suited to real-time service monitoring based on dedicated agents; Cacti, a very common tool for RRDtool-based metrics; application monitoring tools for the JVM such as JConsole and VisualVM; and many commercial solutions. Every tool listed above is powerful, feature-rich, and dedicated to a specific purpose, but there is a catch.

Software architecture now usually runs across multiple server instances, and there is a whole variety of services to monitor and metrics to collect. Their number grows with every new project I work on. At this scale, it is almost impossible to understand what is happening by watching multiple tools, each dedicated to a different service or metric. Constantly switching between tools to visualize, monitor, and alert on metrics from your downstream systems becomes too hard.

At some point, new requirements become more and more important: centralized and unified configuration patterns, one consistent storage engine for storing, managing, and displaying all of the metrics, and an easy way to add and configure new cloud services, web servers, MQ brokers, etc. That is why I have spent some time digging deeper into this matter, looking for a new solution for centralized resource monitoring.

A monitoring infrastructure is made up of three major modules:

  • metrics exporters,
  • a centralized engine for collecting the data and storing it in a time-series database,
  • a dashboard system for metrics visualization.

Moreover, it should easily integrate monitoring of Docker hosts running several containers per node, as well as detailed Linux host statistics such as CPU utilization, storage activity, and network traffic.

After researching and evaluating many projects, I finally have a stack of tools I want to use. I chose to build the monitoring infrastructure from scratch on Grafana and Prometheus (https://grafana.com, https://prometheus.io).

The big picture

[Figure: overview of the Prometheus and Grafana stack]

The core element of the system is Prometheus, which is responsible for collecting and storing the metrics. Prometheus lets you control vital parameters such as the retention policy and the frequency of metrics collection. The default collection mechanism is based on easy-to-maintain endpoint scraping: small services called metrics exporters expose HTTP endpoints, and Prometheus pulls the metrics from them. I use two exporters: cAdvisor and node_exporter. With these simple agents, we can create powerful data sources with all of the critical metrics to be monitored.
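To make endpoint scraping a bit more concrete: an exporter serves its metrics as plain text, one sample per line, and Prometheus fetches that text on every scrape. The sketch below is a small shell helper for inspecting such output by hand. The function name `metric_values` is introduced here purely for illustration and is not part of Prometheus or of any exporter; the usage line assumes node_exporter is reachable on its default port 9100.

```shell
#!/bin/sh
# Hypothetical helper: filter Prometheus text-format metrics read from stdin
# and print only the samples that belong to the metric named in $1.
metric_values() {
  awk -v name="$1" '
    /^#/ { next }                  # skip "# HELP" and "# TYPE" comment lines
    NF == 0 { next }               # skip empty lines
    {
      metric = $1
      sub(/[{].*/, "", metric)     # drop the label set, if there is one
      if (metric == name) print $0 # exact name match, so node_load1 != node_load15
    }'
}

# Usage (assumes a running node_exporter on this host):
#   curl -s http://localhost:9100/metrics | metric_values node_load1
```

This is only a debugging aid. Prometheus does the real scraping itself, at the interval set in its configuration.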

Components

Installation and configuration can be done via a few simple steps:

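  • Prometheus can be run as a Docker container. It’s a good idea to keep the configuration and the collected data on external storage, such as a dedicated block device. In the command below, the host directory /prometheus-data (which is expected to contain prometheus.yml and to be writable by the container user) is mounted at the same path in the container:

    docker run -p 9090:9090 \
      -v /prometheus-data:/prometheus-data \
      prom/prometheus \
      --config.file=/prometheus-data/prometheus.yml \
      --storage.tsdb.path=/prometheus-data/data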

  • cAdvisor connects to the Docker host and collects a large number of parameters for live monitoring of each container as well as the Docker engine itself. The recommended way to run cAdvisor is, of course, as a dockerized service. It can be started on the Docker host with ‘docker run’, e.g.:

    docker run \
      --volume=/:/rootfs:ro \
      --volume=/var/run:/var/run:ro \
      --volume=/sys:/sys:ro \
      --volume=/var/lib/docker/:/var/lib/docker:ro \
      --volume=/dev/disk/:/dev/disk:ro \
      --publish=8080:8080 \
      --detach=true \
      --name=cadvisor \
      google/cadvisor:latest

  • node_exporter, in contrast, should be deployed as a regular host service (not in a container), because it needs access to native host metrics, e.g., those exposed in ‘/proc’. It can be installed from source or from binary packages available in most popular Linux distributions. After installation, node_exporter listens on port 9100 by default. https://github.com/prometheus/node_exporter

  • Now both exporters can be added to the Prometheus configuration file prometheus.yml. Here is the snippet (if Prometheus itself runs in a container, replace localhost with an address at which the container can reach the host):

    # A scrape configuration containing the endpoints to scrape:
    # here, node_exporter and cAdvisor.
    scrape_configs:
      # The job name is added as a label job=<job_name> to any time series scraped from this config.
      # metrics_path defaults to '/metrics'
      # scheme defaults to 'http'.
      - job_name: 'default_job'
        static_configs:
          - targets:
              - localhost:9100 # node_exporter
              - localhost:8080 # cAdvisor

  • Finally, Grafana can be run as a Docker container just like Prometheus. The default configuration uses a file database embedded in the container image - in production it should be moved to an external database.

    docker run -d -p 3000:3000 --name=grafana -e "GF_SERVER_ROOT_URL=http://grafana.server.name" -e "GF_SECURITY_ADMIN_PASSWORD=secret" grafana/grafana
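Grafana still has to be told where Prometheus lives. This can be done by hand in the Grafana UI; as an alternative, the sketch below writes a data source provisioning file that Grafana (version 5.0 or later) reads at startup. The function name `write_prometheus_datasource`, the output directory, and the Prometheus URL are all assumptions made for this example, so adjust them to your setup.

```shell
#!/bin/sh
# Hypothetical helper: write a Grafana data source provisioning file.
#   $1 - directory to write into (chosen by the caller)
#   $2 - base URL under which Grafana can reach Prometheus (an assumption)
write_prometheus_datasource() {
  dir="$1"
  url="$2"
  mkdir -p "$dir"
  cat > "$dir/prometheus.yml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}

# Usage (example values only):
#   write_prometheus_datasource ./provisioning/datasources http://prometheus.server.name:9090
```

Assuming Grafana's default provisioning path, the directory can then be mounted into the container by adding `-v "$PWD/provisioning:/etc/grafana/provisioning"` to the `docker run` command.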

What’s next?

The stack is now running. However, that was only the first step, and this is where Grafana fully comes into play. Grafana is a data visualization and exploration tool for your entire infrastructure. You can extend it with many widgets and plugins to create interactive, user-friendly dashboards that fit your needs. So go ahead and customize it: add new hosts and metrics to monitor, and use the graph composer to create charts and place them on dashboards.

These projects are open source and come with many useful extensions. If other resources need to be monitored, here is the Prometheus repository with many useful exporters: https://prometheus.io/docs/instrumenting/exporters/.

All of that data is accessible in Grafana dashboards, where it can be freely visualized and monitored. There is also a collection of ready-to-use, community-built dashboards on the Grafana site: https://grafana.com/dashboards.

It is also worth mentioning that the solution is rounded out by the ability to create advanced alerting rules based on defined thresholds, which can push notifications to your email or Slack whenever something out of the ordinary happens.
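As an illustration of a threshold-based rule, here is a minimal sketch that generates a Prometheus alerting rule file. It assumes the alerting is done on the Prometheus side, that the file is listed under `rule_files` in prometheus.yml, and that an Alertmanager instance delivers the email or Slack notifications. The function name, group name, alert name, and threshold are made up for this example, and the metric `node_cpu_seconds_total` assumes node_exporter 0.16 or later.

```shell
#!/bin/sh
# Hypothetical helper: write a Prometheus alerting rule for high CPU usage.
#   $1 - output file
#   $2 - CPU usage threshold in percent (example value: 90)
write_cpu_alert_rule() {
  cat > "$1" <<EOF
groups:
  - name: host_alerts
    rules:
      - alert: HighCpuUsage
        # CPU usage = 100% minus the average idle share over the last 5 minutes
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > $2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above $2% on {{ \$labels.instance }}"
EOF
}

# Usage (example values only):
#   write_cpu_alert_rule /prometheus-data/alert.rules.yml 90
```

The `for: 10m` clause means the condition must hold for ten minutes before the alert fires, which filters out short spikes.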

I hope this short article draws your attention to the stack I described and helps you improve your own system monitoring.
