NoSQL (almost) for everyone. Choose wisely.

Wojciech Marusarz - April 2, 2019

For several years now, there seems to have been hype around using NoSQL databases instead of relational databases, as a storage solution well suited to current needs. A glance at a database popularity chart shows something interesting: Oracle, MySQL, and Microsoft SQL Server stay at roughly the same level, probably due to legacy projects and developers' habits, while the databases gaining the most popularity are PostgreSQL (relational) and MongoDB (NoSQL).

If NoSQL databases are, as claimed, designed for modern apps and growing data volumes, why do developers still reach for both PostgreSQL and MongoDB? Perhaps, in their view, the opinion that relational databases are obsolete is not fair.

In this article we will show when to use non-relational MongoDB and when to use relational databases, and how to design a MongoDB schema that enables efficient queries. We will mostly focus on distributed database systems, which are gaining popularity as system requirements change.

A bit of history for a better understanding

Generally, databases are used to store data. Before the era of web applications, the most common use case was storing well-structured data for various institutions in relational databases.

This began to change with the creation of the World Wide Web. The WWW was invented in March 1989, and since then three periods have been defined in its evolution:

  • Web 1.0 - the first version, based on read-only web pages and hyperlinks between them
  • Web 2.0 - the current state, with services capable of read/write operations to enhance human interaction
  • Web 3.0 - not here yet; it is expected to bring Semantic Web capabilities for processing unstructured data in order to understand context and user intent. Web 3.0, also called the Web of Data, relies on inter-machine communication and algorithms to provide rich interaction via diverse human-computer interfaces

As we can see, the evolution of the Web changed how we use and process data and, most importantly, the amount and structure of that data. Structured Query Language existed even before the WWW: the first papers on the relational model of data were published in 1970, and SQL was initially developed at IBM in 1974.

Not only SQL (NoSQL), on the other hand, is much more modern and followed the evolution of the Web, rising at the same time as Web 2.0 technologies. NoSQL foundations grew along with the changing needs of applications; to understand these needs better, let's get to know a bit of theory.

CAP Theorem, ACID, BASE acronyms

Before we decide which database to use, let's formulate some requirements for distributed systems based on the CAP theorem. We will define two operating modes for distributed database systems (DDBS) and show how relational DDBS and NoSQL databases fulfill those requirements.

CAP Theorem

The CAP theorem applies to DDBS and states that a distributed system cannot simultaneously provide more than two of the following three guarantees: consistency, availability, and partition tolerance. CAP stands for:

Consistency - ensures that data is the same across the cluster, so you can read from or write to any node and get the same data.

Availability - ensures that every request receives a response from the cluster, even if a node goes down, but without a guarantee that the response contains the most recent version of the information.

Partition tolerance - means that the cluster continues to function even if there is a “partition” (communication break) between two nodes that are both up but can’t communicate.

In a DDBS where the CAP theorem applies, we can choose two of the three guarantees above, which leads to the following combinations:

AP: Highly available and partition tolerant, but not consistent. Nodes remain online even if they can’t communicate with each other and resync data once the partition is resolved (communication is restored), but there is no guarantee that all nodes hold the same data in the meantime.

CP: Consistent and partition tolerant, but not highly available. Data stays consistent across all nodes and the system remains partition tolerant; it prevents data desync by becoming unavailable (rejecting requests) when a node goes down or cannot be reached.

CA: Highly available and consistent, but not partition tolerant. Data is consistent across all nodes as long as all of them are online, and we can read from or write to any node and be sure that the data is the same; but if a partition between nodes occurs, the data will get out of sync and won’t resync once the partition is resolved.

Since partition tolerance is essentially a mandatory requirement for distributed database systems, we need to prioritize between consistency and availability. This is known as the Availability/Consistency Tradeoff.
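To make the AP/CP difference tangible, here is a toy, in-memory sketch in Python. `ToyCluster` and `Node` are hypothetical names made up for illustration, and the model is deliberately simplified: two replicas, and only writes are affected by the partition. It is not how any real database implements replication.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A single replica holding a simple key-value store."""
    name: str
    data: dict = field(default_factory=dict)


class ToyCluster:
    """Two replicas that copy every write to each other while connected.

    mode="CP": during a partition, writes are rejected so replicas never diverge.
    mode="AP": during a partition, writes are accepted locally, so replicas may diverge.
    """

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [Node("n1"), Node("n2")]
        self.partitioned = False

    def write(self, node_index: int, key: str, value) -> bool:
        if self.partitioned:
            if self.mode == "CP":
                # The other replica is unreachable: refuse the write (give up availability).
                return False
            # AP: accept the write on the local node only (give up consistency).
            self.nodes[node_index].data[key] = value
            return True
        # No partition: replicate the write to every node.
        for node in self.nodes:
            node.data[key] = value
        return True

    def read(self, node_index: int, key: str):
        return self.nodes[node_index].data.get(key)


cp = ToyCluster("CP")
cp.write(0, "x", 1)
cp.partitioned = True
print(cp.write(0, "x", 2))                # False: write refused
print(cp.read(0, "x"), cp.read(1, "x"))   # 1 1: replicas still agree

ap = ToyCluster("AP")
ap.write(0, "x", 1)
ap.partitioned = True
print(ap.write(0, "x", 2))                # True: write accepted
print(ap.read(0, "x"), ap.read(1, "x"))   # 2 1: replicas diverged
```

The CP cluster stays consistent by answering "no"; the AP cluster keeps answering but lets the replicas disagree until they resync.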

ACID vs BASE

The Availability/Consistency tradeoff requires a choice between two options:

  • Fulfill CP - we need a Distributed DBS that guarantees ACID
  • Fulfill AP - we need a Distributed DBS that guarantees BASE

Relational database management systems are mostly designed to provide ACID guarantees, whereas NoSQL databases like MongoDB have traditionally been designed around BASE guarantees.

Having resolved the Availability/Consistency tradeoff, we should already know whether we need an ACID (RDBS) or a BASE (NoSQL) system, but before making a final decision, let’s see what these guarantees give us and what they mean.

ACID

ACID is intended to guarantee validity even in the event of errors or power failures. ACID describes database transaction properties which are:

Atomicity - a transaction is executed as a whole, all or nothing. Any failure causes the entire transaction to fail, which leaves the database unchanged.

Consistency - ensures that all transactions result in a valid state of the database and that all validation rules and constraints are met.

Isolation - ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.

Durability - guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure.
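Atomicity is easy to observe in any relational database. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `accounts` table and the `transfer` function are hypothetical. The transfer deliberately credits the recipient first and debits the sender second, so when the debit violates the `CHECK` constraint, the already-applied credit must be rolled back for the database to stay unchanged.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst as a single transaction."""
    # The connection context manager commits if the block succeeds
    # and rolls back the whole transaction if an exception escapes it.
    with conn:
        # Credit first, so a failure in the second statement leaves partial work to undo.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

transfer(conn, "alice", "bob", 30)  # succeeds: alice 70, bob 30

try:
    transfer(conn, "alice", "bob", 500)  # the debit would make alice's balance negative
except sqlite3.IntegrityError as exc:
    print("transfer rolled back:", exc)

balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(balances)  # {'alice': 70, 'bob': 30}: bob's +500 credit was undone
```

The failed transfer leaves no trace: the first `UPDATE` succeeded, but because the transaction as a whole failed, its effect was discarded.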

The ACID model for transactions strongly favors consistency over availability and is not without criticism. This led to an alternative model called BASE, a highly scalable transactional model focused on availability.

BASE

The BASE model is designed to loosen the requirements for immediate consistency, data freshness, and accuracy in exchange for benefits such as scalability and resilience. BASE describes the following database transaction properties:

Basically available - the system guarantees availability in the sense of the CAP theorem. There will be a response to any request, but that response could still be a ‘failure’ to obtain the requested data, or the data may be in an inconsistent or changing state.

Soft state - because consistency is only eventual, the state of the system may keep changing, even without new input, until consistency is reached.

Eventual consistency - the system will eventually become consistent once it stops receiving input. The data will propagate everywhere it should sooner or later, but the system keeps accepting input and does not check the consistency of every transaction before moving on to the next one.

Eventual consistency is considered an optimistic replication model, as opposed to ACID, which is considered a pessimistic replication model.
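Eventual consistency can be illustrated with a small simulation. In this sketch, each hypothetical `Replica` accepts writes locally without coordination, and replicas later exchange their data in an anti-entropy step (`sync_from`). Conflicts are resolved with last-writer-wins, using explicit integer timestamps and the replica name as a tie-breaker; this is one common strategy chosen for illustration, not necessarily what a given database uses.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A replica storing, per key, a (timestamp, origin, value) entry."""
    name: str
    store: dict = field(default_factory=dict)

    def write(self, key, value, timestamp: int) -> None:
        # Basically available: accept the write locally, without waiting for other replicas.
        self._apply(key, timestamp, self.name, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[2]

    def sync_from(self, other: "Replica") -> None:
        # Anti-entropy: pull every entry from another replica and merge it.
        for key, (timestamp, origin, value) in other.store.items():
            self._apply(key, timestamp, origin, value)

    def _apply(self, key, timestamp, origin, value) -> None:
        current = self.store.get(key)
        # Last-writer-wins; the origin name breaks timestamp ties deterministically.
        if current is None or (timestamp, origin) > (current[0], current[1]):
            self.store[key] = (timestamp, origin, value)


replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
replicas[0].write("cart", ["book"], timestamp=1)
replicas[2].write("cart", ["book", "pen"], timestamp=2)

# Soft state: the replicas currently disagree.
print([r.read("cart") for r in replicas])  # [['book'], None, ['book', 'pen']]

# Once writes stop, one round of pairwise syncing is enough here for convergence.
for a in replicas:
    for b in replicas:
        if a is not b:
            a.sync_from(b)

print([r.read("cart") for r in replicas])  # [['book', 'pen'], ['book', 'pen'], ['book', 'pen']]
```

Writes never wait for other replicas (the optimistic part); the price is a window in which reads can return stale or missing data.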

These concepts and potential tradeoffs are important to consider when selecting database technologies, as each technology addresses and prioritizes requirements in different ways.

When to use MongoDB, in a nutshell

MongoDB was built with high availability in mind from the ground up, and scaling through sharding is among its most common use cases. Relational databases typically scale vertically, by using more powerful servers, while horizontal scaling can be a challenge for them. Easy horizontal scaling with built-in sharding, together with replica sets that replicate data and can take read load off the primary server, can help developers store massive data sets more effectively.
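The idea behind horizontal scaling with sharding is that each document is routed to one of several servers based on a shard key. The following simplified sketch routes hypothetical `user-N` keys to three shards by hashing the key and taking the result modulo the number of shards. MongoDB's actual hashed sharding assigns ranges of hash values (chunks) to shards rather than using a plain modulo, so treat this only as an illustration of the routing idea.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard-key value to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized per process for str.
    """
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Route 1000 hypothetical user documents to shards by their key.
shards = defaultdict(list)
for user_id in (f"user-{i}" for i in range(1000)):
    shards[shard_for(user_id)].append(user_id)

for index in sorted(shards):
    print(index, len(shards[index]))  # roughly a third of the keys per shard
```

One design caveat of plain modulo routing: changing the number of shards reassigns most keys, which is one reason real systems assign key or hash ranges to shards, or use consistent hashing, instead.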

MongoDB is a general-purpose database. Thanks to its document-oriented approach, with attributes that are not predefined and can be modified on the fly, it has a flexible schema design, which is a crucial difference between MongoDB and relational databases.

Being able to store documents with different properties in one collection helps both during development and when ingesting data from heterogeneous sources that may or may not share the same properties. The ability to deeply nest attributes in documents and to store arrays of values in attributes, while still being able to search and index these fields, helps application developers exploit the schema-less nature of MongoDB.
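To show what querying heterogeneous, nested documents looks like, here is a pure-Python sketch of a tiny in-memory "collection". The `find` helper loosely mimics MongoDB's dot notation for nested fields and its rule that a query on an array field matches when any element equals the value; it is a hypothetical illustration, not the MongoDB driver API, and it ignores indexes entirely.

```python
from typing import Any

_MISSING = object()  # sentinel for "field not present"

# Documents in one collection do not have to share the same fields.
products = [
    {"_id": 1, "name": "T-shirt", "price": 19.99, "sizes": ["S", "M", "L"]},
    {"_id": 2, "name": "E-book", "price": 9.99, "format": {"type": "pdf", "pages": 320}},
    {"_id": 3, "name": "Mug", "price": 7.5, "tags": ["kitchen", "gift"]},
]


def get_path(doc: dict, path: str) -> Any:
    """Follow a dotted path such as 'format.type' through nested dicts."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(doc: dict, path: str, expected: Any) -> bool:
    value = get_path(doc, path)
    if value is _MISSING:
        return False  # documents without the field simply do not match
    if isinstance(value, list):
        # An array matches if it equals the value or any element equals it.
        return value == expected or expected in value
    return value == expected


def find(collection: list, path: str, expected: Any) -> list:
    return [doc for doc in collection if matches(doc, path, expected)]


print([d["name"] for d in find(products, "format.type", "pdf")])  # ['E-book']
print([d["name"] for d in find(products, "sizes", "M")])          # ['T-shirt']
```

Documents that lack a field are skipped rather than causing errors, which is what makes mixing document shapes in one collection practical.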


MongoDB criticism

MongoDB’s schema-less nature is a huge advantage, but it is also a big point of debate. Schema-lessness can be beneficial in many use cases, as it allows heterogeneous data to be dumped into the database without complex cleansing and without ending up with lots of empty columns or blocks of text stuffed into a single column. On the other hand, it is a double-edged sword: a developer may end up with many documents in a collection whose fields have loose semantics, and it becomes tough to recover those semantics at the code level. If the schema design is not optimal, what we end up with is a plain datastore rather than a database.
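The "loose semantics" problem usually surfaces as the same field holding different types, or being present in only some documents. A small audit script can make this visible; the `orders` documents and the `field_types`/`inconsistent_fields` helpers below are hypothetical, written in plain Python for illustration, and are not a MongoDB feature.

```python
from collections import defaultdict

# Hypothetical documents written by different app versions or data sources.
orders = [
    {"_id": 1, "total": 49.9, "customer": "alice"},
    {"_id": 2, "total": "49.90", "customer": {"name": "bob"}},
    {"_id": 3, "amount": 12, "customer": "carol"},
]


def field_types(collection):
    """Return, for every top-level field, the sorted Python type names it holds."""
    report = defaultdict(set)
    for doc in collection:
        for key, value in doc.items():
            report[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in report.items()}


def inconsistent_fields(collection):
    """Fields that are missing from some documents or hold more than one type."""
    types = field_types(collection)
    counts = defaultdict(int)
    for doc in collection:
        for key in doc:
            counts[key] += 1
    return sorted(key for key in types if len(types[key]) > 1 or counts[key] < len(collection))


print(field_types(orders))
print(inconsistent_fields(orders))  # ['amount', 'customer', 'total']
```

Here `total` is sometimes a number and sometimes a string, `customer` is sometimes a name and sometimes an object, and `amount` looks like a renamed `total`: exactly the kind of drift that code reading the collection then has to handle.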

Summary

I hope we now have a better understanding of the pros and cons of relational and NoSQL databases. In terms of the CAP theorem, both solutions have their strengths and weaknesses. Choosing between an RDBS and a NoSQL database depends on the system requirements and the structure of the available data.

We should choose MongoDB when the data structure is flexible or heterogeneous, or when availability and horizontal scaling are the priority.

About the author

Wojciech Marusarz

Software Engineer

Wojciech enjoys working with small teams where the quality of the code and the project's direction are essential. In the long run, this allows him to have a broad understanding of the subject, develop personally and look for challenges. He deals with programming in Java and Kotlin. Additionally, Wojciech is interested in Big Data tools, making him a perfect candidate for various Data-Intensive Application implementations.
