MongoDB Schema Design for SQL Developers

MongoDB Schema Design for SQL Developers

Wojciech Marusarz - May 7, 2019

We should already know when to use relational and Non-Relational DBS. If we decide to select Non-Relational MongoDB, let’s use it properly. For SQL developers this requires changing the way of thinking about data and connections between them. Let’s dive into the MongoDB world.

First of all, some terms translated for further reading

SQL,MongoDB
Database,Database
Table,Collection
Index,Index
Row,Document
Column,Field
Joins,Embed in document or link via DBRef

Even if such a mapping exists, it doesn’t mean that data persists in a similar way. For Relational DBS, the data structure is defined for Table, for Document DBS, each document has its structure defined, and documents in collections may differ.

No Schema Design ≠ No Database Design

In Relational DB, we need to design a database structure carefully. We need to create an appropriate table with predefined rows and to create foreign keys to join them. In MongoDB, we do not need to predefine collections, but instead, we have to decide how we are going to use our data and design a suitable document schema up front so that it allows to access data efficiently.

Let’s get through all mappings between tables for RDBS, and let’s look, how to implement it in MongoDB way using an online bookstore as an example.

One-To-One (1:1)

Relationship: Book Author has a single Address

Data model: Embedding: This first approach is to embed the Address document as an embedded document in the User document.

{
  "name": "Peter Wilkinson",
  "age": 27,
  "address": {
   "street": "100 some road",
   "city": "Nevermore"
  }
}

Data model: Linking: The second approach is to link the Address and User document using a foreign key.

{
  "_id": 1,
  "name": "Peter Wilkinson",
  "age": 27
}

{
 "_id": 1,
 "user_id": 1,
 "Street": "100 some road",
 "City": "Nevermore"
}

In the one to one relationship, Embedding is the preferred way to model the relationship as it’s more efficient to retrieve the document if querying about author address. If asking about all addresses, more efficient would be linking, but it is a rare case.

One-To-Many (1:N)

Relationship: A Book might have a many Comments with user ratings, but a Comment is only related to a single Book

Data model: Embedding: The embedding of the Comments in the Book means we can easily retrieve all the comments belong to a particular Book. Adding new comments is as simple as appending the new comment document to the end of the comments array.

{
  "title": "Great Expectations",
  "description": "This is an awesome book of Charles Dickens",
  "comments": [{
    "name": "Peter Critic",
    "created_on": "2019-01-01T10:01:22Z",
    "comment": "Not so awesome book"
  }, {
    "name": "John Lenient",
    "created_on": "2018-12-01T11:01:22Z",
    "comment": "Awesome book"
  }]
}

Data model: Linking: allows returning paginated comments as the application can slice and dice the comments more easily. On the downside, if we have 1000 comments on a book, we would need to retrieve all 1000 documents causing way too many reads from the database.

{
  "_id": 1,
  "title": "Great Expectations",
  "description": "This is an awesome book of Charles Dickens"
}
{
  "book_id": 1,
  "name": "Peter Critic",
  "created_on": "2019-01-01T10:01:22Z",
  "comment": "Not so awesome book"
}
{
  "book_id": 1,
  "name": "John Lenient",
  "created_on": "2018-12-01T11:01:22Z",
  "comment": "Awesome book"
}

Data model: Bucketing: The third approach is a hybrid of the two above. It tries to balance the rigidity of the embedding strategy with the flexibility of the linking strategy. For this example, we split the comments into buckets with a maximum of 50 comments in each bucket.

{
 "_id": 1,
 "title": "Great Expectations",
 "description": "This is an awesome book of Charles Dickens"
}
{
  "blog_entry_id": 1,
  "page": 1,
  "count": 50,
  "comments": [{
    "name": "Peter Critic",
    "created_on": "2019-01-01T10:01:22Z",
    "comment": "Not so awesome book"
  }]
}
{
  "blog_entry_id": 1,
  "page": 2,
  "count": 1,
  "comments": [{
    "name": "John Lenient",
    "created_on": "2018-12-01T11:01:22Z",
    "comment": "Awesome book"
  }]
}

The main benefit of using buckets, in this case, is that we can perform a single read to fetch 50 comments at a time, allowing for efficient pagination.

Many-To-Many (N:M)

Relationship: Let’s think about a Book that was written by many Authors. At the same time, an Author might have written many Books

Data model: Two Way Embedding

{
  "_id": 1,
  "name": "Peter Standford",
  "books": [1, 2]
}
{
  "_id": 2,
  "name": "Georg Peterson",
  "books": [2]
}
{
  "_id": 1,
  "title": "A tale of two people",
  "categories": ["drama"],
  "authors": [1, 2]
}
{
  "_id": 2,
  "title": "A tale of two space ships",
  "categories": ["scifi"],
  "authors": [1]
}

Queries:

Fetch books by a specific author

var db = db.getSisterDB("library");
var booksCollection = db.books;
var authorsCollection = db.authors;

var author = authorsCollection.findOne({name: "Peter Standford"});
var books = booksCollection.find({_id: {$in: author.books}}).toArray();

Fetch authors by a specific book

var db = db.getSisterDB("library");
var booksCollection = db.books;
var authorsCollection = db.authors;

var book = booksCollection.findOne({title: "A tale of two space ships"});
var authors = authorsCollection.find({_id: {$in: book.authors}}).toArray();

Data model: One Way Embedding for uneven N:M relationships Let’s take category drama that might have thousands of books in it and with many new books consistently being added and removed. It would be impracticable to embed all the books in a category document. Each book, however, can easily have categories embedded within it, as the rate of categories change for a specific book might be very low. In this case, we should consider One Way Embedding as a strategy.

The One Way Embedding strategy chooses to optimize the read performance of an N:M relationship by embedding the references in one side of the relationship. An example might be where several books belong to a few categories, but a couple of categories have many books. Let’s look at an example, pulling the categories out into a separate document.

{
 "_id": 1,
 "name": "drama"
}
{
  "_id": 1,
  "title": "A tale of two people",
  "categories": [1],
  "authors": [1, 2]
}

The reason we choose to embed all the references to categories in the books is due to there being a lot more books in the drama category than categories in a book.

Queries Fetch categories for a specific book

var db = db.getSisterDB("library");
var booksCol = db.books;
var categoriesCol = db.categories;

var book = booksCol.findOne({title: "A tale of two space ships"});
var categories = categoriesCol.find({_id: {$in: book.categories}}).toArray();

Fetch books for a specific category

var db = db.getSisterDB("library");
var booksCollection = db.books;
var categoriesCollection = db.categories;

var category = categoriesCollection.findOne({name: "drama"});
var books = booksCollection.find({categories: category.id}).toArray();

Aggregation framework - SQL Statements for MongoDB

When we designed documents structure and modeled relationships between them, we may need to process data. In SQL databases we use SQL Statements that allow to search, filter, sort and many many more. For this purposes in MongoDB shell, Aggregation Pipeline is used.

The drawback of Aggregation Pipeline is lack of support for joining documents similar to JOIN operations in RDBS. For the processing of unstructured data, it is often required to modify documents structure.

The first MongoDB method that provided data processing paradigm for condensing large volumes of data into useful aggregated results is Map-Reduce. MapReduce is useful when used to implement a data pipeline. Multiple MapReduce commands can be chained to produce different results. Its main advantage is that it is inherently parallelizable as evidenced by frameworks such as Hadoop. Using MapReduce, MongoDB applies a map to each input document, emitting key-value pairs at the end of the map phase. Then each reducer gets key-value pairs with the same key as input, processing all multiple values. The reducer’s output is a single key-value pair for each key.

Aggregation framework

Since version 2.2, MongoDB has provided a better way to work with aggregation. The aggregation framework is modeled as a data processing pipelines.

In data processing pipelines there are two primary operations: filters that operate like queries, filtering documents, and document transformations that transform documents to get them ready for the next stage. In the following table, we can see how SQL commands map to the aggregation framework operators:

SQL,Aggregation Framework
WHERE / HAVING,$match
GROUP BY,$group
SELECT,$project
ORDER BY,$sort
LIMIT,$limit
sum() / count(),$sum
join,$lookup

If you need more details about migration from RDB to NoSQL, please read the official MongoDB Documentation

Summary

As we could see, migration from RDB to NoSQL can be pretty straightforward. The most important thing is that we need to change our mindset to do it properly. First of all, we need to take care of proper relation assignment, to embrace MongoDB capabilities. In MongoDB, those relations seem to be more intuitive, but for bigger datasets, our assumptions need to change.

About the author

Wojciech Marusarz

Wojciech Marusarz

Software Engineer

Linkedin profile Twitter Github profile

Wojciech enjoys working with small teams where the quality of the code and the project's direction are essential. In the long run, this allows him to have a broad understanding of the subject, develop personally and look for challenges. He deals with programming in Java and Kotlin. Additionally, Wojciech is interested in Big Data tools, making him a perfect candidate for various Data-Intensive Application implementations.

Tempted to work
on something
as creative?

That’s all we do.

join nexocode

This article is a part of

Zero Legacy
36 articles

Zero Legacy

What goes on behind the scenes in our engineering team? How do we solve large-scale technical challenges? How do we ensure our applications run smoothly? How do we perform testing and strive for clean code?

Follow our article series to get insight into our developers' current work and learn from their experience. Expect to see technical details, architecture discussions, reviews on libraries and tools we use, best practices on software quality, and maybe even some fail stories.

check it out

Zero Legacy

Insights from nexocode team just one click away

Sign up for our newsletter and don't miss out on the updates from our team on engineering and teal culture.

Done!

Thanks for joining the newsletter

Check your inbox for the confirmation email & enjoy the read!

This site uses cookies for analytical purposes.

Accept Privacy Policy

In the interests of your safety and to implement the principle of lawful, reliable and transparent processing of your personal data when using our services, we developed this document called the Privacy Policy. This document regulates the processing and protection of Users’ personal data in connection with their use of the Website and has been prepared by Nexocode.

To ensure the protection of Users' personal data, Nexocode applies appropriate organizational and technical solutions to prevent privacy breaches. Nexocode implements measures to ensure security at the level which ensures compliance with applicable Polish and European laws such as:

  1. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (published in the Official Journal of the European Union L 119, p 1); Act of 10 May 2018 on personal data protection (published in the Journal of Laws of 2018, item 1000);
  2. Act of 18 July 2002 on providing services by electronic means;
  3. Telecommunications Law of 16 July 2004.

The Website is secured by the SSL protocol, which provides secure data transmission on the Internet.

1. Definitions

  1. User – a person that uses the Website, i.e. a natural person with full legal capacity, a legal person, or an organizational unit which is not a legal person to which specific provisions grant legal capacity.
  2. Nexocode – NEXOCODE sp. z o.o. with its registered office in Kraków, ul. Wadowicka 7, 30-347 Kraków, entered into the Register of Entrepreneurs of the National Court Register kept by the District Court for Kraków-Śródmieście in Kraków, 11th Commercial Department of the National Court Register, under the KRS number: 0000686992, NIP: 6762533324.
  3. Website – website run by Nexocode, at the URL: nexocode.com whose content is available to authorized persons.
  4. Cookies – small files saved by the server on the User's computer, which the server can read when when the website is accessed from the computer.
  5. SSL protocol – a special standard for transmitting data on the Internet which unlike ordinary methods of data transmission encrypts data transmission.
  6. System log – the information that the User's computer transmits to the server which may contain various data (e.g. the user’s IP number), allowing to determine the approximate location where the connection came from.
  7. IP address – individual number which is usually assigned to every computer connected to the Internet. The IP number can be permanently associated with the computer (static) or assigned to a given connection (dynamic).
  8. GDPR – Regulation 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of individuals regarding the processing of personal data and onthe free transmission of such data, repealing Directive 95/46 / EC (General Data Protection Regulation).
  9. Personal data – information about an identified or identifiable natural person ("data subject"). An identifiable natural person is a person who can be directly or indirectly identified, in particular on the basis of identifiers such as name, identification number, location data, online identifiers or one or more specific factors determining the physical, physiological, genetic, mental, economic, cultural or social identity of a natural person.
  10. Processing – any operations performed on personal data, such as collecting, recording, storing, developing, modifying, sharing, and deleting, especially when performed in IT systems.

2. Cookies

The Website is secured by the SSL protocol, which provides secure data transmission on the Internet. The Website, in accordance with art. 173 of the Telecommunications Act of 16 July 2004 of the Republic of Poland, uses Cookies, i.e. data, in particular text files, stored on the User's end device.
Cookies are used to:

  1. improve user experience and facilitate navigation on the site;
  2. help to identify returning Users who access the website using the device on which Cookies were saved;
  3. creating statistics which help to understand how the Users use websites, which allows to improve their structure and content;
  4. adjusting the content of the Website pages to specific User’s preferences and optimizing the websites website experience to the each User's individual needs.

Cookies usually contain the name of the website from which they originate, their storage time on the end device and a unique number. On our Website, we use the following types of Cookies:

  • "Session" – cookie files stored on the User's end device until the Uses logs out, leaves the website or turns off the web browser;
  • "Persistent" – cookie files stored on the User's end device for the time specified in the Cookie file parameters or until they are deleted by the User;
  • "Performance" – cookies used specifically for gathering data on how visitors use a website to measure the performance of a website;
  • "Strictly necessary" – essential for browsing the website and using its features, such as accessing secure areas of the site;
  • "Functional" – cookies enabling remembering the settings selected by the User and personalizing the User interface;
  • "First-party" – cookies stored by the Website;
  • "Third-party" – cookies derived from a website other than the Website;
  • "Facebook cookies" – You should read Facebook cookies policy: www.facebook.com
  • "Other Google cookies" – Refer to Google cookie policy: google.com

3. How System Logs work on the Website

User's activity on the Website, including the User’s Personal Data, is recorded in System Logs. The information collected in the Logs is processed primarily for purposes related to the provision of services, i.e. for the purposes of:

  • analytics – to improve the quality of services provided by us as part of the Website and adapt its functionalities to the needs of the Users. The legal basis for processing in this case is the legitimate interest of Nexocode consisting in analyzing Users' activities and their preferences;
  • fraud detection, identification and countering threats to stability and correct operation of the Website.

4. Cookie mechanism on the Website

Our site uses basic cookies that facilitate the use of its resources. Cookies contain useful information and are stored on the User's computer – our server can read them when connecting to this computer again. Most web browsers allow cookies to be stored on the User's end device by default. Each User can change their Cookie settings in the web browser settings menu: Google ChromeOpen the menu (click the three-dot icon in the upper right corner), Settings > Advanced. In the "Privacy and security" section, click the Content Settings button. In the "Cookies and site date" section you can change the following Cookie settings:

  • Deleting cookies,
  • Blocking cookies by default,
  • Default permission for cookies,
  • Saving Cookies and website data by default and clearing them when the browser is closed,
  • Specifying exceptions for Cookies for specific websites or domains

Internet Explorer 6.0 and 7.0
From the browser menu (upper right corner): Tools > Internet Options > Privacy, click the Sites button. Use the slider to set the desired level, confirm the change with the OK button.

Mozilla Firefox
browser menu: Tools > Options > Privacy and security. Activate the “Custom” field. From there, you can check a relevant field to decide whether or not to accept cookies.

Opera
Open the browser’s settings menu: Go to the Advanced section > Site Settings > Cookies and site data. From there, adjust the setting: Allow sites to save and read cookie data

Safari
In the Safari drop-down menu, select Preferences and click the Security icon.From there, select the desired security level in the "Accept cookies" area.

Disabling Cookies in your browser does not deprive you of access to the resources of the Website. Web browsers, by default, allow storing Cookies on the User's end device. Website Users can freely adjust cookie settings. The web browser allows you to delete cookies. It is also possible to automatically block cookies. Detailed information on this subject is provided in the help or documentation of the specific web browser used by the User. The User can decide not to receive Cookies by changing browser settings. However, disabling Cookies necessary for authentication, security or remembering User preferences may impact user experience, or even make the Website unusable.

5. Additional information

External links may be placed on the Website enabling Users to directly reach other website. Also, while using the Website, cookies may also be placed on the User’s device from other entities, in particular from third parties such as Google, in order to enable the use the functionalities of the Website integrated with these third parties. Each of such providers sets out the rules for the use of cookies in their privacy policy, so for security reasons we recommend that you read the privacy policy document before using these pages. We reserve the right to change this privacy policy at any time by publishing an updated version on our Website. After making the change, the privacy policy will be published on the page with a new date. For more information on the conditions of providing services, in particular the rules of using the Website, contracting, as well as the conditions of accessing content and using the Website, please refer to the the Website’s Terms and Conditions.

Nexocode Team

Close

Want to be a part of our engineering team?

Join our teal organization and work on challenging projects.

CHECK OPEN POSITIONS