By Davy Cielen, Arno D. B. Meysman, and Mohamed Ali
In this article, excerpted from Introducing Data Science, we will introduce you to the four big NoSQL database types.
There are four big NoSQL types: key-value store, document store, column-oriented database, and graph database. Each type solves a problem that can’t be solved with relational databases. Actual implementations are often combinations of these. OrientDB, for example, is a multi-model database that combines NoSQL types: it is a graph database in which each node is a document.
Before going into the different NoSQL databases, let’s look at relational databases so you have something to compare them to. In data modelling, many approaches are possible. Relational databases generally strive toward normalization: making sure every piece of data is stored only once. Normalization shapes how their tables are structured. If, for instance, you want to store data about a person and their hobbies, you can do so with two tables: one about the person and one about their hobbies. As you can see in figure 1, an additional table is necessary to link hobbies to persons because of their many-to-many relationship: a person can have multiple hobbies and a hobby can be practiced by many persons.
Figure 1 Relational databases strive toward normalization (making sure every piece of data is stored only once). Each table has unique identifiers (primary keys) that are used to model the relation between the entities (tables), hence the term relational.
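The person and hobby design described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the schema from the figure: the table names, column names, and sample rows are assumptions chosen for the example.

```python
import sqlite3


def build_normalized_db():
    """Create an in-memory database with two entity tables and one linking table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Entity tables: each person and each hobby is stored exactly once.
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE hobby (
            hobby_id INTEGER PRIMARY KEY,
            name     TEXT NOT NULL UNIQUE
        );
        -- Linking table: models the many-to-many relationship via primary keys.
        CREATE TABLE person_hobby (
            person_id INTEGER NOT NULL REFERENCES person(person_id),
            hobby_id  INTEGER NOT NULL REFERENCES hobby(hobby_id),
            PRIMARY KEY (person_id, hobby_id)
        );
        """
    )
    # Hypothetical sample data.
    conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
    conn.executemany("INSERT INTO hobby VALUES (?, ?)", [(1, "archery"), (2, "surfing")])
    conn.executemany(
        "INSERT INTO person_hobby VALUES (?, ?)", [(1, 1), (1, 2), (2, 2)]
    )
    conn.commit()
    return conn


def hobbies_of(conn, person_name):
    """Return the hobby names of one person by joining through the linking table."""
    rows = conn.execute(
        """
        SELECT h.name
        FROM person AS p
        JOIN person_hobby AS ph ON ph.person_id = p.person_id
        JOIN hobby AS h ON h.hobby_id = ph.hobby_id
        WHERE p.name = ?
        ORDER BY h.name
        """,
        (person_name,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_normalized_db()
print(hobbies_of(conn, "Ann"))
```

Note that "surfing" appears only once in the hobby table even though two people practice it; the linking table holds nothing but pairs of identifiers.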
A full-scale relational database can be made up of many entity and linking tables. Now that you have something to compare NoSQL to, let’s look at the different types.
COLUMN-ORIENTED DATABASE
Traditional relational databases are row-oriented, with each row having a row-id and each field within the row stored together in a table. Let’s say, for example’s sake, that no extra data about hobbies is stored and you have only a single table to describe people, as shown in figure 2. Notice how in this scenario you have slight denormalization because hobbies could be repeated. If the hobby information is a nice extra but not essential to your use case, adding it as a list within the Hobbies column is an acceptable approach. But if the information isn’t important enough for a separate table, should it be stored at all?
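To make the single-table scenario concrete, here is a small Python sketch of a row-oriented layout in which each row keeps all of its fields together and the hobbies are stored as a list. The column order, names, and sample values are assumptions made for illustration only.

```python
from collections import Counter

# Each tuple is one row: (row_id, name, birthday, hobbies).
# All fields of a row are stored together, and hobbies are an embedded list.
people_rows = [
    (1, "Ann", "1990-01-15", ["archery", "surfing"]),
    (2, "Bob", "1985-07-02", ["surfing"]),
    (3, "Cleo", "1978-11-30", []),
]


def people_with_hobby(rows, hobby):
    """Scan every row in full to find the people who list the given hobby."""
    return [name for _row_id, name, _birthday, hobbies in rows if hobby in hobbies]


def repeated_hobbies(rows):
    """Return hobbies stored more than once, i.e. the slight denormalization."""
    counts = Counter(hobby for *_fields, hobbies in rows for hobby in hobbies)
    return {hobby: n for hobby, n in counts.items() if n > 1}


print(people_with_hobby(people_rows, "surfing"))
print(repeated_hobbies(people_rows))
```

Answering a question about a single column still means reading every row in full, and a hobby shared by several people is stored once per person rather than once overall.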