How To Choose A Database For Backend

Each database management system has its pros and cons, they must be taken into account when choosing a particular database for your application. You need to evaluate how much data will be expected to store, whether complex relationships between data will be necessary, whether binary data or data in JSON format will be stored, whether sharding is supposed to be used, and so on.

The figure below shows the popularity of databases at the beginning of 2023.

More details are here.

MySQL

One of the most common DBMS, allows you to implement projects of small and medium complexity. But of course you can try to use it on complex projects with a large amount of stored data, however, this may entail some difficulties.

Pros

Open source.
Is a Relational Database Management System or RDBMS which means that it stores and presents data in tabular form, organized in rows and columns.
Is more secure as it consists of a solid data security layer to protect sensitive data from intruders and passwords in MySQL are encrypted.
Allows transactions to be rolled back.
Cross-platform.

Cons

Is not very efficient in handling very large databases.
Versions less than 5.0 do not support COMMIT, stored procedure and ROLE.
Does not support SQL check constraints.
Complex data replication mechanics.

MariaDB

MariaDB is a fork of MySQL. In other words, it is an enhanced, drop-in replacement of MySQL and they have the similar pros and cons, let’s take a quick look at the differences between them.

MySQL

The performance of MySQL is not that great when compared to MariaDB.
It does not provide a replacement for MariaDB.
Data masking is done in MySQL.
It can not handle large-sized data easily.
OS X is supported in MySQL as a server.

MariaDB

The performance of MariaDB is far better than MySQL.
It provides a drop-in replacement for MySQL.
There is no data masking.
It can handle large-sized data easily.
OS X is not supported in MariaDB as a server.

PostgreSQL

This DBMS allows using triggers and functions to implement its own methods for converting data stored in the database, it supports storing data in JSON format, but with large amount of data does not work very fast. It is often used in open source projects, it is easy to maintain and update, there are built-in mechanisms for implementing various replication schemes.

Pros

Open source.
Highly expandable.
Possible to process complex data types (e.g. geographical data).
Flexible full text search.
Creation of own functions, triggers, data types, etc. possible.
Supports JSON.
Cross-platform.

Cons

Expandable documentation only available in English.
Comparatively low reading speed.
Complex data replication mechanics and sometime unstable.

HBase

NoSQL database of the key-value family is popular in projects involving the storage of huge amounts of data with fast access to them. Supports sharding out of the box. Has some implementation quirks that sometimes lead to unexpected memory leaks.

Pros

HBase can handle as well as stores large datasets on top of HDFS file storage. Moreover, it aggregates and analyzes billions of rows present in the HBase tables.
There is a time when relational databases break down, then HBase shines in the picture.
As compared to traditional dataBase, data reading and processing in HBase will take a small amount of time.
Scalability is supported in both linear and modular form, in HBase.
There is no concept of fixed columns schema in HBase because it is schema-less. Hence, it defines only column families.

Cons

In HBase, there is no support for the transaction.
Instead of the database itself, JOINs are handled in the MapReduce layer.
As RDBMS can be indexed on some arbitrary field, HBase is indexed and sorted only on key.
There are no permissions or built-in authentication.
As there is no support for SQL structure, it cannot contain any query optimizer.
In some time memory issues on the cluster, HBase is integrated with Pig and Hive jobs results.
Poor support for non-Java clients

Clickhouse

An open source column-oriented database management system developed by Yandex (it currently powers Yandex.Metrica, the world’s second-largest web analytics platform). Well suited for storing and uploading large amounts of small amounts of data, such as server logs, activity analytics, and so on.

Pros

Open source.
Parallel processing for single query (utilizing multiple cores).
Distributed processing on multiple servers.
Good compression.
SQL support (with limitations).
Good set of functions, including support for approximated calculations.
Great for structural log/event data as well as time series data (engine MergeTree requires date field).
Index support (primary key only, not all storage engines)

Cons

No real delete/update support, and no transactions (same as Spark and most of the big data systems).
No secondary keys (same as Spark and most of the big data systems).
Own protocol (no MySQL protocol support).
Limited SQL support, and the joins implementation is different. If you are migrating from MySQL or Spark, you will probably have to re-write all queries with joins.

ManticoreSearch Engine

A fork of the popular search engine Sphinx, the first release was released in 2017, it has a high speed, supports JSON, can work in HTTP server mode, and supports SQL-like query syntax. Well suited both for storing data that needs quick access, and in search engine mode.

Pros

Open source.
Fast search on large (multi GB) datasets.
Very fast indexing and small index size.
Good relevance, can be tuned even further with field weighting and proximity limits.
Support for morphology; soundex & porter stemmers (EN & RU) included.
Percolate index.
Simple data replication.

Cons

Supports only positive integers for grouping/filtering.
Not quite as simple as ALTER TABLE ADD FULLTEXT.

Redis

One of the most popular key-value databases, which are usually used for caching the content of web pages and as well as for organizing a queuing system, for example, for chat applications. It can work in cluster mode, which, combined with an emphasis on in-memory, allows you to achieve very high speed and high availability.

Pros

Open source.
In-memory cache.
Pub/Sub – this is a special feature of Redis that uses the publish/subscribe messaging paradigm and it is suitable for implementing chatting related use cases.
Besides String data type, it supports more complex data structures such as lists, hashes, sets and sorted sets.
Provides a master-slave distributed system called Redis Sentinel to guarantee high availability.
Cross-platform.

Cons

Redis is an in-memory database, which means that the whole dataset should reside in the memory (RAM).
Isn’t intended for rich queries since it is a key-value database.
Supports only basic security options.
Persistence can impact performance since Redis will use memory dump to create snapshots used for persistence.