Graph Database 1/3 - What is a DBMS?

In this series, I aim to explain what a graph database is in a way that is understandable not only for those new to software engineering but also for people coming from other industries.

In this first article, I will explain what a database is in the first place.

What is a Database?

So, what exactly is a database?

If you're a web engineer, you're probably familiar with relational databases (RDBMS) like MySQL and PostgreSQL. You may also frequently use key-value stores (KVS) such as Memcached or Redis. Depending on the use case, you might work with document-oriented databases like MongoDB or DynamoDB, column-oriented databases, or even graph databases like Neo4j.

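In simple terms, a database is software optimized for storing, managing, and analyzing data. Strictly speaking, that software is a database management system (DBMS) and the "database" is the data it manages, but in everyday usage "database" covers both. A DBMS is an application like any other, typically written in a language such as C, C++, or Java.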

A database accepts whatever data types it supports. The specific types vary by database, ranging from strings and numbers to JSON and BLOB types, but at its core a database stores bytes, usually on disk. It is simply software that lets you create, read, update, and delete that data.

The Simplest Database

If all you need is to store data, you could create a simple text file, and it wouldn’t be wrong to call it a "database." For example, a simple TODO list represented as a text file might look like this:

$ cat mytodo.db
1,buy milk
2,clean up desk
3,read books

Here, mytodo.db is the "database" we created. If you want to add data to this "database," you can simply do:

$ echo "4,go to the gym" >> mytodo.db

Of course, just adding data isn’t enough. If you want to retrieve the entry with key 4, you could use:

$ grep "^4," mytodo.db | cut -d, -f2
go to the gym

With this, we now have a basic "database" that can store and retrieve data. Simple, right?
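
Updating and deleting can be sketched in the same spirit. The helpers below are hypothetical (the names todo_update and todo_delete are mine, not part of any tool) and assume the one "key,value" record per line format of mytodo.db. Treat them as a minimal sketch rather than a robust implementation:

```shell
# Hypothetical helpers for the mytodo.db format: one "key,value" record per line.

# Replace the value stored under a key by rewriting the whole file.
todo_update() {
  file=$1 key=$2 value=$3
  awk -F, -v key="$key" -v value="$value" \
    '$1 == key { print key "," value; next } { print }' "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}

# Delete the record stored under a key, again by rewriting the whole file.
todo_delete() {
  file=$1 key=$2
  awk -F, -v key="$key" '$1 != key' "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}

# Usage (run from the directory that holds mytodo.db):
#   todo_update mytodo.db 2 "tidy up desk"
#   todo_delete mytodo.db 3
```

Note that both helpers copy the entire file just to change a single record, which is one reason this approach stops being practical as the data grows.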

Why Do We Need Databases?

Now, can we build a payment application, an analytics platform, an autonomous driving system, or a machine learning system with this simple "database"?

The reality is that as soon as we start building an application, we realize that a plain text file does not meet most business needs. That’s because most services today need to store, query, and analyze massive amounts of data.

A simple file-based system like the one above stays usable only while the data is small, perhaps up to tens of thousands of records. Finding a specific key means scanning the records one by one, which takes O(N) time: search time grows linearly with data size. As the data grows and more requests arrive at once, every lookup pays that full cost, which rules out real-time responses at scale.
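
To make the linear cost concrete, here is a small sketch that counts how many lines a scan has to read before it finds a key. The function name scan_cost is hypothetical, and the file format is assumed to be the same "key,value" layout as mytodo.db:

```shell
# Print how many lines a linear scan reads before it finds the key
# (or the total number of lines when the key is absent).
scan_cost() {
  file=$1 key=$2
  # awk stops at the first match; NR is the number of lines read so far.
  awk -F, -v key="$key" '$1 == key { exit } END { print NR }' "$file"
}

# Usage, with the four-line mytodo.db from above:
#   scan_cost mytodo.db 1     # prints 1
#   scan_cost mytodo.db 4     # prints 4
#   scan_cost mytodo.db 99    # prints 4 (key not found, whole file read)
```

The further down the file a key sits, the more lines are read, and a missing key always costs a full pass. The grep command shown earlier does not stop at the first match, so it reads the whole file on every lookup.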

What Databases Need to Provide

To address these challenges, a database must:

  1. Efficiently create and search large volumes of data – This is achieved through optimized data structures and algorithms. Indexes, tree structures, memory utilization, and hardware-level optimizations all contribute to better performance.
  2. Optimize storage costs – Storing large amounts of data requires significant space, and storage costs money, so databases use techniques such as compression and compact encodings to keep the footprint down.
  3. Improve development efficiency – Even the fastest and most cost-effective database will go unused if it lacks visualization tools, a flexible query language, or debugging capabilities.
  4. Scale with business growth – As applications grow, databases must scale accordingly. A database that cannot scale will eventually be abandoned, no matter how well it performs.

To solve real-world business problems, a database must perform well enough on all of these fronts at once. Achieving this takes years of research and meticulous engineering.
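
As a toy illustration of the first point, the sketch below reads the file once into an awk associative array and then answers several lookups from that array instead of rescanning the file for each key. The function name todo_lookup_many is hypothetical, and the array is only a stand-in for a real index: actual databases typically persist structures such as B-trees on disk and keep them up to date as data changes.

```shell
# Answer several lookups with a single pass over a "key,value" file.
# awk's associative array plays the role of an in-memory index.
todo_lookup_many() {
  file=$1
  shift
  awk -F, -v keys="$*" '
    # Build the index: key -> everything after the first comma.
    { value[$1] = substr($0, length($1) + 2) }
    END {
      # Each lookup is now an array access, not another scan of the file.
      n = split(keys, wanted, " ")
      for (i = 1; i <= n; i++)
        if (wanted[i] in value) print value[wanted[i]]
    }
  ' "$file"
}

# Usage, with the four-line mytodo.db from above:
#   todo_lookup_many mytodo.db 4 1
# prints "go to the gym" and then "buy milk".
```

Building the array still costs one full pass, so the benefit only shows up when many lookups share it. That is the trade an index makes: extra work and space up front in exchange for cheaper searches later.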

The Ultimate Database

So, does the perfect database exist? One that:

  • Searches, creates, and updates data faster than any other
  • Stores huge volumes of data effortlessly
  • Is cost-efficient
  • Scales seamlessly with business growth
  • Requires minimal operational maintenance

That would be amazing! But as far as I know, such a database does not exist. If it did, the entire database industry would collapse.

Maybe technology hasn’t advanced enough, or perhaps business needs are too diverse—or both. Either way, there is no single "best" database that excels in every aspect.

Instead, the database landscape is constantly evolving, with different solutions specializing in different areas:

  • Some databases optimize for search speed but are slower at inserts and updates.
  • Others prioritize fast writes but have slow query performance.
  • Some focus on data integrity and consistency, while others embrace schema-less flexibility at the cost of operational complexity.
  • Some scale effortlessly across thousands of nodes but sacrifice strict consistency.
  • Some struggle with batch processing but excel at handling complex relationships between data.

Each database is designed to address specific business and technical needs.

In reality, engineering is all about trade-offs, and the same applies to databases.

To Be Continued

In the next article, we will explore different types of DBMS.

2022-03-21