In this series, I aim to explain what a graph database is in a way that is understandable not only for those new to software engineering but also for people coming from other industries.
In this first article, I will explain what a database is in the first place.
So, what exactly is a database?
If you're a web engineer, you're probably familiar with relational database management systems (RDBMS) like MySQL and PostgreSQL. You may also frequently use key-value stores (KVS) such as Memcached or Redis. Depending on the use case, you might work with document-oriented databases like MongoDB or DynamoDB, column-oriented databases, or even graph databases like Neo4j.
In simple terms, a database is software optimized for storing, managing, and analyzing data. It is an application written in languages like C, C++, or Java.
As input, a database accepts any type of data it supports. The specific data types vary by database, ranging from strings and numbers to JSON and BLOB types, but at its core a database stores byte strings on disk. It is simply software that lets you create, read, update, and delete that data.
If all you need is to store data, you could create a simple text file, and it wouldn’t be wrong to call it a "database." For example, a simple TODO list represented as a text file might look like this:
$ cat mytodo.db
1,buy milk
2,clean up desk
3,read books
Here, mytodo.db is the "database" we created. If you want to add data to this "database," you can simply do:
$ echo "4,go to the gym" >> mytodo.db
Of course, just adding data isn’t enough. If you want to retrieve the entry with key 4, you could use:
$ grep "^4," mytodo.db | cut -d, -f2
go to the gym
With this, we now have a basic "database" that can store and retrieve data. Simple, right?
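To make the idea a little more concrete, the commands above can be wrapped into small shell functions. This is only a sketch: the names todo_add, todo_get, and todo_delete and the TODO_DB variable are made up for illustration, and it assumes a POSIX shell with awk and mktemp available.

```shell
# Hypothetical helpers around the text-file "database" (names are illustrative).
TODO_DB="${TODO_DB:-mytodo.db}"

# Create: append one "key,value" record to the end of the file.
todo_add() {
  printf '%s,%s\n' "$1" "$2" >> "$TODO_DB"
}

# Read: print the value of every record whose first field equals the key.
# Both sides are forced to strings so that "04" does not match "4".
todo_get() {
  awk -F, -v key="$1" '($1 "") == (key "") { print substr($0, length($1) + 2) }' "$TODO_DB"
}

# Delete: copy every record except the matching one to a temporary file,
# then replace the original file with it.
todo_delete() {
  tmp=$(mktemp) &&
  awk -F, -v key="$1" '($1 "") != (key "")' "$TODO_DB" > "$tmp" &&
  mv "$tmp" "$TODO_DB"
}
```

With these in place, `todo_add 4 "go to the gym"` followed by `todo_get 4` behaves like the echo and grep commands above. Unlike `cut -d, -f2`, the substr call returns everything after the first comma, so a value that itself contains a comma is not truncated.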
Now, can we build a payment application, an analytics platform, an autonomous driving system, or a machine learning system with this simple "database"?
The reality is that as soon as we start building an application, we realize that a plain text file does not meet most business needs. That’s because most services today need to analyze massive amounts of data.
A simple file-based system like the one above might handle tens of thousands of records at best. Searching for a specific key requires scanning the records one by one (O(N) time complexity), so the search time grows linearly with the data size. At scale, this makes real-time responses impractical, and no one would use such a system.
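The linear cost is easy to observe directly. The sketch below builds a throwaway file of 100,000 records and counts how many lines a sequential scan reads before it finds a given key. The function name lines_scanned and the record layout are assumptions made for this illustration, and the file is created with mktemp so it does not touch mytodo.db.

```shell
# Build a temporary "database" with 100000 records of the form "key,task key".
db=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 100000; i++) print i ",task " i }' > "$db"

# Scan from the top and report the line number where the key is found,
# which is the number of lines that had to be read to find it.
lines_scanned() {
  awk -F, -v key="$1" '$1 == key { print NR; exit }' "$db"
}

lines_scanned 10       # an early key: 10 lines read
lines_scanned 100000   # the last key: all 100000 lines read
```

The work grows in step with the position of the record, and a missing key always costs a full pass over the file. Real databases avoid this by maintaining index structures so that a lookup touches only a small fraction of the data.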
To address these challenges, a database must:
To solve real-world business problems, a database must excel in all these aspects while remaining efficient in handling data. Achieving this requires years of research and meticulous engineering efforts.
So, does the perfect database exist? One that:
That would be amazing! But as far as I know, such a database does not exist. If it did, the entire database industry would collapse.
Maybe technology hasn’t advanced enough, or perhaps business needs are too diverse—or both. Either way, there is no single "best" database that excels in every aspect.
Instead, the database landscape is constantly evolving, with different solutions specializing in different areas:
Each database is designed to address specific business and technical needs.
In reality, engineering is all about trade-offs, and the same applies to databases.
In the next article, we will explore different types of DBMS.