Graph Database 3/3 - What Is a Graph DB?

In this series, I aim to explain what a graph database is in a way that is understandable not only for those new to software engineering but also for people coming from other industries.

The third article in this series will introduce graph databases.

Graph Database

Let’s finally dive into the main topic of this series.

What exactly is a graph database?

Simply put, it is "a database optimized for solving problems that can be represented using graph theory."

Graph theory is a branch of mathematics formalized by Leonhard Euler in the 18th century. It is said to have originated from his research on the Seven Bridges of Königsberg problem.

In the early 18th century, Königsberg (now Kaliningrad, Russia), the capital of East Prussia in the Kingdom of Prussia, was a large city with the Pregel River flowing through its center. Seven bridges crossed the river, and townspeople wondered:

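“Is it possible to cross all seven bridges exactly once and return to the starting point?”

Euler proved in 1736 that no such route exists, and the way he framed the problem, with land masses as points and bridges as connections between them, became the foundation of graph theory.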
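Euler's argument comes down to counting bridges per land mass: in a connected graph, a round trip that uses every edge exactly once can only exist if every node touches an even number of edges. The sketch below checks that condition for Königsberg. The single-letter labels for the land masses and the function names are chosen here for illustration.

```python
from collections import Counter

# The four land masses, labelled for illustration:
# N = north bank, S = south bank, K = Kneiphof island, E = eastern island.
# Each pair is one bridge; repeated pairs are parallel bridges.
BRIDGES = [
    ("N", "K"), ("N", "K"),
    ("S", "K"), ("S", "K"),
    ("N", "E"), ("S", "E"),
    ("K", "E"),
]


def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    count = Counter()
    for a, b in edges:
        count[a] += 1
        count[b] += 1
    return count


def has_closed_euler_walk(edges):
    """Euler's criterion: every degree is even.

    Assumes the graph is connected; connectivity is not checked here.
    """
    return all(d % 2 == 0 for d in degrees(edges).values())


print(dict(degrees(BRIDGES)))          # every land mass has an odd degree
print(has_closed_euler_walk(BRIDGES))  # False
```

Because all four land masses have an odd number of bridges, the walk the townspeople were looking for cannot exist.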

Some readers may not be familiar with graph structures, but surprisingly, many real-world problems can be represented using graphs.

Here are some common examples:

  • Web Link Graphs

    • If web pages are represented as "nodes" and links as "edges," we can analyze page similarity, correlations, popular pages, and even detect fraudulent link schemes.
  • Social Networks

    • In platforms like Twitter or Facebook, accounts can be represented as "nodes" and follower relationships as "edges." This allows for analyzing human relationships and identifying influential accounts.
  • Financial Transactions

    • Banks and personal accounts can be represented as "nodes," while money transfers act as "edges." This helps analyze financial flows, detect fraud, and prevent unauthorized transactions in credit card systems or online payments.
  • Recommendation Engines

    • In e-commerce, products, categories, and users can be represented as "nodes," while purchase history forms "edges." This allows clustering users with similar buying behavior and identifying unexpected correlations between products for better marketing and recommendations.
  • Mapping Applications

    • Locations (stores, houses) can be represented as "nodes," while roads act as "edges." This is useful for finding optimal delivery routes, analyzing regional demographics, and simulating disaster response scenarios.
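To make the "nodes" and "edges" vocabulary concrete, here is a minimal sketch of the social network case: a follow relationship is stored as a pair of account names, and the most influential account is approximated as the one with the most incoming edges. The account names and the helper functions are made up for illustration, and follower count is only one rough proxy for influence.

```python
from collections import Counter

# Hypothetical follow relationships, each stored as (follower, followee).
FOLLOWS = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "alice"),
    ("bob", "alice"),
]


def follower_counts(edges):
    """Return the in-degree (number of followers) of each account."""
    return Counter(followee for _, followee in edges)


def most_followed(edges, n=1):
    """Return the n accounts with the most followers, largest first."""
    return follower_counts(edges).most_common(n)


print(most_followed(FOLLOWS, n=2))  # [('carol', 3), ('alice', 2)]
```

The other examples in the list have the same shape: only the meaning of a node and an edge changes.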

More recently, graph theory has been applied in ML/AI contexts such as Knowledge Graphs and Context AI, proving its value in various industries.

Why Use a Graph Database?

So, why should you use a graph database?

The answer is simple: if your business problem or system requirement can be naturally expressed using graph theory, then a graph database is a strong candidate and often the best choice.

Social Network Example

A clear example is the design of a social network.

Suppose you're tasked with implementing a timeline feature. What use cases come to mind?

Think of Twitter—users have follower/followee relationships, and there are features like likes on posts. These requirements define the business logic, which needs to be implemented efficiently.

For data modeling, you might create a User table, a Follow table with foreign keys representing follow relationships, a Tweet table storing post content, and a Like table recording which users liked which tweets. This relational model is straightforward.

Relational Databases Can Work—But...

It’s not difficult to model social networks using a relational database. The relational data model is highly expressive and can efficiently represent many graph-like problems.

However, challenges arise during real-world operation.

Imagine implementing the Twitter timeline feature in an RDBMS. When a user logs in, what should they see? Likely, they’ll see tweets from accounts they follow, ranked by recency and popularity.

A query that builds this timeline has to:

  1. Retrieve the user’s User ID using a WHERE clause.
  2. Use a JOIN with the Follow table to get the list of followed accounts.
  3. Use another JOIN with the Tweet table to fetch recent posts.
  4. Use yet another JOIN with the Like table to count likes.

As you can see, queries quickly become complex with multiple JOINs.
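The four steps above can be sketched as a single SQL statement. The example below uses Python's built-in sqlite3 module with an in-memory database. The table names (users, follows, tweets, likes), the column names, and the sample rows are assumptions made for illustration, not the schema of any real service.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a tiny, made-up data set."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users   (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE follows (follower_id INTEGER REFERENCES users(id),
                              followee_id INTEGER REFERENCES users(id),
                              PRIMARY KEY (follower_id, followee_id));
        CREATE TABLE tweets  (id INTEGER PRIMARY KEY,
                              author_id INTEGER REFERENCES users(id),
                              body TEXT NOT NULL,
                              created_at TEXT NOT NULL);
        CREATE TABLE likes   (user_id INTEGER REFERENCES users(id),
                              tweet_id INTEGER REFERENCES tweets(id),
                              PRIMARY KEY (user_id, tweet_id));
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "alice"), (2, "bob"), (3, "carol")])
    # alice follows bob and carol.
    conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3)])
    conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?)", [
        (1, 2, "hello from bob", "2022-03-01"),
        (2, 3, "carol here", "2022-03-02"),
        (3, 1, "alice's own post", "2022-03-03"),
    ])
    # Tweet 1 has two likes, tweet 2 has one.
    conn.executemany("INSERT INTO likes VALUES (?, ?)", [(1, 1), (3, 1), (1, 2)])
    return conn


TIMELINE_SQL = """
    SELECT t.body, COUNT(l.user_id) AS like_count
    FROM users AS me                                  -- step 1: find the user
    JOIN follows AS f ON f.follower_id = me.id        -- step 2: followed accounts
    JOIN tweets  AS t ON t.author_id = f.followee_id  -- step 3: their tweets
    LEFT JOIN likes AS l ON l.tweet_id = t.id         -- step 4: likes per tweet
    WHERE me.name = ?
    GROUP BY t.id, t.body, t.created_at
    ORDER BY t.created_at DESC, like_count DESC
    LIMIT ?
"""


def timeline(conn, username, limit=20):
    """Return (body, like_count) rows for tweets from followed accounts."""
    return conn.execute(TIMELINE_SQL, (username, limit)).fetchall()


conn = build_demo_db()
print(timeline(conn, "alice"))  # [('carol here', 1), ('hello from bob', 2)]
```

Even this simplified version touches four tables, and it does not yet handle retweets, blocked accounts, or a real popularity ranking.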

Poor Scalability

Complex queries are not just difficult to write—they must also execute quickly to maintain user satisfaction. Moreover, they must scale as the user base grows.

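Queries with multiple JOINs are often computationally expensive: each JOIN typically requires index lookups or scans over another table, and once the working data no longer fits in memory these turn into repeated disk I/O operations. Disk access is much slower than memory access, so query response times degrade as the number of JOINs and the size of the tables increase.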

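The more hops a query follows through the follower network (for example, followers of followers), the more data needs to be traversed, and each extra hop can multiply the number of rows involved. If query cost grows exponentially with depth, the timeline feature could become unbearably slow, causing users to leave the platform.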
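A rough back-of-the-envelope calculation shows why depth matters. The sketch below assumes, purely for illustration, that every account follows the same number of accounts and that no two paths lead to the same account, so it gives an upper bound rather than a measurement.

```python
def rows_at_depth(fan_out: int, hops: int) -> int:
    """Upper bound on accounts reached at exactly `hops` follow steps.

    Assumes a uniform fan-out and no overlap between paths.
    """
    return fan_out ** hops


# With 100 followed accounts each, every extra hop multiplies the work by 100.
for hops in range(1, 5):
    print(hops, rows_at_depth(100, hops))
```

At four hops the bound is already 100,000,000 rows, which is why multi-hop queries need a storage layout that makes each individual hop as cheap as possible.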

How Graph Databases Solve This Problem

Graph databases (especially those with native graph implementations) not only make modeling these relationships easier but also keep the cost of a traversal roughly proportional to the part of the graph the query actually visits, rather than to the total size of the dataset.

Why?

It’s not magic. Native graph databases lay out storage so that each node record points directly to its relationships, much like a linked list on disk (a design often called index-free adjacency), which allows rapid traversal of related data.

Additional optimizations include:

  • Query execution planning tailored for graph structures.
  • Memory-optimized secondary data structures for fast lookups.
  • Cluster configurations that ensure fault tolerance while maintaining high-speed queries.

In native graph databases, relationships are stored as records in their own right alongside the nodes they connect, so following a relationship does not require a JOIN or a separate index lookup.
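The idea of following direct references instead of joining tables can be illustrated with plain in-memory objects. This is only an analogy under simplifying assumptions: the Account and Tweet classes and the timeline function are hypothetical, and a real graph database implements the same principle with on-disk records and its own query language.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity; the graph may contain cycles
class Tweet:
    body: str
    created_at: str
    liked_by: list[Account] = field(default_factory=list)


@dataclass(eq=False)
class Account:
    name: str
    following: list[Account] = field(default_factory=list)  # direct references
    tweets: list[Tweet] = field(default_factory=list)


def timeline(me: Account, limit: int = 20) -> list[tuple[str, int]]:
    """Walk me -> followed accounts -> their tweets, newest first."""
    posts = [tweet for account in me.following for tweet in account.tweets]
    posts.sort(key=lambda tweet: tweet.created_at, reverse=True)
    return [(tweet.body, len(tweet.liked_by)) for tweet in posts[:limit]]


alice, bob, carol = Account("alice"), Account("bob"), Account("carol")
alice.following = [bob, carol]
bob.tweets.append(Tweet("hello from bob", "2022-03-01", liked_by=[alice, carol]))
carol.tweets.append(Tweet("carol here", "2022-03-02", liked_by=[alice]))

print(timeline(alice))  # [('carol here', 1), ('hello from bob', 2)]
```

Note that the traversal only ever touches the accounts alice follows and their tweets. Accounts elsewhere in the graph add nothing to the cost of this query.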

Are Graph Databases the Ultimate Solution?

Are graph databases the ultimate solution?

No. As mentioned earlier, every database type has its strengths and weaknesses. Understanding these trade-offs is key to choosing the right tool for your application.

However, one thing is certain: for relationship-heavy queries, graph databases offer advantages that relational databases and other kinds of NoSQL stores struggle to match. When used in the right context, they deliver excellent performance.

2022-03-23