Essential Knowledge for Using MySQL

MySQL, as a relational database, is a complex piece of software.

It handles client connections, parses queries, performs query optimization, and interacts with storage engines like InnoDB through APIs. Fully understanding software that evolves as constantly as MySQL is nearly impossible, but there are key concepts that you should grasp.

This article is aimed at beginner application developers, junior-level DBAs, and SREs. For senior engineers with several years of experience, this is basic knowledge. Consider this article as a collection of essential keywords from a "Database Management System 101" course in Computer Science.


First, understand that the MySQL server and storage engine are separate components. The implementation of the storage engine determines aspects like query optimization possibilities, deadlock prevention mechanisms, and configurable isolation levels. The default storage engine is the highly versatile InnoDB. On the other hand, the MySQL server handles connection management and query parsing. The MySQL server and storage engine interact through an abstracted API with loose coupling, ensuring high extensibility. This knowledge is crucial because it helps you understand, for example, that row locks are implemented on the storage engine side, not the server side, which deepens your understanding of various features.
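The server/engine split can be pictured with a small Python sketch. Every name here (StorageEngine, InMemoryEngine, Server, read_row, write_row) is hypothetical and chosen for illustration; this is not MySQL's actual storage engine API. The only point is that the server layer talks to an interface, so the engine behind it can be swapped.

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical engine interface; NOT MySQL's real storage engine API."""

    @abstractmethod
    def write_row(self, table, key, row):
        ...

    @abstractmethod
    def read_row(self, table, key):
        ...


class InMemoryEngine(StorageEngine):
    """Toy engine that keeps rows in nested dictionaries."""

    def __init__(self):
        self.tables = {}

    def write_row(self, table, key, row):
        self.tables.setdefault(table, {})[key] = dict(row)

    def read_row(self, table, key):
        return self.tables.get(table, {}).get(key)


class Server:
    """Toy server layer: it knows the interface, not the engine internals."""

    def __init__(self, engine):
        self.engine = engine

    def insert(self, table, key, row):
        self.engine.write_row(table, key, row)

    def select(self, table, key):
        return self.engine.read_row(table, key)
```

Anything like row locking or on-disk layout would live inside the engine class, which is why such features differ from one storage engine to another.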

Next, let's cover Read Locks and Write Locks, also known as Shared Locks and Exclusive Locks. These concepts are not difficult. As long as the data is not modified, a read returns the same result no matter when or by whom it is performed, so readers have no need to block one another. A read lock can therefore be held by many transactions at once, which is why it is called a Shared Lock. A write operation, on the other hand, changes the data. To ensure consistency, it requires an Exclusive Lock, which prevents other transactions from acquiring a read or write lock on the same data at the same time.
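The rule fits in a tiny compatibility table. The following is a minimal sketch in Python; the names COMPATIBLE and can_grant are made up for this example and do not correspond to anything inside MySQL.

```python
# Lock modes: "S" = shared (read), "X" = exclusive (write).
# Keys are (mode already held, mode being requested).
COMPATIBLE = {
    ("S", "S"): True,   # many readers can hold the lock together
    ("S", "X"): False,  # a writer must wait for existing readers
    ("X", "S"): False,  # a reader must wait for the writer
    ("X", "X"): False,  # writers never overlap
}


def can_grant(held_modes, requested):
    """Return True if `requested` is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)
```

For example, can_grant(["S", "S"], "S") is True, while can_grant(["S"], "X") is False: the writer has to wait until the readers are done.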

Understanding the granularity of locking is also important. Finer granularity reduces interference between concurrent operations, but it also means more locks to acquire, manage, and release, which costs memory and CPU cycles. This is a trade-off. Table Locks, which lock the entire table, are simple to understand but significantly degrade write performance for frequently updated tables. Row Locks, which lock only the individual rows being touched, allow far more concurrency and are the more natural choice for such workloads.
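To make the trade-off concrete, here is a rough Python sketch that counts how many concurrent writes would have to wait under each granularity, and how many locks need to be tracked. The functions lock_key and count_conflicts are hypothetical helpers for this illustration, not a model of how any real engine schedules locks.

```python
def lock_key(table, row_id, granularity):
    """Return the resource a writer must lock under the given granularity."""
    return (table,) if granularity == "table" else (table, row_id)


def count_conflicts(writes, granularity):
    """Return (writes that must wait, number of distinct locks tracked).

    `writes` is a list of (table, row_id) pairs issued by concurrent
    transactions, all assumed to be in flight at the same time.
    """
    locked = set()
    conflicts = 0
    for table, row_id in writes:
        key = lock_key(table, row_id, granularity)
        if key in locked:
            conflicts += 1  # someone else already holds this resource
        else:
            locked.add(key)
    return conflicts, len(locked)
```

With three writers updating three different rows of the same table, table locking yields two waiting writers and one lock, while row locking yields no waiting writers but three locks to manage.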

Understanding transactions is also essential. The ACID properties (Atomicity, Consistency, Isolation, Durability) are fundamental concepts covered in every database textbook and blog. You should at least have a rough understanding of what each property means.
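Atomicity is the easiest property to see in action. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a MySQL server; the accounts table and the transfer helper are invented for this example. The idea carries over to any transactional engine: either both updates take effect, or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # undo the credit that had already succeeded
        return False


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

A transfer of 500 from account 1 violates the balance check on the second update, so the rollback also cancels the first update and the balances stay unchanged.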

Another key topic is Isolation Levels. This topic can be complex because of implementation details and standard specifications, but familiarity with the key terms is enough initially. The ANSI SQL standard defines these levels:

  • READ UNCOMMITTED: Rarely used in practice.
  • READ COMMITTED: The default level in many other databases, such as PostgreSQL and Oracle.
  • REPEATABLE READ: The default in MySQL (InnoDB), ensuring that repeated reads within a transaction see consistent data.
  • SERIALIZABLE: The strictest isolation level.

When learning about Isolation Levels, you will naturally encounter common consistency issues that they help mitigate. These include:

  • Dirty Reads: Reading uncommitted data.
  • Non-Repeatable Reads: Getting different results from the same query within a transaction.
  • Phantom Reads: New rows appearing in a result set when using range queries.
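How the levels and the anomalies relate can be written down as a small lookup table. This Python sketch encodes what the ANSI SQL standard permits at each level; the names ALLOWED_ANOMALIES and prevents are made up for this example. Real engines can be stricter than the standard requires, so treat it as the baseline, not as a description of InnoDB's exact behavior.

```python
# Anomalies that each level may still exhibit under the ANSI SQL definitions.
ALLOWED_ANOMALIES = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom read"},
    "READ COMMITTED": {"non-repeatable read", "phantom read"},
    "REPEATABLE READ": {"phantom read"},
    "SERIALIZABLE": set(),
}


def prevents(level, anomaly):
    """Return True if the standard guarantees `anomaly` cannot occur at `level`."""
    return anomaly not in ALLOWED_ANOMALIES[level]
```

Each step up the list removes one anomaly, which is why stricter levels generally cost more concurrency.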

Also, remember that InnoDB implements Two-Phase Locking (2PL), a concept often covered in Computer Science courses.
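The core rule of 2PL is that a transaction first only acquires locks (the growing phase) and, once it has released any lock, never acquires another (the shrinking phase). Here is a toy Python sketch of that rule; the class TwoPhaseTransaction is hypothetical and says nothing about how InnoDB implements it internally.

```python
class TwoPhaseTransaction:
    """Toy transaction that enforces the two-phase rule on lock calls."""

    def __init__(self):
        self.held = set()
        self.shrinking = False  # becomes True after the first release

    def acquire(self, resource):
        """Growing phase: allowed only while nothing has been released."""
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot acquire after a release")
        self.held.add(resource)

    def release(self, resource):
        """Shrinking phase: from now on, no new locks may be taken."""
        self.shrinking = True
        self.held.discard(resource)

    def commit(self):
        """Release everything at the end and return what was released."""
        self.shrinking = True
        released = sorted(self.held)
        self.held.clear()
        return released
```

Holding all locks until commit, as the commit method does here, is the common strict variant: the shrinking phase happens all at once at the end of the transaction.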

With modern database deployments increasingly using multi-node clusters, it's important to understand Leader/Follower replication and data synchronization mechanisms. Under the hood, MySQL replication is based on Binary Logs (binlogs) and is asynchronous by default. Replication is therefore not truly real-time: followers always lag somewhat behind the leader, from a few milliseconds under good conditions to seconds or more, depending on factors like write load, cross-region deployment, and network conditions.
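The effect of asynchronous, log-based replication can be imitated in a few lines of Python. The Leader and Follower classes below are hypothetical toys: the list named binlog merely stands in for the real binary log, and nothing here reflects MySQL's actual replication protocol.

```python
class Leader:
    """Toy leader: applies writes locally and records them in an ordered log."""

    def __init__(self):
        self.data = {}
        self.binlog = []  # ordered (key, value) change events

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))  # returns without waiting for followers


class Follower:
    """Toy follower: replays the leader's log at its own pace."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # how many log events have been applied so far

    def lag(self, leader):
        """Number of events the follower is behind the leader."""
        return len(leader.binlog) - self.applied

    def catch_up(self, leader, max_events=None):
        """Apply pending events, optionally only the next `max_events`."""
        end = len(leader.binlog)
        if max_events is not None:
            end = min(end, self.applied + max_events)
        for key, value in leader.binlog[self.applied:end]:
            self.data[key] = value
        self.applied = end
```

Until catch_up runs, a read on the follower returns stale data, which is exactly the situation an application has to tolerate when it reads from replicas.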

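To ensure Durability, databases must persist data to disk, but writing every modified data page to its final location on each change would mean many slow random writes. Instead, MySQL (InnoDB) uses Write-Ahead Logging (WAL): each change is first appended to a sequential log (the redo log in InnoDB), and that log is flushed to disk, while the modified data pages stay buffered in memory and are written to the data files later in batches. After a crash, committed changes can be recovered by replaying the log. Many of you may have heard of WAL before.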
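The essence of WAL is the ordering: log first, data pages later. The following Python sketch is a deliberately simplified model with made-up names (TinyWAL, commit, checkpoint, crash_and_recover); plain lists and dictionaries stand in for the log file, the buffer pool, and the data files.

```python
class TinyWAL:
    """Toy write-ahead log; in-memory structures stand in for real files."""

    def __init__(self):
        self.log = []    # stands in for the durable, append-only log file
        self.pages = {}  # in-memory copy of the data pages
        self.disk = {}   # stands in for the data files on disk

    def commit(self, key, value):
        self.log.append((key, value))  # 1. append to the log first
        self.pages[key] = value        # 2. then change the in-memory page

    def checkpoint(self):
        self.disk.update(self.pages)   # write buffered pages out in one batch
        self.log.clear()               # those log entries are no longer needed

    def crash_and_recover(self):
        self.pages = dict(self.disk)   # memory is lost; reload the data files
        for key, value in self.log:    # replay the log to redo recent changes
            self.pages[key] = value
```

A change committed after the last checkpoint exists only in the log and in memory, yet it survives the simulated crash because recovery replays the log.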

Finally, understand indexes. Simply put, an index is another representation of a table designed to speed up lookups. However, indexes consume storage space and are not a silver bullet. Adding an index does not automatically solve all performance problems, so be cautious.
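As a rough picture of what an index buys and what it costs, compare a full scan with a lookup through a sorted side structure. In this Python sketch the rows, the email values, and the helper names are all made up, and a sorted list searched with bisect stands in for the B-tree that a real engine would use.

```python
import bisect

# Table rows in insertion order: (id, email). The data is invented.
ROWS = [
    (3, "carol@example.com"),
    (1, "alice@example.com"),
    (2, "bob@example.com"),
]


def full_scan(rows, email):
    """Without an index: check rows one by one until a match is found."""
    for row in rows:
        if row[1] == email:
            return row
    return None


def build_index(rows):
    """Build a sorted list of (email, row position) pairs; this is extra storage."""
    return sorted((row[1], pos) for pos, row in enumerate(rows))


def index_lookup(rows, index, email):
    """With an index: binary-search the sorted entries, then fetch the row."""
    i = bisect.bisect_left(index, (email,))
    if i < len(index) and index[i][0] == email:
        return rows[index[i][1]]
    return None
```

The index answers lookups by email quickly, but it is a second copy of that column that takes space and would have to be updated on every insert, update, or delete.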

There is still much to learn about MySQL, including DDL, .ibd files, metadata, storage engine-specific features, and monitoring best practices. You will also need to learn migration strategies and effective indexing techniques as you gain experience. However, the topics covered here represent the minimum foundational knowledge that every engineer, including junior-level developers, should have when discussing MySQL.

If you're interested in a deeper dive, the first half of High Performance MySQL covers these concepts well. It's an excellent read.

2023-06-25