Neo4j Page Cache
2021-11-12
A technical overview of Neo4j's page cache system, exploring its architecture, memory management strategies, and performance optimizations that minimize disk I/O while providing efficient data access for graph database operations.
This article explains the Page Cache in the Neo4j database.
About Page Cache
Since Neo4j is a database management system (DBMS), data is stored on disk. However, reading from the disk every time a user makes a query would be extremely slow, and write performance would also suffer.
To improve performance, various techniques are employed, one of which is the Page Cache.
In short, the Page Cache is a "cache stored in memory." Since it is stored in memory, it is significantly faster than reading and writing data from the disk each time.
Incidentally, in Neo4j, both node data and relationship data are cached in the Page Cache.
Eviction
The Page Cache is, at its core, a cache. The data on disk is usually too large to keep entirely in memory, so when the cache fills up, existing entries that are no longer in use must be evicted to make room for new ones.
The number of evictions can be measured with the metric <prefix>.page_cache.evictions, which is discussed later. Measuring this together with <prefix>.page_cache.hit_ratio, which indicates what proportion of page requests were served from the cache, shows how effectively the Page Cache is being utilized.
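To make the relationship between evictions and hit ratio concrete, here is a minimal toy cache in Python. It is only a sketch: the class name TinyPageCache is hypothetical, and it uses plain least-recently-used (LRU) eviction, which is an assumption chosen for simplicity and not a description of Neo4j's actual eviction algorithm.

```python
from collections import OrderedDict


class TinyPageCache:
    """Toy fixed-capacity cache with LRU eviction (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> page contents
        self.hits = 0
        self.faults = 0
        self.evictions = 0

    def read(self, page_id):
        if page_id in self.pages:
            # Hit: serve from memory and mark the page as most recently used.
            self.hits += 1
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        # Miss: pretend to load the page from disk.
        self.faults += 1
        if len(self.pages) >= self.capacity:
            # Cache is full: evict the least recently used page.
            self.pages.popitem(last=False)
            self.evictions += 1
        self.pages[page_id] = f"page-{page_id}"
        return self.pages[page_id]

    def hit_ratio(self):
        total = self.hits + self.faults
        return self.hits / total if total else 0.0


cache = TinyPageCache(capacity=2)
for page_id in [1, 2, 1, 3, 2]:
    cache.read(page_id)

print(cache.hits, cache.faults, cache.evictions, cache.hit_ratio())
```

With a capacity of two pages and three distinct pages being read, the cache keeps evicting pages it will need again soon. A rising eviction count combined with a low hit ratio is the pattern that suggests the cache is too small for the working set.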
Page Cache Configuration
The size of the Page Cache can be configured with dbms.memory.pagecache.size.
For example, if you want to allocate 4GB of memory to the Page Cache, you can set it as follows:
dbms.memory.pagecache.size=4GB
Page Cache Metrics
When using the Page Cache, you might be concerned about how effectively the memory cache is being utilized.
In the Neo4j database, you can obtain metrics related to the Page Cache.
The main metric to check is <prefix>.page_cache.hit_ratio, which indicates what proportion of page requests were served from the cache.
Additionally, you can use <prefix>.page_cache.evictions to see the number of evictions that occurred.
The number of I/O operations performed by the Page Cache can be obtained with the metric <prefix>.page_cache.iops.
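Counters such as the eviction count are cumulative, so it is often more useful to compare two samples taken some time apart than to look at the raw totals. The following Python sketch does that. The function name summarize_interval, the dictionary keys, and the sample numbers are all hypothetical, and the formula hits / (hits + page faults) is an assumption about how a hit ratio is typically derived, not a statement about Neo4j internals.

```python
def summarize_interval(prev, curr, seconds):
    """Summarize page cache activity between two cumulative counter samples.

    prev and curr are dicts with the (assumed) keys
    'hits', 'page_faults' and 'evictions'.
    """
    hits = curr["hits"] - prev["hits"]
    faults = curr["page_faults"] - prev["page_faults"]
    evictions = curr["evictions"] - prev["evictions"]
    lookups = hits + faults
    return {
        # None means there was no page cache activity in the interval.
        "hit_ratio": hits / lookups if lookups else None,
        "evictions_per_sec": evictions / seconds,
    }


# Made-up sample values, taken 60 seconds apart.
earlier = {"hits": 9_000, "page_faults": 1_000, "evictions": 200}
later = {"hits": 18_500, "page_faults": 1_500, "evictions": 500}

summary = summarize_interval(earlier, later, seconds=60)
print(summary)
```

Looking at the interval rather than the totals makes it easier to see whether a drop in hit ratio coincides with a burst of evictions.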
Warm-Up Page Cache
The Page Cache is populated lazily. Data is read from the disk and placed in the Page Cache in memory only on demand, when a query needs that data.
This means that immediately after the Neo4j database starts, the Page Cache is empty. The first read requests are therefore served from the disk until the Page Cache warms up, which increases latency.
For applications where this is a problem, one approach is to pre-warm the Page Cache by executing dummy read queries from a separate process.
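A pre-warming process of this kind might look like the following Python sketch. The Cypher statements, the property name warmup_probe, and the function names are hypothetical, and which pages such queries actually bring into memory depends on the data and the query plan. The second function assumes the official neo4j Python driver is installed.

```python
# Illustrative read-only queries that walk nodes and relationships.
# 'warmup_probe' is a made-up property name; it does not need to exist.
WARMUP_QUERIES = (
    "MATCH (n) RETURN count(n.warmup_probe) AS touched",
    "MATCH ()-[r]->() RETURN count(r.warmup_probe) AS touched",
)


def warm_up(run_query, queries=WARMUP_QUERIES):
    """Run each read query once, discard the results, return how many ran."""
    executed = 0
    for query in queries:
        run_query(query)
        executed += 1
    return executed


def warm_up_with_driver(uri, user, password):
    """Run the warm-up queries against a live database."""
    # Imported lazily so warm_up() stays usable without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            # consume() exhausts the result so the query runs to completion.
            return warm_up(lambda q: session.run(q).consume())
    finally:
        driver.close()
```

Passing the query runner in as a function keeps the warm-up logic independent of the driver, so it can be exercised with a stub before pointing it at a real database, for example with warm_up_with_driver("bolt://localhost:7687", "neo4j", "password") using your own connection details.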
NOTE: In the Enterprise Edition, you can enable the setting dbms.memory.pagecache.warmup.enable so that the Page Cache is warmed up automatically after a restart.
Conclusion
In summary, this article gave an overview of the Page Cache in the Neo4j database, how to configure it, and the metrics that can be obtained from it.