about

2025-05-07

Ken Wagatsuma is a Site Reliability Engineer specializing in database management, performance engineering, and scalable systems. With experience at Neo4j, Shopify, and Cookpad, Ken focuses on database resilience, optimization, and distributed systems architecture.

I am a Site Reliability Engineer passionate about managing complex production applications at scale. My expertise lies in performance engineering and database management, focusing on building resilient and scalable systems.

My Journey

Exploration: Discovering My Passion for Software Engineering (2013~)

My career in software engineering began in 2013, when I took on the role of a programming instructor. Teaching teenagers the fundamentals of programming was a really eye-opening experience. Answering their questions reinforced my own knowledge, and it also reminded me of the importance of teaching and learning. This experience laid the foundation for my transition into professional software development.

Eager to expand my technical skills, I joined one of Japan’s largest HR companies, where I worked on mobile development. I wanted to deepen my hands-on skills, and my curiosity about web development, sparked by reading "High Performance Browser Networking", soon led me to a startup, where I immersed myself in Ruby on Rails.

Working in a fast-paced, high-growth company, I learned the intricacies of building scalable web applications and maintaining them under dynamic requirements. The culture was exciting and exposed me not only to web programming but also to maintaining infrastructure and tuning databases, which later became my favourite parts of my career.

Excitement: Unveiling the Power of Databases (2017~)

In 2017, I took a significant leap in my career by joining Cookpad, a global platform for recipe sharing. My time there was transformative, as I not only grew into a Tech Lead role but also discovered my passion for databases. One of my most exciting projects involved developing a real-time logging system, which enhanced observability and performance monitoring. Additionally, I built a distributed advertisement server using Ruby and Go, allowing the company to deliver targeted and efficient ad services at scale.

This was also when I came across "Designing Data-Intensive Applications". I was genuinely shocked by how much I did not know about distributed databases. It was a pivotal moment in my life, as I started seriously thinking about building my career around databases.

In April 2020, I transitioned into Site Reliability Engineering and relocated to Cookpad UK. There, I was responsible for ensuring the resilience of a global-scale recipe-sharing platform that ran on Rails, backed by MySQL and Apache Kafka. The shift from backend development to SRE allowed me to refine my approach to system reliability, focusing on performance engineering, automation, and database optimization. Managing high-traffic, user-generated content in a production environment introduced me to new challenges in scaling and resiliency, reinforcing my expertise in database performance tuning and observability.

Expertise: Mastering Databases at Scale (2021~)

In 2021, I sought to deepen my knowledge of graph databases and joined Neo4j. Exploring the power of a graph DBMS at a global scale was a fascinating experience, allowing me to work with complex relationships and large datasets in an entirely new way. I needed to understand distributed database concepts such as the Raft consensus algorithm, time synchronization, replication, failovers, and much more as they behave in the wild. It was a precious opportunity to turn my textbook knowledge into practical skills. Graph databases presented a different set of optimization challenges, broadening my perspective on data structures and storage mechanisms.

My journey continued in 2022 when I joined Shopify to further refine my skills in managing large-scale database systems. Shopify’s infrastructure presented unique challenges in scalability and performance, offering me the opportunity to work on globally distributed databases. My role here has been instrumental in shaping my expertise in database resilience, ensuring that mission-critical applications remain performant and highly available under heavy load.

I really love what I'm doing now, and my journey goes on.


Focus Areas

Databases are at the core of my work as an SRE. I specialize in ensuring their resilience, optimizing performance, and designing architectures that can withstand the demands of large-scale production environments. My focus is on maintaining high availability, minimizing query latency, and preventing system failures before they occur.

Scalability is another key area of my expertise. As systems grow, databases must be able to handle increasing workloads efficiently. I work on optimizing query execution, indexing strategies, and data distribution across clusters to ensure seamless scaling without compromising performance.

Reliability is fundamental in database operations. I focus on designing failover mechanisms, automated recovery processes, and robust backup strategies to mitigate risks and minimize downtime. Proactive monitoring, anomaly detection, and incident response planning ensure that databases remain resilient under unpredictable conditions, enabling continuous availability of critical services.

Output to Learn

I see output as a critical part of my learning. If I cannot explain a concept clearly to other people, I believe it means I do not fully understand it yet. I put effort into demystifying difficult algorithms and architectures through various channels.

Writing is the most foundational and important learning activity for me. Going through iterations of writing and reviewing helps me find my own misunderstandings and correct them. Writing consistently about my favourite technologies has been my growth engine since I was a teenager. I publish some of my writing on my blog, which mainly focuses on databases.

I'm a weekend podcaster, too. I have found podcasting to be another great way to boost my learning and stay connected with like-minded, passionate people. I see podcasting as a way to create a micro community where we gather to enjoy learning and inspire each other. Listening to guests and discovering their strengths and personalities really inspires me, which is why I keep podcasting.

I'm not a conference geek, but I sometimes go to meetups to talk about technology or join my friends' podcasts to share my failures and learnings. I love writing for company blogs, too. You can see the full list here.

About kenwagatsuma.com

The initial version of this website was... a static site hosted on GitHub Pages using Jekyll as a template engine, if I don't count the time I used WordPress.org before I started my career as a software engineer. I then explored various blogging platforms, including Ghost, WordPress deployed on my own box, and even a custom-made blog engine of my own. I have to confess that I was more absorbed in developing the blog platform itself than in writing blog posts and content. It was fun, yet I did not produce any useful articles around that time.

I then shifted my focus to Static Site Generation (SSG) using Gatsby. I remember it was 2018, when I was invited to the first GraphQL conference in Asia (Bengaluru, India) and gave a talk about Backend-For-Frontend and GraphQL challenges. At the conference, I was so impressed by the growth of the Gatsby ecosystem that I started using Gatsby as soon as I came back from India. I hosted my blog on Netlify, as it was really easy to set up and to scale on their edge network. I then started writing my own Gatsby plugin and wrote a couple of blog posts, while enjoying hacking on the tooling itself.

Finally, I landed on Remix in 2024. You can read more about this migration story in this post. Since then, I've been enjoying hacking with TypeScript and React.js, as well as blogging.

At the time of writing, this website is built with React Router (previously Remix) and deployed to Cloudflare Pages. I use Cloudflare D1 as a backend to store blog post metadata. A few other Cloudflare products are put to use as well, for example Cache Rules for granular, per-asset caching strategies and Cloudflare Domains for DNS. I use Terraform, our industry-standard tooling for IaC, to manage all infrastructure resources on Cloudflare.

You can read the whole history of my struggles and learnings from this tinkering under the #blog tag.

Contacts

Feel free to reach out if you’d like to discuss databases, distributed systems, or SRE best practices! Email hi at kenwagatsuma.com, or connect with me on LinkedIn. I do not have accounts on any other social networks at the moment.

Programmer. Generalist. Open-minded amateur.

Ken Wagatsuma is a Site Reliability Engineer based in the UK. He is passionate about managing complex production applications that solve real-world problems. Keen on Performance Engineering and Databases.