About

I am a Site Reliability Engineer passionate about managing complex production applications at scale. My expertise lies in performance engineering and database management, focusing on building resilient and scalable systems.

My Journey

Exploration: Discovering My Passion for Software Engineering (2013~)

My career in software engineering began in 2013 when I took on the role of a programming instructor. Teaching teenagers the fundamentals of programming was an eye-opening experience. Answering their questions not only reinforced my own knowledge but also reminded me of the importance of teaching and learning. This experience laid the foundation for my transition into professional software development.

Eager to expand my technical skills, I joined one of Japan’s largest HR companies, where I worked on mobile development. I wanted to deepen my hands-on skills further, and my curiosity about web development, sparked by reading "High Performance Browser Networking", soon led me to a startup, where I immersed myself in Ruby on Rails.

Working in a fast-paced, high-growth company, I learned the intricacies of building scalable web applications and maintaining them under changing requirements. The culture was exciting and exposed me not only to web programming but also to maintaining infrastructure and tuning databases, which later became my favourite parts of my career.

Excitement: Unveiling the Power of Databases (2017~)

In 2017, I took a significant leap in my career by joining Cookpad, a global platform for recipe sharing. My time there was transformative, as I not only grew into a Tech Lead role but also discovered my passion for databases. One of my most exciting projects involved developing a real-time logging system, which enhanced observability and performance monitoring. Additionally, I built a distributed advertisement server using Ruby and Go, allowing the company to deliver targeted and efficient ad services at scale.

This was also when I came across "Designing Data-Intensive Applications". I was genuinely shocked by how much I did not know about distributed databases. It was a pivotal moment in my life, as I started thinking seriously about dedicating my career to databases.

By April 2020, I had transitioned into Site Reliability Engineering and relocated to Cookpad UK. There, I was responsible for ensuring the resilience of a global-scale recipe-sharing platform that ran on Rails, backed by MySQL and Apache Kafka. The shift from backend development to SRE allowed me to refine my approach to system reliability, focusing on performance engineering, automation, and database optimization. Managing high-traffic, user-generated content in a production environment introduced me to new challenges in scaling and resiliency, reinforcing my expertise in database performance tuning and observability.

Expertise: Mastering Databases at Scale (2021~)

In 2021, I sought to deepen my knowledge of graph databases and joined Neo4j. Exploring the power of a graph DBMS at a global scale was a fascinating experience, allowing me to work with complex relationships and large datasets in an entirely new way. I needed to understand distributed database concepts such as the Raft consensus algorithm, time synchronization, replication, and failover, and much more, as they behave in the wild. It was a precious opportunity to turn my textbook knowledge into practical skills. Graph databases presented a different set of optimization challenges, broadening my perspective on data structures and storage mechanisms.

My journey continued in 2022 when I joined Shopify to further refine my skills in managing large-scale database systems. Shopify’s infrastructure presented unique challenges in scalability and performance, offering me the opportunity to work on globally distributed databases. My role here has been instrumental in shaping my expertise in database resilience, ensuring that mission-critical applications remain performant and highly available under heavy load.

I really love what I'm doing now, and my journey goes on.

Focus Areas

Databases are at the core of my work as an SRE. I specialize in ensuring their resilience, optimizing performance, and designing architectures that can withstand the demands of large-scale production environments. My focus is on maintaining high availability, minimizing query latency, and preventing system failures before they occur.

Scalability is another key area of my expertise. As systems grow, databases must be able to handle increasing workloads efficiently. I work on optimizing query execution, indexing strategies, and data distribution across clusters to ensure seamless scaling without compromising performance.

Reliability is fundamental in database operations. I focus on designing failover mechanisms, automated recovery processes, and robust backup strategies to mitigate risks and minimize downtime. Proactive monitoring, anomaly detection, and incident response planning ensure that databases remain resilient under unpredictable conditions, enabling continuous availability of critical services.

Contacts

I regularly share my insights and research on database internals, performance tuning, and distributed computing at kenwagatsuma.com.

Feel free to reach out if you’d like to discuss databases, distributed systems, or SRE best practices!

2025-03-11