About
Site Reliability Engineer with 5 years of experience building and maintaining scalable, highly available, fault-tolerant systems. Experienced in automating operational processes, optimizing system performance, and leading incident response. Applies cloud technologies and modern DevOps practices to improve efficiency and system reliability.
Work
Summary
Engineered and maintained critical production systems, ensuring high availability and optimal performance for a rapidly growing SaaS platform serving over 1 million users.
Highlights
Automated key operational tasks using Python and Ansible, reducing manual intervention by 40% and improving deployment efficiency by 25%.
Improved system uptime from 99.5% to 99.9% by implementing proactive monitoring (Prometheus, Grafana) and alerting, reducing critical incidents.
Led incident response and post-mortem analysis for major outages, decreasing Mean Time To Resolution (MTTR) by 30% through root cause identification and preventative measures.
Reduced AWS infrastructure costs by 15% by rightsizing instances, implementing auto-scaling policies, and managing resource allocation.
Developed and maintained CI/CD pipelines using Jenkins and GitLab CI, enabling faster and more reliable software releases with a 99% success rate.
Languages
English
Skills
Programming Languages
Python, Go, Bash, Java.
Cloud Platforms
AWS, Azure, Google Cloud Platform (GCP).
Containerization & Orchestration
Docker, Kubernetes, Helm.
CI/CD & DevOps Tools
Jenkins, GitLab CI, GitHub Actions, Terraform, Ansible, Chef, Puppet.
Monitoring & Logging
Prometheus, Grafana, Datadog, ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, PagerDuty.
Operating Systems & Networking
Linux (Ubuntu, CentOS, RHEL), Networking (TCP/IP, DNS, HTTP), Load Balancing, Firewalls.
Databases
PostgreSQL, MySQL, MongoDB, Redis, Cassandra.
Version Control
Git, GitHub, GitLab, Bitbucket.
Methodologies
Agile, Scrum, ITIL, Site Reliability Engineering (SRE), DevOps.