About
Senior Monitoring Analyst (SRE) specializing in designing, implementing, and optimizing high-availability systems. Proficient in Grafana, Prometheus, Kubernetes, and New Relic, with a focus on system reliability and minimal downtime. Adept at defining and refining SLIs/SLOs and automating operational processes to improve efficiency and reduce Mean Time to Recovery (MTTR) by up to 59%.
Work
New Delhi, Delhi, India
Summary
Leads SRE monitoring and alerting initiatives at PHREESIA, optimizing system reliability and operational efficiency for high-availability production environments.
Highlights
Developed and managed real-time URL monitoring dashboards in Grafana, leveraging Prometheus to track uptime, response time, and failures, ensuring comprehensive system visibility.
Integrated Prometheus Alertmanager with OpsGenie for automated alerting and proactive incident detection, reducing Mean Time to Recovery (MTTR) by 59%.
Automated recurring manual tasks (toil) using New Relic, saving over 10 hours weekly and improving operational efficiency and response time.
Defined and refined Service Level Indicators (SLIs) and Service Level Objectives (SLOs) across various services, collaborating with DevOps, Development, and DCOps teams to align monitoring with business needs.
Led structured postmortems to document and implement action items preventing recurrence; created comprehensive SRE playbooks to standardize incident response and monitoring strategies.
Monitored two releases weekly, ensuring smooth deployments and early regression identification, while providing 24x7 production support in a high-availability environment with rotational shifts.
New Delhi, Delhi, India
Summary
Managed end-to-end mortgage process flow and optimized loan operations through data analysis and cross-functional collaboration.
Highlights
Orchestrated the full process flow and communication for over 2,000 US housing mortgages, ensuring seamless operations.
Conducted data analysis using statistical software to identify bottlenecks and optimize loan processing efficiency, contributing to streamlined workflows.
Maintained an average Jira ticket resolution time of 10 minutes, achieving 100% SLA compliance for over 7 consecutive months.
Led collaborative initiatives across Product Development, Underwriting, and Onshore teams to optimize workflows and enhance operational synergy.
Languages
English
Hindi
Skills
Monitoring & Observability
Grafana, Prometheus, New Relic, OpsGenie.
Incident Management & Alerting
Prometheus Alertmanager, On-Call Pager Duty.
Automation & Scripting
Python, New Relic Workflows.
Infrastructure & Cloud
Kubernetes.
DevOps & Collaboration Tools
Jira, Confluence, Salesforce.
Database & Analytics
SQL, MS Excel, Google Sheets.
Interests
Hobbies
Collecting Fountain Pens, Playing Cricket, Exploring Different Teas and Coffees.
Wellness
Fitness Enthusiast.