Principal Platform Engineer

Organization: Weekday

Location: Bengaluru, Karnataka, India

This role is for one of our clients.

Industry: Software Development
Seniority level: Mid-Senior level
Min Experience: 10 years
Location: Bengaluru
Job Type: Full-time
₹55,00,000 - ₹85,00,000 a year

We are building a next-generation AI observability and trust platform that enables enterprises to safely deploy, monitor, and improve AI systems at scale, across traditional ML models, LLMs, generative AI, and agentic workflows.

As a Principal Platform Engineer, you will architect and develop the backbone of this platform: high-performance backend services, distributed systems, and cloud-native infrastructure that power AI evaluation, monitoring, and reliability at production scale. This is a hands-on, high-ownership role where you will shape platform architecture, influence engineering standards, and help define what “trustworthy AI” looks like in real-world enterprise environments.

What You’ll Own

Platform & Backend Architecture
- Design and build scalable backend services that power AI observability, evaluation, and governance workflows.
- Architect distributed systems capable of ingesting, processing, and querying high-volume AI telemetry and evaluation data.
- Develop APIs and services that expose AI performance, reliability, and risk signals to enterprise customers.

Distributed Systems & Data Infrastructure
- Build systems that compute and store advanced AI evaluation metrics such as accuracy, relevance, drift, latency, and hallucination indicators.
- Design resilient data pipelines using event-driven and streaming architectures.
- Optimize storage and query layers for scale, performance, and cost efficiency.

Reliability, Scale & Operations
- Define and improve operational standards across availability, latency, SLOs, observability, and incident response.
- Lead efforts around performance tuning, failure handling, capacity planning, and system resiliency.
- Embed best practices for testing, CI/CD, and production readiness into platform development.

AI Platform Evolution
- Partner with product, ML, and customer teams to design new evaluation capabilities aligned with emerging AI risks and enterprise needs.
- Support observability for modern AI workloads including LLMs, GenAI pipelines, and agent-based systems.
- Contribute to the long-term technical roadmap for responsible and transparent AI systems.

Technical Leadership
- Act as a technical multiplier by reviewing designs and code, raising engineering standards, and guiding architectural decisions.
- Mentor senior and mid-level engineers, helping them grow in systems thinking and execution.
- Influence platform direction without formal people management responsibilities.

What We’re Looking For

Core Experience
- 10+ years of professional experience building backend or platform systems in production environments.
- Strong hands-on expertise in Python and backend service development.
- Deep understanding of distributed systems, concurrency, fault tolerance, and performance optimization.
- Experience designing APIs, microservices, and data-intensive systems.

Infrastructure & Cloud
- Solid experience with cloud-native architectures on AWS or GCP.
- Hands-on exposure to Kubernetes, containerized workloads, and modern CI/CD pipelines.
- Experience with technologies such as Postgres, Redis, Kafka, RabbitMQ, Ray, or similar systems.
- Familiarity with analytical data stores like ClickHouse or Druid is a strong plus.

Leadership & Ownership
- Proven ability to work autonomously and drive complex initiatives from concept to production.
- Strong problem decomposition and decision-making skills in ambiguous environments.
- Excellent communication skills and comfort collaborating across distributed, cross-functional teams.
- A mentorship-oriented mindset with a passion for building durable systems and strong engineering culture.

Bonus Points
- Experience supporting ML, LLM, or GenAI systems in production.
- Familiarity with modern LLM frameworks, evaluation tooling, or AI monitoring platforms.
- Background in developer platforms, infra tooling, or internal platform teams.

Why This Role Stands Out
- Work on a category-defining AI platform at the intersection of backend engineering and responsible AI.
- High-impact, high-ownership role with architectural influence across the stack.
- Exposure to cutting-edge AI workloads without requiring ML research background.
- Opportunity to shape how enterprises build trust, transparency, and reliability into AI systems.

Key Skills
Backend Systems · Platform Engineering · Distributed Systems · Python · Cloud Infrastructure · Kubernetes · Kafka · Postgres · AI Observability · System Design · Reliability Engineering · API Design · Technical Leadership

Apply: https://jobs.lever.co/weekdayworks/01244dea-0e2d-4649-8e07-071c1ffde63a