We are not looking for someone to simply “fix servers.” We are looking for a builder - someone who values deterministic rollouts (FluxCD), strong observability (SigNoz / Prometheus), and who is excited about shaping the future of AI-assisted platform engineering with us.
In this role, you will act as a trusted expert, bridging the gap between code and production and helping our teams ship software safely and reliably.
We are also leaning into AI coding assistants - not just for boilerplate, but to build smarter guardrails across our CI/CD workflows and platform automation.
Key Responsibilities:
- Design, build, and operate scalable cloud infrastructure using Infrastructure as Code (Terraform, Ansible), ensuring repeatability, reliability, and reduced operational toil.
- Automate provisioning, lifecycle management, and upgrades of Kubernetes clusters across multiple environments (cloud, on-prem, hybrid).
- Own end-to-end observability: implement and maintain metrics (Prometheus), log aggregation (SigNoz), dashboards (Grafana), distributed tracing, and alerting to enable fast incident detection and root-cause analysis.
- Develop and maintain CI/CD pipelines for infrastructure and application artifacts; introduce and operate GitOps workflows (FluxCD) for deterministic, controlled rollouts.
- Integrate DevSecOps practices, including automated vulnerability scanning, secrets rotation, and policy enforcement, to improve the security posture of production systems.
- Drive reliability-focused initiatives: HA configurations, chaos/failure testing, capacity reviews, and service hardening.
- Collaborate with cross-functional engineering teams on architecture reviews, troubleshooting of distributed workloads, and performance optimization of critical components.
- Be the trusted expert and partner for the engineering teams, helping them bring ideas to production in a way that is secure, reliable, scalable, maintainable, and cost-efficient.
Tech Stack:
Orchestration: Kubernetes (Cloud + On-prem)
IaC: Terraform, Terragrunt, Ansible
CI/CD and GitOps: GitLab CI, FluxCD
Observability: SigNoz, Prometheus, Grafana
Security: HashiCorp Vault, DevSecOps automation
