Job Summary
We are seeking an experienced Azure Databricks DevOps/SRE Engineer to support and optimize cloud-based data platforms and distributed systems. The ideal candidate will have strong expertise in CI/CD automation, infrastructure as code, monitoring, and reliability engineering for Azure data services.
Required Skills
- Strong experience with Databricks, Spark, Azure (IaaS), Python, Cosmos DB
- Hands-on experience with Azure DevOps (CI/CD pipelines, Boards) and GitHub
- Experience with Docker and Azure Kubernetes Service (AKS)
- Experience with monitoring and observability tools such as Grafana
- Experience with testing and code-quality tools (JUnit, Postman, SonarQube)
- 5+ years of DevOps/SRE experience supporting data platforms or distributed systems
Additional Skills
- Strong expertise in CI/CD pipelines (Azure DevOps/GitHub Actions) for Databricks, Spark jobs, and Azure data components
- Infrastructure as Code (Terraform, Bicep, ARM) for provisioning Azure resources (Databricks, ADLS Gen2, Event Hubs, Key Vault)
- Solid understanding of build and release strategies for notebooks, JARs, wheels, and configuration artifacts
- Experience setting up monitoring, logging, and alerting using Azure Monitor / Log Analytics
- Strong scripting skills (Python, PowerShell, Bash)
- Knowledge of Azure security best practices (Managed Identities, RBAC, Secrets Management)
- Exposure to reliability engineering concepts (SLOs, SLIs, error budgets)
- Experience with containerization and orchestration (Docker, Kubernetes, AKS)
Roles & Responsibilities
- Design, build, and maintain CI/CD pipelines for Databricks, Spark jobs, and Azure data components
- Develop Infrastructure as Code (Terraform/Bicep/ARM) to provision and manage Azure resources
- Implement release strategies for notebooks, JARs, wheels, and configuration artifacts
- Set up monitoring, logging, and alerting for Spark jobs and data pipelines
- Ensure platform reliability and scalability, and optimize performance
- Collaborate with offshore teams in India on project execution and support