Site Reliability Engineer (SRE)

Neshent Tech

Deerfield, IL

Posted On: Apr 29, 2026

Job Overview

Job Type

Full-time

Experience

6 - 10 Years

Salary

Depends on Experience

Work Arrangement

On-Site

Travel Requirement

Required Skills

SRE
Azure
Terraform

Job Description

We are looking for an experienced Site Reliability Engineer (SRE) to ensure the reliability, availability, and performance of Azure-based services in a large-scale enterprise environment. This role involves managing cloud infrastructure, enhancing observability, implementing disaster recovery strategies, and driving reliability improvements through SLOs/SLIs and automation.

Key Responsibilities

Define and manage SLOs, SLIs, and error budgets for Azure-hosted services, reporting SLA compliance to stakeholders.
Lead architecture reviews to ensure reliability targets (availability, RTO/RPO) are met from design through production.
Implement chaos engineering practices and conduct disaster recovery drills across Azure regions.
Serve as Incident Commander for P1/P2 incidents, owning the full incident lifecycle through post-mortem action items.
Design and operate enterprise observability using Azure Monitor, Log Analytics, Application Insights, and Grafana.
Develop alerting frameworks and automate self-healing operations with Azure Automation and scripting (Python/PowerShell).
Embed reliability gates in CI/CD pipelines and manage AKS cluster reliability (scaling, upgrades, security).
Enforce infrastructure-as-code best practices with Terraform/Bicep for Azure Landing Zones.

Required Qualifications

7+ years of experience in SRE, platform engineering, or cloud infrastructure in large-scale environments.
4+ years of hands-on Azure cloud engineering experience, including AKS.
Expertise in Terraform (required) and Bicep, plus experience managing Azure Landing Zones.
Proficiency in scripting or programming with Python, Go, or PowerShell.
Experience with Azure observability tools (Azure Monitor, Log Analytics, Application Insights).
Proven track record of owning SLOs/SLIs and improving production reliability.

Job ID: NT221118

Posted By

Abhishek

Resource Manager