Work with enterprise platform, network, storage, and other internal teams, as well as external vendor teams, on upgrade planning, platform migrations, application configuration standards, and high availability (HA).
Act as a bridge between platform and application engineering, and partner with application SRE.
Collaborate with cross-functional teams to define and establish service level objectives (SLOs) and service level agreements (SLAs) for critical systems.
Perform capacity planning and resource allocation to ensure optimal system performance and scalability.
Monitor systems and applications, proactively identifying and resolving any performance bottlenecks or availability issues.
Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance.
Requirements
6+ years of experience as an SRE, with knowledge of the platform (AWS/Kubernetes).
6+ years of telemetry experience, with a focus on obsessive elimination of single points of failure and on application configuration standards.
Deep knowledge of the platform (AWS, Kubernetes, etc.) as a platform engineer.
Strong skills in monitoring tools such as Grafana, Datadog, EAPM, and Splunk.
Proficiency in scripting languages such as Python, Shell, or Perl.
Understanding of networking principles and protocols (TCP/IP, HTTP, DNS, etc.).