We are seeking a skilled Site Reliability Engineer (SRE) focused on L2 production support to join our dynamic team. The ideal candidate will have hands-on experience with Node.js and AWS Lambda, along with a strong track record in production support.
Primary Responsibilities
- Provide Level 2 production support for applications and services, ensuring high availability and performance.
- Monitor system health and performance, using tools such as New Relic to configure alerts and dashboards.
- Troubleshoot and resolve incidents quickly to minimize downtime.
- Collaborate with development teams to enhance system reliability and performance.
- Automate routine tasks to improve efficiency and reduce manual intervention.
- Document processes, incidents, and resolutions for knowledge sharing and future reference.
Required Skills
- Proven experience providing L2 support in a production environment.
- Proficiency in developing and troubleshooting applications built with Node.js.
- Experience with serverless architecture, specifically AWS Lambda functions.
- Familiarity with ServiceNow for incident management and service requests.
- Experience setting up alerts and monitoring dashboards in New Relic.
Preferred Qualifications
- Strong problem-solving skills and ability to work under pressure.
- Excellent communication skills for effective collaboration with cross-functional teams.
- Familiarity with CI/CD processes and tools.