We are seeking a Site Reliability Engineer (SRE) with expertise in Sterling OMS and experience in distributed systems. The role involves building automations with AI/GenAI tools and ensuring the reliability, scalability, and performance of OMS operations. The ideal candidate will have strong operational skills, a proactive attitude, and the ability to communicate effectively with customers and offshore teams.
Roles and Responsibilities
- Design, implement, and maintain reliable OMS deployments and distributed applications.
- Develop automations using AI/GenAI tools to improve operational efficiency.
- Monitor, troubleshoot, and optimize system performance and reliability.
- Handle customer communication regarding incidents, changes, and system updates.
- Coordinate with offshore teams to ensure smooth operations and timely resolution of issues.
- Collaborate with development teams to implement reliability best practices.
- Scale operational processes and tools to support growing business requirements.
Required Skills & Competencies
- Hands-on experience with Sterling OMS.
- Strong knowledge of distributed systems and system reliability principles.
- Proficiency in automation and scripting (Python, Shell, or similar).
- Familiarity with AI/GenAI tools for process automation.
- Excellent communication skills for coordination with customers and offshore teams.
- Proactive, solution-oriented attitude with a focus on scalability and reliability.