Posted On: Jul 02, 2026
Job Overview
Salary
Depends on Experience
Required Skills
- OpenTelemetry
- CloudWatch
- Azure Monitor
- Apache Kafka
- Amazon SQS
- Python
Job Description
Roles and Responsibilities
- Design and implement OpenTelemetry instrumentation for applications, services, and cloud platforms.
- Develop and maintain OpenTelemetry Collector pipelines, including receivers, processors, exporters, and routing.
- Instrument distributed applications using OpenTelemetry SDKs (Java, Python, Node.js, Go).
- Configure tracing, metrics, and structured logging following OpenTelemetry semantic conventions.
- Implement observability for Kubernetes, containers, serverless platforms, and cloud services.
- Instrument asynchronous messaging systems such as Kafka and SQS with end-to-end trace propagation.
- Develop internal instrumentation libraries and reusable observability components.
- Define and implement SLIs, SLOs, and application performance monitoring (APM) standards.
- Optimize telemetry pipelines through sampling strategies, cardinality management, and performance tuning.
- Collaborate with Platform Engineering, SRE, DevOps, and development teams to establish observability best practices.
Required Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 8+ years of experience in software engineering or observability.
- Strong hands-on experience with OpenTelemetry and the OpenTelemetry Collector.
- Experience with distributed tracing, metrics, logs, and APM solutions.
- Proficiency in one or more programming languages: Java, Python, Go, or Node.js.
- Experience with Kubernetes and cloud platforms including AWS, Azure, and GCP.
- Knowledge of serverless and containerized environments.
- Experience with cloud monitoring services such as CloudWatch, Azure Monitor, and Google Cloud Monitoring.
- Experience with messaging platforms such as Kafka or SQS.
- Strong understanding of telemetry pipelines, sampling, trace context propagation, and observability best practices.
Preferred Skills
- Experience with Prometheus, Grafana, Jaeger, or Tempo.
- Knowledge of network telemetry (NetFlow, sFlow, SNMP, IPFIX, BGP).
- Experience supporting AIOps or machine-learning-driven observability platforms.
- Familiarity with DevOps, CI/CD, and Infrastructure as Code.
Job ID: NT221669