Observability Engineer – Grafana, Prometheus, Thanos
Neshent Tech
Chandler, AZ
Posted On: Mar 25, 2026
Job Overview
Salary
Depends on Experience
Required Skills
- Observability
- ML/LLM
- Data Analysis/Visualization
- Grafana
- Tableau
- SQL
- PromQL
- Prometheus
Job Description
Roles and Responsibilities
- Design and implement observability frameworks for ML/LLM applications
- Build telemetry for AI models, including latency, token usage, throughput, error rates, and SLOs
- Develop and maintain self-service observability dashboards
- Monitor model performance, data drift, reliability, and cost metrics
- Create dashboards using Grafana and Tableau for operational insights
- Implement monitoring using Prometheus and Thanos for scalable metrics collection
- Analyze time-series data and build actionable visualizations
- Partner with ML engineers and platform teams to improve system reliability
- Define and track SLOs for AI endpoints and services
- Enable end-to-end observability using metrics, logs, and traces
Required Skills
- Strong experience in Data Analysis and Visualization
- Hands-on experience with Grafana dashboard creation
- Experience with Tableau and with the Grafana, Prometheus, and Thanos stack
- Strong knowledge of SQL and time-series data
- Experience with PromQL
- Working knowledge of Linux environments
- Expertise in building telemetry dashboards
- Understanding of which chart and graph types suit different visualizations
- Experience monitoring production systems and observability pipelines
Preferred Qualifications
- Experience with ML/AI observability
- Knowledge of LLM monitoring metrics (tokens, latency, hallucination tracking, etc.)
- Experience defining SLOs/SLIs
- Familiarity with distributed tracing and logging frameworks
- Experience with large-scale observability platforms
Job ID: NT220824