Grafana & Telemetry Engineer

Neshent Tech

Austin, TX

Posted On: Nov 20, 2024

Job Overview

Job Type

Full-time

Experience

8 - 10 Years

Salary

Depends on Experience

Work Arrangement

Hybrid

Travel Requirement

0%

Required Skills

  • Dynatrace
  • Grafana
  • Telemetry
  • Loki
  • APM
Job Description
Roles and Responsibilities
  • Oversee the migration from Dynatrace to Sentry and Grafana, ensuring seamless continuity of monitoring, tracing, and alerting capabilities across applications.
  • Design, build, and maintain Grafana dashboards to visualize metrics, KPIs, and application health, while configuring alerts to proactively track performance and issues.
  • Implement and manage Sentry for comprehensive error tracking, performance monitoring, and diagnostics across multiple applications and services.
  • Set up and configure Prometheus for metrics aggregation and Loki for log aggregation, providing real-time insight into application performance and behavior.
  • Collaborate with development, DevOps, and infrastructure teams to integrate monitoring tools into the CI/CD pipeline for continuous monitoring and automated testing.
  • Define and implement SLA-based alerts and notifications to track and measure application performance, reliability, and user experience.
  • Conduct in-depth root cause analysis (RCA) for critical incidents, leveraging distributed tracing and monitoring data to identify and resolve issues.
  • Automate monitoring and alerting tasks using Python, Bash, or similar scripting languages to streamline operations and improve efficiency.
  • Secure access to monitoring tools by configuring roles and permissions in line with security and compliance best practices.
  • Document the migration process, create knowledge base articles, and provide training to internal teams to ensure smooth transitions and long-term efficiency.

Required Skills & Experience
  • Proven experience with application monitoring, tracing, and observability tools such as Dynatrace, Grafana, Sentry, Prometheus, and Loki.
  • Strong understanding of Application Performance Management (APM) concepts, distributed tracing, and error tracking practices.
  • Hands-on experience building custom Grafana dashboards and configuring alerting for application health monitoring.
  • Expertise in setting up and integrating Sentry for error tracking and performance monitoring across multiple applications.
  • Familiarity with Prometheus for metrics aggregation and Loki for log aggregation to provide full-stack observability.
  • Experience integrating monitoring tools into CI/CD pipelines, aligning with DevOps best practices.
  • Proficiency in Python, Bash, or similar scripting languages to automate monitoring tasks, such as alerting and incident response.
  • Solid understanding of incident management, root cause analysis (RCA), and SLA tracking to ensure high availability and minimal downtime.
  • Experience with API integration and data transformation between observability platforms to streamline monitoring workflows.
  • Knowledge of security and compliance principles, particularly regarding access management and data governance for monitoring tools.

Preferred Qualifications
  • Prior experience migrating from Dynatrace or similar observability platforms to other monitoring and telemetry tools.
  • Familiarity with microservices and cloud-native monitoring solutions, including Kubernetes and containerized environments.
  • Experience working in an Agile environment with cross-functional teams to deliver iterative improvements and features.
  • Certifications in Grafana, Prometheus, or other relevant observability platforms are a plus.

Job ID: NT240477


Posted By

Abhishek

HR Manager