Dynatrace Reliability Engineer

Techvilla Solutions

Plano, TX

Posted On: Feb 24, 2025

Job Overview

Job Type

Full-time

Experience

6 - 12 Years

Salary

$110,000 - $130,000 Per Year

Work Arrangement

On-Site

Travel Requirement

0%

Required Skills

  • Python
  • Continuous Integration
  • Reliability Engineering
  • DevOps
  • SRE
  • PowerShell
  • Bash
  • CI/CD
Job Description
Roles and Responsibilities
  • Design and implement observability strategies using Dynatrace to monitor application performance, infrastructure health, and user experience.
  • Set up real-time monitoring, dashboards, alerts, and anomaly detection for proactive incident management.
  • Analyze performance bottlenecks and optimize applications, APIs, databases, and cloud infrastructure.
  • Conduct root cause analysis (RCA) and post-mortems for incidents, driving long-term reliability improvements.
  • Work closely with DevOps, development, and infrastructure teams to improve system reliability and scalability.
  • Define and track Service Level Indicators (SLIs).
  • Automate monitoring and performance tuning processes to enhance operational efficiency.
  • Provide expertise in log analysis, distributed tracing, and AI-driven performance insights.
  • Mentor teams on best practices for observability, incident response, and proactive system health management.
 
Required Skills/Qualifications
  • 5+ years of experience in site reliability engineering (SRE), performance monitoring, or infrastructure reliability.
  • Hands-on experience with Dynatrace, including setup, configuration, dashboarding, and troubleshooting.
  • Strong knowledge of APM (Application Performance Monitoring), log management, and distributed tracing.
  • Experience with incident response, root cause analysis, and system optimization.
  • Strong scripting skills (Python, Bash, PowerShell) for automation.
  • Familiarity with CI/CD pipelines and DevOps practices.
  • Strong analytical and problem-solving skills with a data-driven approach.
  • Experience with additional observability tools (New Relic, Splunk, Prometheus, Grafana) is a plus.
 
Preferred Experience
  • Dynatrace certification or relevant training.
  • Experience in performance testing and load testing (JMeter, LoadRunner).
  • Exposure to AIOps and machine learning-driven monitoring.
  • Experience with chaos engineering to test system resilience.
  • Experience optimizing cloud-based environments (AWS, Azure, GCP) for high availability and resilience.
  • Proficiency in monitoring microservices, Kubernetes, cloud platforms (AWS/Azure/GCP), and containerized environments.

Job ID: TS250064


Posted By

Vivek

Information Technology Recruiter