Dynatrace Reliability Engineer Jobs in Plano, TX

Techvilla Solutions

Plano, TX

Posted On: Feb 24, 2025

Job Type

Full-time

Experience

6 - 12 Years

Salary

$110,000 - $130,000 Per Year

Work Arrangement

On-Site

Travel Requirement

Required Skills

Design and implement observability strategies using Dynatrace to monitor application performance, infrastructure health, and user experience.
Set up real-time monitoring, dashboards, alerts, and anomaly detection for proactive incident management.
Analyze performance bottlenecks and optimize applications, APIs, databases, and cloud infrastructure.
Conduct root cause analysis (RCA) and post-mortems for incidents, driving long-term reliability improvements.
Work closely with DevOps, development, and infrastructure teams to improve system reliability and scalability.
Define and track Service Level Indicators (SLIs).
Automate monitoring and performance tuning processes to enhance operational efficiency.
Provide expertise in log analysis, distributed tracing, and AI-driven performance insights.
Mentor teams on best practices for observability, incident response, and proactive system health management.

5+ years of experience in site reliability engineering (SRE), performance monitoring, or infrastructure reliability.
Hands-on experience with Dynatrace, including setup, configuration, dashboarding, and troubleshooting.
Strong knowledge of APM (Application Performance Monitoring), log management, and distributed tracing.
Experience with incident response, root cause analysis, and system optimization.
Strong scripting skills (Python, Bash, PowerShell) for automation.
Familiarity with CI/CD pipelines and DevOps practices.
Strong analytical and problem-solving skills with a data-driven approach.
Experience with additional observability tools (New Relic, Splunk, Prometheus, Grafana) is a plus.

Dynatrace certification or relevant training.
Experience in performance testing and load testing (JMeter, LoadRunner).
Exposure to AIOps and machine learning-driven monitoring.
Experience with chaos engineering to test system resilience.
Experience optimizing cloud-based environments (AWS, Azure, GCP) for high availability and resilience.
Proficiency in monitoring microservices, Kubernetes, cloud platforms (AWS/Azure/GCP), and containerized environments.

Job ID: TS250064

Posted By

Vivek

Information Technology Recruiter