Data Center Site Reliability Engineer

Techvilla Solutions

Culver City, CA

Posted On: Nov 12, 2025

Job Overview

Job Type

Contract - W2, Contract - Independent

Experience

4 - 8 Years

Salary

Depends on Experience

Work Arrangement

On-Site

Travel Requirement

0%

Required Skills

  • Unix/Linux
  • Python
  • Networking technologies
  • OpenStack
  • Kubernetes

Job Description
Roles and Responsibilities
  • Design, implement, and maintain data monitoring and alerting systems to improve issue detection and response times.
  • Ensure data accuracy and reliability through proactive quality checks and anomaly detection mechanisms.
  • Analyze, design, and implement system-level solutions to eliminate bottlenecks and improve the performance of edge services.
  • Create and maintain documentation for team processes, policies, SLOs, and operational workflows.
  • Participate in on-call rotations; troubleshoot and resolve incidents or escalate as needed to maintain service availability.
  • Operate and maintain Linux and Kubernetes environments; manage upgrades, deployments, and optimizations.
  • Work closely with software, network, and infrastructure teams to deliver highly available, scalable systems.
 
Qualifications
  • Education: Bachelor’s degree or higher in Computer Science, Information Technology, or a related field.
  • Experience: Minimum of 2 years of related experience (3+ years preferred).
  • Technical Expertise:
    • Strong background in Unix/Linux systems, from the kernel to the shell.
    • Familiarity with system libraries, file systems, and client-server protocols.
    • Ability to read and understand Python scripts used in platform operations.
    • Hands-on experience with networking technologies (TCP/IP, BGP, DNS, etc.) in carrier-grade environments.
    • Practical experience with one or more of the following: OpenStack, Kubernetes, Nginx, IPVS, ELK stack, Hadoop, or similar systems.
  • Excellent problem-solving skills, attention to detail, and ability to work effectively in a fast-paced, collaborative environment.

Preferred Skills (Nice to Have)
  • Experience with automation and infrastructure-as-code tools (e.g., Ansible, Terraform).
  • Knowledge of distributed systems, load balancing, and fault tolerance.
  • Familiarity with observability platforms (Prometheus, Grafana, ELK).

Job ID: TS250300


Posted By

Vivek

Information Technology Recruiter