Roles and Responsibilities
- Own 24x7 platform availability and SLA adherence; lead major incidents, problem management, and post-incident RCAs; enforce runbooks and audit-ready change management.
- Plan and execute vendor-supported upgrades; maintain software currency, PTF/RSU adoption, and security compliance; validate subsystem interdependencies.
- Tune performance (DB2, IMS, CICS, MQ); optimize workloads, batch windows, and WLM; drive automation and observability to reduce manual toil.
- Ensure audit readiness, enforce access controls, maintain configuration baselines, and conduct BCP/DR exercises.
- Define operational roadmaps, OKRs, and executive summaries; collaborate with applications and business teams; mentor staff and promote continuous improvement.
- Triage incidents, approve service requests, perform health checks, validate backups/restores, and ensure batch and SLA compliance.
Required Qualifications
- 10+ years in Mainframe Operations/System Programming (z/OS, DB2, IMS, CICS, MQ).
- Leadership in incident management, platform upgrades, and audit-compliant change management.
- Expertise in IBM tooling: SMP/E, PARMLIB/PROCLIB, ISPF, RACF, SDSF, JES2/3, OMEGAMON/IBM Z Observability, IWS/TWS/IWA, ChangeMan.
- Experience in performance tuning, workload management, BCP/DR, and security hardening.
- Scripting/automation: Rexx, JCL; Python on z/OS or Ansible for z/OS a plus.
- Strong written communication skills for executive summaries, runbooks, and RCAs.
Preferred Qualifications
- IBM z16 experience; DB2/IMS upgrades; MQ clustering and backlog resolution.
- Exposure to DevOps and mainframe CI/CD; experience with observability platforms.
- Certifications: IBM z/OS, DB2, CICS, MQ; ITIL, COBIT. Familiarity with ISO/PCI/SOX compliance requirements.
Tools & Environment
z/OS, DB2, IMS, CICS, MQ, JES2/3; Workload Scheduler/Automation; ChangeMan; RACF/ACF2/Top Secret; OMEGAMON, RMF/SMF, IBM Z Observability; Rexx/CLIST, JCL, Ansible (optional).