Site Reliability Engineer
Our client is looking for a Site Reliability Engineer to assist in effective health monitoring of the production systems within their technical operations organization. This is a long-term contract opportunity in which you will work on a 24/7 team responsible for deploying code changes, handling escalations, and communicating change management. As a subject matter expert, you will liaise between departments, vendors, and partners on organization-wide issues related to the production site, customer support, help desk, and core systems.
- Perform 24x7x365 health monitoring of Unix and Windows environments hosting various web-based, mobile, and telephony platforms using server, network and application monitoring systems
- Deploy/release engineering code across multiple environments
- Ensure all builds/releases are communicated and applied to staging and production environments
- Provide application support for Unix and Windows applications
- Perform routine system administration tasks and operating procedures as needed
- Prioritize system events according to severity and escalation procedures
- Troubleshoot and/or resolve production issues in Unix and Windows environments
- Take the lead on production outages and delegate dashboard assignments
- Manage escalations in real time, driving them to resolution according to established procedures
- Accurately communicate production emergencies (via emergency conference bridge calls and outage notifications) with internal and external groups
- Develop quality control for communications about code releases, scheduled maintenance, and service interruptions
- Monitor infrastructure change management policies and procedures
- Mentor junior staff members
- Work a flexible schedule as needed (day, evening and/or third shifts)
What Gets You the Job?
- A Bachelor's Degree in Computer Science or a related technical field
- Equivalent practical system administration and programming experience will be considered
- 3+ years’ experience in an operations center or equivalent experience
- Ability to work in command-line and GUI environments
- 3+ years’ experience running scripts, grepping logs, processing event log messages, and troubleshooting errors in Windows
- 3+ years’ experience managing environments with VMware Horizon 7 and Commvault
- Hands-on experience with AWS, Jenkins, Chef, Ivanti Patch Management, Docker, Kubernetes/EKS, ServiceNow, New Relic, SumoLogic, Nagios or Opsview
- Programming knowledge of a language such as PowerShell, Python, Go, Ruby, or PHP
- Proficiency with networking protocols such as TCP/IP, HTTP, SSL, and DNS, or knowledge of Linux system internals
- Familiarity with large-scale system design, monitoring, and operational practices
- Ability to create accurate, timely reports
- Excellent team leadership and communication skills (written and verbal)
Send us your resume today!
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.