Senior Site Reliability Engineer – Observability

Location: Santa Ana, CA

Skills: DevOps, Splunk/AppDynamics/etc., .NET, AWS

Job Description

Our client is looking for a Senior Site Reliability Engineer – Observability to join their team! In this position, you will work on an extensive transformation effort, moving from a classic support model to a site reliability engineering model across on-premises and cloud technologies. This is a key role that collaborates with stakeholders, development teams, and operations teams to ensure the availability and reliability of applications and infrastructure.

Key responsibilities:

  • Build end-to-end monitoring infrastructure (Logging, Metrics, Tracing) and work with production engineers to provision the right tools
  • Manage live production incidents and serve as an escalation point for systems administrators, engineers, and other technology teams
  • Debug and troubleshoot application and infrastructure issues, following and implementing SRE best practices
  • Monitor and analyze infrastructure performance using standard performance monitoring tools (knowledge of Perfmon, PerfView, ProcDump, DebugDiag, etc. is a plus)
  • Establish reliability metrics and targets such as SLIs, SLOs, and error budgets
  • Maintain an effective knowledge base and runbooks
  • Participate in the weekend on-call rotation for production support
  • Keep up to date with relevant technical and business skills

What Gets You the Job?

  • 9+ years’ hands-on experience in an application and technical support role in a live production environment
  • 6+ years’ hands-on configuration and monitoring experience using tools such as Splunk, AppDynamics, ELK, and Microsoft SCOM, including monitoring of Windows processes and JavaScript frameworks
  • 4+ years’ experience monitoring web-based applications, web services, and database-driven applications using C#, .NET 4.5, Azure DevOps, and SQL Server 2016
  • 2+ years’ experience monitoring AWS workloads using AWS CloudWatch, AWS X-Ray, etc.
  • Extensive experience following development, DevOps, and SRE best practices
  • Automation experience using PowerShell, Python, or a similar scripting language is preferred
  • Excellent communication skills (written and verbal)
  • Self-driven with strong organizational skills
  • Bachelor's Degree in Computer Science or equivalent combination of education and experience

Send us your resume today!

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

For immediate consideration please click Apply or email resumes to:

Russell Wolf