Site Reliability Engineer Job at Omniscius Consulting, United States

  • Omniscius Consulting
  • United States

Job Description

Our client is seeking a Site Reliability Engineer (SRE) who will be responsible for ensuring the reliability, performance, and scalability of its software, websites, and applications. This role combines software engineering and systems administration skills to monitor, control, and automate systems. The ideal candidate will have a deep understanding of cloud infrastructure, automation tools, and best practices for maintaining high availability and performance. This position plays a critical role in maintaining the overall health and efficiency of the platform.

Key Responsibilities:

System Monitoring and Maintenance:
- Monitor the performance and reliability of Kubernetes clusters, software, websites, and applications.
- Automate routine maintenance tasks to ensure system stability and performance.

Incident Response and Troubleshooting:
- Respond to and resolve incidents in a timely manner, minimizing downtime and impact on users.
- Conduct root cause analysis to identify and address underlying issues.
- Develop and implement strategies to prevent future incidents and improve system resilience.

Automation and Infrastructure Management:
- Design, build, and maintain automated systems and processes to improve efficiency and reduce manual intervention.
- Manage cloud infrastructure, including provisioning, scaling, and optimizing resources.
- Collaborate with development teams to ensure seamless deployment and integration of new features and updates.

Performance Optimization:
- Analyze system performance and identify areas for improvement.
- Implement performance tuning and optimization techniques to enhance system efficiency.
- Collaborate with cross-functional teams to ensure optimal performance of all components.

Security and Compliance:
- Ensure compliance with security best practices and industry standards.
- Implement and maintain security measures to protect systems and data.
- Conduct regular security audits and vulnerability assessments.

Documentation and Reporting:
- Maintain accurate and up-to-date documentation of systems, processes, and procedures.
- Generate and analyze reports on system performance, incidents, and other key metrics.
- Provide regular updates to management and stakeholders on system health and performance.

Continuous Improvement:
- Identify opportunities for improving system reliability, performance, and scalability.
- Stay up-to-date with industry trends and best practices in site reliability engineering.
- Participate in training and development opportunities to enhance skills and knowledge.

Qualifications:
- Deep expertise in Kubernetes and containers.
- Strong understanding of cloud infrastructure, automation tools, and best practices for maintaining high availability and performance.
- Experience with monitoring and logging tools such as Loki and Grafana.
- Minimum of 3 years of experience in site reliability engineering, Kubernetes administration, or a related role.
- Excellent problem-solving skills and attention to detail.
- Strong communication and interpersonal skills, with the ability to work effectively with cross-functional teams.

Job Tags

Full time