What You'll Do
Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and incident analysis.
Lead incident management efforts, perform root cause analysis, and implement corrective improvements based on the findings.
Implement operational workflows using scripting, Infrastructure as Code (IaC), and configuration management tools.
Manage capacity planning, performance, and scaling to forecast demand and optimize infrastructure.
Collaborate with engineering teams to embed operability, resilience, and security into application architectures.
Build and automate reliable deployments through CI/CD pipelines, release governance, and version control systems.
Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.
Experience Required
Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and monitoring.
Experience with cloud platforms (AWS, Azure, or Google Cloud Platform) across compute, storage, networking, IAM, and cost governance.
Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, Elastic/ELK.
Strengthening security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secure management of workloads.
Automating infrastructure using Terraform, CloudFormation, Ansible, or similar tools.
Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
Scripting and automation using Bash, PowerShell, or Python.
Background in electricity, engineering, or military-related environments (preferred).
Good to Have
Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or Certified Kubernetes Administrator (CKA).
Experience with legacy applications, IBM iSeries, and/or library systems.
Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).
Prior experience as a major incident commander, stakeholder communicator, or ops lead/coordinator.
Experience with ITIL and ServiceNow (change, incident, and configuration management).