What You'll Do
Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and incident analysis.
Lead incident management efforts, perform root cause analysis, and implement improvements based on the resulting action items.
Implement operational workflows using scripting, infrastructure as code (IaC), and configuration management tools.
Manage capacity planning, performance, and scaling to forecast demand and optimize infrastructure.
Collaborate with engineering teams to embed operability, resilience, and security into application architectures.
Build and automate reliable deployments through CI/CD pipelines, release governance, and version control systems.
Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.
Experience Required
Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and monitoring.
Working with cloud platforms (AWS, Azure, or Google Cloud Platform) across compute, storage, networking, IAM, and cost governance.
Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, Elastic/ELK.
Strengthening security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secure management of workloads.
Automating infrastructure using Terraform, CloudFormation, Ansible, or similar tools.
Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
Scripting and automation using Bash, PowerShell, or Python.
Background in the electricity, engineering, or military sectors (preferred).
Good to Have
Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA.
Experience with legacy applications, IBM iSeries, and/or library systems.
Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).
Prior experience as a major incident commander, stakeholder communicator, or ops lead/coordinator.
Experience with ITIL and ServiceNow (change, incident, and configuration management).