(M) Staffing – 3x Operations Engineer

Remote

Full Time

DevOps - Kubernetes - Docker - Terraform

Senior Manager/Supervisor

Kubernetes On-Premise Operations Engineer

Location: Remote (candidates based in Bolivia only)
Type: Full-Time
Project Scope: Iridium Panama (end of 2025)

We are seeking a Kubernetes On-Premise Operations Engineer to manage and maintain our on-premise Kubernetes infrastructure. This role is focused on day-to-day operations, proactive monitoring, troubleshooting, and ensuring high availability and system stability. The engineer will collaborate closely with Level 3 Engineers who provide the infrastructure backbone, ensuring seamless and reliable production operations.

Scope of Applications Supported

Mi Tigo – Serving 6 countries
Tigo Sports – Available in 6 countries
Apigee – Active in 1 country
KannelGateway – Used across 9 countries

Key Responsibilities

Kubernetes Cluster Management
- Apply patches and updates
- Monitor and troubleshoot performance issues
Incident Management & On-Call Support
- Participate in on-call rotation
- Respond to incidents, perform root cause analysis (RCA), and document resolutions
Networking & Ingress Management
- Operate and troubleshoot Cilium, Nginx Ingress Controller, and Traefik
Storage & Databases
- Support and maintain NFS, MongoDB, MySQL, and PostgreSQL, ensuring performance and data integrity
Observability & Monitoring
- Manage Prometheus, Grafana, and Loki for proactive alerting and system logging
Automation & Configuration Management
- Use Helm, Ansible, and CI/CD pipelines to apply and manage infrastructure configurations
Production Deployments
- Execute, monitor, and manage production deployments with proper rollback strategies
OS & Security Management
- Maintain Ubuntu-based systems, ensuring they are patched, secure, and performant

Requirements

5+ years in Operations, SRE, or DevOps roles
3+ years managing on-premise Kubernetes clusters
Strong troubleshooting skills in:
- Kubernetes
- Networking
- Databases (MongoDB, MySQL, PostgreSQL)
Proficient in monitoring tools: Prometheus, Grafana, Loki
Familiar with operational processes, incident management, and runbooks
Experience with Helm, Ansible, and optionally Terraform
Prior experience with production on-call support and incident resolution
Able to perform production deployments under change management practices
Experience managing Ubuntu systems

BELIEVE Solutions