(M) Staffing – 3x Operations Engineer

Remote
Full Time
DevOps - Kubernetes - Docker - Terraform
Senior Manager/Supervisor

Kubernetes On-Premise Operations Engineer

Location: Remote (Bolivia-based candidates only)
Type: Full-Time
Project Scope: Iridium Panama (end of 2025)

We are seeking a Kubernetes On-Premise Operations Engineer to manage and maintain our on-premise Kubernetes infrastructure. The role focuses on day-to-day operations, proactive monitoring, troubleshooting, and keeping systems highly available and stable. The engineer will work closely with the Level 3 Engineers who provide the infrastructure backbone, keeping production operations seamless and reliable.


Scope of Applications Supported

  • Mi Tigo – Serving 6 countries

  • Tigo Sports – Available in 6 countries

  • Apigee – Active in 1 country

  • KannelGateway – Used across 9 countries


Key Responsibilities

  • Kubernetes Cluster Management

    • Apply patches and updates

    • Monitor and troubleshoot performance issues

  • Incident Management & On-Call Support

    • Participate in on-call rotation

    • Respond to incidents, perform root cause analysis (RCA), and document resolutions

  • Networking & Ingress Management

    • Operate and troubleshoot Cilium, Nginx Ingress Controller, and Traefik

  • Storage & Databases

    • Support and maintain NFS storage and MongoDB, MySQL, and PostgreSQL databases, ensuring performance and data integrity

  • Observability & Monitoring

    • Manage Prometheus, Grafana, and Loki for proactive alerting and system logging

  • Automation & Configuration Management

    • Use Helm, Ansible, and CI/CD pipelines to apply and manage infrastructure configurations

  • Production Deployments

    • Execute, monitor, and manage production deployments with proper rollback strategies

  • OS & Security Management

    • Maintain Ubuntu-based systems, ensuring they are patched, secure, and performant


Requirements

  • 5+ years in Operations, SRE, or DevOps roles

  • 3+ years managing on-premise Kubernetes clusters

  • Strong troubleshooting skills in:

    • Kubernetes

    • Networking

    • Databases (MongoDB, MySQL, PostgreSQL)

  • Proficiency with monitoring tools: Prometheus, Grafana, and Loki

  • Familiarity with operational processes, incident management, and runbooks

  • Experience with Helm and Ansible; Terraform experience is a plus

  • Prior experience with production on-call support and incident resolution

  • Competence in performing production deployments under change management practices

  • Experience managing Ubuntu systems
