3 weeks ago
ML Ops Engineer (AWS / Terraform)
GCS GLOBAL CAPABILITY LIMITED
Engineering & Technology
IT & Telecoms
Confidential
Job Summary
We are seeking an experienced ML Ops Engineer to help scale the deployment and management of multiple AI models across AWS.
- Experience Level: Mid level
- Experience Length: 3 years
Job Description/Requirements
Location: Remote
Work Hours: 11:30 AM – 8:30 PM EAT (08:30 AM – 5:30 PM GMT)
Employment Type: Permanent or Contract
Compensation: Salary dependent on experience, skill set, and project scope
About the Role
We are seeking an experienced ML Ops Engineer to help scale the deployment and management of multiple AI models across AWS. You will join a growing team that has developed a suite of image-based machine learning models — including classification, recognition, and prediction systems — and now needs to operationalise these models efficiently and securely in production environments.
This role sits at the core of the platform and infrastructure strategy. You will be responsible for designing scalable deployment pipelines, building and managing infrastructure using Terraform, and ensuring the entire ML lifecycle, from experimentation to production, runs efficiently, securely, and cost-effectively.
The ideal candidate is proactive, detail-oriented, and confident working in a fast-paced, cloud-first, international environment.
Key Responsibilities
• Design and implement scalable, automated infrastructure for deploying ML models in AWS using Terraform.
• Manage and optimise existing AWS environments (SageMaker, ECS/EKS, Lambda, Batch, and GPU-backed instances).
• Build and maintain CI/CD pipelines for ML model delivery and monitoring.
• Ensure infrastructure supports both real-time inference and batch processing workloads.
• Collaborate closely with Data Scientists and Engineers to productionise models efficiently.
• Monitor system performance and costs, identifying opportunities for optimisation and automation.
• Maintain infrastructure reliability, security, and compliance with best practices.
Skills & Experience
• At least 3 years (ideally 3–5) of proven experience in ML Ops, DevOps, or Cloud Infrastructure Engineering, preferably in large-scale production environments.
• Extensive hands-on experience with Terraform, including provisioning and managing complex AWS environments.
• Strong knowledge of AWS services relevant to ML Ops:
    • SageMaker for model training and deployment
    • ECS/EKS or Elastic Beanstalk for containerised workloads
    • Lambda and Batch for inference pipelines
    • S3, CloudWatch, IAM, Glue, and related orchestration tools
• Proven experience deploying GPU-accelerated ML models in production.
• Solid understanding of ML model lifecycle management, including versioning, packaging, and scaling.
• Proficiency in Python, with experience using FastAPI or Flask for serving models.
• Strong understanding of CI/CD, Infrastructure as Code, and DevOps principles.
• AWS or Terraform certifications are highly regarded.
• Familiarity with Kubernetes, Docker, and MLflow is an advantage.
• Experience in AWS cost optimisation and performance tuning preferred.