DevOps Engineer – AI Infrastructure & Kubernetes

Raydian Cloud Pte Ltd

Singapore


Job Description

About the Role
Raydian Cloud is seeking a forward-thinking DevOps Engineer to help build and scale infrastructure that powers cutting-edge AI workloads. You’ll work at the intersection of cloud-native technologies and Artificial Intelligence operations (AIOps), enabling high-performance, secure, and automated environments for AI development and deployment. Your expertise in Infrastructure as Code and Kubernetes will be critical in supporting scalable AI pipelines and platform services.

Key Responsibilities
• Design and manage cloud infrastructure optimized for AI/ML workloads using Infrastructure as Code (Terraform, Pulumi, etc.)
• Deploy and maintain Kubernetes clusters tailored for GPU scheduling, distributed training, and inference workloads
• Build CI/CD pipelines for AI model training, validation, and deployment across environments
• Collaborate with data scientists and ML engineers to streamline model lifecycle management
• Implement observability and monitoring for AI services (e.g., Prometheus, Grafana, OpenTelemetry)
• Ensure infrastructure security, compliance, and cost-efficiency in multi-tenant AI environments
• Automate provisioning of AI-specific resources (e.g., GPU nodes, storage volumes, feature stores)
• Document infrastructure patterns, DevOps workflows, and platform architecture

Why Join Raydian Cloud?
• Shape the future of AI infrastructure and platform services
• Work with a visionary team blending deep tech and strategic execution
• Influence architecture decisions in a fast-moving AI startup environment
• Competitive compensation, flexible work culture, and growth opportunities

Job Requirements

Required Skills & Qualifications
• Strong experience with Kubernetes, including GPU scheduling and Helm
• Proficiency in Infrastructure as Code tools (Terraform, Pulumi, etc.)
• Familiarity with cloud platforms (AWS, Azure, GCP) and AI services (e.g., SageMaker, Vertex AI)
• Experience with CI/CD tools (GitHub Actions, GitLab CI, Argo Workflows)
• Scripting skills in Python, Bash, or Go
• Understanding of ML model lifecycle and data pipeline orchestration
• Excellent communication and collaboration skills across technical and business teams

Nice to Have
• Experience with Kubeflow, MLflow, or similar MLOps frameworks
• Knowledge of containerized AI workloads (e.g., TensorFlow Serving, Triton Inference Server)
• Familiarity with service mesh technologies (Istio, Linkerd) in AI microservices
• Certifications in Kubernetes or cloud platforms (e.g., CKA, AWS Certified DevOps Engineer)

Skills Requirements

Kubernetes, Data Science, Python, Terraform, AWS

About Company

Raydian Cloud is a leader in AI-driven digital transformation, delivering secure, scalable, and sovereign cloud solutions for enterprises and governments. By leveraging strategic partnerships with industry leaders like NVIDIA, Rafay Systems, and Monetize360, we provide a complete ecosystem for AI innovation—from infrastructure to talent development. We empower organizations in highly regulated sectors such as healthcare, finance, and telecommunications to harness the power of AI while ensuring data sovereignty and strict regulatory compliance.

Job Overview

  • Job Type: Full Time
  • Salary: $5000 - $5000
  • Industry: Information Technology and Services
  • Job Category: IT - Software Jobs
  • Min Qualification: Diploma

Career Conversion Programme

CCP for ICT Professionals (Software and Applications) - SGTech