Job Description
Platform Reliability & Operations
– Maintain and improve the reliability, availability, and performance of production systems.
– Implement and manage SLOs, SLIs, and error budgets for key services.
– Perform root cause analysis for incidents and drive permanent corrective actions.
CI/CD & Automation
– Design, maintain, and optimize CI/CD pipelines (build, test, release, promote).
– Automate repetitive operational tasks using scripting and infrastructure-as-code.
– Enhance release governance, deployment quality, and environment consistency.
Cloud Infrastructure
– Manage and support Azure cloud-hosted applications.
– Implement secure, scalable infrastructure components (networking, storage, compute).
– Ensure configuration drift control, policy enforcement, and adherence to best practices.
Monitoring & Observability
– Enhance logging, metrics, and distributed tracing systems.
– Continuously identify performance bottlenecks and optimize system behavior.
Security & Compliance
– Apply DevSecOps practices across pipelines and infrastructure.
– Support vulnerability scanning, remediation, and secure configuration baselines.
– Ensure compliance with organizational, regulatory, and platform standards.
Collaboration & Continuous Improvement
– Work closely with developers, architects, and product teams to improve service quality.
– Contribute to operational guidelines, runbooks, and architectural documentation.
– Champion reliability engineering culture and best practices.
Requirements
– Strong experience as a DevOps engineer or SRE in Azure cloud environments.
– Solid understanding of CI/CD tooling (Azure DevOps, GitHub Actions, Jenkins, etc.).
– Hands-on experience with IaC (Terraform, ARM/Bicep).
– Expertise in Windows/Linux systems engineering and shell scripting.
– Strong troubleshooting skills across application, infrastructure, and network layers.
– Experience with logging/monitoring tools, including Azure Monitor, Application Insights, New Relic…
– Knowledge of containerization and orchestration (Docker, Kubernetes).
– Experience in production incident handling and post-incident reviews.
– Problem-solving skills and ability to troubleshoot complex technical issues.
– Excellent communication (English) and collaboration skills.
– Ability to generate innovative ideas for applying AI in company workflows and products, and to lead and train team members on essential AI skills and knowledge.
Having the following skills is an advantage:
– Familiarity with hybrid cloud and multi-cloud architectures (AWS is a plus).
– Certification in Azure or an equivalent cloud platform (e.g., Azure DevOps Engineer Expert, Google Cloud Associate Cloud Engineer).
– Experience with microservice systems.
– Efficient, flexible, and eager to learn new things.
Benefits
At SmartOSC, we offer the best in return for the value you bring:
– Competitive salary package and salary review twice a year
– Flexible working hours (between 7:30 am and 7:30 pm, based on staff preference)
– Premium health care up to $3,000/year
– Working at one of the largest e-commerce agencies in Southeast Asia
– Professional English-speaking environment
– Free English, Japanese and professional training packages
– Company sponsorship of professional certifications for career development
– Annual company trip inside or outside Vietnam
– Other fun activities: happy hour, quarterly team building, football club, badminton club, charity activities, etc.
– Free entertainment parties: birthday party, anniversary party, sum-up party, year-end party, etc.