What you’ll Do:
- Support services before they go live through activities such as system design consulting, capacity planning, and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.
- Improve the monitoring, alerting, and resilience of systems.
- Practice sustainable incident response and blameless postmortems.
What you’ll Need:
- 5+ years of experience in DevOps or SRE-related roles.
- Experience with JVM-based service operation, including performance optimization and troubleshooting.
- Ability to operate monitoring, logging, and tracing systems and support problem resolution.
- Understanding of and experience with Linux, VMs, containers, Kubernetes, and cloud environments.
- A systematic approach to problem-solving, effective communication skills, and strong drive.
- Fluency in both Korean and English (advanced business level).
- No restrictions that would prevent international travel (mostly to Thailand).
It’d be Great if you have:
- Proficiency in one or more programming languages such as Java, Golang, or Python.
- Experience in building and operating infrastructure and data storage for global services and handling large-scale traffic.
- Experience in designing and managing databases such as MongoDB or MySQL.
- Knowledge of security and security testing.
- Experience defining SLI/SLO metrics for services and infrastructure, and optimizing monitoring and operational environments based on these metrics.
- Business-level proficiency in English or Thai.