As Service Reliability Supervisor, you will report to the Global NOC manager and work closely with the Network Operations Center team globally to establish and maintain a high-performing and highly available game service for players around the world. You’ll help manage a team at Riot Singapore that monitors and supports all aspects of production environments, development environments, and general system needs. Your management skills and understanding of operations will help you diagnose and communicate potential issues to Rioters and the community, improving the quality of the player experience.
Some of the challenges that you’ll encounter include overseeing incident management of a 24/7/365 team in an environment where every minute counts. You’ll provide guidance during stressful situations and be the paragon of steadfast decision-making. You’ll also help evolve the strategic direction, implement tactical goals, and maintain the health of the team.
- Ensure that your direct reports are meeting team and individual performance measurements.
- Be the escalation point for the Network Operations Center, which responds to, mitigates, and resolves incidents.
- Maintain performance of the team through hiring, training, assigning and evaluating work, and taking corrective action where necessary.
- Guide team members’ technical and professional growth.
- Ensure that the team is operating in compliance with local laws and regulations.
- Plan, design, and implement solutions that support NOC operations.
- Contribute to the strategic direction of growth and capacity planning established by the Global NOC manager.
- Develop and collaborate on policies and processes for all Network Operations Center environments with the NOC Leadership team.
- Manage NOC service level agreement (SLA) commitments with product owners and service teams.
- Coordinate communication, training, and work across a global 24/7/365 team.
- Establish plans and policies for business disaster recovery.
- 4 years of experience working in technology operations.
- 2 years of experience leading a team and managing performance.
- Strong verbal and written communication skills in English.
- Understanding of basic technologies around running an online service and the advancements the industry is making
- Knowledge of general networking and of system triage and configuration, with the ability to understand metrics and distill, from a technical perspective, the essential action points taken during incidents.
- ITIL Foundation v4 certification
- Degree in information technology, information systems, technical operations, or equivalent experience.
- Experience with SRE (Site Reliability Engineering)
- Experience in a time-critical, globally distributed NOC that supports multiple data centers
- Gamer empathy for understanding impact of outages
- Experience managing teams through transitional change
- Strong communication with the team, across sites, and with stakeholders, service owners, and senior leadership
- Proficient in Mandarin
For this role, you'll find success through craft expertise, a collaborative spirit, and decision-making that prioritizes the delight of players. We will look at your past studies and experience, but for this role, we also look for dedicated people with a personal relationship with games. If you embody player empathy and care about the experiences of players, this could be the role for you!
- Full relocation support
- Full health insurance for you, your spouse and children
- Open paid time off
- Savings benefit with company matching
- Life insurance, parental leave, plus short-term and long-term disability
- Play Fund so you can broaden and deepen your knowledge of our players and community through games
- Wellness Fund to encourage a balanced body and mind
- Monthly phone bill allowance
- Monthly food allowance
- We will double down on your donations of time and money to non-profits
Don’t forget to include a resume and cover letter. We receive a lot of applications, but we’ll notice a fun, well-written intro that shows us you take play seriously.