This position is with TikTok's Reliability & Quality Assurance Team. The team is responsible for ensuring that the services TikTok provides are highly reliable and low-latency. For any massive application system, reliability assurance is complex, systematic work. We focus on optimizing the application architecture end to end, driven by data analysis, and aim for automatic, intelligent failure recovery.
In this role, you will:
- Optimize the architecture of TikTok's core functions, designing a highly available, high-performance, and highly maintainable system to ensure their stability.
- Design and implement an end-to-end, full-link stability assurance system for the core functions.
This role will allow you to:
- Improve the quality and workflow of the entire R&D process, and conduct in-depth research on code quality improvement, automated testing, and automated deployment.
- Optimize TikTok's architecture for better reliability, better scalability, and lower latency, and design automatic disaster recovery solutions.
- Build a large-scale service governance system that can visualize the entire system architecture and locate and resolve faults automatically. Help develop an automated chaos engineering system.
- Build a stability measurement system that systematically measures the system's ability to prevent, detect, and resolve failures, and provide one-click solutions to stability problems.
- Become an SRE expert: gain insight into hidden risks in the system, establish operations and maintenance standards, work with the R&D team to increase the automation of operations and maintenance, and strengthen the stability assurance system.
Qualifications:
- Bachelor's degree in Computer Science or a related technical field involving software/systems engineering, or equivalent work experience.
- Solid programming experience with high-concurrency systems, complex business systems, or service management.
- Proficient in at least one of the following backend languages: C, C++, Java, Go, Python, Shell, or PHP.
- Positive, optimistic, and self-driven, with a strong sense of responsibility, a conscientious approach to work, and good communication and teamwork skills.