Our product infrastructure software engineering role combines software and systems engineering disciplines to run high-performance, large-scale distributed infrastructure.
This means you will be deeply involved in the development lifecycle of critical software services, collaborating closely with product engineers and applying both software and systems knowledge to ensure that TikTok e-Commerce's services are reliable, fault-tolerant, efficiently scalable and cost-effective.
You will also apply your software engineering expertise to develop platforms and tools that optimize the operational and engineering efficiency of complex systems at scale, with a particular focus on improving the systems' observability, performance and maintainability.
- Lead the engineering team to build, expand and operate TikTok e-Commerce's global infrastructure, including large-scale systems in public and private clouds, data centers and content delivery networks.
- Develop automation, data visualization and automated monitoring processes to facilitate the optimization of the TikTok e-Commerce platform infrastructure.
- Drive the design and engineering of tools, as well as platform solutions, to optimize product engineering and operation efficiencies.
- Own governance, cost management, security and compliance, and backup and disaster recovery for the platform infrastructure.
- Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize downtime.
- Bachelor's degree or higher in Computer Science or a related technical discipline, or equivalent working experience.
- 6+ years of experience in cloud architecture or DevOps.
- 4+ years of experience working with Unix/Linux systems, from kernel to shell and beyond, including system libraries, file systems, and client-server protocols.
- Demonstrable experience in one or more programming languages such as Java, C++, or Go, or scripting experience in Shell and Python.
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
- Agile, quick self-learner and creative problem solver who is highly self-motivated and has a strong sense of product ownership.
- Operational experience running a 24x7 production infrastructure at scale.
- Ability to lead independent research to solve complex technical problems.
- Empathetic and results-oriented leader and mentor.
- Good collaborator and team player, comfortable working in a fast-moving, culturally diverse and globally distributed team environment.