As a Site Reliability Engineer, you will work with Agile engineering teams to provide production insight into running and operating software at scale in a globally distributed, highly available, cloud-based system. You will guide the team to consider the scale-out and operational support implications of the choices they make during the development cycle, helping foster an ownership-in-production mentality within the team. You will play a hands-on role helping the team meet ambitious technical, operational, schedule, and business requirements. The ideal candidate is a systems problem solver with a passion for creating products that deliver incredible customer experiences, deep experience with operational automation and data-driven metrics collection, and a true desire to automate a task rather than do it repeatedly.
Responsibilities
- Aid in building and supporting a container-based platform architecture that lets microservice teams own their code all the way into production.
- Build and support tooling and automated infrastructure necessary to deliver metrics from production environments, deploy releases to production environments, and manage multiple versions of multiple microservices operating in tandem to produce a fully functional system.
- Work with a team of backend and application development engineers to further institutionalize the notion that the team owns its code in production, and offer suggestions on how to do that better within the development sprint work.
Required Qualifications
- Demonstrated philosophy of automating your way out of repetitive operational work
- Experience running a global, cloud-based application at scale on AWS or Google Cloud.
- Experience automating in Python, Ruby, or other rapid-development languages.
- Experience with networking protocols, IP routing, and TLS, and with diagnosing routing or high-level network transaction issues through logs and metrics.
- Experience building and operating log aggregation, tracing, metrics aggregation, and alerting systems that help diagnose and understand what is going on in complex microservice environments
- Experience implementing and operating distributed, highly concurrent service-based architectures, including microservices, containerized services, and/or serverless architectures.
- Experience in fast-paced, iterative design environments such as the consumer internet, mobile application, or gaming industries
- Start-up experience a plus
Skills and Attributes
- A customer-first mentality that puts recovery before diagnosis while ensuring sufficient information is available to diagnose properly.
- Preference for building self-healing and self-monitoring systems, with a drive to minimize or eliminate the on-call needs of operations
- Excellent verbal and written communication skills
- Interest in driving toward infrastructure as code and container orchestration
Benefits and Perks
- Region-specific competitive benefits
- Build Your Own Benefits (BYOB) perk
- Fully stocked kitchen and catered or reimbursed lunches
- Many other fun and exciting benefits and activities!
Compensation
- Competitive salary
- Bonus Plan
- Benefits and Perks vary based on location.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.