As a Site Reliability Engineer (SRE), you'll help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure and reducing work through automation. You’ll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment, you’ll take the lead on relevant projects, supported by an organization that provides the support and mentorship you need to learn and grow. As an SRE, you’ll be focused on running better production applications and systems.
In this role as a senior SRE, you will provide production support to the JPMC Public Cloud team. You will work with cloud engineers to build the platform, pipelines, and monitoring systems, ensuring the application landscape is designed to take full advantage of JPMC’s global cloud solution.
The responsibilities for this role include, but are not limited to:
- Implement SRE frameworks to support global multi-cloud environments, and meet the highest service-level agreement (SLA) targets through operational excellence
- Provide failure analysis / root cause analysis when required
- Provide support to develop & improve the quality of technical engineering documentation
- Provide support to drive the maturity of the software development lifecycle
- Provide quality control of engineering deliverables
- Provide technical consultation to product management
- Perform deployment, administration, management, configuration, testing, and integration tasks related to the platforms in the cloud environment
- Help to develop new cloud engineering strategies and implementations for the firm
- Champion a DevOps model so that services are automated and elastic across all platforms
- Help coach and mentor less experienced team members
- Write operational documentation and maintain a knowledge base of known issues and their solutions
- Participate in 24x7 SRE on-call rotations and escalation workflows as needed, including occasional weekends
The qualifications for this role include:
- Bachelor's degree in Computer Science, Information Technology, or equivalent technical field
- 8 or more years of IT experience with expertise in enterprise cloud infrastructure (AWS, Azure, GCP) in a mission-critical environment
- In-depth OS experience (RHEL, Ubuntu, Windows Server) with strong debugging, troubleshooting, and problem-solving skills
- Expertise in Python or Java, with a focus on Site Reliability Engineering and support of cloud services
- Hands-on experience with cloud-based technologies and tools, especially in deployment, monitoring, and operations, such as Datadog, Prometheus, Splunk, Elasticsearch, and Grafana
- Strong working knowledge of modern development practices and tools such as Agile, CI/CD, Git, Terraform, and Jenkins
- Deep knowledge of Internet protocols and web services technologies such as HTTP, DNS, TCP/UDP, SOAP, JSON and REST
- Good understanding of networking protocols and cybersecurity best practices in cloud environments
- Experience in PowerShell, shell scripting, or Go is highly desirable
- AWS certification is highly desirable
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. In accordance with applicable law, we make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as any mental health or physical disability needs.
About the Team
The Chief Technology Office oversees enabling components, including top-quality engineering and architecture tools and practices, key program management and processes, and the technology workforce strategy required to make us a leading technology company for our customers, clients, and colleagues around the world.