Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.
- Lead daily operations of the SRE support team and embed SRE philosophy, practices and tooling
- Proactively design SRE tooling, processes, people and skills to ensure quality, stable software releases and production support
- Maintain services and monitor availability, latency and overall system health, along with customer satisfaction survey results and response and resolution objectives
- Create and continuously enhance reporting, dashboards and visibility of platform health and usage from a core platform and infrastructure perspective
- Influence the product roadmap to enhance customer experience based on customer interactions, trends and incidents
- Take ownership of reported incidents and coordinate with L3 and engineering teams for resolution.
- Manage escalations from the 24/7 escalation ladder for production issues
- Directly provide and ensure appropriate technical and soft skills training and mentoring for the SRE team
- Ensure release management design of the core platform is synchronized and has automated testing gates
- Assist the SRE team in solving customer issues, prioritizing issues, negotiating customer priorities and setting expectations
- Ensure appropriate Technical and Management escalation procedures are in place and effectively used. Monitor high severity issues and drive the communication with stakeholders.
- Direct and oversee customer escalations and engage external escalation teams and partners as necessary
- Perform impact analysis and root cause analysis; analyze issues reported by users and provide solutions
- Uphold software development and SRE philosophy and discipline by participating in the code review cycle and ensuring test coverage, security and observability are baked into the stack
- Share knowledge by creating and maintaining knowledge base articles, blogs and guilds, and by keeping documentation up to date.
- Plan staffing requirements and recruit, train and retain top talent, in line with the capacity plan. Coach and develop your existing team to help employees achieve their career development aspirations
- Understand customer roadblocks and pain points and advocate in a data-driven way with product management and engineering teams to enhance the product and experience
- A minimum of 10 years of IT industry experience
- 5+ years of production SRE experience in an enterprise environment on public or hybrid cloud.
- Production SRE support leadership experience in an enterprise, vendor, or service provider environment
- Demonstrable critical thinking and prioritization skills
- Knowledge of incident management and change management processes, including coordination with vendors and internal partners
- Highly proficient in written and spoken business English
- Well organized, adaptable and able to make clear and effective decisions
- Knowledge of and experience with observability, SRE tooling and SRE philosophy. Mastery of the observability tool chain or similar
- ELK Stack / Graylog etc.
- Deep expertise in time-series reporting and monitoring tools and their underlying aggregation models
- Stackdriver, CloudWatch, Dynatrace, SolarWinds, Prometheus, InfluxDB
- General knowledge of infrastructure components: firewalls, TCP/IP, DNS, ICMP, networking, switching, PKI, TLS
- General knowledge of web technology fundamentals: HTTP, WebSockets, content distribution, WAF, REST, JSON, YAML, CORS
- Strong experience with any flavour of Linux
- Experience in Spring Boot, Spring framework, or Angular is an advantage
- 5+ years of experience with a continuous integration and continuous delivery development environment (e.g. git)
- 3+ years of experience with at least one of the following in an enterprise, security-aware setting: K8S, Ansible, Terraform, CloudFormation
- Knowledge of and experience with containers, serverless architecture and virtualization.
We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.