Senior Associate, SRE Operations and Incident Management, EASRE, Technology & Operations

DBS Bank

Job Summary


Salary
S$9,031 - S$15,862 / Monthly EST

Job Type
Permanent

Seniority
Senior

Years of Experience
At least 5 years

Tech Stacks
Oracle, CI, Dynatrace, MariaDB, ELK, Nexus, SonarQube, Grafana, Prometheus, Kubernetes, SQL, MongoDB, Jenkins, Linux, Bitbucket

Job Description

This position is for an SRE Operations & Blameless Incident Retrospective Senior Analyst within the enabling group, Enterprise Architecture & Site Reliability Engineering (EASRE) department.

The role is expected to contribute effectively to the conduct of blameless incident retrospectives and to SRE activities in general. These cover maintenance management (availability, latency, performance, change management, monitoring and capacity planning) as well as the solutions derived from emergency response.

Key Accountabilities

  • Facilitate & conduct blameless retrospective (RCA) activities effectively from end to end.
  • Absorb new technology rapidly & apply it effectively.
  • Evaluate & demonstrate new cloud technologies as required.
  • Communicate well with technical & non-technical colleagues.
  • Mentor other colleagues as necessary.
  • Work to a high standard within agreed timescales.
  • Undertake any other tasks or duties that are reasonable & requested by your supervisor or a member of the senior management team.
  • Review code.
  • Apply knowledge in supporting "Run" operations.
  • Perform data analysis & provide suggestions on identifying Service Level Indicators & Service Level Objectives.

Responsibilities

  • Responsible for effectively facilitating Blameless Incident Retrospectives (BIR).
  • Able to demonstrate authority in BIR calls while coordinating with other stakeholders, and to resolve discrepancies in a blameless way.
  • Responsible for efficient allocation of time & resources across parallel major incidents and problem activities.
  • Point of contact for assigned incidents of higher severity (from incident retrospective calls all the way up to Management Report (MR) documentation and publishing).
  • Manage the updates of systems such as the problem management module, internal SharePoint, etc.
  • Propose & participate in enhancement activities related to SRE.
  • Collaborate with Engineering Teams within EASRE and with LOBs on enabling activities as part of preventive measures.

Requirements

  • In-depth understanding of Public/Private/Hybrid cloud solutions.
  • Hands-on experience with popular CI/CD tools like Jenkins, Nexus, SonarQube, Bitbucket, etc.
  • Good exposure to logging & monitoring tools like Dynatrace, Prometheus, Grafana, ELF/ELK
  • Good understanding of cloud-native technologies like containers, Kubernetes, etc.
  • Develop & enhance production monitoring & management capabilities leveraging existing platforms & tools.
  • Minimum 5 years of root cause analysis (RCA) exposure & involvement leading discussions as a problem manager or incident commander
  • In-depth understanding of Incident & Problem Management functions & activities
  • Good understanding of identity and access management
  • Software incident & problem management
  • Work with stakeholders & command centre in troubleshooting, escalating & solutioning critical site incidents.
  • Identify recurring system/application issues & work with cloud team, infra teams, product development, vendors & other stakeholders in investigating & resolving the root cause.
  • Maintain accurate documentation of incidents including impact details, timelines, steps taken for mitigation/resolution.
  • Strong verbal & written communication skills, particularly effective documentation skills
  • Degree with a minimum of 6 years of software development, technical support or operations experience.
  • Experience with Jira, Confluence
  • Basic knowledge of Linux
  • Exposure to enterprise databases, e.g. Oracle, SQL Server, MariaDB, MongoDB & Sybase.
  • Knowledge of systems, multi-tier application & network troubleshooting
  • Experience with load balancing principles.
  • Essential knowledge & awareness of Public/Private/Hybrid cloud solutions.
  • Good exposure to logging & monitoring tools like Dynatrace, Prometheus, Grafana, ELG/ELK

We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.