The Site Reliability Engineer (SRE) working on the Global Security team will be responsible for building and maintaining security systems. Tasks will include installing, configuring, and updating hardware and software; establishing and managing user accounts; overseeing or conducting backup and recovery tasks; implementing operational and technical security controls; and adhering to organizational security policies and procedures.
Primary/Essential Duties and Key Responsibilities:
- Build, install, configure, and test dedicated hardware and software solutions.
- Manage system/server resources including performance, capacity, availability, serviceability, and recoverability.
- Monitor and maintain system/server configuration.
- Ensure all systems security operations and maintenance activities are properly documented and updated.
- Maintain baseline system security according to organizational policies.
- Manage accounts and access to systems and equipment.
Knowledge, Skills, and Abilities:
- Ability to monitor measures or indicators of system performance and availability.
- Ability to script or code OS/application-level automation tasks.
- Ability to perform basic data/log parsing.
- Ability to work with open-source orchestration software (Kubernetes) for deploying, managing, and scaling container-based applications.
- Skill in conducting system/server planning, management, and maintenance.
- Knowledge of automation tools such as Ansible, Puppet, Chef, or Salt.
- Knowledge of system administration, network, and operating system hardening techniques.
- Knowledge of infrastructure-as-code tools such as Terraform preferred.
- Knowledge of or familiarity with HAProxy and nginx.
- Professional experience in Linux system administration on distributed systems.
- Experience with AMQP message-broker software preferred.
- Experience with distributed, open-source search and analytics engines preferred.
- Experience with public cloud provider infrastructure, system deployments, and product release operations.
- Experience using metrics systems (e.g., Elastic Stack, Graphite, Prometheus, Influx) to diagnose issues and quantify impact.
- Familiarity with SLAs, SLOs, and SLIs.
- Minimum of three (3) years of experience in a security administrative role.