As a Platform Engineer on the Data Services team, you will work on all aspects of data, from platform and infrastructure build-out to pipeline engineering and writing tools and services that augment and front the core platform. You will be responsible for building and maintaining a state-of-the-art data lifecycle management platform, covering acquisition, storage, processing and consumption channels. The team works closely with data scientists, product managers, Legal, Compliance and business stakeholders across SEA to understand their needs and tailor its offerings to them. As a member of Data Services, you will be an early adopter of, and contributor to, various open source big data technologies. You are encouraged to think outside the box and have fun exploring the latest patterns and designs in software and data engineering.
The Day-to-Day Activities
- Build and manage data assets using some of the most scalable and resilient open source big data technologies, such as Airflow, Spark, Apache Atlas, Kafka, YARN, HDFS, Elasticsearch, Presto/Dremio, HDP, a visualization layer and more.
- Design and deliver a next-generation suite of data lifecycle management tools and frameworks, including ingestion and consumption layers on top of the data lake that support real-time, API-based and serverless use cases, as well as batch (mini- and micro-batch) processing where relevant.
- Build and expose a metadata catalog for the data lake to support easy exploration, profiling and lineage requirements.
- Enable data science teams to test and productionize ML models, including propensity, risk and fraud models, to better understand, serve and protect our customers.
- Lead collaborative technical discussions across the organization, including RFC and architecture review sessions, tech talks on new technologies, and retrospectives.
- Apply core software engineering and design concepts to create operational and strategic technical roadmaps for business problems that are vague or not yet fully understood.
- Obsess over security by ensuring that all components, from the platform and frameworks to the applications, are fully secure and compliant with the group’s infosec policies.
The Must-Haves
- At least 4 years of relevant experience developing scalable, secure, fault-tolerant, resilient and mission-critical big data platforms.
- Able to maintain and monitor the ecosystem at 99.9999% availability.
- Candidates will be aligned appropriately within the organization depending on experience and depth of knowledge.
- Must have a sound understanding of all big data components and administration fundamentals, and hands-on experience building a complete data platform from various open source technologies.
- Must have solid hands-on knowledge of Linux and of building a big data stack on AWS or Azure using Kubernetes.
- Strong understanding of big data and related technologies such as HDFS, Spark, Presto, Airflow and Apache Atlas.
- Good knowledge of stream and complex event processing (CEP) systems such as Spark Streaming, Kafka, Apache Flink and Apache Beam.
- Experience with NoSQL databases, such as key-value, document and graph stores.
- Proven ability to contribute to the open source community, and up to date with the latest trends in the big data space.
- Able to drive DevOps best practices such as CI/CD, containerization, blue-green deployments, 12-factor apps and secrets management in the data ecosystem.
- Able to develop an agile platform that auto-scales up and down, both vertically and horizontally.
- Must be able to build a monitoring ecosystem covering all components used in the data ecosystem.
- Proficiency in at least one of Java, Scala, Python or Go, along with a fair understanding of runtime complexity.
- Must have the knowledge to build data metadata, lineage and discoverability capabilities from scratch, and stay educated on the latest developments in these areas.
- A good understanding of machine learning models and how to support them efficiently is a plus.