Get to know the Role:
Data Engineers at Grab get to work on one of the largest and fastest-growing datasets of any company in Southeast Asia. We operate in a challenging, fast-paced, and ever-changing environment that will push you to grow and learn. You will be involved in various areas of Grab's Data Ecosystem, including reporting & analytics, data infrastructure, and various other data services that are integral parts of Grab's overall technical stack.
The day-to-day activities (Duties and Responsibilities):
- Build, deploy, and manage big data tools with solid DevOps practices, including CI/CD pipelines, Terraform, and cloud infrastructure.
- Apply a deep understanding of the different data formats and table formats used in distributed data processing and storage:
- Data Formats: Parquet, Avro, ORC, Arrow;
- Open Table Formats: Delta Lake, Iceberg, Hudi, and Hive.
- Streamline data access and security so that data scientists, analysts, and backend engineers can easily access data whenever they need it.
- Develop automation frameworks in programming languages such as Python to automate big data workflows such as ingestion, aggregation, and ETL processing.
- Maintain and optimize the performance of our data analytics infrastructure to ensure accurate, reliable and timely delivery of key insights for decision making.
- Run modern high-performance analytical databases with a solid understanding of distributed computing, and build scalable, reliable ETL pipelines and processes that ingest data from a large number and variety of sources into analytical databases and computation engines such as Spark, Flink, Presto, Synapse, BigQuery, Greenplum, and others.
- Understand SQL as the primary interface to tabular, relational datasets. Note that some distributed analytic engines, such as Trino (formerly Presto), Druid, ClickHouse, Redshift, Snowflake, Synapse, BigQuery, and Greenplum (and other tools commonly referred to as "data warehouses"), integrate proprietary storage services with the analytics engine, creating self-contained data-lake functionality.
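As a toy illustration of the layout distinction behind the formats listed above (Avro is row-oriented, while Parquet, ORC, and Arrow are columnar), here is a minimal sketch in plain Python. Real formats add encodings, compression, and metadata; the sample records are made up:

```python
# Toy illustration of row-oriented vs column-oriented storage, the key
# layout difference between formats like Avro (row-based) and
# Parquet/ORC/Arrow (columnar). Real formats add encoding, compression,
# and rich metadata; this sketch only shows the layout idea.

rows = [
    {"trip_id": 1, "city": "Singapore", "fare": 12.5},
    {"trip_id": 2, "city": "Jakarta", "fare": 8.0},
    {"trip_id": 3, "city": "Singapore", "fare": 15.0},
]

# Row layout: one whole record after another (efficient for reading or
# writing complete records). Columnar layout: one list per field
# (efficient for scans and aggregations over a few columns, and for
# per-column compression).
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An aggregation over one column touches only that column's values,
# not the rest of each record.
total_fare = sum(columns["fare"])
print(total_fare)  # 35.5
```

This is why analytical engines that mostly scan a handful of columns favor columnar formats, while row formats suit record-at-a-time ingestion.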
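The ingestion/aggregation/ETL workflows described above can be sketched, very roughly, using only the standard library; sqlite3 stands in for a real warehouse engine such as Presto or BigQuery, and the event schema, table name, and field names are hypothetical:

```python
import sqlite3

# Toy ETL step: extract raw event records, aggregate rides per city,
# and load the result into a SQL store. sqlite3 is a stand-in for a
# real analytical database; the schema is illustrative only.

RAW_EVENTS = [
    {"city": "Singapore", "rides": 3},
    {"city": "Jakarta", "rides": 5},
    {"city": "Singapore", "rides": 2},
]

def run_etl(events, conn):
    """Aggregate rides per city and load the totals into SQL."""
    totals = {}
    for event in events:  # extract + transform
        totals[event["city"]] = totals.get(event["city"], 0) + event["rides"]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rides_by_city "
        "(city TEXT PRIMARY KEY, rides INTEGER)"
    )
    conn.executemany(  # load
        "INSERT OR REPLACE INTO rides_by_city VALUES (?, ?)",
        totals.items(),
    )
    conn.commit()
    return totals

conn = sqlite3.connect(":memory:")
totals = run_etl(RAW_EVENTS, conn)
print(totals)  # {'Singapore': 5, 'Jakarta': 5}
```

A production version of this step would typically run under a workflow scheduler, read from distributed storage, and write Parquet or a table format rather than a local database.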
The must haves (Requirements):
- A degree or higher in Computer Science, Electronics or Electrical Engineering, Software Engineering, Information Technology, or another related technical discipline.
- Experience designing high-performance, scalable infrastructure stacks for big data analytics.
- Write unit, functional and end-to-end tests.
- Real passion for data, new data technologies, and discovering new and interesting solutions to the company’s data needs
- Excellent communication skills for coordinating with product development engineers on data pipelines and on any new product features that can be built on top of the results of data analysis
Good to have:
- Experience in handling large data sets (multiple PBs) and working with structured, unstructured and geographical datasets
- Good experience in handling big data within a distributed system and knowledge of SQL in distributed OLAP environments.
- Knowledgeable on cloud systems like AWS, Azure, or Google Cloud Platform
- Familiar with tools within the Hadoop ecosystem, especially Presto and Spark.
- Good experience with programming languages like Python, Go, Scala, Java, or scripting languages like Bash.
- Experience designing and implementing RESTful APIs, and building and deploying performant, modern web applications in React, NodeJS, and TypeScript.
- Deep understanding of databases and best engineering practices, including handling and logging errors, monitoring the system, building human-fault-tolerant pipelines, understanding how to scale up, addressing continuous integration, knowledge of database administration, maintaining data cleaning, and ensuring deterministic pipelines
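One common ingredient of the human-fault-tolerant pipelines described above is a retry wrapper with error logging. A minimal sketch using only the standard library; the function and parameter names are illustrative, not any specific framework's API:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_with_retries(step, max_attempts=3, backoff_seconds=0.0):
    """Run a pipeline step, logging each failure and retrying.

    `step` is any zero-argument callable. Failures are logged with a
    traceback so operators can diagnose them; the last failure is
    re-raised so the scheduler can mark the task as failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            log.exception("step failed (attempt %d/%d)", attempt, max_attempts)
            if attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)  # linear backoff

# Usage: a flaky step that succeeds on the second attempt.
calls = {"n": 0}
def flaky_step():
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retries(flaky_step))  # ok
```

Retries only make a pipeline fault-tolerant if each step is idempotent, which is one reason deterministic, re-runnable steps are listed alongside error handling above.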