This role demands a thorough understanding of the data to provide quality customer insights, along with strong data and technical skills to source data from various on-premise systems into Google Cloud Platform (GCP).
Main tasks:
Source and validate data from various on-premise systems.
Gather requirements from users and build interface agreements for onboarding new files. Develop accelerators to generate metadata files from interface agreements.
Define and conduct tests to ensure that the data is fit for purpose (a minimal validation sketch follows this list).
Peer-review other data analysts' code to make sure the solution meets the defined requirements. Analyse data and maintain data integrity.
Treat/tokenise restricted data using appropriate encryption or hashing algorithms before it is exposed to business users or customers (a tokenisation sketch also follows this list).
Write Terraform scripts to add new datasets or buckets as required for onboarding new sources.
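
The fit-for-purpose tests mentioned above can start as simple column-level assertions against a staged extract. A minimal sketch in Python, assuming a hypothetical extract loaded as a list of dicts; the column names and checks are illustrative only:

    # Hypothetical staged extract; in practice this would come from a
    # landing file or a staging table.
    rows = [
        {"id": "1", "amount": "10.50"},
        {"id": "2", "amount": "7.25"},
    ]

    def check_not_null(rows, column):
        # Fail if any row is missing a value in the given column.
        assert all(r.get(column) not in (None, "") for r in rows), f"{column} has nulls"

    def check_unique(rows, column):
        # Fail if the column contains duplicate values.
        values = [r[column] for r in rows]
        assert len(values) == len(set(values)), f"{column} has duplicates"

    check_not_null(rows, "id")
    check_unique(rows, "id")
    print("fit-for-purpose checks passed")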
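
For the tokenisation task, one common approach is a keyed hash, which yields deterministic, non-reversible tokens. A minimal sketch assuming HMAC-SHA256; the algorithm choice and key handling are assumptions, and a real deployment would fetch the key from a secret manager rather than hard-coding it:

    import hashlib
    import hmac

    # Hypothetical key; never hard-code this in production.
    SECRET_KEY = b"replace-with-managed-secret"

    def tokenise(value: str) -> str:
        # Deterministic, non-reversible token for a restricted field,
        # so the same input always maps to the same token.
        return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

    # Tokenise a customer identifier before exposing the row downstream.
    print(tokenise("customer-12345"))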
Relevant skills:
Bachelor’s degree and/or relevant work experience.
Active certification in any GCP track.
Several years of experience designing, building and operationalising large-scale enterprise data solutions and applications using one or more GCP data and analytics services in combination with third-party tools: Spark, Hive, Databricks, Cloud Dataproc, Cloud Dataflow, Apache Beam, Cloud Composer, Bigtable, BigQuery, Cloud Pub/Sub, Cloud Storage, Cloud Functions and GitHub.
Ability to transform large, complex relational datasets using languages such as SQL.
Build ETL pipelines in Python using GCP services such as Dataflow, Cloud Storage and BigQuery (a pipeline sketch follows this list). Build and deploy datasets using Terraform.
Working knowledge of Apache Airflow (a minimal DAG sketch also follows this list).
Several years of experience performing detailed assessments of current-state data platforms and creating an appropriate transition path to GCP.
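
As an illustration of the pipeline skills above, a minimal Apache Beam sketch that reads a CSV from Cloud Storage and writes to BigQuery; the project, bucket, table and schema are hypothetical, and running on Cloud Dataflow would additionally require runner, region and temp-location options:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def run():
        # Hypothetical options; add runner="DataflowRunner", region and
        # temp_location to execute on Cloud Dataflow.
        options = PipelineOptions(project="my-project")
        with beam.Pipeline(options=options) as p:
            (
                p
                | "Read" >> beam.io.ReadFromText("gs://my-bucket/landing/customers.csv")
                # Naive CSV parse; assumes two columns and no header row.
                | "Parse" >> beam.Map(lambda line: dict(zip(["id", "name"], line.split(","))))
                | "Write" >> beam.io.WriteToBigQuery(
                    "my-project:analytics.customers",
                    schema="id:STRING,name:STRING",
                    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                )
            )

    if __name__ == "__main__":
        run()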
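
And a minimal Apache Airflow sketch for the orchestration skill; the DAG id, schedule and task body are illustrative only:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # Placeholder body; the real task would pull a file from the
        # on-premise source into Cloud Storage.
        print("extracting source file")

    # A daily DAG with a single Python task.
    with DAG(
        dag_id="onboard_new_source",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ):
        PythonOperator(task_id="extract", python_callable=extract)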