Develop and maintain automated ETL pipelines (with monitoring) using languages such as Python and Spark SQL, and AWS services such as S3, Glue, Lambda, SNS, SQS, and KMS (see the sketch after this list)
Implement and support reporting and analytics infrastructure for internal business customers
Develop and maintain data security and permissions solutions for enterprise-scale data warehouse and data lake implementations including data encryption, database user access controls, and logging
Develop data objects for business analytics using data modeling techniques
Develop and optimize data warehouse and data lake tables using best practices for DDL, physical and logical table design, partitioning, compression, and parallelization
Develop and maintain data warehouse and data lake metadata, data catalogs, and user documentation for internal business customers
Work with internal business customers and software development teams to capture and document requirements
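For illustration only, here is a minimal sketch of the kind of Glue PySpark job these responsibilities describe: reading raw JSON from S3, aggregating with Spark SQL, and writing partitioned, compressed Parquet. The bucket paths, view name, and column names are hypothetical placeholders, not an actual pipeline:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job bootstrapping.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw events from S3 (placeholder path).
df = spark.read.json("s3://example-raw-bucket/events/")

# Light transform expressed in Spark SQL.
df.createOrReplaceTempView("events")
daily = spark.sql("""
    SELECT event_date, event_type, COUNT(*) AS event_count
    FROM events
    GROUP BY event_date, event_type
""")

# Write partitioned, Snappy-compressed Parquet to the curated zone,
# reflecting the partitioning/compression best practices above.
(daily.write
      .mode("overwrite")
      .partitionBy("event_date")
      .option("compression", "snappy")
      .parquet("s3://example-curated-bucket/daily_events/"))

job.commit()
```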
Qualifications
Experience building enterprise-scale data warehouse and data lake solutions end-to-end
Experience transitioning on-premises big data platforms to cloud-based platforms
Knowledgeable about a variety of strategies for ingesting, modeling, processing, and persisting big data
Experience with native AWS technologies for data and analytics, such as Redshift Spectrum, Athena, S3, Lambda, Glue, EMR, Kinesis, SNS, and CloudWatch
Write secure, stable, testable, maintainable code with minimal defects
4+ years of experience with one or more query languages (e.g., SQL), schema definition languages (e.g., DDL), and scripting languages (e.g., Python) to build data solutions
Prior experience with Infrastructure as Code technologies such as Terraform or CloudFormation
Experience architecting and building real-time data ingestion and delivery streams (see the sketch after this list)
Experience with NoSQL databases such as DynamoDB or MongoDB
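As a purely illustrative sketch of the real-time ingestion and NoSQL items above, assuming boto3 is available and using hypothetical stream and table names:

```python
import json

import boto3

# NOTE: "example-ingest-stream" and "example-events" are hypothetical
# placeholder names for illustration only.
kinesis = boto3.client("kinesis")
table = boto3.resource("dynamodb").Table("example-events")


def publish_event(event: dict) -> None:
    """Push one event onto a Kinesis stream for real-time delivery."""
    kinesis.put_record(
        StreamName="example-ingest-stream",
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event["user_id"]),  # assumes events carry a user_id
    )


def persist_event(event: dict) -> None:
    """Persist the same event to DynamoDB for low-latency key lookups."""
    table.put_item(Item=event)
```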