Modernizing Data Lakes and Data Warehouses with Google Cloud (GC-MDLDWGC)

The two main components of any data pipeline are data lakes and data warehouses. This course highlights use cases for each type of storage and examines the available data lake and data warehouse solutions on Google Cloud in technical detail. It also describes the role of a data engineer, explains the benefits of a successful data pipeline to business operations, and examines why data engineering should be done in a cloud environment.


Who should attend?

This course is intended for developers who are responsible for querying datasets, visualizing query results, and creating reports.

Specific job roles include:

  • Data engineers
  • Data analysts
  • Database administrators
  • Big data architects


Certifications

This course is part of the following certifications:

  • Google Cloud Certified Professional Data Engineer


Prerequisites

Basic proficiency with a common query language such as SQL.


Objectives

  • Differentiate between data lakes and data warehouses.
  • Explore use cases for each type of storage and the available data lake and data warehouse solutions on Google Cloud.
  • Discuss the role of a data engineer and the benefits of a successful data pipeline to business operations.
  • Examine why data engineering should be done in a cloud environment.


Detailed Course Outline

Module 1 - Introduction to Data Engineering

Topics:

  • The role of a data engineer
  • Data engineering challenges
  • Introduction to BigQuery
  • Data lakes and data warehouses
  • Transactional databases versus data warehouses
  • Partnering effectively with other data teams
  • Managing data access and governance
  • Building production-ready pipelines
  • Google Cloud customer case study

Objectives:

  • Discuss the role of a data engineer.
  • Discuss benefits of doing data engineering in the cloud.
  • Discuss challenges of data engineering practice and how building data pipelines in the cloud helps to address these.
  • Review the purpose of a data lake versus a data warehouse, and understand when to use each.


Module 2 - Building a Data Lake

Topics:

  • Introduction to data lakes
  • Data storage and ETL options on Google Cloud
  • Building a data lake by using Cloud Storage
  • Securing Cloud Storage
  • Storing all sorts of data types
  • Cloud SQL as your OLTP system

Objectives:

  • Discuss why Cloud Storage is a great option for building a data lake on Google Cloud.
  • Explain how to use Cloud SQL for a relational data lake.


Module 3 - Building a Data Warehouse

Topics:

  • The modern data warehouse
  • Introduction to BigQuery
  • Getting started with BigQuery
  • Loading data into BigQuery
  • Exploring schemas
  • Schema design
  • Nested and repeated fields
  • Optimizing with partitioning and clustering

Objectives:

  • Discuss the requirements of a modern data warehouse.
  • Explain why BigQuery is the scalable data warehousing solution on Google Cloud.
  • Discuss the core concepts of BigQuery and review the options for loading data into BigQuery.