IBM InfoSphere QualityStage Essentials v11.7 (2M214G-SPVC)

Overview

This course teaches how to build QualityStage parallel jobs that investigate, standardize, match, and consolidate data records. Topics include common data quality issues, the QualityStage architecture, the QualityStage clients and their functions, importing metadata, running jobs and reviewing results, building Investigate jobs, using the Standardize stage and rule sets, identifying matching records and applying multiple Match passes, building a Survive job, and using a Two-Source match.

Students will gain experience by building an application that combines customer data from three source systems into a single master customer record.

Audience

This course is intended for Data Analysts responsible for data quality using QualityStage, Data Quality Architects, and Data Cleansing Developers.

Prerequisites

  • Participants should have the following skills:
    • Familiarity with the Windows operating system
    • Familiarity with a text editor
  • Helpful, but not required:
    • Some understanding of elementary statistics principles, such as weighted averages and probabilities

Objectives

After completing this course, learners should be able to:

  • List common data quality contaminants
  • Describe QualityStage architecture, clients, and their functions
  • Build and run DataStage and QualityStage jobs and review results
  • Use Character Discrete, Character Concatenate, and Word investigations to analyze data fields
  • Build jobs using the Standardize stage
  • Build a QualityStage job to identify matching records
  • Interpret, improve, and consolidate match results

Course Outline

Unit 1: Data Quality Issues

Unit 2: QualityStage Overview

  • Exercise 1: QualityStage Logon

Unit 3: Developing with QualityStage

  • Exercise 1: Import Table Definition Metadata
  • Exercise 2: Build a QualityStage Job

Unit 4: Investigate

  • Exercise 1: Build Investigate Jobs

Unit 5: Standardize

  • Exercise 1: Standardize Country
  • Exercise 2: Select US Records
  • Exercise 3: Standardize USPREP
  • Exercise 4: Standardize USNAME, USADDR, and USAREA
  • Exercise 5: Investigate Unhandled Patterns
  • Exercise 6: Apply Rule Set Override

Unit 6: Match

  • Exercise 1: Create Match Frequency Job
  • Exercise 2: One-Source Match Specification
  • Exercise 3: Build One-Source Job Using Match Specification

Unit 7: Survive

  • Exercise 1: Survivorship
  • Exercise 2: Create Customer Master Load File

Unit 8: Two-Source Match

  • Exercise 1: Read the Case Study
  • Exercise 2: Prepare the Data Environment
  • Exercise 3: Run the Two-Source Match Job