IBM InfoSphere QualityStage Essentials v11.7 (KM214G)

Overview

This course teaches how to build QualityStage parallel jobs that investigate, standardize, match, and consolidate data records. It covers common data quality issues, QualityStage architecture, QualityStage clients and their functions, importing metadata, running jobs and reviewing results, building Investigate jobs, using the Standardize stage and rule sets, identifying matching records and applying multiple Match passes, building a Survive job, and performing a Two-Source Match.

Students will gain experience by building an application that combines customer data from three source systems into a single master customer record.

Audience

  • Data analysts responsible for data quality using QualityStage
  • Data quality architects
  • Data cleansing developers

Prerequisites

Participants should have the following skills:

  • Familiarity with the Windows operating system
  • Familiarity with a text editor
  • Helpful, but not required: some understanding of elementary statistical concepts such as weighted averages and probabilities

Objective

After completing this course, learners should be able to:

  • List common data quality contaminants
  • Describe QualityStage architecture, clients, and their functions
  • Build and run DataStage and QualityStage jobs and review results
  • Use Character Discrete, Concatenate, and Word Investigations to analyze data fields
  • Build jobs using the Standardize stage
  • Build a QualityStage job to identify matching records
  • Interpret, improve, and consolidate match results

Course Outline

  1. Data Quality Issues
    1. Exercise 1: Pre-lab Prep
  2. QualityStage Overview
    1. Exercise 1: QualityStage Logon
  3. Developing with QualityStage
    1. Exercise 1: Import Table Definition Metadata
    2. Exercise 2: Build a QualityStage Job
  4. Investigate
    1. Exercise 1: Build Investigate Jobs
  5. Standardize
    1. Exercise 1: Standardize Country
    2. Exercise 2: Select US Records
    3. Exercise 3: Standardize USPREP
    4. Exercise 4: Standardize USNAME, USADDR, and USAREA
    5. Exercise 5: Investigate Unhandled Patterns
    6. Exercise 6: Apply Rule Set Overrides
  6. Match
    1. Exercise 1: Create Match Frequency Job
    2. Exercise 2: One-source Match Specification
    3. Exercise 3: Build a One-source Job using Match Specification
  7. Survive
    1. Exercise 1: Survivorship
    2. Exercise 2: Create Customer Load File
  8. Two-Source Match
    1. Exercise 1: Read the Case Study
    2. Exercise 2: Prepare the Data Environment
    3. Exercise 3: Run the Two-Source Match Job