IBM InfoSphere Advanced QualityStage V11.5
Contains: PDF course guide, as well as a lab environment where students can work through demonstrations and exercises at their own pace.
This course will step you through the QualityStage data cleansing process. You will transform an unstructured data source into a format suitable for loading into an existing data target. You will cleanse the source data by building a custom rule set and using that rule set to standardize the data. You will then build a reference match to relate the cleansed source data to the existing target data.
If you are enrolling in a Self-Paced Virtual Classroom or Web-Based Training course, please review the Self-Paced Virtual Classes and Web-Based Training Classes on our Terms and Conditions page, as well as the system requirements, before you enroll, to ensure that your system meets the minimum requirements for this course. /terms
The intended audience for this course includes: • QualityStage programmers • Data Analysts responsible for data quality using QualityStage • Data Quality Architects • Data Cleansing Developers • Data Quality Developers needing to customize QualityStage rule sets
Participants should have: • Completed the QualityStage Essentials course, or have equivalent experience • Familiarity with Windows and a text editor • Familiarity with elementary statistics and probability concepts (desirable but not essential)
1: QualityStage Review • Course project • QualityStage review • Data Quality • Master Data Management • Investigate • Standardize • Match
2: Structure of a Rule Set • Rule Sets and Rule Set files • Classes and Classification tables • Thresholds • Dictionary files • Pattern action files • Optional tables
3: Creation of a Custom Rule Set • Custom Rule Set development cycle • Investigate data file • Parsing • SEPLIST/STRIPLIST updates
4: Initial Investigation of Data to Be Standardized • Word Investigation • Pattern report • Token report
5: Classification Table • Create the Classification Table • Classification schema • What to classify • Process • Resulting Classification File with Legend • Pattern review: refining the Classification Table
6: Pattern Action File • Pattern Action Language • Development of Pattern Action Sets • Refining Pattern Action Sets • Investigation of Standardized Results
7: Standardization Rules Designer • What is the Standardization Rules Designer (SRD)? • Using the SRD • SRD work areas • Rule Set revision and selection • Embedded assistance
8: Match Frequency • Match frequency job • Column mapping • Match frequency data set • Using match frequencies in a match job
9: Two-Source (Reference Match) Advanced Implementation • Create a reference match between standardized product data and warehouse data • Refine the match results using the description fields of the standardized product data and the warehouse data
After completing this course, you should be able to: • Modify rule sets • Build custom rule sets • Standardize data using the custom rule set • Perform a reference match using standardized data and a reference data set • Use advanced techniques to refine a two-source match