Fuzzy Matching with SAS® PROC SQL
We must often join data from disparate data sources where exact matching by primary key is not possible. In this paper, I'll discuss and demonstrate fuzzy matching techniques that can help get your data ready for analysis. Here is our scenario:
As the owners of "Cats Adore" cat food, we've purchased a mailing list from "The Cat Gourmet" magazine with the goal of identifying potential new customers. We want to offer them a free sample in hopes they will buy our product. We have a nice, clean dataset containing our own customer names, addresses, and demographics such as birthdays, gender, etc. The Cat Gourmet mailing list contains only the information used to generate mailing labels.
In this paper I'll cover:
• Standardizing values for cleaner matches
• Identifying exact matches
• Conducting fuzzy matching using traditional SAS functions, including:
∘ SOUNDEX()
∘ SPEDIS()
∘ COMPLEV()
∘ COMPGED()
• Generating and using Data Quality match codes in SQL
As the owners of "Cats Adore" cat food, we've purchased a mailing list from "The Cat Gourmet" magazine with the goal of identifying potential new customers. We want to offer them a free sample in hopes they will buy our product. We have a nice, clean dataset containing our own customer names, addresses, and demographics such as birthdays, gender, etc. The Cat Gourmet mailing list contains only the information used to generate mailing labels.
In this paper I'll cover:
• Standardizing values for cleaner matches
• Identifying exact matches
• Conducting fuzzy matching using traditional SAS functions, including:
∘ SOUNDEX()
∘ SPEDIS()
∘ COMPLEV()
∘ COMPGED()
• Generating and using Data Quality match codes in SQL
Beyond Macro - Data-Driven Programming in SAS Viya
With adoption of SAS Viya accelerating processing in CAS is becoming more common, and CAS speaks CASL. Seasoned SAS coders often use SAS macro to produce data-driven programs, automating tedious programming tasks. When working with CAS from the SAS Compute Server, it's important to know where, when, and how the CASL that executes is generated. Whatever your experience level, the interactions between SAS code, CASL, and Macro can be intimidating.
This presentation will cover:
• A brief review of code tokenization and macro triggers on the SAS Compute Server.
• A brief introduction to CASL and CAS actions, including the effect of macro triggers in CAS programs.
• An introduction to CASL variables and data types, including arrays and dictionaries.
• Practical techniques for advanced data-driven programming in CASL
This presentation will cover:
• A brief review of code tokenization and macro triggers on the SAS Compute Server.
• A brief introduction to CASL and CAS actions, including the effect of macro triggers in CAS programs.
• An introduction to CASL variables and data types, including arrays and dictionaries.
• Practical techniques for advanced data-driven programming in CASL
About the Presenter
Mark Jordan (a.k.a. SAS Jedi) grew up in northeast Brazil as the son of Baptist missionaries. After 20 years as a US Navy submariner pursuing his passion for programming as a hobby, in 1994 he retired, turned his hobby into a dream job, and has been a SAS programmer ever since. Mark writes and teaches a broad spectrum of SAS programming classes, and his book, "Mastering the SAS® DS2 Procedure: Advanced Data Wrangling Techniques" is in its second edition. When he isn't writing, teaching, or posting "Jedi SAS Tricks", Mark enjoys playing with his grand and great-grandchildren, hanging out at the beach, and reading science fiction novels. His secret obsession is flying toys – kites, rockets, drones – and though he usually tries to convince Lori that they are for the grandkids, she isn't buying it. Mark lives in historic Williamsburg, VA with his wife, Lori, and Stella, their cat.
|