Binning Procedures for Logistic Regression
Binary logistic models for credit risk and direct marketing campaigns are generally built on large samples with many classification predictors. It is prudent to bin (i.e., collapse) the levels of a classification predictor to achieve a smaller number of binned levels. Binning reduces the number of parameters, eliminates levels with low counts, and can achieve desirable relationships to the target, such as monotonicity between a binned ordered predictor and the sample odds of the target. Although the practice is controversial, some modelers also bin continuous numeric predictors. This paper presents two SAS® macros that perform binning. The first, %NOD_BIN, applies to nominal predictors. The second, %ORDINAL_BIN, applies to ordered predictors. Both macros optimize information value (IV) or, alternatively, entropy when binning. %ORDINAL_BIN provides the best possible result: it finds the best solutions with respect to IV or entropy and also the best monotonic solutions. With some limitations, these macros can be applied to continuous numeric predictors. The presentation also reviews the potential of PROC HPSPLIT as a binning procedure, along with other SAS-based or R-based methods of binning continuous numeric predictors. This presentation uses Base SAS® and SAS/STAT®.
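To make the IV criterion concrete, the following is a minimal Python sketch of how information value might be computed for a binned predictor and how collapsing two adjacent levels changes it. It is an illustration only and does not reproduce %NOD_BIN or %ORDINAL_BIN. The function names, the (events, non-events) count layout, and the sample counts are all hypothetical, and the sketch assumes every bin has at least one event and one non-event.

```python
import math

def information_value(counts):
    """IV for a binned predictor.

    counts: list of (events, non_events) pairs, one per bin.
    Assumes every count is positive so the logarithm is defined.
    """
    total_events = sum(e for e, _ in counts)
    total_non_events = sum(n for _, n in counts)
    iv = 0.0
    for e, n in counts:
        share_events = e / total_events          # bin's share of all events
        share_non_events = n / total_non_events  # bin's share of all non-events
        iv += (share_events - share_non_events) * math.log(share_events / share_non_events)
    return iv

def merge_adjacent(counts, i):
    """Collapse bin i with bin i + 1, preserving the order of the bins."""
    merged = (counts[i][0] + counts[i + 1][0], counts[i][1] + counts[i + 1][1])
    return counts[:i] + [merged] + counts[i + 2:]

# Hypothetical counts for an ordered predictor with four levels.
levels = [(20, 180), (30, 170), (60, 140), (90, 110)]
collapsed = merge_adjacent(levels, 0)

print(round(information_value(levels), 4))
print(round(information_value(collapsed), 4))
```

Collapsing levels can never increase IV, so a binning search of this kind trades a small loss of IV for fewer levels and, for ordered predictors, possibly a monotonic relationship to the target.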
About the Presenter
Bruce Lund is a statistical modeling consultant and trainer. For 15 years he was an analytics manager and then a consultant for OneMagnify of Detroit. Before OneMagnify, he was the customer database manager at Ford Motor Company and a mathematics professor at the University of New Brunswick, Canada. At Ford and OneMagnify, he developed numerous predictive models to support automotive marketing. Bruce has a mathematics PhD from Stanford University. He has presented at SAS Global Forum, AnalyticsX, ASA CSP, and regional SAS user group conferences.