Logistic Modeling Without Split-Samples
A traditional method for fitting and validating a logistic model is “split-sampling”: fit on a TRAIN sample and validate on a VALIDATION sample. But split-sampling cannot be applied to small analysis datasets. This talk explains an alternative method that fits and validates the model on the full analysis dataset. The method accommodates small analysis datasets and is equally applicable to large ones. Model validation is achieved by an “Optimism Correction Adjustment”, which is computed through repeated bootstrap sampling and model fitting. Care must be exercised when preparing predictors on the full analysis dataset to avoid “double dipping”; regression splines are used for this purpose. Familiarity with logistic modeling and PROC LOGISTIC, along with moderate SAS coding skill, is assumed. Topics are introduced from first principles. The talk is inspired by the work of F. Harrell and E. Steyerberg. It uses Base SAS® and SAS/STAT®.
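For readers who want a concrete picture of a bootstrap optimism correction before the talk, here is a minimal sketch. It is written in Python rather than SAS and is not the presenter's code. It follows the general recipe associated with Harrell: fit the model on the full data and record its apparent performance, refit on each bootstrap sample, and subtract the average drop in performance that the bootstrap models show when scored back on the original data. The choice of the c-statistic as the performance measure, the function names, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Newton-Raphson logistic fit; X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)          # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        # tiny ridge term keeps the Hessian invertible (illustrative safeguard)
        hess = (X * w[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def average_ranks(a):
    """1-based ranks with ties given their average rank."""
    _, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)
    return (last_rank - (counts - 1) / 2.0)[inverse]

def c_statistic(score, y):
    """Area under the ROC curve via the rank-sum formula."""
    ranks = average_ranks(score)
    n1 = int(np.sum(y == 1))
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2.0) / (n1 * n0)

def optimism_corrected_c(X, y, n_boot=200, seed=0):
    """Return (apparent c, optimism-corrected c) using bootstrap resampling."""
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = c_statistic(X @ fit_logistic(X, y), y)
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # sample rows with replacement
        Xb, yb = X[idx], y[idx]
        if yb.min() == yb.max():                  # skip samples with one class only
            continue
        beta_b = fit_logistic(Xb, yb)
        c_boot = c_statistic(Xb @ beta_b, yb)     # performance on the bootstrap sample
        c_orig = c_statistic(X @ beta_b, y)       # same model scored on original data
        optimism.append(c_boot - c_orig)
    return apparent, apparent - float(np.mean(optimism))

# Simulated data (hypothetical): 150 rows, three predictors, one of them pure noise.
rng = np.random.default_rng(1)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
true_beta = np.array([-0.5, 1.0, -0.7, 0.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(int)

apparent, corrected = optimism_corrected_c(X, y)
print(f"apparent c = {apparent:.3f}, optimism-corrected c = {corrected:.3f}")
```

In a real application, every data-driven modeling step (predictor selection, transformation, and so on) would need to be repeated inside the bootstrap loop; the sketch refits only the coefficients.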
About the Presenter
Bruce Lund is a statistical modeling trainer. For 15 years he was a manager or consultant for OneMagnify of Detroit. Before OneMagnify, he was the customer database manager at Ford Motor Company and a mathematics professor at the University of New Brunswick, Canada. At Ford and OneMagnify he developed numerous predictive models to support automotive marketing. Bruce has a PhD in mathematics from Stanford University. He has presented at MSUG many times and has also presented at SAS Global Forum, AnalyticsX, ASA CSP, and regional SAS user group conferences. In May 2023 he presented at Iowa and Nebraska SAS Days.