Influence Statistics in Linear Regression and SAS PROC REG
Influence statistics measure how much the fit of a linear regression model changes when an observation is deleted from the sample and the model is re-fitted. These statistics were first developed for linear regression in the 1970s and are reported by SAS® PROC REG. Widely used influence statistics include: (1) “Studentized residual”, (2) “R-student (externally studentized residual)”, (3) “dfbetas (difference in coefficient estimates)”, (4) “dffits (difference in fit of Y’s)”, (5) “Cook’s D”, (6) “covariance ratio”. Central to (1)–(6) are the concepts of the hat matrix and the “leverage” of a sample point. These statistics are discussed and illustrated in this presentation. Influence statistics also apply to weighted linear regression, and the presentation includes this topic. This presentation is expository in nature, with the goals of explaining the concepts behind influence statistics and illustrating how PROC REG reports them. The audience should be familiar with multiple linear regression.
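To make the list above concrete, here is a minimal sketch in plain Python (not SAS, and not taken from the presentation) that computes leverage and several of the listed statistics for a one-predictor regression, using the standard textbook closed-form expressions. The function names, dictionary keys, and the six data points are hypothetical and chosen only for illustration. dfbetas is left out to keep the sketch short.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def influence(x, y):
    """Per-observation influence statistics for simple linear regression.

    Uses the closed-form deletion formulas, so no model is actually re-fitted.
    """
    n, p = len(x), 2                      # p = number of coefficients (intercept, slope)
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)          # residual mean square
    rows = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx          # leverage: diagonal of the hat matrix
        s2_del = ((n - p) * s2 - e * e / (1 - h)) / (n - p - 1)  # mean square without this point
        student = e / math.sqrt(s2 * (1 - h))         # (internally) studentized residual
        rstudent = e / math.sqrt(s2_del * (1 - h))    # externally studentized residual
        rows.append({
            "leverage": h,
            "student": student,
            "rstudent": rstudent,
            "cooks_d": student ** 2 / p * h / (1 - h),
            "dffits": rstudent * math.sqrt(h / (1 - h)),
            "covratio": (s2_del / s2) ** p / (1 - h),
        })
    return rows

# Hypothetical data: the last point sits far from the others in x (high leverage).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 7.0]

for i, row in enumerate(influence(x, y), start=1):
    print(i, {k: round(v, 3) for k, v in row.items()})
```

In PROC REG, comparable quantities are requested with MODEL statement options such as INFLUENCE and R; the column labels in SAS output differ from the hypothetical keys used here.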
About the Presenter
Bruce Lund is a statistical modeling consultant and trainer. For 15 years he was a consultant for OneMagnify of Detroit. Before OneMagnify, he was the customer database manager at Ford Motor Company and a mathematics professor at the University of New Brunswick, Canada. At Ford and OneMagnify he developed numerous predictive models to support automotive marketing. Bruce has a mathematics PhD from Stanford University. He has presented at MSUG each year since 2015 and has also presented at SAS Global Forum, AnalyticsX, ASA CSP, and regional SAS user group conferences.