An Overview of Multivariate Statistical Analysis of Quantitative Data (PCA, FA, and Clustering)
This training will focus on the most common forms of multivariate statistics. Principal Components Analysis (PCA) and Principal Factor Analysis (PFA) are popular analytic tools that can help bring structure and understanding to datasets with many unique variables. PCA is the backbone of these methods and serves as a mechanism for reducing the dimensionality of a large dataset into a smaller set of variables or characteristics that retain much of the information in the original data. PFA is a core component of classical test theory and of psychometric analyses for instrument creation. These methods are complex but have a rich history of use across the data sciences. These slides parallel a hands-on workshop that provides a general overview of PCA and PFA. The focus and use cases are two-fold. First, we will discuss PCA in the context of data reduction and insight generation. Second, we will discuss the basics of creating composite measures via PFA and exploratory factor analysis. In contrast, Cluster Analysis works directly within the data to find groups of observations that are similar across one or many dimensions. Clustering algorithms vary greatly depending on data type, with nominal and ordinal data often working best with hierarchical clustering and continuous data working well with clustering based on summary statistics (e.g., k-means). Philosophical and applied aspects of clustering will be discussed. The workshop is hands-on and designed for non-statisticians who have a background in both descriptive statistics and regression analysis.
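As a rough illustration of the data-reduction idea in the abstract above, the sketch below extracts the first principal component from a tiny dataset using only the Python standard library. The toy data, the function names (`covariance_matrix`, `first_component`), and the use of power iteration are all assumptions made for this example; they are not taken from the workshop materials, which may use different software. The sketch works on the covariance matrix of the raw variables; in practice, variables on different scales are often standardized first.

```python
import math

def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centered rows."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    return cov, centered

def first_component(cov, iterations=200):
    """Approximate the leading eigenvector and eigenvalue by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the variance along the component
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return v, eigenvalue

# Hypothetical data: two strongly correlated variables and one nearly constant one
data = [
    [2.0, 1.9, 0.5],
    [4.0, 4.1, 0.4],
    [6.0, 6.2, 0.6],
    [8.0, 7.8, 0.5],
    [10.0, 10.0, 0.5],
]

cov, centered = covariance_matrix(data)
loadings, variance = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = variance / total_variance

# Component scores: each observation projected onto the first component
scores = [sum(c[j] * loadings[j] for j in range(len(loadings))) for c in centered]

print("loadings:", [round(x, 3) for x in loadings])
print("share of variance explained:", round(explained, 4))
```

Because the first two variables move together, a single component captures nearly all of the variance here, so the three original columns can be summarized by one score per observation.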
An Overview of Causal Inference, Counterfactual Data Analysis, and Propensity Score Methods
This training will provide a broad overview of Causal Inference, with a focus on counterfactual data analysis, the potential outcomes framework, and propensity score based methods. Significant time will be devoted to foundations, because assumptions are critical in this type of work. We will work from the ground up, applying the potential outcomes framework directly to build custom propensity score models and then using their output in the later estimation of outcome effects. We will then move to purpose-built SAS procedures (CAUSALTRT and PSMATCH) to illustrate how SAS can automate that workflow. A good understanding of basic statistics and regression modeling will be essential for this training. Time permitting, mediation and moderation models via the CAUSALMED procedure and augmented inverse probability weighting (AIPW, a doubly robust estimator) will also be discussed.
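To make the "ground up" workflow in the abstract above concrete, here is a minimal Python sketch of inverse probability weighting with a hand-built propensity score. Everything in it is an assumption for illustration: the eight toy records, the single binary confounder `x`, and the function names are hypothetical, and the propensity score is simply the treated share within each level of `x` (a saturated model) rather than a fitted logistic regression. It is not the workshop's SAS code.

```python
from collections import defaultdict

# Hypothetical records: (confounder x, treatment t, outcome y).
# Outcomes were generated as y = 1 + 2*t + 3*x, so the true effect is 2.
records = [
    (0, 1, 3.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
    (1, 1, 6.0), (1, 1, 6.0), (1, 1, 6.0), (1, 0, 4.0),
]

def estimate_propensity(records):
    """Propensity score e(x) = share of treated units within each level of x."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for x, t, _ in records:
        treated[x] += t
        total[x] += 1
    return {x: treated[x] / total[x] for x in total}

def naive_difference(records):
    """Unadjusted difference in mean outcomes, treated minus control."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def ipw_ate(records, propensity):
    """Inverse probability weighted estimate of the average treatment effect."""
    n = len(records)
    treated_term = sum(t * y / propensity[x] for x, t, y in records) / n
    control_term = sum(
        (1 - t) * y / (1 - propensity[x]) for x, t, y in records
    ) / n
    return treated_term - control_term

propensity = estimate_propensity(records)
naive = naive_difference(records)
ate = ipw_ate(records, propensity)

print("propensity scores:", propensity)
print("naive difference:", naive)
print("IPW estimate:", ate)
```

In this toy setup the units with `x = 1` are both more likely to be treated and have higher outcomes, so the unadjusted comparison overstates the effect, while weighting by the inverse of the propensity score recovers the value used to generate the data. That only holds under the usual assumptions (no unmeasured confounding, and propensity scores strictly between 0 and 1), which is why the foundations receive so much attention.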