IASS Webinar 29: Fitting Classification Trees to Complex Survey Data
Date: 31 May 2023
Level of instruction: Intermediate
Free of charge
Classification tree algorithms are a convenient way to perform variable selection and obtain interpretable structures relating covariates to an outcome of interest. When fitting classification trees to survey data, it is common to ignore sampling weights as well as other design characteristics such as stratification and clustering. However, unless the survey design is uninformative, there is a risk that inference for the classification tree is incorrect. One application in which this is a concern is the construction of nonresponse adjustment cells, a key step in the development of survey weights. We propose an extension of the popular Chi-square Automatic Interaction Detector (CHAID) approach that accounts for the design by applying a Rao-Scott correction in its classification criterion. We discuss the statistical properties of the resulting algorithm under a design-based framework. We compare its performance to existing weighted and unweighted algorithms, and we illustrate the method using data from the U.S. American Community Survey.
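To give a rough sense of what a design-adjusted split criterion might look like, here is a minimal Python sketch of a generic first-order Rao-Scott correction to the Pearson chi-square statistic for one candidate split. It is not the presenter's algorithm: the function names `weighted_table` and `rao_scott_first_order` are hypothetical, and the design effect `deff` is assumed to be supplied by the user. In practice it would be estimated from the strata and clusters of the design, which is not shown here.

```python
def weighted_table(rows, cols, weights):
    """Cross-tabulate two categorical variables, summing survey weights per cell."""
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = [[0.0] * len(col_levels) for _ in row_levels]
    for r, c, w in zip(rows, cols, weights):
        table[row_levels.index(r)][col_levels.index(c)] += w
    return table


def rao_scott_first_order(table, n, deff):
    """Return (corrected statistic, degrees of freedom) for a weighted table.

    table: weighted cell totals (all row and column margins must be positive)
    n:     number of sampled units
    deff:  assumed average generalized design effect (hypothetical input)
    """
    total = sum(sum(row) for row in table)
    # Weighted cell proportions and their margins.
    p = [[cell / total for cell in row] for row in table]
    row_p = [sum(row) for row in p]
    col_p = [sum(col) for col in zip(*p)]
    # Pearson statistic on weighted proportions, scaled to the sample size.
    pearson = n * sum(
        (p[i][j] - row_p[i] * col_p[j]) ** 2 / (row_p[i] * col_p[j])
        for i in range(len(row_p))
        for j in range(len(col_p))
    )
    df = (len(row_p) - 1) * (len(col_p) - 1)
    # First-order correction: deflate the statistic by the design effect.
    return pearson / deff, df


# Toy data: one binary predictor, a binary response indicator, equal weights.
predictor = ["A"] * 30 + ["B"] * 30
responded = [0] * 10 + [1] * 20 + [0] * 20 + [1] * 10
weights = [1.0] * 60

table = weighted_table(predictor, responded, weights)
stat, df = rao_scott_first_order(table, n=60, deff=2.0)
```

With equal weights and `deff=1.0` the result reduces to the ordinary Pearson statistic, so a design effect above 1 makes a split harder to declare significant. A CHAID-style procedure would compare the corrected statistic against a chi-square reference distribution with `df` degrees of freedom when deciding whether to split.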
About the instructor
Jean Opsomer, PhD, is a senior statistician with 25+ years of experience applying statistical methods to answer research questions. He is currently responsible for the statistical methodology of several large-scale Westat survey projects. He has served on 6 panels of the National Academies of Sciences, Engineering, and Medicine and is a current member of the Statistics Canada Advisory Committee on Statistical Methods. He is the Chair of the Survey Research Methods Section of the American Statistical Association and Associate Editor for Survey Methodology. He is also Adjunct Professor in the Department of Mathematics at the University of Maryland, College Park.