There is growing interest in a data integration approach to survey sampling, particularly where a sample is linked to one or more population registers. The reason for doing this is simple – it is only by linking the same individuals in the different sources that it becomes possible to create a data set suitable for analysis.
Data linkage is not error free. Many linkages are non-deterministic, based on how likely it is that a particular linking decision corresponds to a correct match, i.e., that it brings together records for the same individual in both sources. High-quality linking will ensure that this probability is high. But this cannot be guaranteed, and analysis of the resulting linked data should take account of this additional source of error. This is especially true for secondary analysis carried out without access to the linking information.
We describe an inferential framework that allows for linkage errors under sample-to-register linkage. After first reviewing current research activity in this area, we focus on secondary analysis and linear regression modelling, including the important special case of estimating subpopulation and small area means. In doing so, we consider both the robustness and the efficiency of the resulting linked data inferences.
After registering, you will receive a confirmation email containing information about joining the webinar. There will be time for questions. The webinar will be recorded and made available on the IASS and ISI websites. The abstract is given above and the speakers' biographies below.
- RAY CHAMBERS
Ray Chambers is Honorary Professorial Fellow at the National Institute for Applied Statistics Research Australia, University of Wollongong, Australia. He is an elected member of the International Statistical Institute and a Fellow of the American Statistical Association. He was co-Editor in Chief of the International Statistical Review 2015-2019 and has been an Associate Editor for the Journal of Official Statistics, Survey Methodology, the Journal of the Royal Statistical Society (Series A and B) and the Annals of Statistics. He was President of the International Association of Survey Statisticians, 2011-2013 and International Representative on the Board of the American Statistical Association, 2011-2014. His research is focused on robust model-based methods for inference from complex data, particularly where this complexity arises through integration of data from multiple sources. With Chris Skinner, he jointly edited Analysis of Survey Data, Wiley, 2003. More recently, he co-authored Maximum Likelihood Estimation for Sample Surveys, CRC Press, 2012, with David Steel, Alan Welsh and Suojin Wang, and An Introduction to Model-Based Survey Sampling with Applications, Oxford University Press, 2012, with Robert Clark.
- NICOLA SALVATI
Nicola Salvati is Associate Professor in Statistics in the Department of Economics and Management, University of Pisa, Italy. He is Associate Editor for the Biometrical Journal, the Journal of the Royal Statistical Society (Series A) and Statistical Methods & Applications. His research is focused on small area estimation, and particularly its use to estimate poverty measures when based on M-quantile and latent variable models. His research interests also include survey sampling, model-assisted and design-based inference, robust regression and spatial statistics. His most recent area of research involves development of new statistical methods based on latent variable models for estimating parameters from non-deterministically linked data.
- ENRICO FABRIZI
Enrico Fabrizi is Associate Professor in Business and Economics Statistics at the Università Cattolica del Sacro Cuore (Catholic University of the Sacred Heart), Milan, Italy. He holds a PhD in Statistics from the Department of Statistics, University of Bologna. His research interests cover survey sampling methodology, Bayesian inference applied to the analysis of complex survey data, and small area estimation. His recent research has focused on the development of new statistical methods for the analysis of linked or matched data sets and synthetic panels.