In recent years, a range of methods has been developed for handling unusual data sources (Big Data and non-probability samples) to enable the production of public and official statistics. The core methods available are quasi-randomization, superpopulation modelling and doubly-robust estimation. They rely on generalized linear models and aim to produce estimates whose reliability is comparable to that of estimates from traditional probability samples of similar size. Quasi-randomization uses a probability sample survey as a reference to estimate pseudo-weights for units in a non-probability sample or Big Data-type source whose coverage of the target population is insufficient or unknown. We present a brief review of the available methods and an application in which quasi-randomization was used successfully to make inference from a web-panel survey carried out by CETIC.br.
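To make the pseudo-weighting idea concrete, the following is a minimal sketch and not the procedure used in the application. It assumes a single categorical covariate, so that the propensity model reduces to a cell-by-cell ratio, whereas in practice a generalized linear model with several covariates would be fitted. The function names (`pseudo_weights`, `weighted_mean`), the cell labels and all data values are hypothetical.

```python
from collections import Counter, defaultdict


def pseudo_weights(reference, nonprob_cells):
    """Cell-based pseudo-weights for a non-probability sample.

    reference: list of (cell, design_weight) pairs from the probability sample.
    nonprob_cells: list of cell labels, one per non-probability unit.
    """
    # Estimate the population size of each cell from the reference survey.
    pop_hat = defaultdict(float)
    for cell, weight in reference:
        pop_hat[cell] += weight
    # Count the non-probability units observed in each cell.
    counts = Counter(nonprob_cells)
    # Estimated inclusion propensity in a cell is count / estimated size;
    # the pseudo-weight is its inverse.
    return [pop_hat[cell] / counts[cell] for cell in nonprob_cells]


def weighted_mean(values, weights):
    """Weighted mean of values using the given weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy data, entirely made up for illustration.
reference = [("young", 50.0), ("young", 50.0),
             ("old", 100.0), ("old", 100.0), ("old", 100.0)]
nonprob_cells = ["young", "young", "young", "young", "old"]
nonprob_y = [1, 1, 1, 0, 0]  # e.g. an indicator variable of interest

weights = pseudo_weights(reference, nonprob_cells)
naive = sum(nonprob_y) / len(nonprob_y)
adjusted = weighted_mean(nonprob_y, weights)

print(weights)   # [25.0, 25.0, 25.0, 25.0, 300.0]
print(naive)     # 0.6
print(adjusted)  # 0.1875
```

In this toy case the non-probability sample over-represents the "young" cell, so the unweighted mean (0.6) differs markedly from the pseudo-weighted mean (0.1875), which rescales each cell to its size as estimated from the reference survey.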
23 February 2022