## Instructors

### Prof. Li-Chun Zhang

Li-Chun Zhang is Professor of Social Statistics at the University of Southampton, Senior Researcher at Statistics Norway, and Professor of Official Statistics at the University of Oslo. His research interests include finite population sampling design and coordination, graph sampling, sample survey estimation, non-response and measurement errors, small area estimation, index number calculations, editing and imputation, register-based statistics, population size estimation, statistical matching, and record linkage. The international research projects he has participated in include the EU framework projects EURAREA, DACSEIS, RISQ and BLUE-ETS; the ESSnet projects Small Area Estimation, Data Integration, and Quality of Multisource Statistics; the H2020 project InGRID-2; and the ESRC projects ADRCE and NCRM-SAE.

### Dr. Melike Oguz-Alper

Melike Oguz-Alper is a junior researcher at Statistics Norway. She has a BSc degree in Mathematics from Middle East Technical University, Turkey, and MSc and PhD degrees in Statistics from the University of Southampton, UK. Her research interests include the theory and application of survey sampling, variance estimation, empirical likelihood and graph sampling. She has published articles in Biometrika, CSDA, and JOS. The courses she has delivered partly or fully online include a master-level course on Statistical Methods for Social Sciences at the University of Oslo (as a co-lecturer) and a Survey Sampling course for the employees of the State Statistics Service of Ukraine.

## Course description

This is a 3-day course.

**Finite population sampling** has found numerous applications in the past century. The validity of sampling inference about real populations derives from the known probability sampling design under which the sample is selected, “irrespectively of the unknown properties of the target population studied” (Neyman, 1934). This is the key theoretical justification for its universal applicability.

A **valued graph** is a more powerful representation, which allows one to incorporate the connections among the population units in addition to the units on their own. The underlying structure is a graph given as a finite collection of nodes (units) and edges (connections). Attaching measures to the nodes and/or edges yields a valued graph. Many technological, socio-economic and biological phenomena exhibit a graph structure. This structure may itself be the central interest of study, or the edges may provide effective access to the nodes that are the primary targets. Either way, graph sampling is a statistical approach to studying real graphs. Just like finite population sampling, it is universally applicable because it is based on exploring the variation over all possible subgraphs (i.e. sample graphs) that can be taken from the given population graph according to a specified method of sampling.

**Graph sampling** thus encompasses finite population sampling, because any instance of the latter can be represented as a special case of the former. All the so-called “unconventional” finite population sampling techniques, such as indirect, network, adaptive cluster or line-intercept sampling, can be studied more effectively as special cases of graph sampling. Snowball sampling and targeted random walk sampling, in turn, are probabilistic versions of breadth-first or depth-first non-exhaustive search algorithms in graphs.
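The breadth-first character of snowball sampling can be illustrated with a minimal Python sketch. It assumes a toy undirected graph stored as adjacency lists and a simple random initial sample of nodes. The function name `snowball_sample`, the graph and the sample size are hypothetical choices made for illustration and are not taken from the course material.

```python
import random


def snowball_sample(adjacency, initial_sample, n_waves):
    """Expand an initial sample of nodes wave by wave.

    waves[0] is the initial sample; waves[t] holds the nodes that are
    first reached at wave t by following edges from wave t - 1.
    """
    seen = set(initial_sample)
    waves = [sorted(seen)]
    frontier = set(initial_sample)
    for _ in range(n_waves):
        next_frontier = set()
        for node in frontier:
            for neighbour in adjacency.get(node, ()):
                if neighbour not in seen:
                    next_frontier.add(neighbour)
        seen |= next_frontier
        waves.append(sorted(next_frontier))
        frontier = next_frontier
    return waves


# Hypothetical toy graph: node -> list of adjacent nodes.
adjacency = {
    1: [2, 3],
    2: [1, 4],
    3: [1],
    4: [2, 5],
    5: [4],
    6: [],
}

# The random element is the initial sample; the expansion itself is a
# breadth-first search that stops after n_waves steps (non-exhaustive).
rng = random.Random(0)
initial_sample = rng.sample(sorted(adjacency), 2)
waves = snowball_sample(adjacency, initial_sample, n_waves=2)
```

Starting from node 1 alone with two waves, the sketch reaches nodes 2 and 3 in the first wave and node 4 in the second, while node 5 stays unobserved because the search is cut off.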

The course provides an introduction to the central concepts of graph sampling, the most common sampling methods, and the construction of a graph sampling strategy. Particular emphasis is placed on the extension from the traditional sampling strategy (finite population sampling, Horvitz-Thompson estimator) to a much more general strategy consisting of bipartite incidence graph sampling (BIGS) and the incidence weighting estimator (IWE). The application of the BIGS-IWE strategy will be illustrated for all the aforementioned unconventional situations of finite population sampling, as well as for more complicated graph sampling situations such as snowball sampling and targeted random walk sampling.
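To give a flavour of how weighting over incidences extends Horvitz-Thompson estimation, the following Python sketch uses a toy bipartite incidence graph in which sampling units give access to the study units of interest. It assumes simple random sampling without replacement and equal-share (multiplicity) weights, which is one simple weighting choice known from indirect and network sampling. It is not necessarily the notation or the general estimator used in the course, and all names and numbers are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical bipartite incidence graph: sampling unit -> study units it
# gives access to. Every study unit must be reachable from at least one
# sampling unit, otherwise its multiplicity below would be zero.
incidence = {
    "a": ["k1"],
    "b": ["k1", "k2"],
    "c": ["k2", "k3"],
    "d": [],
}
y = {"k1": 10, "k2": 4, "k3": 7}  # values attached to the study units

n = 2                       # sample size (assumption)
N = len(incidence)          # number of sampling units
pi = Fraction(n, N)         # inclusion probability under simple random sampling

# Multiplicity: number of sampling units that lead to each study unit.
multiplicity = {
    k: sum(k in reached for reached in incidence.values()) for k in y
}


def multiplicity_estimate(sample):
    """Estimate the total of y: each incidence (i, k) contributes
    y[k] / multiplicity[k], scaled by 1 / pi as in Horvitz-Thompson."""
    total = Fraction(0)
    for i in sample:
        for k in incidence[i]:
            total += Fraction(y[k], multiplicity[k]) / pi
    return total


# Check design-unbiasedness by enumerating every possible sample.
true_total = sum(y.values())
all_samples = list(combinations(sorted(incidence), n))
expected = sum(multiplicity_estimate(s) for s in all_samples) / len(all_samples)
```

Exact fractions are used so that the average of the estimates over all possible samples can be compared with the true total without rounding error.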

## Target audience

Graduate students; statisticians at national statistical offices or other organisations working with sampling methods; and data scientists interested in network analysis, graph mining or compression.

## Syllabus

The course will be based on the book *Graph Sampling*.

- Day 1
  - Bipartite incidence graph sampling (BIGS) and incidence weighting estimator (IWE)
  - Applying BIGS-IWE strategy to unconventional sampling situations
  - Practical session with R

- Day 2
  - Adaptive cluster sampling (ACS) as BIGS
  - ACS designs for epidemic prevalence estimation
  - Practical session with R

- Day 3
  - Introduction to general theory of graph sampling
  - Strategies for snowball sampling and targeted random walk sampling
  - Practical session with R

## Required software

- RStudio.