Thomas W. Yee
Thomas W. Yee obtained his degrees in statistics, mathematics and computer science from The University of Auckland during 1988–1993. After stints at the medical school and Massey University he accepted a lectureship in 1997 at The University of Auckland and has been there since. His work mainly centres on regression methodologies and statistical computing. Author of the VGAM R package, he also has interests in statistical ecology and biostatistics. He is an Elected Member of the International Statistical Institute. Recently, with colleagues, he has developed Generally-Altered, -Inflated, -Truncated and -Deflated (GAITD) regression for handling count data that have spikes, dips and/or truncation; the method can also accommodate underdispersion with respect to the Poisson.
- Introduction and Background Revision
- Selected count data distributions: Poisson, negative binomial (NB), zeta, logarithmic.
- Examples of heaped and seeped (digit preference) data in self-reported surveys. Other examples of inflated (spiked), truncated and deflated (dipped) count data.
- Generalized linear models (GLMs) and Vector GLMs (VGLMs).
- Underdispersion, equidispersion and overdispersion.
- The multinomial logit model (MLM).
- The VGAM R package.
- Using glm() and vglm() in R.
- GAITD Regression Theory
- Notation and the 7 special value types. Parametric versus nonparametric variants.
- Probability mass function of the full ‘combo’ model.
- Seven mode example.
- With covariates, what questions GA, GI, GD models can answer.
- Properties: moments, cumulative distribution function.
- The Generally-Truncation-Expansion (GTE) method for underdispersion.
- The GAITD-Poisson and GAITD-NB models.
- Two measures of AITD: Kullback-Leibler divergence and the Xi measure.
- Practical Tutorial Examples
- Basic use of spikeplot(), plotdgaitd() and dgaitdplot().
- Sleep duration data (GTE + GI).
- Self-reported smoking data.
- Other Topics
A selection from the following, time permitting:
- Choosing initial values and monitoring convergence.
- Advanced modelling, e.g., order of the linear predictors, extractor functions, vgam() and rrvglm() for additive modelling and reduced-rank regression.
- Joint effects of the 7 special value types on underdispersion and overdispersion.
- Hypothesis testing, model selection via AIC, Vuong’s test.
- Comments on special cases such as the zero-inflated Poisson (ZIP), zero-altered Poisson (ZAP; a hurdle model), zero-truncated (positive) Poisson (ZTP).
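As a language-neutral illustration of the three special cases named above, the following is a minimal sketch of the ZIP, ZAP and ZTP probability mass functions. It is written in plain Python only so that it runs without extra packages; the workshop itself uses R and VGAM, and none of the function names below (`zip_pmf`, `zap_pmf`, `ztp_pmf`, `mean_var`) or parameter names (`phi`, `omega`) come from VGAM. They are hypothetical names chosen for this example.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for Y ~ Poisson(lam), for integer y >= 0."""
    return math.exp(-lam) * lam**y / math.factorial(y)

def ztp_pmf(y, lam):
    """Zero-truncated (positive) Poisson: 0 is impossible, the rest is rescaled."""
    if y == 0:
        return 0.0
    return poisson_pmf(y, lam) / (1.0 - poisson_pmf(0, lam))

def zip_pmf(y, lam, phi):
    """Zero-inflated Poisson: point mass at 0 (weight phi) mixed with a Poisson."""
    extra_zero = phi if y == 0 else 0.0
    return extra_zero + (1.0 - phi) * poisson_pmf(y, lam)

def zap_pmf(y, lam, omega):
    """Zero-altered Poisson (hurdle): P(Y = 0) = omega, positives follow a ZTP."""
    if y == 0:
        return omega
    return (1.0 - omega) * ztp_pmf(y, lam)

def mean_var(pmf, support):
    """Mean and variance of a PMF summed over a finite support."""
    mean = sum(y * pmf(y) for y in support)
    var = sum((y - mean) ** 2 * pmf(y) for y in support)
    return mean, var

# Assumed illustrative values; the support is cut at 59, where the
# remaining Poisson(2) tail mass is negligible.
support = range(60)
lam, phi, omega = 2.0, 0.3, 0.05

zip_mean, zip_var = mean_var(lambda y: zip_pmf(y, lam, phi), support)
ztp_mean, ztp_var = mean_var(lambda y: ztp_pmf(y, lam), support)
zap_total = sum(zap_pmf(y, lam, omega) for y in support)

print("ZIP variance/mean:", zip_var / zip_mean)  # above 1: overdispersed
print("ZTP variance/mean:", ztp_var / ztp_mean)  # below 1: underdispersed
print("ZAP total probability:", zap_total)
```

In this sketch the ZIP can only add zeros relative to the Poisson, whereas the ZAP fixes the probability of zero directly, so `omega` may lie above or below the Poisson value exp(-lam). With the values assumed here, 0.05 is below exp(-2), so the zeros are deflated.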
Learning outcomes to be covered
By the end of the workshop delegates should achieve the following learning outcomes.
- Comprehend the capabilities and limitations of GAITD regression on realistic and unrealistic data sets. This includes the types of questions that can be addressed by the 3 operators (alteration, inflation and deflation).
- Understand the probability mass function (PMF) of the combo model and its consequences. Choose good values for the special value types.
- Confidently use the VGAM package to display the data and the fitted model, and to fit regression models to data exhibiting spikes (including heaping), dips (including seepage) and truncation. This also includes knowing how to apply the GTE method to underdispersed data and how to choose improved initial values if necessary.
- Be mindful of overfitting and other pitfalls.
Description of course materials for online teaching
The workshop slides will be made available on the instructor’s personal webpage. All code and data will also be supplied. Delegates will need a computer running the latest version of R and several packages.
Proposed delivery structure, including elements of engagement
Zoom will be used. In Chapters 1, 2 and 4 I will mainly be speaking, but there will be opportunities to run several R code chunks to reinforce the material. In Chapter 3 delegates will be expected to run R closely alongside me.
Statistical practitioners, especially those who use regression and those with count responses from self-reported survey data. Postgraduate and advanced undergraduate students in statistics would also be very suitable.
Knowledge assumed (prerequisites)
Delegates need to have a working knowledge of R and a basic understanding of linear models (LMs) and generalized linear models (GLMs).
Level of instruction
A set of slides will be made available for delegates to read in preparation. This would be on the topics of: LMs, GLMs, S formulas, ZIP, ZAP, ZTP, the basic use of VGAM (including coef(), predict(), summary(), fitted() and the ‘zero’ argument).
| Type of participant | Course duration | Fee |
|---|---|---|
| Developed country | 2 days | € 100 |
| | 2 days | € 60 |