Instructor
Prof. Dr. Ivo D. Dinov
Dr. Dinov is a professor of Computational Medicine and Bioinformatics, and Health Behavior and Biological Sciences at the University of Michigan. He serves as a co-Director of the multi-institutional Probability Distributome Project, Associate Director of the Michigan Institute for Data Science (MIDAS), and Associate Director of the Michigan Neuroscience Graduate Program (NGP).
Dr. Dinov is a member of the American Statistical Association (ASA), International Association for Statistical Education (IASE), American Mathematical Society (AMS), American Physical Society (APS), American Association for the Advancement of Science (AAAS), and the International Statistical Institute (ISI). His research involves mathematical modeling, statistical inference, computational processing, scientific visualization, spacekime inference, and predictive analytics.
Course description
This 2-day virtual course is based on the Data Science and Predictive Analytics (DSPA) course that the instructor teaches at the University of Michigan.
The training will provide intermediate to advanced learners with a solid data science foundation to address challenges related to collecting, managing, processing, interrogating, analyzing and interpreting complex health and biomedical datasets using R. Participants will gain skills and acquire a tool-chest of methods, software tools, and protocols that can be applied to a broad spectrum of Big Data problems.
Before diving into the mathematical algorithms, statistical computing methods, software tools, and health analytics, we will discuss a number of driving motivational problems. These will ground all the subsequent scientific discussions, data modeling, and computational approaches. The training will involve active learning and will integrate driving motivational challenges with mathematical foundations, computational statistics, and modern scientific inference.
Building on open-science principles, the training will focus on effective, reliable, reproducible, and transformative data-driven discovery. Trainees will develop scientific intuition, computational skills, and data-wrangling abilities to tackle Big biomedical and advanced health data problems. The instructor will provide well-documented R scripts and software recipes implementing atomic data filters, as well as complex end-to-end predictive Big Data analytics solutions.
Target audience
Intermediate to advanced level learners, e.g., graduate students, postdocs, fellows, data science practitioners, engineers, mathematical modelers, (technology) team leaders, and health analysts. The prerequisites include some college-level quantitative training.
Syllabus
Upon successful completion of this course, participants are expected to have moderate competency in at least two skills within each of the following three competency areas:
- Algorithms and Applications,
- Data Management,
- Analysis Methods.
Specifically, participants will receive end-to-end R protocols, gain knowledge of ML/AI algorithms, explore data validation, wrangling, and visualization, and experiment with statistical inference and model-free machine learning tools.
Topics
- Motivation
- Foundations of R
- Managing Data in R
- Data Visualization
- Linear Algebra & Matrix Computing
- Dimensionality Reduction
- Lazy Learning: Classification Using Nearest Neighbors
- Probabilistic Learning: Classification Using Naive Bayes
- Decision Tree Divide and Conquer Classification
- Forecasting Numeric Data Using Regression Models
- Black Box Machine-Learning Methods: Neural Networks and Support Vector Machines
- Apriori Association Rules Learning
- k-Means Clustering
- Model Performance Assessment
- Improving Model Performance
- Specialized Machine Learning Topics
- Variable/Feature Selection
- Regularized Linear Modeling and Controlled Variable Selection
- Big Longitudinal Data Analysis
- Natural Language Processing/Text Mining
- Prediction and Internal Statistical Cross Validation
- Function Optimization
- Deep Learning, Neural Networks
Required software
R and RStudio.