Topic: Data Science

Instructor

Affiliated with University of Michigan, USA

Prof. Dr. Ivo D. Dinov

Dr. Dinov is a professor of Computational Medicine and Bioinformatics, and Health Behavior and Biological Sciences at the University of Michigan. He serves as a co-Director of the multi-institutional Probability Distributome Project, Associate Director of the Michigan Institute for Data Science (MIDAS), and Associate Director of the Michigan Neuroscience Graduate Program (NGP).

Dr. Dinov is a member of the American Statistical Association (ASA), International Association for Statistical Education (IASE), American Mathematical Society (AMS), American Physical Society (APS), American Association for the Advancement of Science (AAAS), and the International Statistical Institute (ISI). His research involves mathematical modeling, statistical inference, computational processing, scientific visualization, spacekime inference, and predictive analytics.

Course description

This 2-day virtual course is based on the Data Science and Predictive Analytics (DSPA) course that the instructor teaches at the University of Michigan.

The training will provide intermediate to advanced learners with a solid data science foundation to address challenges related to collecting, managing, processing, interrogating, analyzing and interpreting complex health and biomedical datasets using R. Participants will gain skills and acquire a tool-chest of methods, software tools, and protocols that can be applied to a broad spectrum of Big Data problems.

Before diving into the mathematical algorithms, statistical computing methods, software tools, and health analytics, we will discuss a number of driving motivational problems. These will ground all subsequent scientific discussions, data modeling, and computational approaches. The training will involve active learning and will integrate these motivational challenges with mathematical foundations, computational statistics, and modern scientific inference.

Building on open-science principles, the training will focus on effective, reliable, reproducible, and transformative data-driven discovery. Trainees will develop scientific intuition, computational skills, and data-wrangling abilities to tackle big biomedical and advanced health data problems. The instructor will provide well-documented R scripts and software recipes implementing atomic data filters, as well as complex end-to-end predictive Big Data analytics solutions.

Target audience

Intermediate to advanced level learners, e.g., graduate students, postdocs, fellows, data science practitioners, engineers, mathematical modelers, (technology) team leaders, and health analysts. The prerequisites include some college-level quantitative training.

Syllabus

Upon successful completion of this course, participants are expected to have moderate competency in at least two of the three competency areas:

  • Algorithms and Applications,
  • Data Management,
  • Analysis Methods.

Specifically, participants will receive end-to-end R protocols, gain knowledge of ML/AI algorithms, explore data validation, wrangling, and visualization, and experiment with statistical inference and model-free machine learning tools.

Topics

  1. Motivation
  2. Foundations of R
  3. Managing Data in R
  4. Data Visualization
  5. Linear Algebra & Matrix Computing
  6. Dimensionality Reduction
  7. Lazy Learning: Classification Using Nearest Neighbors
  8. Probabilistic Learning: Classification Using Naive Bayes
  9. Decision Tree Divide and Conquer Classification
  10. Forecasting Numeric Data Using Regression Models
  11. Black Box Machine-Learning Methods: Neural Networks and Support Vector Machines
  12. Apriori Association Rules Learning
  13. k-Means Clustering
  14. Model Performance Assessment
  15. Improving Model Performance
  16. Specialized Machine Learning Topics
  17. Variable/Feature Selection
  18. Regularized Linear Modeling and Controlled Variable Selection
  19. Big Longitudinal Data Analysis
  20. Natural Language Processing/Text Mining
  21. Prediction and Internal Statistical Cross Validation
  22. Function Optimization
  23. Deep Learning, Neural Networks

Related web resource.

Required software

R and RStudio.