Intro to Biomedical Machine Learning in Python

Welcome to Bioinformatics Bootcamp: Intro to Biomedical Machine Learning in Python!

This two-part workshop is designed to prepare biomedical scientists to use the Python programming language for machine learning. Part I teaches fundamental Python programming and basic data science in Python. Part II teaches machine learning fundamentals, building toward capstone projects that use real-world patient datasets.

This page contains all the lectures and materials needed to participate.

For any questions, please contact Henry Miller.

How to participate

Enrollment link: HERE

Asynchronously: You can watch the videos and complete the activities at your own speed. No enrollment is necessary. 

Semi-synchronously: You watch the videos and complete the activities, keeping pace with the workshop. You will also get access to DataCamp. This requires that you enroll.

Synchronously: Same as semi-synchronous, except that you are invited to participate in the live online sessions. This requires that you enroll AND do the following:

Complete the (1) Introduction to Python and (2) Intermediate Python courses. You may skip these courses if you score above 60% on both the Python Programming assessment and the Data Manipulation with Python assessment. Once you complete either the courses or the assessments, let Henry know and you will receive the invite link to the live sessions. Prior to the live sessions, please also download and install Anaconda. NOTE: You can complete these assignments and join synchronously until Module #5 (July 6th) or until the maximum capacity of 100 people is reached.

**NOTE**: Before attending office hours or emailing instructors for assistance, you must have already attempted the DataCamp assignments. 

Github Workshop Repo HERE
Download the Syllabus HERE

PART I: Python for Data Science

Module #1: Orientation and Introductory Python

In this lecture, we give an overview of the workshop and recap the introductory Python concepts covered in the Introduction to Python course.

Lecturer: Henry Miller

Date/time: June 8th, 2021 (5PM CST)

Materials:

- Slides: Here

- Code (download the Zip file): Here 

Activity/Homework

Complete any remaining Module #1 challenge questions, practice on DataCamp, and complete the Python Programming assessment. 

Module #2: Intermediate Python

In this lecture, we continue with intermediate Python concepts, such as lists, if...elif...else statements, functions, list comprehensions, dictionaries, and NumPy arrays.
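As a rough preview of how several of these concepts fit together, here is a minimal sketch. It is not taken from the workshop materials: the gene names, expression values, thresholds, and the `classify` helper are all made up for illustration, and NumPy arrays are left out so the example runs with the standard library alone.

```python
# Illustrative only: gene names, values, and thresholds are invented.
expression = {"TP53": 8.2, "BRCA1": 3.1, "MYC": 11.7, "EGFR": 5.4}


def classify(value, low=4.0, high=10.0):
    """Label an expression value using if...elif...else."""
    if value < low:
        return "low"
    elif value < high:
        return "medium"
    else:
        return "high"


# List comprehension: keep the genes whose label is not "low".
expressed = [gene for gene, value in expression.items()
             if classify(value) != "low"]

# Dictionary comprehension: map each gene to its label.
labels = {gene: classify(value) for gene, value in expression.items()}

print(expressed)
print(labels)
```

The dictionary stores one value per gene, the function wraps the branching logic so it can be reused, and the two comprehensions build new collections without explicit loops.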

Lecturer: Henry Miller

Date/time: June 15th, 2021 (5PM CST)

Materials:

- Code (download the Zip file): Here 

Activity/Homework

Complete the first 6 challenge questions in the Module_2_challenge_problems.ipynb notebook in the Module 2 folder before the next lecture. Complete questions 7-10 before Module #4.

Module #3: Python for Data Science

In this lecture, we continue with data science tools such as NumPy, pandas, Matplotlib, and SciPy. We finish the last part of Module #2 and all of Module #3.

Lecturer: Simon Levy

Date/time: June 22nd, 2021 (5PM CST)

Materials:

- Code (download the Zip file): Here 

Activity/Homework

Complete the last 4 challenge problems in the Module_2_challenge_problems.ipynb notebook and the challenge problems in the Module_3_challenge_problems.ipynb notebook.

Module #4: Review Week

In this lecture, we wrap up Part I and finish going through the homework answers.

Lecturer: Simon Levy

Date/time: June 29th, 2021 (5PM CST)

Materials:

- Code (download the Zip file): Here 

Activity/Homework

Begin the Supervised Learning with scikit-learn course on DataCamp and download Weka.

PART II: Intro to Biomedical Machine Learning

Module #5: Getting to Know Your Data

In this lecture, we begin Part II by discussing statistical considerations in ML and data tidying.

Lecturer: Daniel Montemayor, PhD

Date/time: July 6th, 2021 (5PM CST)

Materials:

- Code (download the Zip file): Here

Activity/Homework

Finish the Supervised Learning with scikit-learn course on DataCamp.

Module #6: Feature Selection and Parsimony

In this lecture, we discuss parsimony and feature selection in Python.

Lecturer: Daniel Montemayor, PhD

Date/time: July 13th, 2021 (5PM CST)

Materials:

- Code (download the Zip file): Here 

Activity/Homework

Finish the Supervised Learning with scikit-learn course on DataCamp and complete the Module #6 homework.

Module #7: Classification

In this lecture, we discuss classification models.

Lecturer: Daniel Montemayor, PhD

Date/time: July 20th, 2021 (5PM CST)

Materials:

- Code (download the Zip file): Here 

Activity/Homework

Finish the Supervised Learning with scikit-learn course on DataCamp.

Module #8: Regression

In this lecture, we discuss regression models.

Lecturer: Daniel Montemayor, PhD

Date/time: July 27th, 2021 (5PM CST)

Materials:

- Code (download the Zip file): Here 

Activity/Homework

Complete the Module #8 homework assignment.

Module #9: Leukemia Project Part I

In this lecture, we introduce the Leukemia project.

Lecturer: Daniel Montemayor, PhD

Date/time: Aug 3rd, 2021 (5PM CST)

Materials:

- Code (download the Zip file): Here 

- DREAM Leukemia Dataset: Here

Activity/Homework

Begin the Leukemia Project Homework Challenge.

Module #10: Leukemia Project Part II

In this lecture, we continue working on the Leukemia project.

Lecturer: Daniel Montemayor, PhD

Date/time: Aug 10th, 2021 (5PM CST)

Materials:

- Code (download the Zip file): Here 

- DREAM Leukemia Dataset: Here

Activity/Homework

Continue the Leukemia Project Homework Challenge.