Machine Learning in R for the Biomedical Sciences: Methods for Prediction, Pattern Recognition, and Data Reduction
Machine learning refers to the use of computers to run automated statistical algorithms that solve certain problems or fulfill certain objectives. It can be viewed as an intersection between statistics and computer science. What distinguishes these problems from routine problems that can be addressed with manual techniques is the large number of variables or elements involved. In biomedical research, machine learning is most commonly used for prediction (predicting either an unknown current state or a future condition from a variety of currently available data elements), pattern recognition (also known as cluster analysis, where the goal is to group a set of persons or objects so that those in the same group are more similar to each other than to those in other groups), and data reduction (also known as dimensionality reduction, where the goal is to summarize many data elements into fewer elements while retaining much of the contained information). Using the R software environment, this course will provide an introduction to the use of machine learning to solve problems of prediction, pattern recognition, and data reduction in the various fields related to biomedical research (e.g., biological sciences, imaging sciences, epidemiologic research, and clinical research). The specific objectives include learning how to:
- Understand the rationale and mechanics of a range of machine learning techniques (e.g., regression trees and random forests, support vector machines, neural networks and deep learning, penalized regression techniques, clustering, and principal components analysis) to address problems of prediction, pattern recognition, and/or data reduction;
- Implement a variety of machine learning techniques in R software; and
- Apply the knowledge and techniques to the completion of a real-world biomedical project in the realm of prediction, pattern recognition, and/or data reduction.
Prior completion or equivalent experience:
- Opportunities and Challenges of Complex Biomedical Data: Introduction to the Science of "Big Data" (BIOSTAT 202)
- Biostatistical Methods for Clinical Research II (BIOSTAT 208)
- Introduction to Computing in the R Software Environment (BIOSTAT 213)
Prior completion or concurrent enrollment:
- Biostatistical Methods for Clinical Research III (BIOSTAT 209)
- Clinical Epidemiology (EPI 204)
John Kornak, PhD
Mark Segal, PhD
Each week, new material is introduced via an interactive lecture and recommended readings. Learning is reinforced via computer labs, structured discussion sections, and homework.
Lectures: Thursdays, 1:00 PM to 2:30 PM, April 4 through June 6.
Lecture recordings will be available online later in the day.
Computer Laboratory: Content: Assistance with use of R software and project-specific mentoring.
Time: Thursdays, 2:45 to 3:45 PM
Structured Discussions: Content: Faculty-led discussion of content from lectures and recommended readings.
Time: Thursdays, 3:45 to 4:45 PM, on April 11 and 25 and May 9, 23, and 30.
All course materials and handouts will be posted on the course's online syllabus.
Grades will be based on total points achieved on the homework assignments and class project. Please note that late assignments are not accepted.
This course is sponsored by the Training in Clinical Research (TICR) Program, and space is limited. Preference is given to UCSF-affiliated personnel. We regret that auditing is not permitted.
To apply for this course, please fill out and submit the application below. Please see our fees page for cost information. The deadline for application is March 22, 2019. Only one application needs to be completed for all courses desired during the quarter.
The application is best completed using the latest version of Firefox, Chrome or Safari.
Information on how to pay (please read before applying)