Machine Learning in R for the Biomedical Sciences: Methods for Prediction, Pattern Recognition, and Data Reduction

BIOSTAT 216 Spring 2019 (3 units)
Course Director: John Kornak, PhD
Department of Epidemiology & Biostatistics


Machine learning refers to the use of computers to run automated statistical algorithms that solve particular problems or fulfill particular objectives. It can be viewed as the intersection of statistics and computer science. What distinguishes these problems from routine problems that can be addressed with manual techniques is the large number of variables or elements involved. In biomedical research, machine learning is most commonly used for prediction (predicting either an unknown current state or a future condition from a variety of currently available data elements), pattern recognition (also known as cluster analysis, where the goal is to group a set of persons or objects so that those in the same group are more similar to each other than to those in other groups), and data reduction (also known as dimensionality reduction, where the goal is to summarize many data elements into fewer elements while retaining much of the contained information). Using the R software environment, this course will provide an introduction to the use of machine learning to solve problems of prediction, pattern recognition, and data reduction in the various fields related to biomedical research (e.g., biological sciences, imaging sciences, epidemiologic research, and clinical research). The specific objectives include learning how to:

  • Understand the rationale and mechanics of a range of machine learning techniques (e.g., regression trees and random forests, support vector machines, neural networks and deep learning, penalized regression techniques, clustering, and principal components analysis) to address problems of prediction, pattern recognition, and/or data reduction;
  • Implement a variety of machine learning techniques in R software; and
  • Apply the knowledge and techniques to the completion of a real-world biomedical project in the realm of prediction, pattern recognition, and/or data reduction.

Prior completion or equivalent experience:

Prior completion or concurrent enrollment:

Highly recommended:

Course Director:

John Kornak, PhD
Phone: 415-514-8028

Lecturer: Mark Segal, PhD
Teaching Assistant: Sarah Tan

Each week, new material is introduced via an interactive lecture and recommended readings. Learning is reinforced via computer labs, structured discussion sections, and homework.

  1. Lectures:
    Time: Thursdays, 1:00 to 2:30 PM, April 4 through June 6.
    Lecture recordings will be available online later in the day.
  2. Computer Laboratory:
    Content: Assistance with use of R software and project-specific mentoring.
    Time: Thursdays, 2:45 to 3:45 PM
  3. Structured Discussions:
    Content: Faculty-led discussion of content from lectures and recommended readings.
    Time: Thursdays, 3:45 to 4:45 PM, April 11 and 25, and May 9, 23, and 30.

All course materials and handouts will be posted on the course's online syllabus.




Grades will be based on total points achieved on the homework assignments and class project. Please note that late assignments are not accepted.

UCSF Graduate Division Policy on Disabilities


This course is sponsored by the Training in Clinical Research (TICR) Program, and space is limited. Preference is given to UCSF-affiliated personnel. We regret that auditing is not permitted.

To apply for this course, please fill out and submit the application below. Please see our fees page for cost information. The deadline for application is March 22, 2019. Only one application needs to be completed for all courses desired during the quarter.

The application is best completed using the latest version of Firefox, Chrome or Safari.

APPLICATION
Information for how to pay; please read before applying