Introduction to Programming for Biostatistics & Health Data Science with R
Vast amounts of health-related data are being generated daily and at an increasing rate. Our ability to extract insights and make the most of these resources depends in large part on the use of computational tools to preprocess, analyze and present data. The goal of this course is to provide a solid understanding of and hands-on experience with quantitative programming using the R language. This is an introductory to intermediate programming course to lay the foundation for further work in biostatistics, epidemiology, and machine learning. This course will cover:
- Data structures (vectors, matrices, lists, data frames, etc.);
- Control flow (if/else, for & while loops);
- Data input/output;
- Writing functions;
- Data preprocessing (cleaning, transformations, imputation, etc.);
- Table operations (merging datasets);
- Classes and methods (intro to object-oriented programming);
- Data visualization;
- String operations; and
- Writing scientific reports in PDF and HTML format.
Programming in any language relies on performing a series of simple steps put together to form something bigger and more complex. The emphasis of this course is on understanding each command you use (and never blindly copy-pasting someone else’s code). A little practice goes a long way: the more you learn, the more confident you get, and the faster and more enjoyable coding becomes.
There are no formal course requirements. Prior programming experience (especially in R, MATLAB, or Python) is helpful but not required.
Efstathios (Stathis) Gennatas, MBBS, PhD
Anita Oh, MD
The class meets once a week, on Mondays of the Fall quarter, 2:45 PM - 5:00 PM, on Zoom. Time is split between teaching via live demonstration in R and a lab, with a short break in between. All code and content discussed in class will be made available on the online class website/book. At the beginning of each lab, we will walk through the previous week’s lab exercises and answer any questions. All parts are meant to be highly interactive to help you get the most out of class.
Live R demonstration
We will walk through each week’s syllabus using code in RStudio and Programming for Data Science in R.
Exercises will be assigned each week as R Markdown (.Rmd) files. They must be submitted by the end of the week (Friday midnight) in order to be graded. Please keep notes of any questions and bring them to the following class, where we will go over them.
All course materials will be posted on the course's online syllabus.
Programming for Data Science in R by E. Gennatas (2020)
R version 4.0.2 or higher
RStudio version 1.3.1073 or higher
git version-control system
External R packages will be installed in class as needed
To install git:
macOS: one way is to install the Xcode command line tools by running the following command in the terminal: xcode-select --install
Linux: use your distribution’s package manager
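Before installing, it can help to check whether git is already available on your system. The following is a minimal sketch for a POSIX-style shell (for example, the macOS or Linux terminal); the function name check_git is a hypothetical helper introduced only for this illustration and is not part of git or of the course materials.

```shell
# check_git is a hypothetical helper name used only for this illustration.
check_git() {
  if command -v git >/dev/null 2>&1; then
    # A git executable was found on the PATH: print its version
    git --version
  else
    # No git executable was found on the PATH
    echo "git not found: install it using the instructions for your operating system"
  fi
}

check_git
```

If a version number is printed, git is already installed and no further action is needed.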
Final grades will be based on the weekly lab assignments (60%) and the final project (40%). The final project will take the form of a brief article on a dataset of your choice, written in R Markdown and output to PDF and HTML, with the code shared in a GitHub repository.
Students not in full-year TICR Programs who satisfactorily pass all course requirements will, upon request, receive a Certificate of Course Completion.
To apply for this course, please fill out and submit the application below. Please see our fee page for cost information. The deadline for application is September 11, 2020. Only one application needs to be completed for all courses desired during the quarter.
The application is best completed using the latest version of Firefox, Chrome or Safari.
Information on how to pay (please read before applying)