Introduction to Programming for Health Data Science in R
Vast amounts of health-related data are being generated daily and at an increasing rate. Our ability to extract insights and make the most of these resources depends in large part on the use of computational tools to preprocess, analyze and present data. The goal of this course is to provide an introduction to quantitative programming using the R language. The course lays the foundation for later courses in advanced programming and data analysis in R. This course will cover:
- R package ecosystems: CRAN, Bioconductor, GitHub;
- Reading, inspecting, transforming and saving data;
- Data in R: Types and structure;
- Control flow (if/else, for and while loops);
- Indexing data;
- Writing functions; and
- Summarizing and visualizing data.
Programming in any language relies on performing a series of simple steps put together to form something bigger and more complex. The emphasis of this course is for you to understand each command you use (and never to blindly copy-paste someone else’s code). A little practice goes a long way: the more you learn, the more confident you get, the faster and more enjoyable coding gets.
There are no formal course requirements. Prior programming experience (especially in R, MATLAB, or Python) is helpful but not required.
The class meets once a week on Wednesdays from 10:15 AM to 12:30 PM. The first part of each session is an interactive review of the prior week's material and exercises. After a short break, new material is introduced through a live demonstration in R, and students have the opportunity to practice on their own. Sessions are held in person at Mission Hall. In-person attendance offers the best opportunity for learning for most students, but sessions are also accessible by web-conferencing software (Zoom) for students who are unable to attend in person.
Weekly homework exercises consolidate learning and are due by the end of the week (Friday 11:59 PM).
All course materials will be posted on the course's online syllabus.
Programming for Data Science in R by E. Gennatas (2020)
R version 4.0.2 or higher
RStudio version 1.3.1073 or higher
git version-control system
External R packages will be installed in class as needed
To install git
macOS: one way is to install the Xcode command line tools by running the following command in the terminal: xcode-select --install
Linux: use your distribution’s package manager
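Before installing anything, it can help to check whether git is already present. The following is a minimal sketch, not an official course script: the function name git_status is made up for illustration, and the apt-get and dnf suggestions assume a Debian/Ubuntu or Fedora system respectively. It only prints a suggested command and does not install anything.

```shell
#!/bin/sh
# Hypothetical helper: report whether git is installed and, if it is not,
# print (but do not run) a likely install command for this system.
git_status() {
  if command -v git >/dev/null 2>&1; then
    # git is on the PATH; show which version was found
    echo "git found: $(git --version)"
  elif [ "$(uname -s)" = "Darwin" ]; then
    # macOS: the Xcode command line tools include git
    echo "git not found; try: xcode-select --install"
  elif command -v apt-get >/dev/null 2>&1; then
    # Assumption: Debian/Ubuntu-style system
    echo "git not found; try: sudo apt-get install git"
  elif command -v dnf >/dev/null 2>&1; then
    # Assumption: Fedora-style system
    echo "git not found; try: sudo dnf install git"
  else
    echo "git not found; use your distribution's package manager"
  fi
}

git_status
```

If the first line of output starts with "git found", no installation is needed.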
Final grades will be based on the weekly lab assignments (60%) and the final project (40%). The final project is a brief article on a dataset of your choice, written in R Markdown and output to both PDF and HTML, with the code shared in a GitHub repository.
Students not in full-year TICR Programs who satisfactorily pass all course requirements will, upon request, receive a Certificate of Course Completion.
This course is sponsored by the Training in Clinical Research (TICR) Program, and space is limited. Preference is given to UCSF-affiliated personnel. We regret that auditing in the classroom is not permitted, but most of the course materials (with the exception of videotapes, answer keys, examinations, and copyrighted documents) are freely available (without formal enrollment) on the course’s online syllabus. Many students can glean the majority of the course’s content from this free access, but, importantly, formal enrollment also provides access to faculty for questions and individual-level extension of the curriculum, a community of other engaged students for in-person real-time discussion, and personalized correction and feedback on homework and projects.
To enroll in this course, please fill out and submit the application below. Please see our fees page for cost information. The deadline for application is July 12, 2021. Only one application needs to be completed for all courses desired during the quarter.
The application is best completed using the latest version of Firefox, Chrome or Safari.
Information on how to pay; please read before applying.