Data Management and Advanced R Programming

BIOSTAT 214 Winter 2020 (2 units)
Course Director: Efstathios (Stathis) D. Gennatas, MBBS, PhD
Research Scientist
Department of Epidemiology & Biostatistics


The growing availability of large amounts of data — obtained either through research or electronic capture of everyday activity — has been termed "big data". This course will provide advanced instruction in managing and manipulating big data with contemporary software in the different phases of data science: obtaining data, cleaning data, visualizing data, analyzing data and drawing conclusions. At the conclusion of this course, students will be able to:

  • master data cleaning, preprocessing, and wrangling in R software; and
  • implement basic programming in R in order to efficiently manipulate data.

Note: This is not an introductory R class and assumes working familiarity with R at the beginning of the course. It does, however, begin with an in-depth review of the basics of R.


Prerequisites: Working familiarity with R software, either through BIOSTAT 213 or equivalent experience, and BIOSTAT 202 or equivalent experience.



Weekly lectures with demonstration and hands-on exercises. Sessions will be held on Wednesdays, 1:00 to 3:30 PM, January 8 through March 16.

In addition, all students will be required to submit a final project in which they manipulate, clean, and analyze data emanating from a large data source. Students will be given a choice of datasets and guidelines for performing the project.

All course materials will be posted on the course's online syllabus.


Students should install R (


Grades will be based on the final project and project presentation.

Students not in full-year TICR Programs who satisfactorily pass all course requirements will, upon request, receive a Certificate of Course Completion.

