This course is an introduction to data science, the study of extracting knowledge and meaningful insights from noisy data. It is an emerging interdisciplinary field that draws on techniques and theories from mathematics, statistics, and computer science, together with domain knowledge, to analyze large amounts of data. The objective of this course is to provide students with a principled introduction to data science that combines problem-solving skills with computational thinking. Students will learn the fundamental pipeline of data science: data acquisition, data clean-up, data exploration and visualization, modeling and inference, and professional reporting. The course covers fundamental concepts, methods, and tools in data science and shows how to apply them in health science and biomedical research.
This course introduces students to the analysis of the various types of data used in public health. The initial section deals with different types of random variables and how their distributions are summarized and displayed graphically. Next, the concepts of hypothesis testing are discussed, along with a variety of measures used in public health. The course concludes with an overview of methods used to model data, including linear regression and alternative regression approaches.