Course Overview

Introduction to Data Science (BIOL7800)

https://introdatasci.dlilab.com/

Daijiang Li

LSU

2023/08/21

1 / 17

Who am I?

[Photo of Daijiang Li]

Daijiang Li

Assistant Professor

Department of Biological Sciences

Center for Computation & Technology

https://www.dlilab.com

2 / 17

My role

  • Introduce new material

  • Help you learn these materials

  • Help you learn how to use what we learned in your research

  • Help you learn how to ask for help

  • Be a potential future resource

3 / 17

My role

Third time teaching this course

I still plan to make a lot of mistakes or even fail

In public

4 / 17

Who are you?

5 / 17

Introduce yourself: department, lab, research direction/interest, etc.

Go Through the Syllabus

6 / 17

How are we going to go through this course?

Learning by doing (through trial and error)

Use lots of online resources

Peer-teaching and learning; collaborative coding

Google and Stack Overflow

7 / 17

What is data science?

8 / 17

Data science is interdisciplinary

To gain insights into data through ...

9 / 17

Data Scientist = statistician + programmer + coach + storyteller + artist

Shlomo Argamon

Good data science is distinguished from bad data science primarily by a repeatable, thoughtful, skeptical application of an analytic process to data in order to arrive at supportable conclusions.

Jeff Leek

Applies to every discipline

10 / 17

The age of big data

Between the dawn of civilization and 2003, we only created five exabytes[1] of information; now we’re creating that amount every two days.

Eric Schmidt et al., Google

[1] 1 exabyte = 1 billion gigabytes
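
As a back-of-the-envelope check on the scale in this quote, the sketch below converts the quoted figures into daily and yearly rates. It assumes decimal (SI) units, which the slide does not state explicitly, and the variable names are illustrative only.

```python
# Back-of-the-envelope arithmetic for the quote above.
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes and 1 EB = 10**18 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_EB = 10**18

gb_per_eb = BYTES_PER_EB // BYTES_PER_GB  # gigabytes in one exabyte
eb_per_two_days = 5                       # figure quoted on the slide
eb_per_day = eb_per_two_days / 2
eb_per_year = eb_per_day * 365

print(f"1 EB = {gb_per_eb:,} GB")
print(f"Rate implied by the quote: {eb_per_day} EB/day, about {eb_per_year:,.1f} EB/year")
```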

11 / 17

Ask students what kinds of large datasets exist in their fields of research.

Data science processes

  1. Define the question of interest

  2. Get the data

  3. Clean and prepare the data

  4. Explore the data

  5. Fit models to extract insights

  6. Tell, explain, and illustrate results
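
The six steps above can be sketched end to end on a toy example. This is a minimal illustration, not a recommended pipeline: the question, the CSV text, and every column name are made up, and the model is a plain least-squares line computed by hand with the Python standard library.

```python
import csv
import io

# 1. Define the question (hypothetical): is plant height related to rainfall?

# 2. Get the data. A made-up CSV string stands in for a real file or database.
RAW = """site,rainfall_mm,height_cm
A,100,12.0
B,150,15.5
C,,14.0
D,200,19.0
E,250,n/a
F,300,26.5
"""
rows = list(csv.DictReader(io.StringIO(RAW)))

# 3. Clean and prepare: keep only rows where both values parse as numbers.
def to_float(text):
    try:
        return float(text)
    except ValueError:
        return None

clean = []
for row in rows:
    x, y = to_float(row["rainfall_mm"]), to_float(row["height_cm"])
    if x is not None and y is not None:
        clean.append((x, y))

# 4. Explore: simple summaries before any modeling.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
print(f"{len(clean)} usable rows; rainfall {min(xs)}-{max(xs)} mm; mean height {mean_y} cm")

# 5. Fit a model: ordinary least-squares line, height = intercept + slope * rainfall.
slope = sum((x - mean_x) * (y - mean_y) for x, y in clean) / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# 6. Tell, explain, and illustrate: report the result in plain language.
print(f"In this toy data, height rises by about {slope * 100:.1f} cm per 100 mm of rainfall.")
```

Keeping each step as code in one script means the whole analysis can be rerun from the raw data whenever the data or the question changes.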

12 / 17

These steps are the most time-consuming, so it is better to make them (and the others) reproducible.
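
One small ingredient of reproducibility is making any random step depend only on an explicit seed. The sketch below shows this for a bootstrap of the mean; the function name `bootstrap_means`, the seed value, and the measurements are all hypothetical and chosen for illustration.

```python
import random
import statistics

def bootstrap_means(values, n_resamples, seed):
    """Return the means of n_resamples bootstrap resamples of values."""
    rng = random.Random(seed)  # private generator, so the result depends only on the seed
    return [
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]

# Made-up measurements, for illustration only.
heights = [12.0, 15.5, 19.0, 26.5]

run_1 = bootstrap_means(heights, n_resamples=200, seed=42)
run_2 = bootstrap_means(heights, n_resamples=200, seed=42)
print("Same seed, identical results:", run_1 == run_2)
```

Passing the seed as an argument, rather than relying on global state, keeps the result independent of whatever other code ran earlier in the session.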

[Figure: data science workflow (Blitzstein & Pfister, 2015)]

14 / 17

Questions in data science

15 / 17

Leek & Peng, 2015

16 / 17

Examples

Type                       Example
Descriptive                Proportion of different races in the USA
Exploratory                Investigate correlations among multiple variables
Inferential (most common)  Does air pollution correlate with life expectancy at the state level in the USA?
Predictive                 Using polling data to predict election results; not necessarily explain why
Causal                     Average risk of COVID for vaccination vs non-vaccination
Mechanistic                Impacts of wing design on air flow over a wing; rare outside of engineering
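
To make the first two question types concrete, here is a minimal sketch on made-up records: a descriptive summary (group proportions) and an exploratory one (a Pearson correlation computed by hand). The data, labels, and the helper `pearson` are hypothetical, and a correlation like this says nothing by itself about inference, prediction, or cause.

```python
from collections import Counter
import math

# Made-up records: (group label, variable x, variable y).
records = [
    ("a", 1.0, 2.1), ("a", 2.0, 3.9), ("b", 3.0, 6.2),
    ("b", 4.0, 7.8), ("b", 5.0, 10.1), ("c", 6.0, 12.0),
]

# Descriptive: summarize the data at hand, e.g. the proportion in each group.
counts = Counter(label for label, _, _ in records)
proportions = {label: n / len(records) for label, n in counts.items()}

# Exploratory: look for relationships, e.g. the Pearson correlation of x and y.
def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [x for _, x, _ in records]
ys = [y for _, _, y in records]
r = pearson(xs, ys)

print("Proportions:", proportions)
print(f"Correlation between x and y: {r:.3f}")
```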
17 / 17
