Course Overview

Introduction to Data Science (BIOL7800)

https://introdatasci.dlilab.com/

Daijiang Li

LSU

2023/08/21


Who am I?

Daijiang Li

Assistant Professor

Department of Biological Sciences

Center for Computation & Technology

https://www.dlilab.com


My role

Introduce new materials
Help you learn these materials
Help you learn how to use what we learned in your research
Help you learn how to ask for help
Be a potential future resource

My role

Third time teaching this course

I still plan to make a lot of mistakes or even fail

In public


Who are you?

Introduce yourself: department, lab, research direction/interest, etc.

Go Through the Syllabus


How are we going to go through this course?

Learning by doing (through trial and error)
Use lots of online resources
Peer teaching and learning; collaborative coding
Google and Stack Overflow

What is data science?

Data science is interdisciplinary

To gain insights into data through ...


Data Scientist = statistician + programmer + coach + storyteller + artist

Shlomo Argamon

Good data science is distinguished from bad data science primarily by a repeatable, thoughtful, skeptical application of an analytic process to data in order to arrive at supportable conclusions.

Jeff Leek

Applies to every discipline


The age of big data

Between the dawn of civilization and 2003, we only created five exabytes¹ of information; now we’re creating that amount every two days.

Eric Schmidt et al, Google

[1] 1 exabyte = 1 billion gigabytes


Ask students what kinds of large datasets exist in their fields of research.

Data science processes

Define the question of interest
Get the data
Clean and prepare the data
Explore the data
Fit models to extract insights
Tell, explain, and illustrate results


These steps are the most time-consuming ones, so it is better to make them (and the others) reproducible.

Figure: data science workflow (Blitzstein & Pfister, 2015)
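
The six steps listed above can be sketched as a small script in which every step is a function, so the whole analysis can be rerun from the raw data. This is only a minimal illustration, not the workflow used in the course: the toy CSV, the question (does species count increase with temperature?), and all function names are hypothetical, and only the Python standard library is assumed.

```python
import csv
import io
import statistics

# Hypothetical toy data, invented for illustration only.
RAW_CSV = """site,temperature_c,species_count
A,12.0,8
B,15.5,11
C,,9
D,18.0,14
E,21.5,17
"""


def get_data(text):
    """Step 2: read the raw records from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_data(rows):
    """Step 3: drop rows with missing values and convert strings to numbers."""
    cleaned = []
    for row in rows:
        if row["temperature_c"] == "" or row["species_count"] == "":
            continue
        cleaned.append((float(row["temperature_c"]), int(row["species_count"])))
    return cleaned


def explore(pairs):
    """Step 4: summarise the cleaned data."""
    return {
        "n": len(pairs),
        "mean_temp": statistics.mean(t for t, _ in pairs),
        "mean_count": statistics.mean(c for _, c in pairs),
    }


def fit_line(pairs):
    """Step 5: least-squares fit of species count on temperature."""
    mean_t = statistics.mean(t for t, _ in pairs)
    mean_c = statistics.mean(c for _, c in pairs)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in pairs)
    sxx = sum((t - mean_t) ** 2 for t, _ in pairs)
    slope = sxy / sxx
    intercept = mean_c - slope * mean_t
    return slope, intercept


def report(summary, slope, intercept):
    """Step 6: turn the numbers into a sentence a reader can follow."""
    return (
        f"Across {summary['n']} sites, species count changes by "
        f"{slope:.2f} per degree C (intercept {intercept:.2f})."
    )


def run_pipeline(text=RAW_CSV):
    # Step 1 (the question) is fixed here: does species count rise with temperature?
    pairs = clean_data(get_data(text))
    summary = explore(pairs)
    slope, intercept = fit_line(pairs)
    return summary, slope, intercept, report(summary, slope, intercept)


if __name__ == "__main__":
    print(run_pipeline()[3])
```

Because each step takes its input as an argument and returns its result, rerunning `run_pipeline()` on an updated file repeats the cleaning and preparation exactly, which is the point of making those steps reproducible.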


Questions in data science

Leek & Peng 2015


Examples

Types	Examples
Descriptive	Proportion of different races in the USA
Exploratory	Investigate correlations among multiple variables
Inferential (most common)	Does air pollution correlate with life expectancy at the state level in the USA?
Predictive	Using polling data to predict election results; not necessarily explain why
Causal	Average risk of COVID for vaccination vs non-vaccination
Mechanistic	Impacts of wing design on air flow over a wing; rare outside of engineering
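
To make the first two question types concrete, here is a rough sketch of what a descriptive summary (proportions of a category) and an exploratory one (pairwise correlations among several variables) might look like in code. The records, variable names, and helper functions are all hypothetical and invented for illustration; the remaining question types need study designs and models that a short snippet cannot represent.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical records, invented for illustration only.
records = [
    {"group": "a", "x": 1.0, "y": 2.0, "z": 9.0},
    {"group": "a", "x": 2.0, "y": 4.1, "z": 7.0},
    {"group": "b", "x": 3.0, "y": 5.9, "z": 6.0},
    {"group": "c", "x": 4.0, "y": 8.2, "z": 2.0},
]


def proportions(rows, key):
    """Descriptive: share of rows falling in each level of a category."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def pairwise_correlations(rows, variables):
    """Exploratory: correlation for every pair of the named variables."""
    return {
        (a, b): pearson([r[a] for r in rows], [r[b] for r in rows])
        for a, b in combinations(variables, 2)
    }


if __name__ == "__main__":
    print(proportions(records, "group"))
    for pair, r in pairwise_correlations(records, ["x", "y", "z"]).items():
        print(pair, round(r, 3))
```

Neither summary says anything beyond the sample itself, which is what separates these two rows of the table from the inferential, predictive, causal, and mechanistic ones.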


