We can get data into R from the keyboard, from the clipboard, or from an external file (local or online).
We have learned the c() function to concatenate data as a vector.
We can also use the scan() function if we want to type or paste a few numbers into a vector from the keyboard.
Demo
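scan() normally waits for interactive input, so as a minimal non-interactive sketch, its `text` argument can stand in for what you would type or paste at the prompt. The numbers below are made up for illustration.

```r
# scan() reads whitespace-separated values; `text` supplies them as a string
# instead of waiting for keyboard input (the values here are made up).
typed <- scan(text = "3.1 4.7 2.9 5.0", quiet = TRUE)
typed

# The same idea works for values pasted one per line, as copied from a column.
pasted <- scan(text = "12\n15\n9\n20", quiet = TRUE)
length(pasted)
```

In an interactive session you would call scan() with no arguments, enter the values at the numbered prompts, and finish with an empty line.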
You can also use scan() to paste in groups of numbers from the clipboard. In Excel, highlight the column of numbers you want, then press Ctrl+C. Now go back into R and call scan(). At the 1: prompt, press Ctrl+V and the numbers will be scanned into R; press Enter on an empty line to finish.
Demo
If you use Excel, try to export the data as a plain text file (e.g., .csv or .tsv). You may not be able to open a 10-year-old Excel file, but plain text files will always work.
When writing code to read files, always use relative paths instead of absolute paths.
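As a small sketch of what a relative path looks like in practice, the example below builds one with file.path(). The project layout and the file name `views.csv` are hypothetical, chosen only for illustration.

```r
# Assumed (hypothetical) project layout:
#   my_project/
#     data/views.csv
#     code/analysis.R
# With the working directory set to my_project/, build the path relative to it.
rel_path <- file.path("data", "views.csv")
rel_path

# An absolute path such as "C:/Users/me/my_project/data/views.csv" only
# works on one machine, so avoid hard-coding it.
# dat <- read.csv(rel_path)  # uncomment once data/views.csv exists
```

Anyone who copies the whole project folder can then run the same code without editing paths.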
R also provides built-in datasets through the datasets package, which ships with base R.
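A quick way to explore the built-in datasets, using mtcars as an arbitrary example:

```r
# List the datasets that come with R (opens a listing in interactive use):
# data()

# Built-in datasets are available by name; mtcars is one of them.
head(mtcars, 3)
cars_dim <- dim(mtcars)
cars_dim
```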
args(read.table)
## function (file, header = FALSE, sep = "", quote = "\"'", dec = ".",
## numerals = c("allow.loss", "warn.loss", "no.loss"), row.names,
## col.names, as.is = !stringsAsFactors, tryLogical = TRUE,
## na.strings = "NA", colClasses = NA, nrows = -1, skip = 0,
## check.names = TRUE, fill = !blank.lines.skip, strip.white = FALSE,
## blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE,
## flush = FALSE, stringsAsFactors = FALSE, fileEncoding = "",
## encoding = "unknown", text, skipNul = FALSE)
## NULL
## X names views
## 1 1 An Interview with Gilbert Strang on Teaching Linear Algebra 531657
## 2 2 1. The Geometry of Linear Equations 749756
## 3 3 2. Elimination with Matrices. 1651140
## 4 4 3. Multiplication and Inverse Matrices 1149974
## 5 5 4. Factorization into A = LU 431759
## 6 6 5. Transposes, Permutations, Spaces R^n 669672
## 7 7 6. Column Space and Nullspace 633672
## 8 8 7. Solving Ax = 0: Pivot Variables, Special Solutions 512615
## 9 9 8. Solving Ax = b: Row Reduced Form R 463494
## 10 10 9. Independence, Basis, and Dimension 486194
## 11 11 10. The Four Fundamental Subspaces 450159
## 12 12 11. Matrix Spaces; Rank 1; Small World Graphs 339305
## 13 13 12. Graphs, Networks, Incidence Matrices 283376
## 14 14 13. Quiz 1 Review 246720
## 15 15 14. Orthogonal Vectors and Subspaces 368634
## 16 16 15. Projections onto Subspaces 361465
## 17 17 16. Projection Matrices and Least Squares 328095
## 18 18 17. Orthogonal Matrices and Gram-Schmidt 87774
## 19 19 18. Properties of Determinants 290485
## 20 20 19. Determinant Formulas and Cofactors 261363
## 21 21 20. Cramer's Rule, Inverse Matrix, and Volume 253171
## 22 22 21. Eigenvalues and Eigenvectors 269982
## 23 23 22. Diagonalization and Powers of A 351739
## 24 24 23. Differential Equations and exp(At) 261975
## 25 25 24. Markov Matrices; Fourier Series 49059
## 26 26 24b. Quiz 2 Review 17764
## 27 27 25. Symmetric Matrices and Positive Definiteness 56173
## 28 28 26. Complex Matrices; Fast Fourier Transform 189655
## 29 29 27. Positive Definite Matrices and Minima 184409
## 30 30 28. Similar Matrices and Jordan Form 50385
## 31 31 29. Singular Value Decomposition 58854
## 32 32 30. Linear Transformations and Their Matrices 291459
## 33 33 31. Change of Basis; Image Compression 37131
## 34 34 32. Quiz 3 Review 113571
## 35 35 33. Left and Right Inverses; Pseudoinverse 164589
## 36 36 34. Final Course Review 150043
## names views
## 1 An Interview with Gilbert Strang on Teaching Linear Algebra 531657
## 2 1. The Geometry of Linear Equations 749756
## 3 2. Elimination with Matrices. 1651140
## 4 3. Multiplication and Inverse Matrices 1149974
## 5 4. Factorization into A = LU 431759
## 6 5. Transposes, Permutations, Spaces R^n 669672
## 7 6. Column Space and Nullspace 633672
## 8 7. Solving Ax = 0: Pivot Variables, Special Solutions 512615
## 9 8. Solving Ax = b: Row Reduced Form R 463494
## 10 9. Independence, Basis, and Dimension 486194
## 11 10. The Four Fundamental Subspaces 450159
## 12 11. Matrix Spaces; Rank 1; Small World Graphs 339305
## 13 12. Graphs, Networks, Incidence Matrices 283376
## 14 13. Quiz 1 Review 246720
## 15 14. Orthogonal Vectors and Subspaces 368634
## 16 15. Projections onto Subspaces 361465
## 17 16. Projection Matrices and Least Squares 328095
## 18 17. Orthogonal Matrices and Gram-Schmidt 87774
## 19 18. Properties of Determinants 290485
## 20 19. Determinant Formulas and Cofactors 261363
## 21 20. Cramer's Rule, Inverse Matrix, and Volume 253171
## 22 21. Eigenvalues and Eigenvectors 269982
## 23 22. Diagonalization and Powers of A 351739
## 24 23. Differential Equations and exp(At) 261975
## 25 24. Markov Matrices; Fourier Series 49059
## 26 24b. Quiz 2 Review 17764
## 27 25. Symmetric Matrices and Positive Definiteness 56173
## 28 26. Complex Matrices; Fast Fourier Transform 189655
## 29 27. Positive Definite Matrices and Minima 184409
## 30 28. Similar Matrices and Jordan Form 50385
## 31 29. Singular Value Decomposition 58854
## 32 30. Linear Transformations and Their Matrices 291459
## 33 31. Change of Basis; Image Compression 37131
## 34 32. Quiz 3 Review 113571
## 35 33. Left and Right Inverses; Pseudoinverse 164589
## 36 34. Final Course Review 150043
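An extra `X` column like the one in the first printout above usually appears when a CSV file was written with row names and then read back without saying so. The sketch below reproduces that effect on a small made-up data frame (not the lecture data) and shows two common ways to avoid it.

```r
# A small made-up data frame (not the lecture data shown above).
views <- data.frame(
  names = c("Lecture A", "Lecture B", "Lecture C"),
  views = c(1200, 850, 430)
)

tmp <- tempfile(fileext = ".csv")

# write.csv() stores row names by default, as an unnamed first column.
write.csv(views, tmp)
with_x <- read.csv(tmp)
names(with_x)          # includes an extra "X" column

# Option 1: tell read.csv() that column 1 holds the row names.
as_rownames <- read.csv(tmp, row.names = 1)
names(as_rownames)

# Option 2: do not write row names in the first place.
write.csv(views, tmp, row.names = FALSE)
no_x <- read.csv(tmp)
names(no_x)
```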
args(read.delim)
## function (file, header = TRUE, sep = "\t", quote = "\"", dec = ".",
## fill = TRUE, comment.char = "", ...)
## NULL
args(read.csv)
## function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
## fill = TRUE, comment.char = "", ...)
## NULL
args(readLines)
## function (con = stdin(), n = -1L, ok = TRUE, warn = TRUE, encoding = "unknown",
## skipNul = FALSE)
## NULL
args(load)
## function (file, envir = parent.frame(), verbose = FALSE)
## NULL
args(readRDS)
## function (file, refhook = NULL)
## NULL
Some useful functions
One of the most popular packages is readr, which is a core package of the tidyverse. From its webpage:
The goal of readr is to provide a fast and friendly way to read rectangular data (like csv, tsv, and fwf). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.
read_csv(): comma separated (CSV) files
read_tsv(): tab separated files
read_delim(): general delimited files
read_fwf(): fixed width files
read_table(): tabular files where columns are separated by white-space
read_log(): web log files
## Rows: 32 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (11): mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 32 × 11
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
## 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
## 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
## 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
## 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
## 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
## 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
## 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
## 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
## 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
## # ℹ 22 more rows
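The column specification message in the output above can be avoided by declaring the column types yourself. The following is a minimal sketch that assumes the readr package is installed; it writes the built-in mtcars data to a temporary file first so that it is self-contained, which may differ from the file used for the output above.

```r
library(readr)

# Write a built-in dataset to a temporary CSV so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
write_csv(mtcars, tmp)

# Stating the column types up front silences the column specification
# message and makes read_csv() report parsing problems if the file changes.
cars <- read_csv(tmp, col_types = cols(.default = col_double()))
dim(cars)
```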
Other packages:
haven reads SPSS, Stata, and SAS files.
readxl reads Excel files (both .xls and .xlsx).
jsonlite for JSON, and xml2 for XML.
Large data? If the data are too large to be read into R (R reads data into memory), then the R package DBI, along with a database-specific backend (e.g., RMySQL, RSQLite, RPostgreSQL), allows you to run SQL queries against a database and return a data frame. Another useful package is dbplyr if you are used to the dplyr package. dbplyr is the database backend for dplyr. It allows you to use remote database tables as if they were in-memory data frames by automatically converting dplyr code into SQL.
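The DBI workflow described above can be sketched as follows, assuming the DBI and RSQLite packages are installed. An in-memory SQLite database stands in for a real database server, and the table name `cars` is hypothetical.

```r
library(DBI)

# An in-memory SQLite database stands in for a real database server.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a built-in dataset into a table (hypothetical table name "cars").
dbWriteTable(con, "cars", mtcars)

# The query runs inside the database; only the result comes back to R.
heavy <- dbGetQuery(
  con,
  "SELECT cyl, COUNT(*) AS n FROM cars WHERE wt > 3.5 GROUP BY cyl ORDER BY cyl"
)
heavy

dbDisconnect(con)
```

With a real database, only the dbConnect() call would change; the query and the returned data frame work the same way.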
The fst package for R provides a fast, easy and flexible way to serialize data frames. With access speeds of multiple GB/s, fst is specifically designed to unlock the potential of high speed solid state disks that can be found in most modern computers. Data frames stored in the fst format have full random access, both in column and rows.
Also check out the vroom package: the fastest delimited reader for R, 1.23 GB/sec.
After we finish data cleaning, we generally want to save the cleaned data as external files so that we can use them directly next time.
If the data size is relatively small, try to save data as plain text files such as .csv files. Otherwise, we can save data as compressed binary files. They will be smaller but will need specific tools (R here) to open them.
With R or RStudio, when we quit, the program normally asks us whether we want to save the workspace. If we choose to, every object we created in R will be saved into one file (by default .RData) in the current working directory. Next time, when we open R or RStudio, this file will be automatically loaded so that we have access to all objects we created previously (recall load() from the data input section). This is convenient, but I don’t recommend it because your code may not be reproducible. For example, if we created an object but did not save the code, the next time we use R we still have the object on our computer. But if we share the code with others, they won’t be able to run it.
In RStudio, I recommend setting the option "Save the workspace as an image on exit" to "never".
It is better to save key objects/data as their own external files.
# writeClipboard(as.character(numeric.variables)) # go to Excel, Ctrl+V
# write.csv()
args(write.csv)
## function (...)
## NULL
args(write.table)
## function (x, file = "", append = FALSE, quote = TRUE, sep = " ",
## eol = "\n", na = "NA", dec = ".", row.names = TRUE, col.names = TRUE,
## qmethod = c("escape", "double"), fileEncoding = "")
## NULL
args(save)
## function (..., list = character(), file = stop("'file' must be specified"),
## ascii = FALSE, version = NULL, envir = parent.frame(), compress = isTRUE(!ascii),
## compression_level, eval.promises = TRUE, precheck = TRUE)
## NULL
args(save.image)
## function (file = ".RData", version = NULL, ascii = FALSE, compress = !ascii,
## safe = TRUE)
## NULL
args(saveRDS)
## function (object, file = "", ascii = FALSE, version = NULL, compress = TRUE,
## refhook = NULL)
## NULL
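To make the difference between these base R functions concrete, here is a small round-trip sketch. The data frame is made up, and temporary files are used so that nothing is left in your project folder.

```r
# A small made-up data frame standing in for cleaned data.
cleaned <- data.frame(
  site = c("a", "b", "c"),
  count = c(10L, 25L, 7L),
  stringsAsFactors = FALSE
)

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")
rda_file <- tempfile(fileext = ".RData")

# Plain text: readable anywhere, but column types are re-guessed on reading.
write.csv(cleaned, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file, stringsAsFactors = FALSE)

# One object per file; readRDS() returns it, so you choose the name.
saveRDS(cleaned, rds_file)
from_rds <- readRDS(rds_file)

# One or more objects per file; load() restores them under their saved names.
save(cleaned, file = rda_file)
restored <- new.env()
loaded_names <- load(rda_file, envir = restored)
loaded_names
```

Loading into a separate environment, as done here, avoids silently overwriting an object of the same name in your workspace.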
Be sure to take a look at this excellent slide!
Again, readr has corresponding write functions. They are an improvement on the analogous base R functions.
args(write_csv)
## function (x, file, na = "NA", append = FALSE, col_names = !append,
## quote = c("needed", "all", "none"), escape = c("double",
## "backslash", "none"), eol = "\n", num_threads = readr_threads(),
## progress = show_progress(), path = deprecated(), quote_escape = deprecated())
## NULL
args(write_rds)
## function (x, file, compress = c("none", "gz", "bz2", "xz"), version = 2,
## refhook = NULL, text = FALSE, path = deprecated(), ...)
## NULL
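A short sketch of the readr write functions, assuming readr is installed. The data frame is made up for illustration.

```r
library(readr)

# A small made-up data frame standing in for cleaned data.
cleaned <- data.frame(site = c("a", "b", "c"), count = c(10, 25, 7))

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

# write_csv() never writes row names, so no extra first column appears
# when the file is read back.
write_csv(cleaned, csv_file)
first_line <- readLines(csv_file, n = 1)
first_line

# write_rds() is uncompressed by default; ask for gzip explicitly if wanted.
write_rds(cleaned, rds_file, compress = "gz")
from_rds <- read_rds(rds_file)
```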
Download this file: https://figshare.com/ndownloader/files/17461766 to your computer. Then try to read it into R. You can try read.table, read.csv, read_csv, etc.
Then save it to your disk. Again, you can try write.csv or write_csv or write_rds.