This lecture will cover briefly about how to visualize, analyze, and model geographic data with open source software. We will focus on using the R ecosystem for spatial data analysis. Over the past decade, a wide range of packages based on the free and open source software for geospatial (FOSS4G) have been developed in R to support advanced geospatial statistics, visualization, and modeling. To learn more, one good free book is Geocomputation with R.

Two fundamental geographic data models exist: vector and raster.

The vector data model represents the world using points, lines and polygons. For example, administrative boundary of counties across all states in the US. Such data usually have high level of precision and are mostly used to describe properties of units with discrete borders (e.g., human population density of different counties). The geographic vector data model is based on points located within a coordinate reference system (CRS); one common example is latitude/longitude coordinates. Lines and polygons are just collections of points. A key R package to work with geographic vector data is sf.

The raster data model divides the surface up into cells of constant size. Raster datasets are the basis of web-mapping based on aerial photography and satellite-based remote sensing data. Rasters aggregate properties to a specific spatial resolution (e.g., 1 km by 1 km). Such data tends to dominate environmental sciences given their reliance on remote sensing data (e.g., average temperature over the past 20 years at the global scale at 50 km by 50 km resolution). Key R packages to work with rasters are raster and its replacement terra.

Geographic and projected Coordinate Reference Systems (CRS)

CRS, either geographic or projected, defines how the spatial elements of the data relate to the surface of the Earth. Geographic coordinate systems identify any location on the Earth’s surface using two values — longitude and latitude. The surface of the Earth in geographic coordinate systems is represented by a spherical or ellipsoidal surface.

All projected CRSs are based on a geographic CRS and rely on map projections to convert the 3-d surface of the Earth into Easting and Northing (x and y) values in a projected CRS. Projected CRSs are based on Cartesian coordinates on an implicitly flat surface. They have an origin, x and y axes, and a linear unit of measurement such as meters.

This transition cannot be done without adding some deformations. Therefore, some properties of the Earth’s surface are distorted in this process, such as area, direction, distance, and shape. A projected coordinate system can preserve only one or two of those properties.

projection
projection

A quick summary of different projections, their types, properties, and suitability can be found at https://www.geo-projections.com/.

sf and simple features for geographic vector data

Simple Features is a hierarchical data model that represents a wide range of geometry types. Simple features is an open standard developed and endorsed by the Open Geospatial Consortium (OGC). 7 out of 18 geometry types are used in most of geographic research and are supported by the R package sf (POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON and GEOMETRYCOLLECTION).

sf geometry types
sf geometry types

sf includes functions from three previous widely used R package: sp for data classes, rgdal for data read and write via interfaces to GDAL and PROJ, and rgeos for spatial operation via an interface to GEOS. Therefore, I recommend you to learn sf instead. In fact, rgdal and rgeos will be discontinued by the end of 2023. The best place to start is the vignettes of the sf package.

Another neat feature of sf is that the simple feature objects are saved as data frames and work nicely with the tidyverse packages. Therefore, we can use the verbs from packages such as dplyr to filter, select, aggregate (group_by), summarize, etc. on the simple features, making working with geographic vector data in R enjoyable.

Almost all functions of sf start with st_ (stands for spatial type).

To install sf, follow the instruction from its webpage.

if(!require("xfun")) install.packages("xfun")
## Loading required package: xfun
## 
## Attaching package: 'xfun'
## The following objects are masked from 'package:base':
## 
##     attr, isFALSE
xfun::pkg_attach2(c("sf", "spData", "tidyverse"))
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
nc <- st_read(system.file("shape/nc.shp", package="sf"))
## Reading layer `nc' from data source `/Users/dli/R/sf/shape/nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS:  NAD27
class(nc)
## [1] "sf"         "data.frame"
nc
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS:  NAD27
## First 10 features:
##     AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74 SID74
## 1  0.114     1.442  1825    1825        Ashe 37009  37009        5  1091     1
## 2  0.061     1.231  1827    1827   Alleghany 37005  37005        3   487     0
## 3  0.143     1.630  1828    1828       Surry 37171  37171       86  3188     5
## 4  0.070     2.968  1831    1831   Currituck 37053  37053       27   508     1
## 5  0.153     2.206  1832    1832 Northampton 37131  37131       66  1421     9
## 6  0.097     1.670  1833    1833    Hertford 37091  37091       46  1452     7
## 7  0.062     1.547  1834    1834      Camden 37029  37029       15   286     0
## 8  0.091     1.284  1835    1835       Gates 37073  37073       37   420     0
## 9  0.118     1.421  1836    1836      Warren 37185  37185       93   968     4
## 10 0.124     1.428  1837    1837      Stokes 37169  37169       85  1612     1
##    NWBIR74 BIR79 SID79 NWBIR79                       geometry
## 1       10  1364     0      19 MULTIPOLYGON (((-81.47276 3...
## 2       10   542     3      12 MULTIPOLYGON (((-81.23989 3...
## 3      208  3616     6     260 MULTIPOLYGON (((-80.45634 3...
## 4      123   830     2     145 MULTIPOLYGON (((-76.00897 3...
## 5     1066  1606     3    1197 MULTIPOLYGON (((-77.21767 3...
## 6      954  1838     5    1237 MULTIPOLYGON (((-76.74506 3...
## 7      115   350     2     139 MULTIPOLYGON (((-76.00897 3...
## 8      254   594     2     371 MULTIPOLYGON (((-76.56251 3...
## 9      748  1190     2     844 MULTIPOLYGON (((-78.30876 3...
## 10     160  2038     5     176 MULTIPOLYGON (((-80.02567 3...
print(nc[9:15], n = 3)

world_tbl = read_sf(system.file("shapes/world.shp", package = "spData"))
world_tbl
## Simple feature collection with 177 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## # A tibble: 177 × 11
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove…   1.93e4  8.86e5    70.0
##  2 TZ     Tanzania  Africa    Africa    Eastern … Sove…   9.33e5  5.22e7    64.2
##  3 EH     Western … Africa    Africa    Northern… Inde…   9.63e4 NA         NA  
##  4 CA     Canada    North Am… Americas  Northern… Sove…   1.00e7  3.55e7    82.0
##  5 US     United S… North Am… Americas  Northern… Coun…   9.51e6  3.19e8    78.8
##  6 KZ     Kazakhst… Asia      Asia      Central … Sove…   2.73e6  1.73e7    71.6
##  7 UZ     Uzbekist… Asia      Asia      Central … Sove…   4.61e5  3.08e7    71.0
##  8 PG     Papua Ne… Oceania   Oceania   Melanesia Sove…   4.65e5  7.76e6    65.2
##  9 ID     Indonesia Asia      Asia      South-Ea… Sove…   1.82e6  2.55e8    68.9
## 10 AR     Argentina South Am… Americas  South Am… Sove…   2.78e6  4.30e7    76.3
## # ℹ 167 more rows
## # ℹ 2 more variables: gdpPercap <dbl>, geometry <MULTIPOLYGON [°]>
# subsetting sf is just as subsetting data frames
# note that the geometry column is sticky
world_tbl[1:5, 1:3]
## Simple feature collection with 5 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -18.28799 xmax: 180 ymax: 83.23324
## Geodetic CRS:  WGS 84
## # A tibble: 5 × 4
##   iso_a2 name_long      continent                                       geometry
##   <chr>  <chr>          <chr>                                 <MULTIPOLYGON [°]>
## 1 FJ     Fiji           Oceania       (((-180 -16.55522, -180 -16.06713, -179.7…
## 2 TZ     Tanzania       Africa        (((33.90371 -0.95, 34.07262 -1.05982, 37.…
## 3 EH     Western Sahara Africa        (((-8.66559 27.65643, -8.665124 27.58948,…
## 4 CA     Canada         North America (((-132.71 54.04001, -131.75 54.12, -132.…
## 5 US     United States  North America (((-171.7317 63.78252, -171.1144 63.59219…
world_tbl[3:6]
## Simple feature collection with 177 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## # A tibble: 177 × 5
##    continent     region_un subregion          type                      geometry
##    <chr>         <chr>     <chr>              <chr>           <MULTIPOLYGON [°]>
##  1 Oceania       Oceania   Melanesia          Soverei… (((-180 -16.55522, -180 …
##  2 Africa        Africa    Eastern Africa     Soverei… (((33.90371 -0.95, 34.07…
##  3 Africa        Africa    Northern Africa    Indeter… (((-8.66559 27.65643, -8…
##  4 North America Americas  Northern America   Soverei… (((-132.71 54.04001, -13…
##  5 North America Americas  Northern America   Country  (((-171.7317 63.78252, -…
##  6 Asia          Asia      Central Asia       Soverei… (((87.35997 49.21498, 86…
##  7 Asia          Asia      Central Asia       Soverei… (((55.96819 41.30864, 55…
##  8 Oceania       Oceania   Melanesia          Soverei… (((141.0002 -2.600151, 1…
##  9 Asia          Asia      South-Eastern Asia Soverei… (((104.37 -1.084843, 104…
## 10 South America Americas  South America      Soverei… (((-68.63401 -52.63637, …
## # ℹ 167 more rows
filter(world_tbl, continent == "North America")
## Simple feature collection with 18 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -171.7911 ymin: 7.220541 xmax: -12.20855 ymax: 83.64513
## Geodetic CRS:  WGS 84
## # A tibble: 18 × 11
##    iso_a2 name_long  continent region_un subregion type  area_km2    pop lifeExp
##  * <chr>  <chr>      <chr>     <chr>     <chr>     <chr>    <dbl>  <dbl>   <dbl>
##  1 CA     Canada     North Am… Americas  Northern… Sove…   1.00e7 3.55e7    82.0
##  2 US     United St… North Am… Americas  Northern… Coun…   9.51e6 3.19e8    78.8
##  3 HT     Haiti      North Am… Americas  Caribbean Sove…   2.85e4 1.06e7    62.8
##  4 DO     Dominican… North Am… Americas  Caribbean Sove…   4.82e4 1.04e7    73.5
##  5 BS     Bahamas    North Am… Americas  Caribbean Sove…   1.56e4 3.82e5    75.4
##  6 GL     Greenland  North Am… Americas  Northern… Coun…   2.21e6 5.63e4    NA  
##  7 MX     Mexico     North Am… Americas  Central … Sove…   1.97e6 1.24e8    76.8
##  8 PA     Panama     North Am… Americas  Central … Sove…   7.53e4 3.90e6    77.6
##  9 CR     Costa Rica North Am… Americas  Central … Sove…   5.38e4 4.76e6    79.4
## 10 NI     Nicaragua  North Am… Americas  Central … Sove…   1.30e5 6.01e6    74.9
## 11 HN     Honduras   North Am… Americas  Central … Sove…   1.14e5 8.81e6    73.2
## 12 SV     El Salvad… North Am… Americas  Central … Sove…   2.09e4 6.28e6    73.0
## 13 GT     Guatemala  North Am… Americas  Central … Sove…   1.09e5 1.59e7    72.9
## 14 BZ     Belize     North Am… Americas  Central … Sove…   2.20e4 3.52e5    70.0
## 15 PR     Puerto Ri… North Am… Americas  Caribbean Depe…   9.22e3 3.53e6    79.4
## 16 JM     Jamaica    North Am… Americas  Caribbean Sove…   1.25e4 2.86e6    75.7
## 17 CU     Cuba       North Am… Americas  Caribbean Sove…   1.15e5 1.14e7    79.4
## 18 TT     Trinidad … North Am… Americas  Caribbean Sove…   7.74e3 1.35e6    70.4
## # ℹ 2 more variables: gdpPercap <dbl>, geometry <MULTIPOLYGON [°]>
dplyr::select(world_tbl, name_long, pop)
## Simple feature collection with 177 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## # A tibble: 177 × 3
##    name_long              pop                                           geometry
##    <chr>                <dbl>                                 <MULTIPOLYGON [°]>
##  1 Fiji                885806 (((-180 -16.55522, -180 -16.06713, -179.7933 -16.…
##  2 Tanzania          52234869 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3…
##  3 Western Sahara          NA (((-8.66559 27.65643, -8.665124 27.58948, -8.6844…
##  4 Canada            35535348 (((-132.71 54.04001, -131.75 54.12, -132.0495 52.…
##  5 United States    318622525 (((-171.7317 63.78252, -171.1144 63.59219, -170.4…
##  6 Kazakhstan        17288285 (((87.35997 49.21498, 86.59878 48.54918, 85.76823…
##  7 Uzbekistan        30757700 (((55.96819 41.30864, 55.92892 44.99586, 58.50313…
##  8 Papua New Guinea   7755785 (((141.0002 -2.600151, 142.7352 -3.289153, 144.58…
##  9 Indonesia        255131116 (((104.37 -1.084843, 104.5395 -1.782372, 104.8879…
## 10 Argentina         42981515 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.…
## # ℹ 167 more rows
# to get the geometry only
world_tbl[0]
## Simple feature collection with 177 features and 0 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## # A tibble: 177 × 1
##                                                                         geometry
##                                                               <MULTIPOLYGON [°]>
##  1 (((-180 -16.55522, -180 -16.06713, -179.7933 -16.02088, -179.9174 -16.50178,…
##  2 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3.09699, 37.7669 -3.67712, 3…
##  3 (((-8.66559 27.65643, -8.665124 27.58948, -8.6844 27.39574, -8.687294 25.881…
##  4 (((-132.71 54.04001, -131.75 54.12, -132.0495 52.98462, -131.179 52.18043, -…
##  5 (((-171.7317 63.78252, -171.1144 63.59219, -170.4911 63.69498, -169.6825 63.…
##  6 (((87.35997 49.21498, 86.59878 48.54918, 85.76823 48.45575, 85.72048 47.4529…
##  7 (((55.96819 41.30864, 55.92892 44.99586, 58.50313 45.5868, 58.68999 45.50001…
##  8 (((141.0002 -2.600151, 142.7352 -3.289153, 144.584 -3.861418, 145.2732 -4.37…
##  9 (((104.37 -1.084843, 104.5395 -1.782372, 104.8879 -2.340425, 105.6221 -2.428…
## 10 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.85, -66.45 -54.45, -65.05 -…
## # ℹ 167 more rows
st_geometry(world_tbl)
## Geometry set for 177 features 
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## First 5 geometries:
## MULTIPOLYGON (((-180 -16.55522, -180 -16.06713,...
## MULTIPOLYGON (((33.90371 -0.95, 34.07262 -1.059...
## MULTIPOLYGON (((-8.66559 27.65643, -8.665124 27...
## MULTIPOLYGON (((-132.71 54.04001, -131.75 54.12...
## MULTIPOLYGON (((-171.7317 63.78252, -171.1144 6...
# drop geometry
st_drop_geometry(world_tbl)
## # A tibble: 177 × 10
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##  * <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove…   1.93e4  8.86e5    70.0
##  2 TZ     Tanzania  Africa    Africa    Eastern … Sove…   9.33e5  5.22e7    64.2
##  3 EH     Western … Africa    Africa    Northern… Inde…   9.63e4 NA         NA  
##  4 CA     Canada    North Am… Americas  Northern… Sove…   1.00e7  3.55e7    82.0
##  5 US     United S… North Am… Americas  Northern… Coun…   9.51e6  3.19e8    78.8
##  6 KZ     Kazakhst… Asia      Asia      Central … Sove…   2.73e6  1.73e7    71.6
##  7 UZ     Uzbekist… Asia      Asia      Central … Sove…   4.61e5  3.08e7    71.0
##  8 PG     Papua Ne… Oceania   Oceania   Melanesia Sove…   4.65e5  7.76e6    65.2
##  9 ID     Indonesia Asia      Asia      South-Ea… Sove…   1.82e6  2.55e8    68.9
## 10 AR     Argentina South Am… Americas  South Am… Sove…   2.78e6  4.30e7    76.3
## # ℹ 167 more rows
## # ℹ 1 more variable: gdpPercap <dbl>
# aggregating (note the geometry is aggregated too)
world_tbl |> 
  group_by(continent) |> 
  summarise(total_area = sum(area_km2, na.rm = TRUE)) |> 
  arrange(desc(total_area))
## Simple feature collection with 8 features and 2 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## # A tibble: 8 × 3
##   continent               total_area                                    geometry
##   <chr>                        <dbl>                              <GEOMETRY [°]>
## 1 Asia                     31252459. MULTIPOLYGON (((104.0108 -1.059212, 103.43…
## 2 Africa                   29946198. MULTIPOLYGON (((36.86623 22, 36.69069 22.2…
## 3 North America            24484309. MULTIPOLYGON (((-82.51044 23.07875, -83.26…
## 4 Europe                   23065219. MULTIPOLYGON (((25.74502 35.18, 25.76921 3…
## 5 South America            17762592. MULTIPOLYGON (((-70.37257 -18.34798, -70.1…
## 6 Antarctica               12335956. MULTIPOLYGON (((180 -89.9, 180 -84.71338, …
## 7 Oceania                   8504489. MULTIPOLYGON (((167.0012 -15.6146, 167.27 …
## 8 Seven seas (open ocean)     11603. POLYGON ((68.8675 -48.83, 68.72 -49.2425, …
# join data frames / sf
coffee_data
## # A tibble: 47 × 3
##    name_long                coffee_production_2016 coffee_production_2017
##    <chr>                                     <int>                  <int>
##  1 Angola                                       NA                     NA
##  2 Bolivia                                       3                      4
##  3 Brazil                                     3277                   2786
##  4 Burundi                                      37                     38
##  5 Cameroon                                      8                      6
##  6 Central African Republic                     NA                     NA
##  7 Congo, Dem. Rep. of                           4                     12
##  8 Colombia                                   1330                   1169
##  9 Costa Rica                                   28                     32
## 10 Côte d'Ivoire                               114                    130
## # ℹ 37 more rows
world_coffee = left_join(world_tbl, coffee_data)
## Joining with `by = join_by(name_long)`
world_coffee
## Simple feature collection with 177 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## # A tibble: 177 × 13
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove…   1.93e4  8.86e5    70.0
##  2 TZ     Tanzania  Africa    Africa    Eastern … Sove…   9.33e5  5.22e7    64.2
##  3 EH     Western … Africa    Africa    Northern… Inde…   9.63e4 NA         NA  
##  4 CA     Canada    North Am… Americas  Northern… Sove…   1.00e7  3.55e7    82.0
##  5 US     United S… North Am… Americas  Northern… Coun…   9.51e6  3.19e8    78.8
##  6 KZ     Kazakhst… Asia      Asia      Central … Sove…   2.73e6  1.73e7    71.6
##  7 UZ     Uzbekist… Asia      Asia      Central … Sove…   4.61e5  3.08e7    71.0
##  8 PG     Papua Ne… Oceania   Oceania   Melanesia Sove…   4.65e5  7.76e6    65.2
##  9 ID     Indonesia Asia      Asia      South-Ea… Sove…   1.82e6  2.55e8    68.9
## 10 AR     Argentina South Am… Americas  South Am… Sove…   2.78e6  4.30e7    76.3
## # ℹ 167 more rows
## # ℹ 4 more variables: gdpPercap <dbl>, geometry <MULTIPOLYGON [°]>,
## #   coffee_production_2016 <int>, coffee_production_2017 <int>
# making simple maps are easy
plot(world_tbl)
## Warning: plotting the first 9 out of 10 attributes; use max.plot = 10 to plot
## all