This lecture will cover briefly about how to visualize, analyze, and model geographic data with open source software. We will focus on using the R ecosystem for spatial data analysis. Over the past decade, a wide range of packages based on the free and open source software for geospatial (FOSS4G) have been developed in R to support advanced geospatial statistics, visualization, and modeling. To learn more, one good free book is Geocomputation with R.
Two fundamental geographic data models exist: vector and raster.
The vector data model represents the world using points, lines and
polygons. For example, administrative boundary of counties across all
states in the US. Such data usually have high level of precision and are
mostly used to describe properties of units with discrete borders (e.g.,
human population density of different counties). The geographic vector
data model is based on points located within a coordinate reference
system (CRS); one common example is latitude/longitude coordinates.
Lines and polygons are just collections of points. A key R package to
work with geographic vector data is sf
.
The raster data model divides the surface up into cells of constant
size. Raster datasets are the basis of web-mapping based on aerial
photography and satellite-based remote sensing data. Rasters aggregate
properties to a specific spatial resolution (e.g., 1 km by 1 km). Such
data tends to dominate environmental sciences given their reliance on
remote sensing data (e.g., average temperature over the past 20 years at
the global scale at 50 km by 50 km resolution). Key R packages to work
with rasters are raster
and its replacement terra
.
CRS, either geographic or projected, defines how the spatial elements of the data relate to the surface of the Earth. Geographic coordinate systems identify any location on the Earth’s surface using two values — longitude and latitude. The surface of the Earth in geographic coordinate systems is represented by a spherical or ellipsoidal surface.
All projected CRSs are based on a geographic CRS and rely on map projections to convert the 3-d surface of the Earth into Easting and Northing (x and y) values in a projected CRS. Projected CRSs are based on Cartesian coordinates on an implicitly flat surface. They have an origin, x and y axes, and a linear unit of measurement such as meters.
This transition cannot be done without adding some deformations. Therefore, some properties of the Earth’s surface are distorted in this process, such as area, direction, distance, and shape. A projected coordinate system can preserve only one or two of those properties.
A quick summary of different projections, their types, properties, and suitability can be found at https://www.geo-projections.com/.
sf
and simple features for geographic vector dataSimple Features is a hierarchical data model that represents a wide
range of geometry types. Simple features is an open standard developed
and endorsed by the Open Geospatial Consortium (OGC). 7 out of 18
geometry types are used in most of geographic research and are supported
by the R package sf
(POINT
,
LINESTRING
, POLYGON
, MULTIPOINT
,
MULTILINESTRING
, MULTIPOLYGON
and
GEOMETRYCOLLECTION
).
POINT
zero-dimensional geometry containing a single
pointLINESTRING
sequence of points connected by straight,
non-self intersecting line pieces; one-dimensional geometryPOLYGON
geometry with a positive area
(two-dimensional); sequence of points form a closed, non-self
intersecting ring; the first ring denotes the exterior ring, zero or
more subsequent rings denote holes in this exterior ringMULTIPOINT
set of points; a MULTIPOINT is simple if no
two Points in the MULTIPOINT are equalMULTILINESTRING
set of linestringsMULTIPOLYGON
set of polygonsGEOMETRYCOLLECTION
set of geometries of any type except
GEOMETRYCOLLECTIONsf
includes functions from three previous widely used R
package: sp
for data classes, rgdal
for data
read and write via interfaces to GDAL and PROJ, and rgeos
for spatial operation via an interface to GEOS. Therefore, I recommend
you to learn sf
instead. In fact, rgdal
and
rgeos
will be discontinued by the end of 2023. The best
place to start is the vignettes of the
sf
package.
Another neat feature of sf
is that the simple feature
objects are saved as data frames and work nicely with the
tidyverse
packages. Therefore, we can use the verbs from
packages such as dplyr
to filter, select, aggregate
(group_by), summarize, etc. on the simple features, making working with
geographic vector data in R enjoyable.
Almost all functions of sf
start with st_
(stands for spatial type).
To install sf
, follow the instruction from its webpage.
if(!require("xfun")) install.packages("xfun")
## Loading required package: xfun
##
## Attaching package: 'xfun'
## The following objects are masked from 'package:base':
##
## attr, isFALSE
xfun::pkg_attach2(c("sf", "spData", "tidyverse"))
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
nc <- st_read(system.file("shape/nc.shp", package="sf"))
## Reading layer `nc' from data source `/Users/dli/R/sf/shape/nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS: NAD27
class(nc)
## [1] "sf" "data.frame"
nc
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS: NAD27
## First 10 features:
## AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74
## 1 0.114 1.442 1825 1825 Ashe 37009 37009 5 1091 1
## 2 0.061 1.231 1827 1827 Alleghany 37005 37005 3 487 0
## 3 0.143 1.630 1828 1828 Surry 37171 37171 86 3188 5
## 4 0.070 2.968 1831 1831 Currituck 37053 37053 27 508 1
## 5 0.153 2.206 1832 1832 Northampton 37131 37131 66 1421 9
## 6 0.097 1.670 1833 1833 Hertford 37091 37091 46 1452 7
## 7 0.062 1.547 1834 1834 Camden 37029 37029 15 286 0
## 8 0.091 1.284 1835 1835 Gates 37073 37073 37 420 0
## 9 0.118 1.421 1836 1836 Warren 37185 37185 93 968 4
## 10 0.124 1.428 1837 1837 Stokes 37169 37169 85 1612 1
## NWBIR74 BIR79 SID79 NWBIR79 geometry
## 1 10 1364 0 19 MULTIPOLYGON (((-81.47276 3...
## 2 10 542 3 12 MULTIPOLYGON (((-81.23989 3...
## 3 208 3616 6 260 MULTIPOLYGON (((-80.45634 3...
## 4 123 830 2 145 MULTIPOLYGON (((-76.00897 3...
## 5 1066 1606 3 1197 MULTIPOLYGON (((-77.21767 3...
## 6 954 1838 5 1237 MULTIPOLYGON (((-76.74506 3...
## 7 115 350 2 139 MULTIPOLYGON (((-76.00897 3...
## 8 254 594 2 371 MULTIPOLYGON (((-76.56251 3...
## 9 748 1190 2 844 MULTIPOLYGON (((-78.30876 3...
## 10 160 2038 5 176 MULTIPOLYGON (((-80.02567 3...
print(nc[9:15], n = 3)
world_tbl = read_sf(system.file("shapes/world.shp", package = "spData"))
world_tbl
## Simple feature collection with 177 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 177 × 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove… 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern … Sove… 9.33e5 5.22e7 64.2
## 3 EH Western … Africa Africa Northern… Inde… 9.63e4 NA NA
## 4 CA Canada North Am… Americas Northern… Sove… 1.00e7 3.55e7 82.0
## 5 US United S… North Am… Americas Northern… Coun… 9.51e6 3.19e8 78.8
## 6 KZ Kazakhst… Asia Asia Central … Sove… 2.73e6 1.73e7 71.6
## 7 UZ Uzbekist… Asia Asia Central … Sove… 4.61e5 3.08e7 71.0
## 8 PG Papua Ne… Oceania Oceania Melanesia Sove… 4.65e5 7.76e6 65.2
## 9 ID Indonesia Asia Asia South-Ea… Sove… 1.82e6 2.55e8 68.9
## 10 AR Argentina South Am… Americas South Am… Sove… 2.78e6 4.30e7 76.3
## # ℹ 167 more rows
## # ℹ 2 more variables: gdpPercap <dbl>, geometry <MULTIPOLYGON [°]>
# subsetting sf is just as subsetting data frames
# note that the geometry column is sticky
world_tbl[1:5, 1:3]
## Simple feature collection with 5 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -18.28799 xmax: 180 ymax: 83.23324
## Geodetic CRS: WGS 84
## # A tibble: 5 × 4
## iso_a2 name_long continent geometry
## <chr> <chr> <chr> <MULTIPOLYGON [°]>
## 1 FJ Fiji Oceania (((-180 -16.55522, -180 -16.06713, -179.7…
## 2 TZ Tanzania Africa (((33.90371 -0.95, 34.07262 -1.05982, 37.…
## 3 EH Western Sahara Africa (((-8.66559 27.65643, -8.665124 27.58948,…
## 4 CA Canada North America (((-132.71 54.04001, -131.75 54.12, -132.…
## 5 US United States North America (((-171.7317 63.78252, -171.1144 63.59219…
world_tbl[3:6]
## Simple feature collection with 177 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 177 × 5
## continent region_un subregion type geometry
## <chr> <chr> <chr> <chr> <MULTIPOLYGON [°]>
## 1 Oceania Oceania Melanesia Soverei… (((-180 -16.55522, -180 …
## 2 Africa Africa Eastern Africa Soverei… (((33.90371 -0.95, 34.07…
## 3 Africa Africa Northern Africa Indeter… (((-8.66559 27.65643, -8…
## 4 North America Americas Northern America Soverei… (((-132.71 54.04001, -13…
## 5 North America Americas Northern America Country (((-171.7317 63.78252, -…
## 6 Asia Asia Central Asia Soverei… (((87.35997 49.21498, 86…
## 7 Asia Asia Central Asia Soverei… (((55.96819 41.30864, 55…
## 8 Oceania Oceania Melanesia Soverei… (((141.0002 -2.600151, 1…
## 9 Asia Asia South-Eastern Asia Soverei… (((104.37 -1.084843, 104…
## 10 South America Americas South America Soverei… (((-68.63401 -52.63637, …
## # ℹ 167 more rows
filter(world_tbl, continent == "North America")
## Simple feature collection with 18 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -171.7911 ymin: 7.220541 xmax: -12.20855 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 18 × 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## * <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 CA Canada North Am… Americas Northern… Sove… 1.00e7 3.55e7 82.0
## 2 US United St… North Am… Americas Northern… Coun… 9.51e6 3.19e8 78.8
## 3 HT Haiti North Am… Americas Caribbean Sove… 2.85e4 1.06e7 62.8
## 4 DO Dominican… North Am… Americas Caribbean Sove… 4.82e4 1.04e7 73.5
## 5 BS Bahamas North Am… Americas Caribbean Sove… 1.56e4 3.82e5 75.4
## 6 GL Greenland North Am… Americas Northern… Coun… 2.21e6 5.63e4 NA
## 7 MX Mexico North Am… Americas Central … Sove… 1.97e6 1.24e8 76.8
## 8 PA Panama North Am… Americas Central … Sove… 7.53e4 3.90e6 77.6
## 9 CR Costa Rica North Am… Americas Central … Sove… 5.38e4 4.76e6 79.4
## 10 NI Nicaragua North Am… Americas Central … Sove… 1.30e5 6.01e6 74.9
## 11 HN Honduras North Am… Americas Central … Sove… 1.14e5 8.81e6 73.2
## 12 SV El Salvad… North Am… Americas Central … Sove… 2.09e4 6.28e6 73.0
## 13 GT Guatemala North Am… Americas Central … Sove… 1.09e5 1.59e7 72.9
## 14 BZ Belize North Am… Americas Central … Sove… 2.20e4 3.52e5 70.0
## 15 PR Puerto Ri… North Am… Americas Caribbean Depe… 9.22e3 3.53e6 79.4
## 16 JM Jamaica North Am… Americas Caribbean Sove… 1.25e4 2.86e6 75.7
## 17 CU Cuba North Am… Americas Caribbean Sove… 1.15e5 1.14e7 79.4
## 18 TT Trinidad … North Am… Americas Caribbean Sove… 7.74e3 1.35e6 70.4
## # ℹ 2 more variables: gdpPercap <dbl>, geometry <MULTIPOLYGON [°]>
dplyr::select(world_tbl, name_long, pop)
## Simple feature collection with 177 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 177 × 3
## name_long pop geometry
## <chr> <dbl> <MULTIPOLYGON [°]>
## 1 Fiji 885806 (((-180 -16.55522, -180 -16.06713, -179.7933 -16.…
## 2 Tanzania 52234869 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3…
## 3 Western Sahara NA (((-8.66559 27.65643, -8.665124 27.58948, -8.6844…
## 4 Canada 35535348 (((-132.71 54.04001, -131.75 54.12, -132.0495 52.…
## 5 United States 318622525 (((-171.7317 63.78252, -171.1144 63.59219, -170.4…
## 6 Kazakhstan 17288285 (((87.35997 49.21498, 86.59878 48.54918, 85.76823…
## 7 Uzbekistan 30757700 (((55.96819 41.30864, 55.92892 44.99586, 58.50313…
## 8 Papua New Guinea 7755785 (((141.0002 -2.600151, 142.7352 -3.289153, 144.58…
## 9 Indonesia 255131116 (((104.37 -1.084843, 104.5395 -1.782372, 104.8879…
## 10 Argentina 42981515 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.…
## # ℹ 167 more rows
# to get the geometry only
world_tbl[0]
## Simple feature collection with 177 features and 0 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 177 × 1
## geometry
## <MULTIPOLYGON [°]>
## 1 (((-180 -16.55522, -180 -16.06713, -179.7933 -16.02088, -179.9174 -16.50178,…
## 2 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3.09699, 37.7669 -3.67712, 3…
## 3 (((-8.66559 27.65643, -8.665124 27.58948, -8.6844 27.39574, -8.687294 25.881…
## 4 (((-132.71 54.04001, -131.75 54.12, -132.0495 52.98462, -131.179 52.18043, -…
## 5 (((-171.7317 63.78252, -171.1144 63.59219, -170.4911 63.69498, -169.6825 63.…
## 6 (((87.35997 49.21498, 86.59878 48.54918, 85.76823 48.45575, 85.72048 47.4529…
## 7 (((55.96819 41.30864, 55.92892 44.99586, 58.50313 45.5868, 58.68999 45.50001…
## 8 (((141.0002 -2.600151, 142.7352 -3.289153, 144.584 -3.861418, 145.2732 -4.37…
## 9 (((104.37 -1.084843, 104.5395 -1.782372, 104.8879 -2.340425, 105.6221 -2.428…
## 10 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.85, -66.45 -54.45, -65.05 -…
## # ℹ 167 more rows
st_geometry(world_tbl)
## Geometry set for 177 features
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## First 5 geometries:
## MULTIPOLYGON (((-180 -16.55522, -180 -16.06713,...
## MULTIPOLYGON (((33.90371 -0.95, 34.07262 -1.059...
## MULTIPOLYGON (((-8.66559 27.65643, -8.665124 27...
## MULTIPOLYGON (((-132.71 54.04001, -131.75 54.12...
## MULTIPOLYGON (((-171.7317 63.78252, -171.1144 6...
# drop geometry
st_drop_geometry(world_tbl)
## # A tibble: 177 × 10
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## * <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove… 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern … Sove… 9.33e5 5.22e7 64.2
## 3 EH Western … Africa Africa Northern… Inde… 9.63e4 NA NA
## 4 CA Canada North Am… Americas Northern… Sove… 1.00e7 3.55e7 82.0
## 5 US United S… North Am… Americas Northern… Coun… 9.51e6 3.19e8 78.8
## 6 KZ Kazakhst… Asia Asia Central … Sove… 2.73e6 1.73e7 71.6
## 7 UZ Uzbekist… Asia Asia Central … Sove… 4.61e5 3.08e7 71.0
## 8 PG Papua Ne… Oceania Oceania Melanesia Sove… 4.65e5 7.76e6 65.2
## 9 ID Indonesia Asia Asia South-Ea… Sove… 1.82e6 2.55e8 68.9
## 10 AR Argentina South Am… Americas South Am… Sove… 2.78e6 4.30e7 76.3
## # ℹ 167 more rows
## # ℹ 1 more variable: gdpPercap <dbl>
# aggregating (note the geometry is aggregated too)
world_tbl |>
group_by(continent) |>
summarise(total_area = sum(area_km2, na.rm = TRUE)) |>
arrange(desc(total_area))
## Simple feature collection with 8 features and 2 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 8 × 3
## continent total_area geometry
## <chr> <dbl> <GEOMETRY [°]>
## 1 Asia 31252459. MULTIPOLYGON (((104.0108 -1.059212, 103.43…
## 2 Africa 29946198. MULTIPOLYGON (((36.86623 22, 36.69069 22.2…
## 3 North America 24484309. MULTIPOLYGON (((-82.51044 23.07875, -83.26…
## 4 Europe 23065219. MULTIPOLYGON (((25.74502 35.18, 25.76921 3…
## 5 South America 17762592. MULTIPOLYGON (((-70.37257 -18.34798, -70.1…
## 6 Antarctica 12335956. MULTIPOLYGON (((180 -89.9, 180 -84.71338, …
## 7 Oceania 8504489. MULTIPOLYGON (((167.0012 -15.6146, 167.27 …
## 8 Seven seas (open ocean) 11603. POLYGON ((68.8675 -48.83, 68.72 -49.2425, …
# join data frames / sf
coffee_data
## # A tibble: 47 × 3
## name_long coffee_production_2016 coffee_production_2017
## <chr> <int> <int>
## 1 Angola NA NA
## 2 Bolivia 3 4
## 3 Brazil 3277 2786
## 4 Burundi 37 38
## 5 Cameroon 8 6
## 6 Central African Republic NA NA
## 7 Congo, Dem. Rep. of 4 12
## 8 Colombia 1330 1169
## 9 Costa Rica 28 32
## 10 Côte d'Ivoire 114 130
## # ℹ 37 more rows
world_coffee = left_join(world_tbl, coffee_data)
## Joining with `by = join_by(name_long)`
world_coffee
## Simple feature collection with 177 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 177 × 13
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove… 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern … Sove… 9.33e5 5.22e7 64.2
## 3 EH Western … Africa Africa Northern… Inde… 9.63e4 NA NA
## 4 CA Canada North Am… Americas Northern… Sove… 1.00e7 3.55e7 82.0
## 5 US United S… North Am… Americas Northern… Coun… 9.51e6 3.19e8 78.8
## 6 KZ Kazakhst… Asia Asia Central … Sove… 2.73e6 1.73e7 71.6
## 7 UZ Uzbekist… Asia Asia Central … Sove… 4.61e5 3.08e7 71.0
## 8 PG Papua Ne… Oceania Oceania Melanesia Sove… 4.65e5 7.76e6 65.2
## 9 ID Indonesia Asia Asia South-Ea… Sove… 1.82e6 2.55e8 68.9
## 10 AR Argentina South Am… Americas South Am… Sove… 2.78e6 4.30e7 76.3
## # ℹ 167 more rows
## # ℹ 4 more variables: gdpPercap <dbl>, geometry <MULTIPOLYGON [°]>,
## # coffee_production_2016 <int>, coffee_production_2017 <int>
# making simple maps are easy
plot(world_tbl)
## Warning: plotting the first 9 out of 10 attributes; use max.plot = 10 to plot
## all