This lecture will cover briefly about how to visualize, analyze, and model geographic data with open source software. We will focus on using the R ecosystem for spatial data analysis. Over the past decade, a wide range of packages based on the free and open source software for geospatial (FOSS4G) have been developed in R to support advanced geospatial statistics, visualization, and modeling. To learn more, one good free book is Geocomputation with R.

Two fundamental geographic data models exist: vector and raster.

The vector data model represents the world using points, lines and polygons. For example, administrative boundary of counties across all states in the US. Such data usually have high level of precision and are mostly used to describe properties of units with discrete borders (e.g., human population density of different counties). The geographic vector data model is based on points located within a coordinate reference system (CRS); one common example is latitude/longitude coordinates. Lines and polygons are just collections of points. A key R package to work with geographic vector data is sf.

The raster data model divides the surface up into cells of constant size. Raster datasets are the basis of web-mapping based on aerial photography and satellite-based remote sensing data. Rasters aggregate properties to a specific spatial resolution (e.g., 1 km by 1 km). Such data tends to dominate environmental sciences given their reliance on remote sensing data (e.g., average temperature over the past 20 years at the global scale at 50 km by 50 km resolution). Key R packages to work with rasters are raster and its replacement terra.

Geographic and projected Coordinate Reference Systems (CRS)

CRS, either geographic or projected, defines how the spatial elements of the data relate to the surface of the Earth. Geographic coordinate systems identify any location on the Earth’s surface using two values — longitude and latitude. The surface of the Earth in geographic coordinate systems is represented by a spherical or ellipsoidal surface.

All projected CRSs are based on a geographic CRS and rely on map projections to convert the 3-d surface of the Earth into Easting and Northing (x and y) values in a projected CRS. Projected CRSs are based on Cartesian coordinates on an implicitly flat surface. They have an origin, x and y axes, and a linear unit of measurement such as meters.

This transition cannot be done without adding some deformations. Therefore, some properties of the Earth’s surface are distorted in this process, such as area, direction, distance, and shape. A projected coordinate system can preserve only one or two of those properties.

projection
projection

A quick summary of different projections, their types, properties, and suitability can be found at https://www.geo-projections.com/.

sf and simple features for geographic vector data

Simple Features is a hierarchical data model that represents a wide range of geometry types. Simple features is an open standard developed and endorsed by the Open Geospatial Consortium (OGC). 7 out of 18 geometry types are used in most of geographic research and are supported by the R package sf (POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON and GEOMETRYCOLLECTION).

sf geometry types
sf geometry types

sf includes functions from three previous widely used R package: sp for data classes, rgdal for data read and write via interfaces to GDAL and PROJ, and rgeos for spatial operation via an interface to GEOS. Therefore, I recommend you to learn sf instead. In fact, rgdal and rgeos will be discontinued by the end of 2023. The best place to start is the vignettes of the sf package.

Another neat feature of sf is that the simple feature objects are saved as data frames and work nicely with the tidyverse packages. Therefore, we can use the verbs from packages such as dplyr to filter, select, aggregate (group_by), summarize, etc. on the simple features, making working with geographic vector data in R enjoyable.

Almost all functions of sf start with st_ (stands for spatial type).

To install sf, follow the instruction from its webpage.

if(!require("xfun")) install.packages("xfun")
## Loading required package: xfun
## 
## Attaching package: 'xfun'
## The following objects are masked from 'package:base':
## 
##     attr, isFALSE
xfun::pkg_attach2(c("sf", "spData", "tidyverse"))
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
nc <- st_read(system.file("shape/nc.shp", package="sf"))
## Reading layer `nc' from data source `/Users/dli/R/sf/shape/nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS:  NAD27
class(nc)
## [1] "sf"         "data.frame"
nc
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS:  NAD27
## First 10 features:
##     AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74 SID74
## 1  0.114     1.442  1825    1825        Ashe 37009  37009        5  1091     1
## 2  0.061     1.231  1827    1827   Alleghany 37005  37005        3   487     0
## 3  0.143     1.630  1828    1828       Surry 37171  37171       86  3188     5
## 4  0.070     2.968  1831    1831   Currituck 37053  37053       27   508     1
## 5  0.153     2.206  1832    1832 Northampton 37131  37131       66  1421     9
## 6  0.097     1.670  1833    1833    Hertford 37091  37091       46  1452     7
## 7  0.062     1.547  1834    1834      Camden 37029  37029       15   286     0
## 8  0.091     1.284  1835    1835       Gates 37073  37073       37   420     0
## 9  0.118     1.421  1836    1836      Warren 37185  37185       93   968     4
## 10 0.124     1.428  1837    1837      Stokes 37169  37169       85  1612     1
##    NWBIR74 BIR79 SID79 NWBIR79                       geometry
## 1       10  1364     0      19 MULTIPOLYGON (((-81.47276 3...
## 2       10   542     3      12 MULTIPOLYGON (((-81.23989 3...
## 3      208  3616     6     260 MULTIPOLYGON (((-80.45634 3...
## 4      123   830     2     145 MULTIPOLYGON (((-76.00897 3...
## 5     1066  1606     3    1197 MULTIPOLYGON (((-77.21767 3...
## 6      954  1838     5    1237 MULTIPOLYGON (((-76.74506 3...
## 7      115   350     2     139 MULTIPOLYGON (((-76.00897 3...
## 8      254   594     2     371 MULTIPOLYGON (((-76.56251 3...
## 9      748  1190     2     844 MULTIPOLYGON (((-78.30876 3...
## 10     160  2038     5     176 MULTIPOLYGON (((-80.02567 3...
print(nc[9:15], n = 3)

world_tbl = read_sf(system.file("shapes/world.shp", package = "spData"))
world_tbl
## Simple feature collection with 177 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## # A tibble: 177 × 11
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove…   1.93e4  8.86e5    70.0
##  2 TZ     Tanzania  Africa    Africa    Eastern … Sove…   9.33e5  5.22e7    64.2
##  3 EH     Western … Africa    Africa    Northern… Inde…   9.63e4 NA         NA  
##  4 CA     Canada    North Am… Americas  Northern… Sove…   1.00e7  3.55e7    82.0
##  5 US     United S… North Am… Americas  Northern… Coun…   9.51e6  3.19e8    78.8
##  6 KZ     Kazakhst… Asia      Asia      Central … Sove…   2.73e6  1.73e7    71.6
##  7 UZ     Uzbekist… Asia      Asia      Central … Sove…   4.61e5  3.08e7    71.0
##  8 PG     Papua Ne… Oceania   Oceania   Melanesia Sove…   4.65e5  7.76e6    65.2
##  9 ID     Indonesia Asia      Asia      South-Ea… Sove…   1.82e6  2.55e8    68.9
## 10 AR     Argentina South Am… Americas  South Am… Sove…   2.78e6  4.30e7    76.3
## # ℹ 167 more rows
## # ℹ 2 more variables: gdpPercap <dbl>, geometry <MULTIPOLYGON [°]>
# subsetting sf is just as subsetting data frames
# note that the geometry column is sticky
world_tbl[1:5, 1:3]
## Simple feature collection with 5 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -18.28799 xmax: 180 ymax: 83.23324
## Geodetic CRS:  WGS 84
## # A tibble: 5 × 4
##   iso_a2 name_long      continent                                       geometry
##   <chr>  <chr>          <chr>                                 <MULTIPOLYGON [°]>
## 1 FJ     Fiji           Oceania       (((-180 -16.55522, -180 -16.06713, -179.7…
## 2 TZ     Tanzania       Africa        (((33.90371 -0.95, 34.07262 -1.05982, 37.…
## 3 EH     Western Sahara Africa        (((-8.66559 27.65643, -8.665124 27.58948,…
## 4 CA     Canada         North America (((-132.71 54.04001, -131.75 54.12, -132.…
## 5 US     United States  North America (((-171.7317 63.78252, -171.1144 63.59219…
world_tbl[3:6]
## Simple feature collection with 177 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## # A tibble: 177 × 5
##    continent     region_un subregion          type                      geometry
##    <chr>         <chr>     <chr>              <chr>           <MULTIPOLYGON [°]>
##  1 Oceania       Oceania   Melanesia          Soverei… (((-180 -16.55522, -180 …
##  2 Africa        Africa    Eastern Africa     Soverei… (((33.90371 -0.95, 34.07…
##  3 Africa        Africa    Northern Africa    Indeter… (((-8.66559 27.65643, -8…
##  4 North America Americas  Northern America   Soverei… (((-132.71 54.04001, -13…
##  5 North America Americas  Northern America   Country  (((-171.7317 63.78252, -…
##  6 Asia          Asia      Central Asia       Soverei… (((87.35997 49.21498, 86…
##  7 Asia          Asia      Central Asia       Soverei… (((55.96819 41.30864, 55…
##  8 Oceania       Oceania   Melanesia          Soverei… (((141.0002 -2.600151, 1…
##  9 Asia          Asia      South-Eastern Asia Soverei… (((104.37 -1.084843, 104…
## 10 South America Americas  South America      Soverei… (((-68.63401 -52.63637, …
## # ℹ 167 more rows
filter(world_tbl, continent == "North America")
## Simple feature collection with 18 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -171.7911 ymin: 7.220541 xmax: -12.20855 ymax: 83.64513
## Geodetic CRS:  WGS 84
## # A tibble: 18 × 11
##    iso_a2 name_long  continent region_un subregion type  area_km2    pop lifeExp
##  * <chr>  <chr>      <chr>     <chr>     <chr>     <chr>    <dbl>  <dbl>   <dbl>
##  1 CA     Canada     North Am… Americas  Northern… Sove…   1.00e7 3.55e7    82.0
##  2 US     United St… North Am… Americas  Northern… Coun…   9.51e6 3.19e8    78.8
##  3 HT     Haiti      North Am… Americas  Caribbean Sove…   2.85e4 1.06e7    62.8
##  4 DO     Dominican… North Am… Americas  Caribbean Sove…   4.82e4 1.04e7    73.5
##  5 BS     Bahamas    North Am… Americas  Caribbean Sove…   1.56e4 3.82e5    75.4
##  6 GL     Greenland  North Am… Americas  Northern… Coun…   2.21e6 5.63e4    NA  
##  7 MX     Mexico     North Am… Americas  Central … Sove…   1.97e6 1.24e8    76.8
##  8 PA     Panama     North Am… Americas  Central … Sove…   7.53e4 3.90e6    77.6
##  9 CR     Costa Rica North Am… Americas  Central … Sove…   5.38e4 4.76e6    79.4
## 10 NI     Nicaragua  North Am… Americas  Central … Sove…   1.30e5 6.01e6    74.9
## 11 HN     Honduras   North Am… Americas  Central … Sove…   1.14e5 8.81e6    73.2
## 12 SV     El Salvad… North Am… Americas  Central … Sove…   2.09e4 6.28e6    73.0
## 13 GT     Guatemala  North Am… Americas  Central … Sove…   1.09e5 1.59e7    72.9
## 14 BZ     Belize     North Am… Americas  Central … Sove…   2.20e4 3.52e5    70.0
## 15 PR     Puerto Ri… North Am… Americas  Caribbean Depe…   9.22e3 3.53e6    79.4
## 16 JM     Jamaica    North Am… Americas  Caribbean Sove…   1.25e4 2.86e6    75.7
## 17 CU     Cuba       North Am… Americas  Caribbean Sove…   1.15e5 1.14e7    79.4
## 18 TT     Trinidad … North Am… Americas  Caribbean Sove…   7.74e3 1.35e6    70.4
## # ℹ 2 more variables: gdpPercap <dbl>, geometry <MULTIPOLYGON [°]>
dplyr::select(world_tbl, name_long, pop)
## Simple feature collection with 177 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## # A tibble: 177 × 3
##    name_long              pop                                           geometry
##    <chr>                <dbl>                                 <MULTIPOLYGON [°]>
##  1 Fiji                885806 (((-180 -16.55522, -180 -16.06713, -179.7933 -16.…
##  2 Tanzania          52234869 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3…
##  3 Western Sahara          NA (((-8.66559 27.65643, -8.665124 27.58948, -8.6844…
##  4 Canada            35535348 (((-132.71 54.04001, -131.75 54.12, -132.0495 52.…
##  5 United States    318622525 (((-171.7317 63.78252, -171.1144 63.59219, -170.4…
##  6 Kazakhstan        17288285 (((87.35997 49.21498, 86.59878 48.54918, 85.76823…
##  7 Uzbekistan        30757700 (((55.96819 41.30864, 55.92892 44.99586, 58.50313…
##  8 Papua New Guinea   7755785 (((141.0002 -2.600151, 142.7352 -3.289153, 144.58…
##  9 Indonesia        255131116 (((104.37 -1.084843, 104.5395 -1.782372, 104.8879…
## 10 Argentina         42981515 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.…
## # ℹ 167 more rows
# to get the geometry only
world_tbl[0]
## Simple feature collection with 177 features and 0 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## # A tibble: 177 × 1
##                                                                         geometry
##                                                               <MULTIPOLYGON [°]>
##  1 (((-180 -16.55522, -180 -16.06713, -179.7933 -16.02088, -179.9174 -16.50178,…
##  2 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3.09699, 37.7669 -3.67712, 3…
##  3 (((-8.66559 27.65643, -8.665124 27.58948, -8.6844 27.39574, -8.687294 25.881…
##  4 (((-132.71 54.04001, -131.75 54.12, -132.0495 52.98462, -131.179 52.18043, -…
##  5 (((-171.7317 63.78252, -171.1144 63.59219, -170.4911 63.69498, -169.6825 63.…
##  6 (((87.35997 49.21498, 86.59878 48.54918, 85.76823 48.45575, 85.72048 47.4529…
##  7 (((55.96819 41.30864, 55.92892 44.99586, 58.50313 45.5868, 58.68999 45.50001…
##  8 (((141.0002 -2.600151, 142.7352 -3.289153, 144.584 -3.861418, 145.2732 -4.37…
##  9 (((104.37 -1.084843, 104.5395 -1.782372, 104.8879 -2.340425, 105.6221 -2.428…
## 10 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.85, -66.45 -54.45, -65.05 -…
## # ℹ 167 more rows
st_geometry(world_tbl)
## Geometry set for 177 features 
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## First 5 geometries:
## MULTIPOLYGON (((-180 -16.55522, -180 -16.06713,...
## MULTIPOLYGON (((33.90371 -0.95, 34.07262 -1.059...
## MULTIPOLYGON (((-8.66559 27.65643, -8.665124 27...
## MULTIPOLYGON (((-132.71 54.04001, -131.75 54.12...
## MULTIPOLYGON (((-171.7317 63.78252, -171.1144 6...
# drop geometry
st_drop_geometry(world_tbl)
## # A tibble: 177 × 10
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##  * <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove…   1.93e4  8.86e5    70.0
##  2 TZ     Tanzania  Africa    Africa    Eastern … Sove…   9.33e5  5.22e7    64.2
##  3 EH     Western … Africa    Africa    Northern… Inde…   9.63e4 NA         NA  
##  4 CA     Canada    North Am… Americas  Northern… Sove…   1.00e7  3.55e7    82.0
##  5 US     United S… North Am… Americas  Northern… Coun…   9.51e6  3.19e8    78.8
##  6 KZ     Kazakhst… Asia      Asia      Central … Sove…   2.73e6  1.73e7    71.6
##  7 UZ     Uzbekist… Asia      Asia      Central … Sove…   4.61e5  3.08e7    71.0
##  8 PG     Papua Ne… Oceania   Oceania   Melanesia Sove…   4.65e5  7.76e6    65.2
##  9 ID     Indonesia Asia      Asia      South-Ea… Sove…   1.82e6  2.55e8    68.9
## 10 AR     Argentina South Am… Americas  South Am… Sove…   2.78e6  4.30e7    76.3
## # ℹ 167 more rows
## # ℹ 1 more variable: gdpPercap <dbl>
# aggregating (note the geometry is aggregated too)
world_tbl |> 
  group_by(continent) |> 
  summarise(total_area = sum(area_km2, na.rm = TRUE)) |> 
  arrange(desc(total_area))
## Simple feature collection with 8 features and 2 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## # A tibble: 8 × 3
##   continent               total_area                                    geometry
##   <chr>                        <dbl>                              <GEOMETRY [°]>
## 1 Asia                     31252459. MULTIPOLYGON (((104.0108 -1.059212, 103.43…
## 2 Africa                   29946198. MULTIPOLYGON (((36.86623 22, 36.69069 22.2…
## 3 North America            24484309. MULTIPOLYGON (((-82.51044 23.07875, -83.26…
## 4 Europe                   23065219. MULTIPOLYGON (((25.74502 35.18, 25.76921 3…
## 5 South America            17762592. MULTIPOLYGON (((-70.37257 -18.34798, -70.1…
## 6 Antarctica               12335956. MULTIPOLYGON (((180 -89.9, 180 -84.71338, …
## 7 Oceania                   8504489. MULTIPOLYGON (((167.0012 -15.6146, 167.27 …
## 8 Seven seas (open ocean)     11603. POLYGON ((68.8675 -48.83, 68.72 -49.2425, …
# join data frames / sf
coffee_data
## # A tibble: 47 × 3
##    name_long                coffee_production_2016 coffee_production_2017
##    <chr>                                     <int>                  <int>
##  1 Angola                                       NA                     NA
##  2 Bolivia                                       3                      4
##  3 Brazil                                     3277                   2786
##  4 Burundi                                      37                     38
##  5 Cameroon                                      8                      6
##  6 Central African Republic                     NA                     NA
##  7 Congo, Dem. Rep. of                           4                     12
##  8 Colombia                                   1330                   1169
##  9 Costa Rica                                   28                     32
## 10 Côte d'Ivoire                               114                    130
## # ℹ 37 more rows
world_coffee = left_join(world_tbl, coffee_data)
## Joining with `by = join_by(name_long)`
world_coffee
## Simple feature collection with 177 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## # A tibble: 177 × 13
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove…   1.93e4  8.86e5    70.0
##  2 TZ     Tanzania  Africa    Africa    Eastern … Sove…   9.33e5  5.22e7    64.2
##  3 EH     Western … Africa    Africa    Northern… Inde…   9.63e4 NA         NA  
##  4 CA     Canada    North Am… Americas  Northern… Sove…   1.00e7  3.55e7    82.0
##  5 US     United S… North Am… Americas  Northern… Coun…   9.51e6  3.19e8    78.8
##  6 KZ     Kazakhst… Asia      Asia      Central … Sove…   2.73e6  1.73e7    71.6
##  7 UZ     Uzbekist… Asia      Asia      Central … Sove…   4.61e5  3.08e7    71.0
##  8 PG     Papua Ne… Oceania   Oceania   Melanesia Sove…   4.65e5  7.76e6    65.2
##  9 ID     Indonesia Asia      Asia      South-Ea… Sove…   1.82e6  2.55e8    68.9
## 10 AR     Argentina South Am… Americas  South Am… Sove…   2.78e6  4.30e7    76.3
## # ℹ 167 more rows
## # ℹ 4 more variables: gdpPercap <dbl>, geometry <MULTIPOLYGON [°]>,
## #   coffee_production_2016 <int>, coffee_production_2017 <int>
# making simple maps are easy
plot(world_tbl)
## Warning: plotting the first 9 out of 10 attributes; use max.plot = 10 to plot
## all

plot(world_tbl[c(3, 7:9)])

plot(world_tbl["pop"])

plot(world_coffee["coffee_production_2017"])

world_asia = world[world$continent == "Asia", ]
# almost all functions in sf start with st_
asia = st_union(world_asia)

plot(world_tbl["pop"], reset = FALSE)
plot(asia, add = TRUE, col = "red")

world_proj = st_transform(world, crs = "+proj=eck4")
# crs = 4326 (lat/long)
plot(world_proj["continent"], reset = F, main = "", key.pos = NULL)
g = st_graticule()
g = st_transform(g, crs = "+proj=eck4")

plot(g$geometry, add = TRUE, col = "lightgrey")

# ggplot2
ggplot(world_tbl) +
  geom_sf(aes(fill = log10(pop))) 

ggplot(world_proj) +
  geom_sf(data = g, color = "lightgrey") +
  geom_sf(aes(fill = continent)) 

Spatial data operations

Spatial subsetting is the process of taking a spatial object and returning a new object containing only features that relate in space to another object.

spData::nz
## Simple feature collection with 16 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 1090144 ymin: 4748537 xmax: 2089533 ymax: 6191874
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
##                 Name Island Land_area Population Median_income Sex_ratio
## 1          Northland  North 12500.561     175500         23400 0.9424532
## 2           Auckland  North  4941.573    1657200         29600 0.9442858
## 3            Waikato  North 23900.036     460100         27900 0.9520500
## 4      Bay of Plenty  North 12071.145     299900         26200 0.9280391
## 5           Gisborne  North  8385.827      48500         24400 0.9349734
## 6        Hawke's Bay  North 14137.524     164000         26100 0.9238375
## 7           Taranaki  North  7254.480     118000         29100 0.9569363
## 8  Manawatu-Wanganui  North 22220.608     234500         25000 0.9387734
## 9         Wellington  North  8048.553     513900         32700 0.9335524
## 10        West Coast  South 23245.456      32400         26900 1.0139072
##                              geom
## 1  MULTIPOLYGON (((1745493 600...
## 2  MULTIPOLYGON (((1803822 590...
## 3  MULTIPOLYGON (((1860345 585...
## 4  MULTIPOLYGON (((2049387 583...
## 5  MULTIPOLYGON (((2024489 567...
## 6  MULTIPOLYGON (((2024489 567...
## 7  MULTIPOLYGON (((1740438 571...
## 8  MULTIPOLYGON (((1866732 566...
## 9  MULTIPOLYGON (((1881590 548...
## 10 MULTIPOLYGON (((1557042 531...
spData::nz_height
## Simple feature collection with 101 features and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 1204143 ymin: 5048309 xmax: 1822492 ymax: 5650492
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
##    t50_fid elevation                geometry
## 1  2353944      2723 POINT (1204143 5049971)
## 2  2354404      2820 POINT (1234725 5048309)
## 3  2354405      2830 POINT (1235915 5048745)
## 4  2369113      3033 POINT (1259702 5076570)
## 5  2362630      2749 POINT (1378170 5158491)
## 6  2362814      2822 POINT (1389460 5168749)
## 7  2362817      2778 POINT (1390166 5169466)
## 8  2363991      3004 POINT (1372357 5172729)
## 9  2363993      3114 POINT (1372062 5173236)
## 10 2363994      2882 POINT (1372810 5173419)
ggplot() +
  geom_sf(data = nz) +
  geom_sf(data = nz_height, alpha = 0.6, color = "blue")

canterbury = filter(nz, Name == "Canterbury")
canterbury
## Simple feature collection with 1 feature and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 1325039 ymin: 5004766 xmax: 1686902 ymax: 5360239
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
##         Name Island Land_area Population Median_income Sex_ratio
## 1 Canterbury  South   44504.5     612000         30100 0.9753265
##                             geom
## 1 MULTIPOLYGON (((1686902 535...
canterbury_height = nz_height[canterbury, ]
canterbury_height
## Simple feature collection with 70 features and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 1365809 ymin: 5158491 xmax: 1654899 ymax: 5350463
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
##    t50_fid elevation                geometry
## 5  2362630      2749 POINT (1378170 5158491)
## 6  2362814      2822 POINT (1389460 5168749)
## 7  2362817      2778 POINT (1390166 5169466)
## 8  2363991      3004 POINT (1372357 5172729)
## 9  2363993      3114 POINT (1372062 5173236)
## 10 2363994      2882 POINT (1372810 5173419)
## 11 2363995      2796 POINT (1372579 5173989)
## 13 2363997      3070 POINT (1373796 5174144)
## 14 2363998      3061 POINT (1373955 5174231)
## 15 2363999      3077 POINT (1373984 5175228)
canterbury_height2 = st_filter(x = nz_height, y = canterbury, .predicate = st_intersects)
canterbury_height2
## Simple feature collection with 70 features and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 1365809 ymin: 5158491 xmax: 1654899 ymax: 5350463
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
##    t50_fid elevation                geometry
## 1  2362630      2749 POINT (1378170 5158491)
## 2  2362814      2822 POINT (1389460 5168749)
## 3  2362817      2778 POINT (1390166 5169466)
## 4  2363991      3004 POINT (1372357 5172729)
## 5  2363993      3114 POINT (1372062 5173236)
## 6  2363994      2882 POINT (1372810 5173419)
## 7  2363995      2796 POINT (1372579 5173989)
## 8  2363997      3070 POINT (1373796 5174144)
## 9  2363998      3061 POINT (1373955 5174231)
## 10 2363999      3077 POINT (1373984 5175228)
ggplot() +
  geom_sf(data = nz) +
  geom_sf(data = canterbury, fill = "orange") +
  geom_sf(data = canterbury_height2, alpha = 0.6, color = "blue") 

How to convert an existing data frame with latitude and longitude to a sf object?

set.seed(2018) # set seed for reproducibility
(bb = st_bbox(world)) 
##       xmin       ymin       xmax       ymax 
## -180.00000  -89.90000  179.99999   83.64513
random_df = data.frame(
  x = runif(n = 10, min = bb[1], max = bb[3]),
  y = runif(n = 10, min = bb[2], max = bb[4])
)
random_df
##              x          y
## 1   -58.984754 -21.242775
## 2   -13.059627  25.427439
## 3  -158.189262  80.540804
## 4  -108.923901  27.800978
## 5    -9.246895  49.982200
## 6   -71.622506  20.158830
## 7    38.433183 -42.915005
## 8  -133.195644   6.053818
## 9   165.115688  38.168615
## 10   16.865812  53.864852
random_points = random_df |> 
  st_as_sf(coords = c("x", "y"), remove = FALSE) |> # set coordinates
  st_set_crs("EPSG:4326") # set geographic CRS
# https://epsg.io/4326 
random_points
## Simple feature collection with 10 features and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -158.1893 ymin: -42.91501 xmax: 165.1157 ymax: 80.5408
## Geodetic CRS:  WGS 84
##              x          y                    geometry
## 1   -58.984754 -21.242775 POINT (-58.98475 -21.24278)
## 2   -13.059627  25.427439  POINT (-13.05963 25.42744)
## 3  -158.189262  80.540804   POINT (-158.1893 80.5408)
## 4  -108.923901  27.800978  POINT (-108.9239 27.80098)
## 5    -9.246895  49.982200   POINT (-9.246895 49.9822)
## 6   -71.622506  20.158830  POINT (-71.62251 20.15883)
## 7    38.433183 -42.915005  POINT (38.43318 -42.91501)
## 8  -133.195644   6.053818  POINT (-133.1956 6.053818)
## 9   165.115688  38.168615   POINT (165.1157 38.16862)
## 10   16.865812  53.864852   POINT (16.86581 53.86485)

How to find which country a point is within?

random_joined = st_join(random_points, world["name_long"])
random_joined
## Simple feature collection with 10 features and 3 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -158.1893 ymin: -42.91501 xmax: 165.1157 ymax: 80.5408
## Geodetic CRS:  WGS 84
##              x          y name_long                    geometry
## 1   -58.984754 -21.242775  Paraguay POINT (-58.98475 -21.24278)
## 2   -13.059627  25.427439   Morocco  POINT (-13.05963 25.42744)
## 3  -158.189262  80.540804      <NA>   POINT (-158.1893 80.5408)
## 4  -108.923901  27.800978    Mexico  POINT (-108.9239 27.80098)
## 5    -9.246895  49.982200      <NA>   POINT (-9.246895 49.9822)
## 6   -71.622506  20.158830      <NA>  POINT (-71.62251 20.15883)
## 7    38.433183 -42.915005      <NA>  POINT (38.43318 -42.91501)
## 8  -133.195644   6.053818      <NA>  POINT (-133.1956 6.053818)
## 9   165.115688  38.168615      <NA>   POINT (165.1157 38.16862)
## 10   16.865812  53.864852    Poland   POINT (16.86581 53.86485)

Other useful functions in sf (incomplete list)

  • st_combine() or st_union(): combine several feature geometries into one
  • st_centroid(): get the centroid of each feature; sometimes the geographic centroid falls outside the boundaries of their parent objects (think of a doughnut). In such cases st_point_on_surface() can be used to guarantee the point will be in the parent object
  • st_crs() to check which crs an object has
  • st_set_crs(): specify CRS
  • st_proj_info(type = 'datum') and st_proj_info(type = 'proj'): projection information
  • st_area(), st_length(), and st_distance(): calculate geometric measurements, e.g. cn = st_area(world[world$name_long == 'China',]). If CRS is set, the results comes with units; to convert units, try units::set_units(cn, 'km^2')
  • st_intersects(): tests whether two sf objects intersect, the opposite is st_disjoint(); other similar functions st_within(), st_touches(), st_is_within_distance(), st_contains(), etc.
  • st_cast(): convert different types of geometries
  • st_buffer(): create a buffer around, better to use it on projected crs

Raster data

The spatial raster data model represents the world with equally spaced and consistent sized grid of cells (i.e. pixels). The raster data model usually consists of a raster header and a matrix (with rows and columns) representing values of equally spaced cells (only one value per cell). The header defines the coordinate reference system, the extent, and the origin. This matrix representation avoids storing explicitly the coordinates for the four corner points (in fact it only stores one coordinate, namely the origin) of each cell corner as would be the case for rectangular vector polygons. Therefore, raster can process data faster and more efficient than geographic vector data processing.

library(terra)
## terra 1.7.39
## 
## Attaching package: 'terra'
## The following object is masked from 'package:tidyr':
## 
##     extract
elev = rast(nrows = 6, ncols = 6, resolution = 0.5, 
            xmin = -1.5, xmax = 1.5, ymin = -1.5, ymax = 1.5,
            vals = 1:36)
elev
## class       : SpatRaster 
## dimensions  : 6, 6, 1  (nrow, ncol, nlyr)
## resolution  : 0.5, 0.5  (x, y)
## extent      : -1.5, 1.5, -1.5, 1.5  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 
## source(s)   : memory
## name        : lyr.1 
## min value   :     1 
## max value   :    36
plot(elev)

summary(elev)
##      lyr.1      
##  Min.   : 1.00  
##  1st Qu.: 9.75  
##  Median :18.50  
##  Mean   :18.50  
##  3rd Qu.:27.25  
##  Max.   :36.00
global(elev, sd) # global operation
##             sd
## lyr.1 10.53565
hist(elev)

elev + elev
## class       : SpatRaster 
## dimensions  : 6, 6, 1  (nrow, ncol, nlyr)
## resolution  : 0.5, 0.5  (x, y)
## extent      : -1.5, 1.5, -1.5, 1.5  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 
## source(s)   : memory
## name        : lyr.1 
## min value   :     2 
## max value   :    72
elev ^ 2
## class       : SpatRaster 
## dimensions  : 6, 6, 1  (nrow, ncol, nlyr)
## resolution  : 0.5, 0.5  (x, y)
## extent      : -1.5, 1.5, -1.5, 1.5  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 
## source(s)   : memory
## name        : lyr.1 
## min value   :     1 
## max value   :  1296
elev < 20
## class       : SpatRaster 
## dimensions  : 6, 6, 1  (nrow, ncol, nlyr)
## resolution  : 0.5, 0.5  (x, y)
## extent      : -1.5, 1.5, -1.5, 1.5  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 
## source(s)   : memory
## name        : lyr.1 
## min value   : FALSE 
## max value   :  TRUE
plot(elev < 20)

# extract values
id = cellFromXY(elev, xy = matrix(c(0.1, 0.1), ncol = 2))
elev[id]
##   lyr.1
## 1    16
terra::extract(elev, matrix(c(0.1, 0.1), ncol = 2))
##   lyr.1
## 1    16
rmask = elev
values(rmask) = sample(c(NA, TRUE), 36, replace = TRUE)
plot(rmask)

plot(mask(elev, rmask))

# multiple layers
if(!require("spDataLarge")) 
  install.packages("spDataLarge", 
                   repos = "https://geocompr.r-universe.dev")
## Loading required package: spDataLarge
## 
## Attaching package: 'spDataLarge'
## The following object is masked _by_ '.GlobalEnv':
## 
##     random_points
multi_raster_file = system.file("raster/landsat.tif", package = "spDataLarge")
multi_rast = rast(multi_raster_file)
multi_rast
## class       : SpatRaster 
## dimensions  : 1428, 1128, 4  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : 301905, 335745, 4111245, 4154085  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 12N (EPSG:32612) 
## source      : landsat.tif 
## names       : landsat_1, landsat_2, landsat_3, landsat_4 
## min values  :      7550,      6404,      5678,      5252 
## max values  :     19071,     22051,     25780,     31961
nlyr(multi_rast)
## [1] 4
hist(multi_rast)
## Warning: [hist] a sample of62% of the cells was used

## Warning: [hist] a sample of62% of the cells was used

## Warning: [hist] a sample of62% of the cells was used

## Warning: [hist] a sample of62% of the cells was used

multi_rast[[2]]
## class       : SpatRaster 
## dimensions  : 1428, 1128, 1  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : 301905, 335745, 4111245, 4154085  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 12N (EPSG:32612) 
## source      : landsat.tif 
## name        : landsat_2 
## min value   :      6404 
## max value   :     22051
plot(subset(multi_rast, 2))

# focal operation 
r_focal = focal(elev, w = matrix(1, nrow = 3, ncol = 3), fun = min)

# zonal operations
grain_order = c("clay", "silt", "sand")
grain_char = sample(grain_order, 36, replace = TRUE)
grain_fact = factor(grain_char, levels = grain_order)
grain = rast(nrows = 6, ncols = 6, resolution = 0.5, 
             xmin = -1.5, xmax = 1.5, ymin = -1.5, ymax = 1.5,
             vals = grain_fact)
plot(grain)

z = zonal(elev, grain, fun = "mean")
z
##   lyr.1  lyr.1.1
## 1  clay 17.71429
## 2  silt 12.36364
## 3  sand 22.55556
# aggregate & disaggregate & resampling

Reference

Most text and code above came from Geocomputation with R; credits for the authors.