This lecture will cover briefly about how to visualize, analyze, and model geographic data with open source software. We will focus on using the R ecosystem for spatial data analysis. Over the past decade, a wide range of packages based on the free and open source software for geospatial (FOSS4G) have been developed in R to support advanced geospatial statistics, visualization, and modeling. To learn more, one good free book is Geocomputation with R.
Two fundamental geographic data models exist: vector and raster.
The vector data model represents the world using points, lines and
polygons. For example, administrative boundary of counties across all
states in the US. Such data usually have high level of precision and are
mostly used to describe properties of units with discrete borders (e.g.,
human population density of different counties). The geographic vector
data model is based on points located within a coordinate reference
system (CRS); one common example is latitude/longitude coordinates.
Lines and polygons are just collections of points. A key R package to
work with geographic vector data is sf
.
The raster data model divides the surface up into cells of constant
size. Raster datasets are the basis of web-mapping based on aerial
photography and satellite-based remote sensing data. Rasters aggregate
properties to a specific spatial resolution (e.g., 1 km by 1 km). Such
data tends to dominate environmental sciences given their reliance on
remote sensing data (e.g., average temperature over the past 20 years at
the global scale at 50 km by 50 km resolution). Key R packages to work
with rasters are raster
and its replacement terra
.
CRS, either geographic or projected, defines how the spatial elements of the data relate to the surface of the Earth. Geographic coordinate systems identify any location on the Earth’s surface using two values — longitude and latitude. The surface of the Earth in geographic coordinate systems is represented by a spherical or ellipsoidal surface.
All projected CRSs are based on a geographic CRS and rely on map projections to convert the 3-d surface of the Earth into Easting and Northing (x and y) values in a projected CRS. Projected CRSs are based on Cartesian coordinates on an implicitly flat surface. They have an origin, x and y axes, and a linear unit of measurement such as meters.
This transition cannot be done without adding some deformations. Therefore, some properties of the Earth’s surface are distorted in this process, such as area, direction, distance, and shape. A projected coordinate system can preserve only one or two of those properties.
A quick summary of different projections, their types, properties, and suitability can be found at https://www.geo-projections.com/.
sf
and simple features for geographic vector dataSimple Features is a hierarchical data model that represents a wide
range of geometry types. Simple features is an open standard developed
and endorsed by the Open Geospatial Consortium (OGC). 7 out of 18
geometry types are used in most of geographic research and are supported
by the R package sf
(POINT
,
LINESTRING
, POLYGON
, MULTIPOINT
,
MULTILINESTRING
, MULTIPOLYGON
and
GEOMETRYCOLLECTION
).
POINT
zero-dimensional geometry containing a single
pointLINESTRING
sequence of points connected by straight,
non-self intersecting line pieces; one-dimensional geometryPOLYGON
geometry with a positive area
(two-dimensional); sequence of points form a closed, non-self
intersecting ring; the first ring denotes the exterior ring, zero or
more subsequent rings denote holes in this exterior ringMULTIPOINT
set of points; a MULTIPOINT is simple if no
two Points in the MULTIPOINT are equalMULTILINESTRING
set of linestringsMULTIPOLYGON
set of polygonsGEOMETRYCOLLECTION
set of geometries of any type except
GEOMETRYCOLLECTIONsf
includes functions from three previous widely used R
package: sp
for data classes, rgdal
for data
read and write via interfaces to GDAL and PROJ, and rgeos
for spatial operation via an interface to GEOS. Therefore, I recommend
you to learn sf
instead. In fact, rgdal
and
rgeos
will be discontinued by the end of 2023. The best
place to start is the vignettes of the
sf
package.
Another neat feature of sf
is that the simple feature
objects are saved as data frames and work nicely with the
tidyverse
packages. Therefore, we can use the verbs from
packages such as dplyr
to filter, select, aggregate
(group_by), summarize, etc. on the simple features, making working with
geographic vector data in R enjoyable.
Almost all functions of sf
start with st_
(stands for spatial type).
To install sf
, follow the instruction from its webpage.
if(!require("xfun")) install.packages("xfun")
## Loading required package: xfun
##
## Attaching package: 'xfun'
## The following objects are masked from 'package:base':
##
## attr, isFALSE
xfun::pkg_attach2(c("sf", "spData", "tidyverse"))
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
nc <- st_read(system.file("shape/nc.shp", package="sf"))
## Reading layer `nc' from data source `/Users/dli/R/sf/shape/nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS: NAD27
class(nc)
## [1] "sf" "data.frame"
nc
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS: NAD27
## First 10 features:
## AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74
## 1 0.114 1.442 1825 1825 Ashe 37009 37009 5 1091 1
## 2 0.061 1.231 1827 1827 Alleghany 37005 37005 3 487 0
## 3 0.143 1.630 1828 1828 Surry 37171 37171 86 3188 5
## 4 0.070 2.968 1831 1831 Currituck 37053 37053 27 508 1
## 5 0.153 2.206 1832 1832 Northampton 37131 37131 66 1421 9
## 6 0.097 1.670 1833 1833 Hertford 37091 37091 46 1452 7
## 7 0.062 1.547 1834 1834 Camden 37029 37029 15 286 0
## 8 0.091 1.284 1835 1835 Gates 37073 37073 37 420 0
## 9 0.118 1.421 1836 1836 Warren 37185 37185 93 968 4
## 10 0.124 1.428 1837 1837 Stokes 37169 37169 85 1612 1
## NWBIR74 BIR79 SID79 NWBIR79 geometry
## 1 10 1364 0 19 MULTIPOLYGON (((-81.47276 3...
## 2 10 542 3 12 MULTIPOLYGON (((-81.23989 3...
## 3 208 3616 6 260 MULTIPOLYGON (((-80.45634 3...
## 4 123 830 2 145 MULTIPOLYGON (((-76.00897 3...
## 5 1066 1606 3 1197 MULTIPOLYGON (((-77.21767 3...
## 6 954 1838 5 1237 MULTIPOLYGON (((-76.74506 3...
## 7 115 350 2 139 MULTIPOLYGON (((-76.00897 3...
## 8 254 594 2 371 MULTIPOLYGON (((-76.56251 3...
## 9 748 1190 2 844 MULTIPOLYGON (((-78.30876 3...
## 10 160 2038 5 176 MULTIPOLYGON (((-80.02567 3...
print(nc[9:15], n = 3)
world_tbl = read_sf(system.file("shapes/world.shp", package = "spData"))
world_tbl
## Simple feature collection with 177 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 177 × 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove… 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern … Sove… 9.33e5 5.22e7 64.2
## 3 EH Western … Africa Africa Northern… Inde… 9.63e4 NA NA
## 4 CA Canada North Am… Americas Northern… Sove… 1.00e7 3.55e7 82.0
## 5 US United S… North Am… Americas Northern… Coun… 9.51e6 3.19e8 78.8
## 6 KZ Kazakhst… Asia Asia Central … Sove… 2.73e6 1.73e7 71.6
## 7 UZ Uzbekist… Asia Asia Central … Sove… 4.61e5 3.08e7 71.0
## 8 PG Papua Ne… Oceania Oceania Melanesia Sove… 4.65e5 7.76e6 65.2
## 9 ID Indonesia Asia Asia South-Ea… Sove… 1.82e6 2.55e8 68.9
## 10 AR Argentina South Am… Americas South Am… Sove… 2.78e6 4.30e7 76.3
## # ℹ 167 more rows
## # ℹ 2 more variables: gdpPercap <dbl>, geometry <MULTIPOLYGON [°]>
# subsetting sf is just as subsetting data frames
# note that the geometry column is sticky
world_tbl[1:5, 1:3]
## Simple feature collection with 5 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -18.28799 xmax: 180 ymax: 83.23324
## Geodetic CRS: WGS 84
## # A tibble: 5 × 4
## iso_a2 name_long continent geometry
## <chr> <chr> <chr> <MULTIPOLYGON [°]>
## 1 FJ Fiji Oceania (((-180 -16.55522, -180 -16.06713, -179.7…
## 2 TZ Tanzania Africa (((33.90371 -0.95, 34.07262 -1.05982, 37.…
## 3 EH Western Sahara Africa (((-8.66559 27.65643, -8.665124 27.58948,…
## 4 CA Canada North America (((-132.71 54.04001, -131.75 54.12, -132.…
## 5 US United States North America (((-171.7317 63.78252, -171.1144 63.59219…
world_tbl[3:6]
## Simple feature collection with 177 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 177 × 5
## continent region_un subregion type geometry
## <chr> <chr> <chr> <chr> <MULTIPOLYGON [°]>
## 1 Oceania Oceania Melanesia Soverei… (((-180 -16.55522, -180 …
## 2 Africa Africa Eastern Africa Soverei… (((33.90371 -0.95, 34.07…
## 3 Africa Africa Northern Africa Indeter… (((-8.66559 27.65643, -8…
## 4 North America Americas Northern America Soverei… (((-132.71 54.04001, -13…
## 5 North America Americas Northern America Country (((-171.7317 63.78252, -…
## 6 Asia Asia Central Asia Soverei… (((87.35997 49.21498, 86…
## 7 Asia Asia Central Asia Soverei… (((55.96819 41.30864, 55…
## 8 Oceania Oceania Melanesia Soverei… (((141.0002 -2.600151, 1…
## 9 Asia Asia South-Eastern Asia Soverei… (((104.37 -1.084843, 104…
## 10 South America Americas South America Soverei… (((-68.63401 -52.63637, …
## # ℹ 167 more rows
filter(world_tbl, continent == "North America")
## Simple feature collection with 18 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -171.7911 ymin: 7.220541 xmax: -12.20855 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 18 × 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## * <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 CA Canada North Am… Americas Northern… Sove… 1.00e7 3.55e7 82.0
## 2 US United St… North Am… Americas Northern… Coun… 9.51e6 3.19e8 78.8
## 3 HT Haiti North Am… Americas Caribbean Sove… 2.85e4 1.06e7 62.8
## 4 DO Dominican… North Am… Americas Caribbean Sove… 4.82e4 1.04e7 73.5
## 5 BS Bahamas North Am… Americas Caribbean Sove… 1.56e4 3.82e5 75.4
## 6 GL Greenland North Am… Americas Northern… Coun… 2.21e6 5.63e4 NA
## 7 MX Mexico North Am… Americas Central … Sove… 1.97e6 1.24e8 76.8
## 8 PA Panama North Am… Americas Central … Sove… 7.53e4 3.90e6 77.6
## 9 CR Costa Rica North Am… Americas Central … Sove… 5.38e4 4.76e6 79.4
## 10 NI Nicaragua North Am… Americas Central … Sove… 1.30e5 6.01e6 74.9
## 11 HN Honduras North Am… Americas Central … Sove… 1.14e5 8.81e6 73.2
## 12 SV El Salvad… North Am… Americas Central … Sove… 2.09e4 6.28e6 73.0
## 13 GT Guatemala North Am… Americas Central … Sove… 1.09e5 1.59e7 72.9
## 14 BZ Belize North Am… Americas Central … Sove… 2.20e4 3.52e5 70.0
## 15 PR Puerto Ri… North Am… Americas Caribbean Depe… 9.22e3 3.53e6 79.4
## 16 JM Jamaica North Am… Americas Caribbean Sove… 1.25e4 2.86e6 75.7
## 17 CU Cuba North Am… Americas Caribbean Sove… 1.15e5 1.14e7 79.4
## 18 TT Trinidad … North Am… Americas Caribbean Sove… 7.74e3 1.35e6 70.4
## # ℹ 2 more variables: gdpPercap <dbl>, geometry <MULTIPOLYGON [°]>
dplyr::select(world_tbl, name_long, pop)
## Simple feature collection with 177 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 177 × 3
## name_long pop geometry
## <chr> <dbl> <MULTIPOLYGON [°]>
## 1 Fiji 885806 (((-180 -16.55522, -180 -16.06713, -179.7933 -16.…
## 2 Tanzania 52234869 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3…
## 3 Western Sahara NA (((-8.66559 27.65643, -8.665124 27.58948, -8.6844…
## 4 Canada 35535348 (((-132.71 54.04001, -131.75 54.12, -132.0495 52.…
## 5 United States 318622525 (((-171.7317 63.78252, -171.1144 63.59219, -170.4…
## 6 Kazakhstan 17288285 (((87.35997 49.21498, 86.59878 48.54918, 85.76823…
## 7 Uzbekistan 30757700 (((55.96819 41.30864, 55.92892 44.99586, 58.50313…
## 8 Papua New Guinea 7755785 (((141.0002 -2.600151, 142.7352 -3.289153, 144.58…
## 9 Indonesia 255131116 (((104.37 -1.084843, 104.5395 -1.782372, 104.8879…
## 10 Argentina 42981515 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.…
## # ℹ 167 more rows
# to get the geometry only
world_tbl[0]
## Simple feature collection with 177 features and 0 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 177 × 1
## geometry
## <MULTIPOLYGON [°]>
## 1 (((-180 -16.55522, -180 -16.06713, -179.7933 -16.02088, -179.9174 -16.50178,…
## 2 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3.09699, 37.7669 -3.67712, 3…
## 3 (((-8.66559 27.65643, -8.665124 27.58948, -8.6844 27.39574, -8.687294 25.881…
## 4 (((-132.71 54.04001, -131.75 54.12, -132.0495 52.98462, -131.179 52.18043, -…
## 5 (((-171.7317 63.78252, -171.1144 63.59219, -170.4911 63.69498, -169.6825 63.…
## 6 (((87.35997 49.21498, 86.59878 48.54918, 85.76823 48.45575, 85.72048 47.4529…
## 7 (((55.96819 41.30864, 55.92892 44.99586, 58.50313 45.5868, 58.68999 45.50001…
## 8 (((141.0002 -2.600151, 142.7352 -3.289153, 144.584 -3.861418, 145.2732 -4.37…
## 9 (((104.37 -1.084843, 104.5395 -1.782372, 104.8879 -2.340425, 105.6221 -2.428…
## 10 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.85, -66.45 -54.45, -65.05 -…
## # ℹ 167 more rows
st_geometry(world_tbl)
## Geometry set for 177 features
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## First 5 geometries:
## MULTIPOLYGON (((-180 -16.55522, -180 -16.06713,...
## MULTIPOLYGON (((33.90371 -0.95, 34.07262 -1.059...
## MULTIPOLYGON (((-8.66559 27.65643, -8.665124 27...
## MULTIPOLYGON (((-132.71 54.04001, -131.75 54.12...
## MULTIPOLYGON (((-171.7317 63.78252, -171.1144 6...
# drop geometry
st_drop_geometry(world_tbl)
## # A tibble: 177 × 10
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## * <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove… 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern … Sove… 9.33e5 5.22e7 64.2
## 3 EH Western … Africa Africa Northern… Inde… 9.63e4 NA NA
## 4 CA Canada North Am… Americas Northern… Sove… 1.00e7 3.55e7 82.0
## 5 US United S… North Am… Americas Northern… Coun… 9.51e6 3.19e8 78.8
## 6 KZ Kazakhst… Asia Asia Central … Sove… 2.73e6 1.73e7 71.6
## 7 UZ Uzbekist… Asia Asia Central … Sove… 4.61e5 3.08e7 71.0
## 8 PG Papua Ne… Oceania Oceania Melanesia Sove… 4.65e5 7.76e6 65.2
## 9 ID Indonesia Asia Asia South-Ea… Sove… 1.82e6 2.55e8 68.9
## 10 AR Argentina South Am… Americas South Am… Sove… 2.78e6 4.30e7 76.3
## # ℹ 167 more rows
## # ℹ 1 more variable: gdpPercap <dbl>
# aggregating (note the geometry is aggregated too)
world_tbl |>
group_by(continent) |>
summarise(total_area = sum(area_km2, na.rm = TRUE)) |>
arrange(desc(total_area))
## Simple feature collection with 8 features and 2 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 8 × 3
## continent total_area geometry
## <chr> <dbl> <GEOMETRY [°]>
## 1 Asia 31252459. MULTIPOLYGON (((104.0108 -1.059212, 103.43…
## 2 Africa 29946198. MULTIPOLYGON (((36.86623 22, 36.69069 22.2…
## 3 North America 24484309. MULTIPOLYGON (((-82.51044 23.07875, -83.26…
## 4 Europe 23065219. MULTIPOLYGON (((25.74502 35.18, 25.76921 3…
## 5 South America 17762592. MULTIPOLYGON (((-70.37257 -18.34798, -70.1…
## 6 Antarctica 12335956. MULTIPOLYGON (((180 -89.9, 180 -84.71338, …
## 7 Oceania 8504489. MULTIPOLYGON (((167.0012 -15.6146, 167.27 …
## 8 Seven seas (open ocean) 11603. POLYGON ((68.8675 -48.83, 68.72 -49.2425, …
# join data frames / sf
coffee_data
## # A tibble: 47 × 3
## name_long coffee_production_2016 coffee_production_2017
## <chr> <int> <int>
## 1 Angola NA NA
## 2 Bolivia 3 4
## 3 Brazil 3277 2786
## 4 Burundi 37 38
## 5 Cameroon 8 6
## 6 Central African Republic NA NA
## 7 Congo, Dem. Rep. of 4 12
## 8 Colombia 1330 1169
## 9 Costa Rica 28 32
## 10 Côte d'Ivoire 114 130
## # ℹ 37 more rows
world_coffee = left_join(world_tbl, coffee_data)
## Joining with `by = join_by(name_long)`
world_coffee
## Simple feature collection with 177 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 177 × 13
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove… 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern … Sove… 9.33e5 5.22e7 64.2
## 3 EH Western … Africa Africa Northern… Inde… 9.63e4 NA NA
## 4 CA Canada North Am… Americas Northern… Sove… 1.00e7 3.55e7 82.0
## 5 US United S… North Am… Americas Northern… Coun… 9.51e6 3.19e8 78.8
## 6 KZ Kazakhst… Asia Asia Central … Sove… 2.73e6 1.73e7 71.6
## 7 UZ Uzbekist… Asia Asia Central … Sove… 4.61e5 3.08e7 71.0
## 8 PG Papua Ne… Oceania Oceania Melanesia Sove… 4.65e5 7.76e6 65.2
## 9 ID Indonesia Asia Asia South-Ea… Sove… 1.82e6 2.55e8 68.9
## 10 AR Argentina South Am… Americas South Am… Sove… 2.78e6 4.30e7 76.3
## # ℹ 167 more rows
## # ℹ 4 more variables: gdpPercap <dbl>, geometry <MULTIPOLYGON [°]>,
## # coffee_production_2016 <int>, coffee_production_2017 <int>
# making simple maps are easy
plot(world_tbl)
## Warning: plotting the first 9 out of 10 attributes; use max.plot = 10 to plot
## all
plot(world_tbl[c(3, 7:9)])
plot(world_tbl["pop"])
plot(world_coffee["coffee_production_2017"])
world_asia = world[world$continent == "Asia", ]
# almost all functions in sf start with st_
asia = st_union(world_asia)
plot(world_tbl["pop"], reset = FALSE)
plot(asia, add = TRUE, col = "red")
world_proj = st_transform(world, crs = "+proj=eck4")
# crs = 4326 (lat/long)
plot(world_proj["continent"], reset = F, main = "", key.pos = NULL)
g = st_graticule()
g = st_transform(g, crs = "+proj=eck4")
plot(g$geometry, add = TRUE, col = "lightgrey")
# ggplot2
ggplot(world_tbl) +
geom_sf(aes(fill = log10(pop)))
ggplot(world_proj) +
geom_sf(data = g, color = "lightgrey") +
geom_sf(aes(fill = continent))
Spatial subsetting is the process of taking a spatial object and returning a new object containing only features that relate in space to another object.
spData::nz
## Simple feature collection with 16 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 1090144 ymin: 4748537 xmax: 2089533 ymax: 6191874
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
## Name Island Land_area Population Median_income Sex_ratio
## 1 Northland North 12500.561 175500 23400 0.9424532
## 2 Auckland North 4941.573 1657200 29600 0.9442858
## 3 Waikato North 23900.036 460100 27900 0.9520500
## 4 Bay of Plenty North 12071.145 299900 26200 0.9280391
## 5 Gisborne North 8385.827 48500 24400 0.9349734
## 6 Hawke's Bay North 14137.524 164000 26100 0.9238375
## 7 Taranaki North 7254.480 118000 29100 0.9569363
## 8 Manawatu-Wanganui North 22220.608 234500 25000 0.9387734
## 9 Wellington North 8048.553 513900 32700 0.9335524
## 10 West Coast South 23245.456 32400 26900 1.0139072
## geom
## 1 MULTIPOLYGON (((1745493 600...
## 2 MULTIPOLYGON (((1803822 590...
## 3 MULTIPOLYGON (((1860345 585...
## 4 MULTIPOLYGON (((2049387 583...
## 5 MULTIPOLYGON (((2024489 567...
## 6 MULTIPOLYGON (((2024489 567...
## 7 MULTIPOLYGON (((1740438 571...
## 8 MULTIPOLYGON (((1866732 566...
## 9 MULTIPOLYGON (((1881590 548...
## 10 MULTIPOLYGON (((1557042 531...
spData::nz_height
## Simple feature collection with 101 features and 2 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 1204143 ymin: 5048309 xmax: 1822492 ymax: 5650492
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
## t50_fid elevation geometry
## 1 2353944 2723 POINT (1204143 5049971)
## 2 2354404 2820 POINT (1234725 5048309)
## 3 2354405 2830 POINT (1235915 5048745)
## 4 2369113 3033 POINT (1259702 5076570)
## 5 2362630 2749 POINT (1378170 5158491)
## 6 2362814 2822 POINT (1389460 5168749)
## 7 2362817 2778 POINT (1390166 5169466)
## 8 2363991 3004 POINT (1372357 5172729)
## 9 2363993 3114 POINT (1372062 5173236)
## 10 2363994 2882 POINT (1372810 5173419)
ggplot() +
geom_sf(data = nz) +
geom_sf(data = nz_height, alpha = 0.6, color = "blue")
canterbury = filter(nz, Name == "Canterbury")
canterbury
## Simple feature collection with 1 feature and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 1325039 ymin: 5004766 xmax: 1686902 ymax: 5360239
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## Name Island Land_area Population Median_income Sex_ratio
## 1 Canterbury South 44504.5 612000 30100 0.9753265
## geom
## 1 MULTIPOLYGON (((1686902 535...
canterbury_height = nz_height[canterbury, ]
canterbury_height
## Simple feature collection with 70 features and 2 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 1365809 ymin: 5158491 xmax: 1654899 ymax: 5350463
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
## t50_fid elevation geometry
## 5 2362630 2749 POINT (1378170 5158491)
## 6 2362814 2822 POINT (1389460 5168749)
## 7 2362817 2778 POINT (1390166 5169466)
## 8 2363991 3004 POINT (1372357 5172729)
## 9 2363993 3114 POINT (1372062 5173236)
## 10 2363994 2882 POINT (1372810 5173419)
## 11 2363995 2796 POINT (1372579 5173989)
## 13 2363997 3070 POINT (1373796 5174144)
## 14 2363998 3061 POINT (1373955 5174231)
## 15 2363999 3077 POINT (1373984 5175228)
canterbury_height2 = st_filter(x = nz_height, y = canterbury, .predicate = st_intersects)
canterbury_height2
## Simple feature collection with 70 features and 2 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 1365809 ymin: 5158491 xmax: 1654899 ymax: 5350463
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
## t50_fid elevation geometry
## 1 2362630 2749 POINT (1378170 5158491)
## 2 2362814 2822 POINT (1389460 5168749)
## 3 2362817 2778 POINT (1390166 5169466)
## 4 2363991 3004 POINT (1372357 5172729)
## 5 2363993 3114 POINT (1372062 5173236)
## 6 2363994 2882 POINT (1372810 5173419)
## 7 2363995 2796 POINT (1372579 5173989)
## 8 2363997 3070 POINT (1373796 5174144)
## 9 2363998 3061 POINT (1373955 5174231)
## 10 2363999 3077 POINT (1373984 5175228)
ggplot() +
geom_sf(data = nz) +
geom_sf(data = canterbury, fill = "orange") +
geom_sf(data = canterbury_height2, alpha = 0.6, color = "blue")
How to convert an existing data frame with latitude and longitude to a sf object?
set.seed(2018) # set seed for reproducibility
(bb = st_bbox(world))
## xmin ymin xmax ymax
## -180.00000 -89.90000 179.99999 83.64513
random_df = data.frame(
x = runif(n = 10, min = bb[1], max = bb[3]),
y = runif(n = 10, min = bb[2], max = bb[4])
)
random_df
## x y
## 1 -58.984754 -21.242775
## 2 -13.059627 25.427439
## 3 -158.189262 80.540804
## 4 -108.923901 27.800978
## 5 -9.246895 49.982200
## 6 -71.622506 20.158830
## 7 38.433183 -42.915005
## 8 -133.195644 6.053818
## 9 165.115688 38.168615
## 10 16.865812 53.864852
random_points = random_df |>
st_as_sf(coords = c("x", "y"), remove = FALSE) |> # set coordinates
st_set_crs("EPSG:4326") # set geographic CRS
# https://epsg.io/4326
random_points
## Simple feature collection with 10 features and 2 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -158.1893 ymin: -42.91501 xmax: 165.1157 ymax: 80.5408
## Geodetic CRS: WGS 84
## x y geometry
## 1 -58.984754 -21.242775 POINT (-58.98475 -21.24278)
## 2 -13.059627 25.427439 POINT (-13.05963 25.42744)
## 3 -158.189262 80.540804 POINT (-158.1893 80.5408)
## 4 -108.923901 27.800978 POINT (-108.9239 27.80098)
## 5 -9.246895 49.982200 POINT (-9.246895 49.9822)
## 6 -71.622506 20.158830 POINT (-71.62251 20.15883)
## 7 38.433183 -42.915005 POINT (38.43318 -42.91501)
## 8 -133.195644 6.053818 POINT (-133.1956 6.053818)
## 9 165.115688 38.168615 POINT (165.1157 38.16862)
## 10 16.865812 53.864852 POINT (16.86581 53.86485)
How to find which country a point is within?
random_joined = st_join(random_points, world["name_long"])
random_joined
## Simple feature collection with 10 features and 3 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -158.1893 ymin: -42.91501 xmax: 165.1157 ymax: 80.5408
## Geodetic CRS: WGS 84
## x y name_long geometry
## 1 -58.984754 -21.242775 Paraguay POINT (-58.98475 -21.24278)
## 2 -13.059627 25.427439 Morocco POINT (-13.05963 25.42744)
## 3 -158.189262 80.540804 <NA> POINT (-158.1893 80.5408)
## 4 -108.923901 27.800978 Mexico POINT (-108.9239 27.80098)
## 5 -9.246895 49.982200 <NA> POINT (-9.246895 49.9822)
## 6 -71.622506 20.158830 <NA> POINT (-71.62251 20.15883)
## 7 38.433183 -42.915005 <NA> POINT (38.43318 -42.91501)
## 8 -133.195644 6.053818 <NA> POINT (-133.1956 6.053818)
## 9 165.115688 38.168615 <NA> POINT (165.1157 38.16862)
## 10 16.865812 53.864852 Poland POINT (16.86581 53.86485)
sf
(incomplete list)st_combine()
or st_union()
: combine
several feature geometries into onest_centroid()
: get the centroid of each feature;
sometimes the geographic centroid falls outside the boundaries of their
parent objects (think of a doughnut). In such cases
st_point_on_surface()
can be used to guarantee the point
will be in the parent objectst_crs()
to check which crs an object hasst_set_crs()
: specify CRSst_proj_info(type = 'datum')
and
st_proj_info(type = 'proj')
: projection informationst_area()
, st_length()
, and
st_distance()
: calculate geometric measurements,
e.g. cn = st_area(world[world$name_long == 'China',])
. If
CRS is set, the results comes with units; to convert units, try
units::set_units(cn, 'km^2')
st_intersects()
: tests whether two sf objects
intersect, the opposite is st_disjoint()
; other similar
functions st_within()
, st_touches()
,
st_is_within_distance()
, st_contains()
,
etc.st_cast()
: convert different types of geometriesst_buffer()
: create a buffer around, better to use it
on projected crsThe spatial raster data model represents the world with equally spaced and consistent sized grid of cells (i.e. pixels). The raster data model usually consists of a raster header and a matrix (with rows and columns) representing values of equally spaced cells (only one value per cell). The header defines the coordinate reference system, the extent, and the origin. This matrix representation avoids storing explicitly the coordinates for the four corner points (in fact it only stores one coordinate, namely the origin) of each cell corner as would be the case for rectangular vector polygons. Therefore, raster can process data faster and more efficient than geographic vector data processing.
library(terra)
## terra 1.7.39
##
## Attaching package: 'terra'
## The following object is masked from 'package:tidyr':
##
## extract
elev = rast(nrows = 6, ncols = 6, resolution = 0.5,
xmin = -1.5, xmax = 1.5, ymin = -1.5, ymax = 1.5,
vals = 1:36)
elev
## class : SpatRaster
## dimensions : 6, 6, 1 (nrow, ncol, nlyr)
## resolution : 0.5, 0.5 (x, y)
## extent : -1.5, 1.5, -1.5, 1.5 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84
## source(s) : memory
## name : lyr.1
## min value : 1
## max value : 36
plot(elev)
summary(elev)
## lyr.1
## Min. : 1.00
## 1st Qu.: 9.75
## Median :18.50
## Mean :18.50
## 3rd Qu.:27.25
## Max. :36.00
global(elev, sd) # global operation
## sd
## lyr.1 10.53565
hist(elev)
elev + elev
## class : SpatRaster
## dimensions : 6, 6, 1 (nrow, ncol, nlyr)
## resolution : 0.5, 0.5 (x, y)
## extent : -1.5, 1.5, -1.5, 1.5 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84
## source(s) : memory
## name : lyr.1
## min value : 2
## max value : 72
elev ^ 2
## class : SpatRaster
## dimensions : 6, 6, 1 (nrow, ncol, nlyr)
## resolution : 0.5, 0.5 (x, y)
## extent : -1.5, 1.5, -1.5, 1.5 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84
## source(s) : memory
## name : lyr.1
## min value : 1
## max value : 1296
elev < 20
## class : SpatRaster
## dimensions : 6, 6, 1 (nrow, ncol, nlyr)
## resolution : 0.5, 0.5 (x, y)
## extent : -1.5, 1.5, -1.5, 1.5 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84
## source(s) : memory
## name : lyr.1
## min value : FALSE
## max value : TRUE
plot(elev < 20)
# extract values
id = cellFromXY(elev, xy = matrix(c(0.1, 0.1), ncol = 2))
elev[id]
## lyr.1
## 1 16
terra::extract(elev, matrix(c(0.1, 0.1), ncol = 2))
## lyr.1
## 1 16
rmask = elev
values(rmask) = sample(c(NA, TRUE), 36, replace = TRUE)
plot(rmask)
plot(mask(elev, rmask))
# multiple layers
if(!require("spDataLarge"))
install.packages("spDataLarge",
repos = "https://geocompr.r-universe.dev")
## Loading required package: spDataLarge
##
## Attaching package: 'spDataLarge'
## The following object is masked _by_ '.GlobalEnv':
##
## random_points
multi_raster_file = system.file("raster/landsat.tif", package = "spDataLarge")
multi_rast = rast(multi_raster_file)
multi_rast
## class : SpatRaster
## dimensions : 1428, 1128, 4 (nrow, ncol, nlyr)
## resolution : 30, 30 (x, y)
## extent : 301905, 335745, 4111245, 4154085 (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 12N (EPSG:32612)
## source : landsat.tif
## names : landsat_1, landsat_2, landsat_3, landsat_4
## min values : 7550, 6404, 5678, 5252
## max values : 19071, 22051, 25780, 31961
nlyr(multi_rast)
## [1] 4
hist(multi_rast)
## Warning: [hist] a sample of62% of the cells was used
## Warning: [hist] a sample of62% of the cells was used
## Warning: [hist] a sample of62% of the cells was used
## Warning: [hist] a sample of62% of the cells was used
multi_rast[[2]]
## class : SpatRaster
## dimensions : 1428, 1128, 1 (nrow, ncol, nlyr)
## resolution : 30, 30 (x, y)
## extent : 301905, 335745, 4111245, 4154085 (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 12N (EPSG:32612)
## source : landsat.tif
## name : landsat_2
## min value : 6404
## max value : 22051
plot(subset(multi_rast, 2))
# focal operation
r_focal = focal(elev, w = matrix(1, nrow = 3, ncol = 3), fun = min)
# zonal operations
grain_order = c("clay", "silt", "sand")
grain_char = sample(grain_order, 36, replace = TRUE)
grain_fact = factor(grain_char, levels = grain_order)
grain = rast(nrows = 6, ncols = 6, resolution = 0.5,
xmin = -1.5, xmax = 1.5, ymin = -1.5, ymax = 1.5,
vals = grain_fact)
plot(grain)
z = zonal(elev, grain, fun = "mean")
z
## lyr.1 lyr.1.1
## 1 clay 17.71429
## 2 silt 12.36364
## 3 sand 22.55556
# aggregate & disaggregate & resampling
Most text and code above came from Geocomputation with R; credits for the authors.