vignettes/SOTM_workshop.Rmd
The workshop is aimed at everyone interested in using OpenStreetMap (OSM) to support sustainable transport planning, in professional or advocacy contexts. It demonstrates how to get started with using OSM data representing transport infrastructure for sustainable transport planning, research and policy-making. It covers how to identify, re-categorize, visualize, and analyze key tags that represent walking, cycling, and wheeling networks. We also discuss OSM data in the context of other data sources to identify its advantages and limitations.
The workshop is practical and covers the following questions:
To get the most out of the workshop, we recommend installing R, RStudio, and the required packages before the session. To do this, follow the “Installation” section below. To make sure that everything is installed correctly, run the code in the “Installation Check” section below. Check issue 98 on the udsleeds/openinfra repo on GitHub (you will need an account to comment) to see what you should get after running that code. Let us know if you do not get the same output :)
This workshop assumes no prior knowledge of R. If you are new to the language, however, we recommend taking a look at online resources like RStudio’s Beginners guide and its Gentle Introduction to Tidy Statistics in R.
In this workshop we’ll use R, a statistical programming language
widely used for data science, modelling and visualisation. If you are
new to R, you should install base R and RStudio before the workshop
begins.
Garrett Grolemund's Hands-On Programming with R includes a great R and
RStudio installation guide (Section A.1) for Windows, Mac, and Linux devices.
A number of packages will be used within this workshop, most notably
osmextract
and openinfra.
Here are some of the additional packages we will make use of within this workshop:
pkgs = c(
"tmap", # package for map making
"sf", # geographic vector data classes and functions
"dplyr", # data manipulation
"remotes" # for installing packages from GitHub
)
These can be installed with the following command:
install.packages(pkgs, repos = "http://cran.us.r-project.org", dependencies = TRUE)
You can install additional packages, including the development
versions of the osmextract
and openinfra
packages as follows:
install.packages("tidyverse")
install.packages("mapedit")
# Enter the lines below into your console to install!
remotes::install_github("ropensci/osmextract", force = TRUE)
remotes::install_github("udsleeds/openinfra")
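Before loading anything, it can help to confirm which packages actually installed. The following is a minimal sketch using only base R; the object names (`pkgs_needed`, `is_installed`, `missing_pkgs`) are illustrative, and the package list is assumed from the packages named in this section.

```r
# Packages this workshop relies on (assumed from the installation steps above)
pkgs_needed = c("tmap", "sf", "dplyr", "remotes", "osmextract", "openinfra")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
is_installed = vapply(pkgs_needed, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs installing
missing_pkgs = pkgs_needed[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All workshop packages are installed.")
}
```

If any names are reported, re-run the corresponding installation command before continuing.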
Each package can then be loaded one-by-one as follows (see footnote 1):
If installation has completed correctly, running the code below
locally (on your own device) should create two plots:
first, the infrastructure network within a 5 km circular buffer around
Leeds (UK) city centre and, second, a test plot of the area just outside
the Institute of Transport Studies (ITS) at the University of
Leeds.
# Pre-installed package data for central Leeds
data = openinfra::sotm_data
# Test osmextract file (ITS Leeds = Institute of Transport Studies, University of Leeds)
test_data = osmextract::oe_get(place = "ITS Leeds")
# Plot central Leeds
tm_shape(data |> dplyr::select("highway")) +
tm_lines(col = "highway", title.col = "OSM Highways") +
tm_layout( legend.bg.alpha = 0.5, legend.bg.color = "white")
# Plot ITS
tm_shape(test_data |> dplyr::select("highway")) +
tm_lines(col = "highway", title.col = "OSM Highways") +
tm_layout( legend.bg.alpha = 0.5, legend.bg.color = "white")
To check that your plots match, see issue 98 on the udsleeds/openinfra repo.
If you already have base R and RStudio
installed, you can perform the package installations and run the
installation check in one go by copying and pasting the code cell below into
RStudio and pressing Ctrl + Enter:
pkgs = c(
"tmap", # package for map making
"sf", # geographic vector data classes and functions
"dplyr", # data manipulation
"remotes", # for installing packages from GitHub
"tidyverse",
"mapedit"
)
install.packages(pkgs, repos = "http://cran.us.r-project.org")
remotes::install_github("ropensci/osmextract", force = TRUE)
remotes::install_github("udsleeds/openinfra")
library(tmap)
library(sf)
library(dplyr)
library(remotes)
library(osmextract)
library(openinfra)
# Pre-installed package data for central Leeds
data = openinfra::sotm_data
# Test osmextract file (ITS Leeds = Institute of Transport Studies, University of Leeds)
test_data = osmextract::oe_get(place = "ITS Leeds")
# Plot central Leeds
tm_shape(data |> dplyr::select("highway")) +
tm_lines(col = "highway", title.col = "OSM Highways") +
tm_layout( legend.bg.alpha = 0.5, legend.bg.color = "white")
# Plot ITS
tm_shape(test_data |> dplyr::select("highway")) +
tm_lines(col = "highway", title.col = "OSM Highways") +
tm_layout( legend.bg.alpha = 0.5, legend.bg.color = "white")
The osmextract
package is an open source package for downloading and importing large
OSM ‘extracts’ from providers who make the datasets available at
national and regional levels. We can view all providers available
through osmextract with the following:
osmextract::oe_providers()
osmextract
can currently import datasets from the
following providers: Geofabrik, BBBike, and openstreetmap_fr.
These provide OSM datasets in the .pbf file format, which is
optimized for small file sizes (see the OSM wiki for
details).
The openinfra package
contains several functions to re-categorize OSM infrastructure data from
the default key=value tags into categories that are easier to interpret and visualise.
tmap
is
a fantastic package for visualizing geospatial data either statically or
interactively.
The sf
package supports the representation of spatial vector data as simple
features which, consequently, makes spatial analysis much more
accessible. This package is the backbone of geographic vector data in R
and of the code in this workshop :)
A key output of the project is open access active travel data packs. These will be released in the Releases section of the udsleeds/openinfra repo: https://github.com/udsleeds/openinfra/releases
Let’s dive straight in and load some data:
u = "https://github.com/udsleeds/openinfra/releases/download/v0.2/datapack_leeds.geojson"
openinfra_data = sf::read_sf(u)
Task: plot and do basic analysis of this dataset.
u = "https://github.com/udsleeds/openinfra/releases/download/0.3/datapack_leeds.zip"
f = basename(u)
download.file(url = u, destfile = f)
In this section we will use osmextract
to download OSM
data that will be used throughout this workshop.
The osmextract
package comprises the following core
functions (also see a vignette
introducing osmextract
):
Function | Description |
---|---|
oe_providers() | Shows a data frame of currently supported OSM data providers |
oe_match() | Matches the input place query with a URL from one of the OSM providers |
oe_download() | Downloads data from OSM providers |
oe_vectortranslate() | Converts between .pbf (the providers' default) and .gpkg file formats |
oe_read() | Reads downloaded .pbf and .gpkg files into R |
oe_get() | Performs all of the above in a single function |
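To make the individual steps more concrete, here is a hedged sketch that runs the conversion and reading steps by hand on the small ITS Leeds example extract that ships with osmextract, so no download is needed. It assumes the bundled file is named its-example.osm.pbf; the object names (`its_pbf`, `its_gpkg`, `its_lines`) are illustrative.

```r
library(osmextract)

# Copy the bundled example extract to a temporary directory, so that the
# converted file is written somewhere writable
its_pbf = file.path(tempdir(), "its-example.osm.pbf")
file.copy(
  from = system.file("its-example.osm.pbf", package = "osmextract"),
  to = its_pbf,
  overwrite = TRUE
)

# Convert the "lines" layer from .pbf to .gpkg; returns the path of the .gpkg
its_gpkg = oe_vectortranslate(its_pbf, layer = "lines", quiet = TRUE)

# Read the converted file into R as an sf object
its_lines = oe_read(its_gpkg, quiet = TRUE)
names(its_lines)
```

In everyday use `oe_get()` wraps these steps, but running them separately can help when diagnosing where a problem occurs.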
To start off we will use the general-purpose oe_get()
function to
download data for Leeds, UK. It might take a while. If you struggle
to download the data or it takes too long to process, you can use the
example dataset for central Leeds (openinfra::sotm_data) instead.
This is a much smaller dataset, so computations are much
quicker.
# A useful function is `oe_match_pattern()` to search for patterns in a provider's database. It is helpful in indicating a correct region name for a given provider.
osmextract::oe_match_pattern("Leeds")
leeds = osmextract::oe_get(
place = "Leeds",
provider = "bbbike", # Indicates the provider; default is geofabrik
layer = "lines" # Default; returns linestring geometries (highways etc)
)
# have a look at the data
leeds |> dplyr::glimpse()
See the documentation associated with oe_get()
on the
package website or by entering the command ?oe_get
into the
R console.
In the previous example we downloaded OSM data, but it does not
necessarily contain all the information we need (unless you used the
example sotm_data). Indeed, it only has 10 columns and only the
highway
column is directly useful for active travel
infrastructure planning!
To get additional tags we need to specify this as an additional
argument before running oe_get()
.
# so let's define the extra tags we want
et = c("kerb", "width", "sidewalk", "cycleway", "footway", "lit", "wheelchair",
"maxspeed", "foot", "access", "service", "bicycle", "oneway")
leeds = osmextract::oe_get(
place = "Leeds",
provider = "bbbike",
layer = "lines",
extra_tags = et # non-default tags to download
)
# have a look at the data
leeds |> dplyr::glimpse()
# also it's useful to have a look at the downloaded data by plotting it
tmap::tm_shape(leeds |> dplyr::filter(!is.na(highway))) +
tmap::tm_lines(col = "highway")
The Leeds data actually represents the whole of West Yorkshire and not only Leeds as we requested. This is because BBBike only provides results for rectangular datasets. To refine this network further we can apply spatial subsetting.
You can subset the downloaded datasets using the mapview
and
mapedit
packages (which you need to have installed to
run this section; see footnote 2) to manually define an area and get its
coordinates for spatial subsetting.
map_leeds = mapview::mapview(openinfra::sotm_data) # create an object containing interactive map
map_leeds_edit = mapedit::editMap(map_leeds) # use `mapedit` package to draw a new polygon
box = sf::st_bbox(map_leeds_edit$drawn$geometry) # extract the coordinates of a new polygon
box
osmdata
(optional)
Another package for getting OSM data is osmdata, which
gets data from the Overpass API rather than from bulk .pbf files.
osmdata
is ideal for getting small amounts of data from
OSM, e.g. the boundary representing Leeds, as shown in the code chunk
below (you need to have the osmdata
package installed to
run this; see footnote 3).
leeds_boundary = osmdata::getbb("Leeds", format_out = "sf_polygon", limit = 1)
qtm(leeds_boundary)
After running the lines of code above you should see the following which is the boundary for Leeds:
Task: try to get the boundary for Florence (Firenze in Italian) using the same technique.
Let’s get everything within a 2.5 km radius of central Leeds, which
has coordinates of -1.549, 53.801:
You can change the size of the
circular buffer by altering the dist
parameter (in metres).
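A minimal sketch of how such a buffer could be created with sf is shown below. The name `leeds_buffer` matches the object used in the later `oe_get()` call; `coords` and `leeds_centre_point` are illustrative names. It assumes a recent version of sf with spherical geometry (s2) switched on, which is the default, so that `dist` is interpreted in metres for longitude/latitude data.

```r
library(sf)

# Longitude and latitude of central Leeds, as given above
coords = c(-1.549, 53.801)

# Turn the coordinates into a point geometry in the WGS84 coordinate system
leeds_centre_point = st_sfc(st_point(coords), crs = "EPSG:4326")

# Create a circular buffer with a 2.5 km radius around the point
leeds_buffer = st_buffer(leeds_centre_point, dist = 2500)

plot(leeds_buffer)
```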
Next we will subset Leeds city from the leeds
dataset.
Spatial subsetting is one of the essential operations in geocomputation,
so if you want to learn more about this, check out the “Spatial
subsetting” section in Geocomputation with R (Lovelace et al.,
2022).
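To see what spatial subsetting does, here is a minimal sketch on made-up geometries (the square, the three lines and all object names are illustrative and not part of the workshop data). It contrasts the default operator, `st_intersects`, with the stricter `st_within`.

```r
library(sf)

# A 10 x 10 square in British National Grid coordinates (EPSG:27700)
square = st_sfc(
  st_polygon(list(rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0)))),
  crs = 27700
)

# Three lines: fully inside, crossing the edge, and fully outside the square
toy_lines = st_sf(
  id = c("inside", "crossing", "outside"),
  geometry = st_sfc(
    st_linestring(rbind(c(2, 2), c(8, 8))),
    st_linestring(rbind(c(5, 5), c(15, 5))),
    st_linestring(rbind(c(20, 20), c(30, 30))),
    crs = 27700
  )
)

# Default operator (st_intersects): keeps anything that touches the square
touching = toy_lines[square, ]
touching$id

# st_within: keeps only features that lie completely inside the square
contained = toy_lines[square, op = st_within]
contained$id
```

Which operator is appropriate depends on whether features that cross the boundary should be kept.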
Now we have the coordinates of the area we are interested in. From here on
there are two ways of getting OSM data for that area. One is to use
oe_get()
again and give it the additional
boundary
and boundary_type
arguments.
leeds_defined = osmextract::oe_get(
place = "Leeds",
provider = "bbbike",
layer = "lines",
extra_tags = et,
boundary = leeds_buffer,
boundary_type = "clipsrc"
)
Alternatively, we can subset the leeds
data, so we do
not have to download data again.
leeds_defined2 = leeds[leeds_buffer, op = sf::st_within]
You can plot this as follows:
Task:
Note: not every city/town/region might be directly queried using osmextract.
As alluded to in the previous section, it is important to understand the data before using it for further (spatial) analysis or to inform decision-making.
In this section you will be introduced to data filtering based on OSM tags and how to visualise them.
Our data contains a lot of information, but not all of it might be needed. For example, active travel includes cycling, walking, and wheeling, but you may only be interested in cycling. In this case it would be reasonable to retain only the data that is related to cycling. In other words, we may not need data on waterways or motorways as one cannot cycle on them.
# filter out all the rows that do not contain highways
leeds_highways = leeds_defined2 |>
dplyr::filter(!is.na(highway))
Our data now only contains highways in Leeds, but we want to create a new dataset containing highway=cycleway only and then visualise them by plotting on a map.
# filter and select
leeds_cycleway = leeds_highways |>
dplyr::filter(highway == "cycleway") |> # filtering based on a key value
dplyr::select(highway)
# plot
leeds_cycleway |> plot()
Of course, highway=cycleway does not represent the entire cycling infrastructure in the given region. For this we need to apply more complex (conditional) filtering to our data.
For example, you may know that it is illegal to cycle on certain
roads, such as motorways in the UK. Also, it may be the case that a
cycleway is not mapped separately but, rather, has been tagged as part
of another highway, such as a tertiary road, using the bicycle=*
tag.
# To filter data based on these requirements we keep dedicated cycleways and any other highway tagged bicycle=yes:
leeds_cycle = leeds_highways |>
dplyr::filter(highway == "cycleway" |
bicycle %in% c('yes'))
# let's plot our new cycle network
leeds_cycle["geometry"] |> plot()
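The text above also mentions roads where cycling is not allowed, which a filter on highway=cycleway and bicycle=yes alone does not address. The following is a minimal sketch on a made-up data frame; the rows, the additional bicycle values and the exclusion rules are illustrative assumptions rather than a definitive definition of a cycle network.

```r
library(dplyr)

# Made-up ways with the two tags used in the filter above
ways = data.frame(
  osm_id  = 1:5,
  highway = c("cycleway", "tertiary", "motorway", "residential", "footway"),
  bicycle = c(NA, "yes", NA, "no", "designated")
)

cycle_ways = ways |>
  filter(
    # dedicated cycleways, or any way explicitly open to bicycles
    highway == "cycleway" | bicycle %in% c("yes", "designated", "permissive"),
    # never keep motorways
    !highway %in% c("motorway", "motorway_link"),
    # never keep ways where cycling is explicitly forbidden
    !bicycle %in% "no"
  )

cycle_ways$osm_id
```

Using `%in%` rather than `==` for the bicycle tag means that missing values are treated as FALSE instead of NA, which keeps the conditions easy to reason about.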
Task:
It’s not easy to define a network and, once conceptualised, a lack of data might still lead to an inaccurate representation, thus limiting further analysis and application.
Nevertheless, it’s still useful to have a network defined. For
instance, osmextract
does offer an opportunity to download
pre-defined walking, cycling, or driving networks. Yet, if you choose to
do this, do check the code used to create the network to ensure that you
understand how the network has been defined and that it works for your
needs.
# to download cycling network as defined by `osmextract`
osme_cycle = osmextract::oe_get_network(place = "Leeds",
mode = "cycling")
# plot cycling network
osme_cycle["geometry"] |>
plot()
In this workshop we’ve been visualising our data to improve our
understanding. Up to this point we’ve been visualising data statically
using the base R plot()
function. However, the powerful
tmap
package can be used to create both static and
interactive maps to communicate your findings.
# equivalent to the earlier plotted map
# Note: it might be useful to store your plot as a tmap object that can be called at any point.
osme_cycle_plot = tm_shape(osme_cycle["geometry"])+
tm_lines()
To plot an interactive map you have to specify this using
the tmap_mode()
function.
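A minimal sketch of switching between modes is shown below. It uses a made-up line near central Leeds instead of the workshop data, so `toy_line` and `toy_map` are illustrative names only.

```r
library(sf)
library(tmap)

# A toy line standing in for a cycle network
toy_line = st_sf(
  name = "toy route",
  geometry = st_sfc(
    st_linestring(rbind(c(-1.549, 53.801), c(-1.540, 53.805))),
    crs = 4326
  )
)

# Store the map as an object so it can be drawn in either mode
toy_map = tm_shape(toy_line) + tm_lines()

tmap_mode("view")  # maps printed from now on are interactive
toy_map

tmap_mode("plot")  # switch back to static maps
toy_map
```

The same stored object is drawn differently depending on the active mode, which is why it can be convenient to save maps as objects.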
Task:
tmap
object of a static map. Tip: run
?tmap_mode()
to access the function’s documentation.
tmap_save()
function.
In this section we are going to explore the openinfra package and some of the functions it currently contains.
The openinfra
package contains a suite of functions that
recategorise OSM data to more meaningful variables for transport
planners and policy makers. The aim of the openinfra
package is to add value to OSM data so that this may act as better
evidence to support transport planners and policy makers.
One of the functions that the package has is
oi_recode_road_class()
. It recategorises OSM highways based
on Chan and Cooper’s (2019) work, in which they propose that road
class can be used to infer motor traffic volume. High motor traffic volume is
an important factor in discouraging cycling.
leeds_high_rec = leeds_defined2 |>
openinfra::oi_recode_road_class()
Another important consideration in the context of cycling is the maxspeed of motor vehicles, as lower motor vehicle speeds are associated with fewer driver-cyclist road crashes.
leeds_speed_rec = leeds_defined2 |>
openinfra::oi_clean_maxspeed_uk()
We know that in the UK it is recommended to have some kind of segregation (e.g. lanes, tracks) where motor vehicles are allowed to exceed 30 mph. Therefore, we might want to find out whether the existing cycling infrastructure complies with this requirement on roads that have a maxspeed of 30 mph and over.
# Note: this is a simplification of the example above.
leeds_seg = leeds_speed_rec |> dplyr::filter((oi_maxspeed != "20 mph" &
cycleway %in% c("lane", "track", "left")) |
highway == "cycleway")
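In R, `&` binds more tightly than `|`, so a condition of the form `A & B | C` is evaluated as `(A & B) | C`. A small base R sketch with made-up values (all vectors below are illustrative) shows how the grouping changes which rows are kept:

```r
# Made-up attribute values for four ways
speed    = c("20 mph", "30 mph", "20 mph", "40 mph")
cycleway = c("lane", "lane", NA, NA)
highway  = c("residential", "primary", "cycleway", "primary")

# Without parentheses, & is evaluated before |
implicit = speed != "20 mph" & cycleway %in% c("lane", "track") |
  highway == "cycleway"

# The same condition with the grouping made explicit
explicit = (speed != "20 mph" & cycleway %in% c("lane", "track")) |
  highway == "cycleway"

# A different grouping: the speed condition now applies to every row
alternative = speed != "20 mph" &
  (cycleway %in% c("lane", "track") | highway == "cycleway")

identical(implicit, explicit)
data.frame(speed, cycleway, highway, explicit, alternative)
```

The third way, a 20 mph cycleway, is kept by the first grouping but dropped by the alternative one, so it is worth writing the parentheses that match the intended question.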
Here we have shown outputs from two of the openinfra
functions, oi_clean_maxspeed_uk()
and
oi_recode_road_class()
. The other package functions and their
respective documentation can be found in the package documentation.
Task: Modify the above example, apply to your data, and then plot it. What does it say about the existing cycling infrastructure?
The transport infrastructure data packs are created from recategorised OSM data. The OSM data has been recategorised by the OpenInfra functions.
Following this workshop, you should be able to create such infrastructure data packs for your desired area.
Here we will visualise the outputs from several
openinfra
functions.
This function recategorises OSM infrastructure data from the default
highway=*
values to one of the official UK road
classifications.
With such information, one may wish to infer
traffic flow volumes as a function of road classification.
The oi_active_cycle
and oi_active_walk
functions recategorise OSM infrastructure data based on whether or not
the feature in question forms part of a usable transport network for
cyclists or pedestrians.
oi_active_cycle
oi_active_walk
This function recategorises default OSM maxspeed
values
to be compliant with the current official UK speed limits.
This function recategorises default OSM lit
values,
which indicate the type and presence of lighting, to either:
Note that “maybe” does not indicate the lack of lighting, instead,
this indicates that there is a lack of data to distinguish between “yes”
or “no”.
Clearly, in built up urban areas it is likely that there
will be lighting. However, in more rural areas and the countryside this
is more likely to be “no”.
Such information may be valuable if you
would like to establish a safe active travel network that is suitable to
use after dark.
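To illustrate the idea of collapsing raw lit values into “yes”, “no” and “maybe”, here is a rough sketch using dplyr on made-up data. The mapping of raw values to categories is an assumption for illustration only and is not the rule set implemented in openinfra; `ways`, `ways_lit` and `lit_simple` are hypothetical names.

```r
library(dplyr)

# Made-up lit values for illustration
ways = data.frame(
  osm_id = 1:6,
  lit = c("yes", "no", NA, "24/7", "disused", "unknown")
)

ways_lit = ways |>
  mutate(
    lit_simple = case_when(
      lit %in% c("yes", "24/7", "automatic") ~ "yes",   # assumed to mean lit
      lit %in% c("no", "disused") ~ "no",               # assumed to mean unlit
      TRUE ~ "maybe"                                    # missing or unrecognised
    )
  )

ways_lit
```

Keeping a separate “maybe” category makes it explicit that missing data is not the same as a lack of lighting.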
The purpose of this workshop was to get you started. The approach, reading in large datasets into R and modifying/visualising them to support transport planning, is flexible so there are many possible future directions of travel. Below are some suggestions, ranging from easy to hard.
abstr
R package to demonstrate R can be used to generate
data for dynamic agent-based travel simulation.
1. You can also load all packages at once with:
pkgs = c(pkgs, "osmextract", "openinfra")
lapply(pkgs, library, character.only = TRUE)[length(pkgs)]
2. These can be installed with:
install.packages(c("mapview", "mapedit"))
3. This can be installed with:
install.packages("osmdata")