The workshop is aimed at everyone interested in using OpenStreetMap (OSM) to support sustainable transport planning, in professional or advocacy contexts. It demonstrates how to get started with using OSM data representing transport infrastructure for sustainable transport planning, research and policy-making. It covers how to identify, re-categorize, visualize, and analyze key tags that represent walking, cycling, and wheeling networks. We also discuss OSM data in the context of other data sources to identify its advantages and limitations.

The workshop is practical and covers the following questions:

  1. What gaps in transport evidence can OSM data fill in different countries?
  2. Which elements and tags are most important for transport planning (and how to get the data in reproducible data science environments)?
  3. How to filter based on different tags and their values?
  4. How to use OSM datasets to visualize transport systems in static and interactive maps?
  5. How to add value to OSM data and communicate results using R, Python and A/B Street, and osm2streets?

To get the most out of the workshop, we recommend installing R, RStudio, and the required packages before the session. To do this, follow “Installation” section below. To make sure that everything is installed correctly, run the code in the “Installation Check” section below. Check issue 98 on the udsleeds/openinfra repo on GitHub (you will need an account to comment) to see what you should get after running that code. Let us know if you have not :)

This workshop assumes no any prior knowledge of R. We recommend taking a look at online resources like RStudio’s Beginners guide and its Gentle Introduction to Tidy Statistics in R if you’re new to the language, however.

Preparation

Installing R and RStudio

In this workshop we’ll use R, a statistical programming language widely used for data science, modelling and visualisation. If you are new to R, you should install base R and RStudio before the workshop begins.
There is a great R and RStudio installation guide (Section A.1) by Garrett Grolemund, within Hands-On Programming with R for Windows, Mac, and Linux devices.

Installing packages

A number of packages will be used within this workshop, though namely osmextract and openinfra.

Here are some of the additional packages we will make use of within this workshop:

pkgs = c(
  "tmap",    # package for map making
  "sf",      # geographic vector data classes and functions
  "dplyr",   # data manipulation
  "remotes"  # for installing packages from GitHub
  )

These can be installed with the following command:

install.packages(pkgs, repos = "http://cran.us.r-project.org", dependencies = TRUE)

You can install additional packages, including the development versions of the osmextract and openinfra packages as follows:

install.packages("tidyverse")
install.packages("mapedit")
# Enter the lines below into your console to install!
remotes::install_github("ropensci/osmextract", force = T)
remotes::install_github("udsleeds/openinfra")

Each package can then be loaded one-by-one as follows:1

Installation check

If installation has occurred correctly, running the code below locally (on your own device) should successfully create two plots:
One 5km radius circular buffer of the infrastructure network around Leeds (UK) City Centre, and
Secondly, a test plot just outside of the Institute of Transport Studies (ITS), at the University of Leeds.

# Pre-installed package data for central Leeds
data = openinfra::sotm_data
# Test osmextract file (ITS Leeds = Institute of Transport Studies, University of Leeds)
test_data = osmextract::oe_get(place = "ITS Leeds")

# Plot central Leeds
tm_shape(data |> dplyr::select("highway")) + 
  tm_lines(col = "highway", title.col = "OSM Highways") + 
  tm_layout( legend.bg.alpha = 0.5, legend.bg.color = "white")

# Plot ITS
tm_shape(test_data |> dplyr::select("highway")) + 
  tm_lines(col = "highway", title.col = "OSM Highways") + 
  tm_layout( legend.bg.alpha = 0.5, legend.bg.color = "white")

To check that your plots match, check issue 98

Installing packages and install check in one

If you already have base r & Rstudio installed, you can perform the package installations and run the installation check in one by copy and pasting the code cell below into your rstudio and hitting crtl + enter:

pkgs = c(
  "tmap",    # package for map making
  "sf",      # geographic vector data classes and functions
  "dplyr",   # data manipulation
  "remotes", # for installing packages from GitHub
  "tidyverse",
  "mapedit"
)

install.packages(pkgs, repos = "http://cran.us.r-project.org")

remotes::install_github("ropensci/osmextract", force = T)
remotes::install_github("udsleeds/openinfra")

library(tmap)
library(sf)
library(dplyr)
library(remotes)
library(osmextract)
library(openinfra)

# Pre-installed package data for central Leeds
data = openinfra::sotm_data
# Test osmextract file (ITS Leeds = Institute of Transport Studies, University of Leeds)
test_data = osmextract::oe_get(place = "ITS Leeds")

# Plot central Leeds
tm_shape(data |> dplyr::select("highway")) + 
  tm_lines(col = "highway", title.col = "OSM Highways") + 
  tm_layout( legend.bg.alpha = 0.5, legend.bg.color = "white")

# Plot ITS
tm_shape(test_data |> dplyr::select("highway")) + 
  tm_lines(col = "highway", title.col = "OSM Highways") + 
  tm_layout( legend.bg.alpha = 0.5, legend.bg.color = "white")

Why these packages?

The osmextract package is an open source package for downloading and importing large OSM ‘extracts’ from providers who make the datasets available at national and regional levels. We can view all osmextract available providers with the following:

osmextract::oe_providers() 


osmextract can currently import datasets from the following providers: Geofabrik, BBBike, and openstreetmap_fr. These provide OSM datasets in the pbf file format .pbf, which is optimized for small file sizes (see the OSM wiki for details).

The openinfra package, contains several functions to re-categorize OSM infrastructure data from the default key=tag values to friendlier visuals.

tmap is a fantastic package for visualizing geospatial data either statically or interactively.

The sf package supports the representation of spatial vector data as simple features which, consequentially, makes spatial analysis much more accessible. This package is a backbone of geographic vector data in R and the code in this workshop :)

OpenInfra data packs

A key output of the project is open access active travel data packs. These will be released in the Releases section of the udsleeds/openinfra repo: https://github.com/udsleeds/openinfra/releases

Let’s dive straight in and load some data:

u = "https://github.com/udsleeds/openinfra/releases/download/v0.2/datapack_leeds.geojson"
openinfra_data = sf::read_sf(u)

Task: plot and do basic analysis of this dataset.

u = "https://github.com/udsleeds/openinfra/releases/download/0.3/datapack_leeds.zip"
f = basename(u)
download.file(url = u, destfile = f)

Importing OSM data

In this section we will use osmextract to download OSM data that will be used throughout this workshop.

The osmextract package comprises the following core functions (also see a vignette introducing osmextract):

Function Description
oe_providers() Shows a data frame of currently supported OSM data providers
oe_match() Matches the input place query with a url from one of the OSM providers
oe_download() Downloads data from OSM providers
oe_vectortranslate() Converts between .pbf (providers default) and .gpkg file formats
oe_read() Reads downloaded .pbf and .gpk files into R
oe_get() Performs all of the above in a single function

To start off we will use a generic oe_get() function to download data for Leeds, UK. It might take a while. If you struggle downloading the data or it takes long to compute, then you might use the example dataset (e.g. openinfra::sotm_data) for central Leeds. This is a much smaller dataset, thus making computing much quicker.

# A useful function is `oe_match_pattern()` to search for patterns in a provider's database. It is helpful in indicating a correct region name for a given provider.
osmextract::oe_match_pattern("Leeds") 

leeds = osmextract::oe_get(
  place = "Leeds", 
  provider = "bbbike", # Indicates the provider; default is geofabrik
  layer = "lines" # Default; returns linestring geometries (highways etc) 
)

# have a look at the data
leeds |> dplyr::glimpse()

See the documentation associated with oe_get() on the package website or by entering the command ?oe_get into the R console.

Tags

In the previous example we’ve downloaded OSM data but it does not necessarily contain all the information we need (unless you used the example sotm_data). Indeed, it only has 10 columns and only highway column is directly useful for active travel infrastructure planning!

To get additional tags we need to specify this as an additional argument before running oe_get().

# so let's define the extra tags we want

et = c("kerb", "width", "sidewalk", "cycleway", "footway", "lit", "wheelchair",
       "maxspeed", "foot", "access", "service", "bicycle", "oneway")

leeds = osmextract::oe_get(
  place = "Leeds", 
  provider = "bbbike",
  layer = "lines",
  extra_tags = et # non-default tags to download
)

# have a look at the data
leeds |> dplyr::glimpse()

# also it's useful to have a look at the downloaded data by plotting it
tmap::tm_shape(leeds |> dplyr::filter(!is.na("highway")))+
  tmap::tm_lines(col = "highway")

Strangely (or not!) our Leeds data actually represents the whole of West Yorkshire and not only Leeds as we requested.

# also it's useful to have a look at the downloaded data by plotting it 
tm_shape(leeds |> dplyr::filter(!is.na("highway")))+
  tm_lines(col = "highway" )

The Leeds data actually represents the whole of West Yorkshire and not only Leeds as we requested. This is because BBBike only provides results for rectangular datasets. To refine this network further we can apply spatial subsetting.

Defining your own area (optional)

You can subset the downloaded datasets using mapview and mapedit packages (which you need to have installed if you run this section)2 to manually define an area, get its coordinates to perform spatial subsetting.

map_leeds = mapview::mapview(sotm_data) # create an object containing interactive map  
map_leeds_edit = mapedit::editMap(map_leeds) # use `mapedit` package to draw a new polygon
box = sf::st_bbox(map_leeds_edit$drawn$geometry) # extract the coordinates of a new polygon
box

Getting data with osmdata (optional)

Another package for getting OSM data is osmdata, which gets data from the Overpass API rather than from bulk .pbf files. osmdata is ideal for getting small amounts of data from OSM, e.g. the boundary representing Leeds, as shown in the code chunk below (you need to have the osmdata package installed to run this)3.

leeds_boundary = osmdata::getbb("Leeds", format_out = "sf_polygon", limit = 1)
qtm(leeds_boundary)

After running the lines of code above you should see the following which is the boundary for Leeds:

Task: try get the boundary for Florence (Firenze in Italian) using the same technique.

Getting a boundary with a buffer

Let’s get everything within a 2.5 km radius of central Leeds, which has coordinates of -1.549, 53.801:
You can change the size of the circular buffer by altering the dist parameter (metres)

leeds_centre_point = sf::st_sfc(sf::st_point(c(-1.549, 53.801)), crs = "EPSG:4326")
leeds_buffer = sf::st_buffer(leeds_centre_point, dist = 2500)

Spatial subsetting

Next we will subset Leeds city from leeds dataset. Spatial subsetting is one of the essential operations in geocomputation, so if you want to learn more about this, check out “Spatial subsetting” section in the Geocomputation with R (Lovelace et al., 2022).

Now we have coordinates of an area we are interested in. From here on there are two ways for getting OSM data for that. One is to use oe_get() again and give it an additional boundary and boundary_type arguments.

leeds_defined = osmextract::oe_get(
  place = "Leeds", 
  provider = "bbbike", 
  layer = "lines", 
  extra_tags = et,
  boundary = leeds_buffer,
  boundary_type = "clipsrc"
)

Alternatively, we can subset the leeds data, so we do not have to download data again.

leeds_defined2 = leeds[leeds_buffer, op = st_within] 

You can plot this as follows:

tm_shape(leeds_defined2) +
  tm_lines(col = "black")

Task:

  1. Decide which tags you would like to download (be selective!)
  2. Download OSM data for a place of your choice by modifying the code above.
  3. Bonus: subset your dataset and plot it.

Note: not every city/town/region might be directly queried using osmextract.

Data exploration

As alluded in the previous section, it is important to understand the data before using it for further (spatial) analysis or inform decision-making.

In this section you will be introduced to data filtering based on OSM tags and how to visualise them.

Data filtering

Our data contains a lot of information, but not all of it might be needed. For example, active travel includes cycling, walking, and wheeling, but you may only be interested in cycling. In this case it would be reasonable to contain only data that is related to cycling. In other words, we may not need data on waterways or motorways as one cannot cycle on them.

# filter out all the rows that do not contain highways
leeds_highways = leeds_defined2 |> 
  dplyr::filter(!is.na(highway))

Our data now only contains highways in Leeds, but we want to create a new dataset containing highway=cycleway only and then visualise them by plotting on a map.

# filter and select
leeds_cycleway = leeds_highways |> 
  dplyr::filter(highway == "cycleway") |> # filtering based on a key value
  dplyr::select(highway)

# plot
leeds_cycleway |> plot()

Of course, highway=cycleway does not represent an entire cycling infrastructure in the given region. For this we need to apply more complex (conditional) filtering to our data.

For example, you may know that it is illegal to cycle on certain roads, such as motorways in the UK. Also, it may be a case that a cycleway is not mapped separately but, rather, has been tagged as part of another highway, such as tertiary road, using bicycle=* tag.

# To filter data based on these requirements we will:

leeds_cycle = leeds_highways |> 
  dplyr::filter(highway == "cycleway" |
                bicycle %in% c('yes'))  

# let's plot our new cycle network
leeds_cycle["geometry"] |> plot()

Task:

  1. Think what other tags and/or their values might be useful to define cycling network. Use them to further refine the network and plot it.
  2. Do you think your new network accurately represents the existing network?
  3. What do you think are potential challenges in defining a network? (eg. missing data)
  4. Bonus: try to define a pedestrian network.

Pre-defined networks

It’s not easy to define a network and, once conceptualised, a lack of data might still lead to innacurate representation, thus limiting further analysis and application.

Nevertheless, it’s still useful to have a network defined. For instance, osmextract does offer an opportunity to download pre-defined walking, cycling, or driving networks. Yet, if you choose to do this, do check the code used to create the network to ensure that you understand how the network has been defined and that it works for your needs.

# to download cycling network as defined by `osmextract`
osme_cycle = osmextract::oe_get_network(place = "Leeds",
                                        mode = "cycling")
# plot cycling network
osme_cycle["geometry"] |>
  plot()

Visualisation

In this workshop we’ve been visualizing our data to improve our understanding. Up to this point we’ve been visualising data statically using base R plot() function. However, there is a powerful tmap library that can be used to create static and interactive plots to communicate your findings.

# equivalent to the earlier plotted map
# Note: it might be useful to store your plot as a tmap object that can be called at any point.

osme_cycle_plot = tm_shape(osme_cycle["geometry"])+
  tm_lines()

To plot an interactive map you have to specify this using tmap_mode() function.

tmap_mode("view") # now all the plots will be interactive
tm_shape(osme_cycle)+
  tm_lines("highway")

Task:

  1. Plot an interactive map of a network you defined in the previous task.
  2. Create a tmap object of a static map. Tip: run ?tmap_mode() to access function’s documentation.
  3. Bonus: Save your new tmap object. Tip: explore tmap_save() function.

OpenInfra

In this section we are going to explore OpenInfra package and some of the functions it currently contains.

The openinfra package contains a suite of functions that recategorise OSM data to more meaningful variables for transport planners and policy makers. The aim of the openinfra package is to add value to OSM data so that this may act as better evidence to support transport planners and policy makers.

One of the functions that the package has is oi_recode_road_class(). It recategorises OSM highways based on the Chan and Cooper’s (2019) work in which they propose that road class can be used to infer motor traffic volume. Road traffic volume is an important factor in discouraging cycling.

leeds_high_rec = leeds_defined2 |> 
  openinfra::oi_recode_road_class()
tm_shape(leeds_high_rec)+
  tm_lines("oi_road_desc")

Another important consideration in the context of cycling is the maxspeed of motor vehicles as lower motor vehicle speed is associated with lower driver-cyclist road crashes.

leeds_speed_rec = leeds_defined2 |> 
  openinfra::oi_clean_maxspeed_uk()
tm_shape(leeds_speed_rec)+
  tm_lines("oi_maxspeed")

We know that in the UK it recommended to have some kind of segregation (eg. lines, tracks) if maxspeed of a motor vehicle is allowed to exceed 30mph. Therefore, we might want to find out if the existing cycling infrastructure complies with the requirements for roads that have maxspeed of 30mph and over.

# Note: that this is simplification of an above example and does not take into consideration .
leeds_seg = leeds_speed_rec |> dplyr::filter(oi_maxspeed != "20 mph" &
                                               cycleway %in% c("lane", "track", "left") |
                                               highway == "cycleway")
tm_shape(leeds_seg["geometry"]) + 
  tm_lines()

Here we have shown outputs from two of the openinfra functions, oi_clean_maxspeed_uk and oi_recode_road_class. The other package functions and their respective documentation can be found here

Task: Modify the above example, apply to your data, and then plot it. What does it say about the existing cycling infrastructure?

Transport Infrastructure Data Packs

The transport infrastructure data packs are created from recategorised OSM data. The OSM data has been recategorised by the OpenInfra functions.

Following this workshop, you should be able to create such infrastructure data packs for your desired area.

Here we will visualise the outputs from several openinfra fucntions.

oi_recode_road_class()

This function recategorises OSM infrastructure data from the default highway=* values to one of the official UK road classifications.
With such information, one may wish to infer traffic flow volumes as a function of road classification.

oi_active_cycle() and oi_active_walk()

The oi_active_cycle and oi_active_walk functions recategorise OSM infrastructure data based on whether or not the feature in question forms part of a usable transport network for cyclists or pedestrians.

oi_active_cycle

oi_active_walk

oi_clean_maxspeed_uk()

This function recategorises default OSM maxspeed values to be compliant with the current official UK speed limits.

oi_is_lit()

This functions recategorises default OSM lit values, values that indicate the type and presence of lighting, to either:

  • “yes”,
  • “no”,
  • or “maybe”

Note that “maybe” does not indicate the lack of lighting, instead, this indicates that there is a lack of data to distinguish between “yes” or “no”.
Clearly, in built up urban areas it is likely that there will be lighting. However, in more rural areas and the countryside this is more likely to be “no”.
Such information may be valuable if you would like to establish a safe active travel network that is suitable to use after dark.

Next steps

The purpose of this workshop was to get you started. The approach, reading in large datasets into R and modifying/visualising them to support transport planning, is flexible so there are many possible future directions of travel. Below are some suggestions, ranging from easy to hard.

  1. Difficult: Run the code in the “Visualising cycling potential with A/B Street” vignette from the abstr R package to demonstrate R can be used to generate data for dynamic agent-based travel simulation.

  1. You can also load all packages at once with:

    pkgs = c(pkgs, "osmextract", "openinfra")
    lapply(pkgs, library, character.only = TRUE)[length(pkgs)]

    ↩︎
  2. These can be installed with:

    install.packages(c("mapview", "mapedit"))

    ↩︎
  3. This can be installed with:

    install.packages("osmdata")

    ↩︎