A reproducible notebook to acquire, process and analyse satellite imagery: Exploring long-term urban changes

Meixu Chen $^{1}$, Dominik Fahrner $^{2,3}$, Daniel Arribas-Bel$^{1}$, Francisco Rowe $^{1}$

$^{1}$ Geographic Data Science Lab, Department of Geography and Planning, University of Liverpool, Roxby Building, 74 Bedford St S, Liverpool, L69 7ZT, United Kingdom

$^{2}$ Department of Geography annd Planning, School of Environmental Sciences, University of Liverpool, Liverpool, L69 7ZT, United Kingdom

$^{3}$ Institute for Risk and Uncertainty, University of Liverpool, Chadwick Building, Peach Street, Liverpool, L7 7BD, United Kingdom


Satellite imagery is often used to study and monitor changes in natural environments and the Earth surface. The open availability and extensive temporal coverage of Landsat imagery has enabled to monitor changes in temperature, wind, vegetation and ice melting speed for a period of up to 46 years. Yet, the use of satellite imagery to study cities has remained underutilised in Regional Science, partly due to the lack of a practical methodological approach to capture data, extract relevant features and monitor changes in the urban environment. This notebook offers a framework to demonstrate how to batch-download high-resolution satellite imagery; and enable the extraction, analysis and visualisation of features of the built environment to capture long-term urban changes.

Keywords: satellite imagery, image segmentation, urbanisation, cities, urban change, computational notebooks


Sustainable urban habitats are a key component of many global challenges. Efficient management and planning of cities are pivotal to all 17 UN Sustainable Development Goals (SDGs). Over 90% of the projected urban population growth by 2050 will occur in less developed countries (UN, 2019). Concentrated in cities, this growth offers an opportunity for social progress and economic development but it also imposes major challenges for urban planning. Prior work on urbanisation has identified the benefits of agglomeration and improvements in health and education, which tend to outweigh the costs of congestion, pollution and poverty (Glaeser and Henderson, 2017). Yet research has remained largely focused on Western cities (e.g. Burchfield et al., 2006), developing a good understanding of urban areas in high-income, developed countries (Glaeser and Henderson, 2017). Much less is known about the long-term evolution of urban habitats in less developed countries. Analysis of historical census data exist exploring changes at discrete points over time such as slum detection (e.g. Giada et al., 2003; Kit and Lüdeke, 2013; Kohli, Sliuzas and Stein, 2016 ) . Less applications can be identified tracking changes in urban settings over a continuous temporal scale (Ibrahim et al. 2020).This gap is partly due to the lack of comprehensive and consistent data sources capturing the long-term dynamics of urban structures in less developed countries.

Cities in Asia provide a unique setting to explore the challenges triggered by rapid urbanisation. The share of urban population in Asia is currently at a turning point transitioning to exceed the share of rural population. Currently Asia is home to over 53% of the urban population globally and the share of urban population is projected to increase to 66% by 2050 (UN, 2019). Developing tools to monitor and understand the past and current urbanisation process is key to guide appropriate urban planning and policy strategies.

Recent technological developments can help overcome the paucity in spatially-detailed urban data in less developed countries. The combination of geospatial technology, cheap computing and new machine learning algorithms has ushered in an age of new forms of data, producing brand new data sets and repurposing existing sources. Satellite imagery represents a key source of information. Photographs from the sky have existed for decades, but their use in the context of socioeconomic urban research has been limited. Image data has been hard to process and understand for social scientists. Yet recent developments in machine learning and artificial intelligence have made images computable and turned these data into brand new information to be explored by quantitative urban researchers. Further, satellite data has become more abundant and openly accessible in the past decade, and offers new possibilities for data exploration through increasing spatial and temporal resolution. This, together with more computational power being available, allows to process these data in an efficient and meaningful way.

This notebook illustrates an easy-to-use analytical framework based on Python tools which enables batch download, image feature extraction, analysis and visualisation of high-resolution satellite imagery to capture long-term urban changes. Our purpose is to fill in the absence of a systematic and reproducible framework to acquire, process and analyse satellite imagery in urban built environment related to the field of Regional Science. The source of satellite data and administrative boundaries data are from NASA's Landsat satellite programme and ArcGIS Online. The Python libraries used in this notebook are the following:

  • Landsat images in Google Cloud Storage: The Google Cloud Storage is accessed using an API to download Landsat imagery (version used: 0.4.9)
  • Matplotolib:A Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
  • Numpy: Adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions
  • Pandas: Provides high-performance, easy-to-use data structures and data analysis tools
  • GeoPandas: Python library that simplifies working with geospatial data (version used: 0.6.2)
  • Folium: Python library that enables plotting interactive maps using leaflet (verision used: 0.10.0)
  • Glob: Unix style pathname pattern expansion
  • GDAL: Library for geospatial data processing (version used: 2.4.4)
  • Landsat578: Simple Landsat imagery download tool
  • L8qa: Landsat processing toolbox (verision used: 0.1.1)
  • Rasterio: Library for raster data processing (verision used: 1.1.3)
  • Scikit-image: Collection of algorithms for image processing
  • Wget: Pure python download utility (version used: 3.2)
  • OpenCV: Library for image processing
  • scikit-learn: Machine learning in Python. Simple and efficient tools for data mining and data analysis.

We can import them al as follows:

In [75]:
%matplotlib inline

#load external libraries
import matplotlib.pyplot as plt
from matplotlib import colors
import pandas as pd
import numpy as np
import geopandas as gpd
import folium
import os, shutil
import glob
import gdal
import wget
from landsat import google_download
from google_download import GoogleDownload
from l8qa.qa import write_cloud_mask
import rasterio
import rasterio as rio
from rasterio import merge 
from rasterio.plot import show
from rasterio.mask import mask
from skimage import io,exposure, transform,data
from skimage.color import rgb2hsv, rgb2gray
from skimage.feature import local_binary_pattern
from sklearn.cluster import KMeans
import matplotlib.cm as cm
from sklearn import preprocessing
from rasterio.enums import Resampling
import seaborn as sns
import itertools

wdir= os.getcwd()

The remainder of this paper is structured as follows. The next section introduces the Landsat satellite imagery, study area Shanghai, and process on how to batch download and pre-process satellite data. Section 3 proposes our methods to extract different features including colour, texture, vegetation and built-up from imagery. Section 4 performs a clustering method on the extracted features, and section 5 interprets the results and gain insights from them. Finally, section 6 concludes by providing a summary of our work and avenues for further research using our proposed framework.

Data and Study Area

Landsat Imagery

We draw data from the NASA's Landsat satellite programme. It is the longest standing programme for Earth observation (EO) imagery (NASA, 2019). Landsat satellites have been orbiting the Earth for 46 years providing increasingly higher resolution imagery. Landsat Missions 1-3 offer coarse imagery of 80m covering the period from 1972 to 1983. Landsat Missions 4-5 provides images of 30m resolution covering the period from 1983 to 2013 and Landsat Missions 7-8 are currently collecting enhanced images at 15m capturing Cirrus and Panchromatic bands, in addition to the traditional RGB, Near-, Shortwave-Infrared, and Thermal bands. The Landsat 6 mission was unsuccessful due to the transporting rocket not reaching orbit. Landsat imagery is openly available and offers extensive temporal coverage stretching for 46 years. Table 1 provides a summary overview of the operation, revisit time and image resolution for the Landsat programme, with other Earth observation satellite missions being shown in table 2.

Table 1: Overview of Landsat missions, their revisit time and spatial resolution

Mission Operational time Revisit time Resolution
Landsat 1 1972-1978 18 d 80 m
Landsat 2 1975-1982 18 d 80 m
Landsat 3 1978-1983 18 d 80 m
Landsat 4 1983-1993 16 d 30 m
Landsat 5 1984-2013 16 d 30 m
Landsat 7 1999-present 16 d 15 m
Landsat 8 2013-present 16 d 15 m

Additional Earth observation programmes exist. These programmes also offer freely accessible imagery at a higher resolution.

Table 2: Overview of other Earth observation satellites, their revisit time and spatial resolution

Provider Programme Operational time Revisit time Resolution
European Space Agency Sentinel 2015-present 5 d 10m
Planet Labs Rapideye
2009-present 4/5 d to daily up to 0.8 m
NASA Orbview 3 2003-2007 < 3 d 1-4 m
NASA EO-1 2003 -2017 -- 10-30 m

Study Area

In this analysis, we examine urban changes in Shanghai, China. Shanghai has experienced rapid population growth. Between 2000 and 2010, Shanghai’s population rose by 7.4 million from 16.4 million to 23.8 million. It has an annual growth rate of 3.8 percent over 10 years. While the pace of population expansion has been less acute, Shanghai’s population has continued to grow. In 2018, an estimated 24.24 million people were living in Shanghai experiencing a population expansion of approximately 8 million since 2010. The city is therefore a well suited example to explore long-term changes in urbanisation.

To extract satellite imagery, a first step is to identify the shape of the geographical area of interest. To this end, we use a polygon shapefile (Shapefile source). These polygons represent the Shanghai metropolitan area, so they include the city centre and surrounding areas. These polygons will be used as a bounding box to identify and extract relevant satellite images. We need to ensure the shapefile is in the same coordinate reference system (CRS) as the satellite imagery (WGS84 or EPSG:4326).

In [2]:
# Specify the path to your shapefile
directory = os.path.dirname(wdir)
shp = 'shang_dis_merged/shang_dis_merged.shp'
In [3]:
# Certify that the shapefile is in the right coordinate system, otherwise reproject it into the right CRS
def shapefile_crs_check(file):
    global bbox
    bbox = gpd.read_file(file)
    crs = bbox.crs
    data = crs.get("init", "")
    if 'epsg:4326' in data:
        print('Shapefile in right CRS')
        bbox = bbox.to_crs({'init':'epsg:4326'})
    f,ax = plt.subplots(figsize=(5,5))
    plt.title('Fig.1: Shapefile of Shanghai urban area',y= -0.2)
In [4]:
Shapefile in right CRS

The world reference system (WRS) from NASA is a system to identify individual satellite imagery scenes using path-row tuples instead of absolute latitude/longitude coordinates. The latitudinal centre of the image corresponds to the row, the longitudinal centre to the path. This system allows to uniformly catalogue satellite data across multiple missions and provides an easy to use reference system for the end user. It is necessary to note that the WRS was changed between Landsat missions, due to a difference in swath patterns of the more recent Landsat satellites (NASA, 2019). The WRS1 is used for Landsat missions 1-3 and the WRS2 for Landsat missions 4,5,7,8. In order to obtain path-row tuples of relevant satellite images for an area of interest (AOI), it is necessary to intersect the WRS shapefile (either WRS1 or WRS2, depending on the Landsat satellite you would like to obtain data from) with the AOI shapefile. The resulting path-row tuples will later be used to locate and download the corresponding satellite images from the Google Cloud Storage. The output of the intersection between WRS and AOI files can be visualised using an interactive widget. The map below shows our area of interest in purple and the footprints of the relevant Landsat images on top of an OpenStreetMap basemap.

In [5]:
# Donwload the WRS 2 file to later intersect the shapefile with the WRS path/row tuples to identify 
# relevant Landsat scenes
def sat_path():
    url = 'https://prd-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/atoms/files/WRS2_descending_0.zip'
    # Create folder for WRS2 file
    if os.path.exists(os.path.join('Landsat_images','wrs2')):
        print('folder exists')

    WRS_PATH = os.path.join('Landsat_images','WRS2_descending_0.zip')
    LANDSAT_PATH = os.path.dirname(WRS_PATH)
    # The WRS file is only needed once thus we add this loop
    if os.path.exists(WRS_PATH):
        print('File already exists')
    # Downloads the WRS file from the URL given and unzips it
        wget.download(url, out = LANDSAT_PATH)
        shutil.unpack_archive(WRS_PATH, os.path.join(LANDSAT_PATH, 'wrs2'))
In [6]:
# WARNING: this will take time the first time it's executed
# depending on your connection
folder exists
File already exists
Wall time: 1e+03 µs
In [7]:
# Intersect the shapefile with the WRS2 shapefile to determine relevant path/row tuples
def get_pathrow():
    global paths,rows,path,row, wrs_intersection
    paths,rows=wrs_intersection['PATH'].values, wrs_intersection['ROW'].values
    for i, (path,row) in enumerate(zip(paths,rows)):
        print('Image', i+1, ' -path:', path, 'row:', row)
In [8]:
Image 1  -path: 118 row: 38
Image 2  -path: 119 row: 38
In [9]:
# Visualise the output of the intersection with the shapefile using Folium

# Get the centre of the map
xy = np.asarray(bbox.centroid[0].xy).squeeze()
center = list(xy[::-1])

# Select a zoom
zoom = 8

# Create the most basic OSM folium map
m = folium.Map(location = center, zoom_start = zoom, control_scale=True)

# Add the bounding box (bbox) GeoDataFrame in red using a lambda function
m.add_child(folium.GeoJson(bbox.__geo_interface__, name = 'Area of Interest', 
                            style_function = lambda x: {'color': 'purple', 'alpha': 0}))

loc = 'Fig 2.: Landsat satellite tiles that  cover the Area of Interest'
title_html = '''
         <figcaption align="center" style="font-size:12px"><b>{}</b></figcaption>

# Iterate through each polygon of paths and rows intersecting the area
for i, row in wrs_intersection.iterrows():
    # Create a string for the name containing the path and row of this Polygon
    name = 'path: %03d, row: %03d' % (row.PATH, row.ROW)
    # Create the folium geometry of this Polygon 
    g = folium.GeoJson(row.geometry.__geo_interface__, name=name)
    # Add a folium Popup object with the name string
    # Add the object to the map
Make this Notebook Trusted to load map: File -> Trust Notebook