diff --git a/README.md b/README.md index 020ae8f..eb79200 100644 --- a/README.md +++ b/README.md @@ -20,8 +20,8 @@ Many considerations behind a particular action or processing step can only be br ## Obtaining and Running the Code -The notebooks are published for reading at https://arne.schlueter.is/working-on/remote-sensing-for-journalism. -The source code lives at https://github.com/heyarne/remote-sensing-for-journalism. +The notebooks are published for reading at https://arne.schlueter.is/working-on/earth-observation-for-journalism. +The source code lives at https://github.com/heyarne/earth-observation-for-journalism. A `Dockerfile` is present at the root of the repository to help with reproducing the computing environment. The image can be built by running the following command from the project root: diff --git a/sources/01a-download-process.ipynb b/sources/01a-download-process.ipynb index a62c4d1..799ea47 100644 --- a/sources/01a-download-process.ipynb +++ b/sources/01a-download-process.ipynb @@ -39,7 +39,8 @@ "## Defining the Region of Interest\n", "\n", "The Copernicus Open Access Hub API expects a point or area that designates the region of interest.\n", - "We use the [OpenStreetMap Nominatim API](https://nominatim.org/) to query for the administrative boundaries of Berlin using the `search_osm` function defined in `sentinel_helpers.py`:" + "The [OpenStreetMap Nominatim API](https://nominatim.org/) provides an HTTP-based interface to the OpenStreetMap data set, which can be queried for the administrative boundaries of Berlin.\n", + "The `search_osm` function defined in `sentinel_helpers.py` allows retrieving these geometries by location name:" ] }, { @@ -241,13 +242,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "OpenStreetMap contains geoinformation at vastly different scales and of very different types.\n", - "We can use it to query outlines of parks or entire countries.\n", + "OpenStreetMap contains geoinformation at vastly different scales and 
of very different types, which range from outlines of parks to entire countries.\n", "The [OpenStreetMap wiki](https://wiki.openstreetmap.org/wiki/Main_Page) contains exhaustive information about the architectural design of OpenStreetMap.\n", - "Using the `search_osm` function we are very flexible in the type of query information we can retrieve.\n", + "This allows the `search_osm` function to retrieve geoinformation on places that would otherwise be widely scattered across a myriad of sources, if available at all.\n", "\n", "The first result is the city's centroid.\n", - "We use the `type` to select the administrative boundaries." + "The property listed in the `type` column can be used to select the administrative boundaries." ] }, { @@ -424,7 +424,7 @@ "metadata": {}, "source": [ "The region of interest is given as the `footprint` parameter.\n", - "We use a simplified version of the geometry retrieved from OpenStreetMap - its convex hull - due to restrictions in URL lengths that don't allow us to query for arbitrarily detailed geometries:" + "The geometry retrieved from OpenStreetMap is simplified by calculating its convex hull, because restrictions on URL length do not allow querying for arbitrarily detailed geometries:" ] }, { @@ -483,7 +483,7 @@ "The criteria for selecting a product depends on the specific use case.\n", "The first use case is to plot an image of Berlin, so want to make sure that as much of the city as possible is visible in the data we download.\n", "\n", - "We convert the list of products to a `GeoDataFrame`, for which `gdf` is an acronym:" + "The list of products is converted to a `GeoDataFrame` (`gdf`):" ] }, { @@ -508,9 +508,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "`plot_downloaded_products` is a helper that allows us to draw downloaded products along with a designated area of interest.\n", + "`plot_downloaded_products` allows plotting the tile geometry of downloaded or available products over a designated area of 
interest.\n", "\n", - "We can use it to get a quick visual impression of the result:" + "This provides a quick visual impression of the result:" ] }, { @@ -550,8 +550,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Because products are are large (roughly 1GB), we want to skip unnecessary downloads wherever possible.\n", - "We are selecting products based on two criteria:\n", + "Because products are large (roughly 1GB), it is desirable to skip unnecessary downloads whenever possible.\n", + "Products are therefore filtered by two criteria:\n", "\n", "- How much of the area of interest is visible in the product (i.e. area of the intersection of a product's tile and our area of interest)\n", "- Cloud coverage (the less the better)" @@ -962,7 +962,7 @@ "metadata": {}, "source": [ "`downloads` contains a dictionary, mapping each product's UUID to detailed information about the downloaded product.\n", - "We can use it to calculate the total download size:" + "This information can be used to calculate the total download size:" ] }, { diff --git a/sources/01b-visualization.ipynb b/sources/01b-visualization.ipynb index 70594a0..f778b9d 100644 --- a/sources/01b-visualization.ipynb +++ b/sources/01b-visualization.ipynb @@ -6,13 +6,13 @@ "source": [ "# Visualization\n", "\n", - "This notebook show how to access the content of the products downloaded in [](01a-download-process.ipynb) and plot a true-color rendering. \n", + "This notebook shows how to access the content of the products downloaded in [](01a-download-process.ipynb) and plot a true-color rendering. \n", "While the products already contain a True-Color Image (TCI), this approach is useful for two reasons:\n", "\n", "1. It allows comparing the readings with a rendering provided by official sources, thereby allowing us to find errors\n", "2. 
Generating a custom True-Color Image can be useful for further image manipulations, changing contrast or changing out single bands for others to highlight specific phenomena.\n", "\n", - "We start by reading the shape of Berlin previously downloaded from OpenStreetMap:" + "First, the shape of Berlin is read from data previously downloaded from OpenStreetMap:" ] }, { @@ -61,7 +61,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We know the product with the lowest cloud cover percentage from the previous notebook." + "The product with the lowest cloud cover percentage is known from the previous notebook." ] }, { @@ -92,7 +92,7 @@ "source": [ "The product path contains a lot of information:\n", "\n", - "- `S2B` shows that the downloaded products was captured by the Sentinel-2 satellite B. At the moment there are two satellites in the mission, A and B.\n", + "- `S2B` shows that the downloaded product was captured by the Sentinel-2 satellite B. At the moment there are two satellites in the mission, A and B.\n", "- `MSI` stands for Multi Spectral Instrument.\n", "- `L2A` is the processing level. 
Level 2A is the highest processing level and lower processing levels may need further processing to be useful.\n", "- The first timestamp, `20200602T100559`, is the date at which the data was captured.\n", @@ -175,7 +175,7 @@ "source": [ "Using the compressed zip-file, while slightly inconvenient, makes sense because it allows saving disk space and allows us to avoid the extra step of decompressing every single downloaded product.\n", "\n", - "There is a pre-rendered True-Color Image (\"TCI\") that we can use to get a quick plot of the contents:" + "There is a pre-rendered True-Color Image (\"TCI\") that can be plotted for an impression of the product's contents:" ] }, { @@ -203,7 +203,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Using the `rasterio` library we can open this image and render its contents:" + "The `rasterio` library is used to open this image and render its contents:" ] }, { @@ -238,7 +238,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You will see this pattern repeatedly:\n", + "This pattern appears repeatedly across many notebooks:\n", "\n", "``` python\n", "with r.open(...) 
as src:\n", @@ -252,7 +252,7 @@ "While for many use cases using the TCI can be enough, knowing how to compose True-Color Images provides additional merit as explained above.\n", "\n", "The blue, green, and red parts of the spectrum are represented in the raster files for the bands 2, 3 and 4 respectively\n", - "`sentinel_helpers.py` contains a helper that wraps `scihub_band_paths` to retrieve those bands in a resolution of our choice:" + "`sentinel_helpers.py` contains a function wrapping `scihub_band_paths` to retrieve those bands in a resolution of choice:" ] }, { @@ -313,7 +313,7 @@ "source": [ "### Full Range Plot\n", "\n", - "We continue with a plot of the combination of these bands:" + "Next, the combination of these bands is plotted:" ] }, { @@ -510,7 +510,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can compare the histograms of `included_tci` and the `normalized_rgb` array: " + "A comparison of the histograms of `included_tci` and the `normalized_rgb` array offers more details:" ] }, { @@ -555,7 +555,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can see each the red, green and blue band peaking higher in the prerendered TCI around a value of 50 - the curves match closely. Most of the pixels are using the designated nodata-value 0, which is the black stripe in the top left corner of the image.\n", + "Each of the red, green and blue bands has a higher peak in the prerendered TCI at a value of around 50 - the curves match closely. 
Most of the pixels are using the designated nodata-value 0, which is the black stripe in the top left corner of the image.\n", "\n", "Because the purpose of this visualization is not creating a one-to-one replica of the included TCI but rather demonstrate how to interpret and manipulate the raster file contents, the approximation is sufficient.\n", "\n", @@ -563,7 +563,7 @@ "\n", "It is rare to plot the entire product because the data in this product can be partially missing depending on the orbit position (see [](01c-coverage-analysis.ipynb) for more information).\n", "\n", - "We can create a rectangular cutout of the created image using code provided in the `rasterio` library for its `rio` command line tool. The code uses a data structure called `Window`, which is a rectangle with an x- and y-offset that is provided by `rasterio` to partially read or write raster data.\n", + "The created image can be cropped using code provided in the `rasterio` library for its `rio` command line tool. 
This requires constructing a `Window`, which is a rectangle with an x- and y-offset that is provided by `rasterio` to partially read or write raster data.\n", "\n", "The position of the `Window` is calculated by transforming the area of interest `berlin` into the Coordinate Reference System that is used by `src` and then calculating the intersection:" ] diff --git a/sources/01c-coverage-analysis.ipynb b/sources/01c-coverage-analysis.ipynb index 6a11ef9..d24a8c3 100644 --- a/sources/01c-coverage-analysis.ipynb +++ b/sources/01c-coverage-analysis.ipynb @@ -98,7 +98,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "All available products are plotted to verify if there are enough products to cover our area of interest without holes.\n", + "All available products are plotted to verify that there are enough products to cover the area of interest without holes.\n", "As the revisit frequency given by the Sentinel-2 mission is much lower than four weeks this should not be a problem." ] }, @@ -264,7 +264,7 @@ "Adjacent rectangles are not perfectly parallel.\n", "This is because of choices that have to be made when projecting from the earths spherical surface to a rectangular plane.\n", "The UTM grid is constructed so that coordinates within each tiling represent metrical distances on the earths surface.\n", - "This has the nice property that in order to areas and surfaces within a UTM tiling can be calculated simply by counting.\n", + "This has the nice property that lengths and areas within a UTM tiling can be calculated simply by counting pixels or using Euclidean distance arithmetic.[^in_contrast_to_lat_lon]\n", "\n", "(content:orbits)=\n", "## Product Shape and Orbit Number\n", @@ -272,8 +272,9 @@ "As mentioned above, each square is a single product that can be downloaded from the Copernicus Open Access Hub.\n", "The visualizations above and the true color rendering in [](01b-visualization.ipynb) shows that these products are not often not perfect squares, but 
that they have missing slices.\n", "\n", - "This is because of the satellite orbit at the time of capturing the data.\n", - "To visualize this we plot the available products per orbit:" + "This is because of the path along which the satellite orbits the Earth, which can be shown by plotting the available products per orbit number:\n", + "\n", + "[^in_contrast_to_lat_lon]: This is in contrast to Coordinate Reference Systems which use Latitude and Longitude, such as the widely used WGS84, which do not express coordinates on a plane and have to rely on more complex elliptical distance calculations." ] }, { @@ -398,9 +399,9 @@ "source": [ "## Ensuring Complete Coverage\n", "\n", - "If we do not want to wait an entire repeat cycle, what is the minimum time span in those four weeks to ensure a coverage of all of Brandenburg?\n", + "When not waiting for an entire repeat cycle to complete, what is the minimum time span in those four weeks to ensure complete data for Brandenburg?\n", "\n", - "To find out we iterate through the returned products, for each iteration $i$ unifying the associated product's geometry $P_i$ with all products we already iterated through:\n", + "To answer this question, the returned products are iterated through. For each iteration $i = I$ the associated product's geometry $P_{i=I}$ is unified with all products $P_{i