Incorporate Josh's feedback

heyarne 2021-03-13 15:36:14 +01:00
commit f5e22013c4
4 changed files with 40 additions and 39 deletions

View file

@ -39,7 +39,8 @@
"## Defining the Region of Interest\n",
"\n",
"The Copernicus Open Access Hub API expects a point or area that designates the region of interest.\n",
"We use the [OpenStreetMap Nominatim API](https://nominatim.org/) to query for the administrative boundaries of Berlin using the `search_osm` function defined in `sentinel_helpers.py`:"
"The [OpenStreetMap Nominatim API](https://nominatim.org/) provides a HTTP-based interface to the OpenStreetMap data set, which can be queried for the administrative boundaries of Berlin.",
"The `search_osm` function defined in `sentinel_helpers.py` allows retrieving these geometries by location name:"
]
},
{
@ -241,13 +242,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"OpenStreetMap contains geoinformation at vastly different scales and of very different types.\n",
"We can use it to query outlines of parks or entire countries.\n",
"OpenStreetMap contains geoinformation at vastly different scales and of very different types, which range from outlines of parks to entire countries.\n",
"The [OpenStreetMap wiki](https://wiki.openstreetmap.org/wiki/Main_Page) contains exhaustive information about the architectural design of OpenStreetMap.\n",
"Using the `search_osm` function we are very flexible in the type of query information we can retrieve.\n",
"This empowers the `search_osm` function to retrieve geoinformation on places that would otherwise be widely scattered across a myriad of sources, if available at all.\n",
"\n",
"The first result is the city's centroid.\n",
"We use the `type` to select the administrative boundaries."
"The property listed in the `type` column can be used to select the administrative boundaries."
]
},
{
@ -424,7 +424,7 @@
"metadata": {},
"source": [
"The region of interest is given as the `footprint` parameter.\n",
"We use a simplified version of the geometry retrieved from OpenStreetMap - its convex hull - due to restrictions in URL lengths that don't allow us to query for arbitrarily detailed geometries:"
"The geometry retrieved from OpenStreetMap is simplified by calculating convex hull due to restrictions in URL lengths that don't allow querying for arbitrarily detailed geometries:"
]
},
{
@ -483,7 +483,7 @@
"The criteria for selecting a product depends on the specific use case.\n",
"The first use case is to plot an image of Berlin, so want to make sure that as much of the city as possible is visible in the data we download.\n",
"\n",
"We convert the list of products to a `GeoDataFrame`, for which `gdf` is an acronym:"
"The list of products to a `GeoDataFrame` (`gdf`):"
]
},
{
@ -508,9 +508,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"`plot_downloaded_products` is a helper that allows us to draw downloaded products along with a designated area of interest.\n",
"`plot_downloaded_products` allows plotting the tile geometry of downloaded or available products over with a designated area of interest.\n",
"\n",
"We can use it to get a quick visual impression of the result:"
"This provides quick visual impression of the result:"
]
},
{
@ -550,8 +550,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Because products are are large (roughly 1GB), we want to skip unnecessary downloads wherever possible.\n",
"We are selecting products based on two criteria:\n",
"Because products are large (roughly 1GB), it is desirable to skip unnecessary downloads whenever possible.\n",
"Products are therefore filtered by two criteria:\n",
"\n",
"- How much of the area of interest is visible in the product (i.e. area of the intersection of a product's tile and our area of interest)\n",
"- Cloud coverage (the less the better)"
@ -962,7 +962,7 @@
"metadata": {},
"source": [
"`downloads` contains a dictionary, mapping each product's UUID to detailed information about the downloaded product.\n",
"We can use it to calculate the total download size:"
"This information can be used to calculate the total download size:"
]
},
{

View file

@ -6,13 +6,13 @@
"source": [
"# Visualization\n",
"\n",
"This notebook show how to access the content of the products downloaded in [](01a-download-process.ipynb) and plot a true-color rendering. \n",
"This notebook shows how to access the content of the products downloaded in [](01a-download-process.ipynb) and plot a true-color rendering. \n",
"While the products already contain a True-Color Image (TCI), this approach is useful for two reasons:\n",
"\n",
"1. It allows comparing the readings with a rendering provided by official sources, thereby allowing us to find errors\n",
"2. Generating a custom True-Color Image can be useful for further image manipulations, changing contrast or changing out single bands for others to highlight specific phenomena.\n",
"\n",
"We start by reading the shape of Berlin previously downloaded from OpenStreetMap:"
"First the shape of Berlin is created from data previously downloaded from OpenStreetMap:"
]
},
{
@ -61,7 +61,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We know the product with the lowest cloud cover percentage from the previous notebook."
"The information which product contains the least amount of clouds is given in the previous notebook."
]
},
{
@ -92,7 +92,7 @@
"source": [
"The product path contains a lot of information:\n",
"\n",
"- `S2B` shows that the downloaded products was captured by the Sentinel-2 satellite B. At the moment there are two satellites in the mission, A and B.\n",
"- `S2B` shows that the downloaded product was captured by the Sentinel-2 satellite B. At the moment there are two satellites in the mission, A and B.\n",
"- `MSI` stands for Multi Spectral Instrument.\n",
"- `L2A` is the processing level. Level 2A is the highest processing level and lower processing levels may need further processing to be useful.\n",
"- The first timestamp, `20200602T100559`, is the date at which the data was captured.\n",
@ -175,7 +175,7 @@
"source": [
"Using the compressed zip-file, while slightly inconvenient, makes sense because it allows saving disk space and allows us to avoid the extra step of decompressing every single downloaded product.\n",
"\n",
"There is a pre-rendered True-Color Image (\"TCI\") that we can use to get a quick plot of the contents:"
"There is a pre-rendered True-Color Image (\"TCI\") that can be plotted for an impression of the product's contents:"
]
},
{
@ -203,7 +203,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the `rasterio` library we can open this image and render its contents:"
"The `rasterio` library is used to open this image and render its contents:"
]
},
{
@ -238,7 +238,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You will see this pattern repeatedly:\n",
"This pattern shows repeatedly across many notebooks:\n",
"\n",
"``` python\n",
"with r.open(...) as src:\n",
@ -252,7 +252,7 @@
"While for many use cases using the TCI can be enough, knowing how to compose True-Color Images provides additional merit as explained above.\n",
"\n",
"The blue, green, and red parts of the spectrum are represented in the raster files for the bands 2, 3 and 4 respectively\n",
"`sentinel_helpers.py` contains a helper that wraps `scihub_band_paths` to retrieve those bands in a resolution of our choice:"
"`sentinel_helpers.py` contains a function wrapping `scihub_band_paths` to retrieve those bands in a resolution of choice:"
]
},
{
@ -313,7 +313,7 @@
"source": [
"### Full Range Plot\n",
"\n",
"We continue with a plot of the combination of these bands:"
"Next, a plot of the combination of these bands is plotted:"
]
},
{
@ -510,7 +510,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can compare the histograms of `included_tci` and the `normalized_rgb` array: "
"A comparison of the histograms of `included_tci` and the `normalized_rgb` array offers more details:"
]
},
{
@ -555,7 +555,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see each the red, green and blue band peaking higher in the prerendered TCI around a value of 50 - the curves match closely. Most of the pixels are using the designated nodata-value 0, which is the black stripe in the top left corner of the image.\n",
"Each the red, green and blue band have higher peaks in the prerendered TCI at value of around 50 - the curves match closely. Most of the pixels are using the designated nodata-value 0, which is the black stripe in the top left corner of the image.\n",
"\n",
"Because the purpose of this visualization is not creating a one-to-one replica of the included TCI but rather demonstrate how to interpret and manipulate the raster file contents, the approximation is sufficient.\n",
"\n",
@ -563,7 +563,7 @@
"\n",
"It is rare to plot the entire product because the data in this product can be partially missing depending on the orbit position (see [](01c-coverage-analysis.ipynb) for more information).\n",
"\n",
"We can create a rectangular cutout of the created image using code provided in the `rasterio` library for its `rio` command line tool. The code uses a data structure called `Window`, which is a rectangle with an x- and y-offset that is provided by `rasterio` to partially read or write raster data.\n",
"The created image can be cropped using code provided in the `rasterio` library for its `rio` command line tool. This requires constructing a `Window`, which is a rectangle with an x- and y-offset that is provided by `rasterio` to partially read or write raster data.\n",
"\n",
"The position of the `Window` is calculated by transforming the area of interest `berlin` into the Coordinate Reference System that is used by `src` and then calculating the intersection:"
]

View file

@ -98,7 +98,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"All available products are plotted to verify if there are enough products to cover our area of interest without holes.\n",
"All available products are plotted to verify that there are enough products to cover the area of interest without holes.\n",
"As the revisit frequency given by the Sentinel-2 mission is much lower than four weeks this should not be a problem."
]
},
@ -264,7 +264,7 @@
"Adjacent rectangles are not perfectly parallel.\n",
"This is because of choices that have to be made when projecting from the earths spherical surface to a rectangular plane.\n",
"The UTM grid is constructed so that coordinates within each tiling represent metrical distances on the earths surface.\n",
"This has the nice property that in order to areas and surfaces within a UTM tiling can be calculated simply by counting.\n",
"This has the nice property that lengths and areas within a UTM tiling can be calculated simply by counting pixels or using Euclidian distance arithmetic.[^in_contrast_to_lat_lon]\n",
"\n",
"(content:orbits)=\n",
"## Product Shape and Orbit Number\n",
@ -272,8 +272,9 @@
"As mentioned above, each square is a single product that can be downloaded from the Copernicus Open Access Hub.\n",
"The visualizations above and the true color rendering in [](01b-visualization.ipynb) shows that these products are not often not perfect squares, but that they have missing slices.\n",
"\n",
"This is because of the satellite orbit at the time of capturing the data.\n",
"To visualize this we plot the available products per orbit:"
"This is because of path along which the satellite orbits the Earth, which can be shown by plotting the available products per orbit number:\n",
"\n",
"[^in_contrast_to_lat_lon]: This is in contrast to Coordinate Reference Systems which use Latitude and Longitude, such as the widely used WGS84, which does not express coordinates on a plane, and has to rely on more complex eliptical distance calculations."
]
},
{
@ -398,9 +399,9 @@
"source": [
"## Ensuring Complete Coverage\n",
"\n",
"If we do not want to wait an entire repeat cycle, what is the minimum time span in those four weeks to ensure a coverage of all of Brandenburg?\n",
"When not waiting for an entire repeat cycle to complete, what is the minimum time span in those four weeks to ensure complete data for Brandenburg?\n",
"\n",
"To find out we iterate through the returned products, for each iteration $i$ unifying the associated product's geometry $P_i$ with all products we already iterated through:\n",
"To answer this question, the returned products are iterated through. For each iteration $i = I$ the associated product's geometry $P_{i=I}$ is unified with all products $P_{i<I}$:\n",
"\n",
"\\begin{align*}\n",
" P &= \\{P_1, P_2, \\cdots, P_n\\} \\\\\n",
@ -523,7 +524,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We re-use the `plot_downloaded_products` to get a visual impression of the area of the union of tiles just created:"
"The function `plot_downloaded_products` is reused to get a visual impression of the area of the union of tiles just created:"
]
},
{
@ -563,7 +564,7 @@
"metadata": {},
"source": [
"The algorithm worked, the entire area is comfortably covered.\n",
"We can calculate the time span over which the subselection was captured:"
"The time span over which the subselection was captured is 3 days:"
]
},
{
@ -591,7 +592,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Compare this to the repeat cycle observed in [](content:orbits):"
"This is a lot lower than the repeat cycle observed in [](content:orbits):"
]
},
{
@ -620,10 +621,10 @@
"source": [
"### Cloud Coverage\n",
"\n",
"For the union above we did not consider cloud coverage at all.\n",
"The union calculation above did not consider cloud coverage at all.\n",
"This means that a lot of pixels in the large area of interest might not have interesting data for us:\n",
"\n",
"A plot of the cloud coverage can give us an estimate of how useful the combined image would be without needing to plot it visually as described in [](01b-visualization.ipynb):"
"A plot of the cloud coverage can give an estimate of how useful the combined image would be without needing to plot it visually as described in [](01b-visualization.ipynb):"
]
},
{
@ -655,7 +656,7 @@
"The histogram shows that some of the products consist mostly of cloudy pixels - in fact, the relationship between cloudy and non-cloudy pixels is almost symmetrical.\n",
"A cloud coverage of 100% is not of much use for us because it amounts to an image that does not contain any of the surface features we are interested in.\n",
"\n",
"To include only less cloudy products, a compromise on up-to-dateness hast to be made.\n",
"To include only less cloudy products, a compromise on up-to-dateness has to be made.\n",
"Setting a maximum cloud coverage of 50% increases the time delta to 15 days:"
]
},
@ -778,7 +779,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Downloads are expensive because each product to be downloaded has a size of approximately 1GB. We try to reduce the amount of products we need to download by dropping identical geometries, keeping the one with the smallest cloud cover:"
"Downloads are expensive because each product to be downloaded has a size of approximately 1GB. The amount of products to download is reduced by dropping identical geometries, keeping the one with the smallest cloud cover:"
]
},
{