mirror of
https://github.com/heyarne/earth-observation-for-journalism.git
synced 2026-05-06 19:13:40 +02:00
Change chapter file order and fix references
This commit is contained in:
parent
86096cabc6
commit
414e7745a1
12 changed files with 21 additions and 21 deletions
290
sources/02c-time-series-multiprocessing-benchmarks.ipynb
Normal file
290
sources/02c-time-series-multiprocessing-benchmarks.ipynb
Normal file
|
|
@ -0,0 +1,290 @@
|
|||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Multi-Processing Performance Benchmarks\n",
|
||||
"\n",
|
||||
"This section contains a performance comparison of single- and multiprocess-based calculation methods used in [](02c-time-series.ipynb).\n",
|
||||
"\n",
|
||||
"The `%%timeit` cell magic runs the cell content multiple times and prints statistics on those multiple runs. This reduces factors such as garbage collection pauses which influence the runtime performance and can be used to verify performance assumptions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from multiprocessing import Pool, cpu_count\n",
|
||||
"from numpy import ma\n",
|
||||
"from pathlib import Path\n",
|
||||
"import rasterio as r"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Number of files: 30\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"test_files = list(Path('resources/tempelhofer_feld/ndvi').glob('*.tif'))\n",
|
||||
"print(f'Number of files: {len(test_files)}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The performance is tested with the following function call:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def average(file_path):\n",
|
||||
" with r.open(file_path) as src:\n",
|
||||
" data = src.read(1, masked=True)\n",
|
||||
" return file_path, ma.average(data)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The function receives a string or `pathlib.Path`, reads the data using `rasterio`, and calculates the average value inside the data using the `ma.average` function provided by `numpy`.\n",
|
||||
"\n",
|
||||
"The benchmark is performed on 4 CPUs:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"4"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"cpu_count()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Single File\n",
|
||||
"### In a Single Process"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"4.35 ms ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%timeit\n",
|
||||
"average(test_files[0])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### With Multiprocessing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"187 ms ± 5.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%timeit\n",
|
||||
"with Pool() as pool:\n",
|
||||
" averages = [avg for avg in pool.map(average, test_files[:1])]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## All Files in Folder\n",
|
||||
"### In a Single Process"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"131 ms ± 366 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%timeit\n",
|
||||
"averages = list(map(average, test_files))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### With Multiprocessing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"248 ms ± 10 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%timeit\n",
|
||||
"with Pool() as pool:\n",
|
||||
" averages = list(pool.map(average, test_files))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## All Files Multiple Times\n",
|
||||
"### In a Single Process"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"654 ms ± 952 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%timeit\n",
|
||||
"averages = list(map(average, test_files * 5))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### With Multiprocessing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"434 ms ± 16.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%timeit\n",
|
||||
"with Pool() as pool:\n",
|
||||
" averages = list(pool.map(average, test_files * 5))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Result\n",
|
||||
"\n",
|
||||
"Multiprocessing comes with an overhead, which is significant enough to rule it out when processing a single element.\n",
|
||||
"When the list of elements is sufficiently large, there is a slight reduction in processing time, that, even with a higher standard deviation, manages to be faster than the single-process version.\n",
|
||||
"\n",
|
||||
"Averaging the masked array is an operation that can be implemented to scale in $O(N)$ with the size of the input array.\n",
|
||||
"The time reduction should be even higher for more complex tasks."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
Loading…
Add table
Add a link
Reference in a new issue