Finish section 02

heyarne 2021-03-08 14:50:04 +00:00
commit 0cb8da0286
5 changed files with 230 additions and 287 deletions


@@ -4,11 +4,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multi-Threading Performance Benchmark\n",
"# Multi-Processing Performance Benchmark\n",
"\n",
"This notebook contains a performance comparison of different methods to process the NDVI calculations.\n",
"This section contains a performance comparison of single- and multiprocess-based calculation methods used in [](02b Timeseries.ipynb).\n",
"\n",
"The `%%timeit` cell magic runs the cell content multiple times and outputs statistics on those multiple runs, thereby reducing factors such as garbage collection pauses etc."
"The `%%timeit` cell magic runs the cell content multiple times and prints statistics on those runs. Repeating the measurement reduces the influence of factors such as garbage collection pauses on the measured runtime, so the results can be used to verify performance assumptions."
]
},
{
@@ -32,12 +32,12 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Number of files: 27\n"
"Number of files: 30\n"
]
}
],
"source": [
"test_files = list(Path('output/ndvi').glob('*.tif'))\n",
"test_files = list(Path('resources/tempelhofer_feld/ndvi').glob('*.tif'))\n",
"print(f'Number of files: {len(test_files)}')"
]
},
@@ -45,7 +45,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The function we test with:"
"The performance is tested with the following function call:"
]
},
{
@@ -64,91 +64,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## In a single process\n",
"### Time to process a single file"
"The function receives a string or `pathlib.Path`, reads the data using `rasterio`, and calculates the average value inside the data using the `ma.average` function provided by `numpy`.\n",
"\n",
"The benchmark is performed on 4 CPUs:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"36.2 ms ± 42.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"average(test_files[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Time to process all files"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"980 ms ± 7.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"averages = [avg for avg in map(average, test_files)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Increasing the list size"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4.86 s ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"averages = [avg for avg in map(average, test_files * 5)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Time when using a worker pool\n",
"\n",
"Number of CPUs the multiprocessing pools can access:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
@@ -156,7 +80,7 @@
"4"
]
},
"execution_count": 7,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -169,19 +93,45 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### On One element"
"## Single File\n",
"### In a Single Process"
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"277 ms ± 3.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
"4.35 ms ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"average(test_files[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### With Multiprocessing"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"187 ms ± 5.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
@@ -195,52 +145,104 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### On the complete list"
"## All Files in Folder\n",
"### In a Single Process"
]
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"630 ms ± 8.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
"131 ms ± 366 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"with Pool() as pool:\n",
" averages = [avg for avg in pool.map(average, test_files)]"
"averages = list(map(average, test_files))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Increasing the list size"
"### With Multiprocessing"
]
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.1 s ± 20 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
"248 ms ± 10 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"with Pool() as pool:\n",
" averages = [avg for avg in pool.map(average, test_files * 5)]"
" averages = list(pool.map(average, test_files))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## All Files Multiple Times\n",
"### In a Single Process"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"654 ms ± 952 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"averages = list(map(average, test_files * 5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### With Multiprocessing"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"434 ms ± 16.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"with Pool() as pool:\n",
" averages = list(pool.map(average, test_files * 5))"
]
},
{
@@ -250,9 +252,9 @@
"## Result\n",
"\n",
"As the single-file benchmark shows, multiprocessing comes with an overhead.\n",
"When the list to be processed is sufficiently large, we get a reduction in processing time of roughly 30%-50%, depending on list size.\n",
"When the list to be processed is sufficiently large, multiprocessing reduces the processing time: for `test_files * 5`, the pool takes 434 ms versus 654 ms in a single process, despite a higher standard deviation. For the 30 files alone, the pool is still slower (248 ms versus 131 ms).\n",
"\n",
"Averaging the masked array is a fairly simple operation that scales in $O(N)$ with the size of the input array.\n",
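"Averaging the masked array is an operation that can be implemented to scale in $O(N)$ with the size $N$ of the input array.\n",
"The time reduction should be even higher for more complex tasks, where the fixed cost of starting the pool and passing data between processes is small relative to the computation."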
]
},