Add logo, add intro to chapter one, add note about hardware requirements

heyarne 2021-03-01 14:00:35 +00:00
commit 4eb4079bfd
50 changed files with 37 additions and 28 deletions


@@ -0,0 +1,288 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multiprocessing Comparison\n",
"\n",
"This notebook compares the run time of averaging the NDVI output files in a single process with the run time when using a `multiprocessing` worker pool.\n",
"\n",
"The `%%timeit` cell magic runs the cell content multiple times and reports statistics over those runs, which reduces the influence of run-to-run noise on the measurement."
]
},
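{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of what `%%timeit` reports, the following sketch uses the standard-library `timeit.repeat` to time several runs and summarize them as mean ± standard deviation per loop. The helper `summarize` and the toy workload `work` are hypothetical names introduced here; they are not part of the benchmark, and the exact statistics IPython computes may differ slightly.\n",
"\n",
"```python\n",
"import statistics\n",
"import timeit\n",
"\n",
"\n",
"def work():\n",
"    # Toy workload standing in for a benchmarked cell (an assumption).\n",
"    return sum(range(10_000))\n",
"\n",
"\n",
"def summarize(func, repeat=7, number=10):\n",
"    # timeit.repeat returns one total time per run;\n",
"    # dividing by `number` gives the time per loop.\n",
"    totals = timeit.repeat(func, repeat=repeat, number=number)\n",
"    per_loop = [total / number for total in totals]\n",
"    return statistics.mean(per_loop), statistics.stdev(per_loop)\n",
"\n",
"\n",
"mean, std = summarize(work)\n",
"print(f'{mean * 1e3:.3f} ms ± {std * 1e3:.3f} ms per loop')\n",
"```"
]
},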
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from multiprocessing import Pool, cpu_count\n",
"from numpy import ma\n",
"from pathlib import Path\n",
"import rasterio as r"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of files: 27\n"
]
}
],
"source": [
"test_files = list(Path('output/ndvi').glob('*.tif'))\n",
"print(f'Number of files: {len(test_files)}')"
]
},
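{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function used for the benchmark reads the first band of a file as a masked array and returns the file path together with the average:"
]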
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def average(file_path):\n",
"    # Read band 1 as a masked array so that masked pixels are excluded from the average.\n",
"    with r.open(file_path) as src:\n",
"        data = src.read(1, masked=True)\n",
"    return file_path, ma.average(data)"
]
},
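{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate why `average` reads the band with `masked=True`, here is a minimal sketch with a hypothetical 2×2 array in which `-9999` stands for nodata. That value is an assumption made for the example; the real nodata value depends on the files. Masked pixels are left out of the average.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from numpy import ma\n",
"\n",
"# Hypothetical band in which -9999 marks nodata pixels (an assumption).\n",
"band = np.array([[0.2, 0.4], [-9999.0, 0.6]])\n",
"masked = ma.masked_equal(band, -9999.0)\n",
"\n",
"# The masked entry is ignored, so the average is taken over 0.2, 0.4 and 0.6.\n",
"mean_ndvi = ma.average(masked)\n",
"print(round(float(mean_ndvi), 6))\n",
"```"
]
},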
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## In a single process\n",
"### Time to process a single file"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"36.2 ms ± 42.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"average(test_files[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Time to process all files"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"980 ms ± 7.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"averages = list(map(average, test_files))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Increasing the list size"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4.86 s ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"averages = list(map(average, test_files * 5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Time when using a worker pool\n",
"\n",
"Number of CPUs the multiprocessing pools can access:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cpu_count()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### On one element"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"277 ms ± 3.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"with Pool() as pool:\n",
"    averages = pool.map(average, test_files[:1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### On the complete list"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"630 ms ± 8.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"with Pool() as pool:\n",
"    averages = pool.map(average, test_files)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Increasing the list size"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.1 s ± 20 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"with Pool() as pool:\n",
"    averages = pool.map(average, test_files * 5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Result\n",
"\n",
"Processing a single element shows that multiprocessing comes with an overhead: 277 ms with a pool versus 36.2 ms in a single process.\n",
"Once the list is large enough to amortize that overhead, the pool reduces processing time by roughly 36% for 27 files (980 ms to 630 ms) and roughly 57% for 135 files (4.86 s to 2.1 s).\n",
"\n",
"Averaging the masked array is a fairly simple operation that scales as $O(N)$ with the size of the input array.\n",
"The time reduction is likely to be larger for more expensive per-file tasks, because the fixed cost of starting the pool and transferring data becomes a smaller share of the total run time."
]
},
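{
"cell_type": "markdown",
"metadata": {},
"source": [
"The size of the reduction can be checked with a few lines of arithmetic. This is a minimal sketch that uses the timings reported in the cell outputs of this notebook; the names `timings` and `reduction` are introduced here for illustration only.\n",
"\n",
"```python\n",
"# (single process, worker pool) timings in seconds, copied from the cell outputs.\n",
"timings = {\n",
"    '1 file': (0.0362, 0.277),\n",
"    '27 files': (0.980, 0.630),\n",
"    '135 files': (4.86, 2.1),\n",
"}\n",
"\n",
"\n",
"def reduction(serial, pooled):\n",
"    # Fraction of the single-process time saved by the pool;\n",
"    # a negative value means the pool was slower.\n",
"    return (serial - pooled) / serial\n",
"\n",
"\n",
"for label, (serial, pooled) in timings.items():\n",
"    print(f'{label}: {reduction(serial, pooled):+.1%}')\n",
"```\n",
"\n",
"For a single file the value is negative, which reflects the pool overhead; for 27 and 135 files it comes out at roughly 36% and 57%."
]
},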
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}