mirror of
https://github.com/heyarne/earth-observation-for-journalism.git
synced 2026-05-06 11:03:40 +02:00
288 lines
5.8 KiB
Text
288 lines
5.8 KiB
Text
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Multi-Threading Comparison\n",
|
|
"\n",
|
|
"This notebook contains a performance comparison of different methods to process the NDVI calculations.\n",
|
|
"\n",
|
|
"The `%%timeit` cell magic runs the cell content multiple times and outputs statistics on those multiple runs, thereby reducing factors such as garbage collection pauses etc."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from multiprocessing import Pool, cpu_count\n",
|
|
"from numpy import ma\n",
|
|
"from pathlib import Path\n",
|
|
"import rasterio as r"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Number of files: 27\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"test_files = list(Path('output/ndvi').glob('*.tif'))\n",
|
|
"print(f'Number of files: {len(test_files)}')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The function we test with:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def average(file_path):\n",
|
|
" with r.open(file_path) as src:\n",
|
|
" data = src.read(1, masked=True)\n",
|
|
" return file_path, ma.average(data)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## In a single process\n",
|
|
"### Time to process a single file"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"36.2 ms ± 42.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"%%timeit\n",
|
|
"average(test_files[0])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Time to process all files"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"980 ms ± 7.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"%%timeit\n",
|
|
"averages = [avg for avg in map(average, test_files)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Increasing the list size"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"4.86 s ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"%%timeit\n",
|
|
"averages = [avg for avg in map(average, test_files * 5)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Time when using a worker pool\n",
|
|
"\n",
|
|
"Number of CPUs the multiprocessing pools can access:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"4"
|
|
]
|
|
},
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"cpu_count()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### On One element"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"277 ms ± 3.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"%%timeit\n",
|
|
"with Pool() as pool:\n",
|
|
" averages = [avg for avg in pool.map(average, test_files[:1])]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### On the complete list"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 9,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"630 ms ± 8.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"%%timeit\n",
|
|
"with Pool() as pool:\n",
|
|
" averages = [avg for avg in pool.map(average, test_files)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Increasing the list size"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"2.1 s ± 20 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"%%timeit\n",
|
|
"with Pool() as pool:\n",
|
|
" averages = [avg for avg in pool.map(average, test_files * 5)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Result\n",
|
|
"\n",
|
|
"As we can see when processing a single element, multiprocessing comes with an overhead.\n",
|
|
"When the list to be processed is sufficiently large, we get a reduction in processing time of roughly 30%-50%, depending on list size.\n",
|
|
"\n",
|
|
"Averaging the masked array is a fairly simple operation that scales in $O(N)$ with the size of the input array.\n",
|
|
"The time reduction should be even higher for more complex tasks."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.8.6"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
}
|