{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Multiprocessing Performance Benchmark\n", "\n", "This notebook compares the performance of different methods for processing the NDVI files: a single process versus a `multiprocessing` worker pool.\n", "\n", "The `%%timeit` cell magic runs the cell content multiple times and reports statistics over those runs, which reduces the influence of one-off effects such as garbage collection pauses." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from multiprocessing import Pool, cpu_count\n", "from numpy import ma\n", "from pathlib import Path\n", "import rasterio as r" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of files: 27\n" ] } ], "source": [ "test_files = list(Path('output/ndvi').glob('*.tif'))\n", "print(f'Number of files: {len(test_files)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function under test reads the first band of a raster as a masked array and returns the file path together with the array's average:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def average(file_path):\n", "    with r.open(file_path) as src:\n", "        data = src.read(1, masked=True)\n", "    return file_path, ma.average(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## In a single process\n", "### Time to process a single file" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "36.2 ms ± 42.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "%%timeit\n", "average(test_files[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Time to process all files" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "980 ms ± 7.38 ms per loop (mean ± std. dev. 
of 7 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit\n", "averages = [avg for avg in map(average, test_files)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Increasing the list size" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.86 s ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit\n", "averages = [avg for avg in map(average, test_files * 5)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Time when using a worker pool\n", "\n", "Number of CPUs reported by the system, which `Pool()` uses as its default number of worker processes:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cpu_count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### On one element" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "277 ms ± 3.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit\n", "with Pool() as pool:\n", "    averages = [avg for avg in pool.map(average, test_files[:1])]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### On the complete list" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "630 ms ± 8.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit\n", "with Pool() as pool:\n", "    averages = [avg for avg in pool.map(average, test_files)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Increasing the list size" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.1 s ± 20 ms per loop (mean ± std. dev. 
of 7 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit\n", "with Pool() as pool:\n", "    averages = [avg for avg in pool.map(average, test_files * 5)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Result\n", "\n", "As the single-element case shows, multiprocessing comes with an overhead: creating the pool and processing one file takes 277 ms, compared with 36.2 ms in a single process.\n", "When the list to be processed is sufficiently large, the pool reduces processing time by roughly 36% for the 27 files (980 ms to 630 ms) and by roughly 57% for the fivefold list (4.86 s to 2.1 s).\n", "\n", "Averaging the masked array is a fairly simple operation that scales in $O(N)$ with the size of the input array.\n", "The relative time reduction should be even higher for more computationally expensive tasks, because the fixed overhead of the pool is spread over more work." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 4 }