cardinal_pythonlib.parallel


Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).

This file is part of cardinal_pythonlib.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Parallel processing assistance functions.

cardinal_pythonlib.parallel.gen_parallel_results_efficiently(fn: Callable, *iterables: Iterable, max_workers: int | None = None, threaded: bool = False, verbose: bool = False, loglevel: int = 20) Iterable[source]

Memory-efficient way of using concurrent.futures.ProcessPoolExecutor, as per https://alexwlchan.net/2019/10/adventures-with-concurrent-futures/. The problem is that the normal method via e.g. ProcessPoolExecutor.map() creates large numbers of Future objects and runs out of memory; it doesn’t scale to large (or infinite) inputs.

Implemented 2020-04-19 with some tweaks to the original, and tested with Python 3.6.

Note that there are also Python bug reports about this:

Parameters:
  • fn – The function of interest to be run. A callable that will take as many arguments as there are passed iterables.

  • iterables – Arguments to be sent fn. For example, if you call parallelize_process_efficiently(fn, [a, b, c], [d, e, f]) then calls to fn will be fn(a, d), fn(b, e), and fn(c, f).

  • max_workers – Maximum number of processes/threads at one time.

  • threaded – Use threads? Otherwise, use processes.

  • verbose – Report progress?

  • loglevel – If verbose, which loglevel to use?

Yields:

results from fn, in no guaranteed order

Note re itertools.islice:

from itertools import islice

x = range(100)  # a range object; not an iterator

print(list(islice(x, 10)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(islice(x, 10)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

y = zip(x)  # a generator; an iterator

print(list(islice(y, 10)))  # [(0,), (1,), ..., (9,)]
print(list(islice(y, 10)))  # [(10,), (11,), ..., (19,)]

# ... with a zip() result, islice() continues where it left off.
# Verified: this code does call the right number of subprocesses.