While these kinds of articles are useful for learners, I hope someone will explain the concurrency models of uWSGI/Gunicorn/uvicorn/gevent. For example: how long do global variables live? How does context switching (like the magic request object in Flask) work? How do you spawn async background tasks? Is it safe to mix task schedulers inside web code? How do you measure when concurrency is saturated, and how do you scale? What data can be shared between executors, and how? How do you detect back-pressure? How do you interrupt a long-running function when the client disconnects (nginx 499)? How do you properly handle Unix signals for threads/multiprocessing/asyncio?
In reality, no one writes from scratch with threads, processes, or asyncio unless they are a library author.
Unfortunately this starts with a definition of concurrency quoted from the Python wiki [0] which is imprecise: "Concurrency in programming means that multiple computations happen at the same time."
Not necessarily. It means multiple computations could happen at the same time. Wikipedia has a broader definition [1]: "Concurrency refers to the ability of a system to execute multiple tasks through simultaneous execution or time-sharing (context switching), sharing resources and managing interactions."
In other words it could be at the same time or it could be context switching (quickly changing from one to another). Parallel [2] means explicitly at the same time: "Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously."
0: https://wiki.python.org/moin/Concurrency
1: https://en.wikipedia.org/wiki/Concurrency_(computer_science)
2: https://en.wikipedia.org/wiki/Parallel_computing
It sounds like a nit, but the OP's definition of concurrency implies that asyncio as implemented in Python (and in other languages) is not a form of concurrent programming.
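To make the distinction concrete, here is a minimal sketch (the names `worker` and `main` are made up for illustration). Two asyncio tasks make progress in turns on a single OS thread, so they are concurrent under the Wikipedia definition without ever running at the same time:

```python
import asyncio
import threading


async def worker(name, steps, log):
    """Record each step together with the thread it ran on."""
    for i in range(steps):
        log.append((name, i, threading.get_ident()))
        # Hand control back to the event loop so other tasks can run.
        await asyncio.sleep(0)


async def main():
    log = []
    # Both workers are in progress during the same period of time...
    await asyncio.gather(worker("a", 3, log), worker("b", 3, log))
    return log


if __name__ == "__main__":
    for name, step, thread_id in asyncio.run(main()):
        # ...but every step reports the same thread id: time-sharing, not parallelism.
        print(name, step, thread_id)
```

The steps of "a" and "b" come out interleaved rather than grouped, which is the context switching case, not the simultaneous one.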
> Summary
Interesting takeaway! For web services mine would be:
1. always use asyncio
2. use threads with asyncio.to_thread[0] to convert blocking calls to asyncio with ease
3. use aiomultiprocess[1] for any cpu intensive tasks
[0]: https://docs.python.org/3/library/asyncio-task.html#asyncio....
[1]: https://aiomultiprocess.omnilib.dev/en/stable/
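A rough sketch of point 2, assuming Python 3.9+ (where `asyncio.to_thread` is available). `blocking_io` and `handler` are hypothetical names standing in for a synchronous client call and a request handler. aiomultiprocess is a third-party package and is not shown here.

```python
import asyncio
import threading
import time


def blocking_io(delay):
    """Stand-in for a blocking call, e.g. a synchronous HTTP or DB client."""
    time.sleep(delay)
    # Return the id of the thread that did the work, for inspection.
    return threading.get_ident()


async def handler():
    """Run two blocking calls in worker threads without blocking the event loop."""
    return await asyncio.gather(
        asyncio.to_thread(blocking_io, 0.1),
        asyncio.to_thread(blocking_io, 0.1),
    )


if __name__ == "__main__":
    worker_ids = asyncio.run(handler())
    print("event loop thread:", threading.get_ident())
    print("worker threads:", worker_ids)
```

The blocking calls run on threads from the loop's default executor, so the event loop thread stays free to serve other requests in the meantime.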
A variant that is common in practice: don't use pure Python for CPU-intensive tasks (offload to C extensions).
Don't know if the author will show up here, but the code highlighting theme is almost unreadable, at least on Chrome on Android.
Looks fine to me on Firefox/Android in light mode.
It's unreadable in dark mode.
Same on iPhone, Safari.
Has anyone else had multiprocessing freeze on Windows? It seems you need a __main__ guard for multiprocessing to work on Windows, which I do not have because I'm using pyproject scripts with click to run. Has anyone faced this issue? Is there a solution?
It’s not hard to use main, and it’s a requirement for multiprocessing.
The real fun is when you have multiple processes also spawning threads.
I would have liked to see the performance of the async version; it seems like a surprising omission.
I’m also confused about the perf2 performance. For the threads example it starts around 70_000 reqs/sec, while the processes example runs at 3_500 reqs/sec. That’s a 20 times difference that isn’t mentioned in the text.
Grok did a better job of explaining the different solutions in a MicroPython environment. Summary:
* Task Parallelism and Multi-threading are good for computational tasks spread across the ESP32's dual cores.
* Asynchronous Programming shines in scenarios where I/O operations are predominant.
* Hardware Parallelism via RMT can offload tasks from the CPU, enhancing overall efficiency for specific types of applications.
Real question: why use Grok over literally any other LLM? I’ve never heard of them being SOTA.
There seem to be fewer artificial restrictions. For example, you can ask "who is Bryan Lunduke".
This is especially important in the non-English-speaking world, as forbidden words have different significance in other cultures.
Not really anything new in there. I've been dealing with Python concurrency a lot and I don't find it great compared to other languages (e.g. Kotlin).
One thing I am struggling with right now is how to handle a function that is both I/O-intensive and CPU-bound. To give more context, I am processing data which on paper is easy to parallelise. Say for 1000 lines of data, I have to execute my function f for each line, in any order. However, f uses the CPU a lot but also makes up to 4 network requests.
My current approach is to divide the 1000 lines by n_cores, then launch n_cores processes and on each of them run the function f asynchronously on all inputs of that process, using async to handle switching on I/O. I wonder if my approach could be improved.
> I wonder if my approach could be improved.
Yes. When the number of batches equals the number of cores, the total time is determined by the slowest batch; toward the end there will be just one job still running while the other cores sit idle. If you make the batches smaller, like 1000/n_cores/k lines each, you may get better CPU utilization and a shorter start-to-end total time. Making k too big will add overhead. Assuming n_cores==10, k==5 may be a good compromise. It depends on the start/stop time per job.
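A minimal sketch of that batching idea. `make_batches`, `run_batch`, and `process_all` are hypothetical helper names, and the demo uses a `ThreadPoolExecutor` only to keep it short. For CPU-heavy work you would pass a `ProcessPoolExecutor` instead.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Sequence


def make_batches(items: Sequence, n_cores: int, k: int) -> List[Sequence]:
    """Split items into about n_cores * k batches of equal size."""
    n_batches = max(1, n_cores * k)
    size = max(1, -(-len(items) // n_batches))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_batch(f: Callable, batch: Sequence) -> list:
    """Apply f to every item of one batch (this is the unit of work per job)."""
    return [f(x) for x in batch]


def process_all(executor: Executor, f: Callable, items: Sequence,
                n_cores: int, k: int) -> list:
    """Submit the small batches and flatten the results in input order."""
    batches = make_batches(items, n_cores, k)
    results = []
    for batch_result in executor.map(run_batch, [f] * len(batches), batches):
        results.extend(batch_result)
    return results


if __name__ == "__main__":
    lines = list(range(1000))
    print(len(make_batches(lines, n_cores=10, k=1)))  # one big batch per core
    print(len(make_batches(lines, n_cores=10, k=5)))  # smaller batches, more jobs
    with ThreadPoolExecutor(max_workers=10) as pool:
        print(sum(process_all(pool, lambda x: x * 2, lines, n_cores=10, k=5)))
```

With a process pool, `f` has to be a picklable top-level function rather than a lambda. Workers that finish early pick up the next small batch instead of idling.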
I'd be interested to hear whether you have tried 3.13 free threading. Your use case might be worth a test there if moving from a process model to a threading model isn't too much work.
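Before benchmarking, it may help to confirm that the interpreter really is a free-threaded build with the GIL off. A small sketch follows. `gil_status` is a made-up helper name, while `sys._is_gil_enabled()` and the `Py_GIL_DISABLED` config variable exist in CPython 3.13+:

```python
import sys
import sysconfig


def gil_status() -> str:
    """Describe whether this interpreter is free-threaded and if the GIL is active."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() only exists on Python 3.13 and later.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = True if check is None else check()
    if free_threaded_build and not gil_enabled:
        return "free-threaded build, GIL disabled"
    if free_threaded_build:
        return "free-threaded build, GIL re-enabled at runtime"
    return "standard build, GIL enabled"


if __name__ == "__main__":
    print(sys.version)
    print(gil_status())
```

On a free-threaded build the GIL can still be switched back on at runtime, for example when an extension module that does not declare free-threading support is imported, which is why the sketch checks both values.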
Where does your implementation bottleneck?
Python concurrency does suffer from being relatively new and being bolted on to a decades-old language. I'd expect the state of the art in Python to be much cleaner once no-GIL has been hammered on for a few release cycles.
As always, I suggest the Core.py podcast, as it has a bunch of background details [1]. There are no-GIL updates throughout the series.
[1] https://podcasts.apple.com/us/podcast/core-py/id1712665877