WSGI Adapter Internals¶
This document covers the internal architecture, design decisions, and
known limitations of the cymongoose.wsgi module. For user-facing
documentation see the WSGI/ASGI Guide.
Architecture Overview¶
The WSGI adapter bridges two fundamentally different execution models:
- cymongoose: single-threaded, non-blocking event loop (C)
- WSGI: synchronous, blocking callable (Python)
The adapter uses a ThreadPoolExecutor to run WSGI callables in
worker threads. Results are sent back to the event loop via
Manager.wakeup().
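The hand-off can be sketched as follows. This is a simplified stand-in, not the actual cymongoose code: a plain queue drained by a single "event loop" thread plays the role of Manager.wakeup(), and the names blocking_wsgi_call, worker, and event_loop are illustrative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

# Stand-in for Manager.wakeup(): a queue drained by the "event loop" thread.
wakeups: Queue = Queue()

def blocking_wsgi_call(environ):
    # Simulates a synchronous WSGI app running in a worker thread.
    return b"hello from " + environ["PATH_INFO"].encode()

def worker(environ, conn_id):
    body = blocking_wsgi_call(environ)
    # Workers never touch the connection directly; they only hand the
    # result to the event-loop thread.
    wakeups.put((conn_id, body))

results = {}

def event_loop(expected):
    # Single thread that owns all connection objects.
    for _ in range(expected):
        conn_id, body = wakeups.get()
        results[conn_id] = body  # would be conn.reply(...) in cymongoose

with ThreadPoolExecutor(max_workers=4) as pool:
    loop = threading.Thread(target=event_loop, args=(2,))
    loop.start()
    pool.submit(worker, {"PATH_INFO": "/a"}, 1)
    pool.submit(worker, {"PATH_INFO": "/b"}, 2)
    loop.join()
```

The key invariant is that connection objects are only ever touched by the event-loop thread; workers communicate results, never act on connections.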
Response Paths¶
There are two response paths, selected automatically based on the accumulated body size during WSGI iteration:
| Path | Trigger | Transport | Latency |
|---|---|---|---|
| Buffered | body < 1 MB (_STREAM_THRESHOLD) | Single wakeup (inline or stash) | One round-trip |
| Streaming | body >= 1 MB | Per-connection queue.Queue + drain wakeups | Per-chunk |
The buffered path uses conn.reply() for a single complete response.
The streaming path uses conn.http_chunk() with chunked transfer
encoding.
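The automatic selection can be illustrated with a hypothetical sketch (choose_path is an invented name; the real adapter interleaves this with header handling):

```python
# Hypothetical sketch of path selection during WSGI iteration.
_STREAM_THRESHOLD = 1 << 20  # 1 MB

def choose_path(app_iter):
    """Buffer chunks until the threshold is crossed, then switch to streaming."""
    buffered = []
    size = 0
    for chunk in app_iter:
        buffered.append(chunk)
        size += len(chunk)
        if size >= _STREAM_THRESHOLD:
            # Hand the buffered prefix plus the still-live iterator
            # to the streaming path (chunked transfer encoding).
            return "streaming", buffered
    return "buffered", b"".join(buffered)

print(choose_path(iter([b"x" * 512]))[0])        # buffered
print(choose_path(iter([b"x" * (1 << 20)]))[0])  # streaming
```

Because the decision is made mid-iteration, an application never has to declare up front whether its response is large.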
Wakeup Message Types¶
All communication from worker threads to the event loop uses
Manager.wakeup() with a single-byte prefix:
| Prefix | Name | Payload | Description |
|---|---|---|---|
| I | Inline | <json_meta>\n<body> | Complete buffered response, payload fits in wakeup buffer |
| S | Stash | <uuid_hex> | Complete buffered response, payload stored in _stash dict |
| H | Header inline | <raw_http_headers> | Start chunked response, headers fit in wakeup buffer |
| h | Header stash | <uuid_hex> | Start chunked response, headers stored in _stash dict |
| D | Drain | (empty) | Signal that chunks are waiting in the stream queue |
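On the event-loop side, dispatch on the prefix might look like the following sketch (on_wakeup is a hypothetical name, and the list standing in for a Connection just records actions):

```python
import json

_stash = {}  # uuid hex -> payload, filled by worker threads

def on_wakeup(data: bytes, conn):
    """Hypothetical event-loop dispatch on the single-byte prefix."""
    prefix, payload = data[:1], data[1:]
    if prefix == b"I":                       # inline buffered response
        meta, _, body = payload.partition(b"\n")
        conn.append(("reply", json.loads(meta), body))
    elif prefix == b"S":                     # stashed buffered response
        meta, body = _stash.pop(payload.decode())
        conn.append(("reply", meta, body))
    elif prefix == b"D":                     # drain the stream queue
        conn.append(("drain",))

conn = []  # stand-in for a Connection; we just record actions
on_wakeup(b"I" + json.dumps({"status": 200}).encode() + b"\nok", conn)
on_wakeup(b"D", conn)
```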
Wakeup Size Limit¶
mg_wakeup() transmits data over a socketpair using non-blocking
send(). The effective send buffer varies by platform:
| Platform | Approximate limit |
|---|---|
| macOS | ~9 KB |
| Linux | ~64 KB |
Payloads exceeding _WAKEUP_MAX_BYTES (8 KB) are stored in a
thread-safe _stash dict and only a 32-byte UUID key is sent via
wakeup. If the wakeup payload exceeds the socket buffer, the
send() silently fails and the data is lost -- this is why the 8 KB
threshold is conservative.
Streaming Design¶
Queue-Based Transport¶
Each streaming response gets a per-connection queue.Queue(maxsize=16)
stored in _streams[conn_id]. The worker pushes chunks into the
queue and sends a tiny D wakeup to notify the event loop. The
event loop drains all available chunks and sends them via
conn.http_chunk().
A None sentinel in the queue signals end-of-stream. The event loop
sends an empty chunk to close the chunked response and removes the
stream entry.
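The drain step can be sketched like this (drain_stream is a hypothetical name; send_chunk stands in for conn.http_chunk):

```python
from queue import Empty, Queue

def drain_stream(q: Queue, send_chunk) -> bool:
    """Drain all currently-available chunks. Returns False once the
    None sentinel (end-of-stream) is seen, True while the stream is open."""
    while True:
        try:
            chunk = q.get_nowait()
        except Empty:
            return True  # queue drained, stream still open
        if chunk is None:
            send_chunk(b"")  # empty chunk terminates chunked encoding
            return False
        send_chunk(chunk)

q = Queue()
for c in (b"a", b"b", None):
    q.put(c)
sent = []
alive = drain_stream(q, sent.append)
print(alive, sent)  # False [b'a', b'b', b'']
```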
Back-Pressure¶
The bounded queue (maxsize=16) provides natural back-pressure.
When the queue is full, the worker blocks on q.put(timeout=5.0).
If the put times out (e.g. because the connection closed and the
queue is no longer being drained), the worker aborts cleanly.
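The worker-side pattern is essentially a timed put with a clean-abort path; push_chunk is an illustrative name:

```python
from queue import Full, Queue

def push_chunk(q: Queue, chunk: bytes, timeout: float = 5.0) -> bool:
    """Block while the bounded queue is full; give up after `timeout`
    seconds, which means the event loop stopped draining (connection gone)."""
    try:
        q.put(chunk, timeout=timeout)
        return True
    except Full:
        return False  # worker aborts the response cleanly

q = Queue(maxsize=1)
print(push_chunk(q, b"first"))                 # True: slot available
print(push_chunk(q, b"second", timeout=0.01))  # False: nobody is draining
```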
Chunk Batching¶
Fast generators that yield many small chunks (e.g. 5-byte strings in a
tight loop) are batched up to _STREAM_BATCH_SIZE (256 KB) before
being pushed to the queue. This prevents flooding the wakeup
socketpair with hundreds of tiny drain notifications.
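A minimal sketch of the batching loop (batch_chunks is an invented name):

```python
_STREAM_BATCH_SIZE = 256 * 1024

def batch_chunks(app_iter):
    """Coalesce small chunks so each queue push (and each drain wakeup)
    carries up to _STREAM_BATCH_SIZE bytes."""
    buf, size = [], 0
    for chunk in app_iter:
        buf.append(chunk)
        size += len(chunk)
        if size >= _STREAM_BATCH_SIZE:
            yield b"".join(buf)
            buf, size = [], 0
    if buf:
        yield b"".join(buf)  # flush the final partial batch

# 100,000 five-byte chunks collapse into just two queue pushes.
batches = list(batch_chunks(b"hello" for _ in range(100_000)))
print(len(batches))  # 2
```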
Disconnect Cleanup¶
When a connection closes, MG_EV_CLOSE fires and the event handler
pops the stream entry from _streams. The queue object and its
contents are garbage-collected. The worker, which holds a local
reference to the queue, may still push a few more chunks that will
never be consumed. The put(timeout=5.0) ensures the worker
eventually aborts instead of blocking forever.
Thread Safety¶
GIL-Dependent Dict Access¶
The _streams dict is accessed from both the event loop thread
(reads and pops in _drain_stream, MG_EV_CLOSE) and worker threads
(writes in _worker_stream, reads in error handling). No explicit
lock protects it.
This is safe in CPython because dict operations (__getitem__,
__setitem__, pop) are atomic under the GIL. It is also safe in
PyPy for the same reason. However, this is an implementation detail
of these interpreters, not a language-level guarantee. If cymongoose
ever targets a GIL-free Python (PEP 703), this dict must be protected
with a lock.
The _stash dict, by contrast, is protected by _stash_lock. The stash
predates the queue-based streaming approach; its explicit lock is not
strictly required under the GIL, but it is harmless and future-proof.
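If explicit locking ever becomes necessary for _streams, the pattern would mirror the stash: route every cross-thread access through one lock. A hypothetical sketch (register_stream and pop_stream are invented names):

```python
import threading

# Hypothetical locked access pattern for _streams under a
# free-threaded (PEP 703) build.
_streams = {}
_streams_lock = threading.Lock()

def register_stream(conn_id, q):
    with _streams_lock:
        _streams[conn_id] = q

def pop_stream(conn_id):
    # pop with a default makes MG_EV_CLOSE cleanup idempotent
    with _streams_lock:
        return _streams.pop(conn_id, None)

register_stream(7, "queue-object")
print(pop_stream(7), pop_stream(7))  # queue-object None
```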
Thread-Safe Methods¶
Only Manager.wakeup() is safe to call from worker threads. All
Connection methods (reply, send, http_chunk) are called
exclusively from the event loop thread via wakeup dispatch.
Known Limitations¶
Response Buffering Below Threshold¶
Responses under 1 MB are fully buffered in memory before sending.
This is intentional -- the buffered path is faster (single wakeup,
single conn.reply()) and covers the vast majority of web API
responses. The 1 MB threshold is not currently configurable.
Disconnect Stall¶
When a client disconnects mid-stream, the worker blocks on
q.put(timeout=stream_timeout) until the timeout expires. During
this time it occupies a thread pool slot. With workers=4 and 4
simultaneous disconnects, the server is unresponsive for up to
stream_timeout seconds.
The default is 5 seconds, which avoids false positives on slow networks. Reduce it for latency-sensitive deployments:
```python
# Abort stalled workers after 1 second instead of 5
server = WSGIServer(app, workers=8, stream_timeout=1.0)

# Or via the one-liner
serve(app, workers=8, stream_timeout=1.0)
```
Truncated Response on Mid-Stream Error¶
If the WSGI application raises an exception after streaming has
started (headers already sent), the worker puts None in the queue
to end the chunked response. The client receives a truncated
response with no error indication -- the 200 status code is already
on the wire.
This is inherent to HTTP chunked encoding and matches the behavior of production WSGI servers (gunicorn, uwsgi, waitress).
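The failure mode can be reproduced with a toy WSGI app whose iterator fails after its first yield (flaky_app is an invented example; the list stands in for bytes already on the wire):

```python
def flaky_app(environ, start_response):
    """Toy WSGI app whose iterator fails after the first chunk."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    def gen():
        yield b"partial "
        raise RuntimeError("backend died mid-stream")
    return gen()

# Once "200 OK" and the first chunk are on the wire, the server can
# only terminate the chunked body early; the client sees truncation.
sent_statuses = []
chunks = []
it = flaky_app({}, lambda status, headers: sent_statuses.append(status))
try:
    for chunk in it:
        chunks.append(chunk)
except RuntimeError:
    chunks.append(b"")  # what the adapter does: close the stream
print(sent_statuses[0], chunks)  # 200 OK [b'partial ', b'']
```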
Duplicate Response Headers (Resolved)¶
Both the buffered and streaming paths now construct raw HTTP
responses via conn.send() instead of conn.reply(). This
preserves duplicate headers (e.g. multiple Set-Cookie) since
headers are serialised from the list of tuples returned by the
WSGI start_response callable, without converting to a dict.
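Why list-based serialization preserves duplicates is easy to demonstrate; serialize_headers below is a simplified illustration, not the adapter's actual code:

```python
def serialize_headers(status: str, headers: list[tuple[str, str]]) -> bytes:
    """Serialize the start_response header list directly, preserving
    duplicate names that a dict-based API would collapse."""
    lines = [f"HTTP/1.1 {status}"]
    lines += [f"{name}: {value}" for name, value in headers]
    return ("\r\n".join(lines) + "\r\n\r\n").encode()

raw = serialize_headers("200 OK", [
    ("Set-Cookie", "session=abc"),
    ("Set-Cookie", "theme=dark"),  # would be lost if headers were a dict
    ("Content-Type", "text/html"),
])
print(raw.count(b"Set-Cookie"))  # 2
```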
No Concurrent Streaming Load Test¶
The test suite covers concurrent buffered requests and single
streaming responses, but does not test multiple simultaneous
streaming responses under load. Real-world load testing with tools
like wrk is recommended before deploying the streaming path in
production.
Constants Reference¶
| Constant | Value | Description |
|---|---|---|
| _WAKEUP_MAX_BYTES | 8 KB | Max inline wakeup payload |
| _STREAM_THRESHOLD | 1 MB | Body size that triggers streaming |
| _STREAM_QUEUE_SIZE | 16 | Max pending chunks in stream queue |
| _STREAM_BATCH_SIZE | 256 KB | Max bytes batched before queue push |
| _STREAM_PUT_TIMEOUT | 5.0 s | Timeout for queue put (deadlock prevention) |
See Also¶
- WSGI/ASGI Guide -- user-facing documentation
- Threading Guide -- wakeup payload limits
- Performance Tuning -- benchmarking