WSGI Adapter Internals¶
This document covers the internal architecture, design decisions, and
known limitations of the cymongoose.wsgi module. For user-facing
documentation see the WSGI/ASGI Guide.
Architecture Overview¶
The WSGI adapter bridges two fundamentally different execution models:
- cymongoose: single-threaded, non-blocking event loop (C)
- WSGI: synchronous, blocking callable (Python)
The adapter uses a ThreadPoolExecutor to run WSGI callables in
worker threads. Results are sent back to the event loop via
Manager.wakeup().
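The hand-off can be sketched as follows. This is a simplified stand-in, not the actual cymongoose code: a plain queue drained by a single "event loop" thread plays the role of Manager.wakeup(), and the names blocking_wsgi_call, worker, and event_loop are illustrative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

# Stand-in for Manager.wakeup(): a queue drained by the "event loop" thread.
wakeups: Queue = Queue()

def blocking_wsgi_call(environ):
    # Simulates a synchronous WSGI app running in a worker thread.
    return b"hello from " + environ["PATH_INFO"].encode()

def worker(environ, conn_id):
    body = blocking_wsgi_call(environ)
    # Workers never touch the connection directly; they only hand the
    # result to the event-loop thread.
    wakeups.put((conn_id, body))

results = {}

def event_loop(expected):
    # Single thread that owns all connection objects.
    for _ in range(expected):
        conn_id, body = wakeups.get()
        results[conn_id] = body  # would be conn.reply(...) in cymongoose

with ThreadPoolExecutor(max_workers=4) as pool:
    loop = threading.Thread(target=event_loop, args=(2,))
    loop.start()
    pool.submit(worker, {"PATH_INFO": "/a"}, 1)
    pool.submit(worker, {"PATH_INFO": "/b"}, 2)
    loop.join()
```

The key invariant is that connection objects are only ever touched by the event-loop thread; workers communicate results, never act on connections.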
Response Paths¶
There are two response paths, selected automatically based on the accumulated body size during WSGI iteration:
| Path | Trigger | Transport | Latency |
|---|---|---|---|
| Buffered | body < 1 MB (_STREAM_THRESHOLD) | Single wakeup (inline or stash) | One round-trip |
| Streaming | body >= 1 MB | Per-connection queue.Queue + drain wakeups | Per-chunk |
The buffered path uses conn.reply() for a single complete response.
The streaming path uses conn.http_chunk() with chunked transfer
encoding.
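The automatic selection can be illustrated with a hypothetical sketch (choose_path is an invented name; the real adapter interleaves this with header handling):

```python
# Hypothetical sketch of path selection during WSGI iteration.
_STREAM_THRESHOLD = 1 << 20  # 1 MB

def choose_path(app_iter):
    """Buffer chunks until the threshold is crossed, then switch to streaming."""
    buffered = []
    size = 0
    for chunk in app_iter:
        buffered.append(chunk)
        size += len(chunk)
        if size >= _STREAM_THRESHOLD:
            # Hand the buffered prefix plus the still-live iterator
            # to the streaming path (chunked transfer encoding).
            return "streaming", buffered
    return "buffered", b"".join(buffered)

print(choose_path(iter([b"x" * 512]))[0])        # buffered
print(choose_path(iter([b"x" * (1 << 20)]))[0])  # streaming
```

Because the decision is made mid-iteration, an application never has to declare up front whether its response is large.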
Wakeup Message Types¶
All communication from worker threads to the event loop uses
Manager.wakeup() with a single-byte prefix:
| Prefix | Name | Payload | Description |
|---|---|---|---|
| I | Inline | <json_meta>\n<body> | Complete buffered response, payload fits in wakeup buffer |
| S | Stash | <uuid_hex> | Complete buffered response, payload stored in _stash dict |
| H | Header inline | <raw_http_headers> | Start chunked response, headers fit in wakeup buffer |
| h | Header stash | <uuid_hex> | Start chunked response, headers stored in _stash dict |
| D | Drain | (empty) | Signal that chunks are waiting in the stream queue |
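On the event-loop side, dispatch on the prefix might look like the following sketch (on_wakeup is a hypothetical name, and the list standing in for a Connection just records actions):

```python
import json

_stash = {}  # uuid hex -> payload, filled by worker threads

def on_wakeup(data: bytes, conn):
    """Hypothetical event-loop dispatch on the single-byte prefix."""
    prefix, payload = data[:1], data[1:]
    if prefix == b"I":                       # inline buffered response
        meta, _, body = payload.partition(b"\n")
        conn.append(("reply", json.loads(meta), body))
    elif prefix == b"S":                     # stashed buffered response
        meta, body = _stash.pop(payload.decode())
        conn.append(("reply", meta, body))
    elif prefix == b"D":                     # drain the stream queue
        conn.append(("drain",))

conn = []  # stand-in for a Connection; we just record actions
on_wakeup(b"I" + json.dumps({"status": 200}).encode() + b"\nok", conn)
on_wakeup(b"D", conn)
```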
Wakeup Size Limit¶
mg_wakeup() transmits data over a socketpair using non-blocking
send(). The effective send buffer varies by platform:
| Platform | Approximate limit |
|---|---|
| macOS | ~9 KB |
| Linux | ~64 KB |
Payloads exceeding _WAKEUP_MAX_BYTES (8 KB) are stored in a
thread-safe _stash dict and only a 32-byte UUID key is sent via
wakeup. If the wakeup payload exceeds the socket buffer, the
send() silently fails and the data is lost -- this is why the 8 KB
threshold is conservative.
Streaming Design¶
Queue-Based Transport¶
Each streaming response gets a per-connection queue.Queue(maxsize=16)
stored in _streams[conn_id]. The worker pushes chunks into the
queue and sends a tiny D wakeup to notify the event loop. The
event loop drains all available chunks and sends them via
conn.http_chunk().
A None sentinel in the queue signals end-of-stream. The event loop
sends an empty chunk to close the chunked response and removes the
stream entry.
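The drain step can be sketched like this (drain_stream is a hypothetical name; send_chunk stands in for conn.http_chunk):

```python
from queue import Empty, Queue

def drain_stream(q: Queue, send_chunk) -> bool:
    """Drain all currently-available chunks. Returns False once the
    None sentinel (end-of-stream) is seen, True while the stream is open."""
    while True:
        try:
            chunk = q.get_nowait()
        except Empty:
            return True  # queue drained, stream still open
        if chunk is None:
            send_chunk(b"")  # empty chunk terminates chunked encoding
            return False
        send_chunk(chunk)

q = Queue()
for c in (b"a", b"b", None):
    q.put(c)
sent = []
alive = drain_stream(q, sent.append)
print(alive, sent)  # False [b'a', b'b', b'']
```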
Back-Pressure¶
The bounded queue (maxsize=16) provides natural back-pressure.
When the queue is full, the worker blocks on q.put(timeout=5.0).
If the put times out (e.g. because the connection closed and the
queue is no longer being drained), the worker aborts cleanly.
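The worker-side pattern is essentially a timed put with a clean-abort path; push_chunk is an illustrative name:

```python
from queue import Full, Queue

def push_chunk(q: Queue, chunk: bytes, timeout: float = 5.0) -> bool:
    """Block while the bounded queue is full; give up after `timeout`
    seconds, which means the event loop stopped draining (connection gone)."""
    try:
        q.put(chunk, timeout=timeout)
        return True
    except Full:
        return False  # worker aborts the response cleanly

q = Queue(maxsize=1)
print(push_chunk(q, b"first"))                 # True: slot available
print(push_chunk(q, b"second", timeout=0.01))  # False: nobody is draining
```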
Chunk Batching¶
Fast generators that yield many small chunks (e.g. 5-byte strings in a
tight loop) are batched up to _STREAM_BATCH_SIZE (256 KB) before
being pushed to the queue. This prevents flooding the wakeup
socketpair with hundreds of tiny drain notifications.
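A minimal sketch of the batching loop (batch_chunks is an invented name):

```python
_STREAM_BATCH_SIZE = 256 * 1024

def batch_chunks(app_iter):
    """Coalesce small chunks so each queue push (and each drain wakeup)
    carries up to _STREAM_BATCH_SIZE bytes."""
    buf, size = [], 0
    for chunk in app_iter:
        buf.append(chunk)
        size += len(chunk)
        if size >= _STREAM_BATCH_SIZE:
            yield b"".join(buf)
            buf, size = [], 0
    if buf:
        yield b"".join(buf)  # flush the final partial batch

# 100,000 five-byte chunks collapse into just two queue pushes.
batches = list(batch_chunks(b"hello" for _ in range(100_000)))
print(len(batches))  # 2
```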
Disconnect Cleanup¶
When a connection closes, MG_EV_CLOSE fires and the event handler
pops the stream entry from _streams. The queue object and its
contents are garbage-collected. The worker, which holds a local
reference to the queue, may still push a few more chunks that will
never be consumed. The put(timeout=5.0) ensures the worker
eventually aborts instead of blocking forever.
Thread Safety¶
GIL-Dependent Dict Access¶
The _streams dict is accessed from both the event loop thread
(reads and pops in _drain_stream, MG_EV_CLOSE) and worker threads
(writes in _worker_stream, reads in error handling). No explicit
lock protects it.
This is safe in CPython because dict operations (__getitem__,
__setitem__, pop) are atomic under the GIL. It is also safe in
PyPy for the same reason. However, this is an implementation detail
of these interpreters, not a language-level guarantee. If cymongoose
ever targets a GIL-free Python (PEP 703), this dict must be protected
with a lock.
The _stash dict, by contrast, is protected by _stash_lock. The stash
predates the queue-based streaming approach; its explicit lock is not
strictly required under the GIL, but it is harmless and future-proof.
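If explicit locking ever becomes necessary for _streams, the pattern would mirror the stash: route every cross-thread access through one lock. A hypothetical sketch (register_stream and pop_stream are invented names):

```python
import threading

# Hypothetical locked access pattern for _streams under a
# free-threaded (PEP 703) build.
_streams = {}
_streams_lock = threading.Lock()

def register_stream(conn_id, q):
    with _streams_lock:
        _streams[conn_id] = q

def pop_stream(conn_id):
    # pop with a default makes MG_EV_CLOSE cleanup idempotent
    with _streams_lock:
        return _streams.pop(conn_id, None)

register_stream(7, "queue-object")
print(pop_stream(7), pop_stream(7))  # queue-object None
```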
Thread-Safe Methods¶
Only Manager.wakeup() is safe to call from worker threads. All
Connection methods (reply, send, http_chunk) are called
exclusively from the event loop thread via wakeup dispatch.
Known Limitations¶
Response Buffering Below Threshold¶
Responses under 1 MB are fully buffered in memory before sending.
This is intentional -- the buffered path is faster (single wakeup,
single conn.reply()) and covers the vast majority of web API
responses. The 1 MB threshold is not currently configurable.
Disconnect Stall¶
When a client disconnects mid-stream, the worker blocks on
q.put(timeout=stream_timeout) until the timeout expires. During
this time it occupies a thread pool slot. With workers=4 and 4
simultaneous disconnects, the server is unresponsive for up to
stream_timeout seconds.
The default is 5 seconds, which avoids false positives on slow networks. Reduce it for latency-sensitive deployments:
```python
# Abort stalled workers after 1 second instead of 5
server = WSGIServer(app, workers=8, stream_timeout=1.0)

# Or via the one-liner
serve(app, workers=8, stream_timeout=1.0)
```
Truncated Response on Mid-Stream Error¶
If the WSGI application raises an exception after streaming has
started (headers already sent), the worker puts None in the queue
to end the chunked response. The client receives a truncated
response with no error indication -- the 200 status code is already
on the wire.
This is inherent to HTTP chunked encoding and matches the behavior of production WSGI servers (gunicorn, uwsgi, waitress).
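The failure mode can be reproduced with a toy WSGI app whose iterator fails after its first yield (flaky_app is an invented example; the list stands in for bytes already on the wire):

```python
def flaky_app(environ, start_response):
    """Toy WSGI app whose iterator fails after the first chunk."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    def gen():
        yield b"partial "
        raise RuntimeError("backend died mid-stream")
    return gen()

# Once "200 OK" and the first chunk are on the wire, the server can
# only terminate the chunked body early; the client sees truncation.
sent_statuses = []
chunks = []
it = flaky_app({}, lambda status, headers: sent_statuses.append(status))
try:
    for chunk in it:
        chunks.append(chunk)
except RuntimeError:
    chunks.append(b"")  # what the adapter does: close the stream
print(sent_statuses[0], chunks)  # 200 OK [b'partial ', b'']
```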
Duplicate Response Headers (Resolved)¶
Both the buffered and streaming paths now construct raw HTTP
responses via conn.send() instead of conn.reply(). This
preserves duplicate headers (e.g. multiple Set-Cookie) since
headers are serialised from the list of tuples returned by the
WSGI start_response callable, without converting to a dict.
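Why list-based serialization preserves duplicates is easy to demonstrate; serialize_headers below is a simplified illustration, not the adapter's actual code:

```python
def serialize_headers(status: str, headers: list[tuple[str, str]]) -> bytes:
    """Serialize the start_response header list directly, preserving
    duplicate names that a dict-based API would collapse."""
    lines = [f"HTTP/1.1 {status}"]
    lines += [f"{name}: {value}" for name, value in headers]
    return ("\r\n".join(lines) + "\r\n\r\n").encode()

raw = serialize_headers("200 OK", [
    ("Set-Cookie", "session=abc"),
    ("Set-Cookie", "theme=dark"),  # would be lost if headers were a dict
    ("Content-Type", "text/html"),
])
print(raw.count(b"Set-Cookie"))  # 2
```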
No Concurrent Streaming Load Test¶
The test suite covers concurrent buffered requests and single
streaming responses, but does not test multiple simultaneous
streaming responses under load. Real-world load testing with tools
like wrk is recommended before deploying the streaming path in
production.
Constants Reference¶
| Constant | Value | Description |
|---|---|---|
| _WAKEUP_MAX_BYTES | 8 KB | Max inline wakeup payload |
| _STREAM_THRESHOLD | 1 MB | Body size that triggers streaming |
| _STREAM_QUEUE_SIZE | 16 | Max pending chunks in stream queue |
| _STREAM_BATCH_SIZE | 256 KB | Max bytes batched before queue push |
| _STREAM_PUT_TIMEOUT | 5.0 s | Timeout for queue put (deadlock prevention) |
See Also¶
- WSGI/ASGI Guide -- user-facing documentation
- Threading Guide -- wakeup payload limits
- Performance Tuning -- benchmarking