Onset Detection
The cysox.onset module provides C-optimized transient detection for locating note onsets, drum hits, and other audio events. It is implemented in Cython with KissFFT acceleration.
Quick Start
from cysox import onset
# Detect onsets in a file
times = onset.detect('drums.wav')
print(f"Found {len(times)} onsets: {times}")
# Use with slice_loop for automatic beat slicing
import cysox
cysox.slice_loop('drums.wav', 'slices/', threshold=0.3)
Functions
onset.detect()
Detect onsets in an audio file. This is the main entry point -- it handles file reading internally.
onset.detect(
    path,
    threshold=0.3,
    sensitivity=1.5,
    min_spacing=0.05,
    method="hfc",
    frame_size=1024,
    hop_size=256,
)
Returns: List of onset times in seconds.
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str or Path | required | Path to audio file. |
| threshold | float | 0.3 | Detection threshold, 0.0-1.0. Lower = more sensitive. |
| sensitivity | float | 1.5 | Peak picking strictness, 1.0-3.0. Higher = fewer detections. |
| min_spacing | float | 0.05 | Minimum time between onsets in seconds (prevents double triggers). |
| method | str | "hfc" | Detection algorithm (see Methods). |
| frame_size | int | 1024 | FFT analysis frame size in samples. |
| hop_size | int | 256 | Hop size between analysis frames in samples. |
Superflux-only parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| n_mels | int | 138 | Number of mel frequency bands. |
| fmin | float | 27.5 | Minimum frequency in Hz. |
| fmax | float | 16000.0 | Maximum frequency in Hz. |
| max_size | int | 3 | Max-filter width along frequency axis. |
| lag | int | 2 | Frame lag for temporal reference comparison. |
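The frame_size and hop_size parameters above determine how many analysis frames a signal produces and how finely onsets can be located in time. The sketch below illustrates that arithmetic only. It assumes full frames with no padding at the signal edges, which may differ from what cysox does internally, and `frame_times` is a hypothetical helper, not part of the cysox API.

```python
def frame_times(num_samples, sample_rate, frame_size=1024, hop_size=256):
    """Return the start time (seconds) of every full analysis frame.

    Assumption: no zero-padding, so a trailing partial frame is dropped.
    """
    if num_samples < frame_size:
        return []
    num_frames = 1 + (num_samples - frame_size) // hop_size
    return [i * hop_size / sample_rate for i in range(num_frames)]


# One second of audio at 44.1 kHz with the documented defaults.
times = frame_times(num_samples=44100, sample_rate=44100)
print(len(times))           # 169 frames
print(times[1] - times[0])  # hop_size / sample_rate, roughly 5.8 ms
```

Under these assumptions, a smaller hop_size gives a finer time grid at the cost of more frames to analyze.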
onset.detect_onsets()
Low-level onset detection on raw sample data. Use this when you already have samples in memory.
onset.detect_onsets(
    samples,
    sample_rate,
    channels,
    threshold=0.3,
    sensitivity=1.5,
    min_spacing=0.05,
    method="hfc",
    frame_size=1024,
    hop_size=256,
)
Returns: List of onset times in seconds.
| Parameter | Type | Default | Description |
|---|---|---|---|
| samples | list[int] | required | Audio samples as int32 values (sox_sample_t format). Interleaved for multi-channel: [L, R, L, R, ...]. |
| sample_rate | int | required | Sample rate in Hz. |
| channels | int | required | Number of audio channels. |
All other parameters are identical to detect().
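If your audio is held as separate float channels, it has to be converted to the interleaved int32 layout described above before it can be passed in. The following is a minimal sketch of that conversion. The helpers `float_to_int32` and `interleave` are hypothetical, and scaling by the largest positive int32 value is an assumption about how floats in [-1.0, 1.0] should map to sox_sample_t.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def float_to_int32(x):
    """Scale a float in [-1.0, 1.0] to a signed 32-bit integer, clipping overflow."""
    value = int(round(x * INT32_MAX))
    return max(INT32_MIN, min(INT32_MAX, value))


def interleave(left, right):
    """Interleave two equal-length channels as [L, R, L, R, ...]."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    samples = []
    for l, r in zip(left, right):
        samples.append(float_to_int32(l))
        samples.append(float_to_int32(r))
    return samples


samples = interleave([0.0, 0.5, 1.0], [0.0, -0.5, -1.0])
print(len(samples))  # 6 values for 3 stereo frames

# The result could then be passed to the documented function, for example:
# onset.detect_onsets(samples, sample_rate=44100, channels=2)
```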
Detection Methods
| Method | Best for | Description |
|---|---|---|
| "hfc" | Percussive transients (drums, plucks) | High-Frequency Content: weights spectral energy by frequency, emphasizing sharp attacks. Fast and accurate for rhythmic material. |
| "flux" | General onsets (tonal + percussive) | Spectral Flux: measures the rate of spectral change between frames. Good all-around choice. |
| "energy" | Loud/quiet transitions | Energy-based: detects changes in overall signal energy. Simplest and fastest, but less precise. |
| "complex" | Tonal onsets (pitch changes) | Complex domain: uses both phase and magnitude analysis. Most accurate for pitched instruments, but slower. |
| "superflux" | Vibrato-heavy material | Superflux (Boeck & Widmer, DAFx 2013): mel-scaled spectral analysis with max-filter for vibrato suppression. Best when other methods produce false positives from vibrato or tremolo. |
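To make the first two rows of the table concrete, here is a pure-Python sketch of the textbook definitions of High-Frequency Content and spectral flux. It is an illustration, not the cysox implementation: it uses a naive DFT instead of KissFFT, applies no window, and uses the common half-wave rectified form of flux. All function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (a real FFT would be used in practice)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def hfc(frame):
    """High-Frequency Content: bin energy weighted by bin index."""
    return sum(k * m * m for k, m in enumerate(magnitude_spectrum(frame)))


def spectral_flux(prev_frame, frame):
    """Spectral flux: summed positive magnitude increase between two frames."""
    prev = magnitude_spectrum(prev_frame)
    cur = magnitude_spectrum(frame)
    return sum(max(0.0, c - p) for p, c in zip(prev, cur))


n = 64
low = [math.sin(2 * math.pi * 2 * i / n) for i in range(n)]    # energy in bin 2
high = [math.sin(2 * math.pi * 20 * i / n) for i in range(n)]  # energy in bin 20

print(hfc(low) < hfc(high))          # True: same amplitude, higher bin weighs more
print(spectral_flux(low, low))       # 0.0: identical frames show no change
print(spectral_flux(low, high) > 0)  # True: new energy appeared in bin 20
```

The frequency weighting is why HFC responds strongly to broadband attacks such as drum hits, while flux responds to any frame-to-frame spectral increase.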
Tuning Parameters
Threshold
The threshold parameter (0.0-1.0) is the primary sensitivity control. It is normalized relative to the maximum onset detection function value, so the same threshold works across different recordings.
- 0.1-0.2: Very sensitive, catches quiet onsets. May produce false positives.
- 0.3: Good default for most drum loops and percussive material.
- 0.5-0.7: Catches only strong, unambiguous onsets.
- 0.8+: Only the loudest hits.
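The normalization described above can be sketched as follows. This assumes the threshold is compared against each value of the onset detection function (ODF) divided by its maximum, which is one reading of the description and may not match the internal implementation exactly. `frames_above_threshold` is a hypothetical helper.

```python
def frames_above_threshold(odf, threshold=0.3):
    """Return indices of frames whose max-normalized ODF value reaches the threshold."""
    peak = max(odf, default=0.0)
    if peak <= 0.0:
        return []  # silent or empty input: nothing to detect
    return [i for i, v in enumerate(odf) if v / peak >= threshold]


quiet = [0.0, 0.1, 1.0, 0.2, 0.5]
loud = [v * 100 for v in quiet]  # same shape, recorded 100x hotter

print(frames_above_threshold(quiet))       # [2, 4]
print(frames_above_threshold(loud))        # [2, 4]
print(frames_above_threshold(quiet, 0.8))  # [2]
```

Because both lists are scaled by their own maximum, the same threshold selects the same frames regardless of recording level.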
Sensitivity
The sensitivity parameter (1.0-3.0) controls peak picking strictness. An onset is only reported if it exceeds sensitivity times the local mean of the detection function.
- 1.0: Permissive -- most peaks are reported.
- 1.5: Default -- good balance.
- 2.0-3.0: Strict -- only peaks that clearly stand out.
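A minimal sketch of this kind of peak picking is shown below. The window used for the local mean (three frames on each side) and the local-maximum test are assumptions made for illustration; the source does not state how cysox defines the neighborhood. `pick_peaks` is a hypothetical helper.

```python
def pick_peaks(odf, sensitivity=1.5, window=3):
    """Return indices of local maxima that exceed sensitivity * local mean."""
    peaks = []
    for i in range(1, len(odf) - 1):
        is_local_max = odf[i] > odf[i - 1] and odf[i] >= odf[i + 1]
        lo = max(0, i - window)
        hi = min(len(odf), i + window + 1)
        local_mean = sum(odf[lo:hi]) / (hi - lo)
        if is_local_max and odf[i] > sensitivity * local_mean:
            peaks.append(i)
    return peaks


# One strong peak (index 2) and one weak bump (index 5).
odf = [0.1, 0.1, 0.9, 0.1, 0.1, 0.3, 0.25, 0.1, 0.1]

print(pick_peaks(odf, sensitivity=1.0))  # [2, 5]
print(pick_peaks(odf, sensitivity=2.0))  # [2]
```

Raising sensitivity removes the weak bump because it no longer stands far enough above its neighborhood.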
min_spacing
Minimum time gap (in seconds) between reported onsets. Prevents double-triggering on a single event. The default of 0.05 s (50 ms) works for most material. Increase to 0.1 s or more for slower tempos.
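The effect of min_spacing can be sketched as a simple filter over candidate onset times. This version keeps the earliest onset in each cluster. Whether cysox keeps the earliest or the strongest candidate is not stated in the source, so treat that choice as an assumption. `enforce_min_spacing` is a hypothetical helper.

```python
def enforce_min_spacing(times, min_spacing=0.05):
    """Drop any onset closer than min_spacing seconds to the last kept onset."""
    kept = []
    for t in sorted(times):
        if not kept or t - kept[-1] >= min_spacing:
            kept.append(t)
    return kept


raw = [0.100, 0.120, 0.500, 0.530, 0.560, 1.000]

print(enforce_min_spacing(raw))                   # [0.1, 0.5, 0.56, 1.0]
print(enforce_min_spacing(raw, min_spacing=0.1))  # [0.1, 0.5, 1.0]
```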
Examples
Detect drum hits
from cysox import onset
times = onset.detect('drums.wav', threshold=0.3, method='hfc')
for i, t in enumerate(times):
    print(f"Hit {i+1}: {t:.3f}s")
Onset-based slicing
import cysox
# Slice at every detected transient
slices = cysox.slice_loop(
    'break.wav',
    'slices/',
    threshold=0.3,
    onset_method='hfc',
    sensitivity=1.5,
)
print(f"Created {len(slices)} slices")
Compare detection methods
from cysox import onset
for method in ['hfc', 'flux', 'energy', 'complex', 'superflux']:
    times = onset.detect('audio.wav', method=method, threshold=0.3)
    print(f"{method:10s}: {len(times)} onsets")