Analysis¶
Loudness metering, spectral features, pitch detection, onset detection, and resampling.
Usage examples¶
Loudness metering¶
```python
from nanodsp import analysis
from nanodsp.buffer import AudioBuffer

buf = AudioBuffer.from_file("input.wav")

# Measure integrated loudness (ITU-R BS.1770-4)
lufs = analysis.loudness_lufs(buf)
print(f"Loudness: {lufs:.1f} LUFS")

# Normalize to -14 LUFS (streaming target)
normalized = analysis.normalize_lufs(buf, target_lufs=-14.0)
```
Spectral features¶
```python
# Brightness tracking
centroid = analysis.spectral_centroid(buf, window_size=2048)

# Spectral spread
bandwidth = analysis.spectral_bandwidth(buf)

# High-frequency rolloff (85th percentile)
rolloff = analysis.spectral_rolloff(buf, percentile=0.85)

# Onset-correlated spectral change
flux = analysis.spectral_flux(buf, rectify=True)

# Noisiness measure (0 = tonal, 1 = noise-like)
flatness = analysis.spectral_flatness_curve(buf)

# Pitch class distribution (12 bins)
chroma = analysis.chromagram(buf, n_chroma=12, tuning_hz=440.0)
```
Pitch detection¶
```python
# YIN algorithm for monophonic f0 estimation
f0, confidence = analysis.pitch_detect(
    buf, method="yin", fmin=80.0, fmax=800.0, threshold=0.2
)
# f0: array of frequency estimates per frame
# confidence: array of confidence values (higher = more reliable)
```
Onset detection¶
```python
# Detect note onsets
onsets = analysis.onset_detect(buf, method="spectral_flux", threshold=0.5)
# onsets: array of frame indices

# With backtracking (move to nearest energy minimum)
onsets = analysis.onset_detect(buf, backtrack=True)
```
Resampling¶
```python
# Polyphase resampling (madronalib backend)
buf_48k = analysis.resample(buf, target_sr=48000.0)

# FFT-based resampling
buf_22k = analysis.resample_fft(buf, target_sr=22050.0)
```
Delay estimation (GCC-PHAT)¶
```python
# Estimate time delay between two microphone signals
# (mic1 and mic2 are AudioBuffer recordings of the same source)
delay_sec, correlation = analysis.gcc_phat(mic1, mic2)
print(f"Estimated delay: {delay_sec * 1000:.2f} ms")
```
API reference¶
analysis¶
Audio analysis: loudness, spectral features, pitch/onset detection, resampling.
loudness_lufs¶
Measure integrated loudness per ITU-R BS.1770-4.
Implements the gated loudness measurement algorithm defined in ITU-R BS.1770-4 (10/2015), "Algorithms to measure audio programme loudness and true-peak audio level."
| RETURNS | DESCRIPTION |
|---|---|
| `float` | Integrated loudness in LUFS. |
References
.. [1] ITU-R BS.1770-4, "Algorithms to measure audio programme loudness and true-peak audio level," International Telecommunication Union, 2015. https://www.itu.int/rec/R-REC-BS.1770
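The two-stage gating scheme defined in BS.1770-4 can be sketched in NumPy. This is a simplified illustration, not nanodsp's implementation: the K-weighting pre-filter and channel weights are omitted (the input is assumed to be an already K-weighted mono signal), and `gated_loudness` is a hypothetical helper name.

```python
import numpy as np

def gated_loudness(x, sr):
    """BS.1770-4 gating on an already K-weighted mono signal."""
    block = int(0.400 * sr)              # 400 ms momentary blocks
    hop = block // 4                     # 75% overlap
    ms = np.array([np.mean(x[i:i + block] ** 2)
                   for i in range(0, len(x) - block + 1, hop)])
    lk = -0.691 + 10 * np.log10(ms + 1e-12)             # per-block loudness
    ms = ms[lk > -70.0]                                 # absolute gate: -70 LUFS
    rel = -0.691 + 10 * np.log10(np.mean(ms)) - 10.0    # relative gate: mean - 10 LU
    lk = -0.691 + 10 * np.log10(ms + 1e-12)
    return -0.691 + 10 * np.log10(np.mean(ms[lk > rel]))
```

Without the K pre-filter a full-scale sine measures -0.691 + 10·log10(0.5) ≈ -3.70 here; the real measurement's frequency weighting shifts a 1 kHz tone toward -3.01 LUFS.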
true_peak_dbtp¶
Measure true peak level per ITU-R BS.1770-4.
Oversamples by 4x to detect inter-sample peaks that exceed the sample peak, then returns the maximum absolute value in dBTP.
| RETURNS | DESCRIPTION |
|---|---|
| `float` | True peak level in dBTP. |
References
.. [1] ITU-R BS.1770-4, "Algorithms to measure audio programme loudness and true-peak audio level," International Telecommunication Union, 2015.
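The 4x oversampling step can be sketched with FFT zero-padding, which is one way to do band-limited interpolation. `true_peak_dbtp_sketch` is an illustrative stand-in, not the library routine.

```python
import numpy as np

def true_peak_dbtp_sketch(x, oversample=4):
    """Approximate true peak: 4x band-limited oversampling, then max |.| in dB."""
    n = len(x)
    X = np.fft.rfft(x)
    # irfft to a longer length zero-pads the spectrum; the factor restores amplitude
    up = np.fft.irfft(X, n * oversample) * oversample
    return 20 * np.log10(np.max(np.abs(up)) + 1e-12)
```

A quarter-band sine sampled off its crest illustrates the point: its sample peak is -3.01 dBFS, but its true peak is 0 dBTP, and the oversampled estimate recovers the inter-sample crest.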
normalize_lufs¶
Normalize loudness to target_lufs.
| PARAMETER | DESCRIPTION |
|---|---|
| `target_lufs` | Target integrated loudness in LUFS. Typical: -23 (broadcast) to -14 (streaming). TYPE: `float` |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If the input is silent or too short to measure. |
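The underlying operation is a single linear gain: LUFS is a logarithmic scale, so moving from a measured loudness to the target multiplies the samples by 10^(ΔdB/20). A minimal sketch on a plain array (`apply_lufs_gain` is a hypothetical helper; the real function measures the loudness itself and validates the input):

```python
import numpy as np

def apply_lufs_gain(x, measured_lufs, target_lufs=-14.0):
    """Scale samples so integrated loudness moves by (target - measured) dB."""
    gain_db = target_lufs - measured_lufs
    return x * 10 ** (gain_db / 20)
```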
spectral_centroid¶

```python
spectral_centroid(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
) -> np.ndarray
```
Weighted mean frequency per STFT frame.
Returns Hz values shaped [num_frames] (mono) or [channels, num_frames].
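Per frame, the centroid is the magnitude-weighted mean of the FFT bin frequencies. A single-frame sketch under the same window-size convention (`spectral_centroid_frame` is a hypothetical helper; the Hann window choice is an assumption):

```python
import numpy as np

def spectral_centroid_frame(frame, sr):
    """Magnitude-weighted mean of bin frequencies for one windowed frame."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
```

For a pure 1 kHz tone the centroid sits at 1 kHz, since the spectral leakage is symmetric around the tone's bin.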
spectral_bandwidth¶

```python
spectral_bandwidth(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
) -> np.ndarray
```
Weighted standard deviation around spectral centroid per frame.
Returns Hz values shaped [num_frames] (mono) or [channels, num_frames].
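The bandwidth is the square root of the magnitude-weighted variance of frequency around the centroid. A single-frame sketch (hypothetical helper, same assumptions as the centroid sketch):

```python
import numpy as np

def spectral_bandwidth_frame(frame, sr):
    """Magnitude-weighted standard deviation of frequency around the centroid."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    w = mag / (np.sum(mag) + 1e-12)            # normalized spectral weights
    centroid = np.sum(freqs * w)
    return np.sqrt(np.sum(w * (freqs - centroid) ** 2))
```

A pure tone yields a bandwidth limited only by window leakage, while broadband noise spreads energy across the whole band.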
spectral_rolloff¶

```python
spectral_rolloff(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
    percentile: float = 0.85,
) -> np.ndarray
```
Frequency below which percentile of spectral energy lies.
| PARAMETER | DESCRIPTION |
|---|---|
| `percentile` | Energy fraction, 0.0--1.0 (e.g. 0.85 = 85th percentile). TYPE: `float` |

| RETURNS | DESCRIPTION |
|---|---|
| `ndarray` | Hz values shaped `[num_frames]` (mono) or `[channels, num_frames]`. |
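Per frame, the rolloff is found by accumulating spectral energy from DC upward and reading off where the running sum crosses the requested fraction. A single-frame sketch (hypothetical helper name):

```python
import numpy as np

def spectral_rolloff_frame(frame, sr, percentile=0.85):
    """Lowest frequency below which `percentile` of the spectral energy lies."""
    mag2 = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    cum = np.cumsum(mag2)                               # running energy total
    idx = np.searchsorted(cum, percentile * cum[-1])    # first crossing bin
    return idx * sr / len(frame)
```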
spectral_flux¶

```python
spectral_flux(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
    rectify: bool = False,
) -> np.ndarray
```
L2 norm of frame-to-frame magnitude difference.
If rectify is True, only positive changes are counted (half-wave rectification). Returns [num_frames] (mono) or [channels, num_frames].
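The computation reduces to a vectorized difference over an STFT magnitude matrix. A sketch that assumes the magnitudes have already been computed, shaped `[num_frames, num_bins]` (`spectral_flux_frames` is a hypothetical helper):

```python
import numpy as np

def spectral_flux_frames(mags, rectify=False):
    """L2 norm of the frame-to-frame magnitude difference."""
    diff = np.diff(mags, axis=0)
    if rectify:
        diff = np.maximum(diff, 0.0)   # half-wave rectification: count increases only
    return np.linalg.norm(diff, axis=1)
```

Rectification is what makes flux onset-correlated: energy appearing counts, energy decaying does not.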
spectral_flatness_curve¶

```python
spectral_flatness_curve(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
) -> np.ndarray
```
Geometric/arithmetic mean ratio per frame (Wiener entropy).
Range [0, 1]: 0=tonal, 1=noise-like. Returns [num_frames] (mono) or [channels, num_frames].
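The measure itself is one line: the ratio of the geometric mean to the arithmetic mean of the magnitude spectrum. A single-frame sketch (hypothetical helper; the geometric mean is computed in log space for numerical stability):

```python
import numpy as np

def spectral_flatness_frame(mag):
    """Geometric mean over arithmetic mean of a magnitude spectrum."""
    mag = np.asarray(mag, dtype=float) + 1e-12   # avoid log(0)
    geo = np.exp(np.mean(np.log(mag)))
    return geo / np.mean(mag)
```

A flat (noise-like) spectrum gives 1.0; a spectrum dominated by one bin (tonal) gives a value near 0.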
chromagram¶

```python
chromagram(
    buf: AudioBuffer,
    window_size: int = 4096,
    hop_size: int | None = None,
    n_chroma: int = 12,
    tuning_hz: float = 440.0,
) -> np.ndarray
```
Pitch class energy distribution.
Maps FFT bins to chroma classes and sums magnitudes. Returns [n_chroma, num_frames] (mono) or [channels, n_chroma, num_frames].
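The bin-to-chroma mapping is the number of semitones above the tuning reference, folded into one octave. A sketch of that mapping (hypothetical helper; treating the tuning reference, A, as pitch class 0 is an assumption, since the page does not state the library's ordering):

```python
import numpy as np

def bin_to_chroma(freq_hz, tuning_hz=440.0, n_chroma=12):
    """Pitch class of a frequency: semitones above tuning, modulo one octave."""
    semitones = n_chroma * np.log2(freq_hz / tuning_hz)
    return int(np.round(semitones)) % n_chroma
```

Octaves collapse onto the same class, which is the point of a chromagram: 440 Hz and 880 Hz both map to class 0.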
pitch_detect¶

```python
pitch_detect(
    buf: AudioBuffer,
    method: str = "yin",
    window_size: int = 2048,
    hop_size: int | None = None,
    fmin: float = 50.0,
    fmax: float = 2000.0,
    threshold: float = 0.2,
) -> tuple[np.ndarray, np.ndarray]
```
Detect fundamental frequency using the YIN algorithm.
Implements the YIN autocorrelation-based F0 estimator with cumulative mean normalized difference function and parabolic interpolation.
| PARAMETER | DESCRIPTION |
|---|---|
| `fmin` | Minimum detectable frequency in Hz, > 0. Typical: 50--200. TYPE: `float` |
| `fmax` | Maximum detectable frequency in Hz, > fmin. Typical: 2000--4000. TYPE: `float` |
| `threshold` | YIN aperiodicity threshold, 0.0--1.0. Lower = stricter voicing detection. Typical: 0.1--0.3. TYPE: `float` |

| RETURNS | DESCRIPTION |
|---|---|
| `tuple[ndarray, ndarray]` | (frequencies, confidences) where frequencies are F0 in Hz (0.0 where unvoiced) and confidences are 0.0--1.0. Shape is `[num_frames]` for each array. |
References
.. [1] A. de Cheveigne and H. Kawahara, "YIN, a fundamental frequency estimator for speech and music," J. Acoust. Soc. Am., vol. 111, no. 4, pp. 1917--1930, 2002.
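The core of YIN can be sketched in a few lines: a lag-domain difference function, its cumulative mean normalization (CMNDF), and an absolute threshold followed by a walk to the local minimum. This simplified single-frame version (`yin_f0` is a hypothetical helper) omits the parabolic interpolation the docstring mentions, so its lag resolution is one sample:

```python
import numpy as np

def yin_f0(frame, sr, fmin=50.0, fmax=2000.0, threshold=0.2):
    """Single-frame YIN estimate; returns 0.0 when no lag passes the threshold."""
    tau_min = int(sr / fmax)
    tau_max = int(sr / fmin)
    n = len(frame)
    d = np.zeros(tau_max)
    for tau in range(1, tau_max):          # difference function d(tau)
        delta = frame[:n - tau] - frame[tau:]
        d[tau] = np.dot(delta, delta)
    cmndf = np.ones(tau_max)               # cumulative mean normalized difference
    running = 0.0
    for tau in range(1, tau_max):
        running += d[tau]
        cmndf[tau] = d[tau] * tau / running if running > 0 else 1.0
    for tau in range(max(tau_min, 1), tau_max):
        if cmndf[tau] < threshold:         # first dip below the threshold...
            while tau + 1 < tau_max and cmndf[tau + 1] < cmndf[tau]:
                tau += 1                   # ...then slide down to the local minimum
            return sr / tau
    return 0.0                             # unvoiced
```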
onset_detect¶

```python
onset_detect(
    buf: AudioBuffer,
    method: str = "spectral_flux",
    window_size: int = 2048,
    hop_size: int | None = None,
    threshold: float | None = None,
    backtrack: bool = False,
    pre_max: int = 3,
    post_max: int = 3,
    pre_avg: int = 3,
    post_avg: int = 3,
    wait: int = 5,
) -> np.ndarray
```
Detect onsets in audio.
Returns sample indices (int64) of detected onsets. Multi-channel input is mixed to mono first.
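The `pre_max`/`post_max`, `pre_avg`/`post_avg`, and `wait` parameters describe a standard moving-window peak picker over the onset-strength envelope: a frame is kept if it is the local maximum, exceeds the local mean by `threshold`, and falls at least `wait` frames after the previous pick. A sketch of that rule (an assumption about the exact scheme; it returns envelope frame indices, whereas the library converts picks to sample indices):

```python
import numpy as np

def peak_pick(env, threshold=0.5, pre_max=3, post_max=3,
              pre_avg=3, post_avg=3, wait=5):
    """Pick peaks in a 1-D onset-strength envelope `env`."""
    picks, last = [], -wait - 1
    for i in range(len(env)):
        win_max = env[max(0, i - pre_max):i + post_max + 1].max()
        win_avg = env[max(0, i - pre_avg):i + post_avg + 1].mean()
        if env[i] == win_max and env[i] >= win_avg + threshold and i - last > wait:
            picks.append(i)
            last = i
    return np.array(picks)
```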
resample¶
Resample audio to a different sample rate.
Uses madronalib Downsampler/Upsampler for power-of-2 ratios (higher quality), and linear interpolation for arbitrary ratios.
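For the arbitrary-ratio path, linear interpolation amounts to mapping each output sample time back onto the input grid. A sketch with `np.interp` (hypothetical helper; note that linear interpolation applies no anti-aliasing filter, which is why the power-of-2 polyphase path is higher quality):

```python
import numpy as np

def resample_linear(x, sr_in, sr_out):
    """Arbitrary-ratio resampling by linear interpolation (no anti-alias filter)."""
    n_out = int(round(len(x) * sr_out / sr_in))
    t_out = np.arange(n_out) * (sr_in / sr_out)   # output times on the input grid
    return np.interp(t_out, np.arange(len(x)), x)
```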
resample_fft¶
Resample audio to a different sample rate using FFT-based method.
| PARAMETER | DESCRIPTION |
|---|---|
| `buf` | Input audio. TYPE: `AudioBuffer` |
| `target_sr` | Target sample rate in Hz. TYPE: `float` |

| RETURNS | DESCRIPTION |
|---|---|
| `AudioBuffer` | Resampled audio with `sample_rate` set to `target_sr`. |
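FFT resampling transforms the whole signal, zero-pads (upsampling) or truncates (downsampling) the spectrum, and inverse-transforms at the new length. A sketch (hypothetical helper; practical implementations also filter or window to tame edge effects on non-periodic signals):

```python
import numpy as np

def resample_fft_sketch(x, sr_in, sr_out):
    """Resample by padding/truncating the rfft spectrum to the new length."""
    n_out = int(round(len(x) * sr_out / sr_in))
    X = np.fft.rfft(x)
    # irfft pads or crops the spectrum to n_out // 2 + 1 bins as needed;
    # the n_out / len(x) factor preserves amplitude.
    return np.fft.irfft(X, n_out) * (n_out / len(x))
```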
gcc_phat¶

```python
gcc_phat(
    buf: AudioBuffer,
    ref: AudioBuffer,
    sample_rate: float | None = None,
) -> tuple[float, np.ndarray]
```
Estimate time delay between two signals using GCC-PHAT.
Implements the Generalized Cross-Correlation with Phase Transform (GCC-PHAT) method for robust time-delay estimation.
| PARAMETER | DESCRIPTION |
|---|---|
| `buf` | Signal of interest (mono or mixed to mono). TYPE: `AudioBuffer` |
| `ref` | Reference signal (mono or mixed to mono). TYPE: `AudioBuffer` |
| `sample_rate` | Override sample rate for delay computation. Defaults to the sample rate of `buf`. TYPE: `float` or `None` |

| RETURNS | DESCRIPTION |
|---|---|
| `tuple[float, ndarray]` | (delay_seconds, correlation): delay in seconds (positive means `buf` is delayed relative to `ref`) and the full GCC-PHAT correlation array. |
References
.. [1] C. Knapp and G. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Process., vol. 24, no. 4, pp. 320--327, 1976.
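GCC-PHAT whitens the cross-spectrum, keeping only its phase, before the inverse transform; this sharpens the correlation peak and is what makes the estimate robust in reverberant or colored-noise conditions. A sketch on plain NumPy arrays (`gcc_phat_sketch` is a hypothetical helper; the library version operates on AudioBuffers):

```python
import numpy as np

def gcc_phat_sketch(sig, ref, sr):
    """Time-delay estimate via Generalized Cross-Correlation with Phase Transform."""
    n = len(sig) + len(ref) - 1                 # zero-pad to avoid circular wrap
    SIG = np.fft.rfft(sig, n)
    REF = np.fft.rfft(ref, n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15                      # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # center zero lag
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / sr, cc                       # positive: sig lags ref
```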