Analysis¶
Loudness metering, spectral features, pitch detection, onset detection, and resampling.
Usage examples¶
Loudness metering¶
```python
from nanodsp import analysis
from nanodsp.buffer import AudioBuffer

buf = AudioBuffer.from_file("input.wav")

# Measure integrated loudness (ITU-R BS.1770-4)
lufs = analysis.loudness_lufs(buf)
print(f"Loudness: {lufs:.1f} LUFS")

# Normalize to -14 LUFS (streaming target)
normalized = analysis.normalize_lufs(buf, target_lufs=-14.0)
```
Spectral features¶
```python
# Brightness tracking
centroid = analysis.spectral_centroid(buf, window_size=2048)

# Spectral spread
bandwidth = analysis.spectral_bandwidth(buf)

# High-frequency rolloff (85th percentile)
rolloff = analysis.spectral_rolloff(buf, percentile=0.85)

# Onset-correlated spectral change
flux = analysis.spectral_flux(buf, rectify=True)

# Noisiness measure (0 = tonal, 1 = noise-like)
flatness = analysis.spectral_flatness_curve(buf)

# Pitch class distribution (12 bins)
chroma = analysis.chromagram(buf, n_chroma=12, tuning_hz=440.0)
```
Pitch detection¶
```python
# YIN algorithm for monophonic f0 estimation
f0, confidence = analysis.pitch_detect(
    buf, method="yin", fmin=80.0, fmax=800.0, threshold=0.2
)
# f0: array of frequency estimates per frame
# confidence: array of confidence values (higher = more reliable)
```
Onset detection¶
```python
# Detect note onsets
onsets = analysis.onset_detect(buf, method="spectral_flux", threshold=0.5)
# onsets: array of frame indices

# With backtracking (move to nearest energy minimum)
onsets = analysis.onset_detect(buf, backtrack=True)
```
Resampling¶
```python
# Polyphase resampling (madronalib backend)
buf_48k = analysis.resample(buf, target_sr=48000.0)

# FFT-based resampling
buf_22k = analysis.resample_fft(buf, target_sr=22050.0)
```
Delay estimation (GCC-PHAT)¶
```python
# Estimate time delay between two microphone signals
# (mic1 and mic2 are AudioBuffer recordings of the same source)
delay_sec, correlation = analysis.gcc_phat(mic1, mic2)
print(f"Estimated delay: {delay_sec * 1000:.2f} ms")
```
API reference¶
analysis¶
Audio analysis: loudness, spectral features, pitch/onset detection, resampling.
loudness_lufs¶
Measure integrated loudness per ITU-R BS.1770-4.
Implements the gated loudness measurement algorithm defined in ITU-R BS.1770-4 (10/2015), "Algorithms to measure audio programme loudness and true-peak audio level."
| RETURNS | DESCRIPTION |
|---|---|
| `float` | Integrated loudness in LUFS. |
References
.. [1] ITU-R BS.1770-4, "Algorithms to measure audio programme loudness and true-peak audio level," International Telecommunication Union, 2015. https://www.itu.int/rec/R-REC-BS.1770
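The two-stage gating scheme defined in BS.1770-4 can be sketched in NumPy. This is a simplified illustration, not nanodsp's implementation: the K-weighting pre-filter and channel weights are omitted (the input is assumed to be an already K-weighted mono signal), and `gated_loudness` is a hypothetical helper name.

```python
import numpy as np

def gated_loudness(x, sr):
    """BS.1770-4 gating on an already K-weighted mono signal."""
    block = int(0.400 * sr)              # 400 ms momentary blocks
    hop = block // 4                     # 75% overlap
    ms = np.array([np.mean(x[i:i + block] ** 2)
                   for i in range(0, len(x) - block + 1, hop)])
    lk = -0.691 + 10 * np.log10(ms + 1e-12)             # per-block loudness
    ms = ms[lk > -70.0]                                 # absolute gate: -70 LUFS
    rel = -0.691 + 10 * np.log10(np.mean(ms)) - 10.0    # relative gate: mean - 10 LU
    lk = -0.691 + 10 * np.log10(ms + 1e-12)
    return -0.691 + 10 * np.log10(np.mean(ms[lk > rel]))
```

Without the K pre-filter a full-scale sine measures -0.691 + 10·log10(0.5) ≈ -3.70 here; the real measurement's frequency weighting shifts a 1 kHz tone toward -3.01 LUFS.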
true_peak_dbtp¶
Measure true peak level per ITU-R BS.1770-4.
Oversamples by 4x to detect inter-sample peaks that exceed the sample peak, then returns the maximum absolute value in dBTP.
| RETURNS | DESCRIPTION |
|---|---|
| `float` | True peak level in dBTP. |
References
.. [1] ITU-R BS.1770-4, "Algorithms to measure audio programme loudness and true-peak audio level," International Telecommunication Union, 2015.
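The 4x oversampling step can be sketched with FFT zero-padding, which is one way to do band-limited interpolation. `true_peak_dbtp_sketch` is an illustrative stand-in, not the library routine.

```python
import numpy as np

def true_peak_dbtp_sketch(x, oversample=4):
    """Approximate true peak: 4x band-limited oversampling, then max |.| in dB."""
    n = len(x)
    X = np.fft.rfft(x)
    # irfft to a longer length zero-pads the spectrum; the factor restores amplitude
    up = np.fft.irfft(X, n * oversample) * oversample
    return 20 * np.log10(np.max(np.abs(up)) + 1e-12)
```

A quarter-band sine sampled off its crest illustrates the point: its sample peak is -3.01 dBFS, but its true peak is 0 dBTP, and the oversampled estimate recovers the inter-sample crest.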
normalize_lufs¶
Normalize loudness to target_lufs.
| PARAMETER | DESCRIPTION |
|---|---|
| `target_lufs` | Target integrated loudness in LUFS. Typical: -23 (broadcast) to -14 (streaming). TYPE: `float` |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If the input is silent or too short to measure. |
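The underlying operation is a single linear gain: LUFS is a logarithmic scale, so moving from a measured loudness to the target multiplies the samples by 10^(ΔdB/20). A minimal sketch on a plain array (`apply_lufs_gain` is a hypothetical helper; the real function measures the loudness itself and validates the input):

```python
import numpy as np

def apply_lufs_gain(x, measured_lufs, target_lufs=-14.0):
    """Scale samples so integrated loudness moves by (target - measured) dB."""
    gain_db = target_lufs - measured_lufs
    return x * 10 ** (gain_db / 20)
```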
spectral_centroid¶

```python
spectral_centroid(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
) -> np.ndarray
```
Weighted mean frequency per STFT frame.
Returns Hz values shaped [num_frames] (mono) or [channels, num_frames].
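Per frame, the centroid is the magnitude-weighted mean of the FFT bin frequencies. A single-frame sketch under the same window-size convention (`spectral_centroid_frame` is a hypothetical helper; the Hann window choice is an assumption):

```python
import numpy as np

def spectral_centroid_frame(frame, sr):
    """Magnitude-weighted mean of bin frequencies for one windowed frame."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
```

For a pure 1 kHz tone the centroid sits at 1 kHz, since the spectral leakage is symmetric around the tone's bin.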
spectral_bandwidth¶

```python
spectral_bandwidth(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
) -> np.ndarray
```
Weighted standard deviation around spectral centroid per frame.
Returns Hz values shaped [num_frames] (mono) or [channels, num_frames].
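The bandwidth is the square root of the magnitude-weighted variance of frequency around the centroid. A single-frame sketch (hypothetical helper, same assumptions as the centroid sketch):

```python
import numpy as np

def spectral_bandwidth_frame(frame, sr):
    """Magnitude-weighted standard deviation of frequency around the centroid."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    w = mag / (np.sum(mag) + 1e-12)            # normalized spectral weights
    centroid = np.sum(freqs * w)
    return np.sqrt(np.sum(w * (freqs - centroid) ** 2))
```

A pure tone yields a bandwidth limited only by window leakage, while broadband noise spreads energy across the whole band.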
spectral_rolloff¶

```python
spectral_rolloff(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
    percentile: float = 0.85,
) -> np.ndarray
```
Frequency below which percentile of spectral energy lies.
| PARAMETER | DESCRIPTION |
|---|---|
| `percentile` | Energy fraction, 0.0--1.0 (e.g. 0.85 = 85th percentile). TYPE: `float` |

| RETURNS | DESCRIPTION |
|---|---|
| `ndarray` | Hz values shaped `[num_frames]` (mono) or `[channels, num_frames]`. |
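Per frame, the rolloff is found by accumulating spectral energy from DC upward and reading off where the running sum crosses the requested fraction. A single-frame sketch (hypothetical helper name):

```python
import numpy as np

def spectral_rolloff_frame(frame, sr, percentile=0.85):
    """Lowest frequency below which `percentile` of the spectral energy lies."""
    mag2 = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    cum = np.cumsum(mag2)                               # running energy total
    idx = np.searchsorted(cum, percentile * cum[-1])    # first crossing bin
    return idx * sr / len(frame)
```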
spectral_flux¶

```python
spectral_flux(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
    rectify: bool = False,
) -> np.ndarray
```
L2 norm of frame-to-frame magnitude difference.
If rectify is True, only positive changes are counted (half-wave rectification). Returns [num_frames] (mono) or [channels, num_frames].
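The computation reduces to a vectorized difference over an STFT magnitude matrix. A sketch that assumes the magnitudes have already been computed, shaped `[num_frames, num_bins]` (`spectral_flux_frames` is a hypothetical helper):

```python
import numpy as np

def spectral_flux_frames(mags, rectify=False):
    """L2 norm of the frame-to-frame magnitude difference."""
    diff = np.diff(mags, axis=0)
    if rectify:
        diff = np.maximum(diff, 0.0)   # half-wave rectification: count increases only
    return np.linalg.norm(diff, axis=1)
```

Rectification is what makes flux onset-correlated: energy appearing counts, energy decaying does not.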
spectral_flatness_curve¶

```python
spectral_flatness_curve(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
) -> np.ndarray
```
Geometric/arithmetic mean ratio per frame (Wiener entropy).
Range [0, 1]: 0=tonal, 1=noise-like. Returns [num_frames] (mono) or [channels, num_frames].
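The measure itself is one line: the ratio of the geometric mean to the arithmetic mean of the magnitude spectrum. A single-frame sketch (hypothetical helper; the geometric mean is computed in log space for numerical stability):

```python
import numpy as np

def spectral_flatness_frame(mag):
    """Geometric mean over arithmetic mean of a magnitude spectrum."""
    mag = np.asarray(mag, dtype=float) + 1e-12   # avoid log(0)
    geo = np.exp(np.mean(np.log(mag)))
    return geo / np.mean(mag)
```

A flat (noise-like) spectrum gives 1.0; a spectrum dominated by one bin (tonal) gives a value near 0.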
chromagram¶

```python
chromagram(
    buf: AudioBuffer,
    window_size: int = 4096,
    hop_size: int | None = None,
    n_chroma: int = 12,
    tuning_hz: float = 440.0,
) -> np.ndarray
```
Pitch class energy distribution.
Maps FFT bins to chroma classes and sums magnitudes. Returns [n_chroma, num_frames] (mono) or [channels, n_chroma, num_frames].
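The bin-to-chroma mapping is the number of semitones above the tuning reference, folded into one octave. A sketch of that mapping (hypothetical helper; treating the tuning reference, A, as pitch class 0 is an assumption, since the page does not state the library's ordering):

```python
import numpy as np

def bin_to_chroma(freq_hz, tuning_hz=440.0, n_chroma=12):
    """Pitch class of a frequency: semitones above tuning, modulo one octave."""
    semitones = n_chroma * np.log2(freq_hz / tuning_hz)
    return int(np.round(semitones)) % n_chroma
```

Octaves collapse onto the same class, which is the point of a chromagram: 440 Hz and 880 Hz both map to class 0.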
pitch_detect¶

```python
pitch_detect(
    buf: AudioBuffer,
    method: str = "yin",
    window_size: int = 2048,
    hop_size: int | None = None,
    fmin: float = 50.0,
    fmax: float = 2000.0,
    threshold: float = 0.2,
) -> tuple[np.ndarray, np.ndarray]
```
Detect fundamental frequency using the YIN algorithm.
Implements the YIN autocorrelation-based F0 estimator with cumulative mean normalized difference function and parabolic interpolation.
| PARAMETER | DESCRIPTION |
|---|---|
| `fmin` | Minimum detectable frequency in Hz, > 0. Typical: 50--200. TYPE: `float` |
| `fmax` | Maximum detectable frequency in Hz, > fmin. Typical: 2000--4000. TYPE: `float` |
| `threshold` | YIN aperiodicity threshold, 0.0--1.0. Lower = stricter voicing detection. Typical: 0.1--0.3. TYPE: `float` |

| RETURNS | DESCRIPTION |
|---|---|
| `tuple[ndarray, ndarray]` | (frequencies, confidences) where frequencies are F0 in Hz (0.0 where unvoiced) and confidences are 0.0--1.0. Shape is `[num_frames]` for each array. |
References
.. [1] A. de Cheveigne and H. Kawahara, "YIN, a fundamental frequency estimator for speech and music," J. Acoust. Soc. Am., vol. 111, no. 4, pp. 1917--1930, 2002.
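The core of YIN can be sketched in a few lines: a lag-domain difference function, its cumulative mean normalization (CMNDF), and an absolute threshold followed by a walk to the local minimum. This simplified single-frame version (`yin_f0` is a hypothetical helper) omits the parabolic interpolation the docstring mentions, so its lag resolution is one sample:

```python
import numpy as np

def yin_f0(frame, sr, fmin=50.0, fmax=2000.0, threshold=0.2):
    """Single-frame YIN estimate; returns 0.0 when no lag passes the threshold."""
    tau_min = int(sr / fmax)
    tau_max = int(sr / fmin)
    n = len(frame)
    d = np.zeros(tau_max)
    for tau in range(1, tau_max):          # difference function d(tau)
        delta = frame[:n - tau] - frame[tau:]
        d[tau] = np.dot(delta, delta)
    cmndf = np.ones(tau_max)               # cumulative mean normalized difference
    running = 0.0
    for tau in range(1, tau_max):
        running += d[tau]
        cmndf[tau] = d[tau] * tau / running if running > 0 else 1.0
    for tau in range(max(tau_min, 1), tau_max):
        if cmndf[tau] < threshold:         # first dip below the threshold...
            while tau + 1 < tau_max and cmndf[tau + 1] < cmndf[tau]:
                tau += 1                   # ...then slide down to the local minimum
            return sr / tau
    return 0.0                             # unvoiced
```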
onset_detect¶

```python
onset_detect(
    buf: AudioBuffer,
    method: str = "spectral_flux",
    window_size: int = 2048,
    hop_size: int | None = None,
    threshold: float | None = None,
    backtrack: bool = False,
    pre_max: int = 3,
    post_max: int = 3,
    pre_avg: int = 3,
    post_avg: int = 3,
    wait: int = 5,
) -> np.ndarray
```
Detect onsets in audio.
Returns sample indices (int64) of detected onsets. Multi-channel input is mixed to mono first.
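The `pre_max`/`post_max`, `pre_avg`/`post_avg`, and `wait` parameters describe a standard moving-window peak picker over the onset-strength envelope: a frame is kept if it is the local maximum, exceeds the local mean by `threshold`, and falls at least `wait` frames after the previous pick. A sketch of that rule (an assumption about the exact scheme; it returns envelope frame indices, whereas the library converts picks to sample indices):

```python
import numpy as np

def peak_pick(env, threshold=0.5, pre_max=3, post_max=3,
              pre_avg=3, post_avg=3, wait=5):
    """Pick peaks in a 1-D onset-strength envelope `env`."""
    picks, last = [], -wait - 1
    for i in range(len(env)):
        win_max = env[max(0, i - pre_max):i + post_max + 1].max()
        win_avg = env[max(0, i - pre_avg):i + post_avg + 1].mean()
        if env[i] == win_max and env[i] >= win_avg + threshold and i - last > wait:
            picks.append(i)
            last = i
    return np.array(picks)
```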
resample¶
Resample audio to a different sample rate.
Uses madronalib Downsampler/Upsampler for power-of-2 ratios (higher quality), and linear interpolation for arbitrary ratios.
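For the arbitrary-ratio path, linear interpolation amounts to mapping each output sample time back onto the input grid. A sketch with `np.interp` (hypothetical helper; note that linear interpolation applies no anti-aliasing filter, which is why the power-of-2 polyphase path is higher quality):

```python
import numpy as np

def resample_linear(x, sr_in, sr_out):
    """Arbitrary-ratio resampling by linear interpolation (no anti-alias filter)."""
    n_out = int(round(len(x) * sr_out / sr_in))
    t_out = np.arange(n_out) * (sr_in / sr_out)   # output times on the input grid
    return np.interp(t_out, np.arange(len(x)), x)
```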
resample_fft¶
Resample audio to a different sample rate using FFT-based method.
| PARAMETER | DESCRIPTION |
|---|---|
| `buf` | Input audio. TYPE: `AudioBuffer` |
| `target_sr` | Target sample rate in Hz. TYPE: `float` |

| RETURNS | DESCRIPTION |
|---|---|
| `AudioBuffer` | Resampled audio with `sample_rate` set to `target_sr`. |
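FFT resampling transforms the whole signal, zero-pads (upsampling) or truncates (downsampling) the spectrum, and inverse-transforms at the new length. A sketch (hypothetical helper; practical implementations also filter or window to tame edge effects on non-periodic signals):

```python
import numpy as np

def resample_fft_sketch(x, sr_in, sr_out):
    """Resample by padding/truncating the rfft spectrum to the new length."""
    n_out = int(round(len(x) * sr_out / sr_in))
    X = np.fft.rfft(x)
    # irfft pads or crops the spectrum to n_out // 2 + 1 bins as needed;
    # the n_out / len(x) factor preserves amplitude.
    return np.fft.irfft(X, n_out) * (n_out / len(x))
```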
gcc_phat¶

```python
gcc_phat(
    buf: AudioBuffer,
    ref: AudioBuffer,
    sample_rate: float | None = None,
) -> tuple[float, np.ndarray]
```
Estimate time delay between two signals using GCC-PHAT.
Implements the Generalized Cross-Correlation with Phase Transform (GCC-PHAT) method for robust time-delay estimation.
| PARAMETER | DESCRIPTION |
|---|---|
| `buf` | Signal of interest (mono or mixed to mono). TYPE: `AudioBuffer` |
| `ref` | Reference signal (mono or mixed to mono). TYPE: `AudioBuffer` |
| `sample_rate` | Override sample rate for delay computation. Defaults to the sample rate of `buf`. TYPE: `float` or `None` |

| RETURNS | DESCRIPTION |
|---|---|
| `tuple[float, ndarray]` | (delay_seconds, correlation): delay in seconds (positive means `buf` is delayed relative to `ref`) and the full GCC-PHAT correlation array. |
References
.. [1] C. Knapp and G. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Process., vol. 24, no. 4, pp. 320--327, 1976.
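GCC-PHAT whitens the cross-spectrum, keeping only its phase, before the inverse transform; this sharpens the correlation peak and is what makes the estimate robust in reverberant or colored-noise conditions. A sketch on plain NumPy arrays (`gcc_phat_sketch` is a hypothetical helper; the library version operates on AudioBuffers):

```python
import numpy as np

def gcc_phat_sketch(sig, ref, sr):
    """Time-delay estimate via Generalized Cross-Correlation with Phase Transform."""
    n = len(sig) + len(ref) - 1                 # zero-pad to avoid circular wrap
    SIG = np.fft.rfft(sig, n)
    REF = np.fft.rfft(ref, n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15                      # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # center zero lag
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / sr, cc                       # positive: sig lags ref
```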