
Analysis

Loudness metering, spectral features, pitch detection, onset detection, and resampling.

Usage examples

Loudness metering

from nanodsp import analysis
from nanodsp.buffer import AudioBuffer

buf = AudioBuffer.from_file("input.wav")

# Measure integrated loudness (ITU-R BS.1770-4)
lufs = analysis.loudness_lufs(buf)
print(f"Loudness: {lufs:.1f} LUFS")

# Normalize to -14 LUFS (streaming target)
normalized = analysis.normalize_lufs(buf, target_lufs=-14.0)

Spectral features

# Brightness tracking
centroid = analysis.spectral_centroid(buf, window_size=2048)

# Spectral spread
bandwidth = analysis.spectral_bandwidth(buf)

# High-frequency rolloff (85th percentile)
rolloff = analysis.spectral_rolloff(buf, percentile=0.85)

# Onset-correlated spectral change
flux = analysis.spectral_flux(buf, rectify=True)

# Noisiness measure (0 = tonal, 1 = noise-like)
flatness = analysis.spectral_flatness_curve(buf)

# Pitch class distribution (12 bins)
chroma = analysis.chromagram(buf, n_chroma=12, tuning_hz=440.0)

Pitch detection

# YIN algorithm for monophonic f0 estimation
f0, confidence = analysis.pitch_detect(
    buf, method="yin", fmin=80.0, fmax=800.0, threshold=0.2
)
# f0: array of frequency estimates per frame
# confidence: array of confidence values (higher = more reliable)

Onset detection

# Detect note onsets
onsets = analysis.onset_detect(buf, method="spectral_flux", threshold=0.5)
# onsets: array of sample indices

# With backtracking (shift each onset back to the preceding energy minimum)
onsets = analysis.onset_detect(buf, backtrack=True)

Resampling

# Polyphase resampling (madronalib backend)
buf_48k = analysis.resample(buf, target_sr=48000.0)

# FFT-based resampling
buf_22k = analysis.resample_fft(buf, target_sr=22050.0)

Delay estimation (GCC-PHAT)

# Estimate time delay between two microphone signals
delay_sec, correlation = analysis.gcc_phat(mic1, mic2)
print(f"Estimated delay: {delay_sec * 1000:.2f} ms")

API reference

analysis

Audio analysis: loudness, spectral features, pitch/onset detection, resampling.

loudness_lufs

loudness_lufs(buf: AudioBuffer) -> float

Measure integrated loudness per ITU-R BS.1770-4.

Implements the gated loudness measurement algorithm defined in ITU-R BS.1770-4 (10/2015), "Algorithms to measure audio programme loudness and true-peak audio level."

Returns
float

Integrated loudness in LUFS. Returns -inf for silence or signals shorter than 400 ms.

References

.. [1] ITU-R BS.1770-4, "Algorithms to measure audio programme loudness and true-peak audio level," International Telecommunication Union, 2015. https://www.itu.int/rec/R-REC-BS.1770
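The gating scheme above can be sketched in NumPy. This is a simplified illustration of BS.1770 gating only, not nanodsp's implementation: the K-weighting pre-filter is omitted, so the absolute numbers differ slightly from a compliant meter. `gated_loudness` is a hypothetical helper name.

```python
import numpy as np

def gated_loudness(x, sr):
    """Simplified BS.1770 gating; the K-weighting pre-filter is omitted."""
    block = int(0.400 * sr)                    # 400 ms gating blocks
    hop = int(0.100 * sr)                      # 75% overlap
    ms = np.array([np.mean(x[i:i + block] ** 2)
                   for i in range(0, len(x) - block + 1, hop)])
    if ms.size == 0:
        return float("-inf")                   # shorter than one block
    lk = -0.691 + 10 * np.log10(ms + 1e-12)    # per-block loudness
    keep = lk > -70.0                          # absolute gate at -70 LUFS
    if not keep.any():
        return float("-inf")                   # silence
    # Relative gate: 10 LU below the mean of the absolutely gated blocks
    rel = -0.691 + 10 * np.log10(ms[keep].mean()) - 10.0
    keep &= lk > rel
    return -0.691 + 10 * np.log10(ms[keep].mean())

sr = 48000
t = np.arange(2 * sr) / sr
sine = np.sin(2 * np.pi * 1000.0 * t)          # full-scale 1 kHz sine
print(f"{gated_loudness(sine, sr):.2f} LUFS")  # -3.70 LUFS
```

A full-scale sine has mean square 0.5, so every block measures -0.691 + 10·log10(0.5) ≈ -3.70 here; a compliant meter with K-weighting reads about -3.01 LUFS for the same signal.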

true_peak_dbtp

true_peak_dbtp(buf: AudioBuffer) -> float

Measure true peak level per ITU-R BS.1770-4.

Oversamples by 4x to detect inter-sample peaks that exceed the sample peak, then returns the maximum absolute value in dBTP.

Returns
float

True peak level in dBTP. Returns -inf for silence.

References

.. [1] ITU-R BS.1770-4, "Algorithms to measure audio programme loudness and true-peak audio level," International Telecommunication Union, 2015.
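The 4x-oversampling idea can be illustrated with an FFT zero-padding interpolator. This is a sketch of the technique, not nanodsp's resampler (the standard specifies a particular interpolation filter); `true_peak_dbtp` here is an illustrative stand-in.

```python
import numpy as np

def true_peak_dbtp(x, oversample=4):
    """True-peak estimate via FFT zero-padded 4x oversampling (sketch)."""
    X = np.fft.rfft(x)
    up = np.fft.irfft(X, len(x) * oversample) * oversample  # interpolate
    return 20 * np.log10(np.max(np.abs(up)) + 1e-12)

# A sine whose samples straddle its crests: the sample peak underestimates
x = np.sin(2 * np.pi * 0.25 * np.arange(64) + np.pi / 4)
sample_peak_db = 20 * np.log10(np.max(np.abs(x)))
print(f"sample peak {sample_peak_db:.2f} dBFS, "
      f"true peak {true_peak_dbtp(x):.2f} dBTP")  # -3.01 dBFS vs 0.00 dBTP
```

The sampled values never exceed 0.707, but the underlying waveform peaks at 1.0 between samples; oversampling recovers that inter-sample peak.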

normalize_lufs

normalize_lufs(
    buf: AudioBuffer, target_lufs: float = -14.0
) -> AudioBuffer

Normalize loudness to target_lufs.

Parameters

target_lufs (float, default -14.0)

Target integrated loudness in LUFS. Typical: -23 (broadcast) to -14 (streaming).

Raises
ValueError

If the input is silent or too short to measure.
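The underlying gain math is a one-liner: because LUFS is a dB-style scale, moving a signal from its measured loudness to the target is a plain linear gain. A minimal sketch (`normalize_to_lufs` is an illustrative helper, not the nanodsp function, which measures the loudness itself):

```python
import numpy as np

def normalize_to_lufs(x, measured_lufs, target_lufs=-14.0):
    """Apply the linear gain that moves the measured loudness to the target."""
    gain_db = target_lufs - measured_lufs      # loudness gain is plain dB gain
    return x * 10.0 ** (gain_db / 20.0)

x = np.array([0.1, -0.2, 0.3])
y = normalize_to_lufs(x, measured_lufs=-20.0, target_lufs=-14.0)  # +6 dB
```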

spectral_centroid

spectral_centroid(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
) -> np.ndarray

Weighted mean frequency per STFT frame.

Returns Hz values shaped [num_frames] (mono) or [channels, num_frames].
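Per frame, the centroid is the magnitude-weighted mean of the bin frequencies. A single-frame NumPy sketch (`frame_centroid` is illustrative; window choice is an assumption):

```python
import numpy as np

def frame_centroid(frame, sr):
    """Magnitude-weighted mean frequency of one windowed frame."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)

sr = 16000
frame = np.sin(2 * np.pi * 1000.0 * np.arange(2048) / sr)
print(f"{frame_centroid(frame, sr):.0f} Hz")   # a pure tone centers on itself
```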

spectral_bandwidth

spectral_bandwidth(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
) -> np.ndarray

Weighted standard deviation around spectral centroid per frame.

Returns Hz values shaped [num_frames] (mono) or [channels, num_frames].
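The bandwidth is the magnitude-weighted standard deviation of frequency about that centroid. A minimal sketch on one magnitude spectrum (`frame_bandwidth` is an illustrative helper):

```python
import numpy as np

def frame_bandwidth(mag, freqs):
    """Magnitude-weighted standard deviation of frequency about the centroid."""
    w = mag / (np.sum(mag) + 1e-12)
    centroid = np.sum(freqs * w)
    return np.sqrt(np.sum(w * (freqs - centroid) ** 2))

# Two equal peaks 200 Hz apart: centroid 200 Hz, bandwidth 100 Hz
mag = np.array([1.0, 1.0])
freqs = np.array([100.0, 300.0])
print(frame_bandwidth(mag, freqs))
```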

spectral_rolloff

spectral_rolloff(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
    percentile: float = 0.85,
) -> np.ndarray

Frequency below which the given fraction of spectral energy lies.

Parameters

percentile (float, default 0.85)

Energy fraction, 0.0--1.0 (e.g. 0.85 = 85th percentile).

Returns
ndarray

Hz values shaped [num_frames] (mono) or [channels, num_frames].
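Per frame, the rolloff is found by accumulating bin energies and locating where the running sum crosses the target fraction. A one-frame sketch (`frame_rolloff` is an illustrative helper):

```python
import numpy as np

def frame_rolloff(mag, freqs, percentile=0.85):
    """Lowest frequency below which `percentile` of the spectral energy lies."""
    energy = mag ** 2
    cum = np.cumsum(energy)
    idx = np.searchsorted(cum, percentile * cum[-1])
    return freqs[min(idx, len(freqs) - 1)]

# All energy in one bin: the rolloff lands on that bin's frequency
mag = np.zeros(100)
mag[30] = 1.0
freqs = np.arange(100) * 10.0
print(frame_rolloff(mag, freqs))   # 300.0
```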

spectral_flux

spectral_flux(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
    rectify: bool = False,
) -> np.ndarray

L2 norm of frame-to-frame magnitude difference.

If rectify is True, only positive changes are counted (half-wave rectification). Returns [num_frames] (mono) or [channels, num_frames].
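Given a magnitude spectrogram, the computation reduces to a diff and a norm. A sketch operating on precomputed frame magnitudes (`spectral_flux` here takes a `[frames, bins]` array, unlike the nanodsp function, which takes an AudioBuffer):

```python
import numpy as np

def spectral_flux(mags, rectify=False):
    """L2 norm of frame-to-frame magnitude change; `mags` is [frames, bins]."""
    diff = np.diff(mags, axis=0)
    if rectify:
        diff = np.maximum(diff, 0.0)   # half-wave rectify: count increases only
    return np.sqrt(np.sum(diff ** 2, axis=1))

mags = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
print(spectral_flux(mags))                       # [5. 0.]
print(spectral_flux(mags[::-1], rectify=True))   # decreases ignored: [0. 0.]
```

Half-wave rectification is what makes flux onset-correlated: energy arriving registers, energy decaying does not.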

spectral_flatness_curve

spectral_flatness_curve(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
) -> np.ndarray

Geometric/arithmetic mean ratio per frame (Wiener entropy).

Range [0, 1]: 0=tonal, 1=noise-like. Returns [num_frames] (mono) or [channels, num_frames].
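The per-frame measure is just the ratio of geometric to arithmetic mean of the magnitude spectrum. A sketch on a single spectrum (`flatness` is an illustrative helper):

```python
import numpy as np

def flatness(mag):
    """Geometric over arithmetic mean of a magnitude spectrum (Wiener entropy)."""
    mag = np.asarray(mag) + 1e-12      # avoid log(0)
    return np.exp(np.mean(np.log(mag))) / np.mean(mag)

print(flatness(np.ones(512)))          # flat (noise-like) spectrum: ~1.0
peaky = np.zeros(512)
peaky[5] = 1.0
print(flatness(peaky))                 # single tone: ~0.0
```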

chromagram

chromagram(
    buf: AudioBuffer,
    window_size: int = 4096,
    hop_size: int | None = None,
    n_chroma: int = 12,
    tuning_hz: float = 440.0,
) -> np.ndarray

Pitch class energy distribution.

Maps FFT bins to chroma classes and sums magnitudes. Returns [n_chroma, num_frames] (mono) or [channels, n_chroma, num_frames].
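The bin-to-chroma mapping step can be shown on its own: convert frequency to a MIDI-style note number relative to the tuning reference, then fold modulo 12. This sketches only the mapping, not the magnitude summation; `chroma_class` and the C-based index order are assumptions, not nanodsp's documented layout.

```python
import numpy as np

def chroma_class(freqs, tuning_hz=440.0, n_chroma=12):
    """Map frequencies (Hz) to pitch-class indices, 0 = C .. 11 = B."""
    midi = 69 + 12 * np.log2(np.asarray(freqs) / tuning_hz)
    return np.round(midi).astype(int) % n_chroma

print(chroma_class([261.63, 440.0, 880.0]))   # [0 9 9]: C, A, A (octave folds)
```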

pitch_detect

pitch_detect(
    buf: AudioBuffer,
    method: str = "yin",
    window_size: int = 2048,
    hop_size: int | None = None,
    fmin: float = 50.0,
    fmax: float = 2000.0,
    threshold: float = 0.2,
) -> tuple[np.ndarray, np.ndarray]

Detect fundamental frequency using the YIN algorithm.

Implements the YIN autocorrelation-based F0 estimator with cumulative mean normalized difference function and parabolic interpolation.

Parameters

fmin (float, default 50.0)

Minimum detectable frequency in Hz, > 0. Typical: 50--200.

fmax (float, default 2000.0)

Maximum detectable frequency in Hz, > fmin. Typical: 2000--4000.

threshold (float, default 0.2)

YIN aperiodicity threshold, 0.0--1.0. Lower = stricter voicing detection. Typical: 0.1--0.3.

Returns
tuple[ndarray, ndarray]

(frequencies, confidences) where frequencies are F0 in Hz (0.0 where unvoiced) and confidences are 0.0--1.0. Shape is [num_frames] for mono or [channels, num_frames] for multi-channel.

References

.. [1] A. de Cheveigné and H. Kawahara, "YIN, a fundamental frequency estimator for speech and music," J. Acoust. Soc. Am., vol. 111, no. 4, pp. 1917--1930, 2002.

onset_detect

onset_detect(
    buf: AudioBuffer,
    method: str = "spectral_flux",
    window_size: int = 2048,
    hop_size: int | None = None,
    threshold: float | None = None,
    backtrack: bool = False,
    pre_max: int = 3,
    post_max: int = 3,
    pre_avg: int = 3,
    post_avg: int = 3,
    wait: int = 5,
) -> np.ndarray

Detect onsets in audio.

Returns sample indices (int64) of detected onsets. Multi-channel input is mixed to mono first.
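The pre/post max, pre/post avg, and wait parameters describe a peak-picking pass over the onset-strength envelope. A sketch of that style of picker, operating on a precomputed envelope in frames; the parameter names mirror the signature above, but the exact semantics (and `delta` standing in for `threshold`) are assumptions:

```python
import numpy as np

def pick_peaks(env, pre_max=3, post_max=3, pre_avg=3, post_avg=3,
               delta=0.5, wait=5):
    """Pick peaks that are local maxima, exceed the local mean by `delta`,
    and fall at least `wait` frames after the previous pick."""
    picks, last = [], -(wait + 1)
    for n in range(len(env)):
        w_max = env[max(0, n - pre_max):n + post_max + 1]
        w_avg = env[max(0, n - pre_avg):n + post_avg + 1]
        if (env[n] == w_max.max()
                and env[n] >= w_avg.mean() + delta
                and n - last > wait):
            picks.append(n)
            last = n
    return np.array(picks)

env = np.zeros(50)
env[[10, 30]] = 2.0
print(pick_peaks(env))    # [10 30]
```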

resample

resample(buf: AudioBuffer, target_sr: float) -> AudioBuffer

Resample audio to a different sample rate.

Uses madronalib Downsampler/Upsampler for power-of-2 ratios (higher quality), and linear interpolation for arbitrary ratios.

resample_fft

resample_fft(
    buf: AudioBuffer, target_sr: float
) -> AudioBuffer

Resample audio to a different sample rate using FFT-based method.

Parameters

buf (AudioBuffer)

Input audio.

target_sr (float)

Target sample rate in Hz.

Returns
AudioBuffer

Resampled audio with sample_rate set to target_sr.
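The FFT method amounts to truncating (downsampling) or zero-padding (upsampling) the spectrum and rescaling. A NumPy sketch on raw sample arrays, ignoring Nyquist-bin subtleties; this illustrates the technique, not nanodsp's exact implementation:

```python
import numpy as np

def resample_fft(x, sr, target_sr):
    """FFT-domain resampling: truncate or zero-pad the spectrum (sketch)."""
    n_out = int(round(len(x) * target_sr / sr))
    X = np.fft.rfft(x)
    Y = np.zeros(n_out // 2 + 1, dtype=complex)
    m = min(len(X), len(Y))
    Y[:m] = X[:m]                       # keep the bins both rates share
    return np.fft.irfft(Y, n_out) * (n_out / len(x))   # fix amplitude scale

sr, target = 48000.0, 24000.0
x = np.sin(2 * np.pi * 1000.0 * np.arange(4800) / sr)  # 0.1 s of 1 kHz
y = resample_fft(x, sr, target)
print(len(y), round(float(np.max(np.abs(y))), 6))      # 2400 1.0
```

Truncation acts as an ideal brick-wall anti-alias filter, which is why FFT resampling trades latency and block processing for very clean passband behavior.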

gcc_phat

gcc_phat(
    buf: AudioBuffer,
    ref: AudioBuffer,
    sample_rate: float | None = None,
) -> tuple[float, np.ndarray]

Estimate time delay between two signals using GCC-PHAT.

Implements the Generalized Cross-Correlation with Phase Transform (GCC-PHAT) method for robust time-delay estimation.

Parameters

buf (AudioBuffer)

Signal of interest (mono or mixed to mono).

ref (AudioBuffer)

Reference signal (mono or mixed to mono).

sample_rate (float or None, default None)

Override sample rate for delay computation. Defaults to buf.sample_rate.

Returns
tuple[float, ndarray]

(delay_seconds, correlation) -- delay in seconds (positive means buf is delayed relative to ref), and the full GCC-PHAT correlation array.

References

.. [1] C. Knapp and G. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Process., vol. 24, no. 4, pp. 320--327, 1976.
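The PHAT weighting divides the cross-spectrum by its own magnitude, keeping only phase, which makes the correlation peak sharp and robust to spectral coloration. A NumPy sketch on raw arrays (`gcc_phat` here is an illustrative re-derivation, not nanodsp's function, which takes AudioBuffers):

```python
import numpy as np

def gcc_phat(sig, ref, sr):
    """GCC-PHAT: whiten the cross-spectrum, then peak-pick its inverse FFT."""
    n = len(sig) + len(ref)                    # zero-pad to avoid circular wrap
    R = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n)
    cc = np.concatenate((cc[-(n // 2):], cc[:n // 2 + 1]))   # center lag 0
    shift = int(np.argmax(np.abs(cc))) - n // 2
    return shift / sr, cc

sr = 8000
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
sig = np.concatenate((np.zeros(40), ref))      # sig lags ref by 40 samples
delay, _ = gcc_phat(sig, ref, sr)
print(f"{delay * 1000:.2f} ms")                # 5.00 ms
```

A positive delay means the first signal is delayed relative to the reference, matching the sign convention documented above.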