Spectral

Short-time Fourier transform, spectral transforms, and EQ matching.

Usage examples

STFT round-trip

from nanodsp import spectral
from nanodsp.buffer import AudioBuffer

buf = AudioBuffer.from_file("input.wav")

# Analyze
spec = spectral.stft(buf, window_size=2048, hop_size=512)

# Inspect
mag = spectral.magnitude(spec)    # magnitude array
ph = spectral.phase(spec)         # phase array
print(f"Frames: {spec.num_frames}, Bins: {spec.bins}")

# Reconstruct
reconstructed = spectral.istft(spec)

Window types

# Available: "hann", "hamming", "blackman", "bartlett", "rectangular"
spec = spectral.stft(buf, window_size=2048, window="blackman")
out = spectral.istft(spec, window="blackman")

Spectral processing

spec = spectral.stft(buf, window_size=2048, hop_size=512)

# Gate: silence bins below threshold
cleaned = spectral.spectral_gate(spec, threshold_db=-40.0)

# Tilt EQ: boost highs, cut lows (or vice versa)
tilted = spectral.spectral_emphasis(spec, low_db=-3.0, high_db=3.0)

# Convert between polar and complex
mag = spectral.magnitude(spec)
ph = spectral.phase(spec)
spec2 = spectral.from_polar(mag, ph, spec)

# Apply a binary mask
import numpy as np
mask = np.ones_like(mag)
mask[:, :, :10] = 0.0   # zero first 10 bins
masked = spectral.apply_mask(spec, mask)

Time stretching and pitch shifting

buf = AudioBuffer.from_file("input.wav")

# Slow down to half speed (double duration)
spec = spectral.stft(buf, window_size=2048, hop_size=512)
stretched = spectral.time_stretch(spec, rate=0.5)
slow = spectral.istft(stretched)

# Pitch shift up 5 semitones (preserves duration)
shifted = spectral.pitch_shift_spectral(buf, semitones=5.0)

Spectral effects

spec = spectral.stft(buf, window_size=2048, hop_size=512)

# Freeze a single frame into a sustained texture
frozen = spectral.spectral_freeze(spec, frame_index=10, num_frames=200)

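# Morph between two sounds (file names are placeholders)
buf_a = AudioBuffer.from_file("a.wav")
buf_b = AudioBuffer.from_file("b.wav")
spec_a = spectral.stft(buf_a, window_size=2048, hop_size=512)
spec_b = spectral.stft(buf_b, window_size=2048, hop_size=512)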
morphed = spectral.spectral_morph(spec_a, spec_b, mix=0.5)

# Phase locking (identity phase-lock for cleaner stretching)
locked = spectral.phase_lock(spec)

Noise reduction

# Assumes first 10 frames are noise-only
spec = spectral.stft(buf, window_size=2048, hop_size=512)
denoised = spectral.spectral_denoise(spec, noise_frames=10, reduction_db=-20.0)
clean = spectral.istft(denoised)

EQ matching

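# Make source sound like target in tonal balance (file names are placeholders)
source_buf = AudioBuffer.from_file("source.wav")
target_buf = AudioBuffer.from_file("target.wav")
matched = spectral.eq_match(source_buf, target_buf, window_size=4096)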

Frequency / bin conversion

spec = spectral.stft(buf, window_size=2048)

freq = spectral.bin_freq(spec, bin_index=10)     # bin -> Hz
b = spectral.freq_to_bin(spec, freq_hz=1000.0)   # Hz -> bin

API reference

spectral

STFT, spectral transforms, and EQ matching.

stft

stft(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
    window: WindowType = "hann",
) -> Spectrogram

Short-time Fourier transform using windowed RealFFT + overlap.

PARAMETER DESCRIPTION
buf

Input audio.

TYPE: AudioBuffer

window_size

Analysis window length in samples, must be > 0. Typical: 512--4096.

TYPE: int DEFAULT: 2048

hop_size

Hop between successive windows, > 0. Defaults to window_size // 4. Smaller hops give finer time resolution at higher computational cost.

TYPE: int or None DEFAULT: None

window

Window function name. One of "hann" (default), "hamming", "blackman", "bartlett", "rectangular"/"ones".

TYPE: str DEFAULT: 'hann'

RETURNS DESCRIPTION
Spectrogram

Complex64 data shaped [channels, num_stft_frames, fft_size // 2].
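
The trade-off behind window_size and hop_size can be made concrete with a little arithmetic. The sketch below is plain Python, not part of nanodsp; the helper name stft_resolution and the 48 kHz sample rate are illustrative assumptions, and it assumes the FFT size equals the window size.

```python
def stft_resolution(sample_rate, window_size=2048, hop_size=None):
    """Return (hop_size, overlap_fraction, time_step_s, bin_spacing_hz)."""
    if hop_size is None:
        hop_size = window_size // 4          # documented default
    overlap = 1.0 - hop_size / window_size   # fraction shared by adjacent windows
    time_step = hop_size / sample_rate       # seconds between successive frames
    bin_spacing = sample_rate / window_size  # Hz per bin, assuming fft_size == window_size
    return hop_size, overlap, time_step, bin_spacing


hop, overlap, step, spacing = stft_resolution(48000)
print(hop, overlap, step, spacing)
```

Halving hop_size halves the time step but doubles the number of frames to process; doubling window_size halves the bin spacing but smears events over a longer analysis window.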

istft

istft(
    spec: Spectrogram, window: WindowType = "hann"
) -> AudioBuffer

Inverse STFT via overlap-add with COLA normalization.

PARAMETER DESCRIPTION
spec

Output from stft.

TYPE: Spectrogram

window

Window function name (must match the window used in stft).

TYPE: str DEFAULT: 'hann'

RETURNS DESCRIPTION
AudioBuffer

Reconstructed audio, trimmed to the original frame count.
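
The COLA (constant overlap-add) condition can be checked numerically. The following is an independent sketch in plain Python, not nanodsp's implementation: it overlap-adds a periodic Hann window at a hop of one quarter of its length and shows that the summed envelope is constant away from the edges. Whether the library normalizes by the summed window or the summed squared window is not stated here, so treat that detail as open.

```python
import math


def periodic_hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_added_window(window, hop_size, num_frames):
    """Sum shifted copies of the window, as overlap-add does for the signal."""
    total = [0.0] * (hop_size * (num_frames - 1) + len(window))
    for frame in range(num_frames):
        start = frame * hop_size
        for i, w in enumerate(window):
            total[start + i] += w
    return total


window = periodic_hann(16)
envelope = overlap_added_window(window, hop_size=4, num_frames=8)
steady = envelope[12:-12]  # skip the ramp-up and ramp-down at the edges
print(min(steady), max(steady))
```

Dividing the overlap-added output by this envelope is what removes the amplitude modulation that windowing would otherwise leave in the reconstruction.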

magnitude

magnitude(spec: Spectrogram) -> np.ndarray

Return magnitude of spectral data.

PARAMETER DESCRIPTION
spec

Input spectrogram.

TYPE: Spectrogram

RETURNS DESCRIPTION
ndarray

float32 array shaped [channels, num_frames, bins].

phase

phase(spec: Spectrogram) -> np.ndarray

Return phase angle of spectral data.

PARAMETER DESCRIPTION
spec

Input spectrogram.

TYPE: Spectrogram

RETURNS DESCRIPTION
ndarray

float32 array shaped [channels, num_frames, bins] in radians.

from_polar

from_polar(
    mag: ndarray, ph: ndarray, spec: Spectrogram
) -> Spectrogram

Reconstruct a Spectrogram from magnitude and phase arrays.

PARAMETER DESCRIPTION
mag

Magnitude array, broadcastable to spec.data.shape.

TYPE: ndarray

ph

Phase array in radians, broadcastable to spec.data.shape.

TYPE: ndarray

spec

Reference spectrogram whose metadata is copied.

TYPE: Spectrogram

RETURNS DESCRIPTION
Spectrogram

New spectrogram with mag * exp(j * ph) as data.
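
The relation mag * exp(j * ph) can be verified on a single complex value with the standard library. This is only an illustration of the polar round-trip; the helper names are made up for the example and are not nanodsp functions.

```python
import cmath


def to_polar(z):
    """Split a complex value into (magnitude, phase in radians)."""
    return abs(z), cmath.phase(z)


def from_polar_value(mag, ph):
    """Rebuild the complex value as mag * exp(j * ph)."""
    return mag * cmath.exp(1j * ph)


z = complex(3.0, -4.0)
mag, ph = to_polar(z)
rebuilt = from_polar_value(mag, ph)
print(mag, ph, rebuilt)
```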

apply_mask

apply_mask(spec: Spectrogram, mask: ndarray) -> Spectrogram

Multiply spectral data by a real-valued mask.

PARAMETER DESCRIPTION
spec

Input spectrogram.

TYPE: Spectrogram

mask

Real-valued mask broadcastable to [channels, num_frames, bins].

TYPE: ndarray

RETURNS DESCRIPTION
Spectrogram

New spectrogram with masked data.

RAISES DESCRIPTION
ValueError

If mask cannot be broadcast to the spectrogram shape.

spectral_gate

spectral_gate(
    spec: Spectrogram,
    threshold_db: float = -40.0,
    noise_floor_db: float = -80.0,
) -> Spectrogram

Gate spectral bins below a dB threshold.

Bins whose magnitude falls below threshold_db are attenuated to noise_floor_db rather than zeroed, reducing musical noise artifacts.

PARAMETER DESCRIPTION
spec

Input spectrogram.

TYPE: Spectrogram

threshold_db

Magnitude threshold in dB. Bins at or above this pass through. Typical: -60 to -20.

TYPE: float DEFAULT: -40.0

noise_floor_db

Attenuation applied to bins below the threshold, in dB relative to the threshold. Should be < threshold_db. Typical: -80 to -40.

TYPE: float DEFAULT: -80.0

RETURNS DESCRIPTION
Spectrogram

Gated spectrogram.
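
As a rough illustration of gating in the dB domain, the sketch below converts the dB settings to linear gains and scales every magnitude that falls below the threshold. It is plain Python written for this page, not nanodsp's code, and it assumes one possible reading of noise_floor_db: a gain (in dB) multiplied onto the gated bins.

```python
def db_to_gain(db):
    """Convert decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def gate_magnitudes(mags, threshold_db=-40.0, noise_floor_db=-80.0):
    """Pass magnitudes at or above the threshold; scale the rest down.

    Assumption: noise_floor_db is applied as a gain to gated bins.
    """
    threshold = db_to_gain(threshold_db)
    floor_gain = db_to_gain(noise_floor_db)
    return [m if m >= threshold else m * floor_gain for m in mags]


gated = gate_magnitudes([1.0, 0.001])
print(gated)
```

Scaling rather than zeroing keeps a faint residue in the gated bins, which is the mechanism the description credits with reducing musical noise.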

spectral_emphasis

spectral_emphasis(
    spec: Spectrogram,
    low_db: float = 0.0,
    high_db: float = 0.0,
) -> Spectrogram

Apply a linear dB tilt across frequency bins.

Gain varies linearly from low_db at DC to high_db at Nyquist.

PARAMETER DESCRIPTION
spec

Input spectrogram.

TYPE: Spectrogram

low_db

Gain at DC in dB. Typical: -12 to +12.

TYPE: float DEFAULT: 0.0

high_db

Gain at Nyquist in dB. Typical: -12 to +12.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
Spectrogram

Emphasized spectrogram.
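
The linear dB tilt can be sketched as a per-bin gain curve. The helper below is illustrative plain Python, not the library routine, and it assumes the last bin is treated as Nyquist.

```python
def tilt_gains(num_bins, low_db=0.0, high_db=0.0):
    """Linear gains whose dB value ramps from low_db (bin 0) to high_db (last bin)."""
    gains = []
    for k in range(num_bins):
        position = k / (num_bins - 1) if num_bins > 1 else 0.0
        gain_db = low_db + (high_db - low_db) * position
        gains.append(10.0 ** (gain_db / 20.0))
    return gains


gains = tilt_gains(5, low_db=-3.0, high_db=3.0)
print(gains)
```

Because the ramp is linear in dB, the gain is exponential in linear amplitude; with symmetric settings such as -3 and +3 dB the middle bin is left unchanged.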

bin_freq

bin_freq(spec: Spectrogram, bin_index: int) -> float

Return the center frequency in Hz of a given FFT bin.

PARAMETER DESCRIPTION
spec

Reference spectrogram.

TYPE: Spectrogram

bin_index

Bin index (0 = DC).

TYPE: int

RETURNS DESCRIPTION
float

Frequency in Hz.
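
The underlying relation is the standard one for a uniform DFT grid: bin k sits at k * sample_rate / fft_size. A minimal sketch, with sample rate and FFT size passed explicitly because the Spectrogram attributes that carry them are not shown here:

```python
def bin_center_hz(bin_index, sample_rate, fft_size):
    """Center frequency in Hz of a DFT bin on a uniform grid."""
    return bin_index * sample_rate / fft_size


freq = bin_center_hz(10, 48000, 2048)
print(freq)  # 234.375
```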

freq_to_bin

freq_to_bin(spec: Spectrogram, freq_hz: float) -> int

Return the nearest FFT bin for a given frequency.

PARAMETER DESCRIPTION
spec

Reference spectrogram.

TYPE: Spectrogram

freq_hz

Frequency in Hz.

TYPE: float

RETURNS DESCRIPTION
int

Nearest bin index, clamped to [0, bins - 1].

RAISES DESCRIPTION
ValueError

If freq_hz is negative or >= Nyquist.
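
The inverse mapping rounds freq_hz * fft_size / sample_rate to the nearest integer and clamps it. The sketch below mirrors the documented range check and clamping in plain Python; the function name and explicit parameters are assumptions made for the example.

```python
def nearest_bin(freq_hz, sample_rate, fft_size, num_bins):
    """Nearest bin index for freq_hz, clamped to [0, num_bins - 1]."""
    nyquist = sample_rate / 2.0
    if freq_hz < 0 or freq_hz >= nyquist:
        raise ValueError("freq_hz must be in [0, Nyquist)")
    index = int(round(freq_hz * fft_size / sample_rate))
    return max(0, min(num_bins - 1, index))


b = nearest_bin(1000.0, 48000, 2048, 1024)
top = nearest_bin(23999.0, 48000, 2048, 1024)
print(b, top)
```

The second call shows why clamping matters: a frequency just under Nyquist rounds to index 1024, one past the last of the 1024 bins.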

time_stretch

time_stretch(spec: Spectrogram, rate: float) -> Spectrogram

Phase-vocoder time stretch.

Resamples the STFT magnitude and propagates phase using instantaneous frequency estimation, following the classic phase vocoder approach.

PARAMETER DESCRIPTION
spec

Input spectrogram.

TYPE: Spectrogram

rate

Stretch rate. rate > 1 makes audio shorter (faster), rate < 1 makes audio longer (slower).

TYPE: float

RETURNS DESCRIPTION
Spectrogram

Time-stretched spectrogram with updated original_frames.

RAISES DESCRIPTION
ValueError

If rate <= 0.

References

[1] J. Flanagan and R. Golden, "Phase vocoder," Bell Syst. Tech. J., vol. 45, no. 9, pp. 1493-1509, 1966.
[2] J. Laroche and M. Dolson, "Improved phase vocoder time-scale modification of audio," IEEE Trans. Speech Audio Process., vol. 7, no. 3, pp. 323-332, 1999.
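
The instantaneous-frequency step of a classic phase vocoder can be sketched for a single bin. This is textbook arithmetic in plain Python, not nanodsp's internals: the measured phase difference between two frames is compared with the advance expected for the bin's center frequency, and the wrapped deviation gives the true per-hop advance.

```python
import math


def principal_arg(x):
    """Wrap an angle in radians to [-pi, pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi


def phase_advance(phase_prev, phase_curr, bin_index, hop_size, fft_size):
    """Estimated true phase advance over one hop for the given bin."""
    expected = 2.0 * math.pi * bin_index * hop_size / fft_size
    deviation = principal_arg(phase_curr - phase_prev - expected)
    return expected + deviation


# A sinusoid at 10.3 bins, observed in bin 10 (hop 512, FFT size 2048).
true_advance = 2.0 * math.pi * 10.3 * 512 / 2048
prev = 0.4
curr = principal_arg(prev + true_advance)  # measured phases are always wrapped
estimate = phase_advance(prev, curr, 10, 512, 2048)
print(estimate, true_advance)
```

During stretching, the output phase of each bin is advanced by this estimate once per synthesis frame, which keeps sinusoids coherent even though frames are read at a different rate than they were analyzed.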

phase_lock

phase_lock(spec: Spectrogram) -> Spectrogram

Identity phase-locking (Laroche & Dolson 1999).

Finds spectral peaks in each frame and propagates their phase to neighboring bins, reducing phasiness.

PARAMETER DESCRIPTION
spec

Input spectrogram.

TYPE: Spectrogram

RETURNS DESCRIPTION
Spectrogram

Phase-locked spectrogram with identical magnitudes.
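
A minimal single-frame sketch of identity phase-locking in the sense of Laroche and Dolson follows. It is plain Python written for illustration and makes several assumptions not confirmed by this page: peaks are simple local maxima, each bin belongs to its nearest peak, and separate analysis and synthesis phase lists are available. The library function takes only a spectrogram, so its internals may differ.

```python
def find_peaks(mag):
    """Indices of bins strictly larger than both neighbors."""
    return [k for k in range(1, len(mag) - 1)
            if mag[k] > mag[k - 1] and mag[k] > mag[k + 1]]


def nearest_peak(num_bins, peaks):
    """Map every bin to its closest peak (ties go to the lower peak)."""
    return [min(peaks, key=lambda p: abs(p - k)) for k in range(num_bins)]


def lock_phases(analysis_phase, synth_phase, mag):
    """Give each bin its peak's synthesis phase plus the analysis phase offset."""
    peaks = find_peaks(mag)
    if not peaks:
        return list(synth_phase)
    owner = nearest_peak(len(mag), peaks)
    return [synth_phase[p] + analysis_phase[k] - analysis_phase[p]
            for k, p in enumerate(owner)]


mag = [0.1, 0.5, 1.0, 0.4, 0.2, 0.6, 0.3]
analysis = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
synth = [1.0, 1.1, 1.5, 0.9, 0.2, 2.0, 0.7]
locked = lock_phases(analysis, synth, mag)
print(find_peaks(mag), locked)
```

Peak bins keep their own synthesis phase, while neighbors keep the phase relationship to the peak that they had in the analysis frame. Magnitudes are untouched.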

spectral_freeze

spectral_freeze(
    spec: Spectrogram,
    frame_index: int = 0,
    num_frames: int | None = None,
) -> Spectrogram

Repeat a single STFT frame to produce a static ("frozen") texture.

PARAMETER DESCRIPTION
spec

Input spectrogram.

TYPE: Spectrogram

frame_index

Index of the frame to freeze. Negative indices are supported.

TYPE: int DEFAULT: 0

num_frames

Number of output STFT frames. Defaults to spec.num_frames.

TYPE: int or None DEFAULT: None

RETURNS DESCRIPTION
Spectrogram

Spectrogram with the chosen frame repeated num_frames times.

RAISES DESCRIPTION
IndexError

If frame_index is out of range.
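
The operation amounts to selecting one frame and repeating it. A small sketch on nested lists (frames of bins), using Python's own negative-index convention; the function is illustrative and not the library's implementation.

```python
def freeze_frame(frames, frame_index=0, num_frames=None):
    """Return num_frames copies of frames[frame_index]."""
    if not -len(frames) <= frame_index < len(frames):
        raise IndexError("frame_index out of range")
    if num_frames is None:
        num_frames = len(frames)  # default: keep the original frame count
    chosen = frames[frame_index]
    return [list(chosen) for _ in range(num_frames)]


frames = [[1, 2], [3, 4], [5, 6]]
frozen = freeze_frame(frames, frame_index=-1, num_frames=4)
print(frozen)
```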

spectral_morph

spectral_morph(
    spec_a: Spectrogram,
    spec_b: Spectrogram,
    mix: float | ndarray = 0.5,
) -> Spectrogram

Interpolate between two spectrograms in the polar domain.

Magnitudes are interpolated linearly; phases use shortest-arc circular interpolation, avoiding the cancellation artefacts of complex-valued lerp.

PARAMETER DESCRIPTION
spec_a

First input spectrogram. Must share fft_size, window_size, hop_size, and channel count with spec_b. If frame counts differ, the shorter length is used.

TYPE: Spectrogram

spec_b

Second input spectrogram. Must share fft_size, window_size, hop_size, and channel count with spec_a. If frame counts differ, the shorter length is used.

TYPE: Spectrogram

mix

Blend factor. 0.0 returns spec_a, 1.0 returns spec_b. May be a scalar or an array broadcastable to [channels, num_frames, bins] for time-varying morphing.

TYPE: float or ndarray DEFAULT: 0.5

RETURNS DESCRIPTION
Spectrogram

Interpolated spectrogram.

RAISES DESCRIPTION
ValueError

If the two spectrograms have incompatible parameters.
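
The difference between polar and complex interpolation shows up clearly on two unit phasors in opposite phase. The sketch below is standard-library Python for one bin; the helper names are invented for the example.

```python
import cmath
import math


def lerp_phase(pa, pb, mix):
    """Interpolate from pa to pb along the shorter arc of the circle."""
    delta = (pb - pa + math.pi) % (2.0 * math.pi) - math.pi
    return pa + mix * delta


def morph_bin(a, b, mix):
    """Polar-domain blend of two complex bin values."""
    mag = (1.0 - mix) * abs(a) + mix * abs(b)
    ph = lerp_phase(cmath.phase(a), cmath.phase(b), mix)
    return cmath.rect(mag, ph)


a = cmath.rect(1.0, 0.0)
b = cmath.rect(1.0, math.pi)
polar_mid = morph_bin(a, b, 0.5)
complex_mid = 0.5 * a + 0.5 * b
print(abs(polar_mid), abs(complex_mid))
```

The polar blend keeps unit magnitude at the midpoint, while the plain complex average of the same two values collapses to almost zero.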

pitch_shift_spectral

pitch_shift_spectral(
    buf: AudioBuffer,
    semitones: float,
    window_size: int = 2048,
    hop_size: int | None = None,
) -> AudioBuffer

Pitch-shift audio via phase vocoder + resampling.

Combines time_stretch with linear resampling so that pitch changes without altering duration.

PARAMETER DESCRIPTION
buf

Input audio.

TYPE: AudioBuffer

semitones

Pitch shift in semitones. Positive = higher, negative = lower.

TYPE: float

window_size

STFT analysis window size.

TYPE: int DEFAULT: 2048

hop_size

STFT hop size. Defaults to window_size // 4.

TYPE: int or None DEFAULT: None

RETURNS DESCRIPTION
AudioBuffer

Pitch-shifted audio with the same duration and sample rate.
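
The bookkeeping behind "stretch, then resample" can be sketched with the equal-tempered ratio 2 ** (semitones / 12). The plan below is an assumption about the order of operations (stretch first, resample second), written in plain Python rather than taken from the library; it only tracks lengths to show that the two steps cancel in duration.

```python
def pitch_ratio(semitones):
    """Frequency ratio for a shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def shift_plan(num_samples, semitones):
    """Return (ratio, stretch_rate, stretched_len, resampled_len)."""
    ratio = pitch_ratio(semitones)
    stretch_rate = 1.0 / ratio  # rate < 1 lengthens the audio when shifting up
    stretched_len = round(num_samples / stretch_rate)
    resampled_len = round(stretched_len / ratio)  # reading faster raises pitch
    return ratio, stretch_rate, stretched_len, resampled_len


octave = shift_plan(48000, 12.0)
fourth = shift_plan(48000, 5.0)
print(octave, fourth)
```

For an octave up, the audio is first stretched to twice its length at the original pitch, then resampled to half that length, which doubles every frequency and restores the original duration.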

spectral_denoise

spectral_denoise(
    spec: Spectrogram,
    noise_frames: int = 10,
    reduction_db: float = -20.0,
    smoothing: int = 0,
) -> Spectrogram

Spectral noise reduction using a profile estimated from leading frames.

Computes the mean magnitude of the first noise_frames STFT frames per bin, then attenuates bins whose magnitude falls at or below that noise floor. The leading frames should ideally contain only noise.

PARAMETER DESCRIPTION
spec

Input spectrogram.

TYPE: Spectrogram

noise_frames

Number of leading STFT frames used to build the noise profile, >= 1. Typical: 5--20.

TYPE: int DEFAULT: 10

reduction_db

Attenuation in dB applied to bins at or below the noise floor. More negative = more aggressive reduction. Typical: -40 to -10.

TYPE: float DEFAULT: -20.0

smoothing

If > 0, apply a moving-average of this width (in bins) to the noise profile, reducing musical-noise artefacts. Typical: 0--5.

TYPE: int DEFAULT: 0

RETURNS DESCRIPTION
Spectrogram

Denoised spectrogram.

RAISES DESCRIPTION
ValueError

If noise_frames < 1 or exceeds the number of available frames.
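
The profile-and-attenuate procedure described above can be sketched on a small magnitude matrix (frames of bins). This is an illustrative plain-Python version without the optional smoothing step, not the library's code.

```python
def denoise_magnitudes(frames, noise_frames=10, reduction_db=-20.0):
    """Attenuate magnitudes at or below the per-bin mean of the leading frames."""
    if noise_frames < 1 or noise_frames > len(frames):
        raise ValueError("noise_frames must be in [1, number of frames]")
    num_bins = len(frames[0])
    # Noise profile: mean magnitude per bin over the leading frames.
    profile = [sum(frame[k] for frame in frames[:noise_frames]) / noise_frames
               for k in range(num_bins)]
    gain = 10.0 ** (reduction_db / 20.0)
    return [[m * gain if m <= profile[k] else m for k, m in enumerate(frame)]
            for frame in frames]


frames = [[0.25, 0.5], [0.75, 0.5], [2.0, 0.25]]
cleaned = denoise_magnitudes(frames, noise_frames=2)
print(cleaned)
```

With the first two frames as the noise estimate, the per-bin floor is 0.5; the loud value 2.0 in the last frame passes untouched while everything at or below the floor is scaled by 0.1.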

eq_match

eq_match(
    buf: AudioBuffer,
    target: AudioBuffer,
    window_size: int = 4096,
    smoothing: int = 0,
) -> AudioBuffer

Match the spectral envelope of buf to target.

PARAMETER DESCRIPTION
buf

Source audio to be adjusted.

TYPE: AudioBuffer

target

Reference audio whose spectral envelope is matched.

TYPE: AudioBuffer

window_size

STFT window size, > 0. Typical: 2048--8192.

TYPE: int DEFAULT: 4096

smoothing

If > 0, apply a moving-average of this width (in bins) to the correction curve. Typical: 0--20.

TYPE: int DEFAULT: 0

RAISES DESCRIPTION
ValueError

If sample rates or channel counts differ.
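
One common way to build an EQ-matching correction curve is to divide the target's average magnitude per bin by the source's, then optionally smooth the result. The sketch below follows that approach in plain Python; it is an assumption about the general technique, not a description of how eq_match is implemented, and the edge handling of the moving average is an arbitrary choice for the example.

```python
def moving_average(values, width):
    """Centered moving average; windows are truncated at the edges."""
    if width <= 1:
        return list(values)
    half = width // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def correction_curve(source_avg, target_avg, smoothing=0, eps=1e-12):
    """Per-bin gain that moves the source's average spectrum toward the target's."""
    raw = [t / max(s, eps) for s, t in zip(source_avg, target_avg)]
    return moving_average(raw, smoothing)


raw_curve = correction_curve([1.0, 2.0, 4.0], [2.0, 2.0, 2.0])
smooth_curve = correction_curve([1.0, 2.0, 4.0], [2.0, 2.0, 2.0], smoothing=3)
print(raw_curve, smooth_curve)
```

Smoothing trades precision for robustness: a wider average ignores narrow peaks and notches that are specific to one recording and keeps only the broad tonal balance.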