Spectral¶

Short-time Fourier transform, spectral transforms, and EQ matching.

Usage examples¶

STFT round-trip¶

from nanodsp import spectral
from nanodsp.buffer import AudioBuffer

buf = AudioBuffer.from_file("input.wav")

# Analyze
spec = spectral.stft(buf, window_size=2048, hop_size=512)

# Inspect
mag = spectral.magnitude(spec)    # magnitude array
ph = spectral.phase(spec)         # phase array
print(f"Frames: {spec.num_frames}, Bins: {spec.bins}")

# Reconstruct
reconstructed = spectral.istft(spec)

Window types¶

# Available: "hann", "hamming", "blackman", "bartlett", "rectangular"
spec = spectral.stft(buf, window_size=2048, window="blackman")
out = spectral.istft(spec, window="blackman")

Spectral processing¶

spec = spectral.stft(buf, window_size=2048, hop_size=512)

# Gate: silence bins below threshold
cleaned = spectral.spectral_gate(spec, threshold_db=-40.0)

# Tilt EQ: boost highs, cut lows (or vice versa)
tilted = spectral.spectral_emphasis(spec, low_db=-3.0, high_db=3.0)

# Convert between polar and complex
mag = spectral.magnitude(spec)
ph = spectral.phase(spec)
spec2 = spectral.from_polar(mag, ph, spec)

# Apply a binary mask
import numpy as np
mask = np.ones_like(mag)
mask[:, :, :10] = 0.0   # zero first 10 bins
masked = spectral.apply_mask(spec, mask)

Time stretching and pitch shifting¶

buf = AudioBuffer.from_file("input.wav")

# Slow down to half speed (double duration)
spec = spectral.stft(buf, window_size=2048, hop_size=512)
stretched = spectral.time_stretch(spec, rate=0.5)
slow = spectral.istft(stretched)

# Pitch shift up 5 semitones (preserves duration)
shifted = spectral.pitch_shift_spectral(buf, semitones=5.0)

Spectral effects¶

spec = spectral.stft(buf, window_size=2048, hop_size=512)

# Freeze a single frame into a sustained texture
frozen = spectral.spectral_freeze(spec, frame_index=10, num_frames=200)

# Morph between two sounds (buf_a and buf_b are AudioBuffers loaded beforehand)
spec_a = spectral.stft(buf_a, window_size=2048, hop_size=512)
spec_b = spectral.stft(buf_b, window_size=2048, hop_size=512)
morphed = spectral.spectral_morph(spec_a, spec_b, mix=0.5)

# Phase locking (identity phase-lock for cleaner stretching)
locked = spectral.phase_lock(spec)

Noise reduction¶

# Assumes first 10 frames are noise-only
spec = spectral.stft(buf, window_size=2048, hop_size=512)
denoised = spectral.spectral_denoise(spec, noise_frames=10, reduction_db=-20.0)
clean = spectral.istft(denoised)

EQ matching¶

# Make source sound like target in tonal balance
matched = spectral.eq_match(source_buf, target_buf, window_size=4096)

Frequency / bin conversion¶

spec = spectral.stft(buf, window_size=2048)

freq = spectral.bin_freq(spec, bin_index=10)     # bin -> Hz
b = spectral.freq_to_bin(spec, freq_hz=1000.0)   # Hz -> bin

API reference¶

spectral ¶

STFT, spectral transforms, and EQ matching.

stft ¶

stft(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
    window: WindowType = "hann",
) -> Spectrogram

Short-time Fourier transform, computed by applying a windowed RealFFT to overlapping frames.

PARAMETER	DESCRIPTION
`buf`	Input audio. TYPE: `AudioBuffer`
`window_size`	Analysis window length in samples, must be > 0. Typical: 512--4096. TYPE: `int` DEFAULT: `2048`
`hop_size`	Hop between successive windows, > 0. Defaults to `window_size // 4`. Smaller hops give finer time resolution at higher computational cost. TYPE: `int or None` DEFAULT: `None`
`window`	Window function name. One of `"hann"` (default), `"hamming"`, `"blackman"`, `"bartlett"`, `"rectangular"`/`"ones"`. TYPE: `str` DEFAULT: `'hann'`

RETURNS	DESCRIPTION
`Spectrogram`	Complex64 data shaped `[channels, num_stft_frames, fft_size // 2]`.
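
The trade-off between `window_size` and `hop_size` can be made concrete with a little arithmetic. The sketch below is plain Python and not part of nanodsp: the helper name `stft_resolution` and the 48 kHz sample rate are assumptions chosen for illustration, the FFT size is assumed to equal `window_size`, and the frame count ignores any padding the library may apply.

```python
def stft_resolution(num_samples, sample_rate, window_size=2048, hop_size=None):
    """Return (hop_size, frame_spacing_ms, bin_spacing_hz, full_frames)."""
    if hop_size is None:
        hop_size = window_size // 4  # default documented for stft
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be > 0")
    # Time between the starts of successive analysis windows.
    frame_spacing_ms = 1000.0 * hop_size / sample_rate
    # Frequency distance between neighboring bins (assumes fft_size == window_size).
    bin_spacing_hz = sample_rate / window_size
    # Windows that fit entirely inside the signal, with no padding.
    if num_samples < window_size:
        full_frames = 0
    else:
        full_frames = 1 + (num_samples - window_size) // hop_size
    return hop_size, frame_spacing_ms, bin_spacing_hz, full_frames


# One second of audio at an assumed 48 kHz sample rate.
hop, dt_ms, df_hz, frames = stft_resolution(48000, 48000)
print(hop, round(dt_ms, 2), df_hz, frames)
```

Halving `hop_size` halves the frame spacing and roughly doubles the number of frames to process, while doubling `window_size` halves the bin spacing.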

istft ¶

istft(
    spec: Spectrogram, window: WindowType = "hann"
) -> AudioBuffer

Inverse STFT via overlap-add with COLA normalization.

PARAMETER	DESCRIPTION
`spec`	Output from `stft`. TYPE: `Spectrogram`
`window`	Window function name (must match the window used in `stft`). TYPE: `str` DEFAULT: `'hann'`

RETURNS	DESCRIPTION
`AudioBuffer`	Reconstructed audio, trimmed to the original frame count.
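
To illustrate what COLA (constant overlap-add) normalization compensates for, the NumPy sketch below sums shifted copies of a squared periodic Hann window at a hop of `window_size // 4`. It is not nanodsp's code: the helper names are hypothetical, and squaring the window assumes it is applied on both analysis and synthesis, which is not stated above.

```python
import numpy as np


def periodic_hann(n):
    """Periodic Hann window of length n."""
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)


def overlap_add_envelope(window, hop, num_frames):
    """Sum shifted copies of `window`, as overlap-add does to windowed frames."""
    env = np.zeros(hop * (num_frames - 1) + len(window))
    for i in range(num_frames):
        env[i * hop : i * hop + len(window)] += window
    return env


window_size, hop_size = 2048, 512
w = periodic_hann(window_size)

# Assumption: each sample is weighted by w twice (analysis and synthesis).
env = overlap_add_envelope(w * w, hop_size, num_frames=16)

# Away from the edges the envelope is constant, so dividing the
# overlap-added signal by it restores unity gain.
interior = env[window_size:-window_size]
print(interior.min(), interior.max())
```

With these settings the interior envelope is flat at 1.5. A window and hop combination that does not overlap-add to a constant would leave amplitude ripple at the hop rate.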

magnitude ¶

magnitude(spec: Spectrogram) -> np.ndarray

Return magnitude of spectral data.

PARAMETER	DESCRIPTION
`spec`	Input spectrogram. TYPE: `Spectrogram`

RETURNS	DESCRIPTION
`ndarray`	float32 array shaped `[channels, num_frames, bins]`.

phase ¶

phase(spec: Spectrogram) -> np.ndarray

Return phase angle of spectral data.

PARAMETER	DESCRIPTION
`spec`	Input spectrogram. TYPE: `Spectrogram`

RETURNS	DESCRIPTION
`ndarray`	float32 array shaped `[channels, num_frames, bins]` in radians.

from_polar ¶

from_polar(
    mag: ndarray, ph: ndarray, spec: Spectrogram
) -> Spectrogram

Reconstruct a Spectrogram from magnitude and phase arrays.

PARAMETER	DESCRIPTION
`mag`	Magnitude array, broadcastable to `spec.data.shape`. TYPE: `ndarray`
`ph`	Phase array in radians, broadcastable to `spec.data.shape`. TYPE: `ndarray`
`spec`	Reference spectrogram whose metadata is copied. TYPE: `Spectrogram`

RETURNS	DESCRIPTION
`Spectrogram`	New spectrogram with `mag * exp(j * ph)` as data.
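
The polar round trip that `magnitude`, `phase`, and `from_polar` describe can be checked on a bare NumPy array. This sketch does not call nanodsp; the random array is a stand-in for `spec.data`, and its shape is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (2, 8, 1024)  # [channels, num_frames, bins]
data = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)).astype(np.complex64)

mag = np.abs(data)    # magnitude, float32
ph = np.angle(data)   # phase in radians, float32

# mag * exp(j * ph) recovers the complex data up to rounding.
rebuilt = (mag * np.exp(1j * ph)).astype(np.complex64)

# Editing the magnitude while keeping the phase scales each bin in place.
halved = (0.5 * mag * np.exp(1j * ph)).astype(np.complex64)

print(np.allclose(rebuilt, data, atol=1e-5))
```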

apply_mask ¶

apply_mask(spec: Spectrogram, mask: ndarray) -> Spectrogram

Multiply spectral data by a real-valued mask.

PARAMETER	DESCRIPTION
`spec`	Input spectrogram. TYPE: `Spectrogram`
`mask`	Real-valued mask broadcastable to `[channels, num_frames, bins]`. TYPE: `ndarray`

RETURNS	DESCRIPTION
`Spectrogram`	New spectrogram with masked data.

RAISES	DESCRIPTION
`ValueError`	If mask cannot be broadcast to the spectrogram shape.
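
Because the mask only has to be broadcastable, it does not need the full `[channels, num_frames, bins]` shape. The NumPy sketch below shows the broadcasting rules on a stand-in array; it does not call `apply_mask`, and the shapes are arbitrary examples.

```python
import numpy as np

channels, num_frames, bins = 2, 100, 1024
data = np.ones((channels, num_frames, bins), dtype=np.complex64)

# A per-bin mask of shape (bins,) broadcasts across channels and frames.
highpass = np.ones(bins, dtype=np.float32)
highpass[:10] = 0.0
masked = data * highpass

# A per-frame fade needs a trailing axis so it lines up with the frame axis.
fade_in = np.linspace(0.0, 1.0, num_frames, dtype=np.float32)[:, np.newaxis]
faded = data * fade_in

# Incompatible shapes fail: np.broadcast_to raises ValueError.
try:
    np.broadcast_to(np.ones(bins + 1), data.shape)
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

print(masked.shape, faded.shape, broadcast_ok)
```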

spectral_gate ¶

spectral_gate(
    spec: Spectrogram,
    threshold_db: float = -40.0,
    noise_floor_db: float = -80.0,
) -> Spectrogram

Gate spectral bins below a dB threshold.

Bins whose magnitude falls below threshold_db are attenuated to noise_floor_db rather than zeroed, reducing musical noise artifacts.

PARAMETER	DESCRIPTION
`spec`	Input spectrogram. TYPE: `Spectrogram`
`threshold_db`	Magnitude threshold in dB. Bins at or above this pass through. Typical: -60 to -20. TYPE: `float` DEFAULT: `-40.0`
`noise_floor_db`	Attenuation applied to bins below the threshold, in dB relative to the threshold. Should be < threshold_db. Typical: -80 to -40. TYPE: `float` DEFAULT: `-80.0`

RETURNS	DESCRIPTION
`Spectrogram`	Gated spectrogram.
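
A minimal NumPy sketch of a gate of this kind follows. It is one plausible reading and not the library's implementation: it assumes 0 dB corresponds to a magnitude of 1.0 and that bins below the threshold are scaled by a gain of `noise_floor_db`. The function name `gate_sketch` is hypothetical.

```python
import numpy as np


def gate_sketch(data, threshold_db=-40.0, noise_floor_db=-80.0):
    """Attenuate bins of a complex array whose magnitude is below threshold_db."""
    mag = np.abs(data)
    # Assumption: dB is measured relative to a magnitude of 1.0.
    mag_db = 20.0 * np.log10(np.maximum(mag, 1e-12))
    # Assumption: quiet bins are scaled by this gain instead of being zeroed.
    floor_gain = 10.0 ** (noise_floor_db / 20.0)
    gain = np.where(mag_db >= threshold_db, 1.0, floor_gain)
    return data * gain


# One channel, one frame, three bins at 0 dB, -60 dB, and about -6 dB.
data = np.array([[[1.0 + 0.0j, 0.001 + 0.0j, 0.5j]]])
gated = gate_sketch(data)
print(np.abs(gated))
```

Only the middle bin falls below the default threshold, so it is the only one attenuated; phase is untouched because the gain is real and non-negative.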

spectral_emphasis ¶

spectral_emphasis(
    spec: Spectrogram,
    low_db: float = 0.0,
    high_db: float = 0.0,
) -> Spectrogram

Apply a linear dB tilt across frequency bins.

Gain varies linearly from low_db at DC to high_db at Nyquist.

PARAMETER	DESCRIPTION
`spec`	Input spectrogram. TYPE: `Spectrogram`
`low_db`	Gain at DC in dB. Typical: -12 to +12. TYPE: `float` DEFAULT: `0.0`
`high_db`	Gain at Nyquist in dB. Typical: -12 to +12. TYPE: `float` DEFAULT: `0.0`

RETURNS	DESCRIPTION
`Spectrogram`	Emphasized spectrogram.
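
The tilt described above amounts to a straight line in dB that is converted to linear gain per bin. The sketch below shows that curve in NumPy; `tilt_gains` is a hypothetical helper, and it assumes the first and last bins sit at the `low_db` and `high_db` ends of the line.

```python
import numpy as np


def tilt_gains(bins, low_db=0.0, high_db=0.0):
    """Linear-in-dB gain curve from low_db (first bin) to high_db (last bin)."""
    gain_db = np.linspace(low_db, high_db, bins)
    return 10.0 ** (gain_db / 20.0)  # dB to linear amplitude


gains = tilt_gains(1024, low_db=-3.0, high_db=3.0)
print(gains[0], gains[-1])

# The curve has shape (bins,), so it broadcasts over [channels, num_frames, bins].
tilted = np.ones((2, 4, 1024), dtype=np.complex64) * gains
```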

bin_freq ¶

bin_freq(spec: Spectrogram, bin_index: int) -> float

Return the center frequency in Hz of a given FFT bin.

PARAMETER	DESCRIPTION
`spec`	Reference spectrogram. TYPE: `Spectrogram`
`bin_index`	Bin index (0 = DC). TYPE: `int`

RETURNS	DESCRIPTION
`float`	Frequency in Hz.

freq_to_bin ¶

freq_to_bin(spec: Spectrogram, freq_hz: float) -> int

Return the nearest FFT bin for a given frequency.

PARAMETER	DESCRIPTION
`spec`	Reference spectrogram. TYPE: `Spectrogram`
`freq_hz`	Frequency in Hz. TYPE: `float`

RETURNS	DESCRIPTION
`int`	Nearest bin index, clamped to `[0, bins - 1]`.

RAISES	DESCRIPTION
`ValueError`	If freq_hz is negative or >= Nyquist.
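
The arithmetic behind `bin_freq` and `freq_to_bin` can be sketched in plain Python. The helpers below are hypothetical and assume the conventional layout in which bin `k` is centered at `k * sample_rate / fft_size`; the 48 kHz sample rate and the bin count are illustrative values, not taken from the library.

```python
def bin_to_hz(bin_index, sample_rate, fft_size):
    """Center frequency of a bin, assuming bin k sits at k * sample_rate / fft_size."""
    return bin_index * sample_rate / fft_size


def hz_to_bin(freq_hz, sample_rate, fft_size, bins):
    """Nearest bin for a frequency, clamped to [0, bins - 1]."""
    if freq_hz < 0 or freq_hz >= sample_rate / 2:
        raise ValueError("freq_hz must be in [0, Nyquist)")
    nearest = int(round(freq_hz * fft_size / sample_rate))
    return min(max(nearest, 0), bins - 1)


sample_rate, fft_size, bins = 48000, 2048, 1024
print(bin_to_hz(10, sample_rate, fft_size))
print(hz_to_bin(1000.0, sample_rate, fft_size, bins))
```

Under these assumptions neighboring bins are 23.4375 Hz apart, so a conversion from Hz to bin and back can move a frequency by up to half that spacing.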

time_stretch ¶

time_stretch(spec: Spectrogram, rate: float) -> Spectrogram

Phase-vocoder time stretch.

Resamples the STFT magnitude and propagates phase using instantaneous frequency estimation, following the classic phase vocoder approach.

PARAMETER	DESCRIPTION
`spec`	Input spectrogram. TYPE: `Spectrogram`
`rate`	Stretch rate. `rate > 1` makes audio shorter (faster), `rate < 1` makes audio longer (slower). TYPE: `float`

RETURNS	DESCRIPTION
`Spectrogram`	Time-stretched spectrogram with updated `original_frames`.

RAISES	DESCRIPTION
`ValueError`	If rate <= 0.

References

[1] J. Flanagan and R. Golden, "Phase vocoder," Bell Syst. Tech. J., vol. 45, no. 9, pp. 1493--1509, 1966.
[2] J. Laroche and M. Dolson, "Improved phase vocoder time-scale modification of audio," IEEE Trans. Speech Audio Process., vol. 7, no. 3, pp. 323--332, 1999.
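
For readers unfamiliar with the technique, here is a compact NumPy sketch of a classic phase-vocoder stretch for a single channel. It is an illustration and not nanodsp's implementation: `phase_vocoder_sketch` and its explicit `hop_size` and `fft_size` arguments are hypothetical, and bin `k` is assumed to be centered at `k / fft_size` cycles per sample.

```python
import numpy as np


def phase_vocoder_sketch(frames, rate, hop_size, fft_size):
    """Time-stretch one channel of STFT data shaped [num_frames, bins]."""
    if rate <= 0:
        raise ValueError("rate must be > 0")
    num_frames, bins = frames.shape
    # Phase advance per hop expected for a sinusoid at each bin's center.
    expected = 2.0 * np.pi * hop_size * np.arange(bins) / fft_size
    # Fractional input positions that the output frames read from.
    positions = np.arange(0, num_frames, rate)
    # One zero frame of padding so the last position can look one frame ahead.
    padded = np.concatenate([frames, np.zeros((1, bins), dtype=frames.dtype)])
    out = np.zeros((len(positions), bins), dtype=np.complex128)
    phase_acc = np.angle(frames[0])
    for t, pos in enumerate(positions):
        i = int(pos)
        frac = pos - i
        a, b = padded[i], padded[i + 1]
        # Interpolate magnitude between the two neighboring input frames.
        mag = (1.0 - frac) * np.abs(a) + frac * np.abs(b)
        out[t] = mag * np.exp(1j * phase_acc)
        # Deviation from the expected advance, wrapped to [-pi, pi].
        dphase = np.angle(b) - np.angle(a) - expected
        dphase -= 2.0 * np.pi * np.round(dphase / (2.0 * np.pi))
        # Advance the running phase by the estimated instantaneous frequency.
        phase_acc = phase_acc + expected + dphase
    return out


rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 9)) + 1j * rng.standard_normal((10, 9))
slow = phase_vocoder_sketch(frames, rate=0.5, hop_size=4, fft_size=16)
same = phase_vocoder_sketch(frames, rate=1.0, hop_size=4, fft_size=16)
print(slow.shape, same.shape)
```

At `rate=0.5` the ten input frames become twenty output frames, and at `rate=1.0` the sketch reproduces its input.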

phase_lock ¶

phase_lock(spec: Spectrogram) -> Spectrogram

Identity phase-locking (Laroche & Dolson 1999).

Finds spectral peaks in each frame and propagates their phase to neighboring bins, reducing phasiness.

PARAMETER	DESCRIPTION
`spec`	Input spectrogram. TYPE: `Spectrogram`

RETURNS	DESCRIPTION
`Spectrogram`	Phase-locked spectrogram with identical magnitudes.

spectral_freeze ¶

spectral_freeze(
    spec: Spectrogram,
    frame_index: int = 0,
    num_frames: int | None = None,
) -> Spectrogram

Repeat a single STFT frame to produce a static ("frozen") texture.

PARAMETER	DESCRIPTION
`spec`	Input spectrogram. TYPE: `Spectrogram`
`frame_index`	Index of the frame to freeze. Negative indices are supported. TYPE: `int` DEFAULT: `0`
`num_frames`	Number of output STFT frames. Defaults to `spec.num_frames`. TYPE: `int or None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Spectrogram`	Spectrogram with the chosen frame repeated num_frames times.

RAISES	DESCRIPTION
`IndexError`	If frame_index is out of range.
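
On a bare array, freezing comes down to selecting one frame and repeating it along the frame axis. The NumPy sketch below mirrors the documented parameters but is not the library's code; `freeze_sketch` is a hypothetical name, and it copies the frame verbatim, phase included.

```python
import numpy as np


def freeze_sketch(data, frame_index=0, num_frames=None):
    """Repeat one frame of an array shaped [channels, num_frames, bins]."""
    total = data.shape[1]
    if not -total <= frame_index < total:
        raise IndexError("frame_index out of range")
    if num_frames is None:
        num_frames = total
    # Negative indices count from the end, as with ordinary NumPy indexing.
    frame = data[:, frame_index, :]  # [channels, bins]
    return np.repeat(frame[:, np.newaxis, :], num_frames, axis=1)


data = np.arange(2 * 5 * 3, dtype=np.float64).reshape(2, 5, 3).astype(np.complex128)
frozen = freeze_sketch(data, frame_index=-1, num_frames=7)
print(frozen.shape)
```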

spectral_morph ¶

spectral_morph(
    spec_a: Spectrogram,
    spec_b: Spectrogram,
    mix: float | ndarray = 0.5,
) -> Spectrogram

Interpolate between two spectrograms in the polar domain.

Magnitudes are interpolated linearly; phases use shortest-arc circular interpolation, avoiding the cancellation artefacts of complex-valued lerp.

PARAMETER	DESCRIPTION
`spec_a`	First input spectrogram, returned when `mix` is `0.0`. TYPE: `Spectrogram`
`spec_b`	Second input spectrogram, returned when `mix` is `1.0`. Must share `fft_size`, `window_size`, `hop_size`, and channel count with spec_a. If frame counts differ, the shorter length is used. TYPE: `Spectrogram`
`mix`	Blend factor. `0.0` returns spec_a, `1.0` returns spec_b. May be a scalar or an array broadcastable to `[channels, num_frames, bins]` for time-varying morphing. TYPE: `float or ndarray` DEFAULT: `0.5`

RETURNS	DESCRIPTION
`Spectrogram`	Morphed spectrogram.

RAISES	DESCRIPTION
`ValueError`	If the two spectrograms have incompatible parameters.
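
The difference between polar interpolation and a complex-valued lerp shows up clearly on two bins of equal magnitude and nearly opposite phase. The NumPy sketch below is illustrative only; `polar_morph` is a hypothetical helper and not the library function.

```python
import numpy as np


def polar_morph(a, b, mix=0.5):
    """Interpolate magnitude linearly and phase along the shortest arc."""
    mag = (1.0 - mix) * np.abs(a) + mix * np.abs(b)
    # Wrap the phase difference to (-pi, pi] so interpolation takes the short way round.
    delta = np.angle(np.exp(1j * (np.angle(b) - np.angle(a))))
    return mag * np.exp(1j * (np.angle(a) + mix * delta))


a = np.array([1.0 + 0.0j])                    # unit magnitude, phase 0
b = np.exp(1j * 0.9 * np.pi) * np.ones(1)     # unit magnitude, phase 0.9 * pi

polar = polar_morph(a, b, mix=0.5)
lerp = 0.5 * a + 0.5 * b                      # complex-valued lerp for comparison

print(abs(polar[0]), abs(lerp[0]))
```

The polar result keeps unit magnitude at the halfway phase, while the complex lerp drops to about 0.16 because the two vectors nearly cancel.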

pitch_shift_spectral ¶

pitch_shift_spectral(
    buf: AudioBuffer,
    semitones: float,
    window_size: int = 2048,
    hop_size: int | None = None,
) -> AudioBuffer

Pitch-shift audio via phase vocoder + resampling.

Combines `time_stretch` with linear resampling so that the pitch changes while the duration stays the same.

PARAMETER	DESCRIPTION
`buf`	Input audio. TYPE: `AudioBuffer`
`semitones`	Pitch shift in semitones. Positive = higher, negative = lower. TYPE: `float`
`window_size`	STFT analysis window size. TYPE: `int` DEFAULT: `2048`
`hop_size`	STFT hop size. Defaults to `window_size // 4`. TYPE: `int or None` DEFAULT: `None`

RETURNS	DESCRIPTION
`AudioBuffer`	Pitch-shifted audio with the same duration and sample rate.
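
The bookkeeping behind this stretch-then-resample approach can be sketched as follows. The helpers `semitones_to_ratio` and `resample_linear` are hypothetical, the 8 kHz sample rate is arbitrary, and a synthetic sine stands in for the time-stretched audio; the library's internals may differ.

```python
import numpy as np


def semitones_to_ratio(semitones):
    """Frequency ratio for a shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def resample_linear(x, ratio):
    """Read x `ratio` times faster using linear interpolation."""
    n_out = int(round(len(x) / ratio))
    positions = np.arange(n_out) * ratio
    return np.interp(positions, np.arange(len(x)), x)


sample_rate, n = 8000, 1000
ratio = semitones_to_ratio(12.0)        # one octave up: 2.0
stretch_rate = 1.0 / ratio              # rate for the time stretch: 0.5

# Stand-in for the stretched signal: a 100 Hz sine lasting ratio * n samples.
stretched_len = int(round(n * ratio))
stretched = np.sin(2.0 * np.pi * 100.0 * np.arange(stretched_len) / sample_rate)

# Resampling by `ratio` restores the original length and raises the pitch.
shifted = resample_linear(stretched, ratio)
print(len(shifted), round(semitones_to_ratio(5.0), 4))
```

The result has the original length of 1000 samples and now contains a 200 Hz sine. A shift of 5 semitones corresponds to a ratio of about 1.3348.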

spectral_denoise ¶

spectral_denoise(
    spec: Spectrogram,
    noise_frames: int = 10,
    reduction_db: float = -20.0,
    smoothing: int = 0,
) -> Spectrogram

Spectral noise reduction using a profile estimated from leading frames.

Computes the mean magnitude of the first noise_frames STFT frames per bin, then attenuates bins whose magnitude falls at or below that noise floor. The leading frames should ideally contain only noise.

PARAMETER	DESCRIPTION
`spec`	Input spectrogram. TYPE: `Spectrogram`
`noise_frames`	Number of leading STFT frames used to build the noise profile, >= 1. Typical: 5--20. TYPE: `int` DEFAULT: `10`
`reduction_db`	Attenuation in dB applied to bins at or below the noise floor. More negative = more aggressive reduction. Typical: -40 to -10. TYPE: `float` DEFAULT: `-20.0`
`smoothing`	If > 0, apply a moving-average of this width (in bins) to the noise profile, reducing musical-noise artefacts. Typical: 0--5. TYPE: `int` DEFAULT: `0`

RETURNS	DESCRIPTION
`Spectrogram`	Denoised spectrogram.

RAISES	DESCRIPTION
`ValueError`	If noise_frames < 1 or exceeds the number of available frames.
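
The procedure described above can be sketched in NumPy as follows. This is an illustration under stated assumptions and not the library's implementation: `denoise_sketch` is a hypothetical name, `reduction_db` is applied as a plain gain, and the moving average is zero-padded at the band edges.

```python
import numpy as np


def denoise_sketch(data, noise_frames=10, reduction_db=-20.0, smoothing=0):
    """data: complex array shaped [channels, num_frames, bins]."""
    if noise_frames < 1 or noise_frames > data.shape[1]:
        raise ValueError("noise_frames must be in [1, num_frames]")
    mag = np.abs(data)
    # Mean magnitude per bin over the leading frames: [channels, 1, bins].
    profile = mag[:, :noise_frames, :].mean(axis=1, keepdims=True)
    if smoothing > 0:
        kernel = np.ones(smoothing) / smoothing
        # Moving average across bins (zero-padded, so the edges sag slightly).
        profile = np.apply_along_axis(
            lambda p: np.convolve(p, kernel, mode="same"), -1, profile
        )
    # Bins at or below the noise floor are attenuated; the rest pass through.
    gain = np.where(mag <= profile, 10.0 ** (reduction_db / 20.0), 1.0)
    return data * gain


# 8 noise-only frames at magnitude 0.125, then 8 frames with a tone in bin 0.
data = np.full((1, 16, 4), 0.125, dtype=np.complex128)
data[0, 8:, 0] = 1.0
data[0, 8:, 1:] = 0.0625
out = denoise_sketch(data, noise_frames=8, reduction_db=-20.0)
print(np.abs(out[0, 12]))
```

The tone in bin 0 passes unchanged, while the bins at or below the profile drop by 20 dB.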

eq_match ¶

eq_match(
    buf: AudioBuffer,
    target: AudioBuffer,
    window_size: int = 4096,
    smoothing: int = 0,
) -> AudioBuffer

Match the spectral envelope of buf to target.

PARAMETER	DESCRIPTION
`buf`	Source audio to be adjusted. TYPE: `AudioBuffer`
`target`	Reference audio whose spectral envelope is matched. TYPE: `AudioBuffer`
`window_size`	STFT window size, > 0. Typical: 2048--8192. TYPE: `int` DEFAULT: `4096`
`smoothing`	If > 0, apply a moving-average of this width (in bins) to the correction curve. Typical: 0--20. TYPE: `int` DEFAULT: `0`

RETURNS	DESCRIPTION
`AudioBuffer`	buf with its spectral envelope matched to target.

RAISES	DESCRIPTION
`ValueError`	If sample rates or channel counts differ.
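
One common way to derive a matching EQ is to divide the target's time-averaged magnitude spectrum by the source's and apply the result as a per-bin gain. The NumPy sketch below follows that approach as an assumption; it is not confirmed to be how `eq_match` works internally, and `eq_match_gains` is a hypothetical helper.

```python
import numpy as np


def eq_match_gains(source_mag, target_mag, smoothing=0, eps=1e-12):
    """Per-bin correction curve from magnitudes shaped [num_frames, bins]."""
    # Averaging over time means the two inputs may have different frame counts.
    source_env = source_mag.mean(axis=0)
    target_env = target_mag.mean(axis=0)
    gains = target_env / np.maximum(source_env, eps)  # eps avoids division by zero
    if smoothing > 0:
        kernel = np.ones(smoothing) / smoothing
        # Edge-pad so the moving average does not sag at the first and last bins.
        pad = smoothing // 2
        padded = np.pad(gains, (pad, smoothing - 1 - pad), mode="edge")
        gains = np.convolve(padded, kernel, mode="valid")
    return gains


source_mag = np.ones((4, 8))
target_mag = np.full((6, 8), 2.0)
target_mag[:, 4:] = 4.0  # target is brighter in the upper half of the band

raw = eq_match_gains(source_mag, target_mag)
smooth = eq_match_gains(source_mag, target_mag, smoothing=3)
print(raw)
print(smooth)
```

Smoothing spreads the step between bins 3 and 4 over neighboring bins, which trades accuracy at sharp spectral features for a gentler correction curve.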