Spectral
Short-time Fourier transform, spectral transforms, and EQ matching.
Usage examples
STFT round-trip
from nanodsp import spectral
from nanodsp.buffer import AudioBuffer
buf = AudioBuffer.from_file("input.wav")
# Analyze
spec = spectral.stft(buf, window_size=2048, hop_size=512)
# Inspect
mag = spectral.magnitude(spec) # magnitude array
ph = spectral.phase(spec) # phase array
print(f"Frames: {spec.num_frames}, Bins: {spec.bins}")
# Reconstruct
reconstructed = spectral.istft(spec)
Window types
# Available: "hann", "hamming", "blackman", "bartlett", "rectangular"
spec = spectral.stft(buf, window_size=2048, window="blackman")
out = spectral.istft(spec, window="blackman")
Spectral processing
spec = spectral.stft(buf, window_size=2048, hop_size=512)
# Gate: silence bins below threshold
cleaned = spectral.spectral_gate(spec, threshold_db=-40.0)
# Tilt EQ: boost highs, cut lows (or vice versa)
tilted = spectral.spectral_emphasis(spec, low_db=-3.0, high_db=3.0)
# Convert between polar and complex
mag = spectral.magnitude(spec)
ph = spectral.phase(spec)
spec2 = spectral.from_polar(mag, ph, spec)
# Apply a binary mask
import numpy as np
mask = np.ones_like(mag)
mask[:, :, :10] = 0.0 # zero first 10 bins
masked = spectral.apply_mask(spec, mask)
Time stretching and pitch shifting
buf = AudioBuffer.from_file("input.wav")
# Slow down to half speed (double duration)
spec = spectral.stft(buf, window_size=2048, hop_size=512)
stretched = spectral.time_stretch(spec, rate=0.5)
slow = spectral.istft(stretched)
# Pitch shift up 5 semitones (preserves duration)
shifted = spectral.pitch_shift_spectral(buf, semitones=5.0)
Spectral effects
spec = spectral.stft(buf, window_size=2048, hop_size=512)
# Freeze a single frame into a sustained texture
frozen = spectral.spectral_freeze(spec, frame_index=10, num_frames=200)
# Morph between two sounds (buf_a and buf_b are AudioBuffers loaded beforehand)
spec_a = spectral.stft(buf_a, window_size=2048, hop_size=512)
spec_b = spectral.stft(buf_b, window_size=2048, hop_size=512)
morphed = spectral.spectral_morph(spec_a, spec_b, mix=0.5)
# Phase locking (identity phase-lock for cleaner stretching)
locked = spectral.phase_lock(spec)
Noise reduction
# Assumes first 10 frames are noise-only
spec = spectral.stft(buf, window_size=2048, hop_size=512)
denoised = spectral.spectral_denoise(spec, noise_frames=10, reduction_db=-20.0)
clean = spectral.istft(denoised)
EQ matching
# Make source sound like target in tonal balance
# (source_buf and target_buf are AudioBuffers with matching sample rate and channel count)
matched = spectral.eq_match(source_buf, target_buf, window_size=4096)
Frequency / bin conversion
spec = spectral.stft(buf, window_size=2048)
freq = spectral.bin_freq(spec, bin_index=10) # bin -> Hz
b = spectral.freq_to_bin(spec, freq_hz=1000.0) # Hz -> bin
API reference
spectral
STFT, spectral transforms, and EQ matching.
stft
stft(
    buf: AudioBuffer,
    window_size: int = 2048,
    hop_size: int | None = None,
    window: WindowType = "hann",
) -> Spectrogram
Short-time Fourier transform using windowed RealFFT + overlap.

| PARAMETER | DESCRIPTION |
|---|---|
| buf | Input audio. |
| window_size | Analysis window length in samples, must be > 0. Typical: 512 to 4096. |
| hop_size | Hop between successive windows, > 0. Defaults to |
| window | Window function name. One of |

| RETURNS | DESCRIPTION |
|---|---|
| Spectrogram | Complex64 data shaped |
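
To make the "windowed RealFFT" idea concrete, here is a minimal pure-Python sketch of what a single analysis frame involves: multiply by a window, take the DFT, and keep only bins 0 through N/2. The helper names `hann` and `real_dft_frame` are illustrative and not part of nanodsp, and the periodic Hann definition is an assumption; the library's FFT is certainly faster than this direct sum.

```python
import cmath
import math


def hann(n_samples):
    """Periodic Hann window of length n_samples (assumed definition)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_samples) for n in range(n_samples)]


def real_dft_frame(frame, window):
    """Windowed DFT of one real-valued frame, keeping bins 0..N/2 only."""
    n_samples = len(frame)
    windowed = [x * w for x, w in zip(frame, window)]
    num_bins = n_samples // 2 + 1  # a real input has a redundant upper half
    return [
        sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / n_samples) for n in range(n_samples))
        for k in range(num_bins)
    ]


window_size = 64
# A sinusoid that completes exactly 5 cycles per window, i.e. centered on bin 5.
signal = [math.sin(2.0 * math.pi * 5 * n / window_size) for n in range(window_size)]
spectrum = real_dft_frame(signal, hann(window_size))
peak_bin = max(range(len(spectrum)), key=lambda k: abs(spectrum[k]))
```

A full STFT repeats this for frames spaced `hop_size` samples apart and for each channel.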
istft
Inverse STFT via overlap-add with COLA normalization.

| PARAMETER | DESCRIPTION |
|---|---|
| spec | Output from :func: |
| window | Window function name (must match the window used in :func: |

| RETURNS | DESCRIPTION |
|---|---|
| AudioBuffer | Reconstructed audio, trimmed to the original frame count. |
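
The constant-overlap-add (COLA) property behind that normalization can be checked numerically. The sketch below sums shifted copies of a periodic Hann window at 75% overlap and confirms the total is constant away from the edges. `hann` and `overlap_sum` are hypothetical helpers; whether nanodsp normalizes by the summed window or by the summed squared window is not stated here, so treat this as an illustration of the principle only.

```python
import math


def hann(n_samples):
    """Periodic Hann window of length n_samples (assumed definition)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_samples) for n in range(n_samples)]


def overlap_sum(window, hop_size, num_frames):
    """Sum num_frames copies of window, each shifted by hop_size samples."""
    total = [0.0] * (hop_size * (num_frames - 1) + len(window))
    for frame in range(num_frames):
        start = frame * hop_size
        for n, w in enumerate(window):
            total[start + n] += w
    return total


window_size, hop_size = 16, 4  # 75% overlap
sums = overlap_sum(hann(window_size), hop_size, num_frames=8)
# Skip the first and last window lengths, where fewer than four windows overlap.
steady = sums[window_size:-window_size]
```

In the steady region every sample equals 2.0, so dividing the overlap-added output by that constant restores unit gain.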
magnitude
Return magnitude of spectral data.

| PARAMETER | DESCRIPTION |
|---|---|
| spec | Input spectrogram. |

| RETURNS | DESCRIPTION |
|---|---|
| ndarray | float32 array shaped |
phase
Return phase angle of spectral data.

| PARAMETER | DESCRIPTION |
|---|---|
| spec | Input spectrogram. |

| RETURNS | DESCRIPTION |
|---|---|
| ndarray | float32 array shaped |
from_polar
Reconstruct a Spectrogram from magnitude and phase arrays.

| PARAMETER | DESCRIPTION |
|---|---|
| mag | Magnitude array, broadcastable to |
| ph | Phase array in radians, broadcastable to |
| spec | Reference spectrogram whose metadata is copied. |

| RETURNS | DESCRIPTION |
|---|---|
| Spectrogram | New spectrogram with |
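
The polar-to-complex conversion is the standard identity z = m·exp(jφ). A small stdlib sketch of the round trip follows; `to_polar` and `polar_to_complex` are illustrative names that operate on plain lists rather than on a Spectrogram.

```python
import cmath


def to_polar(values):
    """Split complex bins into magnitude and phase (radians) lists."""
    return [abs(z) for z in values], [cmath.phase(z) for z in values]


def polar_to_complex(mags, phases):
    """Rebuild complex bins from magnitude and phase: z = m * exp(j * phase)."""
    return [cmath.rect(m, p) for m, p in zip(mags, phases)]


bins = [1 + 1j, -2j, 0.5 + 0j]
mags, phases = to_polar(bins)
rebuilt = polar_to_complex(mags, phases)
```

Up to floating-point rounding, `rebuilt` equals `bins`, which is why magnitude-only edits followed by a polar rebuild leave phase untouched.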
apply_mask
Multiply spectral data by a real-valued mask.

| PARAMETER | DESCRIPTION |
|---|---|
| spec | Input spectrogram. |
| mask | Real-valued mask broadcastable to |

| RETURNS | DESCRIPTION |
|---|---|
| Spectrogram | New spectrogram with masked data. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If mask cannot be broadcast to the spectrogram shape. |
spectral_gate
spectral_gate(
    spec: Spectrogram,
    threshold_db: float = -40.0,
    noise_floor_db: float = -80.0,
) -> Spectrogram
Gate spectral bins below a dB threshold.
Bins whose magnitude falls below threshold_db are attenuated to noise_floor_db rather than zeroed, reducing musical noise artifacts.

| PARAMETER | DESCRIPTION |
|---|---|
| spec | Input spectrogram. |
| threshold_db | Magnitude threshold in dB. Bins at or above this pass through. Typical: -60 to -20. |
| noise_floor_db | Attenuation applied to bins below the threshold, in dB relative to the threshold. Should be < threshold_db. Typical: -80 to -40. |

| RETURNS | DESCRIPTION |
|---|---|
| Spectrogram | Gated spectrogram. |
spectral_emphasis
Apply a linear dB tilt across frequency bins.
Gain varies linearly from low_db at DC to high_db at Nyquist.

| PARAMETER | DESCRIPTION |
|---|---|
| spec | Input spectrogram. |
| low_db | Gain at DC in dB. Typical: -12 to +12. |
| high_db | Gain at Nyquist in dB. Typical: -12 to +12. |

| RETURNS | DESCRIPTION |
|---|---|
| Spectrogram | Emphasized spectrogram. |
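
The tilt described above amounts to one gain per bin. Here is a minimal sketch, assuming the last bin is Nyquist and that dB values refer to amplitude (20·log10); `tilt_gains` is a hypothetical helper, not the library's implementation.

```python
def tilt_gains(num_bins, low_db, high_db):
    """Per-bin linear gains for a dB tilt from DC (bin 0) to Nyquist (last bin)."""
    gains = []
    for k in range(num_bins):
        position = k / (num_bins - 1)  # 0.0 at DC, 1.0 at Nyquist
        gain_db = low_db + (high_db - low_db) * position
        gains.append(10.0 ** (gain_db / 20.0))  # amplitude dB to linear gain
    return gains


# 1025 bins corresponds to a 2048-sample window (window_size // 2 + 1).
gains = tilt_gains(num_bins=1025, low_db=-3.0, high_db=3.0)
```

With symmetric settings such as these, the middle bin sits at 0 dB (gain 1.0), and each frame's bins would be multiplied element-wise by `gains`.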
bin_freq
Return the center frequency in Hz of a given FFT bin.

| PARAMETER | DESCRIPTION |
|---|---|
| spec | Reference spectrogram. |
| bin_index | Bin index (0 = DC). |

| RETURNS | DESCRIPTION |
|---|---|
| float | Frequency in Hz. |
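
For a standard FFT layout, the center frequency of bin k is k · sample_rate / window_size. The sketch below takes the sample rate and window size as explicit arguments (the library reads them from the spectrogram); `bin_to_hz` is an illustrative name.

```python
def bin_to_hz(bin_index, sample_rate, window_size):
    """Center frequency in Hz of an FFT bin, assuming a standard DFT layout."""
    return bin_index * sample_rate / window_size


freq = bin_to_hz(10, sample_rate=48000, window_size=2048)
nyquist = bin_to_hz(1024, sample_rate=48000, window_size=2048)
```

At 48 kHz with a 2048-sample window, bins are 23.4375 Hz apart, so bin 10 sits at 234.375 Hz and bin 1024 at the 24 kHz Nyquist frequency.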
freq_to_bin
Return the nearest FFT bin for a given frequency.

| PARAMETER | DESCRIPTION |
|---|---|
| spec | Reference spectrogram. |
| freq_hz | Frequency in Hz. |

| RETURNS | DESCRIPTION |
|---|---|
| int | Nearest bin index, clamped to |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If freq_hz is negative or >= Nyquist. |
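
The inverse mapping rounds freq_hz · window_size / sample_rate to the nearest integer. This sketch mirrors the validation and clamping described above, with the clamp range assumed to be the valid bin indices 0 to window_size // 2; `hz_to_bin` is a hypothetical stand-in that takes the sample rate and window size directly.

```python
def hz_to_bin(freq_hz, sample_rate, window_size):
    """Nearest FFT bin for a frequency, rejecting values outside [0, Nyquist)."""
    nyquist = sample_rate / 2.0
    if freq_hz < 0 or freq_hz >= nyquist:
        raise ValueError(f"freq_hz must be in [0, {nyquist}), got {freq_hz}")
    num_bins = window_size // 2 + 1
    nearest = int(round(freq_hz * window_size / sample_rate))
    # Clamp to the valid index range (assumed to be 0..num_bins - 1).
    return min(max(nearest, 0), num_bins - 1)


bin_1k = hz_to_bin(1000.0, sample_rate=48000, window_size=2048)
```

Here 1000 Hz falls at fractional bin 42.67, which rounds to bin 43.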
time_stretch
Phase-vocoder time stretch.
Resamples the STFT magnitude and propagates phase using instantaneous frequency estimation, following the classic phase vocoder approach.

| PARAMETER | DESCRIPTION |
|---|---|
| spec | Input spectrogram. |
| rate | Stretch rate. |

| RETURNS | DESCRIPTION |
|---|---|
| Spectrogram | Time-stretched spectrogram with updated |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If rate <= 0. |
References
[1] J. Flanagan and R. Golden, "Phase vocoder," Bell Syst. Tech. J., vol. 45, no. 9, pp. 1493–1509, 1966.
[2] J. Laroche and M. Dolson, "Improved phase vocoder time-scale modification of audio," IEEE Trans. Speech Audio Process., vol. 7, no. 3, pp. 323–332, 1999.
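
The instantaneous frequency estimation at the heart of a phase vocoder can be sketched for a single bin: compare the measured phase advance between two frames with the advance expected at the bin's center frequency, wrap the difference, and convert it to a frequency offset. This is the textbook formulation, not nanodsp's code; `wrap_phase` and `instantaneous_freq` are illustrative names.

```python
import math


def wrap_phase(phase):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi


def instantaneous_freq(prev_phase, curr_phase, bin_index, window_size, hop_size):
    """Estimate a bin's true frequency in radians per sample."""
    bin_omega = 2.0 * math.pi * bin_index / window_size
    expected_advance = bin_omega * hop_size
    deviation = wrap_phase(curr_phase - prev_phase - expected_advance)
    return bin_omega + deviation / hop_size


window_size, hop_size = 2048, 512
# A partial sitting between bins 10 and 11 (10.3 bins), observed in bin 10.
true_omega = 2.0 * math.pi * 10.3 / window_size
prev_phase = wrap_phase(0.4)
curr_phase = wrap_phase(0.4 + true_omega * hop_size)
estimate = instantaneous_freq(prev_phase, curr_phase, 10, window_size, hop_size)
```

During synthesis, each output frame's phase is advanced by this estimated frequency times the synthesis hop, which keeps partials coherent when frames are spaced differently from the analysis.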
phase_lock
Identity phase-locking (Laroche & Dolson 1999).
Finds spectral peaks in each frame and propagates their phase to neighboring bins, reducing phasiness.

| PARAMETER | DESCRIPTION |
|---|---|
| spec | Input spectrogram. |

| RETURNS | DESCRIPTION |
|---|---|
| Spectrogram | Phase-locked spectrogram with identical magnitudes. |
spectral_freeze
spectral_freeze(
    spec: Spectrogram,
    frame_index: int = 0,
    num_frames: int | None = None,
) -> Spectrogram
Repeat a single STFT frame to produce a static ("frozen") texture.

| PARAMETER | DESCRIPTION |
|---|---|
| spec | Input spectrogram. |
| frame_index | Index of the frame to freeze. Negative indices are supported. |
| num_frames | Number of output STFT frames. Defaults to |

| RETURNS | DESCRIPTION |
|---|---|
| Spectrogram | Spectrogram with the chosen frame repeated num_frames times. |

| RAISES | DESCRIPTION |
|---|---|
| IndexError | If frame_index is out of range. |
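
Conceptually, freezing is frame selection plus repetition. The sketch below works on a list of frames (each a list of complex bins) and requires `num_frames` explicitly, since the library's default is not shown here; `freeze_frame` is a hypothetical helper.

```python
def freeze_frame(frames, frame_index, num_frames):
    """Repeat one STFT frame num_frames times; negative indices count from the end."""
    if not -len(frames) <= frame_index < len(frames):
        raise IndexError(f"frame_index {frame_index} out of range for {len(frames)} frames")
    chosen = frames[frame_index]
    # Copy the frame each time so later edits to one output frame stay local.
    return [list(chosen) for _ in range(num_frames)]


frames = [[1 + 0j, 2 + 0j], [3 + 0j, 4 + 0j], [5 + 0j, 6 + 0j]]
frozen = freeze_frame(frames, frame_index=-1, num_frames=4)
```

Here the last frame is repeated four times, so the output is longer than the three-frame input.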
spectral_morph
spectral_morph(
    spec_a: Spectrogram,
    spec_b: Spectrogram,
    mix: float | ndarray = 0.5,
) -> Spectrogram
Interpolate between two spectrograms in the polar domain.
Magnitudes are interpolated linearly; phases use shortest-arc circular interpolation, avoiding the cancellation artefacts of complex-valued lerp.

| PARAMETER | DESCRIPTION |
|---|---|
| spec_a | Input spectrograms. Must share |
| spec_b | Input spectrograms. Must share |
| mix | Blend factor. |

| RETURNS | DESCRIPTION |
|---|---|
| Spectrogram | |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the two spectrograms have incompatible parameters. |
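
The difference between polar interpolation and a complex-valued lerp shows up clearly on a single bin. The sketch below interpolates magnitude linearly and phase along the shortest arc, then compares the result with a plain complex average for two nearly opposite phasors. `morph_bin` is an illustrative helper, and it assumes `mix=0` returns the first input and `mix=1` the second.

```python
import cmath
import math


def wrap_phase(phase):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi


def morph_bin(a, b, mix):
    """Blend two complex bins: linear in magnitude, shortest arc in phase."""
    mag = (1.0 - mix) * abs(a) + mix * abs(b)
    phase_a = cmath.phase(a)
    delta = wrap_phase(cmath.phase(b) - phase_a)  # signed shortest rotation from a to b
    return cmath.rect(mag, phase_a + mix * delta)


# Phases of +170 and -170 degrees: the shortest arc passes through 180 degrees.
a = cmath.rect(1.0, math.radians(170.0))
b = cmath.rect(1.0, math.radians(-170.0))
polar_mid = morph_bin(a, b, 0.5)

# Nearly opposite phasors: a complex lerp almost cancels, the polar blend does not.
c = cmath.rect(1.0, 0.0)
d = cmath.rect(1.0, math.radians(179.0))
polar_blend = morph_bin(c, d, 0.5)
complex_lerp = 0.5 * c + 0.5 * d
```

The polar blend keeps unit magnitude in both cases, while the complex lerp of `c` and `d` collapses to a magnitude below 0.01.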
pitch_shift_spectral
pitch_shift_spectral(
    buf: AudioBuffer,
    semitones: float,
    window_size: int = 2048,
    hop_size: int | None = None,
) -> AudioBuffer
Pitch-shift audio via phase vocoder + resampling.
Combines time_stretch with linear resampling so that pitch changes without altering duration.

| PARAMETER | DESCRIPTION |
|---|---|
| buf | Input audio. |
| semitones | Pitch shift in semitones. Positive = higher, negative = lower. |
| window_size | STFT analysis window size. |
| hop_size | STFT hop size. Defaults to |

| RETURNS | DESCRIPTION |
|---|---|
| AudioBuffer | Pitch-shifted audio with the same duration and sample rate. |
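
The arithmetic behind "stretch, then resample" is compact: a shift of s semitones corresponds to a frequency ratio of 2^(s/12); stretching the audio to that many times its length and then reading through it that many times faster restores the duration while scaling every frequency. The sketch below shows the ratio and a basic linear resampler. Both helpers are hypothetical, and the library may order or implement the two stages differently.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def resample_linear(samples, ratio):
    """Read through samples `ratio` times faster using linear interpolation."""
    out_len = int(len(samples) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append((1.0 - frac) * samples[left] + frac * samples[right])
    return out


ratio = semitone_ratio(12.0)  # one octave up
# Stand-in for audio that was already time-stretched to `ratio` times its length.
stretched = [float(n) for n in range(8)]
shifted = resample_linear(stretched, ratio)
```

For the usage example's 5-semitone shift, the ratio is about 1.335, so the intermediate stretched signal would be roughly a third longer than the input.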
spectral_denoise
spectral_denoise(
    spec: Spectrogram,
    noise_frames: int = 10,
    reduction_db: float = -20.0,
    smoothing: int = 0,
) -> Spectrogram
Spectral noise reduction using a profile estimated from leading frames.
Computes the mean magnitude of the first noise_frames STFT frames per bin, then attenuates bins whose magnitude falls at or below that noise floor. The leading frames should ideally contain only noise.

| PARAMETER | DESCRIPTION |
|---|---|
| spec | Input spectrogram. |
| noise_frames | Number of leading STFT frames used to build the noise profile, >= 1. Typical: 5 to 20. |
| reduction_db | Attenuation in dB applied to bins at or below the noise floor. More negative = more aggressive reduction. Typical: -40 to -10. |
| smoothing | If > 0, apply a moving-average of this width (in bins) to the noise profile, reducing musical-noise artefacts. Typical: 0 to 5. |

| RETURNS | DESCRIPTION |
|---|---|
| Spectrogram | Denoised spectrogram. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If noise_frames < 1 or exceeds the number of available frames. |
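
The profile-then-attenuate procedure described above can be sketched on plain magnitude lists, one list of bins per frame. This minimal version omits the optional `smoothing` step and phase handling, and `noise_profile` and `denoise_mags` are hypothetical helpers rather than the library's internals.

```python
def noise_profile(mags, noise_frames):
    """Per-bin mean magnitude over the first noise_frames frames."""
    if noise_frames < 1 or noise_frames > len(mags):
        raise ValueError("noise_frames must be between 1 and the number of frames")
    num_bins = len(mags[0])
    return [
        sum(frame[k] for frame in mags[:noise_frames]) / noise_frames
        for k in range(num_bins)
    ]


def denoise_mags(mags, noise_frames, reduction_db):
    """Scale bins at or below the noise profile by reduction_db; pass the rest."""
    profile = noise_profile(mags, noise_frames)
    gain = 10.0 ** (reduction_db / 20.0)  # e.g. -20 dB -> 0.1
    return [
        [m * gain if m <= floor else m for m, floor in zip(frame, profile)]
        for frame in mags
    ]


# Three frames, two bins; the first two frames are treated as noise-only.
mags = [[0.125, 0.25], [0.375, 0.25], [1.0, 0.125]]
cleaned = denoise_mags(mags, noise_frames=2, reduction_db=-20.0)
```

The profile here is 0.25 for both bins, so only the values 0.375 and 1.0 pass through unchanged; everything else is scaled by 0.1.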
eq_match
eq_match(
    buf: AudioBuffer,
    target: AudioBuffer,
    window_size: int = 4096,
    smoothing: int = 0,
) -> AudioBuffer
Match the spectral envelope of buf to target.

| PARAMETER | DESCRIPTION |
|---|---|
| buf | Source audio to be adjusted. |
| target | Reference audio whose spectral envelope is matched. |
| window_size | STFT window size, > 0. Typical: 2048 to 8192. |
| smoothing | If > 0, apply a moving-average of this width (in bins) to the correction curve. Typical: 0 to 20. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If sample rates or channel counts differ. |
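
One common way to build such a correction curve is to divide the target's time-averaged magnitude spectrum by the source's, bin by bin, and optionally smooth the result. The sketch below assumes that approach; how nanodsp actually estimates the envelopes is not documented here, and `average_spectrum`, `moving_average`, and `correction_curve` are hypothetical names.

```python
def average_spectrum(mags):
    """Mean magnitude per bin across all frames."""
    num_bins = len(mags[0])
    return [sum(frame[k] for frame in mags) / len(mags) for k in range(num_bins)]


def moving_average(values, width):
    """Centered moving average over `width` bins, shrinking at the edges."""
    half = width // 2
    out = []
    for k in range(len(values)):
        lo, hi = max(0, k - half), min(len(values), k + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def correction_curve(source_mags, target_mags, smoothing=0, eps=1e-12):
    """Per-bin gain that moves the source's average spectrum toward the target's."""
    src = average_spectrum(source_mags)
    tgt = average_spectrum(target_mags)
    curve = [t / max(s, eps) for s, t in zip(src, tgt)]  # eps guards against empty bins
    return moving_average(curve, smoothing) if smoothing > 0 else curve


source_mags = [[1.0, 2.0, 4.0], [1.0, 2.0, 4.0]]
target_mags = [[2.0, 1.0, 4.0]]
curve = correction_curve(source_mags, target_mags)
smoothed = correction_curve(source_mags, target_mags, smoothing=3)
```

Multiplying every source frame by `curve` and inverting the STFT would give the matched audio. Larger `smoothing` values trade precision for a gentler, less resonant correction.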