Backend operations of Kapre.

This module summarizes operations and functions that are used in Kapre layers.


‘channels_first’, a pre-defined string.


‘channels_last’, a pre-defined string.


‘default’, a pre-defined string.


Return a window function given its name. This function is used inside layers such as STFT to get a window function.

  • window_name (None or str) – name of window function. On TensorFlow 2.3, there are five windows available in tf.signal (hamming_window, hann_window, kaiser_bessel_derived_window, kaiser_window, vorbis_window).

A function that validates the data format string.
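A validation of this kind can be sketched as below. This is only an illustration, not Kapre's implementation: the function name `check_data_format` and the constant names are hypothetical, and the only facts taken from this page are the three pre-defined strings.

```python
# Hypothetical constant names; the string values are the ones listed on this page.
CH_FIRST = 'channels_first'
CH_LAST = 'channels_last'
CH_DEFAULT = 'default'


def check_data_format(data_format):
    """Return data_format unchanged if it is one of the three known strings."""
    allowed = (CH_DEFAULT, CH_FIRST, CH_LAST)
    if data_format not in allowed:
        raise ValueError(
            'data_format should be one of %s, but got %r' % (allowed, data_format)
        )
    return data_format
```

Calling `check_data_format('channels_last')` returns the string unchanged, while an unknown value such as `'NCHW'` raises a `ValueError`.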

kapre.backend.magnitude_to_decibel(x, ref_value=1.0, amin=1e-05, dynamic_range=80.0)[source]

A function that converts magnitude to decibel scaling. In essence, it runs 10 * log10(x), but with some other utility operations.

Similar to librosa.power_to_db with ref=1.0 and top_db=dynamic_range

  • x (Tensor) – float tensor. Can be batch or not. Something like magnitude of STFT.
  • ref_value (float) – the input value that maps to 0 dB in the result. For spectrogram magnitudes, ref_value=1.0 usually makes the decibel-scaled output fall around zero if the input audio was in [-1, 1].
  • amin (float) – the noise floor of the input. Any input value smaller than amin is clipped to amin.
  • dynamic_range (float) – range of the resulting values in dB. E.g., if the maximum magnitude is 30 dB, the noise floor of the output becomes (30 - dynamic_range) dB.

a decibel-scaled version of x.

Return type:

log_spec (Tensor)


In many deep-learning-based applications, the input spectrogram magnitudes (e.g., abs(STFT)) are decibel-scaled (i.e., logarithmically mapped) for better performance.
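The roles of ref_value, amin, and dynamic_range can be illustrated with a small NumPy sketch. It is an approximation written for this page rather than Kapre's TensorFlow code: the name `magnitude_to_decibel_sketch` is hypothetical, and the maximum is taken over the whole array, which may differ from how a batched tensor is handled.

```python
import numpy as np


def magnitude_to_decibel_sketch(x, ref_value=1.0, amin=1e-5, dynamic_range=80.0):
    """Decibel-scale x with a noise floor and a limited dynamic range."""
    x = np.asarray(x, dtype=np.float64)
    # Clip small inputs to the noise floor so that log10 stays finite.
    log_spec = 10.0 * np.log10(np.maximum(x, amin))
    # Shift so that an input equal to ref_value maps to 0 dB.
    log_spec = log_spec - 10.0 * np.log10(np.maximum(amin, ref_value))
    # Keep only the top `dynamic_range` dB below the (global) maximum.
    return np.maximum(log_spec, log_spec.max() - dynamic_range)


x = np.array([1.0, 10.0, 1e-12])
print(magnitude_to_decibel_sketch(x))                      # 0, 10 and -50 dB
print(magnitude_to_decibel_sketch(x, dynamic_range=40.0))  # 0, 10 and -30 dB
```

In the first call the tiny input is clipped to amin, giving -50 dB; in the second call the narrower dynamic range raises the floor to 10 - 40 = -30 dB.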


import kapre
from tensorflow.keras.models import Sequential

input_shape = (2048, 1)  # mono signal
model = Sequential()
model.add(kapre.Frame(frame_length=1024, hop_length=512, input_shape=input_shape))
# now the shape is (batch, n_frame=3, frame_length=1024, ch=1)

kapre.backend.filterbank_mel(sample_rate, n_freq, n_mels=128, f_min=0.0, f_max=None, htk=False, norm='slaney')[source]

A wrapper for librosa.filters.mel that additionally does transpose and tensor conversion

  • sample_rate (int) – sample rate of the input audio
  • n_freq (int) – number of frequency bins in the input STFT magnitude.
  • n_mels (int) – the number of mel bands
  • f_min (float) – lowest frequency that is going to be included in the mel filterbank (Hertz)
  • f_max (float) – highest frequency that is going to be included in the mel filterbank (Hertz)
  • htk (bool) – whether to use the HTK formula or not
  • norm – The default, ‘slaney’, normalizes the mel weights by the width of the mel band.

mel filterbanks. Shape=`(n_freq, n_mels)`

Return type:
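To make the `(n_freq, n_mels)` layout concrete, here is a NumPy-only sketch of a triangular mel filterbank. It does not call librosa and is not what filterbank_mel runs: it assumes the HTK mel formula and no normalization (roughly the htk=True, norm=None case), and all function names are hypothetical.

```python
import numpy as np


def hz_to_mel_htk(f):
    # HTK mel formula.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz_htk(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank_sketch(sample_rate, n_freq, n_mels=128, f_min=0.0, f_max=None):
    """Triangular mel filters, returned with shape (n_freq, n_mels)."""
    if f_max is None:
        f_max = sample_rate / 2.0
    # Centre frequency (Hz) of every STFT bin.
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_freq)
    # n_mels + 2 band edges, equally spaced on the mel scale.
    edges = mel_to_hz_htk(
        np.linspace(hz_to_mel_htk(f_min), hz_to_mel_htk(f_max), n_mels + 2)
    )
    # Rising and falling slopes of each triangle, one column per mel band.
    rising = (fft_freqs[:, None] - edges[None, :-2]) / (edges[1:-1] - edges[:-2])
    falling = (edges[None, 2:] - fft_freqs[:, None]) / (edges[2:] - edges[1:-1])
    return np.maximum(0.0, np.minimum(rising, falling))
```

With this layout, a magnitude spectrogram of shape `(time, n_freq)` can be mapped to mel bands by a matrix product with the returned array, which is why the frequency axis comes first.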


kapre.backend.filterbank_log(sample_rate, n_freq, n_bins=84, bins_per_octave=12, f_min=None, spread=0.125)[source]

A function that returns an approximation of constant-Q filter banks for a fixed-window STFT. Each filter is a log-normal window centered at the corresponding frequency.

  • sample_rate (int) – audio sampling rate
  • n_freq (int) – number of the input frequency bins. E.g., n_fft / 2 + 1
  • n_bins (int) – number of the resulting log-frequency bins. Defaults to 84 (7 octaves).
  • bins_per_octave (int) – number of bins per octave. Defaults to 12 (semitones).
  • f_min (float) – lowest frequency that is going to be included in the log filterbank (Hertz). Defaults to C1 ~= 32.70 Hz.
  • spread (float) – spread of each filter, as a fraction of a bin.

log-frequency filterbanks. Shape=`(n_freq, n_bins)`

Return type:



The code is copied from logfrequency in librosa 0.4 (deprecated). The tuning parameter was removed, and n_freq is used instead of n_fft.
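The idea of placing one window per log-spaced centre frequency can be sketched in NumPy as follows. This is a simplified illustration, not the librosa or Kapre code: each filter is a plain Gaussian over log2-frequency, f_min is fixed to 32.70 Hz instead of defaulting to None, every filter is scaled to sum to one, and the name `filterbank_log_sketch` is hypothetical.

```python
import numpy as np


def filterbank_log_sketch(sample_rate, n_freq, n_bins=84, bins_per_octave=12,
                          f_min=32.70, spread=0.125):
    """Log-frequency filters, returned with shape (n_freq, n_bins)."""
    # Width of each filter in octaves: `spread` is a fraction of one bin.
    sigma = spread / bins_per_octave
    # Centre frequency (Hz) of every STFT bin; the DC bin is skipped in the log.
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_freq)
    log_freqs = np.log2(fft_freqs[1:])
    # Geometrically spaced filter centres, starting at f_min.
    centres = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    z = (log_freqs[:, None] - np.log2(centres)[None, :]) / sigma
    fb = np.zeros((n_freq, n_bins))
    fb[1:, :] = np.exp(-0.5 * z ** 2)
    # Scale each filter (column) to sum to one, guarding against empty filters.
    norm = np.maximum(fb.sum(axis=0, keepdims=True), np.finfo(np.float64).tiny)
    return fb / norm


fb = filterbank_log_sketch(sample_rate=22050, n_freq=1025)
print(fb.shape)  # (1025, 84)
```

Because the STFT bins are linearly spaced, the lowest filters fall between bins and end up dominated by a single bin, which is one reason this is only an approximation of a constant-Q transform.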

kapre.backend.mu_law_encoding(signal, quantization_channels)[source]

Encode signal based on mu-law companding. Also called mu-law compressing.

This algorithm assumes the signal has been scaled to between -1 and 1 and returns a signal encoded with values from 0 to quantization_channels - 1. See Wikipedia for more details.

  • signal (float Tensor) – audio signal to encode
  • quantization_channels (positive int) – Number of channels. For 8-bit encoding, use 256.

mu-encoded signal

Return type:

signal_mu (int Tensor)
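The companding step can be sketched in NumPy using the standard mu-law formula with mu = quantization_channels - 1. This is an illustration under that assumption, not Kapre's TensorFlow code, and the name `mu_law_encode_sketch` is hypothetical.

```python
import numpy as np


def mu_law_encode_sketch(signal, quantization_channels=256):
    """Map a float signal in [-1, 1] to integer codes in [0, quantization_channels - 1]."""
    mu = quantization_channels - 1.0
    signal = np.asarray(signal, dtype=np.float64)
    # Compress: logarithmic curve that keeps the sign and stays in [-1, 1].
    compressed = np.sign(signal) * np.log1p(mu * np.abs(signal)) / np.log1p(mu)
    # Quantize: shift to [0, mu] and round to the nearest integer code.
    return np.floor((compressed + 1.0) / 2.0 * mu + 0.5).astype(np.int64)


codes = mu_law_encode_sketch(np.array([-1.0, 0.0, 1.0]), 256)
print(codes)  # codes 0, 128 and 255
```

The logarithmic curve spends more of the integer codes on small amplitudes, so quiet parts of the signal keep more resolution than they would under uniform quantization.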

kapre.backend.mu_law_decoding(signal_mu, quantization_channels)[source]

Decode mu-law encoded signals based on mu-law companding. Also called mu-law expanding.

See Wikipedia for more details.

  • signal_mu (int Tensor) – mu-encoded signal to decode
  • quantization_channels (positive int) – Number of channels. For 8-bit encoding, use 256.

decoded audio signal

Return type:

signal (float Tensor)
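The expanding step is the inverse of the standard mu-law curve and can be sketched in NumPy as follows, again with mu = quantization_channels - 1. The name `mu_law_decode_sketch` is hypothetical, and this is not Kapre's TensorFlow code.

```python
import numpy as np


def mu_law_decode_sketch(signal_mu, quantization_channels=256):
    """Map integer codes in [0, quantization_channels - 1] back to floats in [-1, 1]."""
    mu = quantization_channels - 1.0
    # Undo the quantization shift: codes in [0, mu] become values in [-1, 1].
    compressed = np.asarray(signal_mu, dtype=np.float64) / mu * 2.0 - 1.0
    # Expand: inverse of sign(x) * log1p(mu * |x|) / log1p(mu).
    return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(mu)) / mu


decoded = mu_law_decode_sketch(np.array([0, 255]), 256)
print(decoded)  # close to -1 and 1
```

Decoding does not recover the original signal exactly, since the rounding applied during encoding is lost; the error is smallest for low-amplitude samples.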