backend
Backend operations of Kapre.
This module summarizes operations and functions that are used in Kapre layers.

kapre.backend._CH_FIRST_STR
‘channels_first’, a predefined string.
Type: str

kapre.backend._CH_LAST_STR
‘channels_last’, a predefined string.
Type: str

kapre.backend._CH_DEFAULT_STR
‘default’, a predefined string.
Type: str

kapre.backend.get_window_fn(window_name=None)
Return a window function given its name. This function is used inside layers such as STFT to get a window function.
Parameters:  window_name (None or str) – name of the window function. As of TensorFlow 2.3, five windows are available in tf.signal: hamming_window, hann_window, kaiser_bessel_derived_window, kaiser_window, and vorbis_window.

kapre.backend.validate_data_format_str(data_format)
A function that validates the data format string.
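A minimal sketch of what such a validation could look like, assuming the three predefined strings documented above are the only accepted values. The name `check_data_format`, the constants, and the error message are hypothetical; this is not Kapre's actual implementation.

```python
# Hypothetical stand-ins for the predefined strings documented above.
CH_FIRST = "channels_first"
CH_LAST = "channels_last"
CH_DEFAULT = "default"


def check_data_format(data_format):
    """Return data_format unchanged, or raise ValueError if it is not recognized."""
    allowed = (CH_DEFAULT, CH_FIRST, CH_LAST)
    if data_format not in allowed:
        raise ValueError(
            f"data_format should be one of {allowed}, but got {data_format!r}"
        )
    return data_format
```

Failing early with a descriptive message keeps a misspelled format string from surfacing later as an obscure shape error.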

kapre.backend.magnitude_to_decibel(x, ref_value=1.0, amin=1e-5, dynamic_range=80.0)
A function that converts magnitude to decibel scaling. In essence, it runs 10 * log10(x), but with some other utility operations.
Similar to librosa.power_to_db with ref=1.0 and top_db=dynamic_range
Parameters:  x (Tensor) – float tensor. Can be batch or not. Something like magnitude of STFT.
 ref_value (float) – an input value that would become 0 dB in the result. For spectrogram magnitudes, ref_value=1.0 usually makes the decibel-scaled output be around zero if the input audio was in [-1, 1].
 amin (float) – the noise floor of the input. Any input smaller than amin is converted to amin.
 dynamic_range (float) – range of the resulting values. E.g., if the maximum magnitude is 30 dB, the noise floor of the output becomes (30 - dynamic_range) dB.
Returns: a decibel-scaled version of x.
Return type: log_spec (Tensor)
Note
In many deep-learning-based applications, the input spectrogram magnitudes (e.g., abs(STFT)) are decibel-scaled (i.e., logarithmically mapped) for better performance.
Example
    input_shape = (2048, 1)  # mono signal
    model = Sequential()
    model.add(kapre.Frame(frame_length=1024, hop_length=512, input_shape=input_shape))
    # now the shape is (batch, n_frame=3, frame_length=1024, ch=1)
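The decibel conversion described for magnitude_to_decibel (clamp to amin, take 10 * log10 relative to ref_value, then limit the range to dynamic_range below the maximum) can be sketched in plain Python. This is an illustration under those stated assumptions, operating on a flat list rather than a tensor; the function name is hypothetical and the code is not Kapre's implementation.

```python
import math


def magnitude_to_decibel_sketch(values, ref_value=1.0, amin=1e-5, dynamic_range=80.0):
    """Decibel-scale a flat list of non-negative floats (illustrative only)."""
    # Clamp each value to the noise floor, then express it in dB relative to ref_value.
    ref_db = 10.0 * math.log10(max(amin, ref_value))
    log_spec = [10.0 * math.log10(max(v, amin)) - ref_db for v in values]
    # Keep everything within dynamic_range dB of the maximum.
    floor = max(log_spec) - dynamic_range
    return [max(v, floor) for v in log_spec]


# 1e-12 is below amin, so it is treated as 1e-5 before the logarithm.
db = magnitude_to_decibel_sketch([1.0, 0.1, 1e-12])
# db is approximately [0.0, -10.0, -50.0]
```

With a smaller dynamic_range, such as 30.0, the last value would instead be raised to about -30 dB, which is the (max - dynamic_range) floor described above.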

kapre.backend.filterbank_mel(sample_rate, n_freq, n_mels=128, f_min=0.0, f_max=None, htk=False, norm='slaney')
A wrapper for librosa.filters.mel that additionally transposes the result and converts it to a tensor.
Parameters:  sample_rate (int) – sample rate of the input audio
 n_freq (int) – number of frequency bins in the input STFT magnitude.
 n_mels (int) – the number of mel bands
 f_min (float) – lowest frequency that is going to be included in the mel filterbank (Hertz)
 f_max (float) – highest frequency that is going to be included in the mel filterbank (Hertz)
 htk (bool) – whether to use htk formula or not
 norm – The default, ‘slaney’, normalizes the mel weights by the width of the mel band.
Returns: mel filterbanks. Shape=`(n_freq, n_mels)`
Return type: (Tensor)
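For context on the htk option, the HTK-style mapping between Hertz and mel is commonly written as mel = 2595 * log10(1 + f / 700). The sketch below implements only that formula and its inverse. The function names are hypothetical, and the code does not reproduce the Slaney-style mapping used when htk=False, nor the filterbank construction itself.

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK-style Hz-to-mel conversion."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Under this formula, 1000 Hz maps to roughly 1000 mel.
mel_1k = hz_to_mel_htk(1000.0)
```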

kapre.backend.filterbank_log(sample_rate, n_freq, n_bins=84, bins_per_octave=12, f_min=None, spread=0.125)
A function that returns an approximation of constant-Q filter banks for a fixed-window STFT. Each filter is a log-normal window centered at the corresponding frequency.
Parameters:  sample_rate (int) – audio sampling rate
 n_freq (int) – number of the input frequency bins. E.g., n_fft / 2 + 1
 n_bins (int) – number of the resulting log-frequency bins. Defaults to 84 (7 octaves).
 bins_per_octave (int) – number of bins per octave. Defaults to 12 (semitones).
 f_min (float) – lowest frequency that is going to be included in the log filterbank. Defaults to C1 ~= 32.70 Hz.
 spread (float) – spread of each filter, as a fraction of a bin.
Returns: log-frequency filterbanks. Shape=`(n_freq, n_bins)`
Return type: (Tensor)
Note
The code was originally copied and pasted from logfrequency in librosa 0.4 (deprecated). The tuning parameter was removed, and n_freq is used instead of n_fft.
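To make the n_bins and bins_per_octave parameters concrete, the sketch below computes only the center frequency of each filter, assuming the geometric spacing that bins_per_octave implies (each octave doubles the frequency). The function name is hypothetical, and the log-normal window shape controlled by spread is not modeled here.

```python
def log_filter_center_freqs(n_bins=84, bins_per_octave=12, f_min=32.70):
    """Center frequency in Hz for each bin, spaced geometrically from f_min."""
    return [f_min * 2.0 ** (i / bins_per_octave) for i in range(n_bins)]


centers = log_filter_center_freqs()
# With the defaults, centers[12] is one octave above f_min (twice its value),
# and the 84 bins span 7 octaves in semitone steps.
```

The highest center frequency should stay below the Nyquist frequency (sample_rate / 2); otherwise the top filters would have no STFT bins to cover.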

kapre.backend.mu_law_encoding(signal, quantization_channels)
Encode signal based on mu-law companding. Also called mu-law compressing.
This algorithm assumes the signal has been scaled to between -1 and 1 and returns a signal encoded with values from 0 to quantization_channels - 1. See Wikipedia for more details.
Parameters:  signal (float Tensor) – audio signal to encode
 quantization_channels (positive int) – Number of channels. For 8-bit encoding, use 256.
Returns: mu-encoded signal
Return type: signal_mu (int Tensor)
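The encoding step can be sketched for a single sample using the standard mu-law compression formula, sign(x) * ln(1 + mu * |x|) / ln(1 + mu), with mu = quantization_channels - 1. The function name and the round-to-nearest step are assumptions made for illustration; this is a scalar sketch, not Kapre's tensor implementation.

```python
import math


def mu_law_encode_sketch(sample, quantization_channels=256):
    """Encode one float sample in [-1, 1] as an integer code in 0 .. quantization_channels - 1."""
    mu = quantization_channels - 1
    # Compress: sign(x) * ln(1 + mu * |x|) / ln(1 + mu), which stays in [-1, 1].
    compressed = math.copysign(math.log1p(mu * abs(sample)) / math.log1p(mu), sample)
    # Map [-1, 1] onto 0 .. mu and round to the nearest integer code.
    return int((compressed + 1.0) / 2.0 * mu + 0.5)


codes = [mu_law_encode_sketch(s) for s in (-1.0, 0.0, 1.0)]
# codes == [0, 128, 255]
```

Because of the logarithm, small amplitudes receive proportionally more of the integer codes than large ones, which is the point of companding.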

kapre.backend.mu_law_decoding(signal_mu, quantization_channels)
Decode mu-law encoded signals based on mu-law companding. Also called mu-law expanding.
See Wikipedia for more details.
Parameters:  signal_mu (int Tensor) – mu-encoded signal to decode
 quantization_channels (positive int) – Number of channels. For 8-bit encoding, use 256.
Returns: decoded audio signal
Return type: signal (float Tensor)
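The decoding step can likewise be sketched for a single integer code using the standard mu-law expansion formula, sign(y) * ((1 + mu)^|y| - 1) / mu, with mu = quantization_channels - 1. The function name is hypothetical and the code is a scalar illustration, not Kapre's tensor implementation.

```python
import math


def mu_law_decode_sketch(code, quantization_channels=256):
    """Decode one integer code in 0 .. quantization_channels - 1 to a float in [-1, 1]."""
    mu = quantization_channels - 1
    # Map integer codes 0 .. mu back onto [-1, 1].
    scaled = 2.0 * code / mu - 1.0
    # Expand: sign(y) * ((1 + mu) ** |y| - 1) / mu, written with expm1 for accuracy.
    return math.copysign(math.expm1(abs(scaled) * math.log1p(mu)) / mu, scaled)


low = mu_law_decode_sketch(0)     # close to -1.0
high = mu_law_decode_sketch(255)  # close to 1.0
```

Decoding recovers the original sample only up to quantization error, since encoding rounds to one of quantization_channels levels.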