backend

Backend operations of Kapre.

This module summarizes operations and functions that are used in Kapre layers.

kapre.backend._CH_FIRST_STR

‘channels_first’, a pre-defined string.

Type:str
kapre.backend._CH_LAST_STR

‘channels_last’, a pre-defined string.

Type:str
kapre.backend._CH_DEFAULT_STR

‘default’, a pre-defined string.

Type:str
kapre.backend.get_window_fn(window_name=None)[source]

Return a window function given its name. This function is used inside layers such as STFT to get a window function.

Parameters:
  • window_name (None or str) – name of the window function. With TensorFlow 2.3, five windows are available in tf.signal: hamming_window, hann_window, kaiser_bessel_derived_window, kaiser_window, and vorbis_window. If None, hann_window is used.
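The lookup get_window_fn performs can be sketched in plain Python. Below, get_window_fn_sketch and _WINDOWS are hypothetical names, and NumPy's window functions stand in for tf.signal (their periodic/symmetric conventions differ slightly, so this only illustrates the pattern):

```python
import numpy as np

# Hypothetical stand-in: maps window names to callables, as get_window_fn
# does with the functions in tf.signal.
_WINDOWS = {
    "hamming_window": np.hamming,
    "hann_window": np.hanning,
}

def get_window_fn_sketch(window_name=None):
    """Return a window function given its name; None falls back to Hann."""
    if window_name is None:
        return np.hanning
    try:
        return _WINDOWS[window_name]
    except KeyError:
        raise NotImplementedError(
            f"window_name={window_name!r} is not supported in this sketch"
        )

window = get_window_fn_sketch("hann_window")(8)  # an 8-sample Hann window
```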
kapre.backend.validate_data_format_str(data_format)[source]

A function that validates the data format string, i.e., checks that it is one of 'channels_first', 'channels_last', or 'default'.

kapre.backend.magnitude_to_decibel(x, ref_value=1.0, amin=1e-05, dynamic_range=80.0)[source]

A function that converts magnitude to decibel scaling. In essence, it computes 10 * log10(x), with additional flooring at amin and clipping to dynamic_range.

Similar to librosa.power_to_db with ref=1.0 and top_db=dynamic_range

Parameters:
  • x (Tensor) – float tensor. Can be batch or not. Something like magnitude of STFT.
  • ref_value (float) – an input value that would become 0 dB in the result. For spectrogram magnitudes, ref_value=1.0 usually makes the decibel-scaled output centered around zero if the input audio was in [-1, 1].
  • amin (float) – the noise floor of the input. Any input smaller than amin is converted to amin.
  • dynamic_range (float) – range of the resulting values. E.g., if the maximum magnitude is 30 dB, the noise floor of the output becomes (30 - dynamic_range) dB.
Returns:

a decibel-scaled version of x.

Return type:

log_spec (Tensor)

Note

In many deep learning based applications, the input spectrogram magnitudes (e.g., abs(STFT)) are decibel-scaled (i.e., logarithmically mapped) for better performance.

Example

x = tf.constant([[1e-6, 0.1, 1.0]])  # a batch of magnitudes
log_x = kapre.backend.magnitude_to_decibel(x)
# log_x is approximately [[-50., -10., 0.]]:
# 1e-6 is floored at amin=1e-5 (-50 dB) and 1.0 (== ref_value) maps to 0 dB
kapre.backend.filterbank_mel(sample_rate, n_freq, n_mels=128, f_min=0.0, f_max=None, htk=False, norm='slaney')[source]

A wrapper for librosa.filters.mel that additionally does transpose and tensor conversion

Parameters:
  • sample_rate (int) – sample rate of the input audio
  • n_freq (int) – number of frequency bins in the input STFT magnitude.
  • n_mels (int) – the number of mel bands
  • f_min (float) – lowest frequency that is going to be included in the mel filterbank (Hertz)
  • f_max (float) – highest frequency that is going to be included in the mel filterbank (Hertz)
  • htk (bool) – whether to use htk formula or not
  • norm – The default, ‘slaney’, normalizes the mel weights by the width of the mel bands.
Returns:

mel filterbanks. Shape=`(n_freq, n_mels)`

Return type:

(Tensor)
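For intuition on how the mel bands are spaced, the sketch below shows the HTK mel-scale conversion (the formula used when htk=True; the default Slaney scale in librosa is a piecewise linear-then-logarithmic variant, not reproduced here). hz_to_mel_htk, mel_to_hz_htk, and mel_band_edges are illustrative names, not kapre or librosa APIs:

```python
import numpy as np

# HTK mel-scale conversion: mel = 2595 * log10(1 + f / 700)
def hz_to_mel_htk(f_hz):
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz_htk(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Band edges of a mel filterbank: n_mels + 2 points (each triangular
# filter spans three consecutive edges), equally spaced on the mel scale.
def mel_band_edges(f_min, f_max, n_mels):
    mels = np.linspace(hz_to_mel_htk(f_min), hz_to_mel_htk(f_max), n_mels + 2)
    return mel_to_hz_htk(mels)
```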

kapre.backend.filterbank_log(sample_rate, n_freq, n_bins=84, bins_per_octave=12, f_min=None, spread=0.125)[source]

A function that returns an approximation of constant-Q filter banks for a fixed-window STFT. Each filter is a log-normal window centered at the corresponding frequency.

Parameters:
  • sample_rate (int) – audio sampling rate
  • n_freq (int) – number of the input frequency bins. E.g., n_fft / 2 + 1
  • n_bins (int) – number of the resulting log-frequency bins. Defaults to 84 (7 octaves).
  • bins_per_octave (int) – number of bins per octave. Defaults to 12 (semitones).
  • f_min (float) – lowest frequency that is going to be included in the log filterbank. Defaults to C1 ~= 32.70
  • spread (float) – spread of each filter, as a fraction of a bin.
Returns:

log-frequency filterbanks. Shape=`(n_freq, n_bins)`

Return type:

(Tensor)

Note

The code was originally adapted from logfrequency in librosa 0.4 (now deprecated). The tuning parameter was removed, and n_freq is used instead of n_fft.
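The center frequencies of these log-frequency bins form a geometric series, doubling once every bins_per_octave bins starting from f_min. A small sketch (log_center_frequencies is an illustrative name, not kapre's API):

```python
import numpy as np

# Geometric spacing of log-frequency bin centers: with the defaults
# (f_min = C1 ~= 32.70 Hz, 84 bins, 12 bins per octave), the centers
# cover 7 octaves at semitone spacing.
def log_center_frequencies(f_min=32.70, n_bins=84, bins_per_octave=12):
    return f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
```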

kapre.backend.mu_law_encoding(signal, quantization_channels)[source]

Encode signal based on mu-law companding, also called mu-law compression.

This algorithm assumes the signal has been scaled to between -1 and 1 and returns a signal encoded with values from 0 to quantization_channels - 1. See Wikipedia for more details.

Parameters:
  • signal (float Tensor) – audio signal to encode
  • quantization_channels (positive int) – Number of channels. For 8-bit encoding, use 256.
Returns:

mu-encoded signal

Return type:

signal_mu (int Tensor)
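The companding step can be illustrated in NumPy (a sketch of the standard mu-law formula, not kapre's Tensor implementation; mu_law_encode is an illustrative name):

```python
import numpy as np

def mu_law_encode(signal, quantization_channels=256):
    """Compress a [-1, 1] signal and quantize it to integer channels."""
    mu = quantization_channels - 1.0
    # Logarithmic compression: small amplitudes get finer resolution
    compressed = np.sign(signal) * np.log1p(mu * np.abs(signal)) / np.log1p(mu)
    # Map [-1, 1] to integers in [0, quantization_channels - 1]
    return ((compressed + 1.0) / 2.0 * mu + 0.5).astype(np.int64)
```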

kapre.backend.mu_law_decoding(signal_mu, quantization_channels)[source]

Decode mu-law encoded signals based on mu-law companding, also called mu-law expansion.

See Wikipedia for more details.

Parameters:
  • signal_mu (int Tensor) – mu-encoded signal to decode
  • quantization_channels (positive int) – Number of channels. For 8-bit encoding, use 256.
Returns:

decoded audio signal

Return type:

signal (float Tensor)
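The expansion is the exact inverse of the companding formula; a NumPy sketch (mu_law_decode is an illustrative name, not kapre's Tensor implementation):

```python
import numpy as np

def mu_law_decode(signal_mu, quantization_channels=256):
    """Expand integer-encoded mu-law values back to a [-1, 1] signal."""
    mu = quantization_channels - 1.0
    # Map integers in [0, mu] back to [-1, 1]
    rescaled = signal_mu.astype(np.float64) / mu * 2.0 - 1.0
    # Exponential expansion, the inverse of the log1p compression
    return np.sign(rescaled) / mu * (np.power(1.0 + mu, np.abs(rescaled)) - 1.0)
```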