backend
Backend operations of Kapre.
This module summarizes operations and functions that are used in Kapre layers.
-
kapre.backend._CH_FIRST_STR
‘channels_first’, a pre-defined string.
Type: str
-
kapre.backend._CH_LAST_STR
‘channels_last’, a pre-defined string.
Type: str
-
kapre.backend._CH_DEFAULT_STR
‘default’, a pre-defined string.
Type: str
-
kapre.backend.get_window_fn(window_name=None)
Return a window function given its name. This function is used inside layers such as STFT to get a window function.
Parameters: - window_name (None or str) – name of the window function. As of TensorFlow 2.3, five windows are available in tf.signal: hamming_window, hann_window, kaiser_bessel_derived_window, kaiser_window, and vorbis_window.
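The name-to-function lookup described above can be pictured with a small plain-Python sketch. It is not Kapre's implementation: the real function returns tf.signal window functions, whereas the hypothetical `get_window_fn_sketch` below maps two of the listed names to hand-written periodic windows. The Hann fallback for `None` and the `ValueError` for unknown names are assumptions made for illustration.

```python
import math


def hann_window(window_length):
    """Periodic Hann window as a list of floats."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / window_length)
            for n in range(window_length)]


def hamming_window(window_length):
    """Periodic Hamming window as a list of floats."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / window_length)
            for n in range(window_length)]


# Hypothetical registry; the real function draws on tf.signal instead.
_WINDOWS = {"hann_window": hann_window, "hamming_window": hamming_window}


def get_window_fn_sketch(window_name=None):
    """Return a window function given its name (illustrative only)."""
    if window_name is None:
        # Assumption: fall back to a Hann window when no name is given.
        return hann_window
    if window_name not in _WINDOWS:
        # Assumption: the exception type is illustrative.
        raise ValueError(f"unknown window: {window_name!r}")
    return _WINDOWS[window_name]


window_fn = get_window_fn_sketch("hamming_window")
window = window_fn(8)  # eight window coefficients
```

Returning the function rather than the window values lets the caller choose the window length later, which is what a layer such as STFT needs.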
-
kapre.backend.validate_data_format_str(data_format)
A function that validates the data format string.
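A minimal sketch of such a check, assuming the valid values are exactly the three pre-defined strings listed at the top of this module. The function name `validate_data_format_str_sketch`, the exception type, and the message are illustrative assumptions, not Kapre's code.

```python
# The three strings documented in this module as pre-defined constants.
_CH_FIRST_STR = "channels_first"
_CH_LAST_STR = "channels_last"
_CH_DEFAULT_STR = "default"


def validate_data_format_str_sketch(data_format):
    """Raise ValueError unless data_format is one of the three known strings."""
    allowed = (_CH_DEFAULT_STR, _CH_FIRST_STR, _CH_LAST_STR)
    if data_format not in allowed:
        # Assumption: the exception type and message are illustrative.
        raise ValueError(
            f"data_format should be one of {allowed}, got {data_format!r}"
        )


validate_data_format_str_sketch("channels_last")  # passes silently
```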
-
kapre.backend.magnitude_to_decibel(x, ref_value=1.0, amin=1e-05, dynamic_range=80.0)
A function that converts magnitude to decibel scaling. In essence, it runs 10 * log10(x), but with some other utility operations.
Similar to librosa.power_to_db with ref=1.0 and top_db=dynamic_range.
Parameters: - x (Tensor) – float tensor. Can be batch or not. Something like magnitude of STFT.
- ref_value (float) – an input value that would become 0 dB in the result. For spectrogram magnitudes, ref_value=1.0 usually makes the decibel-scaled output fall around zero if the input audio was in [-1, 1].
- amin (float) – the noise floor of the input. An input smaller than amin is converted to amin.
- dynamic_range (float) – range of the resulting values. E.g., if the maximum magnitude is 30 dB, the noise floor of the output becomes (30 - dynamic_range) dB.
Returns: a decibel-scaled version of x.
Return type: log_spec (Tensor)
Note
In many deep-learning-based applications, the input spectrogram magnitudes (e.g., abs(STFT)) are decibel-scaled (i.e., logarithmically mapped) for better performance.
Example
    from tensorflow.keras.models import Sequential
    import kapre

    input_shape = (2048, 1)  # mono signal
    model = Sequential()
    model.add(kapre.Frame(frame_length=1024, hop_length=512, input_shape=input_shape))
    # now the shape is (batch, n_frame=3, frame_length=1024, ch=1)
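The steps described for magnitude_to_decibel (clamp to amin, take 10 * log10, subtract the reference, limit the dynamic range) can be sketched in plain Python on a flat list of floats. This is one reading of the parameter descriptions above, not Kapre's tensor implementation; the name `magnitude_to_decibel_sketch` is hypothetical.

```python
import math


def magnitude_to_decibel_sketch(x, ref_value=1.0, amin=1e-5, dynamic_range=80.0):
    """Decibel-scale a flat list of non-negative floats (illustrative only)."""
    # Clamp to the noise floor before taking the logarithm.
    log_spec = [10.0 * math.log10(max(v, amin)) for v in x]
    # Shift so that an input equal to ref_value maps to 0 dB.
    offset = 10.0 * math.log10(max(amin, ref_value))
    log_spec = [v - offset for v in log_spec]
    # Keep only the top dynamic_range dB below the maximum.
    floor = max(log_spec) - dynamic_range
    return [max(v, floor) for v in log_spec]


db = magnitude_to_decibel_sketch([1.0, 0.1, 1e-12], dynamic_range=30.0)
```

In this call the third input is first raised to amin (about -50 dB) and then lifted again to the dynamic-range floor, 30 dB below the maximum.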
-
kapre.backend.filterbank_mel(sample_rate, n_freq, n_mels=128, f_min=0.0, f_max=None, htk=False, norm='slaney')
A wrapper for librosa.filters.mel that additionally transposes the result and converts it to a tensor.
Parameters: - sample_rate (int) – sample rate of the input audio
- n_freq (int) – number of frequency bins in the input STFT magnitude.
- n_mels (int) – the number of mel bands
- f_min (float) – lowest frequency that is going to be included in the mel filterbank (Hertz)
- f_max (float) – highest frequency that is going to be included in the mel filterbank (Hertz)
- htk (bool) – whether to use the HTK formula
- norm – the default, ‘slaney’, normalizes the mel weights by the width of each mel band.
Returns: mel filterbanks. Shape=`(n_freq, n_mels)`
Return type: (Tensor)
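To illustrate what the htk, f_min, and f_max parameters control, here is a sketch of the widely used HTK mel formula and of band edges spaced evenly on that scale. It covers only the frequency mapping: the triangular weights, the Slaney variant used when htk=False, and the normalization are left to librosa. All function names below are hypothetical.

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK-style mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges_hz(n_mels, f_min, f_max):
    """Return n_mels + 2 frequencies in Hz, equally spaced on the mel scale.

    Consecutive triples (left edge, center, right edge) delimit each band.
    """
    mel_min, mel_max = hz_to_mel_htk(f_min), hz_to_mel_htk(f_max)
    step = (mel_max - mel_min) / (n_mels + 1)
    return [mel_to_hz_htk(mel_min + i * step) for i in range(n_mels + 2)]


edges = mel_band_edges_hz(n_mels=4, f_min=0.0, f_max=8000.0)
```

Because the edges are evenly spaced in mel rather than in Hertz, the bands widen toward high frequencies.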
-
kapre.backend.filterbank_log(sample_rate, n_freq, n_bins=84, bins_per_octave=12, f_min=None, spread=0.125)
A function that returns an approximation of constant-Q filter banks for a fixed-window STFT. Each filter is a log-normal window centered at the corresponding frequency.
Parameters: - sample_rate (int) – audio sampling rate
- n_freq (int) – number of the input frequency bins. E.g., n_fft / 2 + 1
- n_bins (int) – number of the resulting log-frequency bins. Defaults to 84 (7 octaves).
- bins_per_octave (int) – number of bins per octave. Defaults to 12 (semitones).
- f_min (float) – lowest frequency that is going to be included in the log filterbank (Hertz). Defaults to C1 (approximately 32.70 Hz).
- spread (float) – spread of each filter, as a fraction of a bin.
Returns: log-frequency filterbanks. Shape=`(n_freq, n_bins)`
Return type: (Tensor)
Note
The code was originally copied from logfrequency in librosa 0.4 (now deprecated). The tuning parameter was removed, and n_freq is used instead of n_fft.
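The relationship between n_bins, bins_per_octave, f_min, and spread can be sketched as follows. The center frequencies follow directly from the parameter descriptions. The weight function is an unnormalized Gaussian on a log2-frequency axis, and converting spread to octaves by dividing by bins_per_octave is an assumption about how "a fraction of a bin" is applied. Function names are hypothetical, and this is not the librosa or Kapre code.

```python
import math


def log_filter_centers(n_bins=84, bins_per_octave=12, f_min=32.70):
    """Center frequencies in Hz, spaced geometrically from f_min."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def log_normal_weight(freq_hz, center_hz, bins_per_octave=12, spread=0.125):
    """Unnormalized Gaussian weight on a log2-frequency axis.

    Assumption: spread is converted to octaves by dividing by bins_per_octave.
    """
    sigma = spread / bins_per_octave
    distance = math.log2(freq_hz) - math.log2(center_hz)
    return math.exp(-0.5 * (distance / sigma) ** 2)


centers = log_filter_centers()
# One octave (12 bins) above f_min the center frequency has doubled.
octave_ratio = centers[12] / centers[0]
```

With the defaults, 84 bins at 12 bins per octave span 7 octaves above f_min, and a small spread keeps each filter narrow around its own center.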
-
kapre.backend.mu_law_encoding(signal, quantization_channels)
Encode signal based on mu-law companding. Also called mu-law compressing.
This algorithm assumes the signal has been scaled to between -1 and 1 and returns a signal encoded with values from 0 to quantization_channels - 1. See Wikipedia for more details.
Parameters: - signal (float Tensor) – audio signal to encode
- quantization_channels (positive int) – Number of channels. For 8-bit encoding, use 256.
Returns: mu-encoded signal
Return type: signal_mu (int Tensor)
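The encoding can be sketched for a single sample using the standard mu-law compression formula followed by quantization to integers. This is a plain-Python illustration with a hypothetical name, `mu_law_encode_sketch`, and it assumes round-to-nearest quantization; Kapre's function operates on tensors.

```python
import math


def mu_law_encode_sketch(sample, quantization_channels=256):
    """Map one float sample in [-1, 1] to an int in [0, quantization_channels - 1]."""
    mu = quantization_channels - 1
    # Compress: logarithmic curve that preserves the sign of the sample.
    compressed = math.copysign(
        math.log1p(mu * abs(sample)) / math.log1p(mu), sample
    )
    # Quantize: shift [-1, 1] to [0, mu] and round to the nearest integer.
    return int((compressed + 1.0) / 2.0 * mu + 0.5)


codes = [mu_law_encode_sketch(s) for s in (-1.0, 0.0, 0.01, 1.0)]
```

The logarithmic curve spends more of the integer codes on quiet samples, so small amplitudes keep more resolution than a uniform quantizer would give them.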
-
kapre.backend.mu_law_decoding(signal_mu, quantization_channels)
Decode mu-law encoded signals based on mu-law companding. Also called mu-law expanding.
See Wikipedia for more details.
Parameters: - signal_mu (int Tensor) – mu-encoded signal to decode
- quantization_channels (positive int) – Number of channels. For 8-bit encoding, use 256.
Returns: decoded audio signal
Return type: signal (float Tensor)
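Decoding reverses the two encoding steps: the integer code is mapped back to [-1, 1], then the logarithmic curve is inverted. The sketch below handles a single code in plain Python using the standard mu-law expansion formula. The name `mu_law_decode_sketch` is hypothetical, and the code is not Kapre's tensor implementation.

```python
import math


def mu_law_decode_sketch(code, quantization_channels=256):
    """Map one int code in [0, quantization_channels - 1] back to a float in [-1, 1]."""
    mu = quantization_channels - 1
    # Undo the quantization shift: [0, mu] -> [-1, 1].
    compressed = code / mu * 2.0 - 1.0
    # Expand: invert the logarithmic compression curve.
    magnitude = math.expm1(abs(compressed) * math.log1p(mu)) / mu
    return math.copysign(magnitude, compressed)


samples = [mu_law_decode_sketch(c) for c in (0, 128, 255)]
```

The round trip is lossy: decoding returns the representative value of each quantization step, not the exact original sample.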