signal

Signal layers.

This module includes Kapre layers that deal with audio signals (waveforms).

class kapre.signal.Frame(frame_length, hop_length, pad_end=False, pad_value=0, data_format='default', **kwargs)[source]

Frame input audio signal. It is a wrapper of tf.signal.frame.

Parameters:
  • frame_length (int) – length of a frame
  • hop_length (int) – hop length, i.e., the number of samples between the starts of consecutive frames (this determines the frame rate)
  • pad_end (bool) – whether to pad at the end of the signal if there would be an otherwise-discarded partial frame
  • pad_value (int or float) – value to use in the padding
  • data_format (str) – channels_first, channels_last, or default
  • **kwargs – optional keyword args for tf.keras.layers.Layer()

Example

from tensorflow.keras.models import Sequential
import kapre

input_shape = (2048, 1)  # mono signal
model = Sequential()
model.add(kapre.Frame(frame_length=1024, hop_length=512, input_shape=input_shape))
# now the shape is (batch, n_frame=3, frame_length=1024, ch=1)
call(x)[source]
Parameters:x (Tensor) – a batch of audio signals, in the 1D format specified at initialization.
Returns:A framed tensor. The shape is (batch, time (frames), frame_length, channel) if channels_last, or (batch, channel, time (frames), frame_length) if channels_first.
Return type:(Tensor)
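
For reference, here is a minimal sketch of what the layer computes for channels_last input, calling tf.signal.frame directly (the layer wraps this op). With pad_end=False, the number of frames is 1 + (n_samples - frame_length) // hop_length.

import tensorflow as tf

x = tf.random.normal([8, 2048, 1])  # (batch, time, ch), channels_last
frames = tf.signal.frame(x, frame_length=1024, frame_step=512, pad_end=False, axis=1)
print(frames.shape)  # (8, 3, 1024, 1), since 1 + (2048 - 1024) // 512 = 3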
class kapre.signal.Energy(sample_rate=22050, ref_duration=0.1, frame_length=2205, hop_length=1102, pad_end=False, pad_value=0, data_format='default', **kwargs)[source]

Compute the energy of each frame. The energy of each frame is normalized so that the values represent energy per ref_duration, i.e., the raw frame energy is scaled by ref_duration / (frame_length / sample_rate). If frame_length > sample_rate * ref_duration, the normalized energy is therefore smaller than the raw frame energy.
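
A minimal sketch of this computation, assuming each frame's raw energy is the sum of its squared samples (a sketch of the idea, not kapre's exact code):

import tensorflow as tf

sample_rate, ref_duration = 22050, 0.1
frame_length, hop_length = 2205, 1102

x = tf.random.normal([1, 22050])  # a 1-second mono batch
frames = tf.signal.frame(x, frame_length, hop_length)  # (1, n_frame, frame_length)
raw_energy = tf.reduce_sum(tf.square(frames), axis=-1)  # raw energy per frame
nor_coeff = ref_duration / (frame_length / sample_rate)  # = 1.0 with these values
energy = nor_coeff * raw_energy  # energy per ref_duration seconds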

Parameters:
  • sample_rate (int) – sample rate of the audio
  • ref_duration (float) – reference duration for normalization
  • frame_length (int) – length of a frame that is used in computing energy
  • hop_length (int) – hop length; it determines the time resolution of the energy computation
  • pad_end (bool) – whether to pad at the end of the signal if there would be an otherwise-discarded partial frame
  • pad_value (int or float) – value to use in the padding
  • data_format (str) – channels_first, channels_last, or default
  • **kwargs – optional keyword args for tf.keras.layers.Layer()

Example

from tensorflow.keras.models import Sequential
import kapre

input_shape = (2048, 1)  # mono signal
model = Sequential()
model.add(kapre.Energy(frame_length=1024, hop_length=512, input_shape=input_shape))
# now the shape is (batch, n_frame=3, ch=1)
call(x)[source]
Parameters:x (Tensor) – a batch of audio signals, in the 1D format specified at initialization.
Returns:An energy tensor. The shape is (batch, time (frames), channel) if channels_last, or (batch, channel, time (frames)) if channels_first.
Return type:(Tensor)
class kapre.signal.MuLawEncoding(quantization_channels, **kwargs)[source]

Mu-law encoding (compression) of an audio signal in [-1, 1] to integers in [0, quantization_channels - 1]. See Wikipedia for more details.

Parameters:
  • quantization_channels (positive int) – Number of channels. For 8-bit encoding, use 256.
  • **kwargs – optional keyword args for tf.keras.layers.Layer()

Note

Mu-law encoding was originally developed to increase the signal-to-noise ratio of signals during transmission. In deep learning, it was popularized by WaveNet, where 8-bit (256-channel) mu-law quantization was applied to the signal so that generating waveform amplitudes became a single-label 256-class classification problem.
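
A minimal sketch of the compression itself, assuming the standard mu-law formula (not kapre's exact implementation; the helper name mu_law_encode is hypothetical):

import numpy as np

def mu_law_encode(x, quantization_channels=256):
    # standard mu-law compression (hypothetical helper, for illustration)
    mu = quantization_channels - 1
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)  # in [-1, 1]
    return ((compressed + 1) / 2 * mu + 0.5).astype(np.int32)  # quantize to [0, mu]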

Example

from tensorflow.keras.models import Sequential
import kapre

input_shape = (2048, 1)  # mono signal (float in [-1, 1])
model = Sequential()
model.add(kapre.MuLawEncoding(quantization_channels=256, input_shape=input_shape))
# now the shape is (batch, time=2048, ch=1) with int in [0, quantization_channels - 1]
call(x)[source]
Parameters:x (float Tensor) – audio signal to encode. Shape doesn’t matter.
Returns:mu-law encoded x. Shape doesn’t change.
Return type:(int Tensor)
class kapre.signal.MuLawDecoding(quantization_channels, **kwargs)[source]

Mu-law decoding (expansion) of a mu-law encoded audio signal back to [-1, 1]. See Wikipedia for more details.
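
A minimal sketch of the expansion, mirroring the compression formula above (not kapre's exact implementation; the helper name mu_law_decode is hypothetical):

import numpy as np

def mu_law_decode(y, quantization_channels=256):
    # standard mu-law expansion (hypothetical helper, for illustration)
    mu = quantization_channels - 1
    compressed = 2.0 * y / mu - 1.0  # ints in [0, mu] back to floats in [-1, 1]
    return np.sign(compressed) * ((1.0 + mu) ** np.abs(compressed) - 1.0) / mu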

Parameters:
  • quantization_channels (positive int) – Number of channels. For 8-bit encoding, use 256.
  • **kwargs – optional keyword args for tf.keras.layers.Layer()

Example

from tensorflow.keras.models import Sequential
import kapre

input_shape = (2048, 1)  # mono signal (int in [0, quantization_channels - 1])
model = Sequential()
model.add(kapre.MuLawDecoding(quantization_channels=256, input_shape=input_shape))
# now the shape is (batch, time=2048, ch=1) with float dtype in [-1, 1]
call(x)[source]
Parameters:x (int Tensor) – audio signal to decode. Shape doesn’t matter.
Returns:mu-law decoded x. Shape doesn’t change.
Return type:(float Tensor)
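
A quick round-trip sanity check with the two layers (a sketch; the reconstruction is only approximate because quantization is lossy):

import numpy as np
import tensorflow as tf
import kapre

x = tf.constant(np.random.uniform(-1, 1, (1, 2048, 1)).astype(np.float32))
encoded = kapre.MuLawEncoding(quantization_channels=256)(x)
decoded = kapre.MuLawDecoding(quantization_channels=256)(encoded)
# decoded is approximately x, up to quantization error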
class kapre.signal.LogmelToMFCC(n_mfccs=20, data_format='default', **kwargs)[source]

Compute MFCC from log-melspectrogram.

It wraps tf.signal.mfccs_from_log_mel_spectrograms(), which performs a DCT-II.

Note

In librosa, which uses scipy, the DCT-II scales by sqrt(1/n) where n is the bin index of the MFCC; this is the correct orthogonal DCT. In TensorFlow, however, which follows HTK, it scales by 0.5 * sqrt(2/n). This results in a sqrt(2) scale difference in the first MFCC bin (n=1).

As long as all of your data in training / inference / deployment is consistent (i.e., do not mix librosa and kapre MFCC), it’ll be fine!
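
A quick way to observe the scale difference described above (a sketch; it assumes scipy is installed and that librosa's DCT matches scipy's norm='ortho', which it uses internally):

import numpy as np
import scipy.fftpack
import tensorflow as tf

log_mel = np.random.randn(1, 40, 128).astype(np.float32)  # (batch, time, n_mels)
tf_mfcc = tf.signal.mfccs_from_log_mel_spectrograms(tf.constant(log_mel)).numpy()
sp_mfcc = scipy.fftpack.dct(log_mel, axis=-1, type=2, norm='ortho')
print(tf_mfcc[..., 0] / sp_mfcc[..., 0])  # ~sqrt(2) in the first bin; others match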

Parameters:
  • n_mfccs (int) – Number of MFCCs
  • data_format (str) – channels_first, channels_last, or default
  • **kwargs – optional keyword args for tf.keras.layers.Layer()

Example

from tensorflow.keras.models import Sequential
import kapre

input_shape = (40, 128, 1)  # mono melspectrogram with 40 frames and n_mels=128
model = Sequential()
model.add(kapre.LogmelToMFCC(n_mfccs=20, input_shape=input_shape))
# now the shape is (batch, time=40, n_mfccs=20, ch=1)
call(log_melgrams)[source]
Parameters:log_melgrams (float Tensor) – a batch of log-melspectrograms. The shape is (batch, time, mel, ch) if channels_last, or (batch, ch, time, mel) if channels_first.
Returns:MFCCs. (batch, time, n_mfccs, ch) if channels_last, (batch, ch, time, n_mfccs) if channels_first.
Return type:(float Tensor)