Net Encoders for Audio
A variety of audio-specific NetEncoder objects is now available to integrate the Audio object with the neural net framework. Encoders are a key part of the framework because they convert data, here audio signals, into the numeric arrays that a neural net takes as input.
Inspect the features that each encoder computes from a recording of a bird.
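As a minimal sketch of that inspection, the following applies each encoder with its default parameters and reports the dimensions of the resulting feature. It assumes the recording is available as ExampleData[{"Audio", "Bird"}]; any other Audio object can be substituted.

```wolfram
(* Assumption: the bird recording is this ExampleData entry; any Audio object works *)
audio = ExampleData[{"Audio", "Bird"}];

(* The five audio encoder types described in this section *)
types = {"Audio", "AudioSTFT", "AudioSpectrogram",
   "AudioMelSpectrogram", "AudioMFCC"};

(* Build each encoder with default parameters, apply it to the
   recording, and collect the dimensions of each feature by type *)
AssociationMap[Dimensions[NetEncoder[#][audio]] &, types]
```

Comparing the dimensions side by side gives a quick view of how much each encoder compresses the signal.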
The "Audio" net encoder returns the waveform itself after resampling and downmixing the input.
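A short sketch of the waveform feature, assuming the same ExampleData recording as a stand-in for any Audio object. The "SampleRate" value is an illustrative choice, not a recommendation.

```wolfram
(* Assumption: any Audio object can replace this example recording *)
audio = ExampleData[{"Audio", "Bird"}];

(* "SampleRate" sets the rate the input is resampled to before encoding;
   16000 is only an illustrative value *)
enc = NetEncoder[{"Audio", "SampleRate" -> 16000}];

waveform = enc[audio];

(* Flatten handles the case where each sample is returned as a length-1 vector *)
ListLinePlot[Flatten[waveform], PlotRange -> All]
```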
The "AudioSTFT" net encoder computes the short-time Fourier transform (STFT), that is, the Fourier transform of successive partitions of the input signal. The resulting feature contains both time and frequency information.
The "AudioSpectrogram" net encoder returns the power spectrum computed on partitions of the input signal.
The "AudioMelSpectrogram" net encoder returns a spectrogram that has been filtered so that its frequency bins are nonlinearly spaced, mimicking human pitch perception.
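To see the effect of the nonlinear frequency spacing, one option is to plot the plain and mel spectrograms next to each other. This is a sketch under two assumptions: that each encoder returns a frames-by-bins matrix, and that a log scale with a small offset (an arbitrary choice here) is acceptable for display.

```wolfram
(* Assumption: any Audio object can replace this example recording *)
audio = ExampleData[{"Audio", "Bird"}];

(* Normal converts the result to an ordinary list if it is not one already *)
spec = Normal[NetEncoder["AudioSpectrogram"][audio]];
mel = Normal[NetEncoder["AudioMelSpectrogram"][audio]];

(* Assumption: rows are frames and columns are frequency bins, so
   Transpose puts time on the horizontal axis. The log and the 10^-6
   offset are for display only. *)
GraphicsRow[{
  MatrixPlot[Transpose[Log[spec + 10.^-6]], PlotLabel -> "AudioSpectrogram"],
  MatrixPlot[Transpose[Log[mel + 10.^-6]], PlotLabel -> "AudioMelSpectrogram"]
  }]
```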
The "AudioMFCC" net encoder further reduces the dimensionality of the mel spectrogram while preserving most of the information contained in the signal.
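The reduction can be checked by comparing feature dimensions. The sketch below sets "NumberOfFilters" and "NumberOfCoefficients" explicitly; the values 40 and 13 are illustrative choices, and the recording is again an assumed stand-in for any Audio object.

```wolfram
(* Assumption: any Audio object can replace this example recording *)
audio = ExampleData[{"Audio", "Bird"}];

(* Mel spectrogram with an explicit number of mel filters *)
melEnc = NetEncoder[{"AudioMelSpectrogram", "NumberOfFilters" -> 40}];

(* MFCC computed from the same number of filters, keeping fewer coefficients *)
mfccEnc = NetEncoder[{"AudioMFCC", "NumberOfFilters" -> 40,
    "NumberOfCoefficients" -> 13}];

(* Compare the size of the two features for the same input *)
{Dimensions[melEnc[audio]], Dimensions[mfccEnc[audio]]}
```

If the encoders behave as described, the per-frame size of the MFCC feature should follow the number of coefficients rather than the number of filters.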