Extract Features Using a Neural Net
The network used in AudioIdentify can be used not only for recognizing sounds but also to extract features from a recording. This allows any signal to be embedded in a semantically meaningful space, where similarities and distances can be computed.
Get the network used in AudioIdentify from the Wolfram Neural Net Repository.
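A minimal sketch of this step. The resource name below is an assumption; the first line searches the repository index so the name can be checked before loading.

```wolfram
(* NetModel[] returns the names of all available repository models; keep those mentioning AudioIdentify *)
Select[NetModel[], StringContainsQ["AudioIdentify"]]

(* Assumed resource name; replace it with a name from the list above if it differs *)
net = NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"]
```

Evaluating net on its own displays the layer structure, which is useful for the extraction steps that follow.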
Extract the core of the network with NetExtract. The signal is divided into fixed-size chunks, and the core net is applied to the mel spectrogram of each chunk.
Remove the last few layers, which handle the classification task, and reinsert the resulting network into the original NetChain. This net produces a fixed-size, semantically meaningful vector for each audio input.
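One possible way to reach the core net, sketched under two assumptions that should be checked against the displayed network: that a top-level layer of the chain is a NetMapOperator, and that the net it maps is stored under the "Net" parameter. The resource name and the variable names mapLayer and netCore are also illustrative.

```wolfram
net = NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"];  (* assumed resource name *)

(* Normal gives the layers of the chain; pick the first NetMapOperator among them.
   If none is found, FirstCase returns Missing["NotFound"] and the layout differs from this assumption *)
mapLayer = FirstCase[Normal[net], _NetMapOperator];

(* Assumption: the net applied to each chunk is stored under the "Net" parameter *)
netCore = NetExtract[mapLayer, "Net"]
```

Searching for the operator by its head avoids hard-coding a layer position or name, which may differ between versions of the model.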
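The truncation can be sketched as follows. This version rebuilds the outer chain around the truncated core instead of editing the original chain in place, so it may differ from the original notebook. The number of layers dropped (three), the averaging over chunks, and the names coreFeatures and netFeature are all assumptions.

```wolfram
net = NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"];  (* assumed resource name *)
netCore = NetExtract[FirstCase[Normal[net], _NetMapOperator], "Net"];  (* assumed layout, see above *)

(* Keep everything up to the fourth-to-last layer; how many trailing layers
   belong to the classifier is an assumption to verify by inspecting netCore *)
coreFeatures = NetTake[netCore, {1, -4}];

(* Map the truncated core over the chunks, then average over the chunk dimension.
   The audio encoder is reused from the original net *)
netFeature = NetChain[
  {NetMapOperator[coreFeatures], AggregationLayer[Mean, 1]},
  "Input" -> NetExtract[net, "Input"]]
```

Averaging over chunks is what makes the output size independent of the recording length; another aggregation such as Max would serve the same purpose.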
Visualize the features for a single audio recording.
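A small helper for this step, written so that it works with any feature network. The function name plotFeatureVector and its argument featureNet are hypothetical; featureNet stands for a net that maps an Audio object to a numeric array.

```wolfram
(* Plot the feature values of one recording against their index.
   Flatten makes this work even if the net returns a higher-rank array *)
plotFeatureVector[featureNet_, audio_Audio] :=
  ListLinePlot[Flatten[featureNet[audio]],
    PlotRange -> All,
    AxesLabel -> {"feature index", "value"}]
```

With the truncated network from the previous step stored in a variable, say netFeature, a call such as plotFeatureVector[netFeature, ExampleData[{"Audio", "Apollo11SmallStep"}]] would show the feature vector of that recording.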
Use the network as the feature extractor.
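Two ways the network might be used as a feature extractor, sketched as helpers. The names plotAudioFeatureSpace, featureDistance, and featureNet are hypothetical; featureNet stands for the truncated network, and recordings for a list of Audio objects supplied by the reader.

```wolfram
(* Lay out a collection of recordings in a 2D feature space,
   using the net in place of the built-in extractor *)
plotAudioFeatureSpace[featureNet_, recordings : {__Audio}] :=
  FeatureSpacePlot[recordings, FeatureExtractor -> featureNet]

(* Distance between two recordings in the embedding space *)
featureDistance[featureNet_, a_Audio, b_Audio] :=
  EuclideanDistance[Flatten[featureNet[a]], Flatten[featureNet[b]]]
```

Recordings with similar content should land close together in the plot and have a small featureDistance, although how well this holds depends on the data.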
Use another pre-trained feature extractor from the repository.
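As an illustration, the sketch below loads a VGGish model. Both the choice of model and its resource name are assumptions; Select[NetModel[], StringContainsQ["VGGish"]] can be used to check the name. The code does not assume a particular output shape and averages over frames only if the net returns one vector per frame.

```wolfram
vggish = NetModel["VGGish Feature Extractor Trained on YouTube Data"];  (* assumed resource name *)

audio = ExampleData[{"Audio", "Apollo11SmallStep"}];
emb = vggish[audio];

(* Inspect the output shape before deciding how to aggregate *)
Dimensions[emb]

(* If the output is a matrix of per-frame vectors, average over frames to get one vector *)
feature = If[ArrayDepth[emb] == 2, Mean[emb], emb]
```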