Inspect a Signal Using the Audio Identification Net
The network used in AudioIdentify can also serve other audio analysis tasks. This example modifies the net so that it returns class probabilities as they evolve over time.
Import the network from the Wolfram Neural Net Repository.
Apply the net to an Audio object.
The network was trained on the AudioSet dataset, where each audio signal is annotated with the sound classes and sources that are present in the recording.
As a consequence, the class probabilities in the output are not mutually exclusive: several classes can have a high probability for the same signal.
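To make the difference between independent and mutually exclusive probabilities concrete, here is a minimal Python sketch. It is not the Wolfram Language code of this example; the class names and scores are made up for illustration, and it assumes that independent scores come from a per-class logistic function, which the text above does not confirm for this net.

```python
import math

def sigmoid(x):
    """Squash one score into (0, 1) independently of all other classes."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    """Turn scores into mutually exclusive probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes (illustrative values only).
scores = {"Speech": 2.0, "Music": 1.5, "Dog": -3.0}

# Multi-label view: each class is judged on its own, so two classes
# can both be likely and the values need not sum to 1.
multi_label = {name: sigmoid(s) for name, s in scores.items()}

# Single-label view: classes compete, so the values always sum to 1.
exclusive = dict(zip(scores, softmax(list(scores.values()))))
```

In the multi-label view, "Speech" and "Music" are both likely at the same time, which matches a recording of someone talking over music.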
The core of the network takes a fixed-size chunk of the mel spectrogram of the input signal as its input. NetMapOperator maps this core over overlapping chunks of the spectrogram.
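The idea of mapping a fixed-size core over overlapping chunks can be sketched in plain Python. This is an illustration only, not what NetMapOperator does internally: the helper names, the chunk size, and the hop are hypothetical, and a simple sum stands in for the core net.

```python
def overlapping_chunks(frames, size, hop):
    """Split a sequence of frames into fixed-size chunks that start every
    `hop` frames. Chunks overlap whenever hop < size. Trailing frames that
    do not fill a whole chunk are dropped."""
    if size <= 0 or hop <= 0:
        raise ValueError("size and hop must be positive")
    return [frames[i:i + size] for i in range(0, len(frames) - size + 1, hop)]

def map_over_chunks(core, frames, size, hop):
    """Apply the same function `core` to every chunk, in order."""
    return [core(chunk) for chunk in overlapping_chunks(frames, size, hop)]

# Ten stand-in "spectrogram frames"; real frames would be feature vectors.
frames = list(range(10))
chunks = overlapping_chunks(frames, size=4, hop=2)
results = map_over_chunks(sum, frames, size=4, hop=2)
```

Because the same core is applied to every chunk, the length of the output grows with the length of the input signal.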
This is the core net.
You can compute this result on the example signal.
The result is a sequence of independent class probabilities, one set per chunk. Since you are looking for all of the classes present anywhere in the signal, take the maximum over time instead of the average.
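A small Python sketch shows why the maximum suits this goal better than the average. The numbers are invented for illustration: the first class is clearly present in one chunk only, and averaging dilutes it while the maximum keeps it.

```python
def aggregate_over_time(per_chunk, reducer):
    """Collapse a list of per-chunk probability rows into one value per
    class by applying `reducer` to each class column."""
    return [reducer(column) for column in zip(*per_chunk)]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Rows are chunks, columns are two hypothetical classes.
# Class 0 appears only in the first chunk, class 1 only in the third.
per_chunk = [
    [0.9, 0.05],
    [0.1, 0.05],
    [0.1, 0.90],
    [0.1, 0.05],
]

by_max = aggregate_over_time(per_chunk, max)
by_mean = aggregate_over_time(per_chunk, mean)
```

With the maximum, a short sound still scores as high as it did in its best chunk; with the mean, its score shrinks as the recording gets longer.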
By doing some surgery, you can generate a net that outputs the class probabilities for each chunk.
Using WebAudioSearch, you can collect some instrument sounds and join them together.
You can define a function that computes the net's result, finds the n most probable classes in the whole sequence, and outputs their probabilities over time.
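The selection step can be sketched in Python as follows. This is a hedged illustration, not the function used in this example: the name `top_classes_over_time` is hypothetical, the per-chunk probabilities are assumed to be already computed, and "most probable in the whole sequence" is interpreted here as ranking classes by their peak probability over all chunks.

```python
def top_classes_over_time(per_chunk, labels, n):
    """Return the time series of the n classes with the highest peak
    probability, as a dict from label to a list with one value per chunk.
    Classes are ordered from highest to lowest peak."""
    peaks = [max(column) for column in zip(*per_chunk)]
    ranked = sorted(range(len(labels)), key=lambda i: peaks[i], reverse=True)
    return {labels[i]: [row[i] for row in per_chunk] for i in ranked[:n]}

# Hypothetical labels and per-chunk probabilities (rows are chunks).
labels = ["Piano", "Guitar", "Drum"]
per_chunk = [
    [0.8, 0.1, 0.2],
    [0.3, 0.7, 0.1],
    [0.2, 0.2, 0.1],
]

series = top_classes_over_time(per_chunk, labels, n=2)
```

Each returned list can then be plotted against the chunk start times to show when a class becomes active.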
Visualize the waveform, along with the probabilities for the 10 most probable classes as they evolve over time.