Wolfram Language

Computational Audio

Use DTW to Compare Recordings

Import, trim, and preprocess four recordings of the first sentence of Alice in Wonderland.

show complete Wolfram Language input
Copy to clipboard.
In[1]:=
Click for copyable input
urls = {"http://ia800503.us.archive.org/3/items/alices_adventures/\ aliceinwonderland_01_carroll.mp3", "http://ia800306.us.archive.org/25/items/alice_wonderland_0711_\ librivox/alice_01_carroll.mp3", "http://ia800201.us.archive.org/32/items/alices_adventures_1003/\ alices_adventures_01_carroll.mp3", "https://ia800904.us.archive.org/15/items/alicesadventure_abridged_\ pc_librivox/alicesadventuresinwonderlandabridged_01_carroll.mp3"}; times = {{27, 33.5}, {16.5, 25}, {22.3, 28.5}, {31, 38}};
Copy to clipboard.
In[2]:=
Click for copyable input
alice = ConformAudio[ MapThread[ AudioNormalize[ AudioChannelMix[AudioTrim[AudioResample[Import[#1], 11025], #2], 1]] &, {urls, times}]]
Out[2]=

Show the plots for the signals.

Copy to clipboard.
In[3]:=
Click for copyable input
AudioPlot[alice, ImageSize -> Medium]
Out[3]=

Compute and plot the MFCC features for the samples.

Copy to clipboard.
In[4]:=
Click for copyable input
mfcc = AudioLocalMeasurements[#, "MFCC", PartitionGranularity -> {.05, .01}]["Values"] & /@ alice;
Copy to clipboard.
In[5]:=
Click for copyable input
Column[MatrixPlot[#, PlotTheme -> "Minimal", ImageSize -> Medium] & /@ Transpose /@ mfcc]
Out[5]=

Compute the dynamic time warping distance between the recordings using WarpingDistance.

Copy to clipboard.
In[6]:=
Click for copyable input
DistanceMatrix[mfcc, DistanceFunction -> WarpingDistance] // MatrixPlot
Out[6]=

Compute the dynamic time warping correspondence between two of the recordings using WarpingCorrespondence.

Copy to clipboard.
In[7]:=
Click for copyable input
{n, m} = WarpingCorrespondence[mfcc[[1]], mfcc[[2]]];
show complete Wolfram Language input
Copy to clipboard.
In[8]:=
Click for copyable input
dur = QuantityMagnitude[Duration[alice[[1]]], "s"]; s = {n, m}\[Transpose]/Max[{n, m}] dur; Labeled[ ListLinePlot[ s, PlotRange -> {{0, dur}, {0, dur}}, AspectRatio -> 1, Axes -> False, PlotStyle -> Thickness[.01], ImageSize -> Medium, Frame -> True, FrameTicks -> None, Prolog -> {RGBColor[ 0.6666666666666666, 0.6666666666666666, 0.6666666666666666], {Line[{{#[[1]], 0}, #}], Line[{{0, #[[2]]}, #}]} & /@ (s[[;; ;; 100]])} ], AudioPlot[#, PlotStyle -> RGBColor[0.560181, 0.691569, 0.194885], Frame -> False, Axes -> False, ImageSize -> Medium, AspectRatio -> 1/15] & /@ (alice[[;; 2]]), {Bottom, Left}, RotateLabel -> True, Spacings -> {0, 0}]
Out[8]=

Related Examples