Create a Shakespearean Corpus with FileSystemScan
This example uses a directory containing text files of all of Shakespeare's works. Start by importing the books with FileSystemMap, keeping only the text itself.
In[2]:=
works = Values[
  FileSystemMap[Import, FileNameJoin[{$HomeDirectory, "Books"}], 2,
    FileNameForms -> "*.txt"][[1]]]
Out[2]=
![](assets.en/create-a-shakespearean-corpus-with-filesystemscan/O_51.png)
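Before importing every file, it can help to check what FileSystemMap will actually visit. The following is a minimal sketch, assuming the same Books directory under $HomeDirectory as in the input above. The names dir, sizes, and texts are illustrative and do not appear in the original example. Mapping a cheap function such as FileByteCount should give a nested association keyed by file and directory names, which makes the layout visible without reading any file contents.

```wolfram
(* Assumed location of the text files, matching the example above *)
dir = FileNameJoin[{$HomeDirectory, "Books"}];

(* Map a cheap function first to inspect which files are found
   and how the result is nested *)
sizes = FileSystemMap[FileByteCount, dir, 2, FileNameForms -> "*.txt"]

(* Import each file explicitly as plain text rather than relying on
   the format being inferred from the file extension *)
texts = FileSystemMap[Import[#, "Text"] &, dir, 2, FileNameForms -> "*.txt"]
```

Requesting the "Text" element explicitly is a design choice rather than a requirement; for .txt files a plain Import call normally returns the same string.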
Construct a single corpus using StringJoin.
In[3]:=
corpus = StringJoin[works]
Out[3]=
![](assets.en/create-a-shakespearean-corpus-with-filesystemscan/O_52.png)
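StringJoin concatenates the strings with nothing between them, so if a file does not end in a newline, its last word runs into the first word of the next file. As a hedged alternative, the sketch below joins with StringRiffle and a newline separator. The list sample is a made-up stand-in for works, and joined is an illustrative name.

```wolfram
(* Two short stand-in strings; in the example above this would be works *)
sample = {"To be, or not to be", "All the world's a stage"};

(* Plain concatenation fuses "be" and "All" into one token *)
StringJoin[sample]
(* "To be, or not to beAll the world's a stage" *)

(* Inserting a newline between items keeps the boundary words apart *)
joined = StringRiffle[sample, "\n"]
```

Whether this matters depends on the files; if each one already ends with a newline, StringJoin produces a clean boundary on its own.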
The corpus can now be treated as a single searchable string, so text-processing functions can be applied to it directly. Use TextCases to determine which countries are mentioned in these works, converting the results to lowercase and removing duplicates.
In[4]:=
countries = ToLowerCase[TextCases[corpus, "Country"]] // DeleteDuplicates
Out[4]=
![](assets.en/create-a-shakespearean-corpus-with-filesystemscan/O_53.png)
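DeleteDuplicates discards how often each country is mentioned. If frequencies are of interest, one possible variation is to tally the matches with Counts before deduplicating. This is a sketch only: sampleText is a short invented stand-in for corpus, and mentions and tally are illustrative names. The exact matches depend on what TextCases recognizes as a country.

```wolfram
(* Short invented stand-in; in the example above this would be corpus *)
sampleText = "Something is rotten in the state of Denmark. He sailed from France to England, and then back to France.";

(* Normalize case first so that differently cased matches are counted together *)
mentions = ToLowerCase[TextCases[sampleText, "Country"]];

(* Count occurrences of each recognized country name *)
tally = Counts[mentions];

(* Sort the association by its values, most frequent first *)
Sort[tally, Greater]
```

Keys[tally] gives the same deduplicated list that the DeleteDuplicates step produces, so the two approaches can be combined.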
Construct a GeoListPlot of the countries referred to in the works of Shakespeare.
In[6]:=
GeoListPlot[Interpreter["Country"] /@ countries]
Out[6]=
![](assets.en/create-a-shakespearean-corpus-with-filesystemscan/O_54.png)
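Not every string found by TextCases is guaranteed to be interpreted as a country, and a string that cannot be interpreted yields a Failure object instead of an entity. A defensive variant, sketched below under that assumption, keeps only the results whose head is Entity before plotting. The list names is a made-up stand-in for countries, and entities and valid are illustrative names.

```wolfram
(* Invented stand-in for the countries list; the last item should not
   be interpretable as a country *)
names = {"france", "england", "not a country"};

(* Interpret each string; uninterpretable strings yield Failure objects *)
entities = Interpreter["Country"] /@ names;

(* Keep only genuine entities and drop repeats, since different strings
   can map to the same country *)
valid = DeleteDuplicates[Cases[entities, _Entity]];

GeoListPlot[valid]
```

Interpreting and plotting entities may require a network connection to retrieve geographic data.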