42 | String Patterns and Templates
String patterns work very much like other patterns in the Wolfram Language, except that they operate on sequences of characters in strings rather than parts of expressions. In a string pattern, you can combine pattern constructs like _ with strings like "abc" using ~~.
This picks out all instances of + followed by a single character:
StringCases["+string +patterns are +quite +easy", "+" ~~ _]
This picks out three characters after each +:
StringCases["+string +patterns are +quite +easy", "+" ~~ _ ~~ _ ~~ _]
Use the name x for the character after each +, and return that character framed:
StringCases["+string +patterns are +quite +easy",
"+" ~~ x_ -> Framed[x]]
In a string pattern, _ stands for any single character. __ (“double blank”) stands for any sequence of one or more characters, and ___ (“triple blank”) stands for any sequence of zero or more characters. __ and ___ will normally grab as much of the string as they can.
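The difference between __ and ___ shows up only when there may be nothing to match. As a small sketch (the test strings here are made up for illustration), this applies the same pattern to a list of strings, once with a double blank and once with a triple blank:

```wolfram
(* __ needs at least one character, so "()" gives no match *)
StringCases[{"()", "(a)", "(ab)"}, "(" ~~ x__ ~~ ")" -> x]
(* {{}, {"a"}, {"ab"}} *)

(* ___ also accepts zero characters, so "()" gives the empty string *)
StringCases[{"()", "(a)", "(ab)"}, "(" ~~ x___ ~~ ")" -> x]
(* {{""}, {"a"}, {"ab"}} *)
```

Given a list of strings, StringCases returns one list of results per string.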
Pick out the sequence of characters between [ and ]:
StringCases["the [important] word", "[" ~~ x__ ~~ "]" -> Framed[x]]
__ normally matches as long a sequence of characters as it can:
StringCases["now [several] important [words]",
"[" ~~ x__ ~~ "]" -> Framed[x]]
Shortest forces the shortest match:
StringCases["now [several] important [words]",
"[" ~~ Shortest[x__] ~~ "]" -> Framed[x]]
StringCases picks out cases of a particular pattern in a string. StringReplace makes replacements.
Make replacements for characters in the string:
StringReplace["now [several] important [words]", {"[" -> "<<",
"]" -> ">>"}]
Make replacements for patterns, using :> to compute ToUpperCase in each case:
StringReplace["now [several] important [words]",
"[" ~~ Shortest[x__] ~~ "]" :> ToUpperCase[x]]
Use NestList to apply a string replacement repeatedly:
NestList[StringReplace[#, {"A" -> "AB", "B" -> "BA"}] &, "A", 5]
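Two details of StringReplace are worth seeing in isolation. The strings below are made up for illustration. The first line uses :> so that ToUpperCase is applied to whatever x matched each time. The second shows that a list of rules is applied in a single pass over the string, so text that has just been inserted is not replaced again:

```wolfram
(* :> evaluates the right-hand side after each match, with x set to the matched character *)
StringReplace["a-b-c", "-" ~~ x_ :> ToUpperCase[x]]
(* "aBC" *)

(* both rules are used in one pass, so the two letters swap rather than collapse *)
StringReplace["ab", {"a" -> "b", "b" -> "a"}]
(* "ba" *)
```

The single-pass behavior is what makes the NestList example grow step by step: each application rewrites every character of the previous string once.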
StringMatchQ tests whether a string matches a pattern.
Select common words that match the pattern of beginning with a and ending with b:
Select[WordList[ ], StringMatchQ[#, "a" ~~ ___ ~~ "b"] &]
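StringMatchQ asks whether the whole string matches, whereas StringCases looks for matching pieces inside it. A brief sketch with a few hand-picked words makes the distinction concrete:

```wolfram
(* the whole string must start with a and end with b; ___ may match nothing *)
StringMatchQ[{"absorb", "absorbs", "ab"}, "a" ~~ ___ ~~ "b"]
(* {True, False, True} *)

(* StringCases instead finds the matching piece inside a longer string *)
StringCases["absorbs", "a" ~~ ___ ~~ "b"]
(* {"absorb"} *)
```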
Pick out any sequence of A or B repeated:
StringCases["the AAA and the BBB and the ABABBBABABABA", ("A" | "B") ..]
In a string pattern, LetterCharacter stands for any letter character, DigitCharacter for any digit character and Whitespace for any sequence of “white” characters such as spaces.
Pick out sequences of digit characters:
StringCases["12 and 123 and 4567 and 0x456", DigitCharacter ..]
Pick out sequences of digit characters “flanked” by whitespace:
StringCases["12 and 123 and 4567 and 0x456",
Whitespace ~~ DigitCharacter .. ~~ Whitespace]
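LetterCharacter can be used in the same way as DigitCharacter. As a small illustration on a made-up string, these pick out the runs of letters and the runs of digits separately:

```wolfram
(* runs of one or more letters *)
StringCases["room 101, floor 3B", LetterCharacter ..]
(* {"room", "floor", "B"} *)

(* runs of one or more digits *)
StringCases["room 101, floor 3B", DigitCharacter ..]
(* {"101", "3"} *)
```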
It’s common in practice to want to go back and forth between strings and lists. You can split a string into a list of pieces using StringSplit.
Split a string into a list of pieces, by default breaking at spaces:
StringSplit["a string to split"]
This uses a string pattern to decide where to split:
StringSplit["you+can+split--at+any--delimiter", "+" | "--"]
Within strings, there’s a special newline character which indicates where the string should break onto a new line. The newline character is represented within strings as \n.
Split at newlines:
StringSplit["first line
second line
third line", "\n"]
StringJoin joins any list of strings together. In practice, though, one often wants to insert something between the strings before joining them. StringRiffle does this.
Join strings, riffling the string "---" in between them:
StringRiffle[{"a", "list", "of", "strings"}, "---"]
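StringSplit and StringRiffle are roughly inverses of each other, which is what makes it easy to go back and forth between strings and lists. A minimal sketch, using made-up strings:

```wolfram
(* split at spaces, then rejoin with a different delimiter *)
StringRiffle[StringSplit["a string to split"], "-"]
(* "a-string-to-split" *)

(* riffle with ", " and split at ", " to get the original list back *)
StringSplit[StringRiffle[{"x", "y", "z"}, ", "], ", "]
(* {"x", "y", "z"} *)
```

Riffling with "\n" instead puts each string on its own line.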
In assembling strings, one often wants to turn arbitrary Wolfram Language expressions into strings. One can do this using TextString.
TextString turns numbers and other Wolfram Language expressions into strings:
StringJoin["two to the ", TextString[50], " is ", TextString[2^50]]
A more convenient way to create strings from expressions is to use string templates. String templates work like pure functions in that they have slots into which arguments can be inserted.
In a string template each `` is a slot for a successive argument:
StringTemplate["first `` then ``"][100, 200]
Named slots pick elements from an association:
StringTemplate[
"first: `a`; second `b`; first again `a`"][<|"a" -> "AAAA",
"b" -> "BB BBB"|>]
You can insert any expression within a string template by enclosing it with <*...*>. The value of the expression is computed when the template is applied.
StringTemplate["2 to the 50 is <* 2^50 *>"][ ]
Use slots in the template (` is the backquote character):
StringTemplate["`1` to the `2` is <* #1^#2 *>"][2, 50]
The expression in the template is evaluated when the template is applied:
StringTemplate["the time now is <* Now *>"][ ]
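Because a string template behaves like a function, it can be applied repeatedly to different arguments. The following sketch, with made-up template text and data, fills positional slots inside Table and named slots from an association:

```wolfram
(* each `` takes the next argument in order *)
Table[StringTemplate["`` squared is ``"][n, n^2], {n, 3}]
(* {"1 squared is 1", "2 squared is 4", "3 squared is 9"} *)

(* named slots are looked up by key in the association *)
StringTemplate["`name` is `age` years old"][<|"name" -> "Ada", "age" -> 36|>]
(* "Ada is 36 years old" *)
```

Non-string arguments such as 36 are converted to text automatically when the template is applied.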
patt1~~patt2 | sequence of string patterns
Shortest[patt] | shortest sequence that matches
StringCases[string,patt] | cases within a string matching a pattern
StringReplace[string,patt->val] | replace a pattern within a string
StringMatchQ[string,patt] | test whether a string matches a pattern
LetterCharacter | pattern construct matching a letter
DigitCharacter | pattern construct matching a digit
Whitespace | pattern construct matching spaces, etc.
\n | newline character
StringSplit[string] | split a string into a list of pieces
StringJoin[{string1,string2, ...}] | join strings together
StringRiffle[{string1,string2, ...},m] | join strings, inserting m between them
TextString[expr] | make a text string out of anything
StringTemplate[string] | create a string template to apply
`` | slot in a string template
<*...*> | expression to evaluate in a string template
42.2 Get a sorted list of all sequences of 4 digits (representing possible dates) in the Wikipedia article on computers.
42.3 Extract “headings” in the Wikipedia article about computers, as indicated by strings starting and ending with "===".
42.5 Find names of integers below 50 that have an “i” somewhere before an “e”.
42.6 Make any 2-letter word uppercase in the first sentence from the Wikipedia article on computers.
42.7 Make a labeled bar chart of the number of countries whose TextString names start with each possible letter.
42.8 Find simpler code for
Grid[Table[StringJoin[TextString[i], "^", TextString[j], "=", TextString[i^j]], {i, 5}, {j, 5}]].
How should one read ~~ out loud?
It’s usually read “tilde tilde”. The underlying function is StringExpression.
How does one type `` to make a slot in a string template?
It’s a pair of what are usually called backquote or backtick characters. On many keyboards, they’re at the top left, along with ~ (tilde).
Can I write rules for understanding natural language?
Yes, but we didn’t cover that here. The key function is GrammarRules.
What does TextString do when things don’t have an obvious textual form?
It does its best to make something human readable, but if all else fails, it’ll fall back on InputForm.
- There’s a correspondence between patterns for strings, and patterns for sequences in lists. SequenceCases is the analog for lists of StringCases for strings.
- The option Overlaps specifies whether or not to allow overlaps in string matching. Different functions have different defaults.
- String patterns by default match the longest possible sequences, so you need to specify Shortest if you want the shortest match. Expression patterns by default match shortest sequences.
- Among string pattern constructs are Whitespace, NumberString, WordBoundary, StartOfLine, EndOfLine, StartOfString and EndOfString.
- Anywhere in a Wolfram Language symbolic string pattern, you can use RegularExpression to include regular expression syntaxes like x* and [abc][def].
- TextString tries to make a simple human-readable textual version of anything, dropping things like the details of graphics. ToString[InputForm[expr]] gives a complete version, suitable for subsequent input.
- You can compare strings using operations like SequenceAlignment. This is particularly useful in bioinformatics.
- FileTemplate, XMLTemplate and NotebookTemplate let you do the analog of StringTemplate for files, XML (and HTML) documents and notebooks.
- The Wolfram Language includes the function TextSearch, for searching text in large collections of files.
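Several of the constructs mentioned in these notes can be tried directly. The inputs below are made up for illustration, and each line is only a minimal sketch of the corresponding feature:

```wolfram
(* by default StringCases does not return overlapping matches *)
StringCases["aaaa", "aa"]
(* {"aa", "aa"} *)

(* Overlaps -> True allows a match to start at every possible position *)
StringCases["aaaa", "aa", Overlaps -> True]
(* {"aa", "aa", "aa"} *)

(* NumberString matches the characters of a number, including a decimal point *)
StringCases["pi is 3.14 and e is 2.718", NumberString]
(* {"3.14", "2.718"} *)

(* RegularExpression can be used wherever a string pattern is expected *)
StringCases["a1 b22 c333", RegularExpression["[a-z]\\d+"]]
(* {"a1", "b22", "c333"} *)

(* SequenceCases does for lists what StringCases does for strings *)
SequenceCases[{1, 2, 3, 1, 2}, {1, 2}]
(* {{1, 2}, {1, 2}} *)
```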