The subject of advanced regular expression matching came up in another thread, so I thought I'd post more here.
The example was in reference to:
(Note: \B in this post should be lowercase b, forum software is mangling stuff, it also converted some of them into smileys. Gah!)
Here's an attempt to explain that stuff:
The first thing to note is that potion IDs have a pattern to them. You can list all the IDs like this:
tes3cmd dump --no-banner --list --type alch Morrowind.esm
On Linux, it's even handier to sort the output by piping it to "sort". In any case, notice that potion IDs start with "p_" or "potion_", and that the ones starting with "p_" end with "_b", "_c", "_e", "_q", or "_s". So the idea one might start with is that you can construct a pattern that will reliably match all the potion IDs. It turns out you can, but I only found that out after confirming it with a test.
To match what we have observed so far, we would write a pattern like this:
name:(potion_|p_.+?_[bceqs]\b)
We know that in a cell, object reference names are preceded by "name:", so that will start our pattern. The parentheses and vertical bar are used to specify alternative patterns. And \b matches a word boundary (such as between the end of a word and a space that follows).
So the pattern above matches potion IDs that begin with either "potion_" or "p_" and in the latter case, when ended by one of [bceqs].
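If you want to poke at this first pattern outside of tes3cmd, here is a minimal sketch in Python (whose "re" module handles these constructs the same way Perl does). The sample lines and IDs are made up for illustration, they are not real dump output:

```python
import re

# First version of the pattern from the post: "name:" followed by either
# "potion_" or "p_<something>_<one of b, c, e, q, s>" ending at a word boundary.
pattern = re.compile(r"name:(potion_|p_.+?_[bceqs]\b)")

# Hypothetical lines, invented for this demonstration.
samples = [
    "name:potion_example_01",   # starts with "potion_"
    "name:p_example_effect_s",  # starts with "p_", ends with "_s"
    "name:p_example_effect_x",  # "_x" is not in [bceqs]
    "name:misc_example_01",     # neither prefix
]

matched = [s for s in samples if pattern.search(s)]
print(matched)
```

Note that the "_e" in "_effect" does not count as an ending, because \b requires the word to stop right there.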
Further observation shows that some potion IDs end in "_unique" instead of the [bceqs], so we add that to our pattern as an alternative:
name:(potion_|p_.+?_(unique|[bceqs])\b)
There is also the potion "p_vintagecomberrybrandy1" so we can add that as yet another alternative to the stuff that comes after "p_":
name:(potion_|p_(vintage|.+?_(unique|[bceqs])\b))
So, let's test our pattern and see if any other record types (not ALCH) match:
tes3cmd dump --id "^(potion_|p_(vintage|.+?_(unique|[bceqs])$))" --no-banner --list Morrowind.esm
In this case I replaced "name:" with "^", which anchors the pattern to the start of the string. If our pattern was chosen correctly, we should only see objects of type ALCH matching, and that is what we see. This confirms we have chosen a good pattern for matching potion IDs.
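The same anchored pattern can be tried against a handful of IDs in Python as a rough sanity check. This is only a sketch: apart from "p_vintagecomberrybrandy1", which is mentioned above, the IDs are invented, and it does not replace running the real test against Morrowind.esm:

```python
import re

# The anchored ID pattern from the tes3cmd command above.
id_pattern = re.compile(r"^(potion_|p_(vintage|.+?_(unique|[bceqs])$))")

# Hypothetical IDs mapped to whether we expect them to match.
expected = {
    "p_vintagecomberrybrandy1": True,   # named in the post
    "potion_example_01": True,          # "potion_" prefix
    "p_example_effect_q": True,         # "p_" prefix, "_q" ending
    "p_example_unique": True,           # "p_" prefix, "_unique" ending
    "p_example_effect_qq": False,       # ending is not a single [bceqs]
    "misc_potion_example": False,       # "potion_" is not at the start
}

results = {item: bool(id_pattern.search(item)) for item in expected}
for item, hit in results.items():
    print(item, hit)
```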
Note that "." by itself means match any character (except a newline, by default). ".+" means match any character repeated one or more times, and ".+?" means the same thing but using the minimum number of characters that still lets the rest of the pattern match. Normally "*" and "+" are greedy, which means they match as many repetitions as possible, but adding a question mark right after the quantifier makes it do a non-greedy or minimal match.
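A quick way to see the difference between greedy and minimal matching is to capture what the ".+" part swallowed. A small Python sketch, using a made-up string:

```python
import re

# Made-up string with two possible "_b" endings.
text = "p_one_b_two_b"

# Greedy: ".+" grabs as much as it can and backs off only as far as needed.
greedy = re.match(r"p_(.+)_b", text).group(1)

# Minimal: ".+?" grabs as little as it can and grows only as far as needed.
minimal = re.match(r"p_(.+?)_b", text).group(1)

print(greedy)   # one_b_two
print(minimal)  # one
```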
Continuing on, we want a pattern that matches a non-zero X/Y angle. This is what I propose:
x_angle:[0.]*[^0. ]
[0.]* - matches a run of zero or more characters, each of which is "0" or ".". Note that "." inside brackets just matches the period.
[^0. ] - matches a single character that is not "0", "." or " " (space).
So if anything other than "0" or "." appears after "x_angle:" and before the space that follows the value, we should have a match.
We can add an alternative to also match for the y_angle:
(x_angle:[0.]*[^0. ]|y_angle:[0.]*[^0. ])
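Here is a sketch of how the angle pattern behaves, in Python. The layout of the sample lines (fields separated by spaces, four decimal places) is an assumption made for the demonstration, so check it against what your own dump output looks like:

```python
import re

# The x/y angle pattern from above.
angle = re.compile(r"(x_angle:[0.]*[^0. ]|y_angle:[0.]*[^0. ])")

# Hypothetical lines; the field layout is assumed, not copied from tes3cmd.
lines = [
    "x_angle:0.0000 y_angle:0.0000 z_angle:1.5708",  # only Z is rotated
    "x_angle:0.0000 y_angle:0.7854 z_angle:0.0000",  # Y is rotated
    "x_angle:3.1416 y_angle:0.0000 z_angle:0.0000",  # X is rotated
]

hits = [bool(angle.search(line)) for line in lines]
print(hits)  # [False, True, True]
```

The first line does not match because after the zeros and the period the next character is a space, which [^0. ] refuses.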
Putting things together we have:
(?s)name:(potion_|p_(vintage|.+?_(unique|[bceqs])\b)).*(x_Angle:[0.]*[^0. ]|y_Angle:[0.]*[^0. ])
The initial (?s) is a way to add options to how the regular expression works. In this case the "s" means to treat the string we are matching as a single line, or in other words "." matches every character, newlines included. Normally "." is special in that it does not match the newline character at the end of a line. We use this because the string that represents an object reference is a multi-line string, where each line is a different subrecord, and we want to match across multiple subrecords (the NAME, which contains the potion ID, and the DATA, which contains the angles).
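To see what (?s) buys us, here is a Python sketch that runs the combined pattern with and without it. The two-line record is invented and only imitates the idea of one subrecord per line, it is not real tes3cmd output:

```python
import re

# Combined pattern from above, without the leading (?s).
body = (r"name:(potion_|p_(vintage|.+?_(unique|[bceqs])\b))"
        r".*(x_Angle:[0.]*[^0. ]|y_Angle:[0.]*[^0. ])")

# Hypothetical multi-line record: ID on one line, angles on the next.
record = ("name:p_example_effect_s\n"
          "x_Angle:0.0000 y_Angle:0.7854 z_Angle:0.0000\n")

# Without (?s), ".*" stops at the newline and never reaches the angles.
without_s = re.search(body, record)

# With (?s), "." also matches newlines, so ".*" can span both lines.
with_s = re.search("(?s)" + body, record)

print(without_s is None)   # True
print(with_s is not None)  # True
```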
Regular expressions are a deep subject, and they can be confusing. Entire books have been written about them. I think I covered most of what is needed to understand the example I gave, but if anyone wants more info, let me know.
(And sorry about the smileys and \B crap, they should all be backslash-lowercase "b")