SQL Regular Expression
I have been getting into biking more and more. Biking is a good exercise. I have read health articles that just 3 hours a week can reduce hart-attack and stokes by 50%.
I purchased a Word Combination Lock for the bike. It has four dials with ten letters each. I couldn’t find a word I liked so I searching the web for samples. But I didn’t find any lists.
Now for the real purpose of this blog. Using Regular Expressions. I used regular expressions to produce a list of possible combinations. This is how I did it:
1) first off, you need a list of English words. I did a search and found a huge list of words (WordList.txt). There was a lot of garbage in the file I found, so I modified it a bit.
2) Create a table to hold the word list:
Create table wordlist (word varchar(255));
3) Import the list into your database. There is many ways to get data into your database. I used toad's import tool.
There are duplicates in the list. That is why there isn’t a primary key defined.
4) Time for the query:
Letter options for each dial. The default combination is SHED. As you can see from the list, there is runs, fast, bike, loop, play and well.
|
S |
H |
E |
D |
|
T |
W |
R |
G |
|
D |
R |
M |
K |
|
M |
Y |
T |
M |
|
R |
U |
N |
S |
|
F |
A |
S |
T |
|
B |
I |
K |
E |
|
L |
O |
O |
P |
|
P |
L |
A |
Y |
|
W |
E |
L |
L |
Using the letter combinations for each dial, we can use simple regular expression to produce a list of words that can be formed.
See http://en.wikipedia.org/wiki/Regular_expression for definition and history of regular expression.
The Query:
select distinct word
from wordlist
where ucase (word) REGEXP '[STDMRFBLPW][HWRYUAIOLE][ERMTNSKOAL][DGKMSTEPYL]'
and length(word)=4;
Looking at the query.
1. select distinct word
Remember there are duplicates in the word list. Distinct will eliminate duplicates the list. Note, distinct shouldn’t be used in any production environment
2. where ucase (word)...
It is good practice to match the case that you are expressing. Ucase will convert all words in the table to upper case.
Do note, that some database engines will not use an index when changing case. Thus producing a slower query.
3. Looking at the simple version ...word regexp... is similar to ...word = 'SHED' or word='SHEG' and etc. looking at each character that matches each dial letter.
For the Microsoft SQL Server users: there really isn’t a good way to do regular expression. Click here for a goole search that might help.
The Result. The query produced a list of 5026 possible words. See Word_Lock_Sample_List.txt for the list. The list isn't the best. The source file did have non-existent words.