January 23: A word-matching game using regular expressions in Ruby

Background: The rules of Spelling Bee

What’s a regular expression? What’s it used for?

The find-and-replace function is one practical example of the application of regex. (Source: Microsoft Corporation)
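As a small sketch of that find-and-replace use case, Ruby's String#sub and String#gsub both accept a regular expression as the search pattern. The sample sentence and the replacement word below are made up for illustration.

```ruby
# Hypothetical sample text for the illustration.
NOTE = "The colour of the sky and the colour of the sea"

# sub replaces only the first match of the pattern.
FIRST_ONLY = NOTE.sub(/colour/, "color")

# gsub ("global substitute") replaces every match.
ALL = NOTE.gsub(/colour/, "color")

puts FIRST_ONLY
puts ALL
```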
"Norm".match?( /N./ )   => true
"Nancy".match?( /N./ ) => true
"Josh".match?( /N./ ) => false
"N1234%".match?( /N./ ) => true
Defining a puzzle class

class Puzzle
  # Load the word list once for the whole class; each word is on its own line.
  @@dictionary = File.read( "words.txt" ).split( /\r?\n/ )
  attr_reader :match, :pangram
  attr_accessor :letters, :guesses

  def initialize( these_letters )
    raise ArgumentError, "Puzzle must contain exactly 7 letters" unless these_letters.length == 7
    @letters = these_letters.upcase
    @guesses = []
    # A valid guess contains the center letter (the first one) and only puzzle letters, four or more of them.
    @match = /^(?=.*#{ @letters[ 0 ] })[#{ @letters }]{4,}$/
    # A pangram contains every puzzle letter at least once and no other letters.
    lookaheads = @letters.chars.map { | letter | "(?=.*#{ letter })" }.join
    @pangram = /^#{ lookaheads }[#{ @letters }]+$/
  end
end
puzzle = Puzzle.new( "rtinavm" )
=> #<Puzzle:0x00007fc3c5f27c28
  • ^ is a metacharacter called an anchor. In Ruby it matches the beginning of a line; because each word we test is a single line, that is the start of the word. (To anchor to the start of the whole string regardless of line breaks, Ruby provides \A.)
  • The series of metacharacters (?=.*R) is called a lookahead. It defines a condition (a mini-regex) that must hold from the current position, here the very start of the word, without consuming any characters; if the lookahead fails, the whole match fails. In this case, we’re looking ahead to see whether the rest of the string matches the regex .*R.
  • We already know . is a metacharacter matching any single character except a line break. * is a quantifier, a metacharacter that defines how many times the token right before it may occur: in this case, zero or more times (permitted, but not required). So here, we’re asking if the character R (the center cell) occurs anywhere in the string we’re matching, with any number of characters before it. If it doesn’t, the guess isn’t valid according to the rules of Spelling Bee.
  • A set of square brackets [ ] is a character class, which matches any one of the characters defined within, so [RTINAVM] matches a single occurrence of one of the letters in our puzzle. A set of curly brackets { } is a range quantifier. {1,2} matches if the preceding token occurs once or twice, while {,3} matches if it occurs up to three times. In Spelling Bee, a guess must be at least four letters long and use only the letters in our puzzle, so, between the two anchors, [RTINAVM]{4,} matches only strings made up entirely of four or more letters from the character class [RTINAVM].
  • The final metacharacter, $ , is also an anchor: it matches the end of a line, which for a single word is the end of the string. (\z is Ruby’s strict end-of-string anchor.)
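To see the pieces from the list above working together, here is a minimal sketch that builds the same pattern for the letters RTINAVM and checks a few guesses. The helper name build_match and the sample guesses are illustrative assumptions, not part of the Puzzle class.

```ruby
# Builds the "valid guess" pattern for a string of puzzle letters whose
# first letter is the required center letter.
# build_match is a hypothetical helper used only for this illustration.
def build_match(letters)
  letters = letters.upcase
  /^(?=.*#{letters[0]})[#{letters}]{4,}$/
end

MATCH = build_match("rtinavm") # same pattern as /^(?=.*R)[RTINAVM]{4,}$/

GUESSES = {
  "TRAIN"  => MATCH.match?("TRAIN"),  # has R, only puzzle letters, 5 long
  "MINT"   => MATCH.match?("MINT"),   # fails the lookahead: no R
  "RAT"    => MATCH.match?("RAT"),    # fails {4,}: only 3 letters
  "TRAINS" => MATCH.match?("TRAINS")  # fails the character class: S is not in the puzzle
}

GUESSES.each { |word, ok| puts "#{word}: #{ok}" }
```

Each failing guess breaks exactly one of the three rules, which makes it easy to check which part of the pattern is doing the rejecting.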

Regular expressions in action

  • A string of letters;
  • An array of guesses;
  • A regular expression match defining the rules for a correct guess;
  • A regular expression pangram defining the rules for a pangram (which gets us juicy bonus points); and…
  • A class variable, @@dictionary: a gargantuan array of words, which we get by using Ruby’s File class to read and split a text file called words.txt containing the roughly 280,000 words in the Scrabble word list, each on a separate line.
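A minimal sketch of loading such a word list, assuming one word per line. The helper name load_words is hypothetical, the upcasing step is an assumption (the puzzle's patterns are built from upper-case letters), and the tiny temporary file written here stands in for the real words.txt.

```ruby
require "tempfile"

# Reads a word list with one word per line and returns an array of
# upper-case words. load_words is a hypothetical helper for illustration.
def load_words(path)
  File.readlines(path)
      .map { |line| line.chomp.upcase } # chomp strips a trailing \n or \r\n
      .reject(&:empty?)                 # ignore blank lines
end

# Stand-in for words.txt so the example runs anywhere.
SAMPLE = Tempfile.new("words")
SAMPLE.write("aarti\r\ntrain\r\n\r\nvarmint\r\n")
SAMPLE.close

WORDS = load_words(SAMPLE.path)
p WORDS
```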
  # Instance methods of Puzzle
  def possible_words
    @@dictionary.select { | word | word.match?( self.match ) }
  end
  def pangrams
    @@dictionary.select { | word | word.match?( self.pangram ) }
  end
  def correct?( guess )
    self.possible_words.include?( guess )
  end
  def bonus?( guess )
    self.pangrams.include?( guess )
  end
"JOSH".match?( puzzle.match )
=> false
"TRAIN".match?( puzzle.match )
=> true
=> ["AARTI",
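Because correct? and bonus? filter the whole dictionary on every call, one possible refinement is to compute the candidate lists once and cache them. The sketch below is not the implementation used in this post: CachedPuzzle, the five-word WORD_LIST and the use of Set are all assumptions made for illustration.

```ruby
require "set"

# Hypothetical stand-in for the full dictionary.
WORD_LIST = %w[AARTI TRAIN MINT VARMINT JOSH].freeze

# A cut-down puzzle that filters the word list once and caches the result.
class CachedPuzzle
  def initialize(letters, words = WORD_LIST)
    @letters = letters.upcase
    @words = words
    @match = /^(?=.*#{@letters[0]})[#{@letters}]{4,}$/
  end

  # ||= runs the select only on the first call; later calls reuse the Set.
  def possible_words
    @possible_words ||= @words.select { |word| word.match?(@match) }.to_set
  end

  # A pangram is a valid guess that uses every puzzle letter at least once.
  def pangrams
    @pangrams ||= possible_words.select do |word|
      @letters.chars.all? { |letter| word.include?(letter) }
    end.to_set
  end

  def correct?(guess)
    possible_words.include?(guess.upcase)
  end

  def bonus?(guess)
    pangrams.include?(guess.upcase)
  end
end

CACHED = CachedPuzzle.new("rtinavm")
p CACHED.possible_words.to_a
p CACHED.correct?("train")
p CACHED.bonus?("varmint")
```

Set membership checks do not scan every element, so repeated guesses stay cheap even with a word list the size of the Scrabble dictionary.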

Conclusion: You’ve gotta know when to hold ’em, know when to fold ’em, know when to walk away…




