Spelling correction

The central problem in spelling correction is determining when an input string is sufficiently close to one of a set of known strings. When a user enters a query, IR Expert first performs lexical analysis and its other checks, then attempts to find words in the index that are close to any unrecognized word. The goal is good selectivity while still searching large databases in a timely manner. To achieve this, IR Expert applies two tests.

First, IR Expert looks for a shallow match. It compares every existing index term to each unrecognized query term without regard to the order of the letters. A shallow match depends on finding the same letters in both words, with a distance of zero (0) between the words being the ideal. IR Expert passes words with a distance of two (2) or less to the second test.
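
The shallow test can be illustrated with a short sketch. The text does not say how IR Expert computes the distance, so the measure used here (the number of letters that appear in one word but not the other, ignoring order) is an assumption, and the names shallow_distance and shallow_candidates are hypothetical.

```python
from collections import Counter


def shallow_distance(a: str, b: str) -> int:
    """Count letters present in one word but not the other, ignoring order.

    Assumed measure for illustration only; not IR Expert's documented metric.
    """
    ca, cb = Counter(a.lower()), Counter(b.lower())
    # Counter subtraction keeps only positive counts, so the two differences
    # together give the size of the multiset symmetric difference.
    return sum((ca - cb).values()) + sum((cb - ca).values())


def shallow_candidates(query_term, index_terms, max_distance=2):
    """Keep index terms whose shallow distance is within the threshold."""
    return [
        term
        for term in index_terms
        if shallow_distance(query_term, term) <= max_distance
    ]


if __name__ == "__main__":
    index_terms = ["husband", "bushland", "garden"]
    print(shallow_candidates("husbnd", index_terms))
```

Under this measure, husband and bushland differ by a single letter (the l), so both would pass the shallow test for a misspelling of husband even though their letter order is quite different.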

Second, IR Expert performs a deep match test to prune words that are clearly different but merely contain similar letters, for example, bushland for husband. Deep matching verifies the order of the letters within the words. IR Expert then uses the words with the lowest distance as corrections for the unrecognized words.
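
A minimal sketch of an order-sensitive deep test follows. The text does not name the distance IR Expert uses, so Levenshtein edit distance is substituted here as an assumption, and the names edit_distance and deep_match are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions.

    Stand-in for the deep match distance, which the text does not specify.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_a
                curr[j - 1] + 1,     # insert ch_b
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def deep_match(query_term, candidates):
    """Return the candidates with the lowest order-sensitive distance."""
    if not candidates:
        return []
    scored = [(edit_distance(query_term, c), c) for c in candidates]
    best = min(distance for distance, _ in scored)
    return [c for distance, c in scored if distance == best]


if __name__ == "__main__":
    # Candidates assumed to have survived the shallow test.
    print(deep_match("husbnd", ["husband", "bushland"]))
```

Because this distance depends on letter position, bushland scores worse than husband for the misspelling husbnd and is pruned, which is the behavior the deep test is described as providing.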