Create Search Engine Thesaurus Files

The search engine uses the thesaurus when doing a simple search. A synonym search is a type of search that locates occurrences of either the search term or any of its synonyms. For example, a synonym search for computer might return documents that contain laptop or desktop.
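
As a rough illustration of the idea only, and not of the search engine's actual implementation, a synonym search can be sketched as expanding the search term into a list of candidates before matching. The names THESAURUS, expand_term, and matches are hypothetical, and the in-memory dictionary stands in for a real thesaurus file.

```python
# Hypothetical in-memory thesaurus used only for this illustration.
THESAURUS = {"computer": ["laptop", "desktop"]}


def expand_term(term, thesaurus):
    """Return the search term followed by any synonyms listed for it."""
    key = term.lower()
    return [key] + [s for s in thesaurus.get(key, []) if s != key]


def matches(document_text, term, thesaurus):
    """Report whether the text contains the term or one of its synonyms."""
    words = set(document_text.lower().split())
    return any(candidate in words for candidate in expand_term(term, thesaurus))


print(expand_term("computer", THESAURUS))
print(matches("My laptop will not start", "computer", THESAURUS))
```

A term with no thesaurus entry is returned unchanged, so the search falls back to a plain match on the term itself.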

Thesaurus expansion happens automatically for terms entered into the simple search screen and is not currently supported for advanced searches. The search engine performs thesaurus expansion on words in the natural language query box. Thesaurus expansion is done using the dictionary that matches your login language. To use a different language dictionary, change the query language parameter on the advanced search form.

Note: A synonym search term containing a phrase is not supported.

The search engine supports thesaurus files (or dictionaries) for individual languages, but none are provided out-of-box. You may create your own thesaurus dictionaries for use when searching content.

To add a thesaurus file for a certain language:

  1. Go to the {SERVICE_MANAGER_HOME}/Search_Engine/kmsearchengine/languages/thesaurus folder, and create an empty text file named synonyms_<language id code>.txt.

    The thesaurus file name format includes the two-character language id. For example, the English thesaurus text file name is synonyms_en.txt and the French thesaurus text file name is synonyms_fr.txt.
  2. Add content to the thesaurus file. The thesaurus file format is as follows:

    # blank lines and lines starting with pound are comments.
    #Explicit mappings match any token sequence on the left hand side of
    #"=>" and replace with all alternatives on the right hand side.
    #Examples:
    laptop, desktop => computer
    #Equivalent synonyms may be separated with commas
    #NOTE: When using commas in files, ensure that single-byte commas
    #are used instead of double-byte commas.
    #Examples:
    foozball , foosball
    universe , cosmos
    #"computer, laptop, desktop" is equivalent to the explicit mapping:
    computer, laptop, desktop => computer
    #multiple synonym mapping entries are merged.
    foosball => foosball
    foosball => foozball
    #is equivalent to
    foosball => foosball, foozball
    Caution: When using commas to separate terms in the file, you must use single-byte commas instead of double-byte commas.
  3. Save the file in UTF-8 encoding.

    Caution: Be sure to save the file in UTF-8 encoding. UTF-8 is part of the Unicode standard and can encode text in practically any script and language.
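
The steps above can also be scripted. The following is a minimal sketch, not a supplied tool: the function names thesaurus_file_name and write_thesaurus are hypothetical, and the folder argument is assumed to be the thesaurus folder named in step 1. It builds the file name from a two-character language id, rejects entries that contain a double-byte (full-width) comma, and saves the file in UTF-8 encoding.

```python
from pathlib import Path

FULLWIDTH_COMMA = "\uff0c"  # double-byte comma, not valid as a separator


def thesaurus_file_name(language_id):
    """Build the file name, for example synonyms_en.txt for "en"."""
    if len(language_id) != 2 or not (language_id.isascii() and language_id.isalpha()):
        raise ValueError("language id must be two letters, such as 'en' or 'fr'")
    return f"synonyms_{language_id.lower()}.txt"


def write_thesaurus(folder, language_id, lines):
    """Check the entries, then write them to the thesaurus file as UTF-8."""
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith("#"):
            continue  # comment lines are not checked
        if FULLWIDTH_COMMA in line:
            raise ValueError(f"line {number} contains a double-byte comma")
    path = Path(folder) / thesaurus_file_name(language_id)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

For example, calling write_thesaurus with the thesaurus folder path, "en", and the list ["laptop, desktop => computer", "universe , cosmos"] would produce synonyms_en.txt in that folder.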

Related topics