Configure data cleansing

User Role: Administrator

The purpose of data cleansing is to remove unwanted content from the Smart Analytics source data, which is used for training and indexing as well as in runtime processing.

Note  

  • For the Smart Ticket and Hot Topic Analytics features, data cleansing is applied only to the "Title Field" or "Content Fields" that are defined in the configuration.
  • For Smart Search, you can configure data cleansing for each field individually. For detailed information, see the data cleansing description in Manage Smart Search Knowledgebases.
  • All modifications to data cleansing take effect from the next round of indexing.

To add a data cleansing configuration, follow these steps:

  1. From the System Navigator, click System Administration > Ongoing Maintenance > Smart Analytics > Data Cleansing.

  2. Select a module. For example, Interaction.

    Note The module drop-down list also contains a Global option. A data cleansing record that is created with this option applies to all modules that use Hot Topic Analytics and Smart Ticket.

  3. Select one of the following actions:

    • Remove: Remove the matched text and index the rest to SM Smart Analytics.

    • Include: Extract and index only the text between the start pattern and the end pattern. The patterns themselves are not included.

    • Exclude: Exclude the text that matches the pattern (including the start pattern, the end pattern, and all the words between them) and index the rest to SM Smart Analytics.

    • ExtractFromTemplate: Extract content by using a template that is configured as a regular expression. The text that is matched by the capturing groups of the regular expression is extracted and returned.

  4. Enter the text or pattern for the action that you selected. For the Remove action, you only need to type the text string to be removed. For the Include and Exclude actions, the start pattern is the text string that you need to specify while the end pattern can be one of these options: a text string that you specify, end of line, or end of document.

    The ExtractFromTemplate action has the highest priority. The data cleansing actions are processed in the following order:

    1. ExtractFromTemplate

      If matching text is found, the extracted text is returned and no further action is performed. Otherwise, the Include action is performed.

    2. Include

      If matching text is found, the Remove action is performed next. Otherwise, the Exclude action is performed.

    3. Exclude

      If matching text is found, the Remove action is performed next.

    4. Remove

    To learn how the text or pattern takes effect, see the following examples.

    Note Regular expressions are supported only for the ExtractFromTemplate action.

  5. Select the Match Case check box if you want to find only text that matches the case of the text or pattern that you entered.
  6. Select the Active check box to activate this configuration.

  7. Click Add. The new data cleansing configuration is now added.
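
The four actions and their processing order can be illustrated with a short Python sketch. This is not the product's implementation or API: every function name, the two end-pattern sentinels, and the rule tuples are hypothetical and exist only for illustration. The sketch also makes three assumptions that the description above does not confirm: captured groups and included fragments are joined with a single space, matching is case-insensitive unless Match Case is requested, and the Remove action runs last even when neither Include nor Exclude finds a match.

```python
import re

# Hypothetical sentinels for the two non-text end pattern options.
END_OF_LINE = object()
END_OF_DOCUMENT = object()


def _span_regex(start, end, match_case):
    """Compile a regex for a literal start pattern followed by an end marker."""
    flags = re.DOTALL | (0 if match_case else re.IGNORECASE)
    if end is END_OF_LINE:
        tail = r"(.*?)(?=\n|\Z)"          # up to, not including, the line break
    elif end is END_OF_DOCUMENT:
        tail = r"(.*)\Z"                  # everything up to the end of the text
    else:
        tail = r"(.*?)" + re.escape(end)  # up to the literal end pattern
    return re.compile(re.escape(start) + tail, flags)


def extract_from_template(text, template):
    """Return the captured groups of the first match, or None if no match."""
    m = re.search(template, text)
    if m is None or not m.groups():
        return None
    # Assumption: groups are joined with a single space.
    return " ".join(g for g in m.groups() if g)


def include(text, start, end, match_case=False):
    """Return only the text between start and end (patterns excluded)."""
    found = [m.group(1) for m in _span_regex(start, end, match_case).finditer(text)]
    return " ".join(s.strip() for s in found) if found else None


def exclude(text, start, end, match_case=False):
    """Drop start pattern, end pattern, and everything between them."""
    new_text, count = _span_regex(start, end, match_case).subn("", text)
    return new_text if count else None


def remove(text, literal, match_case=False):
    """Delete every occurrence of a literal string."""
    flags = 0 if match_case else re.IGNORECASE
    return re.sub(re.escape(literal), "", text, flags=flags)


def cleanse(text, template=None, include_rule=None, exclude_rule=None,
            remove_text=None):
    """Apply the actions in the documented order."""
    # 1. ExtractFromTemplate: a match ends processing immediately.
    if template:
        extracted = extract_from_template(text, template)
        if extracted is not None:
            return extracted
    # 2. Include, and 3. Exclude only if Include found nothing.
    result = include(text, *include_rule) if include_rule else None
    if result is None and exclude_rule:
        result = exclude(text, *exclude_rule)
    if result is None:
        result = text
    # 4. Remove (assumed to run last in every case).
    return remove(result, remove_text) if remove_text else result


sample = "Dear user,\nIssue: printer offline\nRegards, Service Desk"
print(cleanse(sample, template=r"Issue: (.*)"))
print(cleanse(sample, include_rule=("Issue:", END_OF_LINE)))
print(cleanse(sample, exclude_rule=("Regards", END_OF_DOCUMENT),
              remove_text="Dear user,").strip())
```

In this sketch, a rule tuple is (start pattern, end pattern), where the end pattern is either a text string or one of the two sentinels, mirroring the three end pattern options in step 4. Only the template is treated as a regular expression; all other patterns are escaped and matched literally.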