Configure data cleansing

User Role: Administrator

Data cleansing removes unwanted content from the Smart Analytics source data. It applies both to the data set that is used to train and index Smart Analytics and to data that is processed at runtime.

Note  

  • For Smart Ticket and Hot Topic Analytics features, data cleansing is applied only to the "Title Field" or "Content Fields" that are defined in the configurations.
  • For Smart Search, you can configure data cleansing individually for any field. For detailed information, see the data cleansing description in Manage Smart Search Knowledgebases.
  • All modifications to data cleansing take effect from the next round of indexing.

To add a data cleansing configuration, follow these steps:

  1. From the System Navigator, click System Administration > Ongoing Maintenance > Smart Analytics > Data Cleansing.

  2. Select a module. For example, Interaction.

    Note The module drop-down list includes a Global option. A record created with this option is a global data cleansing record that applies to all modules that use Hot Topic Analytics and Smart Ticket.

  3. Select one of the following actions:

    • Remove: Remove the matched text and index the rest to SM Smart Analytics.

    • Include: Extract and index only the text between the start pattern and the end pattern. The patterns themselves are not indexed.

    • Exclude: Exclude the text that matches the pattern (the start pattern, the end pattern, and all the words between them) and index the rest to SM Smart Analytics.

    • ExtractFromTemplate: Extract content by using a template that is configured as a regular expression. The text matched by the capturing groups in the regular expression is extracted and returned.

  4. Enter the text or pattern for the action that you selected. For the Remove action, type only the text string to be removed. For the Include and Exclude actions, specify a text string as the start pattern; the end pattern can be a text string that you specify, End of line, or End of document. For the ExtractFromTemplate action, enter the template as a regular expression.
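    The following Python sketch models how the Remove, Include, and Exclude actions and the three end-pattern options behave on the examples in this section. It is an illustration under stated assumptions, not the Smart Analytics implementation: the names remove, include, exclude, and End are hypothetical; text and patterns are matched literally (regular expressions apply only to ExtractFromTemplate); every occurrence is processed; and results are trimmed of surrounding whitespace, as the "After cleansing" output suggests.

```python
import re
from enum import Enum
from typing import Union


class End(Enum):
    """Hypothetical stand-ins for the "End of line" and "End of document" options."""
    LINE = "end of line"
    DOCUMENT = "end of document"


def _compile(start: str, end: Union[str, End], match_case: bool) -> "re.Pattern[str]":
    # Patterns are plain text, so they are escaped. Match Case cleared -> IGNORECASE.
    flags = re.DOTALL | (0 if match_case else re.IGNORECASE)
    if end is End.LINE:
        # Group 1 runs to the end of the line; the line break itself is consumed.
        body = r"([^\n]*)(?:\n|\Z)"
    elif end is End.DOCUMENT:
        # Group 1 runs to the end of the content.
        body = r"(.*)\Z"
    else:
        # Shortest run of text up to the next occurrence of the end pattern.
        body = r"(.*?)" + re.escape(end)
    return re.compile(re.escape(start) + body, flags)


def remove(content: str, text: str, match_case: bool = False) -> str:
    """Remove every occurrence of text (assumption: all occurrences) and keep the rest."""
    flags = 0 if match_case else re.IGNORECASE
    return re.sub(re.escape(text), "", content, flags=flags).strip()


def include(content: str, start: str, end: Union[str, End], match_case: bool = False) -> str:
    """Keep only the text between start and end; the patterns themselves are dropped."""
    parts = _compile(start, end, match_case).findall(content)
    return "\n".join(part.strip() for part in parts)


def exclude(content: str, start: str, end: Union[str, End], match_case: bool = False) -> str:
    """Drop the start pattern, the end pattern, and everything between; keep the rest."""
    return _compile(start, end, match_case).sub("", content).strip()
```

    For example, exclude(log, "[appendix: error log]", End.LINE) drops only the line that opens the appendix, which corresponds to the second Exclude example.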

    The ExtractFromTemplate action is processed first. Data cleansing actions are processed in the following order:

    1. ExtractFromTemplate

      If matching text is found, return the extracted text. Otherwise, perform the Include action.

    2. Include

      If matching text is found, perform the Remove action. Otherwise, perform the Exclude action.

    3. Exclude

      If matching text is found, perform the Remove action.

    4. Remove
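    The processing order can be sketched in Python as follows. The run_pipeline function and the convention that an action returns None when nothing matches are hypothetical. Two points are assumptions not stated above: Remove runs on the text produced by Include or Exclude, and Remove still runs on the original content when Exclude finds no match.

```python
from typing import Callable, Optional

# Hypothetical convention: an action returns the cleansed text when its
# pattern matched, or None when nothing matched.
Action = Callable[[str], Optional[str]]


def run_pipeline(content: str, extract_from_template: Action, include: Action,
                 exclude: Action, remove: Callable[[str], str]) -> str:
    extracted = extract_from_template(content)
    if extracted is not None:
        return extracted          # 1. ExtractFromTemplate matched: return at once
    included = include(content)
    if included is not None:
        return remove(included)   # 2. Include matched: continue with Remove
    excluded = exclude(content)
    if excluded is not None:
        return remove(excluded)   # 3. Exclude matched: continue with Remove
    # Assumption: the documented order does not say what happens when Exclude
    # finds no match; this sketch still applies Remove to the original content.
    return remove(content)
```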

    To learn how the text or pattern takes effect, see the following examples.

    • Example of the Remove action:

      Original content
      [telephone communication history with customer]: Microsoft 
      Office keeps asking for installation of additional 
      components / language packs.
      Specified text to be removed
      [telephone communication history with customer]:
      After cleansing
      Microsoft Office keeps asking for installation of 
      additional components / language packs.
    • Examples of the Include action:

      Original content
      Description of the issue: 
      Sent items are not being sent by Outlook. 
      Actions suggested by help desk agent: 
      asked customer to check network connection status, 
      shows connection is OK
      Start pattern
      description of the issue:
      End pattern
      actions suggested by help desk agent:
      After cleansing
      Sent items are not being sent by Outlook.

      Original content
      Description of the issue: Items are not sent by Outlook.
      Actions suggested by help desk agent: 
      asked customer to check network connection status, 
      shows connection is OK
      Start pattern
      description of the issue:
      End pattern

      End of line

      After cleansing
      Items are not sent by Outlook.

      Original content
      Description of the issue: 
      Sent items are not being sent by Outlook. 
      Actions suggested by help desk agent: 
      asked customer to check network connection status, 
      shows connection is OK
      Start pattern
      description of the issue:
      End pattern

      End of document

      After cleansing
      Sent items are not being sent by Outlook. 
      Actions suggested by help desk agent: 
      asked customer to check network connection status, 
      shows connection is OK
    • Examples of the Exclude action:

      Original content
      SQL Server is down and cannot be restarted.
      [appendix: error log] Details:
      Xxxxxxxxxxxxxxxx
      [end of appendix]
      
      Start pattern
      [appendix: error log]
      End pattern
      [end of appendix]
      After cleansing
      SQL Server is down and cannot be restarted.

      Original content
      SQL Server is down and cannot be restarted.
      [appendix: error log] Details:
      Xxxxxxxxxxxxxxxx
      [end of appendix]
      
      Start pattern
      [appendix: error log]
      End pattern

      End of line

      After cleansing
      SQL Server is down and cannot be restarted.
      Xxxxxxxxxxxxxxxx
      [end of appendix]
      

      Original content
      SQL Server is down and cannot be restarted.
      [appendix: error log] Details:
      Xxxxxxxxxxxxxxxx
      [end of appendix]
      
      Start pattern
      [appendix: error log]
      End pattern

      End of document

      After cleansing
      SQL Server is down and cannot be restarted.
    • Example of the ExtractFromTemplate action:

      Example configuration

      Brief description of the problem: ([^]*)First Name :([^]*)Last Name :([^]*)Phone :([^]*)

      Data before cleansing

      Brief description of the problem: The user called because he could not access to eDocs. The user was able to find the eDocs administrator but that person does not work for the company anymore

      First Name : Herr Maximo Christian

      Last Name : Graf

      Phone : 01 234 567

      Data after cleansing

      The user called because he could not access to eDocs. The user was able to find the eDocs administrator but that person does not work for the company anymore

      Herr Maximo Christian

      Graf

      01 234 567
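    The example configuration uses [^], which is JavaScript-specific regular expression syntax for "any character, including line breaks"; in other dialects, such as Python's re module, it does not have that meaning. The following hypothetical Python sketch applies an equivalent template, using (.*?) with re.DOTALL in place of ([^]*), field labels written as they appear in the sample data, and \s* to tolerate spaces before each colon. The names TEMPLATE and extract_from_template are illustrative, and trimming the captured groups is an assumption based on the "Data after cleansing" output.

```python
import re
from typing import List, Optional

# Hypothetical Python equivalent of the example template: (.*?) with re.DOTALL
# replaces the JavaScript-only ([^]*), and \s* allows spaces before each colon.
TEMPLATE = re.compile(
    r"Brief description of the problem:(.*?)"
    r"First Name\s*:(.*?)Last Name\s*:(.*?)Phone\s*:(.*)",
    re.DOTALL,
)


def extract_from_template(content: str, template: "re.Pattern[str]" = TEMPLATE) -> Optional[List[str]]:
    """Return the trimmed capturing groups, or None if the template does not match."""
    match = template.search(content)
    if match is None:
        return None
    return [group.strip() for group in match.groups()]
```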

    Note Regular expressions are supported only for the ExtractFromTemplate action.

  5. Select the Match Case check box if you want to match only text whose letter case is the same as the text or pattern that you entered.
  6. Select the Active check box to activate this configuration.

  7. Click Add. The new data cleansing configuration is now added.
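The effect of the Match Case and Active check boxes can be sketched in Python for Remove records. The RemoveRecord class and the apply_remove_records function are hypothetical models of a configuration record, not part of the product. The sketch assumes that records whose Active check box is cleared are skipped, and that clearing Match Case makes matching ignore letter case.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class RemoveRecord:
    # Hypothetical model of a Remove record: the text to remove plus the
    # Match Case and Active check boxes.
    text: str
    match_case: bool = False
    active: bool = True


def apply_remove_records(content: str, records: List[RemoveRecord]) -> str:
    for record in records:
        if not record.active:
            continue  # assumption: an inactive record is not applied
        flags = 0 if record.match_case else re.IGNORECASE
        content = re.sub(re.escape(record.text), "", content, flags=flags)
    return content.strip()
```

With Match Case cleared, a record for "[TELEPHONE communication history with customer]:" still removes the lowercase prefix from the Remove example; with Match Case selected, it would not.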