(Optional) Enable Incremental Indexing

When you have a very large number of documents to index (for example, hundreds of thousands or more), the indexing process can take a long time. By default, autoCommit is disabled in the Solr search engine, which means that indexed documents are not committed to the index, and therefore are not searchable, until the entire indexing process is complete. Optionally, you can enable incremental indexing so that documents that have already been indexed become searchable before the initial indexing phase is complete.

In a master-slave environment, perform the following tasks to enable incremental indexing.

Task 1: Enable autoCommit on the master server

To do this, follow these steps:

  1. Open the <search engine root directory>\kmsearchengine\KMCores\kmcore\conf\solrconfig.xml file of the master server in a text editor.
  2. Specify a value for the maxDocs parameter.

    1. Locate the following line:

      <updateHandler class="solr.DirectUpdateHandler2" />

    2. Change this line to the following:

      <updateHandler class="solr.DirectUpdateHandler2">
        <autoCommit>
          <maxDocs>%{maxDocs}%</maxDocs>
        </autoCommit>
      </updateHandler>

      Where: %{maxDocs}% is a number (for example, 10000) that specifies the maximum number of uncommitted documents allowed before an automatic commit is triggered. When the number of uncommitted documents reaches this threshold, Solr performs a commit and saves the data to its index. Once committed, the documents are searchable.

  3. Modify the replication setting of the master server.

    1. Locate the following tag:

      <requestHandler name="/replication" class="solr.ReplicationHandler" >

    2. Change the content of this tag to the following:

      <requestHandler name="/replication" class="solr.ReplicationHandler" >
        <lst name="master">
          <str name="replicateAfter">commit</str>
          <str name="replicateAfter">startup</str>
          <str name="confFiles">schema.xml</str>
        </lst>
      </requestHandler>

      With this configuration, each time the master server starts or performs a commit, it creates tags that identify the files that have changed. The slave servers then poll the master server and, based on these tags, replicate the updated files from the master.
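
As a rough illustration of Task 1, the two master-side entries in solrconfig.xml might look like the following once a concrete value is filled in. This is only a sketch: 10000 is simply the example threshold mentioned above, not a recommended setting, and the XML comments are added here for explanation.

```xml
<!-- Illustrative sketch only: 10000 is an example value, not a recommendation -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- Commit automatically once this many uncommitted documents have accumulated -->
    <maxDocs>10000</maxDocs>
  </autoCommit>
</updateHandler>

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Make a new index version available to slaves after each commit and at startup -->
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <!-- Configuration file to replicate along with the index -->
    <str name="confFiles">schema.xml</str>
  </lst>
</requestHandler>
```

A smaller maxDocs value makes documents searchable sooner but triggers commits (and therefore replication) more often, so the value is a trade-off between freshness and indexing overhead.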

Task 2: Configure replication on the slave server

Note: Repeat the following steps for each slave server.

  1. Open the <search engine root directory>\kmsearchengine\KMCores\kmcore\conf\solrconfig.xml file of the slave server in a text editor.
  2. Change the content of the replication requestHandler entry to the following:

      <requestHandler name="/replication" class="solr.ReplicationHandler" >
        <lst name="slave">
          <str name="enable">${enable.slave:true}</str>
          <str name="masterUrl">http://<hostname>:<port>/KMCores/${solr.core.name}/replication</str>
          <!-- str name="pollInterval">00:00:20</str -->
        </lst>
      </requestHandler>

    Where: <hostname> and <port> are the host name and port of the master server. You can uncomment the pollInterval parameter if you want the slave server to poll the master at a specified interval (in HH:MM:SS format).
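
For illustration, a slave entry with the placeholders filled in and polling enabled might look like the following. This is a sketch under stated assumptions: the host name kmmaster.example.com and port 8080 are hypothetical and must be replaced with the values of your own master server; the 20-second interval is the value from the commented-out line above.

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="enable">${enable.slave:true}</str>
    <!-- Hypothetical master host and port; replace with your own values -->
    <str name="masterUrl">http://kmmaster.example.com:8080/KMCores/${solr.core.name}/replication</str>
    <!-- Poll the master for index changes every 20 seconds (HH:MM:SS) -->
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>
```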

Caution: The changes made in either task take effect only after a full reindex. After you complete the tasks, perform a full reindex of all knowledgebases. For details, see Perform a Full Reindex on a Knowledgebase.