Configure an HTTP connector

User Role: Administrator

After you install a connector, you must configure its parameters in the corresponding .cfg file before you start the service. In this example, the parameters for an HTTP connector are configured in the <Smart Analytics Installation>/HTTPConnector/httpconnector.cfg file.

The following table describes the parameters of the [MYSITE] section in the httpconnector.cfg file. To configure multiple tasks for one connector, copy the content of the [MYSITE] section and rename the copied section.

Note: Restart the service of this HTTP connector after you configure and save the parameters.
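
As a sketch of the multiple-task setup described above, the fragment below shows the [MYSITE] section copied and renamed. The second section name, [MYSITE2], and its URL and DIRECTORY values are hypothetical placeholders rather than values from the product. The fragment also assumes that lines starting with // are treated as comments, as the commented-out parameters in this topic suggest.

```cfg
[MYSITE]
URL=http://MYSITE.COM
DIRECTORY=MYSITE

// Hypothetical second task: a renamed copy of the [MYSITE] section
[MYSITE2]
URL=http://MYSITE2.COM
DIRECTORY=MYSITE2
```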

Parameter Description
URL=http://MYSITE.COM Specify the root URL of the website to crawl.
DIRECTORY=MYSITE Specify the location in which the crawled pages are saved.
CantHaveCSVs=*.css,*.js Specify the file types to exclude from the search resources. In this example, .css and .js files are excluded.
CantHaveCheck=1 Specify that URLs matching the values in the CantHaveCSVs parameter are excluded.

//StayOnSite=False Specify whether the crawl stays on the current site. When set to False, the connector does not stay on the current site and follows links that lead away from it.
//Depth=99 Specify the maximum depth to which the connector can follow links during web crawling. In this example, this parameter is commented out, so the default value (3) is used.

//ProxyHost=PROXY.COM Specify the proxy URL.
//ProxyPort=80 Specify the proxy port.
//FOLLOWROBOTPROTOCOL=FALSE Specify whether the HTTP connector follows the robot protocol of the website. Most websites publish a robot protocol that declares which pages a spider is allowed to fetch. If you uncomment this parameter with the value FALSE, the HTTP connector does not follow the protocol.
//----Login with form---- Uncomment the parameters under this section if your website uses a login form.
//LOGINMETHOD=FORMPOST Specify that the website requires you to enter information such as the user name and password, and the form uses the POST method to send this information to the site's server.
//LOGINURL=https://login.com/ Specify the login URL.
//LOGINUSERFIELD=os_username Specify the ID of the field in which you enter your user name. You can get the ID by viewing the source of the web page.
//LOGINUSERVALUE=USERNAME@COMPANY.COM Specify the user name.
//LOGINPASSFIELD=os_password Specify the ID of the field in which you enter your password.
//LOGINPASSVALUE=PASSWORD_ENCRYPTED Specify your password.
//LoginSubmitField=ButtonID Specify the ID of the button that you click to log in to your website.
//----HTTP digest authentication---- Uncomment the parameters under this section if you use HTTP digest authentication to log in to your website.
//DigestUsername=USERNAME Specify the user name for HTTP digest authentication.
//DigestPassword=PASSWORD_ENCRYPTED Specify the password for HTTP digest authentication.
//----NTLM authentication---- Uncomment the parameters under this section if you use NTLM authentication to log in to your website.
//NTLMUsername=USERNAME Specify the user name for NTLM authentication.
//NTLMPassword=PASSWORD Specify the password for NTLM authentication.
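
Putting the table together, a [MYSITE] section for a website behind a login form might look like the following sketch. All values are the placeholders from the table, not working settings. The fragment assumes that lines starting with // are treated as comments, as the commented-out parameters in the table suggest.

```cfg
[MYSITE]
URL=http://MYSITE.COM
DIRECTORY=MYSITE
CantHaveCSVs=*.css,*.js
CantHaveCheck=1
// Depth stays commented out, so the default value (3) applies
//Depth=99

//----Login with form----
LOGINMETHOD=FORMPOST
LOGINURL=https://login.com/
LOGINUSERFIELD=os_username
LOGINUSERVALUE=USERNAME@COMPANY.COM
LOGINPASSFIELD=os_password
LOGINPASSVALUE=PASSWORD_ENCRYPTED
LoginSubmitField=ButtonID
```

In this sketch only the form-login parameters are uncommented; the digest and NTLM parameters are left out and would stay commented in the file. After you save the file, restart the service of the HTTP connector.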