Data Management - Connect the Data Sources

The Data Warehouse (DWH) can connect to other products (data sources) and gather data about them. An integration is available for each product (data source). The connection from the data source to the DWH is called a content pack. The Data Collection Service (DCS) extracts the data from the specific data source. Content packs contain all the artifacts needed to connect to the relevant data source and gather data from it.

The Connect Data Source page enables you to manage the integration of data into the data warehouse by activating data sources. The available data source content packs are registered during the deployment process and can then be activated on this page.

To access:

Select ADMIN > Data Management > Connect Data Source, then click Add data source. Select the data source type to activate the integration processes.

Note: For details on activating Content Packs created in the IDE, see Activate a new CP.

Learn more about each data source integration

ITBA integrates with multiple data sources. For details, see Semantic Layer - Context Designer.

The UI elements of the wizard differ according to the selected data source.

For each source, enter the relevant information and click Next to proceed to the validation page.

Note: If the activation process takes more than one hour, you can change the status of the CP in the CONTENT_PACK table to “ERROR” and then activate the CP again.

Data Collection Service (DCS)

Data Collection Service (DCS) is a standalone service module that extracts data from various data sources into flat files, according to the relevant extraction and source model generated by the IDE. The flat files can then be loaded into Vertica and processed by the ETL. The extraction and source model consists of a pluggable extractor framework for each data source. The extractor gathers data according to the request it receives from the Content Flow Manager and places it into a set of relevant .TXT files. Each supported data source has one or more corresponding extractors that can extract the relevant data from that data source. All available extractors for Content Packs use the DCS framework.

  • Extraction Mechanism. The extraction is a self-managed web service. The data source connection information is registered to the framework when adding a new data source in ITBA > ADMIN > Data Management > Connect Data Source.

  • Extraction Methodology. The extraction is done using extractors running on the application container of the Data Warehouse. The extractors use various technologies (for example, JDBC, Web Services, data files, and other kinds of HTTP requests) to extract the data from the data source. Each extraction is an isolated job that cannot be affected by other extraction jobs. Each extraction has a unique batch ID. The batch ID is incremental and cannot be duplicated, even across different Content Pack instances.

  • ETL Source Extract. The first stage of the ETL is the Source Extract. In this phase, the Content Flow Manager performs an HTTP request that activates the relevant extractor.

    The DCS extractor extracts the data from the data source into flat files. All Content Packs integrate using DCS, where data is extracted from the data source into .TXT files with a well-defined standard structure.

  • Format of flat files. The first line of a flat file should contain the headers of all columns, separated with a “|” symbol. The data follows, with the column values separated with a “|” symbol and the lines separated with a “#” symbol.

    If a column value includes a special character such as “|”, “#”, or “\”, the character should be escaped by adding a “\” symbol before it. The DCS framework provides a FlatFileWriter that handles the details of writing the headers and values.

    Flat file example:

  • Data sources and Content Packs.

    The following data source types are available for each Content Pack:

    Content Pack    Data source type
    ALM             ALM
    AM              MSSQL, Oracle
    AWS             AWS
    AWSCW           AWSCW
    Azure           GENERIC
    CSA             CSA
    PPM             Oracle
    SA              Oracle
    SM              MSSQL(Non dbdict), Oracle(Non dbdict), MSSQL(dbdict), Oracle(dbdict), DB2(dbdict)
    CO              CO

  • Troubleshooting Logs.

    • $HPBA_Home/glassfish/glassfish/domains/BTOA/logs/dcs.log: This log describes all of the current activity of the DCS framework as well as the activity of the common utilities and general extractors.
    • $HPBA_Home/glassfish/glassfish/domains/BTOA/logs/dcs.extractor.log: This log describes the activity of all the extractors.
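
The flat-file format described above (a header line, “|” between column values, “#” between lines, and “\” as the escape character) can be sketched in a few lines of Python. This is only an illustration of the format, not the DCS FlatFileWriter itself: the function names are hypothetical, and it assumes that every line, including the header, ends with “#” followed by a newline, which the text above does not state explicitly.

```python
import io

FIELD_SEP = "|"    # separates column values
RECORD_SEP = "#"   # separates lines (assumption: written as "#" plus a newline)
ESCAPE = "\\"      # prefix for special characters inside a value
SPECIAL = (ESCAPE, FIELD_SEP, RECORD_SEP)


def escape_value(value: str) -> str:
    """Put a backslash in front of every special character in a value."""
    out = []
    for ch in value:
        if ch in SPECIAL:
            out.append(ESCAPE)
        out.append(ch)
    return "".join(out)


def format_record(values) -> str:
    """Join escaped values with '|' and terminate the line with '#'."""
    escaped = (escape_value(str(v)) for v in values)
    return FIELD_SEP.join(escaped) + RECORD_SEP + "\n"


def write_flat_file(stream, headers, rows) -> None:
    """Write the header line first, then one line per data row."""
    stream.write(format_record(headers))
    for row in rows:
        stream.write(format_record(row))


if __name__ == "__main__":
    buf = io.StringIO()
    # Hypothetical columns and values, chosen to exercise all three escapes.
    write_flat_file(buf, ["ID", "NAME", "PATH"], [[1, "a|b", "C:\\tmp#1"]])
    print(buf.getvalue(), end="")
```

Escaping the backslash itself, together with the two separators, is what lets a reader split a line on unescaped “|” and “#” characters without ambiguity.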

Add and activate a new data source instance

A data source is integrated into the Data Warehouse by activating the source instance.

  1. Select ADMIN > Data Management > Connect Data Source.

  2. Click Add data source to open the Data Source wizard. The Add Data Source page opens.

  3. Select the data source type and click Next.

    The relevant data source page opens.

  4. Enter and select the configuration parameters.

  5. Complete the wizard.

    The data source instance is activated.

    Note: If the first-time activation of a data source instance fails, the instance is displayed in the source list with an Error status. You can then activate the data source by clicking Edit Settings and completing the configuration and activation.

Reactivate an existing data source instance

  1. Select ADMIN > Data Management > Connect Data Source.

  2. Click the activate button next to the specific source. The source is activated.

Deactivate a data source instance

You can deactivate a source to stop its integration process, for example in order to change its configuration details.

  1. Select ADMIN > Data Management > Connect Data Source.

  2. Click the deactivate button next to the specific source. The deactivation warning opens.

  3. Click OK.

View data source configuration settings

  1. Select ADMIN > Data Management > Connect Data Source.

  2. Click View Settings. The relevant data source page opens.

Edit data source settings and test the connection

  1. Select ADMIN > Data Management > Connect Data Source.

  2. If necessary, deactivate the data source by clicking the deactivate button.

  3. Click Edit Settings and edit the configuration parameters.

  4. Click Next to validate your changes and test the connection to the data source.

Add a new data source to the integration mechanism

For details, see Add and activate a new data source instance.

Configure DCS Properties

In data source activation, each source that is extracted with DCS must have the following properties configured:

Data Source: PPM
Location of the properties: HP_BA/ContentPacks/PPM/EXTRACTOR/extractor-ppm/settings.properties
Properties:
  • max_retries=3 Defines the maximum number of retries when an error occurs during the test connection and extraction.
  • retry_interval=3000 Defines the interval between retries.
  • paging_enable=false
    • true. Use the paging functionality for data extraction.
    • false. Do not use the paging functionality for data extraction.
  • paging_bulk=10000 Defines the bulk size when paging is enabled.
  • parallel_entity_tasks=10 Defines the number of threads for data extraction. Increasing this value improves extraction performance but also increases the workload on the ITBA server and the DB server.
  • fetch_size=1000 Defines the number of rows fetched from the DB server for each request.

Data Source: SM
Location of the properties: HP_BA/ContentPacks/SM/EXTRACTOR/extractor-sm/settings.properties
Properties:
  • max_retries=3 Defines the maximum number of retries when an error occurs during the test connection and extraction.
  • retry_interval=3000 Defines the interval between retries.
  • paging_enable=false
    • true. Use the paging functionality for data extraction.
    • false. Do not use the paging functionality for data extraction.
  • paging_bulk=10000 Defines the bulk size when paging is enabled.
  • parallel_entity_tasks=10 Defines the number of threads for data extraction. Increasing this value improves extraction performance but also increases the workload on the ITBA server and the DB server.
  • fetch_size=1000 Defines the number of rows fetched from the DB server for each request.

Data Source: ALM
Location of the properties: HP_BA/ContentPacks/ALM/EXTRACTOR/extractor-alm/settings.properties
Properties:
  • max_retry_count=3 Defines the maximum number of retries when an error occurs during the test connection and data extraction.
  • thread_pool_size=5 Defines the thread pool size for data extraction, in which the tasks are executed in parallel.
  • alm_page_size=1000 Defines the number of records extracted for each REST call to the ALM server.
  • max_cache_size_in_MB=2048 Defines the upper limit, in MB, of the memory used by the ALM extractor cache.

Data Source: CSA
Location of the properties: HP_BA/ContentPacks/CSA/EXTRACTOR/extractor-csa/settings.properties
Properties:
  • timeout = 10 Defines the timeout value, in minutes, for one REST request.
  • threadPool = 5 Defines the thread pool size of the data extractor, in which the tasks are executed in parallel.
  • invalidUsers = cdaInboundUser,csaReportingUser,ooInboundUser,admin
  • proxyHost= Defines the proxy host used to connect to the CSA server.
  • proxyPort= Defines the proxy port used to connect to the CSA server.

Data Source: AWS, AWSCW
Location of the properties: HP_BA/ContentPacks/AWSCW/EXTRACTOR/extractor-aws/settings.properties
Properties:
  • awsEndpoint = https://s3.amazonaws.com Defines the endpoint for the AWS service.
  • dimensionDelimiter = ; Defines the delimiter used when outputting the dimensions that describe qualities of the metric.
  • valueDelimiter = = Defines the delimiter used when outputting dimension values.
  • period = 3600 Defines the granularity, in seconds, of the returned data points.
  • minimumScope = 3600 Defines the minimum scope, in seconds, for requesting the metric statistics.
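
To check the values in one of these settings.properties files before activating a data source, the key=value lines can be read with a short script. The following is a minimal sketch in Python, not part of ITBA: the function names are hypothetical, the parser handles only simple `key=value` lines (it ignores the line continuations and `:` separators that Java properties files also allow), and falling back to the values listed above when a key is missing is an assumption rather than documented behavior.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines; skip blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first "=" only, so "valueDelimiter = =" yields "=".
        key, sep, value = line.partition("=")
        if not sep:
            continue
        props[key.strip()] = value.strip()
    return props


def as_int(props: dict, key: str, default: int) -> int:
    """Return an integer property, or the fallback if missing or empty."""
    value = props.get(key, "")
    return int(value) if value else default


def as_bool(props: dict, key: str, default: bool) -> bool:
    """Return a true/false property, or the fallback for anything else."""
    value = props.get(key, "").lower()
    if value in ("true", "false"):
        return value == "true"
    return default


def load_ppm_settings(text: str) -> dict:
    """Read the PPM/SM extractor keys listed in this section."""
    props = parse_properties(text)
    return {
        "max_retries": as_int(props, "max_retries", 3),
        "retry_interval": as_int(props, "retry_interval", 3000),
        "paging_enable": as_bool(props, "paging_enable", False),
        "paging_bulk": as_int(props, "paging_bulk", 10000),
        "parallel_entity_tasks": as_int(props, "parallel_entity_tasks", 10),
        "fetch_size": as_int(props, "fetch_size", 1000),
    }


if __name__ == "__main__":
    sample = "max_retries=5\npaging_enable=true\npaging_bulk=20000\n"
    for name, value in load_ppm_settings(sample).items():
        print(f"{name} = {value}")
```

In practice the text would come from the file at the location given for the data source, for example by reading it with `open(path, encoding="utf-8").read()`.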

Connect Data Source Page

The Connect Data Source page enables you to select from a list of Integration Content Packs recognized by the data warehouse. It also enables you to activate the integration of the data sources, deactivate them, and change their configuration.

User interface elements are described below (when relevant, unlabeled elements are shown in angle brackets):

Add data source: Click to open the Data Source wizard. For details, see Data Source Wizard.

<Data Sources>: A list of sources, by Instance Name (instance name) and Content Pack Name (data source product), that have been added to the data warehouse. The current status of the data source is displayed next to the instance name:

  • Activated
  • Deactivated
  • Error
  • Initializing: The data source is currently being activated. Relevant only for first-time activation.

View Settings: Available when the data source has been activated. Displays the read-only configuration of all connection parameters.

  Note: All connection settings are run-time related. You must deactivate the connection to the data source in order to change the settings.

Edit Settings: Available when the data source has been deactivated. Displays the configuration of all connection parameters and enables you to test the connection to the data source. The parameters can be edited.

  If the first-time activation of a data source instance fails, the instance is displayed in the source list with an Error status. You can then activate the data source by clicking Edit Settings and completing the configuration and activation.

<Activate>: Activates the relevant data source.

<Deactivate>: Deactivates the relevant data source.

  Note: Do not deactivate a content pack while ETL is running.
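
The relationship between a data source's status and the controls described above can be summarized as a small lookup table. The sketch below is one reading of this page, written in Python for illustration only; it is not ITBA code. The View Settings and Edit Settings entries follow the descriptions above, while the availability of the activate and deactivate controls per status, and the absence of any action during Initializing, are assumptions.

```python
from enum import Enum


class Status(Enum):
    ACTIVATED = "Activated"
    DEACTIVATED = "Deactivated"
    ERROR = "Error"
    INITIALIZING = "Initializing"


# Which controls are usable in each status.
# "Activate"/"Deactivate" placement is an assumption inferred from the text.
AVAILABLE_ACTIONS = {
    Status.ACTIVATED: ("View Settings", "Deactivate"),
    Status.DEACTIVATED: ("Edit Settings", "Activate"),
    # After a failed first-time activation, Edit Settings is used to
    # complete the configuration and activation.
    Status.ERROR: ("Edit Settings",),
    # Assumption: nothing can be done while activation is in progress.
    Status.INITIALIZING: (),
}


def can(status: Status, action: str) -> bool:
    """Return True if the action is listed for the given status."""
    return action in AVAILABLE_ACTIONS[status]


if __name__ == "__main__":
    for status in Status:
        actions = ", ".join(AVAILABLE_ACTIONS[status]) or "(none)"
        print(f"{status.value}: {actions}")
```

The table makes the main constraint explicit: settings are read-only while a source is activated, so a configuration change always passes through the Deactivated (or Error) status.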

Data Source Wizard

The Data Source wizard enables you to add and activate a selected data source instance.

  • Add Data Source Page

    Click Next to move to the next page of the wizard.

    User interface elements are described below (when relevant, unlabeled elements are shown in angle brackets):

    UI Element          Description
    Data source type    Select the data source type you want to activate.

  • Configuration Parameters Page

    Each Configuration Parameters page displays parameters specific to the data source. For details, see the Content Reference Guide relevant to your data source.

  • Validation Page

    The Validation page displays activation status information pertaining to the selected data source.

    A message displays the data source status information.

    Click Finish to complete the wizard activation process.