Teaching Using Scan Files (Windows)
You can use scan files to teach applications for Windows.
Application teaching is the process of showing Data Flow Probe how to recognize new applications, or new versions of applications that are already included in an active SAI. An application consists of one or more files. Information about these files can exist in the supplied Master SAI, or it can be added to the User SAI.
In a Windows environment, files belonging to an application typically contain embedded Publisher, Application, and Version information. In both Windows and UNIX environments, certain installed package properties contain this type of information.
Questions to Ask Yourself
There are two key questions you must ask yourself when starting software recognition projects:
- What is an acceptable level of recognition?
- How long will it take to achieve the required recognition level?
The Unrecognized/File frequency chart available from the Analysis Workbench can be used to help you determine an acceptable level of recognition.
Displaying the Unrecognized/File Frequency Chart
In Analysis Workbench, after you have loaded your data, select the Charts > Unrecognized/File frequency option from the View main menu.
The Unrecognized/File frequency chart is a simplistic representation of the distribution of the number of files with a particular name compared with the number of times this file occurs across a sample of computers.
Interpreting the Chart
The example chart shows that the bulk of the files requiring recognition exist on several machines within a sample, whereas the largest number of files to be recognized may occur only once or twice across the whole sample of computers.
If the computers on which these files were found can be identified, you can check them to see if they fit into the business critical category of machines needing attention.
This graph holds true for most situations where the number of copies and number of files vary with a particular sample of machines and the amount of recognition work carried out on these same machines.
Using the graph, you can realistically assess the file recognition level required to achieve the goals for a particular project.
With these restrictions, 100% recognition is usually unrealistic.
Defining the Cut-Off
By looking at the typical profile of unknown files versus the number of times they occur, it can be seen that typically there are a large number of files that only occur once or twice across the sample of scan files.
This file count drops rapidly as files that occur multiple times are encountered. Therefore it is sensible to define a cut-off level in the file count to keep the number of files manageable. The cut-off may start at the 10% off level and progressively be reduced to 5% off or lower, depending on the number of known files that need to be handled and the target recognition level.
The exception to this is when there are business critical or special applications that only occur once or twice across the scan file sample. In this case, these applications are targeted and added to the SAI.
In many instances a cut-off of 95% of installed files is seen as a realistic target.
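As an illustration of choosing a cut-off, the following sketch assumes you have exported each unrecognized file name with its copy count. The file names, counts, and the `copy_count_cutoff` helper are invented for the example. It shows one possible reading of the cut-off: the lowest copy count that still has to be taught so that the taught files cover a target share of all occurrences. It is not the calculation used by Analysis Workbench.

```python
def copy_count_cutoff(copies_by_file, target_share=0.95):
    """Return the lowest copy count that must still be taught so that files
    at or above that count cover at least target_share of all occurrences."""
    total = sum(copies_by_file.values())
    if total == 0:
        raise ValueError("no occurrences to analyze")
    covered = 0
    # Work from the most frequent files down to the rarest.
    for copies in sorted(copies_by_file.values(), reverse=True):
        covered += copies
        if covered / total >= target_share:
            return copies


# Hypothetical copy counts for ten unrecognized file names.
sample = {
    "a.exe": 400, "b.exe": 300, "c.dll": 200, "d.exe": 60, "e.exe": 20,
    "f.exe": 10, "g.exe": 5, "h.exe": 3, "i.exe": 1, "j.exe": 1,
}
cutoff = copy_count_cutoff(sample)
print(f"Teach files with at least {cutoff} copies")
```

In this invented sample, four of the ten files account for 96% of occurrences, which mirrors the long tail of rarely seen files described above. Business critical applications that occur only once or twice still need to be targeted separately.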
Estimating the time required for software recognition is based largely on experience. Factors that can play a big part are:
- Familiarity with Analysis Workbench.
- The number of scan files available.
- The quality of the scan file data – does it include valid header information in the files?
- The cut-off point required.
- The availability of local contacts – accessibility and responsiveness.
- Availability of personnel to visit machines or to investigate particular software.
- The file types to be included in the recognition. It is usual to restrict the types to COM, EXE and DLL files when teaching from scan files.
As a rule of thumb, over the project period an average of 100 files per day can be taught to your User.zsai file. This equates to 10-12 files per hour, or roughly one every 5 minutes.
In the early stages of the project, you will need time to become familiar with the data set and to select a subset of scan files for teaching.
The most productive stage is when you have lots of files that can be quickly analyzed and taught. In particular, if it is possible to cluster a number of files together, it may be possible to achieve a better result than 100 files a day.
However, as the project progresses, the amount of useful data available from Analysis Workbench can diminish. It becomes necessary to spend time contacting people who have specialist knowledge of the products, or physically visiting the machines and checking the applications manually.
This can have a significant impact on recognition productivity.
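The rule of thumb above can be turned into a rough planning calculation. The sketch below is only an estimate under stated assumptions: the 100 files per day figure comes from the text, while the 8-hour working day and the 2,500-file workload are hypothetical values chosen for the example.

```python
import math

FILES_PER_DAY = 100  # rule-of-thumb average quoted in the text


def estimate_days(files_to_teach, files_per_day=FILES_PER_DAY):
    """Return the whole number of working days needed to teach the files."""
    return math.ceil(files_to_teach / files_per_day)


def minutes_per_file(hours_per_day=8, files_per_day=FILES_PER_DAY):
    """Return the average time available per file (8-hour day is an assumption)."""
    return hours_per_day * 60 / files_per_day


print(estimate_days(2500))   # hypothetical workload of 2,500 files
print(minutes_per_file())    # average minutes per file in an 8-hour day
```

Because productivity drops in the later stages of a project, treat the result as a lower bound rather than a schedule.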
The following are recognition statistics from a software recognition project.
The project used 900 scan files that contained 19,680 file names with 448,991 occurrences. The percentages of recognized and unrecognized file names and occurrences are shown in the following table:
Measure | Start | Finish |
---|---|---|
Recognized file names | 45% | 54% |
Unrecognized file names | 55% | 46% |
Recognized occurrences | 82% | 95% |
Unrecognized occurrences | 18% | 5% |
Around 1,820 files were taught over the duration of the project. The result was a 9 percentage point increase in recognized file names and a 13 percentage point increase in recognized occurrences.
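To relate the percentages to absolute numbers, the following sketch converts the table values into approximate counts. The totals and percentages are taken from the text; the helper name is hypothetical. Because the published percentages are rounded, the derived counts are approximations and will not exactly match the number of files actually taught.

```python
TOTAL_FILE_NAMES = 19_680
TOTAL_OCCURRENCES = 448_991


def share_to_count(total, percent):
    """Convert a rounded percentage from the table into an approximate count."""
    return round(total * percent / 100)


# Recognized file names moved from 45% to 54%.
names_gained = (share_to_count(TOTAL_FILE_NAMES, 54)
                - share_to_count(TOTAL_FILE_NAMES, 45))

# Recognized occurrences moved from 82% to 95%.
occurrences_gained = (share_to_count(TOTAL_OCCURRENCES, 95)
                      - share_to_count(TOTAL_OCCURRENCES, 82))

print(names_gained, occurrences_gained)
```

The comparison shows why occurrences are the more useful measure: a relatively small number of newly taught file names accounts for a much larger number of recognized file occurrences.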
It is important that you make all parties involved in the project aware of the scope of the teaching process before the project commences so that expectations are correctly set and met.
This is due to the difference in methods between using scan files for information and using installations of applications.
With application installations, it is possible to perform typical installations and see where all files are being deployed.
In the case of using the information available from scan files, the information relating to an application may be incomplete or different from other databases. This may be because:
- Available scan files may have incomplete installations.
- License relationships may be incomplete, again due to incomplete installations.
As discussed, scan files of users’ machines are not an ideal source of data for teaching, although they are often the only source of data.
For applications where it is imperative that they be recognized as accurately as possible, it is recommended that you follow these steps where possible:
- Use an “empty” machine with just the operating system installed. The best way is to install the machine once and generate an image of it that can be restored to this state for every application.
- Scan the machine and store the result as the “reference” scan.
- Install the application to be recognized.
- Scan the machine again, storing the result as the “application” scan.
- Use both scan files in an import in the SAI Editor.
When used in this fashion, the SAI Editor can “subtract” the reference scan from the application scan. This allows a very accurate degree of teaching as all files affected by the installation can be automatically added to the library.
However, this method is more time consuming than when using scan files of users’ machines and requires a dedicated machine for the teaching work as well as access to the installation media for the applications.
The Application Librarians use this methodology for almost all applications that are added to the Master Library. MSI files are added to the Library using the MSI Importer in the SAI Editor.
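The idea of subtracting a reference scan from an application scan can be pictured as a simple difference between two file listings. The sketch below is a conceptual illustration only: the dictionary layout, the paths, and the `subtract_reference` helper are assumptions made for the example, and it does not reflect the scan file format or the algorithm used by the SAI Editor.

```python
def subtract_reference(reference_scan, application_scan):
    """Return files that are new or changed in the application scan.

    Each scan is modeled as {path: (size_in_bytes, modified_stamp)}.
    """
    return {
        path: info
        for path, info in application_scan.items()
        if reference_scan.get(path) != info
    }


# Hypothetical listings of a clean machine before and after an installation.
reference = {
    r"C:\Windows\notepad.exe": (193_536, "2009-07-14 01:39"),
    r"C:\Windows\System32\shared.dll": (40_960, "2009-07-14 01:41"),
}
application = {
    r"C:\Windows\notepad.exe": (193_536, "2009-07-14 01:39"),
    r"C:\Windows\System32\shared.dll": (45_056, "2010-03-02 10:15"),  # updated
    r"C:\Program Files\Acme\acme.exe": (812_032, "2010-03-02 10:15"),  # new
}

for path in sorted(subtract_reference(reference, application)):
    print(path)
```

Comparing size and time stamp as well as path means that shared files replaced by the installer are also picked up, not only files in the new application directory.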
Although a large number of scan files may be available, the only time that they all need to be loaded is when you need to produce progress statistics.
For teaching purposes, the smaller the subset of scan files loaded the better. This is because the recognition algorithm has fewer files and directories to scan, hence producing recognition results faster.
When choosing scan files for the project, consider the following:
- Particular location or department
- Machine types – server or laptop
- Specific scan files by name
Certain departments such as Finance or Personnel may be the only users of a certain type of application (for example, invoicing software).
Laptops may contain special software (for example, communications software). Servers may contain server components of an application.
Specific scan files may contain applications (such as a bank payment transfer) that only occur on a particular machine. Alternatively, the scan file may be an example of a much larger set of scan files that contain particular files.
Choosing a few scan files from a list of several thousand can be time consuming. The following steps show two different methods of different complexity that can be used to single out the scans that would be best for a teaching session.
Simple method: Use an Analysis Workbench Load Query
- Use the query option when loading scan files and generate a query that includes the desired assets.
- The selected Query will check all entries, which in a large list of scan files can be time consuming. If you need to load the same list of assets repeatedly, the more complicated method may be more appropriate.
Complex method: Load all scan files to locate the best ones to use
- Load your scan files into Analysis Workbench
- In the Files window, tag all CheckVer and Unknown files.
- Select the Files option from the Export menu. The Export to Files dialog box is displayed.
- In the Export to Files dialog box, click the Export Files tab.
- Select the One line per item option.
- Make sure you have selected the Asset Number and Name options in the list.
- Click the OK button.
- Import the exported data into a database program such as Microsoft Access 2000.
- Run a query that groups on the first instance of a file name. This provides a list of scan files that contain files of interest. The following is a sample SQL query:

    SELECT DISTINCTROW Files.Name, First(Files.[Asset Number]) AS [FirstOfAsset Number], Count(Files.[Asset Number]) AS [CountOfAssetNumber], Files.Directory, Files.Publisher, Files.Application, Files.Version, Files.Size
    FROM Files
    GROUP BY Files.Name, Files.Directory, Files.Publisher, Files.Application, Files.Version, Files.Size
    HAVING (((Files.Name) In (SELECT [Name] FROM [Files] As Tmp GROUP BY [Name] HAVING Count(*)>1)))
    ORDER BY Files.Name, Count(Files.[Asset Number]);

- Take the output from this query and edit it into a script file under a section heading. Ensure that the full path is loaded. This can be defined with a variable for the path. Then use the Analysis Workbench LOAD_SCANFILE_LIST script to select the section.
- Run the script and this will load the defined scan files without having to wait for the user interface.
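If a database program is not to hand, the grouping step of the complex method can be approximated with a short script. The sketch below is a simplified stand-in, not a translation of the sample SQL query. It assumes the export is a comma-separated file with a header row containing the `Asset Number` and `Name` columns selected in the export dialog; the actual export layout may differ, and the asset numbers and file names shown are invented.

```python
import csv
import io


def assets_covering_files(export_file):
    """Return the assets that hold the first instance of each file name."""
    first_asset = {}
    for row in csv.DictReader(export_file):
        # Keep only the first asset seen for each file name.
        first_asset.setdefault(row["Name"], row["Asset Number"])
    return sorted(set(first_asset.values()))


# Hypothetical export: one line per tagged file.
sample_export = io.StringIO(
    "Asset Number,Name\n"
    "PC001,setup.exe\n"
    "PC001,tool.exe\n"
    "PC002,setup.exe\n"
    "PC002,tool.exe\n"
    "PC003,odd.dll\n"
)

print(assets_covering_files(sample_export))
```

In this sample, loading the scan files for two of the three assets is enough to see every tagged file name at least once. With a real export, open the file with `open(path, newline="")` instead of using `io.StringIO`.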
Having selected the scan files to work with for a particular session, the next thing you need to consider is the grouping of files based on specific criteria of interest.
This method is preferable to just examining files individually as it may lead to multiple files being selected for teaching at one time. The following sections discuss the benefits of particular selections.
Group Files by Date Without Selecting by Type
This method allows files (recognized and unrecognized) to be grouped together. This can help pick up stray files that may have been stored in directories away from the main application directory.
Date and Time Stamp in Windows
The file date and time stamp in Windows 200x/XP/Vista/7 usually depends on daylight saving time. For example, if a file was originally installed under GMT with a time stamp of 4:50 and the time is then changed to BST (GMT+1), the file’s time stamp becomes 5:50.
Conversely, if an application was installed under BST and the clocks were then put back, the time stamp would change to 3:50.
This also means that using the date to cluster files when the machine is running Windows 200x/XP/Vista/7 could give different results if machines have been scanned with installations either side of a time change.
This effect does not occur with UNIX.
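The one-hour shift described above can be reproduced with fixed-offset time zones. This is a minimal sketch: the installation date and the `displayed_time` helper are hypothetical, and the fixed offsets stand in for the GMT and BST settings of the scanned machine.

```python
from datetime import datetime, timedelta, timezone

GMT = timezone(timedelta(hours=0), "GMT")
BST = timezone(timedelta(hours=1), "BST")


def displayed_time(utc_stamp, zone):
    """Return the time of day that a stored UTC stamp shows in a given zone."""
    return utc_stamp.astimezone(zone).strftime("%H:%M")


# Hypothetical file installed at 04:50 while the machine was on GMT.
installed = datetime(2010, 1, 15, 4, 50, tzinfo=timezone.utc)

print(displayed_time(installed, GMT))  # 04:50
print(displayed_time(installed, BST))  # 05:50
```

When clustering files by date across scans taken on either side of a clock change, allowing for a one-hour difference between otherwise identical time stamps is one way to avoid splitting the same installation into two groups.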
Group Files by Name
Products such as HP printer drivers often start with the same letters (that is, HP). Also files starting DC or DD may in fact be deleted files in the RECYCLE directories.
Group Files by Header Information
There are a number of files which are in archives or Cabinet files. It may be necessary to create a special Scanner which stores archives as directories.
Examine Directory Names
This can include Publisher, Application and even Version in some cases. Look for more than one directory with subdirectories off it. Look particularly under Program Files as often this is the default base directory for an application.
Look at Entries in Installed Applications
In Analysis Workbench you can select a machine and right-click it to display the scan file in Viewer. By looking in the Hardware and Configuration Data > Operating System > Installed Applications folder, the applications that have been registered in the System Registry will be displayed.
This information is also available in Analysis Workbench in the Machine Window under Windows Installed Apps. The column entries are not necessarily in alphabetical order.
Use Applications Window in Operational View
The Operational View displays all applications with partial or full identification.
In this view, an item is displayed as Checkver when its file or files have Checkver status. Use this to recheck file and directory information. Often the grouping yields some information about a version.
To set Operational view:
- In Analysis Workbench select the Load Options command from the File menu.
- Click the Display Filter tab
- Select the Load Operational View option.
The Analysis Workbench Load Configuration–Advanced dialog box appears.
Teaching Recognition
After files have been selected, the teaching can be performed. This is the process of converting Checkver and Unknown files into recognized files in the Main or Associated categories.
Logging Your Progress
There are several ways that you can log your progress:
- Checking the number of files that have been added to the User.zsai file. This must be done manually from the beginning each time, because files added later to an existing application are absorbed into the existing application.
- Checking the number of applications in the User.zsai file. This is useful but does not reflect the number of files contained in each application.
- Loading a set of scan files against the User.zsai file. This is the most reliable way of tracking progress.
This is one of the few times that all the scan files need to be loaded. To keep the memory requirement down, only load *.exe and *.com files, and ignore Directories and Hardware data. This cuts down the amount of data to store in memory. This is a valid file selection, as it is unusual to teach *.dll files, for example, when just using scan files.
If the number of scan files gets too large for the available memory, then it will be necessary to split the load sets into more manageable chunks. Produce an export of all files using Name, Copy Count, and Status. These exports can be appended to produce one file. Then import the file into a database, and perform the necessary query on the data having first produced a query to eliminate duplicate file names.
Although the bulk of Main files are *.exe files, some applications are only identified by specific *.dll files or some other executable. In this case, all executable files will need to be loaded with the consequent impact on memory.
The project objective is to produce a clean User SAI that has recognized files to the defined cut-off level. The quality of the data will only be assessed as the information is being used. Issues that may compromise the data quality can occur for a number of reasons. Remember that when teaching from scan file based information only, the data available may be incomplete. The following section shows some examples of possible errors that may occur.
Recognition Errors
Here are some common recognition error scenarios:
- Header information is at variance with the actual product. This can happen with OEM or 3rd party support products. The product version is often different from the file version (in this case, use the product version).
- Versions have not been applied correctly. Scan files may not include version information if the Scanner has not been set up to collect internal file identification data. Another reason might be that the application version reported in internal file data does not match the marketing version of the product. For example, Windows NT5 is packaged as Windows 2000. Microsoft Publisher 2000 has an internal version number of 6.0.
- Orphan files may be assigned incorrectly. These may be files that are not part of the main directory list for an application. They often reside in the \Windows or \Winnt directories.
- User recognition of application is incorrect. An application may be given a working name that is different from its official title.
- A version is installed that is different from the current license. Newer versions may have to be purchased for license purposes, but copies of the older version are deployed for compatibility reasons.
Input Errors
Here are some common input error scenarios:
- Vendor or application name is not consistent. This needs to be edited to ensure consistency for reports and exports.
- Product name does not match invoices or packaging. This would not be identified purely from the scan file content.
- The wrong version has been applied to a particular file. This could be caused by multiple versions being installed in the same directory.
- A space has been left in front of the text entry. This can happen if the entry is taken from teaching mode straight from the header.
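Two of the input errors above, stray leading spaces and inconsistent vendor or application names, can be checked mechanically before reports are produced. The following sketch assumes you have a plain list of the names entered so far; the helper names and the sample entries are hypothetical, and this is not a feature of the SAI Editor.

```python
def clean_entry(text):
    """Remove leading and trailing spaces and collapse repeated inner spaces."""
    return " ".join(text.split())


def find_inconsistent_names(names):
    """Group entries that differ only by letter case or spacing."""
    groups = {}
    for name in names:
        groups.setdefault(clean_entry(name).casefold(), set()).add(name)
    # Only groups with more than one spelling need attention.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Hypothetical publisher entries, including one with a leading space.
entries = [" Microsoft", "Microsoft", "microsoft", "Novell"]
print(find_inconsistent_names(entries))
```

A check like this only finds spelling variants. Names that do not match invoices or packaging still have to be verified against the purchase records.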
Situations Where the Recognition Information Does Not Match the Client’s Software Invoice Details
If a client already has a database of software purchases, there can be a number of situations where the information taught to the SAI does not match software invoice details.
- The information in the SAI is incorrect. This may be due to incorrect descriptions obtained from file headers or the effect of misallocation of files in a mixed directory.
- The information in the SAI is out of date. Often for marketing reasons a product is rebranded. Edit the SAI to reflect the change. For example, Dr Solomon’s Antivirus becomes Total Virus Defence.
- The application is a device driver that needs hardware to be present. Therefore, the hardware effectively is the license and so there may not be a Main file.
- If after a Service Pack, Service release, or Hotfix is applied, only some associated files of the original application – but not the main file – are changed, the application itself will be reported as it was before the update. In this case, there might not be a Main file associated with a particular Service Pack, Service release, or Hotfix.
The following items should not be regarded as a definitive list of recognition methods, but rather they show the types of decisions that need to be taken when interpreting the available data. It is up to the individual to define a scheme that meets the needs of a particular situation.
Install/Uninstall Programs
Unless they can be tied to specific applications, use a generic setting of:
- Publisher = Various
- Application = Installation Program
- Version = Generic
Compressed Files
Often these come as part of the application, but they can sometimes be located in directories away from the application.
For a zip or Microsoft Cabinet file, use the archive option in the Scanner to capture the directory list of compressed files, and then compare with the other files on the machine.
Run-Time Files from Applications
Products such as Macromedia Director, Flash, etc., can produce run-time executables. It is not always possible to identify the content of the file, as the header will usually relate only to the generating application.
- Publisher = Name of the Publisher of the generating application
- Application = Name of the generating application
- Version = Version of the generating application (if known); otherwise, use the date of the run-time file.
Files in Recycle Bins
Files in Recycle Bins will have file names such as DC1, DC9, DD1, etc. To remove them from the Unknown count, either:
- In Analysis Workbench, set load options to exclude recycle directories so that the contents do not appear in file counts.
- Teach the files in the directories with a generic description. For example:
  - Publisher = Various
  - Application = Recycle Bin
  - Version = Generic
It is important to check that the file name is not being used by a valid application. This means that any file not in a recycle directory still needs to be taught separately. This also means that it is dangerous to set a wide range of file sizes, as this range might encompass valid files.
The best approach would be to first load with the Recycle Bin directories included to produce a list of files in these directories. Then, reload the data with the load options set to exclude the recycle directories from normal teaching.
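When reviewing an exported file list, a simple filter can separate likely Recycle Bin entries from valid files that happen to share a similar name. The sketch below rests on assumptions: the name pattern is inferred from the examples DC1, DC9, and DD1, the directory test simply looks for the word "recycle" in the path, and the sample paths are invented.

```python
import re

# Names such as DC1, DC9, DD1: "DC" or "DD" followed by digits (assumed pattern).
RECYCLED_NAME = re.compile(r"^D[CD]\d+", re.IGNORECASE)


def is_recycled_file(directory, name):
    """Return True only if the name matches and the file is in a recycle directory."""
    in_recycle_dir = "recycle" in directory.lower()
    return in_recycle_dir and RECYCLED_NAME.match(name) is not None


print(is_recycled_file(r"C:\RECYCLER\S-1-5-21", "DC1.exe"))
print(is_recycled_file(r"C:\Program Files\App", "DC1.exe"))
```

Requiring both conditions reflects the warning above: a file with a matching name outside a recycle directory may belong to a valid application and still needs to be taught separately.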
Zero Length Files
Files of zero length do occur as a result of uninstalling applications. Because these will never be executable files, there is no recognition to be done. Therefore, just produce a report of zero length files for record purposes. Typically in a batch of 3000 files, there may be 10 to 20 such files.
Windows Hotfix
These are the intermediary fixes that are available between service releases. Identify:
- Publisher = name of publisher issuing the fix
- Application = name of application + HOTFIX
- Version = Date of file unless version details supplied
Applications That Are Available on Different Platforms
Some applications have the same name irrespective of which platform they appear on. Two approaches can be used, either:
- Consolidate all the files from all the different platforms into one application. This could lead to a range of file sizes for a particular file name.
- Teach each platform separately, and add the platform detail in the OS field of the application version.
Creating a List of Files Not Requiring Detailed Recognition
This may be a list of files not requiring detailed analysis for some reason. It may be that you wish to highlight internal use of applications (such as games) or that an application is internal and does not need detailed version information.
- Publisher = Dummy or some other generic name
- Application = Dummy or some other generic name
- Version = Dummy or some other generic name to save having to use separate dates.
This activity should only be used as a last resort, otherwise everything will be added to this list without proper analysis. You may be tempted to do this thinking that if you move files from unrecognized to some form of recognition, then the recognition job is done. The problem will become apparent when you start to produce reports.
Why Applications May Be Taught Without a Main File
One situation where an application can be taught without a Main file is when the application is a device driver that requires hardware to be present. Therefore the hardware effectively is the license.
Using States
Rather than loading large numbers of scan files each time for a session, using States can considerably reduce the time to access data, provided that States and SAIs are kept synchronized. If the User SAI is updated, then a new state needs to be stored before exiting Analysis Workbench.
Some examples of groups of applications are given in this section:
- Microsoft Office 95 has Word 6
- Office 2000 is packaged as Office 2000 but has an internal version number of 9.x.
- Microsoft Publisher 2000 has an internal version of 6.0.
Components of the Different Editions
Office 2003
Office 2003 Professional Enterprise 2003: | Office 2003 Standard: |
---|---|
Office Word 2003 | Office Word 2003 |
Office PowerPoint 2003 | Office PowerPoint 2003 |
Office Excel 2003 | Office Excel 2003 |
Office Publisher 2003 | Office Outlook 2003 |
Office Outlook 2003 | |
Office Outlook 2003 with Business Contact Manager (US) | |
Office Access 2003 | |
Office InfoPath 2003 | |
Lotus SmartSuite Release 9 (or ver.9.8)
Name | Application type |
---|---|
WordPro | Word processing software |
1-2-3 | Spreadsheet software |
Approach | Database management system software |
Freelance Graphics | Graphics or photo imaging software |
Organizer | Calendar and scheduling software |
FastSite | Web page creation and editing software |
ScreenCam | Video creation and editing software |
How to Handle Special Groups of Files
- WinZip Self Extractor
Use the Scanner with ZIP extension
Abbreviations
- ZAK - Zero Administration Kit from Microsoft
- ZEN - Novell
- TVD - Total Virus Defence - Network Associates
- SMS - Systems Management Server - Microsoft
When using a large number of scan files for recognition, the bulk of the time is spent with a subset of the total scan files available. Usually, the full set of scan files only needs to be loaded to produce figures relating to the split of Recognized and Unrecognized files.
The easiest way to track your progress is to:
- Load the scan files into Analysis Workbench.
- Locate the Status column in a Files window.
- Right-click the column header, and select the Chart option.
This produces a chart showing the necessary split of the status of files, that is, Main, Associated, 3rd Party, Checkver, Unknown, Unprocessed and Auto-identified.
You can also:
- Select the Copies column.
- Left-click the column header. This ranks the files in copy order.
- Use a Global tag to tag all.
- Untag selections to deselect unwanted types and/or chop count cut-off.
- To prioritize the work, just select the scan files with the most tagged files. This is done by ensuring that the Tagged Field column (from the scan file section) is displayed.
- Sort on the number of tagged files and select a small group of scan files from this selection.
- These can be added to an OR query when loading scan files. Or, for a faster response without the user interface, use scripts to load a list of scan files.
If a tag selection is made of Unknown and Checkver above a certain number of copies, then a file filter can be applied to the machines window to select the scan files where these occur.