Fingerprints Collection and Storage
Samples of information assets (documents, files, etc.) that have their fingerprints collected and stored in the database are referred to as fingerprint sources. Samples may be modified, added, or removed, or their secrecy level may change with time. For the database to accommodate all those changes, the server runs classification tasks on a regular basis, which will update the fingerprints storage as described below.
Information samples are processed and their fingerprints are created by tasks on the DeviceLock Enterprise Server. Each task belongs to a certain classification and assigns that classification to the fingerprints it creates. For instance, fingerprints created by a task of “Confidential” classification are attributed to that same “Confidential” classification.
On every run, the task may inspect files in a particular folder. For each file, the task first creates the fingerprints of the file and compares them with the fingerprints from the database. Further processing of the file’s fingerprints depends on the result of that comparison:
The classification already holds a fingerprint whose source has the same check sum, path, and name as the file being inspected. In this case, the task does not make changes to the fingerprints storage. However, if the check sum is the same but the path or name differs, the file is specified as one more source of that fingerprint in the database.
The file’s check sum differs from the source’s check sum of an existing fingerprint, but the file’s fingerprint matches the existing fingerprint to some extent. In this case, the result of the task depends upon the percentage of matching elements of those fingerprints.
If the percentage of matching elements does not exceed the configured threshold, the file’s fingerprint is added to the database as a new fingerprint, with that file specified as its source.
If the percentage of matching elements exceeds the configured threshold, then the file’s fingerprint is specified as a new version of the existing one in the database. In this case, the file is specified as one more source of that fingerprint if its path or name differs from the path and/or name of other sources.
The file’s fingerprint does not match any fingerprint from the database. In this case, the file’s fingerprint is added to the database as a new fingerprint, with that file specified as its source.
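The comparison outcomes above can be sketched in Python. This is only an illustration, not DeviceLock code: the `Fingerprint` class, the `process_file` and `match_percent` functions, the representation of a fingerprint as a set of elements, and the way the match percentage is computed are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Fingerprint:
    # Hypothetical model: a fingerprint is a set of hashed content elements.
    elements: frozenset
    sources: dict = field(default_factory=dict)   # path (with name) -> check sum
    versions: list = field(default_factory=list)  # element sets of later versions


def match_percent(a, b):
    """Percentage of elements of `a` also found in `b` (assumed metric)."""
    return 100.0 * len(a & b) / len(a) if a else 0.0


def process_file(db, checksum, path, elements, threshold):
    """Update `db` (a list of Fingerprint) for one file and return the action taken."""
    # Same check sum: nothing changes, or the file becomes one more source.
    for fp in db:
        if checksum in fp.sources.values():
            if fp.sources.get(path) == checksum:
                return "unchanged"
            fp.sources[path] = checksum
            return "source added"

    # Different check sum: find the closest existing fingerprint.
    # For simplicity, only the first version of each fingerprint is compared.
    best = max(db, key=lambda fp: match_percent(elements, fp.elements), default=None)
    if best is not None and match_percent(elements, best.elements) > threshold:
        best.versions.append(elements)
        best.sources[path] = checksum  # adds a source only if the path is new
        return "new version"

    # No match, or the match does not exceed the threshold.
    db.append(Fingerprint(elements, {path: checksum}))
    return "new fingerprint"
```

For example, processing the same file twice yields "new fingerprint" and then "unchanged"; a copy under another path yields "source added"; an edited file whose elements still match above the threshold yields "new version".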
Even if the source of a fingerprint is removed, the fingerprint remains in the database. DeviceLock administrators can delete fingerprints or their individual versions by hand using the DeviceLock Management Console.
About Versioning Threshold
The versioning threshold determines whether to create a new fingerprint or merely add a new version to an existing fingerprint. The DeviceLock Enterprise Server specifies separate thresholds for text content (such as text files) and for binary content (such as image files).
Many files hold content of both types. For instance, Microsoft Word documents are binary files that can contain text and images. Fingerprints of files with mixed contents hold elements identifying text content and elements identifying binary content. When classifying such a “mixed” fingerprint, the server applies both thresholds, and separately assesses the match percentages for “text” and “binary” elements of the fingerprint. This results in the following effects:
The fingerprint of a text file can be classified as a fingerprint version for a file with mixed contents, and vice versa, a “mixed” fingerprint can become a fingerprint version for a text file.
The fingerprint of a binary file containing no text can be classified as a fingerprint version for a file with mixed contents, and vice versa, a “mixed” fingerprint can become a fingerprint version for a binary file with no text contents.
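The effects above can be illustrated with a short Python sketch. The documentation does not state how the two separate assessments are combined, so this example assumes that exceeding either threshold is enough to classify a fingerprint as a version. The function names, the dictionary layout with "text" and "binary" element sets, and the percentage metric are likewise hypothetical.

```python
def percent(a, b):
    """Percentage of elements of `a` also found in `b` (assumed metric)."""
    return 100.0 * len(a & b) / len(a) if a else 0.0


def is_new_version(new, existing, text_threshold, binary_threshold):
    """Decide whether `new` is a version of `existing`.

    Both arguments are dicts holding "text" and "binary" element sets.
    Assumption: text and binary elements are assessed separately, and
    exceeding either threshold is sufficient.
    """
    text_ok = percent(new["text"], existing["text"]) > text_threshold
    binary_ok = percent(new["binary"], existing["binary"]) > binary_threshold
    return text_ok or binary_ok


# A text-only fingerprint compared with a "mixed" one, and vice versa.
text_only = {"text": frozenset({1, 2, 3, 4}), "binary": frozenset()}
mixed = {"text": frozenset({1, 2, 3, 5}), "binary": frozenset({10, 11})}
binary_only = {"text": frozenset(), "binary": frozenset({10, 11})}

print(is_new_version(text_only, mixed, 50, 50))    # text elements match 75%
print(is_new_version(mixed, text_only, 50, 50))    # text elements match 75%
print(is_new_version(binary_only, mixed, 50, 50))  # binary elements match 100%
```

Under this assumption, all three calls report a version match, which mirrors the two effects described above.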