How It Works
The digital fingerprinting technique is based upon the interaction of the following elements:
Content-aware rules
Content groups
Classifications of digital fingerprints
Digital fingerprints of documents and files
Digital fingerprints database
Match percentage
Normalization of fingerprints
Content-aware rules
Content-aware rules can leverage logical “content groups” of the Digital Fingerprints filter type for data analysis based on digital fingerprints. Such rules can be applied to both devices and network protocols, thus enabling the use of digital fingerprints to control content access/sending permissions, content-aware shadowing, and/or simple content detection.
Content groups
Content groups of the Digital Fingerprints filter type implement content inspection using digital fingerprints. Each group of this type references a particular classification of digital fingerprints and specifies the minimum percentage of fingerprint matching (referred to as the threshold) that is required to assign that classification to the content being inspected.
Classifications of digital fingerprints
Confidential documents and other information assets requiring protection can be assigned to classifications that reflect their level of importance or secrecy (e.g. “Restricted”, “Confidential”, “Secret”, and “Top Secret”). Their digital fingerprints are assigned to the same classifications, so each classification acts as a container that holds the digital fingerprints of information samples at the corresponding level of importance or secrecy. Classifications are ordered according to that level.
DeviceLock provides a number of built-in classifications, and allows the addition of more custom ones. Their order by importance level can be changed when needed; however, the level of the built-in classification “Unclassified” is always lower than the level of any other classification and it cannot be raised. The “Unclassified” digital fingerprints have the lowest possible level regardless of whether or not they are encountered in other classifications.
Digital fingerprints of documents and files
A collection of hashes that uniquely identify a document or file and its contents is referred to as the digital fingerprint of that document or file. Fingerprints of sample documents and files of known classification can be stored in the database, where they are assigned that same classification. Then, the documents and files being inspected can be classified by comparing their fingerprints with those from the database. Thus, the collection and storage of fingerprints over time has a key role in the classification of documents and files going forward.
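The text above does not say how the hashes are computed. As a minimal sketch, assuming a fingerprint is the set of hashes of overlapping word windows, the idea might look like the following. The function name `fingerprint`, the `window` parameter, and the choice of SHA-256 are illustrative assumptions, not DeviceLock's actual algorithm.

```python
import hashlib


def fingerprint(text: str, window: int = 5) -> set[str]:
    """Return a set of hashes, one per overlapping window of words.

    Assumption: word windows hashed with SHA-256 stand in for whatever
    hashing scheme the product really uses.
    """
    words = text.lower().split()
    if not words:
        return set()
    # A text shorter than the window is hashed as a single element.
    last_start = max(len(words) - window, 0)
    return {
        hashlib.sha256(" ".join(words[i:i + window]).encode("utf-8")).hexdigest()
        for i in range(last_start + 1)
    }


sample = fingerprint("the quarterly results are strictly confidential until release")
inspected = fingerprint("note that the quarterly results are strictly confidential")
# Elements shared by both sets indicate a common fragment of text.
shared = sample & inspected
```

Because each element covers only a short window of text, two documents that share a fragment also share some elements, which is what makes partial matching possible.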
Digital fingerprints database
The DeviceLock Enterprise Server stores digital fingerprints of information samples provided to it (such as documents and files) in the fingerprints database, and allows for the management of fingerprints held in that database. Fingerprints are grouped by classification of their source. For example, fingerprints of “Secret” document samples are included in the “Secret” classification.
The database is maintained by tasks running on the server. For each classification, tasks can be created that process information sources (such as sets of documents) selected specifically for that classification. For example, a task for the “Secret” classification may be configured to process a folder with samples of “Secret” files. The fingerprints created by this task belong to the “Secret” classification, and they can be used to identify other documents or files as “Secret” by matching the fingerprints of those documents or files against the fingerprints of the “Secret” samples.
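The grouping of fingerprints by classification can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the class `FingerprintDatabase`, its `run_task` method, and the helper `element_hashes` are hypothetical names, and samples are passed as strings rather than read from a folder.

```python
import hashlib
from collections import defaultdict


def element_hashes(text: str, window: int = 3) -> set[str]:
    """Hypothetical helper: hash each overlapping window of words."""
    words = text.lower().split()
    if not words:
        return set()
    last_start = max(len(words) - window, 0)
    return {
        hashlib.sha256(" ".join(words[i:i + window]).encode("utf-8")).hexdigest()
        for i in range(last_start + 1)
    }


class FingerprintDatabase:
    """Toy model: fingerprint elements grouped by classification name."""

    def __init__(self) -> None:
        self._by_classification: dict[str, set[str]] = defaultdict(set)

    def run_task(self, classification: str, samples: list[str]) -> None:
        """Fingerprint every sample and file the result under `classification`."""
        for text in samples:
            self._by_classification[classification] |= element_hashes(text)

    def elements(self, classification: str) -> set[str]:
        """Return a copy of the elements stored for one classification."""
        return set(self._by_classification.get(classification, set()))


db = FingerprintDatabase()
db.run_task("Secret", ["alpha beta gamma delta"])
```

Here the task for "Secret" is the only way elements enter that classification, mirroring the rule that a task's fingerprints belong to the classification the task was created for.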
Match percentage
When inspecting an information source (such as a document or file), DeviceLock can compare the source’s fingerprints with those of a certain classification in the database, and calculate their match percentage. If the match percentage exceeds the configured threshold, DeviceLock classifies the inspected information accordingly. For “Top Secret” documents, the match threshold might be relatively low, because even small pieces of such documents may contain very important information. Conversely, for a document to be recognized as “Unclassified”, a large number of its fragments must match samples of “Unclassified” documents, so the match threshold must be relatively high. The match threshold value is selected when configuring a fingerprints content group for content-aware rules.
The match percentage is calculated as the greater of two values:
The percentage of the source’s fingerprint elements that match the database-stored fingerprints of the given classification
The total percentage of the elements of the database-stored fingerprints of the given classification that match the source’s fingerprints
The first value covers the situation where the source contains fragments of various samples of sensitive information; the second value enables the correct classification of the source when it contains samples of sensitive information along with a large amount of non-sensitive information. Together, these two values allow most cases of digital fingerprint-based content identification to be handled properly.
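The "greater of two values" rule can be sketched as follows, assuming that fingerprints are represented as sets of hash elements and that a classification's fingerprints are pooled into one set. The names `match_percentage` and `meets_threshold` are hypothetical. The comparison uses "at least the threshold", which is also an assumption, since the text describes the threshold both as a minimum and as a value to exceed.

```python
def match_percentage(source: set[str], classification: set[str]) -> float:
    """Return the greater of the two coverage percentages described above."""
    if not source or not classification:
        return 0.0
    common = len(source & classification)
    # Share of the source's elements found in the classification.
    source_coverage = 100.0 * common / len(source)
    # Share of the classification's elements found in the source.
    database_coverage = 100.0 * common / len(classification)
    return max(source_coverage, database_coverage)


def meets_threshold(source: set[str], classification: set[str], threshold: float) -> bool:
    """Assumption: the threshold is treated as an inclusive minimum."""
    return match_percentage(source, classification) >= threshold


# A large source that contains half of a small classification's elements.
source = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"}
secret = {"a", "b", "x", "y"}
percentage = match_percentage(source, secret)
```

In this example only 2 of the 10 source elements match (20%), but 2 of the 4 stored elements match (50%), so the result is 50%. Taking the maximum keeps a match from being diluted by the size of either side.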
Normalization of fingerprints
To optimize and speed up matching, the fingerprints in the database are normalized: the elements of the “Unclassified” fingerprints are removed from all fingerprints held in other classifications. This assumes that “Unclassified” documents contain no sensitive information. Once a document has been placed in the “Unclassified” classification, the information it contains will not be identified as “Secret” or “Confidential”, even if its fingerprints are also present in other classifications.
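Normalization as described amounts to a set difference. The sketch below assumes the database is a mapping from classification name to a set of fingerprint elements; the function name `normalize` and this data layout are illustrative assumptions.

```python
def normalize(database: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return a copy with "Unclassified" elements removed from every other classification."""
    unclassified = database.get("Unclassified", set())
    return {
        name: set(elements) if name == "Unclassified" else elements - unclassified
        for name, elements in database.items()
    }


raw = {
    "Unclassified": {"h1", "h2"},
    "Secret": {"h2", "h3"},
    "Confidential": {"h1", "h4", "h5"},
}
normalized = normalize(raw)
```

After normalization, "Secret" keeps only `h3` and "Confidential" keeps `h4` and `h5`, so content matching `h1` or `h2` can no longer contribute to a sensitive classification.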