Features and Benefits
The key features and benefits of DeviceLock Discovery are as follows:
Content-based discovery. You can discover information and automatically take pre-defined actions based on the real type of information, as determined by its actual content. Content-based discovery can locate many types of data even if the files have been renamed and their extensions changed. Thus, when sensitive content is identified, you can receive an immediate alert, remove the content on the spot, or change the available access rights.
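The idea of determining a file's type from its content rather than its name can be illustrated with a minimal Python sketch. It is not DeviceLock Discovery's implementation: the signature table, the `detect_type` and `is_disguised` functions, and the extension mapping are all hypothetical and cover only a handful of well-known file signatures.

```python
import os

# Hypothetical signature table: leading "magic" bytes -> detected type.
MAGIC_SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",  # also the container for .docx/.xlsx/.pptx
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",  # legacy .doc/.xls/.ppt
    b"\x1f\x8b": "gzip",
    b"Rar!\x1a\x07": "rar",
}

# Hypothetical mapping of detected types to the extensions they normally use.
EXPECTED_EXTENSIONS = {
    "pdf": {".pdf"},
    "zip": {".zip", ".docx", ".xlsx", ".pptx"},
    "ole2": {".doc", ".xls", ".ppt", ".msg"},
    "gzip": {".gz", ".tgz"},
    "rar": {".rar"},
}


def detect_type(data: bytes) -> str:
    """Return a type label based on the leading bytes, ignoring any file name."""
    for magic, label in MAGIC_SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"


def is_disguised(filename: str, data: bytes) -> bool:
    """Return True if the content type is known but the extension does not fit it."""
    label = detect_type(data)
    if label == "unknown":
        return False
    extension = os.path.splitext(filename)[1].lower()
    return extension not in EXPECTED_EXTENSIONS[label]
```

In this sketch, a PDF saved as `report.txt` is still reported as `pdf` by `detect_type`, and `is_disguised` flags the mismatch between content and extension.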
Document classification-based discovery. You can discover documents and automatically take pre-defined actions based on:
Digital fingerprints of sensitive documents, which are taken and stored on the DeviceLock Enterprise Server. Fingerprint-based discovery can identify full copies as well as pieces of documents, even if the document has been changed.
Classification labels for third-party products, such as the Boldon James Classifier applications, in which document attributes are set according to the level of sensitivity of the document.
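To show how a fingerprint can recognize a fragment of a document as well as a full copy, the following Python sketch uses word shingling, a common generic technique. The source does not describe the fingerprinting algorithm DeviceLock actually uses, so the `fingerprint` and `containment` functions and the shingle size `k` are assumptions made purely for illustration.

```python
import hashlib
import re


def fingerprint(text: str, k: int = 5) -> set:
    """Hash every run of k consecutive words (a "shingle") in the text."""
    words = re.findall(r"\w+", text.lower())
    shingles = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    return {hashlib.sha1(s.encode("utf-8")).hexdigest() for s in shingles}


def containment(candidate: str, reference: set, k: int = 5) -> float:
    """Return the fraction of the candidate's shingles found in the reference."""
    candidate_shingles = fingerprint(candidate, k)
    if not candidate_shingles:
        return 0.0
    return len(candidate_shingles & reference) / len(candidate_shingles)
```

Because each shingle is hashed independently, an excerpt copied from a fingerprinted document still shares most of its shingles with the stored fingerprint, and a lightly edited copy loses only the shingles that overlap the edited words.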
Document discovery in Elasticsearch. You can discover documents of interest in Elasticsearch, a distributed system that provides real-time indexing and search for a wide variety of data types. DeviceLock Discovery requests a document search in Elasticsearch, matches the search results to discovery rules, and then sends alerts, logs events, and generates reports based on the discovery results.
Expansive coverage of multiple file formats and data types. You can identify content in the following file formats and data types: Adobe Acrobat (including encrypted files if the type of encryption in the file is one of the following: 40-bit RC4, 128-bit RC4, 128-bit AES and 256-bit AES, and the file permissions do not disable text extraction) (*.pdf), Adobe Framemaker MIF (*.mif), Ami Pro (*.sam), Ansi Text (*.txt), ASCII Text, ASF media files (metadata only) (*.asf), AutoCAD (*.dwg, *.dxf), CSV (Comma-separated values) (*.csv), DBF (*.dbf), EBCDIC, EML (emails saved by Outlook Express) (*.eml), Enhanced Metafile Format (*.emf), Eudora MBX message files (*.mbx), Flash (*.swf), GZIP (*.gz), HTML (*.htm, *.html), iCalendar (*.ics), Ichitaro (versions 5 and later) (*.jtd, *.jbw), JPEG (*.jpg), Lotus 1-2-3 (*.123, *.wk?), MBOX email archives such as Thunderbird (*.mbx), MHT archives (HTML archives saved by Internet Explorer) (*.mht), MIME messages (including attachments), MSG (emails saved by Outlook) (*.msg), Microsoft Access MDB files (*.mdb, *.accdb, including Access 2007 and Access 2010), Microsoft Document Imaging (*.mdi), Microsoft Excel (*.xls), Microsoft Excel 2003 XML (*.xml), Microsoft Excel 2007, 2010, and 2013 (*.xlsx), Microsoft OneNote 2007, 2010, and 2013 (*.one), Microsoft Outlook data files (*.PST), Microsoft Outlook/Exchange Messages, Notes, Contacts, Appointments, and Tasks, Microsoft Outlook Express 5 and 6 (*.dbx) message stores, Microsoft PowerPoint (*.ppt), Microsoft PowerPoint 2007, 2010, and 2013 (*.pptx), Microsoft Rich Text Format (*.rtf), Microsoft Searchable Tiff (*.tiff), Microsoft Visio (*.vsd, *.vst, *.vss, *.vdw, *.vsdx, *.vssx, *.vstx, *.vsdm, *.vssm, *.vstm), Microsoft Word for DOS (*.doc), Microsoft Word for Windows (*.doc), Microsoft Word 2003 XML (*.xml), Microsoft Word 2007, 2010, and 2013 (*.docx), Microsoft Works (*.wks), MP3 (metadata only) (*.mp3), Multimate Advantage II (*.dox), Multimate version 4 (*.doc), OpenOffice versions 1, 2, and 3 documents, spreadsheets, and presentations (*.sxc, *.sxd, *.sxi, *.sxw, *.sxg, *.stc, *.sti, *.stw, *.stm, *.odt, *.ott, *.odg, *.otg, *.odp, *.otp, *.ods, *.ots, *.odf) (includes OASIS Open Document Format for Office Applications), Quattro Pro (*.wb1, *.wb2, *.wb3, *.qpw), QuickTime (*.mov, *.m4a, *.m4v), RAR (*.rar), TAR (*.tar), TIFF (metadata only) (*.tif), TNEF (winmail.dat), Treepad HJT files (*.hjt), Unicode (UCS16, Mac or Windows byte order, or UTF-8), Visio XML files (*.vdx), Windows Metafile Format (*.wmf), WMA media files (metadata only) (*.wma), WMV video files (metadata only) (*.wmv), WordPerfect 4.2 (*.wpd, *.wpf), WordPerfect (5.0 and later) (*.wpd, *.wpf), WordStar version 1, 2, 3 (*.ws), WordStar versions 4, 5, 6 (*.ws), WordStar 2000, Write (*.wri), XBase (including FoxPro, dBase, and other XBase-compatible formats) (*.dbf), XML (*.xml), XML Paper Specification (*.xps), XSL, XyWrite, ZIP (*.zip) as well as PostScript, PCL5, PCL6 (PCL XL), HP-GL/2, EMF spooled files and GDI printing (ZjStream).
 
Note: Content in AutoCAD (DWG, DXF) file formats can be identified on Windows XP and later systems.
Continuous protection. You can apply content-based security policies to your entire network periodically with scheduled scans.
Multiple content detection methods. You can use multiple methods to identify sensitive content contained in documents (based on regular expressions, keywords, and document properties).
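As a rough illustration of combining keyword and regular-expression detection, consider the following Python sketch. The keyword set, the card-number pattern, and the `find_matches` function are hypothetical examples rather than DeviceLock's built-in content groups, and the Luhn checksum is an extra validation step added here only to reduce false positives from arbitrary digit runs.

```python
import re

# Hypothetical content groups; names and patterns are illustrative only.
KEYWORDS = {"confidential", "internal use only"}
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return len(digits) >= 13 and checksum % 10 == 0


def find_matches(text: str) -> list:
    """Return (method, name) pairs for every detection that fires on the text."""
    matches = []
    lowered = text.lower()
    for keyword in sorted(KEYWORDS):
        if keyword in lowered:
            matches.append(("keyword", keyword))
    for match in CARD_PATTERN.finditer(text):
        if luhn_valid(match.group()):
            matches.append(("pattern", "card_number"))
    return matches
```

A rule could then act on the returned list, for example by requiring both a keyword and a pattern match before raising an alert.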
Centralized content management. Flexible, content-aware Rules and Actions are managed based on content groups that enable you to centrally define the types of content that you want to control.
Ability to override access rights. You can selectively allow or deny access to certain content stored on network computers regardless of preset permissions.
Inspection of files within archives. Allows you to perform deep inspection of each individual file contained in an archive. The following inspection algorithm is used: when a compressed archive is detected, all files are extracted from the archive and analyzed individually to detect the content to which to apply the actions defined in Rules and Actions. If the content of at least one file from the archive gets a positive match in the Rules and Actions section, DeviceLock Discovery will apply the corresponding rule or action to the entire archive.
All nested archives are also unpacked and analyzed one by one. Archive files are detected by content, not by extension. The following archive formats are supported: 7z (.7z), ZIP (.zip), GZIP (.gz, .gzip, .tgz), BZIP2 (.bz2, .bzip2, .tbz2, .tbz), TAR (.tar), RAR (.rar), CAB (.cab), ARJ (.arj), Z (.z, .taz), CPIO (.cpio), RPM (.rpm), DEB (.deb), LZH (.lzh, .lha), CHM (.chm, .chw, .hxs), ISO (.iso), UDF (.iso), COMPOUND (.msi), WIM (.wim, .swm), DMG (.dmg), XAR (.xar), HFS (.hfs), NSIS (.exe), XZ (.xz), MsLZ (.mslz), VHD (.vhd), FLV (.flv), SWF (.swf) as well as CramFS, SquashFS (.squashfs), NTFS, FAT and MBR file system and disk images. Split (or multi-volume) and password-protected archives are not unpacked.
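The inspection algorithm described above can be sketched in Python as follows. This is a simplified illustration limited to ZIP archives from the standard library: `contains_sensitive` is a hypothetical stand-in for a real content rule, the `max_depth` guard is an assumption added to keep the recursion bounded, and encrypted members are simply skipped to mirror the statement that password-protected archives are not unpacked.

```python
import io
import zipfile


def contains_sensitive(data: bytes) -> bool:
    """Hypothetical stand-in for a content rule: a plain keyword check."""
    return b"confidential" in data.lower()


def archive_matches(data: bytes, depth: int = 0, max_depth: int = 10) -> bool:
    """Return True if any file inside a (possibly nested) ZIP matches the rule."""
    buffer = io.BytesIO(data)
    # Detect the archive by its content, not by a file name or extension.
    if not zipfile.is_zipfile(buffer):
        return contains_sensitive(data)
    if depth >= max_depth:
        return False  # assumed safety limit against deeply nested archives
    with zipfile.ZipFile(buffer) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if info.flag_bits & 0x1:
                continue  # encrypted member: not unpacked
            # A single positive match is enough to flag the entire archive.
            if archive_matches(archive.read(info), depth + 1, max_depth):
                return True
    return False
```

The recursion treats every extracted file as a potential archive, so an archive stored inside another archive under an arbitrary name such as `data.bin` is still unpacked and inspected.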
Optical Character Recognition (OCR). The use of the OCR technology allows you to recognize and extract text from scanned documents, camera-captured documents (if these documents were aligned 90 degrees to the camera), and screen shots of documents for further content analysis by Content-Aware Rules.
OCR includes the following capabilities:
An entire image or some portions of the image can be inverted, rotated, or mirrored.
Images with poor brightness or low contrast are supported.
Most fonts can be accurately recognized.
OCR has the following limitations:
Recognition of handwritten text or any fonts that look like handwritten text is not supported.
Embossed and engraved texts are not recognized.
Best recognition results are achieved for black text on a white background.
The built-in OCR supports the following languages: Arabic, Bulgarian, Catalan, Chinese - Simplified, Chinese - Traditional, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, and Turkish. The following image files are supported for OCR processing: BMP files, Dr. Halo CUT files, DDS files, EXR files, Raw Fax G3 files, GIF files, HDR files, ICO files, IFF files (except Maya IFF files), JBIG files, JNG files, JPEG/JIF files, JPEG-2000 files, JPEG-2000 codestream files, KOALA files, Kodak PhotoCD files, MNG files, PCX files, PBM/PGM/PPM files, PFM files, PNG files, Macintosh PICT files, Photoshop PSD files, RAW camera files, Sun RAS files, SGI files, TARGA files, TIFF files, WBMP files, XBM files, and XPM files.
 
Note: The OCR feature is only supported on Windows XP and later versions of Windows.
Text in picture detection. The text-in-picture detection technology allows you to classify all images into two groups: text images (those that contain text, such as scanned documents or screen shots of documents) and non-text images (those that don’t contain text). Identifying text images promptly helps prevent or investigate leakage of sensitive information within image files. The following image files are supported: BMP files, Dr. Halo CUT files, DDS files, EXR files, Raw Fax G3 files, GIF files, HDR files, ICO files, IFF files (except Maya IFF files), JBIG files, JNG files, JPEG/JIF files, JPEG-2000 files, JPEG-2000 codestream files, KOALA files, Kodak PhotoCD files, MNG files, PCX files, PBM/PGM/PPM files, PFM files, PNG files, Macintosh PICT files, Photoshop PSD files, RAW camera files, Sun RAS files, SGI files, TARGA files, TIFF files, WBMP files, XBM files, and XPM files.
Inspection of images embedded in documents. Allows you to perform deep inspection of each individual image embedded in Adobe Portable Document Format (PDF) files (including encrypted files if the type of encryption in the file is one of the following: 40-bit RC4, 128-bit RC4, 128-bit AES and 256-bit AES, and the file permissions do not disable text extraction), Rich Text Format (RTF) files, AutoCAD files (.dwg, .dxf), and Microsoft Office documents (.doc, .xls, .ppt, .vsd, .docx, .xlsx, .pptx, .vsdx). All embedded images are extracted from these documents to the Temp folder of the System user and analyzed independently from text. The text contained in documents is checked against the Rules and Actions that are based on Keywords, Pattern, or Complex content groups. Embedded images are checked against the Rules and Actions that are based on File Type Detection, Document Properties, or Complex content groups. The appropriate action is applied to the entire document if either its text or any of its embedded images matches an entry in the Rules and Actions list.
 
Note: Deep inspection of images embedded in files of AutoCAD (DWG, DXF) formats can be performed on Windows XP and later systems only.