Deduplication minimizes the storage space that data occupies by detecting repeated data and storing identical data only once.
Deduplication may also reduce network load: if, during a backup, a data block is found to duplicate one that is already stored, its content is not transferred over the network.
Acronis Backup deduplicates backups saved to a managed vault if you enable deduplication when creating the vault. A vault where deduplication is enabled is called a deduplicating vault.
Deduplication is performed on data blocks. The block size is 4 KB for disk-level backups and 1 B to 256 KB for file-level backups. Each file that is smaller than 256 KB is considered a single data block. Files larger than 256 KB are split into 256-KB blocks.
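As a rough illustration of the file-level blocking rule described above, the following Python sketch splits a file's content into blocks of at most 256 KB. It is not Acronis code: the name split_into_blocks, the BLOCK_SIZE constant, and the handling of empty files are assumptions made for this example.

```python
BLOCK_SIZE = 256 * 1024  # 256 KB, the file-level block size stated above


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split file content into blocks of at most block_size bytes.

    A file no larger than block_size yields a single block. A larger file
    yields full-size blocks followed by a possibly shorter final block.
    """
    if not data:
        return []  # assumption: an empty file contributes no blocks
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

Under these assumptions, a 100-KB file produces one block, while a 600-KB file produces three blocks of 256 KB, 256 KB, and 88 KB.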
Acronis Backup performs deduplication in two steps:
Deduplication at source
Performed on a managed machine during backup. The agent uses the storage node to determine what data can be deduplicated and does not transfer the data blocks whose duplicates are already present in the vault.
Deduplication at target
Performed in the vault after a backup is completed. The storage node analyzes the vault's contents and deduplicates data in the vault.
When creating a backup plan, you have the option to turn off deduplication at source for that plan. This may lead to faster backups but a greater load on the network and storage node.
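The idea behind deduplication at source can be sketched in Python as follows. This is a simplified model, not the actual agent or storage node protocol: the StorageNode class, its has_block and store_block methods, the back_up function, and the choice of SHA-256 as the hash are all assumptions made for illustration.

```python
import hashlib


class StorageNode:
    """Toy stand-in for a storage node: maps block hashes to block content."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def has_block(self, digest: str) -> bool:
        return digest in self.blocks

    def store_block(self, digest: str, block: bytes) -> None:
        self.blocks[digest] = block


def back_up(blocks: list[bytes], node: StorageNode) -> tuple[list[str], int]:
    """Send only the blocks that the node does not already hold.

    Returns the ordered list of digests describing the backup and the
    number of bytes actually transferred.
    """
    recipe = []
    transferred = 0
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()  # assumption: SHA-256
        if not node.has_block(digest):
            node.store_block(digest, block)  # only new blocks cross the network
            transferred += len(block)
        recipe.append(digest)  # duplicates are recorded by reference only
    return recipe, transferred
```

In this model, backing up the blocks b"aaaa", b"bbbb", b"aaaa" to an empty node transfers 8 bytes rather than 12, because the third block duplicates the first. Turning off deduplication at source would correspond to skipping the has_block check and sending every block.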
Deduplication database
The Acronis Backup Storage Node that manages a deduplicating vault maintains the deduplication database. This database contains the hash values of all data blocks stored in the vault, except for those that cannot be deduplicated, such as encrypted files.
The deduplication database is stored in a local folder on the storage node. You can specify the database path when creating the vault.
The size of the deduplication database is about 1.5 percent of the total size of unique data stored in the vault. In other words, each terabyte of new (non-duplicate) data adds about 15 GB to the database.
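The sizing rule above is simple arithmetic. A minimal Python sketch, assuming the approximate 1.5 percent ratio stated above and using the hypothetical function name estimate_dedup_db_size_gb:

```python
def estimate_dedup_db_size_gb(unique_data_gb: float, ratio: float = 0.015) -> float:
    """Estimate the deduplication database size in GB.

    unique_data_gb is the amount of unique (non-duplicate) data in the vault.
    ratio defaults to the approximate 1.5 percent figure and is only a rule
    of thumb, so treat the result as a planning estimate.
    """
    if unique_data_gb < 0:
        raise ValueError("unique_data_gb must be non-negative")
    return unique_data_gb * ratio
```

For example, estimate_dedup_db_size_gb(1000) gives approximately 15, matching the figure of about 15 GB per terabyte of unique data.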
If the database becomes corrupted or the storage node is lost while the vault retains its contents, the new storage node rescans the vault and re-creates first the vault database and then the deduplication database.