Using the MD5 Hash for Duplicate File Deletion

version (1.91 KB) by Michael Kleder
This function uses an MD5 hash to rapidly detect and delete duplicate files in a directory.

1 Download

Updated 15 Dec 2004

No License

This function rapidly compares large numbers of files for identical content by computing the MD5 hash of each file and detecting duplicates. The probability of two non-identical files having the same MD5 hash, even in a hypothetical directory containing as many as a million files, is exceedingly remote. Because hashes rather than file contents are compared, duplicate detection is greatly accelerated.
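
The idea can be sketched in a few lines. The example below is a Python illustration under stated assumptions, not the MATLAB function itself: the names md5_of_file, find_duplicates and collision_bound are hypothetical, only the files directly inside one directory are considered, and nothing is deleted. The last helper applies the standard birthday bound to estimate how unlikely an accidental collision is for a 128-bit hash.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directory):
    """Group the regular files directly inside `directory` by MD5 digest.

    Returns a list of groups; each group is a sorted list of paths whose
    contents hash to the same value. Groups of size one are omitted.
    """
    by_hash = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            by_hash[md5_of_file(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]


def collision_bound(n_files, hash_bits=128):
    """Birthday-bound estimate of an accidental collision among n_files hashes."""
    pairs = n_files * (n_files - 1) // 2
    return pairs / 2 ** hash_bits
```

For a million files, collision_bound(10**6) is on the order of 1e-27, which is consistent with the "exceedingly remote" claim above for files that were not deliberately constructed to collide.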

You must have the file md5DLL.dll on your MATLAB path to use this function. That file is available on the MATLAB Central File Exchange as file #3784 and was written by Hans-Peter Suter. The URL for the download site is:

This function is intended for MS Windows operating systems. MATLAB requires on the order of 0.1 seconds to execute an operating system command that deletes a single file, but it can quickly create and run an operating system batch file that performs all of the deletions much faster. Since I use MATLAB on a Windows PC, this function creates a batch file for that platform. Furthermore, the md5DLL.dll file is specific to Windows.
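
The batch-file approach described above might look roughly like the following. This is a Python sketch rather than the author's MATLAB code, and the helper names build_delete_batch and write_delete_batch are made up for illustration. It only generates the batch file; it assumes the list of paths to delete has already been determined.

```python
def build_delete_batch(paths):
    """Return the text of a Windows batch file that deletes each path."""
    lines = ["@echo off"]
    for path in paths:
        # Percent signs are doubled so that cmd.exe does not treat them as
        # variable references inside a batch file; quotes protect spaces.
        escaped = path.replace("%", "%%")
        lines.append('del /f /q "{}"'.format(escaped))
    return "\r\n".join(lines) + "\r\n"


def write_delete_batch(paths, batch_path):
    """Write the batch file to disk and return its path (it is not run here)."""
    # newline="" keeps the explicit CRLF line endings unchanged on every OS.
    with open(batch_path, "w", newline="") as handle:
        handle.write(build_delete_batch(paths))
    return batch_path
```

On Windows, the resulting file could then be run once, for example with subprocess.run(["cmd.exe", "/c", batch_path], check=True), so that the cost of launching an operating system command is paid a single time instead of once per file.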

Cite As

Michael Kleder (2021). Using the MD5 Hash for Duplicate File Deletion (, MATLAB Central File Exchange. Retrieved .

Comments and Ratings (2)

Jakob Kleinbach

Thanks for this solution, which I use more and more often.

I'd like to share another favourite to retrieve duplicate files (via their MD5 sum) on the DOS command line:

First, you need these helpers from the GNU GnuWin32 packages: find, md5sum, sort, uniq, sed, xargs. Put these somewhere, e.g. C:\msys\bin.

Secondly, create a *.bat file with the following:
@echo off
if "%1"=="" goto Usage
set MSYSDIR=C:\msys\bin
.\find %1 -type f -print0 | .\xargs -0 -n1 .\md5sum | .\sort -k 1,32 | .\uniq -w 32 -d -D | .\sed -r 's/^[\\][0-9a-f]*( )*(\*)/\.\/find "/;s/[\]+/\\\/g;s/\//\\\/g;s/$/" -printf "%%s\\\t%%f"/' | cmd.exe /Q /K | .\sed -r 's/.*^>//' | .\sort -n
goto End
:Usage
echo Usage: %0 PATH
echo Path must not contain blanks
:End

Assuming your batch file is named fd.bat, you can check c:\windows recursively for duplicate files with

C:\> fd.bat c:\windows

This is adapted from some bash scripts I have found and works pretty well for me.

A. Belani

-It may help to state in both the webpage and the function description that files of the same size are checked first, before their hashes are computed.
-It may help to optionally simply provide a list of the duplicates along with the original files, instead of deleting them.
-It may help to provide an option where the duplicate file with the older date gets deleted.
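
The three suggestions above can be combined in one sketch. The following Python example is an assumption-laden illustration, not part of the submitted MATLAB function: plan_duplicates and _md5 are hypothetical names. It buckets files by size before hashing, only reports what it finds, and marks every copy except the most recently modified one as redundant.

```python
import hashlib
import os
from collections import defaultdict


def _md5(path):
    """Hex MD5 digest of a file (read whole; adequate for a sketch)."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def plan_duplicates(directory):
    """Return (keep, redundant) pairs without deleting anything.

    Files are bucketed by size first, so a hash is only computed for files
    that share their size with at least one other file. Within each group of
    identical files the most recently modified one is kept.
    """
    by_size = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            by_size[os.path.getsize(path)].append(path)

    plan = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[_md5(path)].append(path)
        for group in by_hash.values():
            if len(group) > 1:
                newest = max(group, key=os.path.getmtime)
                plan.extend((newest, p) for p in group if p != newest)
    return plan
```

Because the function returns a plan instead of deleting files, the caller can print the pairs for review first and decide separately whether to remove the older copies.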

MATLAB Release Compatibility
Created with R11.1
Compatible with any release
Platform Compatibility
Windows macOS Linux
