Why does a Java MessageDigest not reset in a for loop?

I'm writing (hacking together from stuff I found online) a basic script to compute the MD5 hash of every possible (UK) mobile phone number and print it (for a demonstration). For some reason, every iteration returns the same hash, despite what I thought was a reset of the engine (see below).
import java.security.*;
import java.math.*;
for i=1:10^9
thisStr = strcat("07",sprintf("%09d",i-1));
md = MessageDigest.getInstance('MD5');
hash = md.digest(double(thisStr));
bi = BigInteger(1, hash);
strHash = char(bi.toString(16));
fprintf("%s : %s\n", thisStr, strHash)
md.reset();
end
The numbers increment, but the hash in the output never changes:
07000000000 : 93b885adfe0da089cdf634904fd59f71
07000000001 : 93b885adfe0da089cdf634904fd59f71
07000000002 : 93b885adfe0da089cdf634904fd59f71
07000000003 : 93b885adfe0da089cdf634904fd59f71
07000000004 : 93b885adfe0da089cdf634904fd59f71
07000000005 : 93b885adfe0da089cdf634904fd59f71
07000000006 : 93b885adfe0da089cdf634904fd59f71
07000000007 : 93b885adfe0da089cdf634904fd59f71
07000000008 : 93b885adfe0da089cdf634904fd59f71
07000000009 : 93b885adfe0da089cdf634904fd59f71
07000000010 : 93b885adfe0da089cdf634904fd59f71
% ... and so on ...
What am I missing? I don't have much experience with Java, but the documentation I've read says that the reset method should clear the instance so that the next iteration starts afresh, and clearly that isn't happening. And of course, I've tried the basic MATLAB:
clear md hash bi strHash
which didn't help either.
Any help is appreciated. Note I'm using R2020a prerelease but the "Products" dropdown doesn't give me the option.
Guillaume on 20 Feb 2020
Note I'm using R2020a prerelease but the "Products" dropdown doesn't give me the option.
That's because we're not allowed to discuss prereleases publicly; you agreed to that when you downloaded it. However, your problem is not tied to a particular release, so it doesn't really matter.


Answers (1)

Guillaume on 21 Feb 2020
Edited: Guillaume on 21 Feb 2020
The first thing to be aware of, and the main reason your code doesn't work, is that your thisStr is a string, not a char vector (strcat returns a string because "07" is a string). The double conversion behaves differently for the two. For a char vector, it converts each character to its equivalent (UTF-16) character code, whereas for a string, it converts the text to whatever number it represents. If the text does not represent a number, the result is NaN:
>> double("07000000001") %string
ans =
7000000001
>> double('07000000001') %char vector
ans =
48 55 48 48 48 48 48 48 48 48 49
>> double("1a1")
ans =
NaN
>> double('1a1')
ans =
49 97 49
So you're not computing the hash of the characters "0", "7", "0", etc., but the hash of a single double number. I'm not sure why 7000000001, 7000000002, etc. all result in the same hash. Possibly it has to do with how MATLAB converts a scalar double into the byte array required by digest, but in any case, you're not computing the hash of what you wanted.
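For comparison, the same distinction can be sketched in plain Java. This is an illustrative sketch, not part of the original answer, and the class and method names are hypothetical. Encoding the text with String.getBytes gives the character codes 48, 55, 48, ..., the same values as double('07000000001') above, which is what digest should receive. Each phone number then hashes to a different value.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class CharCodesDemo {
    // Character codes of the text: the Java analogue of MATLAB's double('07000000001')
    static byte[] charCodes(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // MD5 of the input bytes as a 32-character lowercase hex string
    static String md5Hex(byte[] input) {
        try {
            byte[] hash = MessageDigest.getInstance("MD5").digest(input);
            StringBuilder sb = new StringBuilder(2 * hash.length);
            for (byte b : hash) {
                sb.append(String.format("%02x", b)); // negative bytes are formatted as unsigned
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide MD5
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] numbers = {"07000000000", "07000000001", "07000000002"};
        // Prints [48, 55, 48, 48, 48, 48, 48, 48, 48, 48, 49], matching the char-vector output
        System.out.println(Arrays.toString(charCodes(numbers[1])));
        // Hashing the characters gives a different digest for each number
        for (String n : numbers) {
            System.out.println(n + " : " + md5Hex(charCodes(n)));
        }
    }
}
```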
Other things:
- You may as well put the 07 prefix into the sprintf format string. It'll be easier to read.
- You only need to get the MD5 instance once, before the loop.
- You don't need to call reset after calling digest. As documented, digest resets the MessageDigest once the hash has been computed.
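The last two points can be checked with a small plain-Java sketch (illustrative only; the class and method names are hypothetical). One MessageDigest instance hashes the same input twice without any reset() call and produces the same digest both times, because digest() resets the instance when it finishes. reset() only matters when you want to throw away bytes fed in with update() that were never digested.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestResetDemo {
    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required on every Java SE platform
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(2 * bytes.length);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Hashes the same text twice with one MessageDigest instance and no reset() call
    static String[] hashTwice(String text) {
        MessageDigest md = newMd5(); // obtained once
        byte[] input = text.getBytes(StandardCharsets.US_ASCII);
        String first = toHex(md.digest(input));  // digest() resets md when it finishes
        String second = toHex(md.digest(input)); // so this call starts from a clean state
        return new String[] {first, second};
    }

    // reset() is only needed to discard input passed to update() that was never digested
    static String hashAfterDiscardedUpdate(String discarded, String text) {
        MessageDigest md = newMd5();
        md.update(discarded.getBytes(StandardCharsets.US_ASCII));
        md.reset(); // throws away the pending, undigested bytes
        return toHex(md.digest(text.getBytes(StandardCharsets.US_ASCII)));
    }

    public static void main(String[] args) {
        String[] h = hashTwice("07000000000");
        System.out.println(h[0].equals(h[1])); // true
        System.out.println(hashAfterDiscardedUpdate("junk", "07000000000").equals(h[0])); // true
    }
}
```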
So, a rewrite of your code:
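md = java.security.MessageDigest.getInstance('MD5'); %get the MD5 instance once, outside the loop
for i = 0:1e9-1 %0 to 999999999, so every number has exactly 9 digits after the 07
    thisStr = sprintf('07%09d', i);
    hash = md.digest(uint8(thisStr)); %you can use double() but uint8() is the byte[] equivalent; digest also resets md
    %another way to convert the hash to a hex string without needing Java; the minimum of 2 digits per byte keeps leading zeros, which BigInteger.toString(16) drops
    strHash = lower(reshape(dec2hex(typecast(hash, 'uint8'), 2)', 1, []));
    fprintf("%s : %s\n", thisStr, strHash)
end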
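For readers who want to run the same demonstration outside MATLAB, the loop above can be sketched in plain Java. This is an illustrative sketch, not part of the original answer: the class and method names are hypothetical, and main only runs over a small demo range rather than all 10^9 numbers. It also shows why per-byte hex formatting (the counterpart of the dec2hex line) is safer than the question's BigInteger(1, hash).toString(16), which drops the leading zeros of any hash that starts with a 0 digit.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PhoneHashLoop {
    // "07" plus nine zero-padded digits, like MATLAB's sprintf('07%09d', i)
    static String phoneNumber(long i) {
        return String.format("07%09d", i);
    }

    // Per-byte hex formatting always yields two digits per byte (32 for MD5)
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(2 * bytes.length);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // The BigInteger route used in the question drops leading zero digits
    static String toHexViaBigInteger(byte[] bytes) {
        return new BigInteger(1, bytes).toString(16);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5"); // one instance for the whole loop
        long count = 5; // small demo range; the full range would be 0 .. 999999999
        for (long i = 0; i < count; i++) {
            String number = phoneNumber(i);
            byte[] hash = md.digest(number.getBytes(StandardCharsets.US_ASCII)); // also resets md
            System.out.println(number + " : " + toHex(hash));
        }
        byte[] leadingZero = {0x00, 0x0f};
        System.out.println(toHex(leadingZero));              // 000f
        System.out.println(toHexViaBigInteger(leadingZero)); // f
    }
}
```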

Categories

Characters and Strings

Release

R2019b
