Inconsistent matrix multiplication results

Question

0 votes

I noticed that some code I was running returned different results between runs. I seed all random number generators with a constant, so that shouldn't be the issue. I finally tracked down the problem (or at least one case of it) to a single matrix operation, of the form y1/(y'*y), where y1 is a scalar and y is a column vector. For two seemingly identical values of y, y'*y produced answers that differed in the last decimal place. If I reassigned y - i.e. d = y, d'*d, the answer was the same as y'*y. If I print out y with full precision (20 decimal places) and assign it to d, then d'*d was different from y'*y, even though y == d was all 1.

If I saved the workspace and loaded it, y'*y was different (not equal to the old y'*y), but y itself was still the same.

I am using r2011b, and this issue is somewhat annoying, because these discrepancies add up and I get inconsistent results between runs.

Has anyone seen this before, and do you have any advice? Could this be a bug in r2011b?

EDIT: y and d arrays are bitwise identical (using hex printout), but y'y and d'd differ in the last bit; saving & loading the workspace makes y'y identical to d'd, and this error appears to be nondeterministic. In successive runs, the error in the last bit may or may not occur, and appears somehow "tied" to the array y - i.e. if it occurs, successive calls to y'y produce the same (incorrect) answer, but loading the workspace with bitwise identical y produces the correct answer.

I suspect that Matlab keeps some internal flags that describe the order in which the sum in y'*y should be computed (resulting in different truncation error), and these flags are somehow associated with the array. Reloading the workspace must reset these flags. However, for whatever reason, between successive runs of my function, this is not consistent.

EDIT 2: I can confirm that this does not happen in R2010b or R2009b. So this appears to be a new bug introduced in R2011b.

1 Comment
Show -1 older comments Hide -1 older comments

Walter Roberson on 2 Feb 2012

http://matlab.wikia.com/wiki/FAQ#Why_is_0.3_-_0.2_-_0.1_.28or_similar.29_not_equal_to_zero.3F

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

John D'Errico on 2 Feb 2012

Open in MATLAB Online

1 vote

Sigh. No. This is NOT a bug in r2011b. This is NOT a bug. It is merely your not recognizing that this is floating point arithmetic. And the fact is, even if you print out 20 decimal digits of y, you have not encapsulated the true exact value of y!!!!!!!

For example, consider this number:

N = 1.23456789012346;

The decimal representation of that number was simple. Yet matlab converts it to a binary form, because all numbers are stored in that form internally. The value that matlab actually uses is:

hpf(N)
ans =
1.2345678901234560242983206990174949169158935546875

As you can see, they differ.

Welcome to the world of floating point arithmetic.

9 Comments
Show 7 older comments Hide 7 older comments

Sergey on 2 Feb 2012

I can, but I'm not sure how much good it will do. When I enter these numbers manually (using hex2num), I always get a consistent answer: 40b5c49c7eb4bc53

However, when I evaluate y'*y on the original array, I get this: 40b5c49c7eb4bc52

I haven't come up with a way to reproduce it w/o running my function a bunch of times until the 40b5c49c7eb4bc52 answer comes out. Every "fresh" array, either loaded from file or entered manually, evaluates to 40b5c49c7eb4bc53

Here is the array (32 entries). It's a column vector, but I'm writing it reshaped into 4 columns for compactness:

c037653607fcb0db c022cdfe633f86a0 4037ee6fd16139ac c00816d8c4484908

c03d44413e03fd2b c0056ceb88ac8cb4 40331543a66aa994 401e4e8c753dd51c

c042348ba90120ad c0129c9ed4bfc5de bfdb83c5dbfc86c0 bff73fe2232f7d20

c030c9a07936b006 bffdb48d2368a964 401cbe6b1e9ba90c c022ba5b9951b216

c02ce3baec176352 40094f6c684bb8b0 4019af06b90b961c 401326bf0214ee91

c02b472fc9322ecb 4031556d5c6567ec 401987b86cde6b7c 4027fc99d8a47144

c029e6a7146373ca 401a72a383ae74bd 401580bd22716d54 c0176ac4d605bd06

c017a4cb0ceac8ca bffed30eddab4a1c 3ffeec199f2030c8 4013df7eaeb7c8d4

James Tursa on 2 Feb 2012

I am unaware of any flag attached to variables internally that would behave this way, and I am pretty familiar with the internal representation of a MATLAB numeric variable. If the two arrays are indeed hex identical down to the last bit, then I would suspect that the order of operations is different between the two calculations. The MATLAB JIT accelerator can sometimes do this for two seemingly identical expressions ... e.g. one as part of an assignment and the other as part of an if test could get different optimizations applied to them. Or if one is part of a function (JIT processing gets applied) and the other is typed in from the command line (JIT processing is not done). Or perhaps something is changing the floating point mode in the background. Or maybe something is going on in the BLAS library with regards to reduction variables in a multi-threaded operation. Hard to say unless we can reproduce the behavior. You might be interested in my MTIMESX submission on the FEX to do your dot products ... it can sometimes be faster than the MATLAB calculation if you have a good compiler, and because of loop unrolling it is usually more accurate.

Sergey on 2 Feb 2012

all(y(:)==d(:)) is true, and the variables are identical to the last bit.

It's interesting that the expressions can get evaluated differently, but I don't think any of the examples you gave are happening here. It really looks like the result is somehow "tied" to the original array. I can break in with the debugger after I get the "bad" answer (40b5c49c7eb4bc52), and just enter y'*y in the console and get 40b5c49c7eb4bc52 every time, then I either enter the identical hex values to get d, or do d = y, save d to a .mat file and load it again, and then d'*d will evaluate to 40b5c49c7eb4bc53 every time from the console, while y'*y still evaluates to 40b5c49c7eb4bc52.

But what's causing this is definitely something that was introduced in either R2011a or R2011b, because R2010b does not have this issue.

James Tursa on 2 Feb 2012

Maybe it is not doing y'*y at all ... the optimization may recognize that y hasn't changed and it is just reporting the result of an earlier evaluation of this product (that did it in a different order). Try tic;y'*y;toc vs tic;d'*d;toc to see if there is a difference in timing that might yield a very small value for the first one. Also try y(1) = y(1) (forces optimization code to think that the variable has changed) and then do y'*y again.

Sign in to comment.

Answer 2

Jan on 2 Feb 2012

0 votes

It would be helpful, if you post a small piece of code, which reproduces the problem. Then we could investigate what's going on, e.g. by checking the mxArray headers or calling the BLAS functions directly from a Mex function.

If you cannot reproduce the problem with manually created data, one of the following problem can be the source of the different results:

An external program modifies the floating point environment, e.g. the rounding direction, perhaps temporarily.
The floating point unit of your processor is near to melting. This happened to me one time - but after some hours the processor died completely...

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Inconsistent matrix multiplication results

1 Comment
Show -1 older comments Hide -1 older comments

Answers (2)

9 Comments
Show 7 older comments Hide 7 older comments

0 Comments
Show -2 older comments Hide -2 older comments

Categories

Tags

Community Treasure Hunt

Inconsistent matrix multiplication results

1 Comment Show -1 older comments Hide -1 older comments

Answers (2)

9 Comments Show 7 older comments Hide 7 older comments

0 Comments Show -2 older comments Hide -2 older comments

Categories

Tags

See Also

Community Treasure Hunt

1 Comment
Show -1 older comments Hide -1 older comments

9 Comments
Show 7 older comments Hide 7 older comments

0 Comments
Show -2 older comments Hide -2 older comments