MATLAB Answers


TUTORIAL: Why Variables Should Not Be Named Dynamically (eval)

Asked by Stephen Cobeldick on 26 Sep 2016
Latest activity Commented on by Walter Roberson on 21 Aug 2019 at 22:47
Sometimes new coders (and some self-taught professors) think it would be a good idea to dynamically create or access variable names. The variables are often named something like this:
  • matrix1, matrix2, matrix3, matrix4, ...
  • test_20kmh, test_50kmh, test_80kmh, ...
  • nameA, nameB, nameC, nameD,...
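There are several good reasons why dynamic variable names should be avoided, each covered in one of the answers below:
  • Slow
  • Buggy
  • Security Risk
  • Difficult to Work With
  • Code Helper Tools do not Work
  • Obfuscated Code Intent
  • Confuses Data with Code
  • Magically Making Variables Appear in a Workspace is Risky
There are much better alternatives to accessing dynamic variable names, also covered in the answers below:
  • Indexing into Cell Array or ND-Array
  • load into a Structure, not into the Workspace
  • save the Fields of a Scalar Structure
  • Use a table or timetable Array
  • Use more Efficient Ways to Pass Variables between Workspaces (applies to evalin, assignin, etc)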
Usually some simple and efficient indexing is the key! It is important to realize that putting sequential numbers into variable names is pseudo indexing, which would be simpler and more efficient if it were turned into real indexing.
If you are not interested in reading the answers below then at least read MATLAB's own documentation on this topic, "Alternatives to the eval Function", which states: "A frequent use of the eval function is to create sets of variables such as A1, A2, ..., An, but this approach does not use the array processing power of MATLAB and is not recommended. The preferred method is to store related data in a single array." Note that all of these problems and disadvantages also apply to the functions load (without an output variable), assignin, evalin, and evalc, and that the MATLAB documentation explicitly recommends that you "Avoid functions such as eval, evalc, evalin, and feval(fname)".
The official MATLAB blogs explain why eval should be avoided, the better alternatives to eval, and clearly recommend against magically creating variables. Using eval comes out at position number one on this list of Top 10 MATLAB Code Practices That Make Me Cry. Experienced MATLAB users recommend avoiding using eval for trivial code, and have written extensively on this topic.
Note that avoiding eval (and assignin, etc.) is not some esoteric MATLAB restriction. The same advice applies to many other programming languages, as the answer "Other Languages: do not use eval!" below shows.

  6 Comments

Well I know little about the eval function and I will take it from you that it is something bad to use for this purpose. That said, being able to input the name of an existing file or object that you want to access or read in is not some wacko desire. This is pretty standard with programming languages. There may be reasons why MATLAB does not support this, but it is not a strange or clearly crazy thing to want to do.
"being able to input the name of an existing file" that's no problem with Matlab.
filespec = 'c:\tmp\the_name_of_my_file.csv';
str = fileread( filespec );
and
fid = fopen( filespec, 'r' );
...
"...being able to input the name of an existing file..."
is really easy and does not require magically accessing variable names.
"This is pretty standard with programming languages."
Indeed many languages support string evaluation... and yet interestingly experts and experienced users of most of those languages consider this to be an extremely inefficient and insecure practice that should be avoided. Perhaps you missed reading this thread, which gives links to some of those discussions:
"There may be reasons why MATLAB does not support this,.."
MATLAB does support this. Why do you think it doesn't? In fact the main function that is used for this is currently mentioned 75 times on this thread, something that readers of this thread are unlikely to miss.
"...but it is not a strange or clearly crazy thing to want to do"
I totally agree: I don't think that it is a strange or crazy thing to want to do. But it is important to note that wanting to do something is not at all synonymous with it being a good way to write code: there are certainly much simpler and more efficient methods of writing better code, which waste less time (coding time, debugging time, run time). The fact that magically accessing variables forces beginners to write slow, complex, buggy code, and is easily avoided by better, simpler code and/or data design, is not changed by how much beginners might want to do it.
We often get questions from beginners who want to calculate all permutations of a set of numbers, or want to use some other highly inefficient or numerically unstable algorithm: just because they want to do these things does not mean that their approach is going to be efficient, or produce any sensible output, or even be tractable at all. In these cases we would show them better ways of solving their problem (if possible), just like this thread does. The reality that some code is slower, buggier on average, harder to debug, and much less efficient is not changed by what you want.


19 Answers

Answer by Stephen Cobeldick on 26 Sep 2016
Edited by Stephen Cobeldick on 11 Dec 2017
 Accepted Answer

  2 Comments

The 2nd link, to the newsreader, is dead. (Long live the newsreader!)



Answer by Stephen Cobeldick on 26 Sep 2016
Edited by Stephen Cobeldick on 18 Apr 2019

Slow
The MATLAB documentation Alternatives to the eval Function explains that code that uses eval is slower because "MATLAB® compiles code the first time you run it to enhance performance for future runs. However, because code in an eval statement can change at run time, it is not compiled".
MATLAB uses JIT acceleration tools that analyze code as it is being executed and optimize it to run more efficiently. When eval is used the JIT optimizations are not effective, because every string has to be interpreted and run again on every single iteration! This makes loops with eval very slow. It is also why accessing variables with dynamic names is slow, not just creating them.
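A rough way to see the difference on your own machine is to time the same loop written both ways. This is only a minimal sketch: the names val1, val2, ..., val and N are made up for illustration, and the measured times depend on the MATLAB release and the computer, so no particular numbers are claimed here.

```matlab
N = 2000;   % number of values to store (arbitrary choice for this sketch)

% Version 1: numbered variables val1, val2, ... created with eval.
tic
for k = 1:N
    eval(sprintf('val%d = k^2;', k));
end
tEval = toc;

% Version 2: one preallocated array with real indexing.
tic
val = zeros(1, N);
for k = 1:N
    val(k) = k^2;
end
tIndex = toc;

fprintf('eval: %.4f s, indexing: %.4f s\n', tEval, tIndex);
```

Both versions store the same numbers, but only the second one leaves them in a form that can be summed, plotted, or passed to a function in a single step.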
Even the eval hidden inside of str2num can slow down code:
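As a small illustration (the text values are invented for the example), str2double converts text to a number without evaluating it, so it can often stand in for str2num when the text is expected to hold a single number:

```matlab
txt = '3.25';                        % numeric text, e.g. read from a file

a = str2num(txt);                    % evaluates the text via eval internally
b = str2double(txt);                 % only parses the text as a number

% Text that is not a number is simply rejected by str2double:
c = str2double('disp(''hello'')');   % returns NaN, nothing is executed
```

Checking isnan(c) afterwards is an easy way to detect input that was not a valid number.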



Answer by Stephen Cobeldick on 26 Sep 2016
Edited by Stephen Cobeldick on 6 Aug 2018

Buggy
Using eval makes it really hard to track down bugs, because it obfuscates the code and disables lots of code helper tools. Why would beginners even want to use a tool that makes it harder for them to fix their code?
Here are some examples to illustrate how what should have been simple operations become very difficult to debug because of the choice to use eval:
This quote sums up debugging eval based code: "I've never even attempted to use it myself, but it seems it would create unreadable, undebuggable code. If you can't read it and can't fix it what good is it?" Note that eval's equally evil siblings evalc, evalin and assignin also make code slow and buggy:
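A minimal sketch of how this plays out (the variable names and the typo are invented for the example): a misspelled name inside an eval string is invisible to the editor and only surfaces as a run-time error, whereas the same line written directly can be checked before it ever runs.

```matlab
vals = [1, 2, 3];

% The misspelled name "valz" sits inside a string, so static code checking
% cannot see it. The mistake only shows up when the line actually runs.
try
    eval('total = sum(valz);');
catch err
    fprintf('Run-time error: %s\n', err.message);
end

% The same calculation written directly can be checked by the editor,
% and the variable name can be found with an ordinary text search.
total = sum(vals);
```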



Answer by Stephen Cobeldick on 26 Sep 2016
Edited by Stephen Cobeldick on 28 Feb 2018

Security Risk
eval will evaluate any string at all, no matter what commands it contains. Does that sound secure to you? This string command might be malicious or simply a mistake, but it can do anything at all to your computer. Would you run code which could do anything at all to your computer, without knowing what it was about to do?
For some beginners the surprising answer is "yes please!".
Beginners will happily run code that could do anything at all with their computer. For example, try running this (taken from Jos' answer here):
eval(char('fkur*)Ykvj"GXCN"{qw"pgxgt"mpqy"yjcv"jcrrgpu0"Kv"eqwnf"jcxg"hqtocvvgf"{qwt"jctfftkxg"000)+'-2))
Did you really run it on your computer even though you had no idea what it would do? Every time beginners write code that gets a user input and evaluates it they give that user the ability to run anything at all. Does that sound secure to you?
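One possible way to accept a user's choice without evaluating their text is to look it up in a fixed list of permitted operations. This is only a sketch under assumed names: the structure allowed, the variable request, and the error identifier are all hypothetical.

```matlab
% Hypothetical menu of operations the user is allowed to request.
allowed = struct('mean', @mean, 'median', @median, 'max', @max);

request = 'median';          % stands in for text typed by a user
data    = [3, 1, 2];

if isfield(allowed, request)
    fun    = allowed.(request);   % look up the function handle by name
    result = fun(data);           % call it; the user's text is never executed
else
    error('demo:notAllowed', 'Operation "%s" is not permitted.', request);
end
```

Any text that is not one of the listed field names is rejected, whatever commands it contains.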



Answer by Stephen Cobeldick on 26 Sep 2016
Edited by Stephen Cobeldick on 4 May 2019

Difficult to Work With
Many beginners come here with questions that are basically "I have lots of numbered variables but I cannot figure out how to do this simple operation...", or "my code is very slow/complex/...":
Even advocates of eval get confused by it, fail to make it work properly, and can't even figure out why, as these two examples clearly show:
Why can't they figure out why it does not work?
  • Totally obfuscated code. Indirect code evaluation.
  • The code helper tools do not work.
  • Syntax highlighting does not work.
  • Static code checking does not work.
  • No useful error messages, etc.



Answer by Stephen Cobeldick on 26 Sep 2016
Edited by Stephen Cobeldick on 6 Jan 2019

Code Helper Tools do not Work
The MATLAB editor contains many tools that advanced users continuously make use of, and beginners should particularly appreciate when learning to use MATLAB (tip: learn to use them!). However none of these tools work with code that uses eval.
Note that these do not work when using eval, evalc, etc. to magically create or access variable names. Would you want to disable the tools that help you to write functioning code? Here are examples of how eval hides code errors and makes it hard to debug code:

  1 Comment

On this topic, it would be great if the Code Analyzer and checkcode would actually flag a warning when eval etc. are used. Perhaps that would cut down the number of questions about them on here?



Answer by Stephen Cobeldick on 26 Sep 2016
Edited by Stephen Cobeldick on 4 Feb 2018

Obfuscated Code Intent
What does this code do?
x1 = [119,101,98,40,39,104,116,116,112,58,47,47,119,119,119];
x2 = [46,121,111,117,116,117,98,101,46,99,111,109,47,119,97];
x3 = [116,99,104,63,118,61,100,81,119,52,119,57,87,103,88];
x4 = [99,81,39,44,39,45,98,114,111,119,115,101,114,39,41];
eval(char([x1,x2,x3,x4]))
Unfortunately eval makes it easy to write code which is hard to understand: it is not clear what it does, or why. If you ran that code without knowing what it does, you should know that it could have deleted all of your data, or sent emails to all of your contacts, or downloaded anything at all from the internet, or worse...
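If you ever need to find out what such a string contains, a safer approach is to display it as text instead of evaluating it. A minimal sketch using the same four vectors (the variable name cmd is an addition for this example):

```matlab
x1 = [119,101,98,40,39,104,116,116,112,58,47,47,119,119,119];
x2 = [46,121,111,117,116,117,98,101,46,99,111,109,47,119,97];
x3 = [116,99,104,63,118,61,100,81,119,52,119,57,87,103,88];
x4 = [99,81,39,44,39,45,98,114,111,119,115,101,114,39,41];

cmd = char([x1,x2,x3,x4]);   % convert the character codes to text
disp(cmd)                    % shows the command as text; nothing is executed
```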
Because eval easily hides the intent of the code many beginners end up writing code that is very hard to follow and understand. This makes the code buggy, and also harder to debug! See these examples:
Properly written code is clear and understandable. Clear and understandable code is easier to write, to bug-fix, and to maintain. Code is read more times than it is written, so never underestimate the importance of writing code that is clear and understandable: write code comments, write a help section, use consistent formatting and indentation, etc.

  3 Comments

I remember the newsreader thread you got this from!
You're never going to give up on this example, are you?
I just encountered someone using str2num() on a computed variable name, in order to have the effect of an eval() without using eval() directly in the code. This is, needless to say, obscure intent.



Answer by Stephen Cobeldick on 26 Sep 2016
Edited by Stephen Cobeldick on 8 Jul 2018

Alternative: Indexing into Cell Array or ND-Array
Oftentimes when a user wants to use eval they are trying to create numbered variables, which are effectively an index joined onto a name. So why not simply turn that pseudo-index into a real index? MATLAB is very fast and efficient when working with indices, and using indices will make code much simpler than anything involving dynamic variable names:
Using ND-arrays is a particularly efficient way of handling data: many operations can be performed on complete arrays (known as code vectorization), it is easy to get data in and out of ND-arrays, and they reduce the chance of bugs:
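A minimal sketch of the ND-array approach, assuming every data set has the same size (the sizes and the placeholder data are invented for the example): instead of matrix1, matrix2, matrix3, each matrix becomes one page of a 3D array.

```matlab
nMat = 3;                              % number of matrices (arbitrary)
data = zeros(4, 2, nMat);              % preallocate: one 4x2 page per matrix

for k = 1:nMat
    data(:,:,k) = k * ones(4, 2);      % placeholder for the real data
end

second      = data(:,:,2);             % real indexing replaces "matrix2"
pageMeans   = squeeze(mean(mean(data, 1), 2)).';   % one mean per page, no loop
overallMean = mean(data(:));           % one operation over all of the data
```

Adding a fourth matrix only means changing nMat; none of the later code needs to know how many pages there are.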
Or simply put the data into the cells of a cell array:
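A cell array is the usual choice when the pieces differ in size or type. A minimal sketch, with made-up speeds and random placeholder data standing in for test_20kmh, test_50kmh, test_80kmh:

```matlab
speeds  = [20, 50, 80];                % km/h, kept as data rather than in names
results = cell(1, numel(speeds));      % one cell per test

for k = 1:numel(speeds)
    results{k} = rand(k + 1, 2);       % placeholder data; sizes may differ
end

% Process every test in one line, without knowing any variable names.
nRows = cellfun(@(m) size(m, 1), results);
```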
And some real-world examples of where indexing is much simpler than eval:



Answer by Stephen Cobeldick on 26 Sep 2016
Edited by Stephen Cobeldick on 17 Apr 2019

Alternative: load into a Structure, not into the Workspace
In almost all cases where data is imported programmatically (i.e. not just playing around in the command window) it is advisable to load data into an output argument (which is a structure if the file is a .mat file):
S = load(...);
The fields of the structure can be accessed directly, e.g.:
S.X
S.Y
or by using dynamic fieldnames. Note that this is the inverse of saving the fields of a scalar structure.
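A minimal sketch of both access styles. The file name and the variables X and Y are invented for the example, and the file is first written to a temporary location so that the snippet is self-contained.

```matlab
% Create a small example file (hypothetical contents).
fname = [tempname, '.mat'];
X = 1:3;
Y = 'hello';
save(fname, 'X', 'Y');
clear X Y

S = load(fname);            % everything arrives as fields of S

direct = S.X;               % direct access by field name

names = fieldnames(S);      % or loop over whatever the file contains
for k = 1:numel(names)
    value = S.(names{k});   % dynamic fieldname
    fprintf('%s has class %s\n', names{k}, class(value));
end

delete(fname);              % tidy up the temporary file
```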
It is important to note that (contrary to what many beginners seem to think) it is actually much easier within a loop to save and load data when the variable names in the .mat files do not change, as having to process different variable names in each file actually makes saving/loading the files much harder.
Summary: when using a loop, keep all variable names the same!
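As a sketch of that advice (the folder, the file names run1.mat etc., and the variable name data are all assumptions for the example), every file stores its contents under the same name, so the loading loop never has to guess:

```matlab
folder = tempname;                     % temporary folder for the example files
mkdir(folder);

% Save three files that all contain a variable called "data".
for k = 1:3
    data = k * [1, 2, 3];
    save(fullfile(folder, sprintf('run%d.mat', k)), 'data');
end

% Load them again: the same field name works for every file.
allData = zeros(3, 3);
for k = 1:3
    S = load(fullfile(folder, sprintf('run%d.mat', k)));
    allData(k, :) = S.data;
end

rmdir(folder, 's');                    % tidy up
```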
Here are real-world examples of loading into variables:
And finally Steven Lord's comment on load-ing straight into the workspace:



Answer by Stephen Cobeldick on 26 Sep 2016
Edited by Stephen Cobeldick on 4 Mar 2019

Confuses Data with Code
The inclusion of data and meta-data within variable names (e.g. naming a variable with the user's input, the name of a test subject, or (very commonly) adding an index onto a variable name) is a subtle (but closely related) problem, and it should definitely be avoided.
The problem with including meta-data in variable names is that this breaks the idea of code being generalized, because it mixes the code and data together. In fact data and code should be kept separate to keep the code generalized. Code that is written to be as general as possible is simpler, more robust, more adaptable, easier to write, and easier to fix, which in turn makes code much less buggy. Mixing meta-data into variable names really just makes everything much more complicated, and this in turn makes code slow and buggy.
Read these discussions for an explanation of why it is a poor practice to put data and meta-data in variable names:
In many cases that meta-data is just a de-facto index, i.e. a value that prescribes the order of the data. But in that case the de-facto index should be turned into a much more efficient real numeric index:
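One possible way to keep such meta-data as data is a structure array, sketched here with made-up speeds and measurements in place of test_20kmh, test_50kmh, test_80kmh (the field names are assumptions):

```matlab
% The speed is data, so it is stored as data rather than in a variable name.
speeds = [20, 50, 80];                                       % km/h
tests  = struct('speed_kmh', num2cell(speeds), ...
                'samples',   {[1 2 3], [4 5 6], [7 8 9]});   % made-up measurements

% Select by value instead of by variable name.
isFast = [tests.speed_kmh] > 30;
fast   = tests(isFast);

% Loop over everything without knowing any names in advance.
for k = 1:numel(tests)
    fprintf('%d km/h: mean = %g\n', tests(k).speed_kmh, mean(tests(k).samples));
end
```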



Answer by Stephen Cobeldick on 26 Sep 2016
Edited by Stephen Cobeldick on 31 Jan 2017




Answer by Stephen Cobeldick on 26 Sep 2016
Edited by Stephen Cobeldick on 18 Dec 2017

Other Languages: do not use eval!
In case you think that avoiding dynamic variable names is just some "weird MATLAB thing", here is the same discussion for some other programming languages, all advising "DO NOT create dynamic variable names":
Some languages might use, require, or otherwise encourage dynamic variable names. If that is how they work efficiently, then so be it. But what is efficient in one language means nothing about the same method in other languages... if you wish to use MATLAB efficiently, make your code easier to work with, and write in a way that other MATLAB users will appreciate, then you should learn how to use MATLAB features and tools:



Answer by Stephen Cobeldick on 26 Sep 2016
Edited by Stephen Cobeldick on 31 Jan 2017

Alternative: Use more Efficient Ways to Pass Variables between Workspaces (applies to evalin, assignin, etc)
Use nested functions, or pass arguments, or use any of the other efficient ways to pass data between workspaces:
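A minimal sketch of two of those options in one file. The function names passing_demo and addScaled are hypothetical, and the code needs to be saved as passing_demo.m because nested functions must live in a function file.

```matlab
function total = passing_demo()
% PASSING_DEMO  Hypothetical example: share data via arguments and a nested function.
    scale = 2;                         % visible inside the nested function
    total = 0;
    for k = 1:3
        total = addScaled(total, k);   % data goes in and out as arguments
    end

    function out = addScaled(acc, x)
        % Nested function: reads "scale" from the parent workspace directly,
        % so no assignin or evalin is needed.
        out = acc + scale * x;
    end
end
```

Calling passing_demo() adds 2*1, 2*2 and 2*3 to the running total, and every value that moves between workspaces is visible in the code.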



Answer by Stephen Cobeldick on 19 Jul 2017
Edited by Stephen Cobeldick on 6 Jan 2019

Magically Making Variables Appear in a Workspace is Risky
For a start those variables will be overwritten without warning, leading to hard-to-find bugs. But there is also a more serious yet subtle problem, which is caused by the MATLAB parser finding alternative functions/objects/... and calling those instead of using the magically-created variable: basically if the variable does not exist then the parser does its best to find something that matches where the name is called/used later... and it might just find something! For example:
The solution is simple: do not magically "poof" variables into existence: Always load into a structure, and never create variable names dynamically.

  6 Comments

Thanks to both of you for the very prompt responses! :)
I did not know that Matlab was not purely interpreted; knowing this clarifies the issue with dynamic names. Truth be told, I never use dynamic names in my code (because of some of the reasons listed in this thread), and would never recommend that anyone did. But I find edge-cases interesting.
Regarding potential impacts on performance, it seems to me that dynamic names can only ever occur due to an `eval`, `evalin` or `load` operation, is that right? I think it is fair to say that none of these functions should be used in a portion of code that is performance-critical. Though I am not one to give advice about language design, would it be possible to "re-parse" the code following such operations as they occur, in order to update the name resolution? This would badly affect performance when they are used, but would leave other situations unaffected, wouldn't it?
assignin and evalc also belong to that list. I can't think of any others right now.
The reparsing that you describe is most likely what was happening in past versions. But at least for load, it's clear that MathWorks is moving away from that design and is not interested in supporting implicitly defined variables. I.e.
function myfunc()
    load data.mat % creates a variable x
    disp(x(:));
end
no longer works (or will no longer work). See R2017a release notes.
I'm fine with that. If the optimiser has to detect whether it can optimise now or must leave the code to be reparsed later, that detection will itself have an impact on performance in both cases.
There is currently some name re-resolution being done whenever the MATLAB path changes, including when you cd() -- which is a reason to avoid cd() in code.
In the past, I put some thought into the kinds of structures you would have to put in place in order to handle that situation efficiently. I did not follow it through, though; just some thought experiments.



Answer by Stephen Cobeldick on 30 Nov 2017
Edited by Stephen Cobeldick on 26 Oct 2018

PS: eval is Not Faulty:
Some beginners apparently think that eval (and friends) must be faulty and should be removed from MATLAB altogether. They ask "if eval is so broken, why has it not been removed?"... but it is important to understand that the problem is caused by magically accessing variable names, regardless of what tool or operation is used, and that eval (or assignin, or evalin, or load without an output argument, etc.) is simply being used inappropriately because there are much better methods available (better in the sense of being faster, neater, simpler, less buggy, etc.). Read these discussions for good examples of this confusion:
It is important to note that any feature of a language can be used inefficiently or in an inappropriate way, not just eval, and this is not something that can be controlled by the language itself. For example, it is common that someone might solve something with slow loops and without preallocating the output arrays: this does not mean that for loops are "faulty" and need to be removed from MATLAB!
It is up to the programmer to write efficient code.



Answer by Econstudent on 17 Jan 2017

You discuss at length why we shouldn't do A, B or C, and you also comment on how we could access certain objects.
Now, suppose we need to import a few time series -- but I can only import those series one at a time. The intention behind creating a sequence of variables inside a loop is often to store those time series in a distinct object every time. That is, you want to assign the data to a different object every time and do it considerably more than once...
What other choice do you have besides creating objects within your loop?

  2 Comments

"What other choice do you have besides creating objects within your loop?"
All of the choices that are explained above: cell arrays, structures, ND numeric arrays. And with newer MATLAB versions also tables, strings, datetime, etc. All of these allow you to "import those series one at a time", and use indexing (or fieldnames, etc) to put that data into one variable/object. Simple.
"The intention ... is often to store those time series in distinct object every time"
And that is the bad design decision right there.
@Econstudent: Did you read this thread carefully? If your "sequence of variables" means that the variables form a sequence, then they are related. This relation should be reflected in the code by collecting them in one array. An efficient and logical representation of the data is essential for good code, but it remains the art of programming.
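For the time-series case in the question, a minimal sketch could look like this. The series names are invented, and cumsum(randn(...)) merely stands in for whatever import call is actually used.

```matlab
names  = {'gdp', 'cpi', 'unemployment'};        % hypothetical series names
series = struct('name', names, 'values', []);   % 1x3 structure array

for k = 1:numel(names)
    % Import one series at a time and store it in element k.
    series(k).values = cumsum(randn(10, 1));    % placeholder for the real import
end

% Later code can use every series without knowing any names.
lengths = arrayfun(@(s) numel(s.values), series);
```

Each series is still imported one at a time, but the results end up in one variable that can be indexed, looped over, or passed to a function.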



Answer by Stephen Cobeldick on 17 Apr 2019
Edited by Stephen Cobeldick on 17 Apr 2019

Alternative: save the Fields of a Scalar Structure
The save command has an option for saving the fields of a scalar structure as separate variables in a .mat file. For example, given a scalar structure:
S.A = 1;
S.B = [2,3];
this will save variables A and B in the .mat file:
save('myfile.mat','-struct','S')
This is the inverse operation of loading into a structure. Some threads showing how this can be used:
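A minimal round-trip sketch (the temporary file name and the variable T are assumptions for the example) shows that saving with '-struct' and loading with an output argument undo each other:

```matlab
S.A = 1;
S.B = [2, 3];

fname = [tempname, '.mat'];      % temporary file for the example
save(fname, '-struct', 'S');     % the file now contains variables A and B

T = load(fname);                 % load them back as fields of T
sameData = isequal(S, T);        % true when the round trip preserved the data

delete(fname);                   % tidy up
```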



Answer by Steven Lord on 30 Apr 2019
Edited by Steven Lord on 30 Apr 2019

Alternative: Use a table or timetable Array
table (introduced in release R2013b) and timetable (introduced in release R2016b) arrays allow you to store data with row and/or column names with which you can access the data. For example, if you create a table with variables named Age, Gender, Height, Weight, and Smoker and rows named with the last names of the patients:
load patients
patients = table(Age,Gender,Height,Weight,Smoker,...
    'RowNames',LastName);
you can ask for all the ages of the first five patients:
patients(1:5, 'Age')
or all the data for the patients with last names Smith or Jones:
patients({'Smith', 'Jones'}, :)
You can also add new variables to the table, either by hard-coding the name of the variable:
% Indicate if patients are greater than five and a half feet tall
patients.veryTall = patients.Height > 66
or using variable names stored in char or string variables. The code sample below creates new variables named over40 and under35 in the patients table using different indexing techniques.
newname1 = 'over40';
patients.(newname1) = patients.Age > 40;
newname2 = 'under35';
patients{:, newname2} = patients.Age < 35;
patients(1:10, :) % Show the first ten rows
The code sample below selects either Height or Weight and shows the selected variable for the fifth through tenth patients using dynamic names.
if rand > 0.5
    selectedVariable = 'Height';
else
    selectedVariable = 'Weight';
end
patients.(selectedVariable)(5:10)
See this documentation page for more information about techniques you can use to access and manipulate data in a table or timetable array. This documentation page contains information about accessing data in a timetable using the time information associated with the rows.

  1 Comment

Simpler way to generate a table from that .mat file:
S = load('patients.mat');
T = struct2table(S,'RowNames',S.LastName);
