Get all used variable names from a script

Robert on 7 May 2021
Commented: Jan on 8 May 2021
Similar to the check "Check usage of restricted variable names", I want to check the names of the variables used in a script, but against our own, more explicit naming conventions. However, symvar also returns keywords like "function", "if" or "end" and, what is much worse, any word found in comments and even in "-delimited strings. Is there any function that can return me all variable names used in a script file or string, but nothing else?
Or, to be a bit more precise, since Stephen Cobeldick correctly pointed out the dynamic execution nature of scripting languages: I mean variable names that are explicitly used in a function header as input or output variables (not varargin, varargout), and variable names explicitly used on the left-hand side of assignments like a = <some expression> or [a, b] = <expression>. That would certainly be sufficient, as the execution context here is eml, so apart from local variables the data flow is pretty much under control, with signal I/O and data store memory requiring registration as Stateflow.Data objects.
  1 Comment
Stephen23 on 7 May 2021
Edited: Stephen23 on 7 May 2021
"Is there any function that can return me all variable names used in a script file or string, but nothing else?"
No.
Variables can be created dynamically, even by functions called from your script/function (or by functions that they call...). Function scope can also change dynamically, so which functions get called can change too, as can whether a given name refers to a function or a variable. Only actually running the code can resolve this: static code analysis is not sufficient.
It might be possible to provide an "estimate" based on static code analysis, but only on the understanding that it can diverge from the variables that are actually "used" when the code is run.
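To illustrate the point about dynamic creation, here is a minimal sketch. The function and variable names (dynamicNamesDemo, injectVariable, hiddenA, hB, hiddenC) are made up for this example. None of the three variables appears as a plain left-hand side in the source text, yet all of them exist at run time:

```matlab
function names = dynamicNamesDemo()
% Hypothetical demo: creates variables that a text search for "name = ..."
% would not find, then lists what really exists in this workspace.
    eval('hiddenA = 1;');              % name appears only inside a char vector
    eval([char('a' + 7), 'B = 2;']);   % name 'hB' is assembled at run time
    injectVariable();                  % a callee creates 'hiddenC' here
    s = whos;                          % queried before s and names exist
    names = sort({s.name});
end

function injectVariable()
% Creates a variable in the workspace of the calling function.
    assignin('caller', 'hiddenC', 3);
end
```

Calling dynamicNamesDemo() returns the names hB, hiddenA and hiddenC, which is the kind of result only a run-time query such as whos can deliver.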


Answers (1)

Jan on 7 May 2021
Edited: Jan on 7 May 2021
It is hard to parse the code exhaustively for names of variables:
  • Mask strings and char vectors. This is not trivial:
'"asd"', '"asd', "'asd'", "'asd", "asd"', 'asd''', ...
  • Recognize and remove comments. This includes block comments between %{ and %} as well as anything that follows a "..." line continuation on the same line.
  • Distinguish the creation of indexed variables from function calls:
f(1);
f(1) = 0;
v = f(1);
v = f ...
(1);
  • Cope with eval, evalin and assignin.
  • If you are talking about scripts instead of functions, it is hard to tell whether sum(1:5) means the built-in function or whether another script has redefined sum as a variable before.
Maybe the best approach is to run the code and update a list of variables after each line of code:
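For comparison, the static "estimate" for the narrower case (names in a function header plus names on the left-hand side of assignments) could be sketched roughly as below. This is only an illustration under loud assumptions: the function name estimateAssignedNames is made up, there is at most one statement per line, and the code contains no block comments, no "..." continuations, no transpose operator and no eval-like calls. The string masking is deliberately naive.

```matlab
function names = estimateAssignedNames(code)
% Hypothetical sketch: rough static estimate of explicitly assigned names.
% ASSUMES: one statement per line, no %{ %} blocks, no "..." continuation,
% no transpose operator ('), no eval/evalin/assignin.
    lines = regexp(code, '\r?\n', 'split');
    names = {};
    for k = 1:numel(lines)
        ln = regexprep(lines{k}, '''[^'']*''|"[^"]*"', '');  % naive masking
        ln = regexprep(ln, '%.*$', '');                      % drop comment
        % Function header: collect the inputs, then treat the rest like an
        % ordinary assignment to pick up the outputs.
        hdr = regexp(ln, '^\s*function\s+(.*)$', 'tokens', 'once');
        if ~isempty(hdr)
            ln = hdr{1};
            args = regexp(ln, '\(([^)]*)\)\s*$', 'tokens', 'once');
            if ~isempty(args)
                names = [names, regexp(args{1}, '[A-Za-z]\w*', 'match')]; %#ok<AGROW>
            end
        end
        % a = ..., a(1) = ..., a.b = ..., for a = ...
        tok = regexp(ln, ['^\s*(?:(?:for|parfor)\s+)?([A-Za-z]\w*)\s*', ...
            '(?:\(.*?\)|\.\w+)*\s*=(?!=)'], 'tokens', 'once');
        if ~isempty(tok)
            names = [names, tok]; %#ok<AGROW>
        end
        % [a, b] = ... (plain names only, "~" is skipped)
        lst = regexp(ln, '^\s*\[([^\]]*)\]\s*=(?!=)', 'tokens', 'once');
        if ~isempty(lst)
            names = [names, regexp(lst{1}, '[A-Za-z]\w*', 'match')]; %#ok<AGROW>
        end
    end
    % Sorted, unique, without the generic argument containers:
    names = setdiff(names, {'varargin', 'varargout'});
end
```

For example, estimateAssignedNames(fileread('YourFunc.m')) gives a first candidate list, where YourFunc.m is a placeholder. It still cannot decide whether f(1) = 0 creates a variable or indexes an existing one, and it misses everything listed in the bullets above.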
function Out = TrackVariables(mFile, Data)
% USAGE:
% If you really want hardcore debugging:
% 1. TrackVariables('D:\MatlabCodes\yourFcn.m')
%    This injects a DBSTOP in each line of the code, which calls the
%    function TrackVariables with the output of WHOS as 2nd input.
%    You can do this for multiple functions at the same time.
% 2. Call yourFcn() or the main routine.
%    After each line the output of WHOS is forwarded to TrackVariables and
%    the names are stored persistently. If you want, you can expand this
%    to store the sizes or types of the variables also.
% 3. Request the collected data by:
%      List = TrackVariables();
% 4. Clean up brutally:
%      dbclear all
%
% This is NOT a recommendation for using this function to control the
% quality of code, but a brute hack only. If you can identify a
% misspelled variable, it was useful.
% Advantage: It tracks even the evil dynamic creation of variables.
% Limitations: The code execution is slowed down. It tracks only branches
% of the code which actually run, so this might remain invisible:
%   if rand < 0.001; KILLER = 17; end
%
% Use MLINT for a smart code analysis.
%
% (C) 2021, Jan, Heidelberg, License: CC BY-SA 3.0
persistent List
if isempty(List)
    List = struct();
end
switch nargin
    case 1  % Inject a conditional dbstop in each non-empty line:
        [~, mName] = fileparts(mFile);
        Cmd = sprintf('TrackVariables(''%s'', whos)', mName);
        % Keep empty lines, otherwise k does not match the line number:
        Str = strsplit(fileread(mFile), '\n', 'CollapseDelimiters', false);
        for k = 1:numel(Str)
            if ~isempty(strtrim(Str{k}))
                dbstop('in', mName, 'at', sprintf('%d', k), 'if', Cmd)
            end
        end
        List.(mName) = {};
    case 2  % Called by the breakpoint condition to collect the variables:
        List.(mFile) = unique(cat(2, List.(mFile), {Data.name}));
        Out = false;  % Do not stop the debugger
    case 0  % Return the list and flush it:
        Out = List;
        List = [];
end
end
Call this as:
TrackVariables('YourFunc.m');
YourFunc % Or the main program
List = TrackVariables;
This does not account for variables created in subfunctions or nested functions.
I do not trust such meta-programming techniques. Exhaustive unit testing is more powerful. Above all, avoid scripts if you need reliable code.
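Once a list of names has been collected, checking it against a naming convention is the easy part. The following is a minimal sketch only: the function name checkNames, the regular expression and the list of restricted names are assumptions made for illustration, not the convention meant in the question.

```matlab
function bad = checkNames(names, pattern, reserved)
% Hypothetical helper: returns the names that violate a convention.
%   names    - cell array of char vectors, e.g. one field of the List
%              struct returned by TrackVariables
%   pattern  - regular expression that a valid name must match completely
%   reserved - cell array of names that must not be used as variables
    matched    = regexp(names, pattern, 'match', 'once');
    isValid    = ~cellfun(@isempty, matched);
    isReserved = ismember(names, reserved) | cellfun(@iskeyword, names);
    bad        = names(~isValid | isReserved);
end
```

With the output of TrackVariables this could be called as bad = checkNames(List.YourFunc, '^[a-z][A-Za-z0-9]*$', {'i', 'j', 'sum'}), where the pattern and the reserved names are placeholders for your own rules.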
  6 Comments
Robert on 7 May 2021
Edited: Robert on 7 May 2021
Hi Jan, what code do you mean? The C-mex code of my parser to come? I think I'd publish that. But generally my target is to identify any explicit variable name, as described in my reply to Stephen Cobeldick's comment above. The object might be any code liable to be typed into an eml function block. My parsing mex should 'mask' (or rather eliminate) all occurrences of comments and strings. Anyway, if there is no API function for identifying explicit variable names as described before, I'll stick to my own implementation and will let you know when I get to the point of publishing (if you're interested).
Jan on 8 May 2021
I meant a parser that I have written as an M-function.

Release

R2018b