Using regexprep to clean up MATLAB code formatting

DGM on 31 Jan 2022
Commented: DGM on 31 Jan 2022
I was trying to put together something to fix operator spacing in a bunch of old .m files. I've reduced the example to simply adding spaces around instances of = and ==. I want to ignore matches within quotes, but I realized that transpose operators on the same line break any sort of lookahead/lookbehind quote-counting approach I can think of.
Is there even a good way to deal with this using regex? Is there some sort of code formatting tool that I can use to accomplish this instead?
intext = sprintf(['don''t pad \n█ = █\n█ = █\n %% █=█\n''█=█''\n''█=█'' A.''\nA.'' ''█=█''\n' ...
'add pad\n█=█ A.''\nA.'' █=█\n█=█\n█==█']);
% only operate on uncommented lines
alltext = split(intext,newline);
ncom = cellfun(@isempty,(regexp(alltext,'^\s*%.*','match')));
niq = '(?=([^'']*''[^'']*'')*[^'']*$)'; % not in single quotes
alltext(ncom) = regexprep(alltext(ncom),['(?<=[^~=<>\s])=' niq],' ='); % pad before = or ==
alltext(ncom) = regexprep(alltext(ncom),['=(?=[^=\s])' niq],'= '); % pad after = or ==
[split(intext,newline) alltext]
ans = 12×2 cell array
    {'don't pad '  }    {'don't pad '    }
    {'█ = █'       }    {'█ = █'         }
    {'█→=→█'       }    {'█→=→█'         }
    {' % █=█'      }    {' % █=█'        }
    {''█=█''       }    {''█=█''         }
    {''█=█' A.''   }    {''█ = █' A.''   }
    {'A.' '█=█''   }    {'A.' '█=█''     }
    {'add pad'     }    {'add pad'       }
    {'█=█ A.''     }    {'█=█ A.''       }
    {'A.' █=█'     }    {'A.' █ = █'     }
    {'█=█'         }    {'█ = █'         }
    {'█==█'        }    {'█ == █'        }
I'm pretty much an absolute novice with regex, and this tool is likely only going to be used once, so I'm avoiding making the regex more complicated than I can understand well enough to have confidence in it. To that end, I'm simply using masking to ignore commented lines.
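One way to sidestep the quote-counting problem might be to scan each line once, character by character, and decide from the preceding character whether a ' is a transpose or opens a char vector. The sketch below is only an illustration, not a complete MATLAB parser. The function names padequals and quotedmask are made up for this example. It assumes a ' is a transpose when it directly follows an identifier character, a closing bracket, a period, or another '. It also ignores double-quoted strings, block comments, and transposes preceded by whitespace.

```matlab
function out = padequals(str)
% PADEQUALS  Hypothetical helper: pad = and == with single spaces, but only
% where they sit outside char literals and trailing comments.
mask = quotedmask(str);
% Match = or ==, but not the = that belongs to ~=, <= or >=
[s, e] = regexp(str, '(?<![~<>=])==?(?!=)');
out = str;
for k = numel(s):-1:1              % work backwards so earlier indices stay valid
    if mask(s(k)), continue; end   % operator is quoted or commented: leave it
    if e(k) < numel(str) && ~isspace(str(e(k)+1))
        out = [out(1:e(k)) ' ' out(e(k)+1:end)];   % add space after operator
    end
    if s(k) > 1 && ~isspace(str(s(k)-1))
        out = [out(1:s(k)-1) ' ' out(s(k):end)];   % add space before operator
    end
end
end

function mask = quotedmask(str)
% QUOTEDMASK  True for characters inside '...' literals or after a comment %.
mask = false(size(str));
inquote = false;
k = 1;
while k <= numel(str)
    c = str(k);
    if inquote
        mask(k) = true;
        if c == ''''
            if k < numel(str) && str(k+1) == ''''
                mask(k+1) = true;  % doubled quote is an escaped quote
                k = k + 1;
            else
                inquote = false;   % closing quote
            end
        end
    elseif c == ''''
        prev = ' ';
        if k > 1, prev = str(k-1); end
        % Assumption: after these characters a ' is a transpose, not a quote
        istranspose = ~isempty(regexp(prev, '[\w\)\]\}\.'']', 'once'));
        if ~istranspose
            inquote = true;
            mask(k) = true;
        end
    elseif c == '%'
        mask(k:end) = true;        % rest of the line is a comment
        break
    end
    k = k + 1;
end
end
```

With this sketch, padequals('x=A.''') returns x = A.' while padequals('''a=b'' A.''') comes back unchanged. Existing whitespace around an operator is left alone, so already-padded or tab-aligned lines are not touched.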
  7 Comments
DGM on 31 Jan 2022
@Star Strider Yeah. Disregarding the inelegance of the kludge I've made so far, I can deal with the pre-spaced cases. It's the exclusion of operators within quoted substrings that I'm struggling with.
I decided to flag lines containing both quotes and targeted operators so that they can be reviewed. Since I don't have one guaranteed safe way to handle such lines, I can just present the user (me) with several format attempts to choose from, plus the option to discard them all and edit the line manually.
After a bit of observation, the vast majority of such cases present identifiable patterns and can be handled programmatically without prompting. The majority of remaining cases can be reviewed with a single keystroke. Out of about 100k lines, it took me about 30 minutes to grind through all the files.
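The triage step described above could be sketched roughly as follows. This is not the actual script used here. The sample lines and variable names are invented for illustration, and the operator pattern is one assumed way to match = and == without catching ~=, <= or >=.

```matlab
% Hypothetical sample lines; in practice these would be read from the .m files
txt = {'a=b'; 'x=''a=b'''; 'y=A.'''; '  % c=d'; 'z = 1'};

% Lines that are entirely comments
iscomment = ~cellfun(@isempty, regexp(txt, '^\s*%', 'once'));
% Lines containing at least one single quote (string delimiter or transpose)
hasquote = contains(txt, '''');
% Lines containing = or ==, excluding ~=, <= and >=
hasop = ~cellfun(@isempty, regexp(txt, '(?<![~<>=])==?(?!=)', 'once'));

needsreview = ~iscomment & hasquote & hasop;    % ambiguous: check by hand
autofix     = ~iscomment & ~hasquote & hasop;   % no quotes: safe for regexprep
```

Lines flagged in autofix can go straight through the regexprep calls, since with no quotes present there is nothing for the quote-counting lookahead to get wrong. Only the needsreview lines need a human decision.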
I feel bad about taking the "avoidance for dummies" route, but the last thing I need is another project of the scale that a proper solution would require. Still, I can't say avoidance isn't a learning experience. The lesson here is to do a better job of formatting to begin with.
DGM on 31 Jan 2022
@Stephen For what it's worth, I did check out fparser(). While I never managed to get it to run without dumping errors, it had some useful bits in it. I'm guessing some things just broke since it's been unmaintained for so many years.

Answers (0)

Release

R2019b
