Technical Specification for Format Specification for fprintf, sprintf, sscanf, etc.

Question

Earl DeShazer on 3 Sep 2024


https://se.mathworks.com/matlabcentral/answers/2149669-technical-specification-for-format-specification-for-fprintf-sprintf-sscanf-etc

Moved: Voss on 10 Sep 2024

parametricFmt.m

The documentation provides a good overview for someone looking to understand how to use format specifications for reading and writing formatted strings. However, there are details and nuances that it does not make clear at all. Rather than perform a parametric exploration of valid format specs, I would prefer a tech spec. Here is an example of something whose validity I could not infer from the online documentation.

How many flags may appear together? I had thought only one and then decided to explore:

Here are some that I tried:

>> parametricFmt
Command: aString = sprintf("%#5.0f", 25);
<edge>  25.</edge>
Command: aString = sprintf("%#-5.0f", 25);
<edge>25.  </edge>
Command: aString = sprintf("%#+5.0f", 25);
<edge> +25.</edge>
Command: aString = sprintf("%#+05.0f", 25);
<edge>+025.</edge>
Command: aString = sprintf("%#+005.0f", 25);
<edge></edge>
Command: aString = sprintf("%#+05.0x", 25);
<edge> 0x19</edge>
Command: aString = sprintf("%#+5.0x", 25);
<edge> 0x19</edge>
Command: aString = sprintf("%+5.0x", 25);
<edge>   19</edge>
Command: aString = sprintf("%-5.0x", 25);
<edge>19   </edge>
Command: aString = sprintf("%#-5.0x", 25);
<edge>0x19 </edge>
Command: aString = sprintf("%#-05.0x", 25);
<edge>0x19 </edge>
Command: aString = sprintf("%#- 05.0x", 25);
<edge>0x19 </edge>
Command: aString = sprintf("%#+ 05.0x", 25);
<edge> 0x19</edge>
Command: aString = sprintf("%#+ 015.0x", 25);
<edge>           0x19</edge>
Command: aString = sprintf("%#+ 15.0x", 25);
<edge>           0x19</edge>
Command: aString = sprintf("%#+015.0x", 25);
<edge>           0x19</edge>
Command: aString = sprintf("%#+-015.0x", 25);
<edge>0x19           </edge>
Command: aString = sprintf("%#+#-015.0x", 25);
<edge></edge>
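
For anyone wanting to repeat this kind of exploration, a loop along the following lines would produce output in the same shape. This is a hypothetical sketch, not the attached parametricFmt.m (whose contents are not shown here); the list `fmts` and the variable names are illustrative only.

```matlab
% Hypothetical exploration harness (NOT the original parametricFmt.m).
% Each candidate format is applied to the same value and the result is
% wrapped in <edge> tags so leading/trailing padding is visible.
fmts = ["%#5.0f", "%#-5.0f", "%#+05.0f", "%#+005.0f"];
for fmt = fmts
    aString = sprintf(fmt, 25);
    fprintf("Command: aString = sprintf(""%s"", 25);\n", fmt);
    disp("<edge>" + aString + "</edge>");
end
```

Extending `fmts` with further flag, width, and precision combinations is then a one-line change.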

Any guidance would be appreciated.

Kind regards,

Will


Answer 1

dpb on 3 Sep 2024


https://se.mathworks.com/matlabcentral/answers/2149669-technical-specification-for-format-specification-for-fprintf-sprintf-sscanf-etc#answer_1510014

Edited: dpb on 3 Sep 2024

Mathworks does not publish their internal specifications other than the documentation. One can submit service requests for clarification and/or bug reports and sometimes documentation will be clarified/expanded as a result.

As noted in the References section of the fprintf doc, MATLAB formatted I/O is based on the C standard library functions printf and scanf; the references cited there are still the old K&R C and the 1989 ANSI C. MATLAB is compiled with modern compilers that follow recent C/C++ standards. However, MATLAB does not call the C stdio library functions directly; it uses an internal version that has been modified/vectorized to handle array inputs, which the C library cannot. Updates from recent standards are therefore not included, and the pertinent documentation for format operators is still the older references.
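
To make the array-handling point concrete, here is a minimal sketch, assuming the behaviour described on the sprintf doc page, where the format is reapplied until every element of an array argument has been consumed. C's printf has no equivalent.

```matlab
% The single format operator is recycled for each element of the array.
vals = [1 2 3];
s = sprintf("%d,", vals);   % s is the string "1,2,3,"
disp(s)
```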

The <documentation> mentions that more than one flag can be used at a time, although it is only a note in a top-level section on formatting and is not repeated in the specific formatting specifier section linked to from the function descriptions. That is surely an oversight and should be amended. If one already knows something about C, one would already be aware of that; without that prior knowledge, I agree it isn't obvious without a fair amount of digging.

AFAICT, the C language documentation outside MATLAB also only states explicitly that zero or more of the flags can be used; I believe the implication is that one would not duplicate the same flag, but I do not find that explicitly stated.

<A Linux printf man page> states "The character % is followed by zero or more of the following flags:" which is reworded in the MATLAB documentation as "You can specify more than one flag in a formatting operator".

The "go to" reference is <P J Plauger>, but I don't have a copy at hand to see if he mentions in the text a limitation.

My personal interpretation/opinion is that more than one of the same flag character is undefined and the compiler can do anything, but I am not a member of the Standard committee so my opinions are only that. :) There was a <Stack Overflow thread> on the subject a number of years ago with varying opinions and some example behavior of some compilers; some warned, others didn't.
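
One way to make that interpretation testable is a checker that accepts a single formatting operator only if its overall shape is plausible and no flag character repeats. The sketch below is my reading of the public documentation, not a MathWorks specification: the function name `isPlausibleFormatOperator` is hypothetical, the `n$` identifier form and `%%` are not handled, and the subtype letters `b` and `t` are accepted for any conversion character.

```matlab
function tf = isPlausibleFormatOperator(op)
% Hypothetical helper: true if OP looks like one formatting operator
% (flags, width, precision, subtype, conversion) with no repeated flag.
    op = char(op);
    flagChars = '-+ 0#';
    % Assumed grammar: % flags* [width] [.precision] [subtype] conversion
    pat = '^%[-+ 0#]*(?:[1-9]\d*|\*)?(?:\.(?:\d+|\*))?[bt]?[diuoxXfeEgGcs]$';
    if isempty(regexp(op, pat, 'once'))
        tf = false;                 % overall shape not recognised
        return
    end
    % The flags are the leading run of flag characters after the '%'.
    rest   = op(2:end);
    nFlags = find(~ismember(rest, flagChars), 1) - 1;
    flags  = rest(1:nFlags);
    % Reject the operator if any flag character appears more than once.
    tf = numel(unique(flags)) == numel(flags);
end
```

Under these assumptions, `isPlausibleFormatOperator('%#+05.0f')` returns true, while `'%#+005.0f'` and `'%#+#-015.0x'` return false, which is consistent with the two empty results in the question.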

At the time referenced in the MATLAB documentation, the standard library was not part of the C Standard, and so wasn't covered explicitly. Undoubtedly The MathWorks is still maintaining/upgrading the same base code they started with back then, with whatever behavior that entails, but I don't believe it is more explicitly documented anywhere, other than what Plauger may say. TMW won't be using his implementation regardless, so that would only describe what that and later versions may be required to do, not what the MATLAB implementation does.

4 Comments

Earl DeShazer on 3 Sep 2024

This is a great summary. Thank you so much for putting a lot of detail into your investigation into the expected behavior. My motivation for looking into this was that I intended to write my own regexp toolkit to validate format specs, and the first thing I noticed was the divide between MATLAB's proprietary fprintf and the open standard. Immediately, I wanted to find a more authoritative document than the online help. That kind of documentation is often geared towards onboarding. Very, very useful, but not helpful when one wants to assess whether something is right or wrong. :-)

Separately, but very much related for me, the AI revolution has increased the popularity of regular expressions and tokenization, but I have also noticed that a lot of software, even commercial-grade software, applies pattern recognition that is not robust. I have a long history with regular expressions. Instead of advancing, we as a community might be losing some of our chops when it comes to writing robust patterns. Also, I have been vetting the Regexp Toolbox and want to feel confident with it.

As a compliment, the treatment of strings vs chars has become much, much better and I am really happy about that. I've always thought that only computer scientists think of strings as arrays of chars. Everyone else thinks of them as atomic things, so having that formalized is pretty cool.

Thanks for looking into it. I'll keep my eye out for more documentation. If you need me to file anything anywhere, I would be happy to. Until then, have a good one.

Will

dpb on 3 Sep 2024

It really wasn't a lot of time/research; I was already aware of the content, I just wrote it down and looked up a couple of links. The only thing new I hadn't seen before was the Stack Overflow thread.

With MathWorks, since MATLAB is a proprietary product, the user documentation is all that is ever going to get published. It's not their job to make the language definition publicly available, although I have been known to complain that the doc does not amount to a definition.

As for regexp, anybody who is a guru there is a magician in my view; I can't even make simple expressions work, let alone anything robust!

Earl DeShazer on 9 Sep 2024

Moved: Voss on 10 Sep 2024

@Steven Lord Thank you for the reference. I really have enjoyed the regular expression tools. They are rich. As with everyone's implementation of regex, there are differences that require poking around, and there have been some differences from Perl that I had to work through, but overall this is a job well done. Also, I want to say, I never realized how powerful char arrays were until this recent go-around. One doesn't need (?x) because one can just string some chars together. I really enjoy that.
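
As a small illustration of that point, a pattern can be assembled from char pieces with a note on each, which gives much of the readability of free-spacing mode. The hex-literal pattern here is only an example, chosen to match output such as 0x19 from the question.

```matlab
% Text after "..." on a line is ignored, so each piece can carry a note.
pat = ['^0[xX]'       ... % prefix, as produced by the # flag with %x
       '[0-9a-fA-F]+' ... % one or more hex digits
       '$'];              % end of input
m = regexp('0x19', pat, 'match', 'once');   % m is '0x19'
```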

@dpb I realize now your perspective on MATLAB's willingness to share a spec, and that you are, like me, a user of MATLAB and not a developer. That said, I know a few people over there and they are generally quite reasonable. IMHO, a spec is not an implementation document; it is a contract with the customer, or between team members, on what something should do or how it should behave. Without clear specifications, a customer is at risk using the software. So hopefully, more detailed info will be shared. Also, I want it written that what I mean by spec and what someone else means by spec may be different. To me, implementation is proprietary; behavior, on the other hand, is the face one has to the public. How could that be proprietary when one could just exercise it and see what it does? The real question is how much money it takes to provide that level of support. However, probably less than getting stupid questions. :-)

Thank you both for your responses.

Cheers
