Why does MATLAB differentiate between strings and character array data types?
I am curious why this language makes a distinction between strings and arrays of characters. In Python, e.g., these are effectively the same data type, and array operations can be performed on strings, which are just treated as an ordered series of characters. The distinction seems arbitrary and annoying to me since, after all, words and phrases are, logically speaking, just an ordered series of letters, so why not always treat them as such and eliminate an unnecessary data type and the type conversion it requires?
1 Comment
Stephen23
on 12 Aug 2022
Edited: Stephen23
on 12 Aug 2022
"In python eg these are effectively the same data type ... "
No, they are not. The approximate equivalent of MATLAB's string array (or even a cell array of character vectors) would be in Python an iterable container of strings, e.g. a list or tuple of strings. Not the same thing at all.
Python's str type is an iterable sequence of characters, something a bit like a char row vector in MATLAB. However, instances of the str type are immutable, so the ability to manipulate character codes in arrays, as in MATLAB, does not exist for Python's str.
"The distinction seems arbitrary and annoying to me since after all, words and phrases are, logically speaking just an ordered series of letters..."
Not all text consists of "words and phrases". I often perform operations on character arrays using basic array operations (i.e. indexing, arithmetic, permutation, etc.) to manipulate the character codes themselves, which would be far more complex if only the string class were available. In other situations having an "atomic" string is more useful. I appreciate having the choice between manipulating characters and manipulating strings; for me this is a very useful distinction.
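A minimal sketch of the kind of character-code manipulation described above, next to the "atomic" behavior of a string scalar. The variable names and the sample text are only illustrative, and the arithmetic assumes plain lowercase ASCII letters:

```matlab
% A char row vector is an array of character codes.
c = 'hello';
codes   = double(c);                          % numeric codes of each character
upperC  = char(c - 32);                       % arithmetic on codes (ASCII lowercase only)
rev     = c(end:-1:1);                        % plain indexing reverses the text
shifted = char(mod(c - 'a' + 3, 26) + 'a');   % Caesar shift by 3 positions

% A string scalar is a single element that holds the whole text.
s = "hello";
nChar  = numel(c);        % number of characters in the char vector
nStr   = numel(s);        % number of elements in the string array
lenStr = strlength(s);    % number of characters inside the string
```

Here numel counts five elements for the char vector but one for the string, which is the practical meaning of "atomic": indexing a string array selects whole pieces of text, not individual characters.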
Assuming that your use case of "words and phrases" applies to all other users is unlikely to help understand why other users might find both of those classes useful. One of the neat things about MATLAB is that numeric/char/logical arrays are contiguously stored in memory, and some users appreciate the ability to manipulate such arrays via a high-level language. Understanding the fundamental differences between char and string arrays would go a long way to appreciating when they can be used effectively.
Accepted Answer
Rik
on 12 Aug 2022
Edited: Rik
on 12 Aug 2022
Backwards compatibility.
The string data type was added a few years ago (in R2016b) to introduce new features. MathWorks chose to use a new syntax to define strings, instead of extending the char data type.
Note that char is not really the equivalent of string: a cell array of char vectors is. You can easily convert to that with the cellstr function. If you were really determined, you could make a custom class implementing all features string offers.
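As a rough illustration of that correspondence, the sketch below converts a string array to a cell array of char vectors and back. The sample values are made up, and the double-quote syntax assumes R2017a or later:

```matlab
s  = ["alpha", "beta", "gamma"];   % 1x3 string array
c  = cellstr(s);                   % 1x3 cell array of char vectors
s2 = string(c);                    % back to a 1x3 string array
roundTrip = isequal(s, s2);        % the conversion is lossless here

second = c{2};                     % brace indexing yields a char vector
m = char(s);                       % char matrix, shorter rows padded with spaces
```

Converting to a plain char array instead produces one padded row per element, which is why the cell array, not the char array, is the closer match to a string array.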
When you realize string is an extension of cellstr, it makes sense they left char alone.
The main confusing thing is that the terms string, char vector, and cellstr were used interchangeably before the introduction of the string data type, which is why most properties are still called string, even if they are actually char vectors.
More Answers (1)
Bruno Luong
on 12 Aug 2022
The string class was introduced recently, and its behavior is not the same as that of the char array, a historic class. So they created a new class to ensure backward compatibility.
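A small sketch of how the two classes behave differently under the same operators, which suggests why changing char itself would have broken existing code. The values are arbitrary, and the double-quote syntax assumes R2017a or later:

```matlab
% The + operator: numeric addition of codes for char, concatenation for string.
x = 'ab' + 'cd';       % element-wise sum of character codes (a double array)
y = "ab" + "cd";       % one string containing the joined text

% Square brackets: char vectors merge, strings stay separate elements.
p = ['ab', 'cd'];      % 1x4 char vector
q = ["ab", "cd"];      % 1x2 string array
```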