Long story short:
- Storing multiple strings as a 2-D char array is lousy, because a list of N strings is not an N-element array; it's an N-by-MaxStringLength array, so none of the normal Matlab array operations treat it as a list of strings and you can't write generic code against it. It's also inefficient, both because of the space-padding and because the characters of a given string aren't even contiguous in memory: Matlab arrays are stored column-major, but each string in a 2-D char array is a row.
- Cellstrs are lousy because they're not type-safe, don't support many standard Matlab operations, and they have the overhead of storing a full mxArray inside each single cell.
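To make those two bullets concrete, here's a quick sketch you can paste into the command window. The variable names and sample data are made up for illustration.

```matlab
% Three names stored as a padded 2-D char array.
names = char('Al', 'Barbara', 'Cy');   % char() pads shorter rows with spaces
size(names)        % 3-by-7, not 3-by-1
numel(names)       % 21 characters, not 3 strings

% Linear indexing follows memory order (column-major), so the characters
% of each row (each "string") are interleaved rather than contiguous.
names(:)'          % begins with 'ABC': the first character of every row

% A cellstr is just a cell array, so nothing stops a non-char element
% from sneaking in.
c = {'Al', 'Barbara', 42};
iscellstr(c)       % false, but you only find out if you check
```

Any code that wants "the number of strings" has to know which of the two representations it was handed, which is the generic-code problem in a nutshell.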
So IMHO the new string type is much nicer to use at an M-code level. When you're working on strings, often you'll want to do string-wise operations that treat each element of an array as a full string, instead of exposing the individual characters. E.g. stuff like someArrayOfStrings == "the string I'm looking for". If you really want to do character-wise operations, like extracting, rearranging, or replacing individual characters, you can convert your strings to char arrays to do that lower-level work. (Like how Java has separate String and char data types.) E.g. str2 = string(circshift(char(str1), 1)), where the shift of 1 is just an arbitrary example.
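A small sketch of that split between string-wise and character-wise work. It assumes R2017a or later for the double-quoted literals, and the sample data is made up.

```matlab
strs = ["apple", "banana", "cherry"];

% String-wise: each element is a whole string, so == compares strings.
isBanana = (strs == "banana");   % [false true false]
n = numel(strs);                 % 3, the number of strings

% Character-wise: drop down to char, do the low-level work, convert back.
c = char(strs(1));               % 'apple', a 1-by-5 char row vector
c = circshift(c, 1);             % rotate right by one position: 'eappl'
rotated = string(c);             % back to a scalar string
```

Note that numel here gives the number of strings, which is exactly what it can't do for a padded char matrix.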
And because it's a new, string-specific type instead of being built on cells of chars, it gives Matlab room to use a more efficient internal data representation and faster implementations of string operations. (Though this is largely yet to be realized; only some string array operations are big wins over the cellstr equivalent, and in some cases they're even slower.)
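If you want to check this for your own workload, a rough timing harness along these lines should do. The choice of upper as the operation and the array size are arbitrary picks for illustration; results will vary by release and by operation, so no numbers are claimed here.

```matlab
n = 1e5;
s = "item" + (1:n);        % 1-by-n string array: "item1", "item2", ...
c = cellstr(s);            % same data as a cellstr

% timeit runs each function handle several times and returns a typical time.
tString  = timeit(@() upper(s));
tCellstr = timeit(@() upper(c));
fprintf('upper: string %.4f s, cellstr %.4f s\n', tString, tCellstr);
```

Swap in whichever operations your code actually leans on (strrep, regexprep, sorting, concatenation) before drawing conclusions.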
Not having access to the raw character data of the strings in a MEX file is a huge bummer, though. I didn't realize this was the case. I can see a reason to not have it return raw 16-bit char data with mxGetData, because that'd expose internals in a way that would bind Matlab to a particular internal representation forever. (Which they might not want to do, because e.g. they might want to switch string arrays from storing 2-byte UTF-16 char data to 1-byte UTF-8, or even a "flexible width" string format like Python uses, both of which could be significant wins in efficiency (at least for non-Asian text).)
But there really ought to be a way to get at the underlying string data in a C MEX file, at least in a read-only manner! Especially because "string arrays are the way to go now" seems to be MathWorks' current strategic position. I looked through https://www.mathworks.com/help/matlab/cc-mx-matrix-library.html and don't see a way to do this. Can a MathWorker comment on this?
I'd like to see something like this back-ported to the C Matrix API, though. Lots of MEX code is still in C. And converting it to C++ is a substantial project: I've found it much harder to write C++ MEX files that are actually fast, compared to C MEX files. Buncha performance gotchas in the C++ MEX/Data API from what I can see.