Convert file encoding before opening

Peter on 28 Aug 2020
Commented: Walter Roberson on 31 Aug 2020
Is there any way to convert the encoding of a text or CSV file in MATLAB? My data files are often saved with inconsistent encodings. I'm not sure exactly what causes this, but certain machines save them as something other than plain UTF-8 (such as UTF-8 with BOM or UCS-2 LE with BOM). MATLAB does not interpret the file correctly with most of these other encodings.
I can change the encoding very easily using Notepad++ before importing the file using MATLAB. The problem is that if I edit and save the file, and then try to reimport, the encoding often reverts. This also happens if I create a new data set and forget to switch the encoding before importing. I'd like to be able to make my import script just convert the file every time, so that I don't get errors if I forget to manually switch the encoding for each file first.
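One possible way to do this inside the import script is to read the raw bytes, pick the source encoding from the byte order mark (BOM), and rewrite the file as UTF-8 without a BOM. The sketch below is only an illustration under stated assumptions: the function name convertToUtf8 is hypothetical, files without a BOM are assumed to be UTF-8 already, UCS-2 LE is decoded as UTF-16LE, and UTF-32 files (whose little-endian BOM also starts with FF FE) are not handled.

```matlab
function convertToUtf8(inFile, outFile)
% convertToUtf8  Hypothetical helper: rewrite a text file as UTF-8 without BOM.
% Assumes a file with no BOM is already UTF-8. UTF-32 is not handled.

% Read the file as raw bytes so that no decoding happens yet.
fid = fopen(inFile, 'r');
if fid == -1
    error('Could not open %s for reading.', inFile);
end
bytes = fread(fid, Inf, '*uint8')';   % row vector of uint8
fclose(fid);

% Pick the source encoding from the BOM and drop the BOM bytes.
if numel(bytes) >= 3 && isequal(bytes(1:3), uint8([239 187 191]))   % EF BB BF
    enc = 'UTF-8';
    bytes = bytes(4:end);
elseif numel(bytes) >= 2 && isequal(bytes(1:2), uint8([255 254]))   % FF FE
    enc = 'UTF-16LE';   % also used here for UCS-2 LE
    bytes = bytes(3:end);
elseif numel(bytes) >= 2 && isequal(bytes(1:2), uint8([254 255]))   % FE FF
    enc = 'UTF-16BE';
    bytes = bytes(3:end);
else
    enc = 'UTF-8';      % assumption: no BOM means UTF-8
end

% Decode to MATLAB characters, then encode again as UTF-8.
txt = native2unicode(bytes, enc);
out = unicode2native(txt, 'UTF-8');

% Write the converted bytes.
fid = fopen(outFile, 'w');
if fid == -1
    error('Could not open %s for writing.', outFile);
end
fwrite(fid, out, 'uint8');
fclose(fid);
end
```

It could then be called at the top of the import script, for example (file names are placeholders):

```matlab
convertToUtf8('data.csv', 'data_utf8.csv');
T = readtable('data_utf8.csv');
```

Writing to a separate output file keeps the original untouched, so nothing is lost if the encoding guess turns out to be wrong.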
Walter Roberson on 31 Aug 2020
I investigated, and the only way I could find to distinguish between UTF-16LE with BOM and UCS-2 LE with BOM was to look for invalid surrogate pairs. UTF-16 uses surrogate pairs (code units in the range 0xD800 to 0xDFFF) only for code points at 0x10000 or above; UCS-2 has no surrogate pairs at all. Is it realistic that your files contain such characters?
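The surrogate check mentioned in this comment could be sketched roughly as follows. This is an illustration rather than a tested tool: the function name hasUnpairedSurrogates is hypothetical, and the input is assumed to be the raw little-endian bytes of the file with the BOM already removed.

```matlab
function tf = hasUnpairedSurrogates(bytes)
% hasUnpairedSurrogates  Hypothetical helper: true if 16-bit little-endian
% data contains a surrogate code unit that is not part of a valid pair.
% Assumes BYTES is a uint8 vector with the BOM already stripped.

n = floor(numel(bytes) / 2);          % number of complete 16-bit units
if n == 0
    tf = false;
    return
end

% Combine byte pairs into 16-bit code units (low byte first).
b = double(bytes(:));
units = b(1:2:2*n) + 256 * b(2:2:2*n);

isHigh = units >= 55296 & units <= 56319;   % 0xD800 to 0xDBFF
isLow  = units >= 56320 & units <= 57343;   % 0xDC00 to 0xDFFF

% A high surrogate must be followed by a low surrogate,
% and a low surrogate must be preceded by a high surrogate.
nextIsLow = false(size(units));
nextIsLow(1:end-1) = isLow(2:end);
prevIsHigh = false(size(units));
prevIsHigh(2:end) = isHigh(1:end-1);

tf = any(isHigh & ~nextIsLow) || any(isLow & ~prevIsHigh);
end
```

If this returns false, the data is well-formed UTF-16, so decoding it with 'UTF-16LE' gives the same characters that a UCS-2 LE reading would for everything in the Basic Multilingual Plane. If it returns true, the file is not valid UTF-16 and was presumably written as UCS-2.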


Answers (0)
