Reading .txt file in MATLAB with issue in formatting

I am using the MATLAB 2021b function readtable to read the following text file:
ISSUERID|FISCAL_YEAR|FIELD_ID|VALUE|PUBLISHED_DATE|SOURCE|DATA_TYPE|ADDITIONAL_INFO
IID000000002137286||DIVERSITY_DISCLOSURE_ETHNICITY_SOURCE|"https://www.cubesmart.com/about-us/corporate-responsibility/\""||{}||{}
The separator is the | (bar) character. As you can see, at the end of the "https://www.cubesmart.com/about-us/corporate-responsibility/\"" field value there is a \" sequence, which breaks the import. I am trying to use the 'Whitespace' option to ignore it, but for some reason it does not work. The code I am running is:
T_equ = readtable(file_name, 'FileType', 'text', 'Delimiter', {'|'}, 'Whitespace', '\"');
where file_name is just the path to the .txt file.
The result of the import is an empty table. I understand this could happen if \" were read as a special character, but from my understanding the 'Whitespace', '\"' name-value pair should force readtable to ignore it. What am I missing here?
  3 Comments
Tulkkas on 23 Feb 2022
I did try with a double slash, but it does not work either. How would you read the text without interpreting the formatting, and then do the parsing?
Rik on 23 Feb 2022
For example with my readfile function (which you can get from the FEX or with the AddOn manager), or with the readlines function.
You could use the split function to split based on the | character (or even use regexp).
The result will not be a table yet, but it should be easy to convert it to what you need.
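A minimal sketch of the readlines/split route described above. The sample file written at the top is hypothetical and only reproduces the two lines from the question. The cleanup step at the end assumes you want the URL without the surrounding quotes and without the stray \" sequence, and the split assumes no field contains a | and that the file has a header plus at least one data line.

```matlab
% Hypothetical sample file reproducing the two lines from the question.
file_name = [tempname '.txt'];
fid = fopen(file_name, 'w');
fprintf(fid, '%s\n', 'ISSUERID|FISCAL_YEAR|FIELD_ID|VALUE|PUBLISHED_DATE|SOURCE|DATA_TYPE|ADDITIONAL_INFO');
fprintf(fid, '%s\n', 'IID000000002137286||DIVERSITY_DISCLOSURE_ETHNICITY_SOURCE|"https://www.cubesmart.com/about-us/corporate-responsibility/\""||{}||{}');
fclose(fid);

% Read the raw lines without any quote handling.
rawLines = readlines(file_name);               % column string array, one element per line
rawLines = rawLines(strlength(rawLines) > 0);  % drop empty lines, e.g. after a trailing newline

% Split every line on the bar character. This requires the same number of
% fields on every line and gives an N-by-8 string array here.
parts = split(rawLines, "|");

% First row holds the variable names, the remaining rows hold the data.
T_equ = array2table(parts(2:end, :), 'VariableNames', parts(1, :));

% Assumed cleanup: remove the \" sequence, then the enclosing double quotes.
T_equ.VALUE = strip(erase(T_equ.VALUE, '\"'), '"');
```

Every variable in the resulting table is a string, so numeric or date columns such as FISCAL_YEAR or PUBLISHED_DATE would still need converting, for example with str2double or datetime.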


Answers (1)

Jeremy Hughes on 24 Feb 2022
The issue is that \" is not how CSV files (and thus readtable) escape double quotes. To escape a quote, the file should use "" instead.
Like this:
X|Y|Z|"And something in ""quotes""."
Otherwise, readtable will keep reading after \"" until it finds a lone double-quote character. I would guess that's what you're seeing.
The only way I can think to resolve this is by reading the file, and replacing \" with "" then write the data back out. There's no way to get readtable to treat \" as an escaped quote.
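% Read the whole file as one char vector (fn is the path to the .txt file).
txt = fileread(fn);
% Turn each backslash-escaped quote \" into the doubled quote "" that readtable expects.
txt = replace(txt,'\"','""');
fid = fopen(fn,'w'); % or use a new file name if you don't want to overwrite it.
fwrite(fid,txt);
fclose(fid);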
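For completeness, here is a sketch of the whole round trip using a separate output file, so the original is left untouched. The sample input and the names src, dst and T_fixed are hypothetical. The 'ReadVariableNames' and 'TextType' options are my additions, chosen on the assumption that you want the first line used as the header and text imported as strings.

```matlab
% Hypothetical sample file reproducing the two lines from the question.
src = [tempname '.txt'];
fid = fopen(src, 'w');
fprintf(fid, '%s\n', 'ISSUERID|FISCAL_YEAR|FIELD_ID|VALUE|PUBLISHED_DATE|SOURCE|DATA_TYPE|ADDITIONAL_INFO');
fprintf(fid, '%s\n', 'IID000000002137286||DIVERSITY_DISCLOSURE_ETHNICITY_SOURCE|"https://www.cubesmart.com/about-us/corporate-responsibility/\""||{}||{}');
fclose(fid);

% Replace the backslash-escaped quotes with doubled quotes.
fixed = replace(fileread(src), '\"', '""');

% Write the corrected text to a new file instead of overwriting the source.
dst = [tempname '.txt'];
fid = fopen(dst, 'w');
fwrite(fid, fixed);
fclose(fid);

% Import the corrected file with the bar character as the delimiter.
T_fixed = readtable(dst, 'FileType', 'text', 'Delimiter', '|', ...
    'ReadVariableNames', true, 'TextType', 'string');
```

If the doubled quote is handled as described above, the VALUE entry should keep one literal double quote at the end of the URL, which you may still want to remove afterwards.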

Release

R2021b
