Failed to read xml error when using xmlread

I am trying to read several XML files in a loop using xmlread, but it fails with the error 'Failed to read xml file'. On examining the files, I noticed that the first line reads <?xml version="1.0" encoding="ISO8859-1"?>. If I change ISO8859-1 to ISO-8859-1, xmlread works. Is there an automated way to correct this, or any other way to read the files in bulk without manually correcting the header in each file?

Answers (3)

dpb on 13 Aug 2020
...
try
    DOMnode = xmlread(filename(i));               % try to read the file as-is
catch ME                                          % read failed; repair the header and retry
    fidi = fopen(filename(i), 'r');               % open the original file for reading
    fido = fopen('tmp', 'w');                     % open a scratch temp file for writing
    while ~feof(fidi)
        l = fgetl(fidi);                          % read one line, without its terminator
        if ~isempty(strfind(l, 'ISO8859'))
            l = strrep(l, 'ISO8859', 'ISO-8859'); % fix up the record
        end
        fprintf(fido, '%s\n', l);                 % write the line as data, restoring the newline
    end
    fclose(fidi);
    fclose(fido);
    copyfile('tmp', filename(i))                  % copy over the original
    DOMnode = xmlread(filename(i));               % and try again with the corrected file
end
  2 Comments
Sarah Immanuel on 14 Aug 2020
Thanks a lot for your help.
Just a clarification: in the command fprintf(fid0,l), is only the content of 'l' written to the tmp file? How do we get all the remaining content of the original file, please?
Walter Roberson on 14 Aug 2020
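It is within the loop, so eventually the entire content is written.
However, the
fprintf(fid0, l)
should be
fprintf(fido, '%s\n', l)
The file handle is fido (not fid0), and l must be passed as data rather than as the format string; otherwise any % or \ in the line is interpreted as formatting. The '\n' restores the line terminator that fgetl strips.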
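To illustrate the point about the format string, here is a minimal sketch using sprintf, which applies the same formatting rules as fprintf. The XML line is a made-up example, not taken from the original files.

```matlab
% Hypothetical line of XML text that happens to contain backslashes.
l = '<dir path="C:\temp\new"/>';

% Using l as the format string: \t and \n are interpreted as escape
% sequences, so the text is silently altered.
asFormat = sprintf(l);

% Passing l as data: the text is copied unchanged, and '\n' adds back
% the line terminator that fgetl would have removed.
asData = sprintf('%s\n', l);

fprintf('original length: %d, as format: %d, as data: %d\n', ...
    numel(l), numel(asFormat), numel(asData));
```

fwrite(fido, l) also copies the characters unchanged, but it does not add the line terminator back, so the output would end up on a single line.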



Walter Roberson on 14 Aug 2020
Edited: Walter Roberson on 14 Aug 2020
filename = 'InputFileName.xml';
S = fileread(filename);
% repair only the first encoding declaration
SS = regexprep(S, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');
if strcmp(S, SS)
    remove = false; % optimization: do not write a new file if not needed
    tname = filename;
else
    tname = tempname();
    fid = fopen(tname, 'w');
    fwrite(fid, SS); % write the corrected content (not the file name)
    fclose(fid);
    remove = true;
end
DOMnode = xmlread(tname);
if remove; delete(tname); end
This code deliberately matches only the encoding= declaration and replaces only the first instance, so as to avoid accidentally changing any ISO8859 that might happen to be part of the data.
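For reading the files in bulk, the same steps can be wrapped in a loop. This is only a sketch: the folder name 'xmlFolder' and the variable docs are assumptions for illustration, and the temporary file is not cleaned up if xmlread throws an error.

```matlab
folder = 'xmlFolder';                        % hypothetical folder holding the XML files
files = dir(fullfile(folder, '*.xml'));      % list every .xml file in that folder
docs = cell(numel(files), 1);                % one DOM node per file

for i = 1:numel(files)
    filename = fullfile(folder, files(i).name);
    S = fileread(filename);
    % Repair only the first encoding declaration, as in the code above.
    SS = regexprep(S, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');
    if strcmp(S, SS)
        docs{i} = xmlread(filename);         % header unchanged: read the original
    else
        tname = tempname();                  % scratch file for the corrected text
        fid = fopen(tname, 'w');
        fwrite(fid, SS);                     % write the corrected content
        fclose(fid);
        docs{i} = xmlread(tname);
        delete(tname);                       % remove the scratch file
    end
end
```

The original files are left untouched; only the corrected copy in the temporary location is parsed.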
  3 Comments
Walter Roberson on 14 Aug 2020
tname is set to filename when strcmp is true, not when it is false.
The comparison is true when the two strings S and SS are exactly the same, which happens if regexprep did not make a change, for example for a file that already has the right pattern or that uses a different encoding. In this situation the original file name is used directly for the later xmlread.
When the strcmp is false, the original and regexprep versions differ, which means that regexprep made a replacement and produced a new string. In that situation, a temporary file name is fetched, the file is opened, the new content is written, and the temporary file is closed. It is this temporary file whose name is passed to xmlread. After the read, the temporary file is deleted.
Walter Roberson on 14 Aug 2020
See also https://www.mathworks.com/matlabcentral/answers/101632-how-can-i-use-a-function-such-as-xmlread-to-parse-xml-data-from-a-string-instead-of-from-a-file-i#comment_972999 which shows a Java-based method. To use it you would call fileread() and regexprep() as above, wrap the result in a java.io.StringBufferInputStream(), and pass that stream to xmlread().
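A minimal sketch of that Java-based route, which avoids the temporary file. To keep it self-contained, the XML text here is a made-up in-memory string; in practice S would come from fileread(filename). It assumes xmlread accepts a Java InputStream object, as its help text states.

```matlab
% Hypothetical XML content with the problematic header; in practice:
%   S = fileread(filename);
S = '<?xml version="1.0" encoding="ISO8859-1"?><root><item>1</item></root>';

% Repair only the first encoding declaration.
SS = regexprep(S, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');

% Wrap the corrected text in a Java input stream and parse it directly.
stream = java.io.StringBufferInputStream(SS);
DOMnode = xmlread(stream);

% Show the name of the document's root element.
rootName = char(DOMnode.getDocumentElement().getNodeName());
disp(rootName)
```

StringBufferInputStream keeps only the low 8 bits of each character, which suits single-byte encodings such as ISO-8859-1 but not text containing characters above code point 255.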



Sarah Immanuel on 14 Aug 2020
Thanks a lot Walter, yes, that makes sense. One last question (I hope it is the last!): I am using MATLAB R2020a, and the command tempfile() doesn't seem to work?
  3 Comments
Sarah Immanuel on 17 Aug 2020
Hi Walter, thanks. tempname() creates a temp file, but xmlread cannot read it and shows an error again. Can you help?
