Failed to read xml error when using xmlread
I am trying to read several XML files in a loop using xmlread, but the error 'Failed to read xml file' occurs. On examining one of the files, I noticed that if I change ISO8859-1 to ISO-8859-1 in the first line, <?xml version="1.0" encoding="ISO8859-1"?>, xmlread works. Is there an automated way to correct this, or any other way to read the files in bulk without manually correcting the header in each file?
Answers (3)
dpb
on 13 Aug 2020
...
try
  DOMnode=xmlread(filename(i));          % try to read the file
catch ME                                 % catch the failure; fixup
  fidi=fopen(filename(i),'r');           % open the file
  fido=fopen('tmp','w');                 % open a scratch temp file for writing
  while ~feof(fidi)
    l=fgetl(fidi);
    if ~isempty(strfind(l,'ISO8859'))
      l=strrep(l,'ISO8859','ISO-8859');  % fixup the record
    end
    fprintf(fido,'%s\n',l);              % output to temp file; fgetl strips the newline
  end
  fidi=fclose(fidi);
  fido=fclose(fido);
  copyfile('tmp',filename(i))            % and copy over the original
  DOMnode=xmlread(filename(i));          % and try again with corrected file...
end
2 Comments
Walter Roberson
on 14 Aug 2020
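It is within the loop, so eventually the entire content is written.
However, the line read from the file should not be used as the fprintf format string, as in
fprintf(fido, l)
because any % or \ characters in the data would be interpreted as formatting. Write it as data instead, for example
fprintf(fido, '%s\n', l)
or
fwrite(fido, [l newline])
fgetl strips the line terminator, so it has to be added back when writing.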
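The reason for avoiding the one-argument fprintf call can be seen with sprintf, which follows the same format-string rules. This is a minimal sketch; the sample text in txt is made up for illustration and does not come from the original files.

```matlab
% Hypothetical sample line containing backslashes (an assumption for illustration)
txt = 'path="C:\temp\new"';

asFormat = sprintf(txt);         % txt used as the format: \t and \n become tab and newline
asData   = sprintf('%s', txt);   % txt passed as data: copied unchanged

altered = ~strcmp(asFormat, txt);   % true, the text was changed
intact  = strcmp(asData, txt);      % true, the text survived
```

The same applies to fprintf when writing to a file identifier, which is why the line should be passed as data rather than as the format.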
Walter Roberson
on 14 Aug 2020
Edited: Walter Roberson
on 14 Aug 2020
filename = 'InputFileName.xml';
S = fileread(filename);
SS = regexprep(S, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');
if strcmp(S, SS)
    remove = false; %optimization, do not write new file if not needed
    tname = filename;
else
    tname = tempname();
    fid = fopen(tname, 'w');
    fwrite(fid, SS); %write the corrected content, not the file name
    fclose(fid);
    remove = true;
end
DOMnode = xmlread(tname);
if remove; delete(tname); end
This code is deliberate in narrowing down to encoding= and only doing the first instance, so as to avoid accidentally changing any ISO8859 that might happen to be part of the data.
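Since the question asks about reading files in bulk, the same idea could be wrapped in a loop over a folder. This is only a sketch under assumptions: the variable names xmlFolder, files, and docs are hypothetical, all files are assumed to sit in one folder, and the round trip of non-ASCII characters through fileread and fwrite is not addressed here.

```matlab
% Assumption: the XML files live in one folder; here the current folder is used.
xmlFolder = pwd;
files = dir(fullfile(xmlFolder, '*.xml'));
docs  = cell(numel(files), 1);          % one DOM node per file

for k = 1:numel(files)
    fname = fullfile(xmlFolder, files(k).name);
    S  = fileread(fname);
    SS = regexprep(S, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');
    if strcmp(S, SS)
        docs{k} = xmlread(fname);       % header unchanged, read the original
    else
        tname = tempname();             % corrected copy goes to a temporary file
        fid = fopen(tname, 'w');
        fwrite(fid, SS);
        fclose(fid);
        docs{k} = xmlread(tname);
        delete(tname);                  % original file is left untouched
    end
end
```

Unlike the copy-over approach, this leaves the original files as they are and only corrects temporary copies.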
3 Comments
Walter Roberson
on 14 Aug 2020
tname is set to filename when strcmp is true, not when it is false.
The comparison is true when the two strings S and SS are exactly the same, which happens if regexprep did not make a change, for example for a file that already has the right pattern or that has a different encoding. In this situation the original file name is used directly for the later xmlread.
When strcmp is false, the original and regexprep versions differ, which means that regexprep produced a new string. In that situation, a temporary file name is fetched, the file is opened, the new content is written, and the temporary file is closed. It is this temporary file whose name is passed to xmlread. After reading, the temporary file is deleted.
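The two cases can be checked on short strings without touching any files. The sample headers below are made up for illustration; the second one also carries ISO8859-1 in its data to show that only the declaration is changed.

```matlab
% Hypothetical sample contents (assumptions for illustration only)
good = '<?xml version="1.0" encoding="ISO-8859-1"?><a/>';
bad  = '<?xml version="1.0" encoding="ISO8859-1"?><a>ISO8859-1</a>';

% Same replacement as in the answer: first occurrence after encoding=" only
fix = @(s) regexprep(s, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');

goodFixed = fix(good);
badFixed  = fix(bad);

sameGood = strcmp(good, goodFixed);   % true: nothing changed, original file is used
sameBad  = strcmp(bad, badFixed);     % false: a corrected temporary file is needed
```

In the second case the text inside the a element is left alone, because the pattern requires the leading encoding=" and 'once' stops after the first match.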
Walter Roberson
on 14 Aug 2020
See also https://www.mathworks.com/matlabcentral/answers/101632-how-can-i-use-a-function-such-as-xmlread-to-parse-xml-data-from-a-string-instead-of-from-a-file-i#comment_972999 which shows a Java-related method. To use it, you would call fileread(), then regexprep(), then pass the result to java.io.StringBufferInputStream(), and call xmlread() on the stream you get from that.
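A sketch of that sequence, assuming xmlread accepts a Java input stream as the linked comment describes. To keep the example self-contained, the text is a made-up sample; in practice S would come from fileread(filename).

```matlab
% Hypothetical sample content; in practice: S = fileread(filename);
S  = '<?xml version="1.0" encoding="ISO8859-1"?><root><item>1</item></root>';
SS = regexprep(S, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');

stream  = java.io.StringBufferInputStream(SS);   % wrap the corrected text, no temp file
DOMnode = xmlread(stream);                       % parse directly from the stream

rootName = char(DOMnode.getDocumentElement().getNodeName());
```

This avoids writing a temporary file. Note that java.io.StringBufferInputStream is deprecated in Java and uses only the low eight bits of each character, so it is best suited to single-byte content such as the ISO-8859 files discussed here.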