How to create Excel spreadsheet from HTML report?
4 views (last 30 days)
Show older comments
MathWorks Support Team
on 25 Jun 2018
Edited: MathWorks Support Team
on 5 Feb 2024
I have an HTML report with some information, and I would like to use MATLAB to extract a subset of the text and export to an Excel spreadsheet. How can I do this?
Attached is the HTML file, and copied below are its contents. I have also included a screenshot of the result that I would like to achieve after exporting to an Excel spreadsheet:
<div class="info">
<pre>
###########################################
<strong>-------------------------------------------</strong>
<strong>--------------- Section 1 -----------------</strong>
<strong>-------------------------------------------</strong>Date = 01/01/2018
Time = 12:00:00
Type = Type1
Number1 = 1234
Comments = <strong>-------------------------------------------</strong>
<strong>--------------- Section 2 -----------------</strong>
<strong>-------------------------------------------</strong>Number2 = 100.0001
Version = 1.0.0
Program = example.exe</pre>
</div>

Accepted Answer
MathWorks Support Team
on 2 Dec 2023
Edited: MathWorks Support Team
on 5 Feb 2024
Attached (and also copy-pasted below) is an example script that illustrates one such approach to convert the HTML file to an Excel spreadsheet as per the following workflow:
1) Read in the HTML file
2) Remove specific sections/patterns of text (via regular expression search and replacement)
3) Parse the remaining text based on a specified delimiter and format
4) Export the result to an Excel Spreadsheet
Please note that this example is provided simply to illustrate the application of these functions and outline an example of a general approach. This example code would likely need to be adapted and tuned for a particular workflow and use-case, since the implementation is highly dependent on the requirements therein.
%% Read in HTML file.
filenameHTML = 'Example.html';
txt = fileread('Example.html');
%% Remove HTML tags, header text, and last section (pertaining to images).
txt = regexprep(txt,'<script.*?/script>','');
txt = regexprep(txt,'<style.*?/style>','');
txt = regexprep(txt,'<.*?>','');
txt = regexprep(txt,'.*#\n','');
txt = regexprep(txt,'--.*?\n','');
txt = regexprep(txt,'\n\n.*','');
%% Set up delimiters and format specification to read columns of data as text:
% For more information, see the TEXTSCAN documentation.
delimiter = {' = '};
formatSpec = '%q%q%[^\n\r]';
%% Read columns of data according to the format.
dataArray = textscan(txt, formatSpec, 'Delimiter', delimiter, ...
'TextType', 'char', 'ReturnOnError', false);
raw = repmat({''},length(dataArray{1}),length(dataArray)-1); %preallocation before loop
for col = 1:(length(dataArray)-1)
raw(1:length(dataArray{col}),col) = dataArray{col};
end;
%% Write data to Excel spreadsheet.
filenameSpreadsheet = 'Example.xlsx';
xlswrite(filenameSpreadsheet,raw)
The following links provide additional information and references to the functions that I used, along with their syntax, arguments, parameters, and options:
Run the below command in your MATLAB instance to access the release specific documentation:
web(fullfile(docroot, 'matlab/ref/fileread.html'))
web(fullfile(docroot, 'matlab/ref/regexprep.html'))
web(fullfile(docroot, 'matlab/ref/textscan.html'))
web(fullfile(docroot, 'matlab/ref/xlswrite.html'))
Please follow the below link to search for the required information regarding the current release:
0 Comments
More Answers (0)
See Also
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!