Find NaNs at the end of an Excel file

Hi,
New user, first question, so bear with me but I haven't been able to find anything on it. I need to find file names containing NaN values that are on the end of a comma delimited file. Xlsread seems to automatically truncate. The files vary in length. What is the best way to do this?
baseInputFolder = 'C:\Users\me\Desktop\Test\';
filename = strcat(baseInputFolder,'find_NaNs.xlsx');
inputFiles = dir(fullfile(baseInputFolder,'**\*.csv'));
nextRow = 1;
for k = 1:length(inputFiles)
baseFileName = inputFiles(k).name;
fullFileName = fullfile(inputFiles(k).folder, baseFileName);
fprintf('Reading file %d of %d named %s\n',k,length(inputFiles),baseFileName);
if any(isnan(xlsread(fullFileName,1)), 'all')
range1=sprintf('%s%d','A',nextRow);
writematrix(fullFileName,filename,'Sheet','Sheet1','Range',range1);
nextRow = nextRow+1;
end
end

 Accepted Answer

See if the contains function (introduced in R2016b) will do what you want.

7 Comments

Hi,
Thanks for getting back to me so quickly. I tried this already; it appears to not work because xlsread doesn't read in trailing NaNs to compare. Am I missing something?
Thanks,
JJ
My pleasure.
You may not be missing something, however it would likely be easier to solve this if you provide examples of the file names you want to analyse.
Note that ‘I need to find file names containing NaN values that are on the end of a comma delimited file.’ is slightly ambiguous. An example of the original data (that appear to be file names) and an example of what you want as a result wouold be helpful.
So sorry-- it's not the names of the files, it's the contents. The other process which spits out these files is appending NaN values in the last rows of some of them. I need to find the names of the files with those rows. One of my files is attached.
JJ
I’m not certain what you’re doing, or if this is any significant improvement:
nextRow = 1;
for k = 1:length(inputFiles)
baseFileName = inputFiles(k).name;
fullFileName = fullfile(inputFiles(k).folder, baseFileName);
fprintf('Reading file %d of %d named %s\n',k,length(inputFiles),baseFileName);
D1 = readmatrix(fullFileName);
[r,c] = find(isnan(D1));
nrnan = nnz(isnan(D1)); % Number Of ‘NaN’ Values In File
nanconsec = nnz(diff(r)>1)==0; % If Rows Containing ‘NaN’ Values Are Consecutive = ‘true’
if nrnan & nanconsec
nanfiles{nextRow} = fullFileName; % If Both Conditions Are ‘true’, Store ‘filename’
nextRow = nextRow+1;
end
end
That assumes that you are only checking for NaN values in rows at the end of each file, not intermediate NaN values (if any exist anywhere else). This will not store file names if the rows with NaN values are not consecutive. If the NaN values are only at the end of the file, it would store them, if they are also in other places in the file, it would not store that specific name. (It might also be necessary to test to be certain the NaN values fill each row completely. I have no idea if this is necessary.)
This stores them in the looop and then would write the file name cell array to a text file after the loop completes.
I knew this had to be simpler than I was making it: readmatrix instead of xlsread. Told you this was a newbie question. Thank you so much for all the time you've spent on this. I really appreciate it.
Need to work on my articulation. I only needed to find out if there were any NaNs anwhere in the file but xlsread cuts off anything at the end of the file that isn't a number (as nearly as I can tell) by default. NaNs after the last valid cell were not being read in so I could detect them. Readmatrix apparently gets the whole thing so I was able to modify the script to get what I needed.
Thanks again,
JJ
As always, my pleasure!
Yes, when you use use xlsread(), the first output, num, automatically has leading and trailing rows and columns of nan removed. This is because when you are talking about numeric values, text shows up as NaN (not a number, after all) and xlsread() wants to trim out header lines and trailer lines and text columns.
Also it is because if you ask excel to read a range of values and the range exceeds the size actually in the file, then excel returns nan. So xlread() cannot tell the difference between nans supplied because the file "ended" and nans that were part of the data. Indeed, unless there is a template in the file or formatting has been specifically applied to a particular range, Excel itself cannot really tell where the end of the data is. It is all ambiguous in spreadsheets: if you wrote something to row 10000 and then deleted the content, then is the spreadsheet now "really" 10000 rows, or is it "really" the size implied by the last non-empty data?

Sign in to comment.

More Answers (0)

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!