TextScan - flexible formatSpec string how to?

BSantos on 20 May 2015
Edited: BSantos on 20 May 2015
Hey fellow "Matlabers"!
I have been struggling with this problem for quite a while and so far have not been able to figure out a solution. Maybe someone here knows one or has a suggestion.
The problem:
I am reading several .txt files that contain all kinds of data (strings, dates, numbers), and I only need to extract information from a few columns. The problem is that I need to ignore a certain number of characters (marked as string) until I reach the first column that has data I need. The number of characters varies from file to file, so I don't know how to specify it in the formatSpec string that I pass to textscan. In the example below, 59 is the value that varies; each file has a different number of characters to discard.
Example:
formatSpec = '%*59*s%10{dd/MM/yyyy}D%6{HH:mm}D%*10*s%*14s%10s%*8*s%*14s%10s%[^\n\r]';
textscan(fileID, '%[^\n\r]', startRow-1, 'ReturnOnError', false);
dataArray = textscan(fileID, formatSpec, 'Delimiter', '', 'WhiteSpace', '', 'ReturnOnError', false);
Error message:
Error using textscan
Unable to read the DATETIME data with the format 'dd/MM/yyyy'. If the data is not a time, use %q to get
string data.
Any idea how I can automate this process?
Thanks in advance!
EDIT: I have added two .txt files as an example of what kind of data I am dealing with.
  2 Comments
Stephen23 on 20 May 2015
Your description is great. All that is missing are a few sample text files, so that we can test code on them and see if it works. You can upload a few test files using the paperclip button; note that you will need to press both the Choose file and Attach file buttons.
It is much easier for us and also for you if we have real data to work with!
BSantos on 20 May 2015
Stephen,
I thought about adding my files, but I'm afraid I can't due to company restrictions. I will try to "edit" my txt files and leave out some of the information so I don't get into trouble.
Thanks!


Answers (1)

Walter Roberson on 20 May 2015
ToSkip = 59;
% Splice the per-file skip count into the format string (closing bracket added).
formatSpec = ['%*', sprintf('%d', ToSkip), '*s%10{dd/MM/yyyy}D%6{HH:mm}D%*10*s%*14s%10s%*8*s%*14s%10s%[^\n\r]'];
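If the skip count is not known in advance, one possibility is to derive it from the first data line of each file. The following is only a sketch under assumptions that are not confirmed anywhere in this thread: that the first date on the line is the field of interest and that it always looks like dd/MM/yyyy. The variable names sampleLine, dateStart and skipSpec are hypothetical, and the sample line is made up; in practice it would come from something like fgetl(fileID).

```matlab
% Hypothetical sample line standing in for one data line of the file.
sampleLine = 'ABC some header text 20/05/2015 12:30 rest';

% Start index (1-based) of the first token that looks like dd/MM/yyyy.
% Assumption: the first such token is the date column that is wanted.
dateStart = regexp(sampleLine, '\d{2}/\d{2}/\d{4}', 'once');

% Everything before the date is what has to be discarded.
ToSkip = dateStart - 1;

% Leading skip specifier, written in the same form as the line above.
skipSpec = ['%*', sprintf('%d', ToSkip), '*s'];
```

The resulting skipSpec would then take the place of the hard-coded leading specifier when formatSpec is assembled for that file. If regexp finds no date, dateStart is empty, so that case should be checked before building the string.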
  5 Comments
Walter Roberson on 20 May 2015
The two sample files you provided can be handled by using
repmat('%*s',1,5)
as the data to skip.
By the way, are the columns possibly tab separated?
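To illustrate the repmat suggestion above: it simply repeats the skip specifier, so five whitespace-delimited fields are discarded no matter how many characters they occupy. The sketch below uses a made-up line and plain %s fields instead of the %D date fields, and it relies on the default whitespace delimiting of textscan rather than the empty 'Delimiter' and 'WhiteSpace' settings used in the question, so it may not carry over to the real files as is.

```matlab
% repmat builds five consecutive "skip one field" specifiers.
skipFields = repmat('%*s', 1, 5);      % '%*s%*s%*s%*s%*s'

% Hypothetical line: five fields to ignore, then a date and a time.
sampleLine = 'a bb ccc dddd eeeee 20/05/2015 12:30';

% Read the two remaining fields as text (default whitespace delimiter).
C = textscan(sampleLine, [skipFields, '%s%s']);

dateText = C{1}{1};   % first kept field
timeText = C{2}{1};   % second kept field
```

Reading the fields as text first also makes it easy to inspect what textscan actually sees at the position where the date is expected.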
BSantos on 20 May 2015
Edited: BSantos on 20 May 2015
Walter,
Thanks, I will try this out in my script. Unfortunately no; the software generating these .txt files is a bit "user unfriendly"... The csv files are even harder to handle than the text files, so I chose to save my results in this kind of text file.
If this works, I will post back here.
EDIT:
Well, I get the same error as posted in my question. Any other suggestions?
Thanks!!
