Textscan with '@' as delimiter

Question

AMM on 7 May 2020


https://se.mathworks.com/matlabcentral/answers/523878-textscan-with-as-delimiter

Answered: per isakson on 13 May 2020

Accepted Answer: per isakson

I'm working with an inherited script that calls TEXTSCAN as follows:

allData = textscan(fid,'%s','Delimiter','@');

What does the at-sign delimiter parameter do, and is this documented anywhere?

I don't see anything in the TEXTSCAN help for this, but when I parse the same text file with and without that parameter specified, I get different results. The input file contains no explicit at-sign characters anywhere. Is TEXTSCAN treating the @ as some special control character?

5 Comments

AMM on 8 May 2020

Edited: AMM on 8 May 2020

Thanks, both, for the replies.

Walter, I'm not seeing what you describe. I see effects throughout the input file, not just at the end. If I have a plain-text file that contains no at-signs and I perform the TEXTSCAN call above with and without the 'Delimiter','@' parameter/value arguments, I get significantly different results:

with 'Delimiter','@' (trimmed for compactness):

    whos allData_withDelim, allData_withDelim(1), allData_withDelim{1},
      Name                   Size            Bytes  Class    Attributes
      allData_withDelim      1x1             34684  cell               
    
    ans =
      1×1 cell array
        {133×1 cell}
        
    ans =
      133×1 cell array
        {'     3.04           N: GNSS NAV DATA    M: Mixed            RINEX VERSION / TYPE'}
        {'XXXXXXX XXXXX XXXX                      20200101 123500 UTC PGM / RUN BY / DATE '}
        ...

without 'Delimiter','@' (similarly trimmed; note the CR/LF linebreaks in the last quoted line):

    whos allData_noDelim ; allData_noDelim(1), allData_noDelim{1},
      Name                 Size            Bytes  Class    Attributes
      allData_noDelim      1x1             21488  cell               
    
    ans =
      1×1 cell array
        {1×1 cell}
        
    ans =
      1×1 cell array
        {'     3.04           N: GNSS NAV DATA    M: Mixed            RINEX VERSION / TYPE←↵XXXXXXX XXXXX XXXX                      20200101 123500 UTC PGM / RUN BY / DATE ←↵ ...'}

It sure seems like calling TEXTSCAN with the P/V pair 'Delimiter','@' affects its handling of line endings. In other words, it seems to treat the at-sign as a special character rather than as a literal one. (As I mentioned, this input file contains no at-signs anywhere.)

But I don't see this anywhere in the documentation, and I have no idea what's going on with TEXTSCAN "under the hood." Sorry to be obtuse, but is this possible?

Walter Roberson on 9 May 2020

Please attach your data file, and also the code you use to reproduce the problem.

The tests I have done find nothing special about using @. When I use any character that is not found in the file, the effect is exactly the same as if I use

textscan(fid, '%s', 'Delimiter', '\n', 'MultipleDelimsAsOne', true)

or

textscan(fid, '%s', 'whitespace', '\n')

and the effect is:

each time the %s fires, skip all leading spaces and newlines
once the %s starts reading something non-blank, continue until the first newline
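To check this without a data file, the sketch below runs the calls on a small in-memory character vector (textscan accepts a character vector in place of a file ID). The sample text is hypothetical and deliberately has no leading spaces or blank lines, since those are where the calls might differ in detail; treat it as a quick illustration rather than a full equivalence proof.

```matlab
% Hypothetical sample text: two lines, no '@' anywhere, no leading spaces
txt = sprintf('alpha beta\ngamma delta\n');

% A delimiter character that never occurs in the text
cAt = textscan(txt, '%s', 'Delimiter', '@');

% The two forms described above as giving the same effect
cNewline = textscan(txt, '%s', 'Delimiter', '\n', 'MultipleDelimsAsOne', true);
cWs      = textscan(txt, '%s', 'Whitespace', '\n');

% Each call is expected to return one cell per line: {'alpha beta'; 'gamma delta'}
disp(cAt{1})
disp(isequal(cAt{1}, cNewline{1}, cWs{1}))
```

In each case the spaces inside a line no longer end the %s field, so a whole line lands in one cell.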

AMM on 12 May 2020

Edited: AMM on 12 May 2020

textscan_test.txt

Hi Walter,

Here you go. Here is what I'm seeing with the attached file:

>> fid=fopen('textscan_test.txt','rt');
>> out1=textscan(fid,'%s'); out1=out1{1}; frewind(fid);
>> out2=textscan(fid,'%s','Delimiter','@'); out2=out2{1}; frewind(fid);
>> out3=textscan(fid,'%s','whitespace','\n'); out3=out3{1}; fclose(fid);
>> whos
  Name         Size             Bytes  Class     Attributes
  ans          1x1                  8  double              
  fid          1x1                  8  double              
  out1      2700x1             351730  cell                
  out2       538x1             134220  cell       
  out3       538x1             134220  cell  

As you can see, the attached file contains no at-signs.

Indeed, what seems to be happening is exactly what you describe: when textscan is given a delimiter that doesn't occur in the input, it behaves the same as the '\n'-based calls you mention above, so each %s returns one whole line.

Answer 1

per isakson on 13 May 2020

https://se.mathworks.com/matlabcentral/answers/523878-textscan-with-as-delimiter#answer_432122

I've reproduced your result on R2018b. I think the result is consistent with the textscan documentation.

out1 is a cell array of character arrays with one white-space-separated word per cell
out2 is a cell array of character arrays with one data row per cell

Case 1. One or more white-space characters act as the delimiter. That is the default, regardless of the value of 'MultipleDelimsAsOne'. The documentation says: "If you do not specify a delimiter, then: the delimiter characters are the same as the white-space characters."

Case 2. '@' is the delimiter. Since no '@' is found, '%s' reads to the end of the line and matches the entire row. (I can't find a sentence in the documentation that states this explicitly; the row-oriented behavior seems to be taken for granted.)
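The two cases can be checked side by side with a short script. As a sketch under stated assumptions, it writes a small made-up file to a temporary location (the contents are hypothetical, not the attached textscan_test.txt), reads it once with the default delimiters and once with 'Delimiter','@', and calls frewind between the reads because the first textscan call leaves the file position at the end of the file.

```matlab
% Hypothetical file contents: two rows, no '@' anywhere
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'alpha beta\ngamma delta\n');
fclose(fid);

fid = fopen(fname, 'rt');
out1 = textscan(fid, '%s');                    % Case 1: white space delimits, one word per cell
frewind(fid);                                  % first read stopped at end of file; start over
out2 = textscan(fid, '%s', 'Delimiter', '@');  % Case 2: no '@' found, one row per cell
fclose(fid);
delete(fname);

out1 = out1{1};   % expected: {'alpha'; 'beta'; 'gamma'; 'delta'}
out2 = out2{1};   % expected: {'alpha beta'; 'gamma delta'}
fprintf('out1: %d cells, out2: %d cells\n', numel(out1), numel(out2));
```

Without the frewind call, the second textscan would start at the end of the file and return nothing.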
