How to filter certain parts of a string.

16 views (last 30 days)
Hello,
So I have a string file which has a ton of data in it.
Let's say it looks like this:
"Introduction:
x1x1x1x1x1x1x1x1x1x1x1x1x1
Summary:
x2x2x2x2x2x2x2x2x22x2x2x2
Conclusion:"
What I would want to do is setup a filter so that it reads the keyword Introduction and Summary and captures everything in between. While ignoring everything else.
In this case it should be
"Introduction:
x1x1x1x1x1x1x1x1x1x1x1x1x1"
If it helps, I am running a script with this code to generate this string file after reading the contents from a word document:
word = actxserver('Word.Application');
word.visible = 0;
wdoc = word.Documents.Open('wordfile.docx');
x = splitlines(string(wdoc.Content.Text));
wdoc.Close;
word.Quit;

Accepted Answer

Walter Roberson
Walter Roberson on 17 Sep 2019
Edited: Walter Roberson on 17 Sep 2019
S = char(wdoc.Content.Text);
x = regexp(S, '^Introduction:.*?(?=^\nSummary:)', 'match', 'lineanchors');
  10 Comments
greenyellow22
greenyellow22 on 23 Jun 2022
Hey Walter! I thought I might just ask you directly. I have a table of which one of the columns includes text. Some cells of this column look like this: 'New Scene was loaded: Castle' or 'New Scene was loaded: World1'.
I want to filter out/split for all cells with the text 'New Scene was loaded:' the last word e.g. 'Castle' or 'World1' and insert it into a new column. Can you tell me how I can split these cells based on this rule? I appreciate any help!! :)
Walter Roberson
Walter Roberson on 24 Jun 2022
TableName.NewColumnName = regexp(TableName.ColumnName, '(?<=New Scene was loaded:\s+)\S+', 'match', 'once')
There is a possibility that TableName.ColumnName might need to be {TableName.ColumnName} but I think it unlikely.

Sign in to comment.

More Answers (0)

Categories

Find more on Characters and Strings in Help Center and File Exchange

Products


Release

R2017b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!