I would like to extract some information from the following text:
There are 3 groups in the text. I want to extract the genders (enclosed in brackets), the group names (the text following 'Name:') and the student IDs for each group (the numbers following 'ID XX =').
My desired output is as follows:
The issue is that not every group has a header line (a line starting with '#'); group 3, for example, has none.
My code is as follows:
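% Read the whole file into one character vector.
str = fileread('trip-data.txt');
% Header line (gender in parentheses, then 'Name:'), followed by the GROUP line and its IDs.
expr = 'Student group.+?\((?<Gender>\w+?)\).*?Name:(?<Name>.+?)\nGROUP.+?=(?<IDs>.+?(,\s*\n.+?)*)(?=(\n|$))';
% Return the named tokens as a struct array, one element per match.
groups = regexp(str, expr, 'names');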
The returned struct array ignores group 3:
I have also tried making the header part optional by wrapping it in an optional group, '(...)?', like so:
expr = '(Student group.+?\((?<Gender>\w+?)\).*?Name:(?<Name>.+?))?\nGROUP.+?=(?<IDs>.+?(,\s*\n.+?)*)(?=(\n|$))';
The returned struct array captures the 'IDs' field, but not the 'Gender' or 'Name' fields, for any of the 3 groups.
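A possible explanation, offered as an assumption rather than a confirmed diagnosis: the MATLAB regexp documentation states that when parentheses are nested, only the outermost set produces a token. Wrapping the header in a plain '(...)?' would then hide the 'Gender' and 'Name' tokens nested inside it. The sketch below swaps that wrapper for a non-capturing group, '(?:...)?'. The sample string is a made-up stand-in for trip-data.txt, whose real layout may differ.

```matlab
% Hypothetical stand-in for trip-data.txt (the real file is not shown here);
% group 2 deliberately has no '#' header line.
sample = sprintf(['# Student group (male) Name: Alpha\n', ...
                  'GROUP 1 ID 01 = 11, 12\n', ...
                  'GROUP 2 ID 02 = 21, 22']);

% Same pattern as the second attempt, except that the optional wrapper is
% the non-capturing group (?:...)?, so Gender and Name are no longer nested
% inside an outer capturing group.
expr = ['(?:Student group.+?\((?<Gender>\w+?)\).*?Name:(?<Name>.+?))?', ...
        '\nGROUP.+?=(?<IDs>.+?(,\s*\n.+?)*)(?=(\n|$))'];

groups = regexp(sample, expr, 'names');

% Print each match; fields of a group without a header should come back empty.
for k = 1:numel(groups)
    fprintf('Gender=[%s] Name=[%s] IDs=[%s]\n', ...
        groups(k).Gender, groups(k).Name, groups(k).IDs);
end
```

The lazy '.+?' tokens keep the whitespace around each value, so applying strtrim to the non-empty fields afterwards is probably still needed.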