Help with Regular Expressions

Chris on 27 Mar 2015
Edited: Guillaume on 30 Mar 2015
Hi Everyone! I am trying to parse messages which have a repeating structure (e.g. GPS NMEA messages), but the number of repeats of the structure is not known in advance. To illustrate, take the following made-up message:
MSGID,3,<data1a>,<data1b>,<data1c>,<data2a>,<data2b>,<data2c>,<data3a>,<data3b>,<data3c>
With this format I have a message identifier, followed by the number of data sets in the message (3, in this case), followed by the data sets themselves. Each data set always has three fields, but the number of data sets is unknown a priori and varies from message to message. Is there a way to parse this using regular expressions? I.e. is there a way, within the expression passed to regexp, to use the second field, which tells me how many "sets of three" I'll have?
Thank you in advance!
Chris
UPDATE: Everyone, thank you for your responses thus far. I want to be clearer about what I am trying to do. I am trying to parse a NMEA message, specifically the GPGSV message. I'd like to parse the whole thing using regular expressions if possible. An example of the GPGSV message is given below:
$GPGSV,3,1,12,06,73,157,45,26,54,268,42,17,48,037,43,24,35,302,40*7B
$GPGSV,3,2,12,02,35,205,41,28,35,103,43,20,16,147,34,12,12,309,*7E
$GPGSV,3,3,12,03,10,057,33,13,03,215,36,15,01,250,32,30,00,157,24*7D
In this example, we have:
$GPGSV - message ID
3 - Number of GPGSV pages
1 - Page number of this message
12 - Number of data sets total (across all three messages).
The number 12 is interesting: it says that there are 12 data sets in total across the three messages, meaning there are 4 data sets in each message. Note that a message does not state how many data sets it contains; you have to examine both the total number of data sets and the current page number to work out how many data sets are in the page.
After the number 12, we begin the data sets:
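The per-page count described above can be worked out from the total and the page number. A minimal sketch, assuming each page carries at most four data sets (as in the examples in this post); the variable names are illustrative only:

```matlab
% Assumption: each GPGSV page carries at most 4 data sets.
maxPerPage = 4;
total = 11;                 % fourth field of the sentence (total data sets)
page  = 1:3;                % third field (page number), for each of three pages
% Data sets remaining once the earlier pages are accounted for, capped at 4.
nSets = min(maxPerPage, total - maxPerPage*(page - 1));
% nSets is [4 4 3]: four sets on pages 1 and 2, three on page 3
```

With total = 12 the same expression gives four data sets on every page.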
06 - Satellite ID number
73 - Elevation angle of the satellite
157 - Azimuth angle of the satellite
45 - SNR of the satellite
26 - Satellite ID number
... etc (in sets of four numbers)
I am trying to find a way to parse this message using regular expressions. These GPGSV messages tend to come in sets of three, an example:
$GPGSV,3,1,11,05,65,062,45,29,59,331,44,25,51,241,43,12,35,190,41*75
$GPGSV,3,2,11,02,34,059,43,21,19,271,36,13,17,126,37,15,11,164,38*75
$GPGSV,3,3,11,10,07,041,34,20,05,043,33,18,00,218,00*4F
So here, I know that I need to parse out 4 data sets from message page 1, 4 data sets from message page 2, but only three data sets from message page 3. I realize this is complex, but this is what I'm trying to parse in a dynamic way using regular expressions.
Thank you guys in advance!
Chris
  2 Comments
arun kumar on 27 Mar 2015
Maybe you can also search for the fourth position after < and use a counter to check whether this number repeats. If it does repeat, increase the counter; it will then see that '1' has come three times, so your value is 3. This works if the data format is always the same.
Stephen23 on 27 Mar 2015
You might like to try some of the Regular Expression Helpers available on MATLAB File Exchange, such as my own submission:
It lets you try different match expressions and shows regexp's outputs as you type.


Accepted Answer

Guillaume on 27 Mar 2015
Edited: Guillaume on 27 Mar 2015
It's certainly possible to capture the data set number to reuse later in the expression. In fact, it's even the example that's shown in the documentation of regexp under dynamic regular expressions:
'^(\d+)((??\\w{$1}))' determines how many characters to match by reading a digit at the beginning of the string.
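As a rough illustration of the dynamic expression idea, the sketch below is adapted from the example in the MATLAB documentation on dynamic regular expressions. The leading number is captured as token 1 and then reused as the repeat count for the rest of the string. The input strings are made up for the demonstration:

```matlab
% The digits at the start are captured as token 1; (??X{$1}) is rebuilt at
% match time as X{5}, X{8} or X{1}, depending on what token 1 captured.
str = {'5XXXXX', '8XXXXXXXX', '1X'};
out = regexp(str, '^(\d+)(??X{$1})$', 'match', 'once');
% A string whose declared count and content disagree gives an empty match.
bad = regexp('3XX', '^(\d+)(??X{$1})$', 'match', 'once');
```

Here every element of str matches in full, while '3XX' does not match because only two X characters follow the 3.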
I'm not clear on what exactly you're trying to extract from your message, though.
  3 Comments
Chris on 30 Mar 2015
Edited: Chris on 30 Mar 2015
EDIT: Never mind, typing "help isKey" into the console didn't produce anything, but the online help does have some documentation: http://www.mathworks.com/help/matlab/ref/containers.map.iskey.html?searchHighlight=iskey
Guillaume, thank you very much for this answer, this looks very helpful! One question: what are you referring to when you use the function "isKey"? Is that pseudocode for something, or is it a function I'm unaware of? MATLAB's help gives me nothing for that function.
Thank you,
Chris
Guillaume on 30 Mar 2015
Edited: Guillaume on 30 Mar 2015
isKey is a member function of the map class. It is not pseudocode.
It checks whether the given message ID is already a key in the map. If it is, I retrieve the corresponding value. If it is not, then I add it to the map.
Note that the first argument to isKey must be a map object. Otherwise, you'll most likely get an undefined function for type xxx error.
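The check-then-add pattern described here might look like the following. This is a hypothetical sketch, not the code from the answer: the list of message IDs and the choice to store a running count per ID are assumptions made for illustration.

```matlab
% Hypothetical example: count how many sentences of each message ID were seen.
ids = {'GPGSV', 'GPGGA', 'GPGSV'};
counts = containers.Map('KeyType', 'char', 'ValueType', 'double');
for k = 1:numel(ids)
    id = ids{k};
    if isKey(counts, id)
        counts(id) = counts(id) + 1;   % key already in the map: update its value
    else
        counts(id) = 1;                % key not in the map yet: add it
    end
end
```

Calling isKey with something other than a map object as the first argument is what triggers the undefined function error mentioned above.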


More Answers (1)

Stephen23 on 27 Mar 2015
Edited: Stephen23 on 27 Mar 2015
I would solve this exactly the other way around: simply identify the groups of dataNa,dataNb,dataNc using a basic regexp pattern (you said they always come in threes), and then afterwards confirm that the number of groups found matches the given value, something like this pseudocode:
tkn = regexp(str,'(data1a),(data1b),(data1c)','tokens');   % one cell per group found
tot = regexp(str,'MSGID,(\d+)','tokens','once');            % declared number of groups
assert(numel(tkn)==str2double(tot{1}))
Or using the pseudo-data from the original question:
>> str = 'MSGID,3,<data1a>,<data1b>,<data1c>,<data2a>,<data2b>,<data2c>,<data3a>,<data3b>,<data3c>';
>> tkn = regexp(str,'<(\w+)>,<(\w+)>,<(\w+)>','tokens');
>> tkn = vertcat(tkn{:});
>> size(tkn,1)==sscanf(str,'MSGID,%f,')
ans =
1
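The same match-then-verify idea could be applied to the GPGSV sentences from the updated question. This is a sketch under stated assumptions, not a complete NMEA parser: it assumes at most four data sets per page, that only the SNR field may be empty (as in the examples given), and it ignores the checksum. The variable names are illustrative.

```matlab
str = '$GPGSV,3,3,11,10,07,041,34,20,05,043,33,18,00,218,00*4F';
% Header fields: number of pages, page number, total number of data sets.
hdr = sscanf(str, '$GPGSV,%f,%f,%f', 3);
% Keep only the comma-led data fields between the header and the '*XX' checksum.
body = regexp(str, '^\$GPGSV,\d+,\d+,\d+(.*)\*', 'tokens', 'once');
% Each data set is four comma-led fields; the last one (SNR) may be empty.
tkn = regexp(body{1}, ',(\d+),(\d+),(\d+),(\d*)', 'tokens');
sets = str2double(vertcat(tkn{:}));   % one row per satellite, NaN for an empty field
% Assumption: at most 4 data sets per page.
expected = min(4, hdr(3) - 4*(hdr(2) - 1));
assert(size(sets, 1) == expected)
```

For this sentence (page 3 of 3, 11 data sets in total) the expected count is 3, and sets has one row per satellite with columns ID, elevation, azimuth and SNR.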
