How can I tell the format of bytes sent via TCP?

I'm sending a number of integers wirelessly to MATLAB and I want to know when the transfer has finished. So far I have been using while (get(t, 'BytesAvailable') > 0), where t is my TCP/IP object, so the loop exits when no bytes are available. This is fine once the bytes have stopped arriving for good, but my bytes arrive incrementally: they start and stop, so the while loop exits before I have received all of them. Note: this has nothing to do with InputBufferSize; the stop-start arrival is simply how the source I'm sending from behaves.
When you read bytes you specify the format, e.g. fscanf(t, '%d', 1) for ints, so I was hoping to add some kind of delimiter that would tell me when the integers have ended. E.g. if the byte is not an integer but something like a character, you could tell when to finish reading. But how do you read bytes whose format you don't know? Previously I used delimiters to know when to finish reading an integer: on Arduino I had something like Serial.parseInt(), and when you send an int to the Arduino followed by a character, parseInt returns as soon as it hits the character.
Alternatively, another way of looking at it: say you had a buffer of bytes where some of the bytes were ints and some were characters. How do you know when to switch to a different format type? I looked at using BytesAvailableFcn to set up a callback for when bytes have finished arriving, but this wasn't really appropriate for what I want to do.

4 Comments

I think you are confusing bytes and characters. Semantically they are two very different things.
When you use
fscanf(t, '%d')
you are not reading bytes as integers, but reading characters as integers. This is important as there are two ways you could encode integers to send over a serial port:
  • Use text. For example to send the integer 1234, you'd encode it as text: '1234' and you'd send the bytes [49 50 51 52] (the ASCII values of the characters). The translation from number to character and then character to byte is what fprintf does for you.
  • Use binary. For example to send the integer 1234 using a 16-bit representation, you'd send the bytes [4 210] (using big-endian encoding) because 1234 = 4 * 256 + 210. With a 32-bit representation the same value would be [0 0 4 210]. That translation is what fwrite does for you.
As you can see the actual bytes that are sent over the wire are very different.
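To make the difference concrete, here is a minimal sketch that builds both byte sequences for 1234 without opening any connection. The variable names are only illustrative, and the binary case assumes an unsigned 16-bit value sent most significant byte first.

```matlab
n = 1234;

% Text encoding: number -> characters -> bytes (what ends up on the wire
% when the number is printed as decimal text)
textBytes = uint8(sprintf('%d', n));

% Binary encoding: unsigned 16-bit value, most significant byte first
binBytes = uint8([floor(n / 256), mod(n, 256)]);

disp(textBytes)   % 49 50 51 52
disp(binBytes)    % 4 210
```

Note that the text form needs one byte per digit, while the binary form always needs two bytes for any value from 0 to 65535.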
In that context, "if the byte is not an integer and something like a character" is meaningless. A byte is always an integer (in the range 0-255). And any byte in the range 0-127 is also always the representation of an ASCII character.
In your case, it sounds like you're using text exclusively to transmit data over the serial port. Hence you're always reading characters. So what you meant is "if the character is not a digit (0-9) but some other character". (You're also mixing up digits, i.e. the characters '0' to '9', and integers.)
Serial ports deal with bytes, just like other data transmission streams. On sufficiently old communications systems, it might only be possible to get 7 useful bits per byte, and some bytes might be interpreted to control the link (such as XON and XOFF and Nul), but it is still a byte stream.
Guillaume on 14 May 2015
Edited: Guillaume on 14 May 2015
@Walter, Yes what's transmitted are bytes. I made some simplifications here to avoid confusing the issue.
From my reading of the OP, what is being transmitted is exclusively text (encoded as bytes of course) and when Robert talks of sending integer, what he means is that he sends the text representation of the digits of the integer (encoded as decimal). So most of the time, when he talks about bytes he's actually talking about characters.
edit: Actually, rereading the OP, I'm very confused: are we talking about a TCP connection or a serial connection? As far as I know fscanf is not a member of tcpclient, and Serial.ParseInt does sound like a serial connection operation. Furthermore, most of the OP makes a lot more sense if what he's writing and reading are characters. Yet the title and the start of the post talk about TCP.
I was referring to a TCP connection. You can use fscanf for TCP objects. When I mentioned Serial.ParseInt I was just giving an example of how I used delimiters before with Arduino serial data. Using delimiters was something I was considering trying, to read bytes from a TCP object in MATLAB.


 Accepted Answer

Guillaume on 14 May 2015
Edited: Guillaume on 14 May 2015
As per my comment to your question, I think you're confusing bytes and characters and you're dealing exclusively with transmitting characters (which ultimately are transferred as bytes, but that translation is hidden from you by fprintf).
To solve your problem, what you want to do is read the stream one character at a time and check whether or not that character is a digit:
digits = '';
while strcmp(t.Status, 'open')
    c = fscanf(t, '%c', 1); % read one character only
    if isempty(c)
        continue; % nothing was read (e.g. timeout), try again
    end
    if c >= '0' && c <= '9'
        % character is a digit
        digits = [digits c]; % append to the digits read so far
    elseif ~isempty(digits)
        % character is not a digit: convert the digits read so far to a number
        number = str2double(digits); % convert string to number
        digits = ''; % reset string to nothing
        % do something with number
    end
end
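The same digit-accumulation logic can be tried offline before wiring it to the connection. The sketch below runs it over a made-up string standing in for the received characters (the sample text and the variable names are assumptions, not anything from the original setup), and collects every completed number in a vector.

```matlab
% Hypothetical sample of received text: integers separated by non-digit characters
received = sprintf('12,345;8\n');

numbers = [];
digits = '';
for k = 1:numel(received)
    c = received(k);
    if c >= '0' && c <= '9'
        digits = [digits c];                    % still inside a number
    elseif ~isempty(digits)
        numbers(end + 1) = str2double(digits);  % a delimiter ends the number
        digits = '';
    end
end

disp(numbers)   % 12 345 8
```

Any non-digit character acts as a delimiter here, so the sender is free to use a comma, a newline, or a dedicated "end of data" character that the receiver checks for separately.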

1 Comment

Thank you for the response. I will give this a go, because previously I had success reading everything as characters when I was practising with a text file populated with integers and characters.


More Answers (1)

Other than packet headers (and trailers), TCP (and UDP) just send bytes. The bytes are not marked as to how they are intended to be interpreted. Interpretation is up to the application.
It is very common for applications to pre-define the number and interpretation of bytes according to their relative position. For example it might define that the first 6 bytes in the stream are a constant string that serves to identify the program, and it might define the next 2 bytes as being an unsigned 16-bit integer that represents a version number, and it might define everything up to the next 0x0d 0x0a (CR-LF) as being a printable "banner" that might be displayed to the user. It might define that the byte after that is to be binary 0 if needed to reach an even byte boundary. And so on: it can be a mix of fixed-length and variable-length information (including information whose end is marked by a pre-defined terminator).
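As an illustration of such a positional layout, here is a small sketch that parses a made-up stream with a 6-byte identifier, a 2-byte big-endian version number, and a banner terminated by CR-LF. The identifier 'MYPROT', the banner text, and the variable names are all invented for the example.

```matlab
% Hypothetical stream: 6-byte identifier, 2-byte version, banner ending in CR-LF
stream = [uint8('MYPROT'), 0, 3, uint8('Hello there'), 13, 10, uint8('rest')];

magic   = char(stream(1:6));                            % constant identifying string
version = double(stream(7)) * 256 + double(stream(8));  % unsigned 16-bit, big-endian

% Look for CR-LF only after the fixed-length part of the header
eol    = 8 + strfind(char(stream(9:end)), char([13 10]));
banner = char(stream(9:eol(1) - 1));                    % text up to the first CR-LF

fprintf('%s v%d: "%s"\n', magic, version, banner);
```

The fixed-length fields are read by position alone, while the banner is the one field whose length is discovered by scanning for its terminator.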
If there is variable-length binary information to be transferred, then it is almost never done by using a termination marker. Instead, the information is almost always preceded by a binary count of the number of bytes that the binary information will occupy. In cases where mixed binary and string information is being sent, it is not uncommon for strings to be preceded by a binary count of the bytes occupied. Having a count makes processing faster, as code that is not interested in the string can skip forward that many bytes in the stream instead of having to examine each byte to see if the terminator has been reached.
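A count-prefixed scheme could look roughly like the sketch below, which frames two payloads and then recovers them from the combined byte stream. The 2-byte big-endian count and the helper name frame are assumptions chosen for the example; a real protocol would fix its own count width.

```matlab
% Sender side: prefix each payload with a 2-byte big-endian byte count
payloadA = uint8('hello');
payloadB = uint8([1 2 3]);
frame = @(p) [uint8(floor(numel(p) / 256)), uint8(mod(numel(p), 256)), p];
stream = [frame(payloadA), frame(payloadB)];

% Receiver side: read the count, then take exactly that many bytes
pos = 1;
payloads = {};
while pos + 1 <= numel(stream)
    count = double(stream(pos)) * 256 + double(stream(pos + 1));
    payloads{end + 1} = stream(pos + 2 : pos + 1 + count);
    pos = pos + 2 + count;   % jump straight to the next frame
end

disp(char(payloads{1}))   % hello
```

The receiver never inspects the payload bytes to find the end of a frame, which is why a payload may contain any byte value, including ones that would otherwise look like a terminator.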
Some protocols are primarily text based. An example is the SMTP email protocol, where text commands are sent and text responses are received. Those protocols seldom require that every string block be preceded by a byte count. Nonetheless, one of the text commands might signify in the protocol that it is to switch temporarily into a binary protocol, such as to transfer a block of binary data efficiently.
There is a standard for sending blocks of data that might be of varying data type or in different byte orders: the standard is known as XDR. It has its uses, but more often people define TCP protocols as requiring that data be transmitted in Network Byte Order (which is Big Endian).
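For reference, this is a small sketch of what Network Byte Order means for a 32-bit value, using plain arithmetic so the result does not depend on the byte order of the machine running it. The variable names are illustrative only.

```matlab
n = 1234;
weights = 256 .^ (3:-1:0);   % place values of the four bytes, highest first

% Most significant byte first (big-endian, i.e. network byte order)
bigEndian = uint8(mod(floor(n ./ weights), 256));

% Least significant byte first (little-endian), shown only for comparison
littleEndian = fliplr(bigEndian);

% Receiver: rebuild the value from the big-endian bytes
decoded = sum(double(bigEndian) .* weights);

disp(bigEndian)      % 0 0 4 210
disp(littleEndian)   % 210 4 0 0
```

MATLAB also provides typecast and swapbytes for this kind of conversion, but their result depends on the host's native byte order, so the arithmetic form is used here to keep the example unambiguous.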

6 Comments

Thank you for the response. The trouble with reading a fixed number of bytes for a specific format is that sometimes some of my bytes aren't received, and the number of bytes I'm sending usually varies. One of the functions I was looking at was textscan. If I send a number of integers and the last byte I send is a character, then using textscan(t, '%d', '%c') would read ints and stop reading once it got to the character. This worked when reading from a text file, but I had issues trying it with a TCP object. Another thing I was trying was to force the terminator to stop the read when it hit the LF character, which is pretty much the same as what I did with textscan. I could just stick with the timer I had been using, which finishes reading once a set time has elapsed, but this is slower.
There are circumstances under which textscan will read characters you do not expect to be read, even when you have specified a %d format. In particular, the characters 'd' or 'e' or 'D' or 'E' appearing after a string of digits may be read as forming part of the number, indicating that the parsing for %d is derived from code for parsing floating point numbers :(
If some of your bytes are not received then your TCP implementation is faulty. It is generally valid for an implementation to package payloads into smaller packets than the maximum allowed (provided it uses proper headers), and it is absolutely valid for an implementation to send only what was given to it in any particular fwrite() call instead of waiting to fill the buffer, but TCP is defined as being a reliable transport so if any bytes are not making it from the source to the destination your TCP is broken.
Well, I agree that TCP should be reliable, especially if the socket stays open, as that is the advantage of TCP over UDP. However, it's actually the source I'm sending from that causes some bytes not to be created. But what I don't understand is how you are expected to know the exact number of bytes when the bytes vary depending on the integer. E.g. 345 (integer) could be 5 bytes, 8 (integer) could be 3 bytes. So if I send 1800 integers, many of them of varying size (single/double/triple digits), I can't know the fixed number of bytes that I should expect?
Ronan, go read my first comment on your question at the top of this page.
You're talking about reading integers represented as a string of decimal digits. Indeed the number of characters required depends on the magnitude of the number (although if an upper bound is known, you could decide on a fixed length string and pad the shorter numbers with '0').
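The zero-padding idea can be sketched as follows, assuming an upper bound of 99999 so that every number fits in 5 characters. The field width and the variable names are choices made for the example.

```matlab
values = [345 8 17];

% Sender: every number becomes exactly 5 characters, e.g. '00345'
padded = sprintf('%05d', values);        % '003450000800017'

% Receiver: cut the text into 5-character fields and convert each one
fields  = reshape(padded, 5, []).';      % one number per row
decoded = str2double(cellstr(fields)).';

disp(decoded)   % 345 8 17
```

With this scheme N numbers always occupy exactly 5*N characters, so the receiver knows in advance how much to read.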
Walter is talking about reading integers as bytes, the same way they're encoded in memory. You just have to agree on the number of bits / bytes used to encode your integer and read that number of bytes. For example, 16 bits (2 bytes) can encode all unsigned integers from 0 to 2^16-1 = 65535.
The latter is a very standard way of transmitting data over networks. You just agree beforehand on the number of bytes used to encode each number. See for example the description of the TCP header format. The first number in the header is a 16-bit unsigned integer. Hence you're always reading two bytes for the first number.
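Here is a minimal sketch of that fixed-width binary approach, assuming both sides agree on unsigned 16-bit values sent high byte first. It works on a plain byte vector rather than a live connection, and the variable names are illustrative.

```matlab
values = [345 8 65535];

% Sender: two bytes per value, high byte first, regardless of magnitude
bytes = uint8([floor(values / 256); mod(values, 256)]);
bytes = bytes(:).';                      % interleave: hi1 lo1 hi2 lo2 ...

% Receiver: N values always occupy exactly 2*N bytes
pairs   = reshape(double(bytes), 2, []);
decoded = pairs(1, :) * 256 + pairs(2, :);

disp(bytes)     % 1 89 0 8 255 255
disp(decoded)   % 345 8 65535
```

Sending 1800 integers this way always takes 3600 bytes, so the receiver can simply wait until that many bytes have arrived. On a real connection, fread with a 'uint16' precision should be able to do the decoding step directly, though that part is not exercised here.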
Sorry, I was doing something stupid. All of my focus was going into what MATLAB was doing. You were right, I was actually sending the integers as a string of characters. I'm using an Arduino board to transmit bytes over WiFi and I never considered the format of Serial.print() on Arduino. I was aware that Serial.write() only sends one byte, and I took it for granted that Serial.print() automatically sends variables in their original format. I was aware that integers are normally around 2 bytes, but when I was getting 5 bytes for 3-digit integers in MATLAB I assumed it was because of extra data like a newline or something weird in MATLAB. So it makes sense that a string of 3 digits makes up 3 characters, and with characters equating to about 2 bytes that also comes to around 5 bytes. Having said that, reading one character at a time should work out.



Asked:

on 13 May 2015

Commented:

on 18 May 2015
