# I want to know if I am calculating the results are right?

1 view (last 30 days)
tina jain on 6 May 2015
Commented: Walter Roberson on 12 May 2015
I have applied run length encoding on a string or characters. Let it be
'ttttttPPttttttPPNNNtttttt'
which after Run length encoding turns to be t6P2t6P2N3t6
so I calculated the compression ratio to be (12/25)*100= 48%
Is it correct???

Walter Roberson on 6 May 2015
Edited: Walter Roberson on 6 May 2015
Close. The original is length 25 not 26.
But how are you going to handle counts of greater than 9? And how are you going to handle characters that do not repeat?
tina jain on 6 May 2015
Walter....I checked my code and I came to know that for each repetition it is giving a space in the output....either it is 10 times repetition or 1 time. So your second doubt of greater than 9 is solved here. Now please help me to solve your doubt no 1. that is putting non repeating character with 1 count....e.g. ttPtt is my input string than I am getting output as t Pt * . I want to change this to *t P t
Walter Roberson on 8 May 2015
Your aaAA is a vector that contains a mix of original characters and binary counts. Because of the details of how you built it, the vector will be of type char(). The spaces or not that you see are the results of looking at char() of the binary counts. For example if the count happened to be 9 then the corresponding position would hold char(9) which is the tab character. If you want the character '9' to show up instead of binary 9, then you need to change your program, and you need to take into account that counts of more than 9 might occur so that multiple output character positions in the string might be required to do the encoding.

Thomas Koelen on 6 May 2015
a='ttttttPPttttttPPNNNtttttt';
b='t6P2t6P2N3t6';
A=whos('a');
B=whos('b');
compression=B.bytes/A.bytes*100;
compression =
48
Walter Roberson on 12 May 2015
What Thomas had written originally was the calculation for percentage of the original file size, but as I explained earlier the standard for compression ratio is number of original bytes to number of compressed bytes which is the inverse and has no percentage calculation. If the compressed file was 50% of the original size then it is half of the original size, so two bytes of original data are represented by 1 byte of compressed data for a ratio of 2:1. Better compression would have the first number increase. 4:1 would mean 4 bytes of original became 1 compressed, so the compressed file was 1/4 (25%) of the original size.
Walter Roberson on 12 May 2015
In general, using the whos() byte count to define the compression ratio is wrong. whos() will report the number of bytes used to store the data in the data type you used for your storage, but if you did not pack that datatype completely full then you could create a byte array that was smaller and would be the "real" size.
For example it might be convenient (or sloppy programming) to use double precision values to store results that are integral from 0 to 2^24-1 . That's only 3 bytes worth of value stored in 64 bits (8 bytes). The 3 useful bytes from each value could be saved to an array or written to a file, producing a smaller result.
When you use the whos() byte count you need to understand how much information you have packed into those bytes and adjust accordingly.
For this reason, the calculations would more often be done not by asking the programming language for the number of bytes used for storage (which might include overheads), but instead by multiplying the number of elements stored by the number of used bytes per element. In some cases it instead becomes more useful to multiply the count of the number of elements stored by the number of useful bits per element, and divide the total by the bits-per-byte of 8, rounding up: the result is the number of bytes that the information could be written to in a file, after going through procedures to pack down the information. It isn't always worth going through that trouble of packing down the information just to prove that you could. For example you might run different algorithms, find the one with the smallest theoretical size, and only then bother to do the packing down, since packing down can be a sort-of-expensive operation when you can't access hardware bit-rotate instructions.