I want to know if I am calculating the results are right?

then the original encoding scheme would encode as p226t1 and that is fine as long as you are not allowed to have more than 9 of a character in succession, as you could count on it alternating source-letter and single digit of count. But if your count hit (say) 13, then you start to need two digits of count: ppppppppppppp222222t would encode as p1326t1 if you worked in a straight forward manner. Now try to decode that. Is that 'p1' '326' 't1' ? p33333333333333333333333333t would encode to the same thing. But so would repmat('p',1326) followed by 't'.

A standard way to handle this is to use byte pairs, with the first byte indicating the count, and the second indicating the original byte to be repeated. That limits you to 255 (maximum binary value of an 8 bit byte) repetitions, but since a repetition count of 0 will never occur you can encode the count as 1 lower than it actually is, allowing 1 to 256 repetitions. And if your block of bytes has more than 256 the same, then you then just start a new count for the same character.

There are also adaptations for the case where there are a lot of different bytes in a row but occasional bursts of repeated bytes.

tina jain on 6 May 2015

Walter....I checked my code and I came to know that for each repetition it is giving a space in the output....either it is 10 times repetition or 1 time. So your second doubt of greater than 9 is solved here. Now please help me to solve your doubt no 1. that is putting non repeating character with 1 count....e.g. ttPtt is my input string than I am getting output as t Pt * . I want to change this to *t P t

Walter Roberson on 8 May 2015

Your aaAA is a vector that contains a mix of original characters and binary counts. Because of the details of how you built it, the vector will be of type char(). The spaces or not that you see are the results of looking at char() of the binary counts. For example if the count happened to be 9 then the corresponding position would hold char(9) which is the tab character. If you want the character '9' to show up instead of binary 9, then you need to change your program, and you need to take into account that counts of more than 9 might occur so that multiple output character positions in the string might be required to do the encoding.

Sign in to comment.

Answer 2

Thomas Koelen on 6 May 2015

0
Link

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/215294-i-want-to-know-if-i-am-calculating-the-results-are-right#answer_178019

Open in MATLAB Online

a='ttttttPPttttttPPNNNtttttt';
b='t6P2t6P2N3t6';
A=whos('a');
B=whos('b');
compression=B.bytes/A.bytes*100;

compression =

48

7 Comments
Show 5 older commentsHide 5 older comments

Walter Roberson on 12 May 2015

What Thomas had written originally was the calculation for percentage of the original file size, but as I explained earlier the standard for compression ratio is number of original bytes to number of compressed bytes which is the inverse and has no percentage calculation. If the compressed file was 50% of the original size then it is half of the original size, so two bytes of original data are represented by 1 byte of compressed data for a ratio of 2:1. Better compression would have the first number increase. 4:1 would mean 4 bytes of original became 1 compressed, so the compressed file was 1/4 (25%) of the original size.

Walter Roberson on 12 May 2015

In general, using the whos() byte count to define the compression ratio is wrong. whos() will report the number of bytes used to store the data in the data type you used for your storage, but if you did not pack that datatype completely full then you could create a byte array that was smaller and would be the "real" size.

For example it might be convenient (or sloppy programming) to use double precision values to store results that are integral from 0 to 2^24-1 . That's only 3 bytes worth of value stored in 64 bits (8 bytes). The 3 useful bytes from each value could be saved to an array or written to a file, producing a smaller result.

When you use the whos() byte count you need to understand how much information you have packed into those bytes and adjust accordingly.

For this reason, the calculations would more often be done not by asking the programming language for the number of bytes used for storage (which might include overheads), but instead by multiplying the number of elements stored by the number of used bytes per element. In some cases it instead becomes more useful to multiply the count of the number of elements stored by the number of useful bits per element, and divide the total by the bits-per-byte of 8, rounding up: the result is the number of bytes that the information could be written to in a file, after going through procedures to pack down the information. It isn't always worth going through that trouble of packing down the information just to prove that you could. For example you might run different algorithms, find the one with the smallest theoretical size, and only then bother to do the packing down, since packing down can be a sort-of-expensive operation when you can't access hardware bit-rotate instructions.

Sign in to comment.

I want to know if I am calculating the results are right?

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

8 Comments
Show 6 older commentsHide 6 older comments

More Answers (1)

7 Comments
Show 5 older commentsHide 5 older comments

See Also

Categories

Tags

Community Treasure Hunt

I want to know if I am calculating the results are right?

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

8 Comments Show 6 older commentsHide 6 older comments

More Answers (1)

7 Comments Show 5 older commentsHide 5 older comments

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

8 Comments
Show 6 older commentsHide 6 older comments

7 Comments
Show 5 older commentsHide 5 older comments