Compression

The Oric video chip is not an easy beast to master, so any trick or method that allows one to achieve nice visual results is welcome. Don't hesitate to comment (nicely) on other people's tricks and pictures :)
Symoon
Archivist
Posts: 2301
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France

Compression

Post by Symoon »

Hi,
I've been testing a compression method for Oric HIRES screens. It's still a work in progress, so there may be some mistakes here or there, but anyway.
I have results that range from about 200 bytes for the simplest screens (an empty screen) up to about 4100 bytes for complex ones (4070 for the title screen of Murder On the Atlantic; 3815 for Théoric's famous pirate boat).

This is not counting the working area (decompression buffer + decompression routines), which should be around 1 KB.

Before wasting (lots of) time writing the decoding routines, can you guys tell me how this compares to existing methods you know of on the Oric?
Please do send HIRES screens if you want me to try compressing them and give you the resulting size.

Thanks!

PS: for the curious, the method is an RLE one, which splits off the 6 displayed bytes from the 2 that do not appear on screen, and works vertically instead of horizontally.
Chema
Game master
Posts: 3013
Joined: Tue Jan 17, 2006 10:55 am
Location: Gijón, SPAIN

Re: Compression

Post by Chema »

The OSDK includes a tool for compressing binary data:

http://osdk.defence-force.org/index?pag ... e=filepack

You can test against it...
Dbug
Site Admin
Posts: 4437
Joined: Fri Jan 06, 2006 10:00 pm
Location: Oslo, Norway

Re: Compression

Post by Dbug »

Symoon wrote:I have results that range from about 200 bytes for the simplest screens (an empty screen) up to about 4100 bytes for complex ones (4070 for the title screen of Murder On the Atlantic; 3815 for Théoric's famous pirate boat).

Before wasting (lots of) time writing the decoding routines, can you guys tell me how this compares to existing methods you know of on the Oric?
Hi, these two pictures are in the Oric slideshow "Pushing The Envelope", and were compressed with PictConv:

Code: Select all

// Entry #9 '..\build\files\murder_on_the_atlantic.hir'
// - Loads at address 40960 starts on track 7 sector 6 and is 13 sectors long (3188 compressed bytes: 39% of 8000 bytes).
// - Associated metadata: author='Dom' name='Murder on the Atlantic' 
// Entry #10 '..\build\files\trois_mats.hir'
// - Loads at address 40960 starts on track 8 sector 2 and is 17 sectors long (4222 compressed bytes: 52% of 8000 bytes).
// - Associated metadata: author='Vasiloric' name='Sailing ship' 
So apparently, FilePack is more efficient on the first one, and less on the second one.
Your results, for a simple RLE encoding, are quite impressive :)

Memory-wise, all it requires is the 150 bytes or so used by the depacking routine; there's no additional dictionary or intermediate buffer.
Symoon
Archivist
Posts: 2301
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France

Re: Compression

Post by Symoon »

Thanks for the feedback and the tests!
Filepack seems efficient enough that it's not worth wasting more time on this on my side ;)
I also tried a really complex picture (probably made with Libpipi), and the compressed result was... bigger than the original ^^

About the efficiency... The basic idea was that the first two (invisible) bits of each byte are quite often the same, especially within the same column. So I began by packing them vertically, using bytes structured like this:
DDNNNNNN ; D = data (the two bits), N = count (how many times they repeat in a row). This is very efficient for classic drawings.
Then for the visible pixels I also used a vertical analysis, but this time with a structure like
NNDDDDDD
So one byte could hold up to 4 repetitions of the same screen byte; a 2nd pass then compacted runs of those bytes that were already storing 4 repetitions.
Not sure I'm very clear...
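
In C, a rough sketch of the second (visible-pixel) pass might look like this. It assumes NN stores count-1 so that one byte covers runs of 1 to 4; the exact encoding isn't spelled out above, so that detail is a guess:

Code: Select all

#include <stdint.h>
#include <stddef.h>

/* Sketch of the NNDDDDDD idea: pack one screen column of 6-bit
   "visible pixel" values so that a single output byte holds up to
   4 repetitions of the same 6-bit pattern.
   NN (2 bits) = repeat count - 1, DDDDDD (6 bits) = pixel data. */
size_t pack_column(const uint8_t *col, size_t rows, uint8_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < rows; ) {
        uint8_t data = col[i] & 0x3F;       /* keep the 6 visible bits */
        size_t run = 1;
        while (i + run < rows && run < 4    /* NN counts at most 4 */
               && (col[i + run] & 0x3F) == data)
            run++;
        out[n++] = (uint8_t)(((run - 1) << 6) | data); /* NNDDDDDD */
        i += run;
    }
    return n;                               /* compressed bytes written */
}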

Dbug, if you have a moment, could you do a final test with the following picture too, with pictconv?
LAMER.tap
(7.83 KiB)
My method compresses it to 610 bytes but I suspect pictconv will do better. Thanks ;)
(I tried to access the files on Pushing The Envelope but the disk is special ;))
Symoon
Archivist
Posts: 2301
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France

Re: Compression

Post by Symoon »

Symoon wrote:Dbug, if you have a moment, could you do a final test with the following picture too, with pictconv?
LAMER.tap
My method compresses it to 610 bytes but I suspect pictconv will do better. Thanks ;)
Well, I got myself Filepack and tried: lamer.tap compressed by Filepack gives a 1238-byte file. So in some cases (rather obviously) a dedicated compression routine is more efficient. Filepack has the advantage of already existing and being 100% done, and of working with any kind of data! ;)
Dbug
Site Admin
Posts: 4437
Joined: Fri Jan 06, 2006 10:00 pm
Location: Oslo, Norway

Re: Compression

Post by Dbug »

What would be an interesting test is to see whether we would get better compression ratios by using FilePack per column instead of per line :)
Symoon
Archivist
Posts: 2301
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France

Re: Compression

Post by Symoon »

Dbug wrote:What would be an interesting test is to see whether we would get better compression ratios by using FilePack per column instead of per line :)
Well, I found the idea interesting because of the HIRES screen structure: 2 dimensions, with the vertical one being more important than the horizontal one. There are 200 bytes vertically, giving potentially much longer runs than across the (only) 40 bytes horizontally.
Complex drawings (like the ship or Murder) compress about 20% better vertically than horizontally with my RLE-like method.

But I don't think this would apply to code or any other "single-dimension" data.
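
For reference, "per column" only changes the traversal order over the same 8000 bytes (the screen loads at 40960 = $A000, per the PictConv log above). A minimal sketch; the emit callback is just a hypothetical placeholder for whatever consumes the bytes:

Code: Select all

#include <stdint.h>

#define ROW_BYTES 40   /* bytes per HIRES row   */
#define ROWS      200  /* rows: 40 * 200 = 8000 */

/* Feed the screen to a packer column by column instead of row by
   row: byte (x, y) lives at offset y * ROW_BYTES + x. */
void scan_columns(const uint8_t *screen, void (*emit)(uint8_t))
{
    for (int x = 0; x < ROW_BYTES; x++)   /* 40 columns */
        for (int y = 0; y < ROWS; y++)    /* 200 rows   */
            emit(screen[y * ROW_BYTES + x]);
}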
Antiriad2097
Flying Officer
Posts: 158
Joined: Tue May 09, 2006 9:42 pm
Location: Aberdeen, UK

Re: Compression

Post by Antiriad2097 »

Given the current penchant for alternating colours on every line, would it be worth scanning every column twice, once for the even lines and once for the odd lines? I'm curious about the impact this would have as an option.
Symoon
Archivist
Posts: 2301
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France

Re: Compression

Post by Symoon »

Antiriad2097 wrote:Given the current penchant for alternating colours on every line, would it be worth scanning every column twice.
Sure, the alternating colors wouldn't help, but they mainly "only" occur in a single column (the 1st, defining the global line colors).

Actually it would be interesting to test anyway, trying to detect repeating schemes. What I'm afraid of is that the method is already complex (1 pass for the 2 first bytes, and two passes for the remaining six). Before making it more complex, we have to keep in mind that the decoding routines and buffer size should be minimal too, as they have to be added to the compressed picture size to evaluate the real final compression efficiency.
Dbug
Site Admin
Posts: 4437
Joined: Fri Jan 06, 2006 10:00 pm
Location: Oslo, Norway

Re: Compression

Post by Dbug »

Symoon wrote:(1 pass for the 2 first bytes, and two passes for the remaining six)
Bytes... or bits?
Symoon
Archivist
Posts: 2301
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France

Re: Compression

Post by Symoon »

Dbug wrote:
Symoon wrote:(1 pass for the 2 first bytes, and two passes for the remaining six)
Bytes... or bits?
Ooops, bits, you're right ;)
I made the same mistake in my 1st post too; I'm a bit tired, I'm afraid :-/
Antiriad2097
Flying Officer
Posts: 158
Joined: Tue May 09, 2006 9:42 pm
Location: Aberdeen, UK

Re: Compression

Post by Antiriad2097 »

A two-pass run across each column wouldn't add much overhead, since your run-length compression would simply store alternate lines: lines 1, 3, 5, etc. would come first, followed by 2, 4, 6, etc.

The actual stored data is potentially smaller, since you're more likely to get a successful run of similar bytes to compress. If you want to support both methods, all you need is one bit (or byte) in the header of your compressed data to indicate whether the lines are stored sequentially or alternating.
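
As a sketch of that scan order (reusing the hypothetical ROW_BYTES/ROWS constants from the earlier column-scan example):

Code: Select all

/* Scan one column twice: first the even rows (0, 2, 4, ...), then
   the odd ones (1, 3, 5, ...), so pictures whose colours alternate
   on every line still produce long runs within each pass. */
void scan_column_interleaved(const uint8_t *screen, int x,
                             void (*emit)(uint8_t))
{
    for (int parity = 0; parity < 2; parity++)  /* even pass, then odd */
        for (int y = parity; y < ROWS; y += 2)
            emit(screen[y * ROW_BYTES + x]);
}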
Godzil
Squad Leader
Posts: 774
Joined: Sat May 21, 2011 7:21 pm
Location: Between UK and France

Re: Compression

Post by Godzil »

Antiriad2097 wrote:A two-pass run across each column wouldn't add much overhead, since your run-length compression would simply store alternate lines: lines 1, 3, 5, etc. would come first, followed by 2, 4, 6, etc.

The actual stored data is potentially smaller, since you're more likely to get a successful run of similar bytes to compress. If you want to support both methods, all you need is one bit (or byte) in the header of your compressed data to indicate whether the lines are stored sequentially or alternating.

In fact, the amount of data that needs to be compressed is really small compared to what a modern computer can manage.

RLE is maybe not the best compression method, but there are ways to make it as good as possible, especially with a modern computer and a "low" amount of data to compress.

There are two things to look at when compressing. The first is, for the current block size, to count the occurrences of each possible symbol value and use the symbol with the lowest occurrence as the "RLE" trigger.
The next is to use the "power" of a modern computer and try to compress the data with multiple symbol sizes, choosing the one with the best ratio. This only adds two things to the file: one marker giving the size of a symbol, and another giving the current RLE switch symbol.
You could also try to determine the minimum number of bits needed to represent the "length" value (the number of times a value repeats), and use this size to determine the minimum run length at which switching to an RLE run pays off.

I find my explanation a bit difficult to follow, so here is an example that will, I hope, make things clearer.

For example, we use the character @ as the RLE switch, the symbol size is 8 bits, and the repeat count uses one byte.

If we have this pattern:

Code: Select all

ABBABBBAAABAABBABBBBBBB [ 23 bytes - 184 bits ]
If the RLE format we use is

Code: Select all

Key; Repeat (1 Byte) ; Value
We can naively translate this to:

Code: Select all

A;@;2;B;A;@;3;B;@;3;A;B;@;2;A;@;2;B;A;@;7;B;  [ 22 bytes - 176 bits ]
Of course this is not optimal, as we need 3 bytes to store an RLE run.

We can say that if the run length is lower than or equal to 3, we won't emit an RLE run:

Code: Select all

A;B;B;A;B;B;B;A;A;A;B;A;A;B;B;A;@;7;B; [ 19 bytes - 152 bits ]
Which is better.

Also, the maximum run length we meet here is 7, so we can encode the count in 3 bits instead of the current 8.

So let's run the naive encoding again, now with 3-bit counts:

Code: Select all

  A;@;2;B;A;@;3;B;@;3;A;B;@;2;A;@;2;B;A;@;7;B;  [ 146 bits : ~19 bytes ]
( 8+8+3+8+8+8+3+8+8+3+8+8+8+3+8+8+3+8+8+8+3+8 bits )
Now, storing an RLE run costs 8 + 3 + 8 = 19 bits, which is a bit more than 2 bytes; so unlike before, whenever a symbol repeats at least 3 times (3 × 8 = 24 bits) we have a clear gain. So now we encode a run only if its length is greater than 2:

Code: Select all

  A;B;B;A;@;3;B;@;3;A;B;A;A;B;B;A;@;7;B;  [ 137 bits : ~17 bytes ]
( 8+8+8+8+8+3+8+8+3+8+8+8+8+8+8+8+8+3+8 bits )
The next step would be to look at the data itself and determine the best symbol size, instead of the 8 bits used here.
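
Putting the first step (picking the rarest byte as the trigger, plus a run-length threshold) into C. This is a rough sketch rather than any existing tool; rle_pack, its byte-oriented format and the min_run parameter are all invented for the illustration:

Code: Select all

#include <stdint.h>
#include <stddef.h>

/* Escape-byte RLE: pick the least frequent byte value as the RLE
   trigger, then emit "trigger, count, value" only for runs long
   enough to pay for themselves. With this 3-byte run format,
   min_run = 4 is the break-even point. */
size_t rle_pack(const uint8_t *in, size_t len, uint8_t *out, size_t min_run)
{
    size_t freq[256] = {0};
    size_t n = 0;
    for (size_t i = 0; i < len; i++) freq[in[i]]++;

    uint8_t esc = 0;                      /* least frequent value wins */
    for (int v = 1; v < 256; v++)
        if (freq[v] < freq[esc]) esc = (uint8_t)v;

    out[n++] = esc;                       /* header: the chosen trigger */
    for (size_t i = 0; i < len; ) {
        size_t run = 1;
        while (i + run < len && run < 255 && in[i + run] == in[i]) run++;
        if (run >= min_run || in[i] == esc) { /* trigger must always be escaped */
            out[n++] = esc;
            out[n++] = (uint8_t)run;
            out[n++] = in[i];
        } else {
            for (size_t k = 0; k < run; k++) out[n++] = in[i];
        }
        i += run;
    }
    return n;                             /* compressed size in bytes */
}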