OSDev.org

Posted: **Sun Aug 08, 2004 5:34 pm**

ok had to move this out of the "call gates" thread, otherwise eric would kill me ;)

so candy, here you go...

Candy wrote: If you do make one, please do a jpeg stream... gotta learn decoding them :)

It's 1-bit btw, with 0=white, 1=black, 2=clearcode, 3=EOI. Thus it starts with 3-bit (5x), then 4-bit (8x), then 5-bit... etc :)

Think I've padded a bit there...

seems you're using some screwed lzw algorithm - I haven't seen this with variable bitsize yet and if I start decoding the first 5 with 3bits per code I get something strange: dictionary entry 4 is 0,0 and in the next decoding step entry 5 will be created with 0,0 too - this can't be.. hmm.

Got some hints to default quantization tables for jpeg?

the quantization table should be included in the jpeg standard (otherwise nobody could implement it ;)

And preferably a nice calculation method to do the DCT?

the one-dimensional DCT works as follows:
all there's done is take 8 base vectors x[sub]0[/sub],..x[sub]7[/sub] from |R[sup]8[/sup], in each vector you put the samples of a cosine with frequencies ranging from 0 Hz to 3.5 Hz (center align the cosines) for the vector 0 to 7 respectively.
then take your sample data you want to transform, put that in another vector z from |R[sup]8[/sup].
then just do 8 dot products:
<z,x[sub]0[/sub]>
<z,x[sub]1[/sub]>
...
<z,x[sub]7[/sub]>
and you get the values for your transformed vector.
this is nothing more than a simple base transform. note that if you write down the base transform matrix, this matrix is orthogonal, so the inverse DCT is nothing else than multplying with the transpose of this matrix :)
If I look at this mathematical stuff closer, it also looks like a simple correlation (convolution stuff) to me, but I haven't read much on this topic yet..

well anyway I just made up a quickie (I was doing stuff like the following all day, so..):

8 functions which make a bitmap:

f1(c,b,a) = f8(c,b,a) = !a&&b || a&&!c || !b&&c
f2(c,b,a) = f7(c,b,a) = !a&&!b&&!c || a&&b&&c
f3(c,b,a) = f6(c,b,a) = !a&&!b || !a&&!c || a&&b&&c
f4(c,b,a) = f5(c,b,a) = !a&&!b || a&&c

each function calculates the bits for a column of the bitmap - just supply binary increasing arguments, e.g. f1(false,false,false) f1(false,false,true) f1(false,true,false) f1(false,true,true), f1(true,false,false),...
(yes the bitmap has a height of 8 then ;)

[me=mpp.eox3]listening to "aiboforcen - not unique" says now good night :)[/me]

Posted: **Mon Aug 09, 2004 2:22 am**

579102

01 0101 111 001 000 100 000 010

ok, the first 6 were 3-bit because it started with a clear code (remember, the bits are inverted from low and on). The variable byte size is exactly the same as in GIF, but I might have screwed up the first few bits. List seen here is CC - 0 - 4 - 0 - 1 - 7 - 5 ...

this ends up doing the table
0- 0
1- 1
2- cc
3- eoi
4- 00
5- 000 (code 4 with a 0 padding)
6- 01 (the stuff added at the 1)
7- 11 (the one with the exact same padded)
8- 10 (the last one with the first of the 5-code padded

resulting in:

0 4 0 1 7 5

0 00 0 1 11 000 -> 0000111000 ...

which does match what I started with... should've saved my handwork :S.

GIF starts with CC usually btw.

Posted: **Mon Aug 09, 2004 6:31 am**

hehe nice bitmap, it happened that I just encoded the same thing yesterday with those functions... *g
well what else should programmers encode? ;)

you should mention that you used gif-lzw because that's a bit different than standard lzw, without your hint I wouldn't have been able to decode it. When I interpret the gif text correctly I think the cc should start at 100, and eoi=101 (for bw images).

was fun :)

Posted: **Mon Aug 09, 2004 7:15 am**

mpp.eox3 wrote: it happened that I just encoded the same thing yesterday with those functions... *g
well what else should programmers encode?

well... a bitmap of a rendering of a #define? Anyways...

you should mention that you used gif-lzw because that's a bit different than standard lzw, without your hint I wouldn't have been able to decode it. When I interpret the gif text correctly I think the cc should start at 100, and eoi=101 (for bw images).

Ok, will tell. The GIF text specified that the first N (N=2^1=2) samples equal themselves, with the two next ones (2 and 3) being the clear code and the EOI code respectively. Got a module of it standing by for testing, and it works on at least n (N=2^n) of 3, so afaik it should work.

was fun

Thanks

. At least now I know somebody decoded it, so I didn't make any mistakes in the encoded portion.

You could try to make a bz2 compression of a text string? Is also a lot of fun

Posted: **Mon Aug 09, 2004 11:10 am**

Candy wrote: Ok, will tell. The GIF text specified that the first N (N=2^1=2) samples equal themselves, with the two next ones (2 and 3) being the clear code and the EOI code respectively. Got a module of it standing by for testing, and it works on at least n (N=2^n) of 3, so afaik it should work.

The 3 of your module is because the gif spec says that B/W images should be started with a code size of 2, which means the CC is 2^2 = 100, and eoi=101 - they say because of some algorithmic "problems" (?) - but strangely it worked anyway with your encoding.

Thanks :). At least now I know somebody decoded it, so I didn't make any mistakes in the encoded portion.

You could try to make a bz2 compression of a text string? Is also a lot of fun :D

That's with burrowswheeler transform and movetofront encoding right? I think that will take some time to refresh my brain on that - BWT was weird %)

Got to do some kernel stuff now... (designing/implementing new memory manager, I think I almost killed my chicken&egg problem that delayed me from coding.. yippieee :)

----BEGIN CODER BLOCK----
Version: WPP386 11.0
while(242)*((char*)0xa0000+rand()%(320*200))^=rand();
----END CODER BLOCK

Posted: **Mon Aug 09, 2004 12:26 pm**

mpp.eox3 wrote: The 3 of your module is because the gif spec says that B/W images should be started with a code size of 2, which means the CC is 2^2 = 100, and eoi=101 - they say because of some algorithmic "problems" (?) - but strangely it worked anyway with your encoding.

I know what problem they refer to, the first X-1 codes of any bit length are that length plus 1. On a length of 2 codes you get 1 times 2 bits, then 4 times 3 bits (yes, I did the first two in 3 bits too, my fault. The 2nd and 5th bit in the first byte are always 0) etc. On the others the first amount is larger, so it's more useful.

BTW, where in the standard is this? Never recall having read that anywhere...

That's with burrowswheeler transform and movetofront encoding right? I think that will take some time to refresh my brain on that - BWT was weird %)

It's not so hard. Just decoding in less than 10 times the message size had me tricked for a while. Explanation for that part, after the first decode step (copy array, sort array) you have the indexes in a different order. index A in first points to B in second. Index B in first points to C in second. index C in first ...... continue doing that, and you have the decoded text. This still feels like magic to me, even though I can explain it

Got to do some kernel stuff now... (designing/implementing new memory manager, I think I almost killed my chicken&egg problem that delayed me from coding.. yippieee

Chicken & egg is cool, had that with free-page-module (which used threads to clean up the table & copy back&forth) and the threading module (which kind of needed free pages).

I still don't completely understand this JPG, but I do think I got the idea... Trying to put them in a matrix on my Casio GFX and multiply with a sample image (kind of a fade effect) didn't really result in anything usable, the first row was full of ~600 numbers and the rest was more or less empty. Any hints / example calculations / links to the quantization table?

Posted: **Mon Aug 09, 2004 2:02 pm**

Candy wrote: BTW, where in the standard is this? Never recall having read that anywhere...

damn just deleted the files.. well it was not in the compression appendix, it was more somewhere above.

I still don't completely understand this JPG, but I do think I got the idea... Trying to put them in a matrix on my Casio GFX and multiply with a sample image (kind of a fade effect) didn't really result in anything usable, the first row was full of ~600 numbers and the rest was more or less empty. Any hints / example calculations / links to the quantization table?

Don't get stuck on the quantization table - it's only a very small part in the magic. All the quantization table does is reduce certain frequencies in the transform, each by a specific amount. By which factor this happens has been tested in a visual test with several test persons - they had to tell what still looked good and what didn't.
Also I don't think I'd like to implement a JPEG decoder because JPEG is designed of several building blocks a customer can choose of to encode their images. So implementation should be much work - e.g. you can use huffman, arithmetic coding, lzw or whatever to compress the final bitstream.
Hmm just searched for the exact file format, but they want 110 CHF (swiss currency) for the ISO spec. PDF file.

Posted: **Mon Aug 09, 2004 2:25 pm**

hmm I'm just trying to write an math example here but this typing stuff is somehow useless here.. no mathematical symbols etc.. maybe I could do that with some ascii art but there's not much space (width) here

Posted: **Mon Aug 09, 2004 2:41 pm**

mpp.eox3 wrote: hmm I'm just trying to write an math example here but this typing stuff is somehow useless here.. no mathematical symbols etc.. maybe I could do that with some ascii art but there's not much space (width) here

uhm... that's exactly what I don't need, another mathematical explanation. Can you try to do it in developers language? More of a this-works-because-of-this instead of authentic proofs with big formulae. I do prefer examples.

Point with JPEG is, I know what to do (including the arithmetic etc etc etc), and I know JFIF which limits JPEG to a usable limit, so I know what parts to take to decode the JPEG files (which are actually JFIF). Point is, I have no clue where to start with DCT or IDCT. If I have those figured out, I can write (and manually decode/encode) jpegs.

The quantization table is the only coders part I don't have already. It makes my jigsaw come together, so I really look forward to seeing it.

Posted: **Mon Aug 09, 2004 3:02 pm**

Ah well ok.. You know I turn on the "ignore font settings" in my internet explorer - I've got 1600x1200 resolution here and there are damn many stupid idiot "webdesigners" out there who find it cool to set their fonts to 8pixels on webpages.. Just imagine how good I can read that..
I turned the fonts now back so I think I've got enough of space...

The formula for the DCT as used in JPEG is as follows:

F(u,v) = 1/4 C[sub]u[/sub]C[sub]v[/sub] SUM[sub]x=0[/sub][sup]7[/sup] SUM[sub]y=0[/sub][sup]7[/sup] f(x,y)*cos((2x+1)uPI/16)*cos((2x+1)vPI/16)

where C[sub]u[/sub] = always 1, except for u=0: C[sub]0[/sub] = 1 / 2^(1/2)
(this is a normalization value for the DC part)
C[sub]v[/sub] is the same. f(x,y) is the pixel in the 8x8 image block. F(u,v) is the amount of frequencies u and v

if you look at this, you'll see that the two cos terms are independent, so you can pull that stuff apart:

F(u,v) = 1/2 C[sub]u[/sub] SUM[sub]x=0[/sub][sup]7[/sup] cos((2x+1)uPI/16)
* 1/2 C[sub]v[/sub] SUM[sub]y=0[/sub][sup]7[/sup] cos((2x+1)vPI/16)
* f(x,y)

As you can see now, the 2d DCT consists in reality only of two 1dimensional DCTransforms.

[ok will post this part so if something crashes here, I won't loose all the crap I've already typed here...]

Posted: **Mon Aug 09, 2004 3:09 pm**

hmm ok if you don't like maths.. I found some quantization tables for you:

Luminance quantization table:

16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99

Chrominance quantization table:

17 18 24 47 99 99 99 99
18 21 26 66 99 99 99 99
24 26 56 99 99 99 99 99
47 66 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99

Posted: **Tue Aug 10, 2004 12:38 am**

mpp.eox3 wrote: The formula for the DCT as used in JPEG is as follows:

F(u,v) = 1/4 C[sub]u[/sub]C[sub]v[/sub] SUM[sub]x=0[/sub][sup]7[/sup] SUM[sub]y=0[/sub][sup]7[/sup] f(x,y)*cos((2x+1)uPI/16)*cos((2x+1)vPI/16)

where C[sub]u[/sub] = always 1, except for u=0: C[sub]0[/sub] = 1 / 2^(1/2)
(this is a normalization value for the DC part)
C[sub]v[/sub] is the same. f(x,y) is the pixel in the 8x8 image block. F(u,v) is the amount of frequencies u and v

if you look at this, you'll see that the two cos terms are independent, so you can pull that stuff apart:

F(u,v) = 1/2 C[sub]u[/sub] SUM[sub]x=0[/sub][sup]7[/sup] cos((2x+1)uPI/16)
* 1/2 C[sub]v[/sub] SUM[sub]y=0[/sub][sup]7[/sup] cos((2x+1)vPI/16)
* f(x,y)

As you can see now, the 2d DCT consists in reality only of two 1dimensional DCTransforms.

Shouldn't the second cosine be over ((2y+1)vPI/16) ? If not, how are they independant?

Posted: **Tue Aug 10, 2004 12:50 am**

Candy wrote: Shouldn't the second cosine be over ((2y+1)vPI/16) ? If not, how are they independant?

um yes that's a typo.. I copy-pasted the same because I was too lazy to write that stuff again with all those sub/italic tags and then simply forgot to change the x to y ;)

Seems you're now interrested in the maths ;) If you want more maths I'll continue..

Posted: **Tue Aug 10, 2004 1:56 am**

mpp.eox3 wrote: um yes that's a typo.. I copy-pasted the same because I was too lazy to write that stuff again with all those sub/italic tags and then simply forgot to change the x to y

Thought so

Seems you're now interrested in the maths If you want more maths I'll continue..

This far it's workable math. That's ok with me

but, the section here I don't get is the vPI and the uPI. What are they?

For a few posts ahead, can this be extended to 3d? And would that be on scalable (vector) 3d graphics or on pixelated / voxel / whatever stuff? My guess is voxel-stuff...

Posted: **Tue Aug 10, 2004 3:13 pm**

Candy wrote: but, the section here I don't get is the vPI and the uPI. What are they?

v, u are the parameters from F(u,v). these address a value in a 2dim array.
PI is an important constant, approximately 355/113 (type that in your calculator).....

For a few posts ahead, can this be extended to 3d? And would that be on scalable (vector) 3d graphics or on pixelated / voxel / whatever stuff? My guess is voxel-stuff...

I don't see why it shouldn't, so you could transform a cube of voxels.

Ok so what was I going to do next... The 2d DCT is broken up into 2 1d ones, so lets continue with a 1d DCT.

So now the 1d DCT looks like:

G(u) = 1/2 C[sub]u[/sub] SUM[sub]x=0[/sub][sup]7[/sup] f(x)*cos((2x+1)uPI/16)

Ok that's it, but why does it work?
Now let's take an ascii art look at

h(x,u) = cos((2x+1)uPI/16)

[tt]
h(x,0):
^
************************************
|
|
0----1----2----3----4----5----6----7->
|
|
|

h(x,1):
^
********
| ******
| ***
0----1----2----3-**-4----5----6----7->
| ***
| ******
| ********

h(x,2):
^
**** ****
| *** ***
| ** **
0----1---*2----3----4----5*---6----7->
| ** **
| *** ***
| ******

h(x,3):
^
|** *****
| ** ** **
| * * *
0----1----2----3--*-4----5----6----7->
| * * *
| ** ** **
| ***** **
[/tt]

OSDev.org

OT: hey candy [moved from "call gates"]

OT: hey candy [moved from "call gates"]

Re:OT: hey candy [moved from "call gates"]

Re:OT: hey candy [moved from "call gates"]

Re:OT: hey candy [moved from "call gates"]

Re:OT: hey candy [moved from "call gates"]

Re:OT: hey candy [moved from "call gates"]

Re:OT: hey candy [moved from "call gates"]

Re:OT: hey candy [moved from "call gates"]

Re:OT: hey candy [moved from "call gates"]

Re:OT: hey candy [moved from "call gates"]

Re:OT: hey candy [moved from "call gates"]

Re:OT: hey candy [moved from "call gates"]

Re:OT: hey candy [moved from "call gates"]

Re:OT: hey candy [moved from "call gates"]

Re:OT: hey candy [moved from "call gates"]