Monday, October 7, 2013

Big JPEG Compression and Statistic experiment

A short while ago I had to think about how to make images smaller without losing (any more) information. For PNGs it is easy: I am a long-time user of IrfanView, which includes the PNGOUT plugin to optimize the compression of the image data.
Unfortunately, two-color images (used frequently in document scanning because they need only 1 bit per pixel, 1bpp) are usually stored as Group 4 compressed TIFFs, and in the past I have found these to be smaller than the same image saved as a 1bpp PNG from IrfanView using PNGOUT. I have found no tools to poke around with this compression; also, IrfanView tends to save larger TIFF G4 images than what comes out of scanning.

This made me turn my attention to grayscale (8bpp) images, which are often saved as JPEG, even though that is a lossy compression (meaning the decompressed image will not be identical to what came out of the scanner, it will just look like it).
It is very easy to make a JPEG smaller by lowering its quality during compression, but that only increases the difference from the original image. We want to keep as much information from the original as we can, so I was only looking into improving the compression while keeping the encoded information intact.

This made me run a large-scale experiment on all my fifty-two thousand JPEG images: I compressed them with every method I could find and recorded the results. In a series of posts I will review these results, because I have found several interesting angles to them:
  1. How to check the integrity of all your photos in an automated way: easily done by following these instructions! Basically, on Linux install jpeginfo, then run the following in the designated folder (let me know if you need Windows instructions!):

    find . -iname "*.jpg" -print0 | xargs -0 jpeginfo -c | grep -e WARNING -e ERROR
  2. How much space can you save by re-compressing your photos, _without_ losing a single pixel of information (nor the EXIF data)?
  3. Is it really worth all the extra hassle to create ultra-progressive JPEGs?
  4. What is an arithmetic-coded JPEG, how much size does it save, and which applications can read it?
  5. Which cameras / camera makes have the best and the worst JPEG compression engines?
  6. How have the megapixels evolved along the years?
  7. How do re-compression gains change compared to image size (megapixels)?
  8. How to automate all this, and which problems need to be solved along the way (e.g. how to create progressive, arithmetic-coded JPEGs)? Actual scripts and binaries where needed!
  9. Should I look at my 30GB+ of MJPEG movies as well? :)
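Most of the re-compression variants I will compare boil down to different jpegtran invocations. As a preview, here is a minimal sketch that builds those command lines; jpegtran's `-copy all`, `-optimize`, `-progressive` and `-arithmetic` flags are real options (the arithmetic one needs a libjpeg build with arithmetic coding enabled), but the helper function itself is my own illustration:

```python
# Sketch: build jpegtran command lines for the lossless re-compression
# variants discussed above. The helper is illustrative; the flags are
# standard jpegtran options.

def jpegtran_command(src, dst, progressive=False, arithmetic=False):
    cmd = ["jpegtran", "-copy", "all"]  # keep EXIF and all other markers
    if arithmetic:
        cmd.append("-arithmetic")       # arithmetic entropy coding
    else:
        cmd.append("-optimize")         # optimized Huffman tables
    if progressive:
        cmd.append("-progressive")      # progressive scan layout
    cmd += ["-outfile", dst, src]
    return cmd

# e.g. the "progressive arithmetic" variant:
print(" ".join(jpegtran_command("in.jpg", "out.jpg",
                                progressive=True, arithmetic=True)))
```

The returned list can be passed straight to `subprocess.run`; since all variants are lossless transforms, the decoded pixels stay bit-identical to the original.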
For starters, let's see which cameras have produced my 51719 JPEGs (yes, several images were taken with mobile phones):

Apple 562
BlackBerry 135
Canon 5928
Casio 214
Fuji 17552
HP 112
Kodak 234
Minolta 244
Motorola 2
Nikon 9783
Nokia 1430
Olympus 13813
Panasonic 698
Pentax 5
Samsung 236
Sony 276
#N/A 495
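The counts above come from each file's EXIF Make tag. Here is a minimal sketch of the tallying step, assuming the Make strings have already been extracted (e.g. with exiftool); the normalization table is illustrative, not my complete list:

```python
from collections import Counter

# Raw EXIF "Make" strings vary per vendor, so map them to one brand name.
# This table is illustrative, not the complete mapping used for the stats.
BRANDS = {
    "NIKON CORPORATION": "Nikon",
    "NIKON": "Nikon",
    "OLYMPUS IMAGING CORP.": "Olympus",
    "FUJIFILM": "Fuji",
    "Canon": "Canon",
}

def tally_makes(makes):
    # Files with no EXIF Make tag end up in the #N/A bucket.
    return Counter(BRANDS.get(m.strip(), m.strip()) if m else "#N/A"
                   for m in makes)

print(tally_makes(["NIKON", "FUJIFILM", "NIKON CORPORATION", None]))
```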

And a sneak peek at the total possible file size / storage gains, to give you a quick answer:
  1. Originals: 135.6GB
  2. Re-compressing only the Huffman coding: 130.21GB, 3.97% total gain
  3. Re-compressing the Huffman coding, and storing a progressive JPEG: 123.67GB, 8.79% total gain, 4.82% gain over just Huffman optimization!
  4. Re-compressing the Huffman coding, and optimizing progressive JPEG storage (a.k.a. ultra-progressive JPEG): 122.53GB, 9.64% total gain, only 0.85% gain over progressive JPEG
  5. Re-compressing using arithmetic compression instead of Huffman: 116.88GB, 13.81% total gain, 4.17% gain over the best possible Huffman compression!
  6. Re-compressing using arithmetic compression and storing a progressive JPEG: 114.72GB, 15.4% total gain, 1.59% gain over non-progressive arithmetic JPEG
  7. Re-compressing using arithmetic compression and optimizing progressive JPEG storage (a.k.a. ultra-progressive arithmetic JPEG): 113.94GB, 15.97% total gain, only 0.58% gain over progressive arithmetic JPEG
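All the "total gain" figures above are simply the saved bytes relative to the original 135.6GB; for example, the progressive-Huffman case works out to (135.6 − 123.67) / 135.6 ≈ 8.8%. A quick sketch of that arithmetic:

```python
def total_gain(original_gb, recompressed_gb):
    """Percentage of the original storage saved by re-compression."""
    return (original_gb - recompressed_gb) / original_gb * 100

# Huffman coding + progressive storage, vs. 135.6GB of originals:
print(round(total_gain(135.6, 123.67), 2))   # -> 8.8
# Arithmetic coding alone:
print(round(total_gain(135.6, 116.88), 2))   # -> 13.81
```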
Breakdown for each camera brand in the next post!

Questions welcome, as always! :)
