File compression

File compression (general)

File compression methods make it possible to reduce the size of one or more files and to restore them later, in general without loss.

The following methods are known:

Process | Suffix (typical) | Command | AIX | HP-UX | SunOS | Linux | MS-DOS | Windows 95 | Windows NT | Source code
ZIP/PKZIP | .zip | zip, unzip | hera (unzip only) | ? | ? | unzip | zip, unzip, pkzip, pkunzip | see DOS, winzip | see Win95 | ?
GZIP | .gz | + | + | + | + | + | gzip.exe | gzip.exe | gzip.exe |
BZ2 | .bz, .bz2 | bzip2, bunzip2 | - | - | - | bzip2, bunzip2 | - | - | - | ?
ARJ | .arj | arj, unarj | - | - | - | unarj | arj241.exe | arj241.exe | arj241.exe | -
ARC | .arc | | - | - | - | - | ARCE.COM | ARCE.COM | ARCE.COM | -
LHARC | .lha | lha124 | - | - | - | lha | lha213.exe | lha213.exe | lha213.exe | -
RAR | .rar | rar.exe | - | - | - | - | RAR.EXE | RAR.EXE | RAR.EXE | -

For archiving without compression: TAR
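
As a rough illustration of what "lossless" means here, the following Python sketch compresses some sample data with two of the methods listed above (GZIP and BZ2), restores it, and checks that the result is identical to the input. It also builds an uncompressed TAR archive in memory. The sample text and the file name example.txt are made up for this illustration; only modules from the Python standard library are used.

```python
import bz2
import gzip
import io
import tarfile

# Hypothetical sample data; repetitive content compresses well.
original = b"File compression example. " * 200

# Compress with GZIP and BZ2.
gz_data = gzip.compress(original)
bz_data = bz2.compress(original)

# Decompression restores the input byte for byte (lossless).
assert gzip.decompress(gz_data) == original
assert bz2.decompress(bz_data) == original

# TAR only archives: it adds headers and padding but does not compress.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    info = tarfile.TarInfo(name="example.txt")  # hypothetical file name
    info.size = len(original)
    tar.addfile(info, io.BytesIO(original))
tar_bytes = buffer.getvalue()

print("original:", len(original), "bytes")
print("gzip:    ", len(gz_data), "bytes")
print("bz2:     ", len(bz_data), "bytes")
print("tar:     ", len(tar_bytes), "bytes")
```

The TAR result is larger than the input, which is why TAR archives are usually passed through one of the compression methods afterwards.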


Image data compression

Two fundamentally different approaches are used to compress image data (pixel information, i.e. bitmaps):

Lossless compression

In general, these methods search for repeating patterns within a line and store each pattern together with the number of times it occurs (see the PCX example). The original data can be reconstructed from the compressed data without loss.
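
The idea of storing a value together with its repeat count can be sketched as a simple run-length encoding in Python. This is a minimal illustration only: the function names are invented here, and the (count, value) pairs do not reproduce the actual byte layout of PCX or BMP files.

```python
from typing import List, Tuple


def rle_encode(line: bytes) -> List[Tuple[int, int]]:
    """Encode one line of pixel values as (count, value) pairs."""
    runs: List[Tuple[int, int]] = []
    for value in line:
        if runs and runs[-1][1] == value:
            # Same value as the previous pixel: extend the current run.
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            # New value: start a new run of length 1.
            runs.append((1, value))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> bytes:
    """Rebuild the original line from (count, value) pairs."""
    return bytes(value for count, value in runs for _ in range(count))


line = bytes([0, 0, 0, 0, 255, 255, 7])
runs = rle_encode(line)
print(runs)  # [(4, 0), (2, 255), (1, 7)]
assert rle_decode(runs) == line
```

Lines with long runs of identical pixels shrink considerably, while lines in which every pixel differs from its neighbour gain nothing or even grow.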

Some well-known methods/formats:

  • PCX: original format of PC Paintbrush, now rarely used
  • BMP: Microsoft Windows standard format (run-length encoding)
  • GIF: CompuServe format (LZW; Lempel-Ziv-Welch)
  • PNG: Portable Network Graphics, regarded as the successor to GIF but without animation (license-free)
  • TIFF: Tagged Image File Format (Aldus, HP, Microsoft)

The most common format is GIF. Note: GIF, or more precisely LZW, was protected by a Unisys patent. That patent has since expired, so GIF and LZW can now be used freely.
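
For readers who want to see roughly how LZW works, here is a textbook-style sketch in Python. It is not the GIF variant: GIF packs the codes into variable-width bit fields, uses clear and end codes, and limits the table size, all of which are left out here. The function names are chosen for this illustration.

```python
def lzw_compress(data: bytes) -> list:
    """Return a list of integer codes for the given bytes."""
    # Start with one table entry per possible byte value.
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
        else:
            # Emit the longest known sequence and learn the new one.
            codes.append(table[current])
            table[candidate] = len(table)
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list) -> bytes:
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry that is just being created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
assert lzw_decompress(codes) == sample
print(len(sample), "bytes ->", len(codes), "codes")
```

The decoder rebuilds the same table as the encoder while reading the codes, so the table itself never has to be stored in the file.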

All of the formats mentioned can be read and written with GIMP, for example. BMP can also be read and written with Paint, the program supplied with MS Windows.

GIMP is a free, open-source image editing program available for all current operating systems.

Lossy compression

The underlying assumption is that not all of the information in an image is really necessary. Lossy methods essentially analyze the original image data in the frequency domain, discard the insignificant components, and achieve compression ratios between 1:10 and 1:200. Depending on the compression ratio, artifacts appear in the reconstructed image, especially in larger uniform areas.
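
The principle can be sketched in Python on a single row of eight pixel values: transform the row into the frequency domain with a cosine transform, keep only the strongest coefficients, and transform back. This is a deliberately simplified illustration. Real JPEG works on two-dimensional 8x8 blocks and adds quantization tables and entropy coding; the pixel values and function names below are assumptions made for this example.

```python
import math


def dct(signal):
    """Unnormalised DCT-II of a list of samples."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(coeffs)
    return [
        (coeffs[0] / 2
         + sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5) * k) for k in range(1, n)))
        * 2 / n
        for i in range(n)
    ]


def keep_largest(coeffs, keep):
    """Zero all but the `keep` coefficients of largest magnitude."""
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    strongest = set(order[:keep])
    return [c if k in strongest else 0.0 for k, c in enumerate(coeffs)]


row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical grey values
coeffs = dct(row)
reduced = keep_largest(coeffs, 3)       # only 3 of 8 coefficients survive
approx = idct(reduced)

print([round(v) for v in approx])
print("largest error:", max(abs(a - b) for a, b in zip(approx, row)))
```

With all eight coefficients the row is recovered exactly (up to rounding); with fewer coefficients the reconstruction only approximates the original, which is where the artifacts come from.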

Known processes/formats:

  • JPEG: compression 1:10 to 1:70, discrete cosine transform (the most common compressed image format)
  • Wavelet compression: compression up to 1:200, uses a function space built from wavelets
  • Fractal compression: searches for self-similar structures in order to eliminate redundant data

JPEG files can be read and further processed with all common image editing programs.
To use JPEG in your own applications, see the JPEG library.


Video compression

Video compression techniques are generally based on lossy compression methods. As with image compression, the data is analyzed in suitable frequency or function spaces.
  • Frame compression: each frame is compressed individually and the compressed frames are then joined together
  • Difference coding: starting from a reference frame, the differences to the following frames are computed in a "forward step", and only these differences are written to the data stream
In general, video data streams can also contain compressed audio data.
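
The difference coding described above can be sketched in Python as follows. The sketch assumes that each frame is compared with the frame directly before it and that frames are plain lists of pixel values; real video codecs additionally use motion compensation and lossy quantization, which are left out here. The function names and the sample frames are made up for this illustration.

```python
def encode(frames):
    """Return the reference frame plus one difference list per later frame."""
    reference = frames[0]
    deltas = []
    previous = reference
    for frame in frames[1:]:
        # Store only the change of each pixel relative to the previous frame.
        deltas.append([b - a for a, b in zip(previous, frame)])
        previous = frame
    return reference, deltas


def decode(reference, deltas):
    """Rebuild all frames by adding the differences step by step."""
    frames = [list(reference)]
    for delta in deltas:
        frames.append([a + d for a, d in zip(frames[-1], delta)])
    return frames


# Three tiny hypothetical frames in which only one pixel changes per step.
frames = [
    [10, 10, 10, 200, 10, 10],
    [10, 10, 10, 10, 200, 10],
    [10, 10, 10, 10, 10, 200],
]
reference, deltas = encode(frames)
print(deltas)
assert decode(reference, deltas) == frames
```

Because consecutive frames usually differ only slightly, the difference lists consist mostly of zeros and compress far better than the frames themselves.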
Known processes/formats:
  • Motion-JPEG-based formats:
    • AVI (.avi)
    • Apple QuickTime (.mov)
  • MPEG: two video versions are currently in common use
    • MPEG-1 (.mpg): standard format for Video CD (note: the term DVD refers only to the recording method with two tracks)
    • MPEG-2 (.mpg, .mp2): standard format for digital television
  • MP3 (.mp3; MPEG-1 Audio Layer III, often mislabeled MPEG-III): method for compressing audio-only files
AVI, MOV and MPG files can be played under Windows with Media Player, with the aid of AVI and MPEG codecs (multimedia drivers), or with Apple QuickTime.