Tuesday, 2 October 2012

ZIP FILE

General Format of a ZIP file
  Files stored in arbitrary order.  Large zipfiles can span multiple
  diskette media.
  Overall zipfile format:
    [local file header + file data + data_descriptor] . . .
    [central directory] end of central directory record

  A.  Local file header:
 local file header signature     4 bytes  (0x04034b50)
 version needed to extract       2 bytes
 general purpose bit flag        2 bytes
 compression method              2 bytes
 last mod file time              2 bytes
 last mod file date              2 bytes
 crc-32                          4 bytes
 compressed size                 4 bytes
 uncompressed size               4 bytes
 filename length                 2 bytes
 extra field length              2 bytes
 filename (variable size)
 extra field (variable size)
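
   As an illustration of the layout above, the following is a minimal
   Python sketch that unpacks the fixed 30-byte part of a local file
   header and slices out the two variable-size fields.  The names
   parse_local_header and LOCAL_HEADER are hypothetical, and the sketch
   assumes the header lies entirely inside one in-memory buffer.

```python
import struct

# Fixed part of the local file header, little-endian:
# signature, version, flags, method, time, date, crc, csize, usize,
# filename length, extra field length.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIGNATURE = 0x04034B50


def parse_local_header(buf, offset=0):
    """Return the local header fields found at buf[offset] as a dict."""
    (sig, version, flags, method, mtime, mdate,
     crc, csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
    if sig != LOCAL_SIGNATURE:
        raise ValueError("not a local file header")
    name_start = offset + LOCAL_HEADER.size
    extra_start = name_start + name_len
    return {
        "version_needed": version,
        "flags": flags,
        "method": method,
        "mod_time": mtime,
        "mod_date": mdate,
        "crc32": crc,
        "compressed_size": csize,
        "uncompressed_size": usize,
        "filename": bytes(buf[name_start:extra_start]),
        "extra": bytes(buf[extra_start:extra_start + extra_len]),
        # File data begins right after the extra field.
        "data_offset": extra_start + extra_len,
    }
```

   The returned data_offset is where the file data starts, so the
   compressed bytes are buf[data_offset : data_offset + compressed_size]
   when bit 3 of the general purpose bit flag is clear.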

  B.  Data descriptor:
 crc-32                          4 bytes
 compressed size                 4 bytes
 uncompressed size               4 bytes
      This descriptor exists only if bit 3 of the general
      purpose bit flag is set (see below).  It is byte aligned
      and immediately follows the last byte of compressed data.
      This descriptor is used only when it was not possible to
      seek in the output zip file, e.g., when the output zip file
      was standard output or a non-seekable device.

  C.  Central directory structure:
      [file header] . . .  end of central dir record
      File header:
 central file header signature   4 bytes  (0x02014b50)
 version made by                 2 bytes
 version needed to extract       2 bytes
 general purpose bit flag        2 bytes
 compression method              2 bytes
 last mod file time              2 bytes
 last mod file date              2 bytes
 crc-32                          4 bytes
 compressed size                 4 bytes
 uncompressed size               4 bytes
 filename length                 2 bytes
 extra field length              2 bytes
 file comment length             2 bytes
 disk number start               2 bytes
 internal file attributes        2 bytes
 external file attributes        4 bytes
 relative offset of local header 4 bytes

 filename (variable size)
 extra field (variable size)
 file comment (variable size)
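
   The central file header can be walked in the same way.  The sketch
   below is only an illustration: iter_central_directory is a
   hypothetical helper, and it assumes the caller already knows the
   offset of the first file header and the number of entries, and that
   the whole central directory sits in one buffer (no disk spanning).

```python
import struct

# Fixed 46-byte part of a central file header, little-endian.
CENTRAL_HEADER = struct.Struct("<I6H3I5H2I")
CENTRAL_SIGNATURE = 0x02014B50


def iter_central_directory(buf, offset, count):
    """Yield one dict per central file header, starting at buf[offset]."""
    for _ in range(count):
        fields = CENTRAL_HEADER.unpack_from(buf, offset)
        if fields[0] != CENTRAL_SIGNATURE:
            raise ValueError("not a central file header")
        name_len, extra_len, comment_len = fields[10:13]
        name_start = offset + CENTRAL_HEADER.size
        yield {
            "version_made_by": fields[1],
            "flags": fields[3],
            "method": fields[4],
            "crc32": fields[7],
            "compressed_size": fields[8],
            "uncompressed_size": fields[9],
            "filename": bytes(buf[name_start:name_start + name_len]),
            "local_header_offset": fields[16],
        }
        # Skip the three variable-size fields to reach the next header.
        offset = name_start + name_len + extra_len + comment_len
```
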
      End of central dir record:
 end of central dir signature    4 bytes  (0x06054b50)
 number of this disk             2 bytes
 number of the disk with the
 start of the central directory  2 bytes
 total number of entries in
 the central dir on this disk    2 bytes
 total number of entries in
 the central dir                 2 bytes
 size of the central directory   4 bytes
 offset of start of central
 directory with respect to
 the starting disk number        4 bytes
 zipfile comment length          2 bytes
 zipfile comment (variable size)
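
   Because the end of central dir record is the last structure in the
   file and carries a variable-size comment, a reader usually has to
   search for it from the end.  The following Python sketch shows one
   way to do that; find_end_of_central_dir is a hypothetical name, and
   the sketch assumes a single-disk archive held entirely in memory.

```python
import struct

# Fixed 22-byte part of the end of central dir record, little-endian.
EOCD = struct.Struct("<IHHHHIIH")
EOCD_SIGNATURE = b"PK\x05\x06"  # 0x06054b50 as stored on disk


def find_end_of_central_dir(data):
    """Search backwards for the end record and return its fields."""
    # The record cannot start earlier than 22 + 65,535 bytes from the end.
    start = max(0, len(data) - (EOCD.size + 0xFFFF))
    end = len(data)
    while True:
        pos = data.rfind(EOCD_SIGNATURE, start, end)
        if pos == -1:
            raise ValueError("end of central dir record not found")
        if pos + EOCD.size <= len(data):
            (_, disk, cd_disk, entries_here, entries_total,
             cd_size, cd_offset, comment_len) = EOCD.unpack_from(data, pos)
            # Accept the candidate only if its comment ends the file.
            if pos + EOCD.size + comment_len == len(data):
                return {
                    "disk": disk,
                    "cd_start_disk": cd_disk,
                    "entries_on_disk": entries_here,
                    "entries_total": entries_total,
                    "cd_size": cd_size,
                    "cd_offset": cd_offset,
                    "comment": bytes(data[pos + EOCD.size:]),
                }
        # Otherwise keep looking for an earlier occurrence.
        end = pos + len(EOCD_SIGNATURE) - 1
```

   Checking that the comment length reaches exactly to the end of the
   data guards against a signature that merely happens to appear inside
   the zipfile comment.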

  D.  Explanation of fields:
      version made by (2 bytes)
   The upper byte indicates the host system (OS) for the
   file.  Software can use this information to determine
   the line record format for text files etc.  The current
   mappings are:
   0 - MS-DOS and OS/2 (F.A.T. file systems)
   1 - Amiga                     2 - VAX/VMS
   3 - *nix                      4 - VM/CMS
   5 - Atari ST                  6 - OS/2 H.P.F.S.
   7 - Macintosh                 8 - Z-System
   9 - CP/M                      10 thru 255 - unused

   The lower byte indicates the version number of the
   software used to encode the file.  The value/10
   indicates the major version number, and the value
   mod 10 is the minor version number.
      version needed to extract (2 bytes)
   The minimum software version needed to extract the
   file, mapped as above.
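
   As a small illustration of the two-byte encoding described above,
   this Python sketch splits the value into host system, major version
   and minor version.  HOST_SYSTEMS and decode_version_made_by are
   hypothetical names; the table simply restates the mappings listed
   above.

```python
HOST_SYSTEMS = {
    0: "MS-DOS and OS/2 (F.A.T. file systems)",
    1: "Amiga",
    2: "VAX/VMS",
    3: "*nix",
    4: "VM/CMS",
    5: "Atari ST",
    6: "OS/2 H.P.F.S.",
    7: "Macintosh",
    8: "Z-System",
    9: "CP/M",
}


def decode_version_made_by(value):
    """Return (host system name, major version, minor version)."""
    host = (value >> 8) & 0xFF   # upper byte: host system
    version = value & 0xFF       # lower byte: software version
    return HOST_SYSTEMS.get(host, "unused"), version // 10, version % 10
```

   For example, the value 0x0314 has upper byte 3 and lower byte 20,
   so it decodes as *nix, version 2.0.
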
      general purpose bit flag: (2 bytes)
   bit 0: If set, indicates that the file is encrypted.
   (For Method 6 - Imploding)
   bit 1: If the compression method used was type 6,
   Imploding, then this bit, if set, indicates
   an 8K sliding dictionary was used.  If clear,
   then a 4K sliding dictionary was used.
   bit 2: If the compression method used was type 6,
   Imploding, then this bit, if set, indicates
   that 3 Shannon-Fano trees were used to encode the
   sliding dictionary output.  If clear, then 2
   Shannon-Fano trees were used.
   (For Method 8 - Deflating)
   bit 2  bit 1
     0      0    Normal (-en) compression option was used.
     0      1    Maximum (-ex) compression option was used.
     1      0    Fast (-ef) compression option was used.
     1      1    Super Fast (-es) compression option was used.

   Note:  Bits 1 and 2 are undefined if the compression
   method is any other.
   (For method 8)
   bit 3: If this bit is set, the fields crc-32, compressed size
   and uncompressed size are set to zero in the local
   header.  The correct values are put in the data descriptor
   immediately following the compressed data.
   The upper three bits are reserved and used internally
   by the software when processing the zipfile.  The
   remaining bits are unused.
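
   The bit assignments above can be sketched in Python as follows.
   decode_flags and DEFLATE_OPTIONS are hypothetical names, and the
   sketch interprets bits 1 and 2 only for methods 6 and 8, since they
   are undefined for any other method.

```python
# Keyed by (bit 2, bit 1), as in the table above.
DEFLATE_OPTIONS = {
    (0, 0): "normal",
    (0, 1): "maximum",
    (1, 0): "fast",
    (1, 1): "super fast",
}


def decode_flags(flags, method):
    """Interpret the general purpose bit flag for a compression method."""
    bit1 = (flags >> 1) & 1
    bit2 = (flags >> 2) & 1
    info = {
        "encrypted": bool(flags & 0x0001),            # bit 0
        "has_data_descriptor": bool(flags & 0x0008),  # bit 3
    }
    if method == 6:    # Imploding
        info["dictionary_size"] = 8192 if bit1 else 4096
        info["shannon_fano_trees"] = 3 if bit2 else 2
    elif method == 8:  # Deflating
        info["deflate_option"] = DEFLATE_OPTIONS[(bit2, bit1)]
    return info
```
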
      compression method: (2 bytes)
   (see accompanying documentation for algorithm
   descriptions)
   0 - The file is stored (no compression)
   1 - The file is Shrunk
   2 - The file is Reduced with compression factor 1
   3 - The file is Reduced with compression factor 2
   4 - The file is Reduced with compression factor 3
   5 - The file is Reduced with compression factor 4
   6 - The file is Imploded
   7 - Reserved for Tokenizing compression algorithm
   8 - The file is Deflated

      date and time fields: (2 bytes each)

   The date and time are encoded in standard MS-DOS format.
   If input came from standard input, the date and time are
   those at which compression was started for this data.
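
   The text above does not spell out the MS-DOS format.  The sketch
   below relies on the usual MS-DOS packing, which is stated here as
   background rather than taken from this document: the date holds
   year minus 1980 in bits 15-9, month in bits 8-5 and day in bits 4-0;
   the time holds hours in bits 15-11, minutes in bits 10-5 and seconds
   divided by two in bits 4-0.  dos_to_datetime is a hypothetical name.

```python
from datetime import datetime


def dos_to_datetime(dos_date, dos_time):
    """Convert the 2-byte MS-DOS date and time fields to a datetime."""
    year = ((dos_date >> 9) & 0x7F) + 1980
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = (dos_time >> 11) & 0x1F
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2  # stored with two-second resolution
    # datetime() raises ValueError for out-of-range fields (e.g. month 0).
    return datetime(year, month, day, hour, minute, second)
```

   Note that the two-second resolution means odd seconds cannot be
   represented.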

      CRC-32: (4 bytes)

   The CRC-32 algorithm was generously contributed by
   David Schwaderer and can be found in his excellent
   book "C Programmers Guide to NetBIOS" published by
   Howard W. Sams & Co. Inc.  The 'magic number' for
   the CRC is 0xdebb20e3.  The proper CRC pre and post
   conditioning is used, meaning that the CRC register
   is pre-conditioned with all ones (a starting value
   of 0xffffffff) and the value is post-conditioned by
   taking the one's complement of the CRC residual.
   If bit 3 of the general purpose flag is set, this
   field is set to zero in the local header and the correct
   value is put in the data descriptor and in the central
   directory.
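
   The pre- and post-conditioning can be illustrated with a bitwise
   Python sketch.  It assumes the commonly used reflected CRC-32
   polynomial 0xEDB88320, which is not stated in the text above, and
   the names crc32_register and crc32 are hypothetical.  A table-driven
   routine, or zlib.crc32 from the standard library, would normally be
   used instead; the bitwise form is shown only because it is short.

```python
import struct


def crc32_register(data, register=0xFFFFFFFF):
    """Run the CRC over data and return the raw register (no complement)."""
    for byte in data:
        register ^= byte
        for _ in range(8):
            if register & 1:
                register = (register >> 1) ^ 0xEDB88320
            else:
                register >>= 1
    return register


def crc32(data):
    """Pre-condition with all ones, post-condition with one's complement."""
    return crc32_register(data) ^ 0xFFFFFFFF


def residual_is_magic(data):
    """Check the 'magic number' property for data followed by its CRC."""
    stored = struct.pack("<I", crc32(data))  # CRC in low-byte/high-byte order
    return crc32_register(data + stored) == 0xDEBB20E3
```

   The last function shows where the 'magic number' comes from: running
   the register over the data followed by its stored CRC leaves the
   constant 0xdebb20e3 for any input.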

      compressed size: (4 bytes)
      uncompressed size: (4 bytes)

   The size of the file compressed and uncompressed,
   respectively.  If bit 3 of the general purpose bit flag
   is set, these fields are set to zero in the local header
   and the correct values are put in the data descriptor and
   in the central directory.

      filename length: (2 bytes)
      extra field length: (2 bytes)
      file comment length: (2 bytes)

   The length of the filename, extra field, and comment
   fields respectively.  The combined length of any
   directory record and these three fields should not
   generally exceed 65,535 bytes.  If input came from standard
   input, the filename length is set to zero.


      disk number start: (2 bytes)

   The number of the disk on which this file begins.

      internal file attributes: (2 bytes)
   The lowest bit of this field indicates, if set, that
   the file is apparently an ASCII or text file.  If not
   set, the file apparently contains binary data.
   The remaining bits are unused in version 1.0.

      external file attributes: (4 bytes)

   The mapping of the external attributes is
   host-system dependent (see 'version made by').  For
   MS-DOS, the low order byte is the MS-DOS directory
   attribute byte.  If input came from standard input, this
   field is set to zero.

      relative offset of local header: (4 bytes)

   This is the offset from the start of the first disk on
   which this file appears, to where the local header should
   be found.

      filename: (Variable)

   The name of the file, with optional relative path.
   The path stored should not contain a drive or
   device letter, or a leading slash.  All slashes
   should be forward slashes '/' as opposed to
   backwards slashes '\' for compatibility with Amiga
   and Unix file systems etc.  If input came from standard
   input, there is no filename field.
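
   As a rough illustration of the naming rules above, this Python
   sketch turns a host path into a name suitable for storing.
   to_stored_name is a hypothetical helper, and it assumes that a drive
   or device letter always takes the form of a single letter followed
   by a colon.

```python
def to_stored_name(path):
    """Return path with forward slashes, no drive letter, no leading slash."""
    name = path.replace("\\", "/")
    # Drop a leading drive letter such as "C:".
    if len(name) >= 2 and name[0].isalpha() and name[1] == ":":
        name = name[2:]
    return name.lstrip("/")
```
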

      extra field: (Variable)

   This is for future expansion.  If additional information
   needs to be stored in the future, it should be stored
   here.  Earlier versions of the software can then safely
   skip this field, and find the next file or header.  This
   field will be 0 length in version 1.0.

   In order to allow different programs and different types
   of information to be stored in the 'extra' field in .ZIP
   files, the following structure should be used for all
   programs storing data in this field:

   header1+data1 + header2+data2 . . .

   Each header should consist of:

     Header ID - 2 bytes
     Data Size - 2 bytes

   Note: all fields stored in Intel low-byte/high-byte order.

   The Header ID field indicates the type of data that is in
   the following data block.

   Header ID's of 0 thru 31 are reserved for use by PKWARE.
   The remaining ID's can be used by third party vendors for
   proprietary usage.

   The current Header ID mappings are:

   0x0007        AV Info
   0x0009        OS/2
   0x000c        VAX/VMS

   The Data Size field indicates the size of the following
   data block. Programs can use this value to skip to the
   next header block, passing over any data blocks that are
   not of interest.

   Note: As stated above, the size of the entire .ZIP file
   header, including the filename, comment, and extra
   field should not exceed 64K in size.

   In case two different programs should appropriate the same
   Header ID value, it is strongly recommended that each
   program place a unique signature of at least two bytes in
   size (and preferably 4 bytes or bigger) at the start of
   each data area.  Every program should verify that its
   unique signature is present, in addition to the Header ID
   value being correct, before assuming that it is a block of
   known type.
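
   The header1+data1 + header2+data2 structure can be walked with a
   few lines of Python.  This is a minimal sketch: iter_extra_blocks is
   a hypothetical name, and the choice to raise an error on a truncated
   block is an assumption of the sketch rather than something required
   above.

```python
import struct


def iter_extra_blocks(extra):
    """Yield (header_id, data) for each block in an extra field."""
    pos = 0
    while pos + 4 <= len(extra):
        # Header ID and Data Size, both 2 bytes, low byte first.
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            raise ValueError("truncated extra field block")
        yield header_id, extra[pos:pos + size]
        pos += size  # Data Size lets us skip blocks we do not understand
```

   A reader interested only in, say, the VAX/VMS block would iterate
   and ignore every pair whose header_id is not 0x000c.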

  -VAX/VMS Extra Field:

   The following is the layout of the VAX/VMS attributes "extra"
   block.  (Last Revision 12/17/91)

   Note: all fields stored in Intel low-byte/high-byte order.

           Value         Size            Description
           -----         ----            -----------
   (VMS)   0x000c        Short           Tag for this "extra" block type
           TSize         Short           Size of the total "extra" block
           CRC           Long            32-bit CRC for remainder of the block
           Tag1          Short           VMS attribute tag value #1
           Size1         Short           Size of attribute #1, in bytes
           (var.)        Size1           Attribute #1 data
           .
           .
           .
           TagN          Short           VMS attribute tag value #N
           SizeN         Short           Size of attribute #N, in bytes
           (var.)        SizeN           Attribute #N data

   Rules:

   1. There will be one or more attributes present, which will
      each be preceded by the above TagX & SizeX values.  These
      values are identical to the ATR$C_XXXX and ATR$S_XXXX constants
      which are defined in ATR.H under VMS C.  Neither of these values
      will ever be zero.

   2. No word alignment or padding is performed.

   3. A well-behaved PKZIP/VMS program should never produce more than
      one sub-block with the same TagX value.  Also, there will never
      be more than one "extra" block of type 0x000c in a particular
      directory record.

      file comment: (Variable)

   The comment for this file.

      number of this disk: (2 bytes)

   The number of this disk, which contains the central
   directory end record.

      number of the disk with the start of the central directory: (2 bytes)

   The number of the disk on which the central
   directory starts.

      total number of entries in the central dir on this disk: (2 bytes)

   The number of central directory entries on this disk.

      total number of entries in the central dir: (2 bytes)

   The total number of files in the zipfile.


      size of the central directory: (4 bytes)

   The size (in bytes) of the entire central directory.

      offset of start of central directory with respect to
      the starting disk number:  (4 bytes)

   Offset of the start of the central directory on the
   disk on which the central directory starts.

      zipfile comment length: (2 bytes)

   The length of the comment for this zipfile.

      zipfile comment: (Variable)

   The comment for this zipfile.


  E.  General notes:
      1)  All fields unless otherwise noted are unsigned and stored
   in Intel low-byte:high-byte, low-word:high-word order.

      2)  String fields are not null terminated, since the
   length is given explicitly.

      3)  Local headers should not span disk boundaries.  Also, even
   though the central directory can span disk boundaries, no
   single record in the central directory should be split
   across disks.

      4)  The entries in the central directory may not necessarily
   be in the same order that files appear in the zipfile.
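
   Note 1 can be made concrete with a short Python sketch that
   assembles values by hand in low-byte:high-byte, low-word:high-word
   order.  read_u16 and read_u32 are hypothetical names; in practice
   struct.unpack with the "<H" and "<I" formats does the same job.

```python
def read_u16(buf, offset):
    """Unsigned 16-bit value: low byte first, then high byte."""
    return buf[offset] | (buf[offset + 1] << 8)


def read_u32(buf, offset):
    """Unsigned 32-bit value: low word first, then high word."""
    return read_u16(buf, offset) | (read_u16(buf, offset + 2) << 16)
```

   Read this way, the four bytes "PK" 0x03 0x04 at the start of a local
   file header give the signature 0x04034b50.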
