File format
A file format is a particular way to encode data for storage in a computer file.
Since hard drives store only bits, the computer must have some way of converting information to 0s and 1s. There are different kinds of formats for different kinds of information. Within each kind, however, there are usually several competing formats.
The format of a file is typically indicated by a suffix (the "file extension") of two to four letters appended to the file's name after a period. For example, a picture stored in the JPEG format might be named mypicture.jpeg or the like.
Some operating systems, such as older versions of Mac OS, did not require file extensions, but instead kept file type and creator data that was hidden from the user and managed transparently by the operating system. On Microsoft Windows computers, extensions are required for applications to be recognised as executable (and many applications require them to recognise specific data formats). On Unix and Unix-like systems, a file may be given an extension, but this is optional; the use of extensions on these systems is seen as a convenience and not a requirement. On these systems, every file is essentially treated as a data file, a directory (which is itself a special kind of file), or an executable.
Operating system settings determine which program is run by default when a file with a particular extension is "opened". For example, if a file has the extension .htm, the settings determine whether a browser is used to interpret the HTML (and which one), or an editor or text viewer that displays the HTML code.
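The lookup described above can be pictured as a table from extensions to programs. The following Python sketch is illustrative only: the ASSOCIATIONS table, the program names in it, and the default_program_for helper are hypothetical and do not correspond to any real operating system's configuration.

```python
import os.path

# Hypothetical association table; a real operating system keeps
# this information in its own settings, not in a Python dictionary.
ASSOCIATIONS = {
    ".htm": "web-browser",
    ".html": "web-browser",
    ".txt": "text-editor",
}


def default_program_for(filename, associations=ASSOCIATIONS):
    """Return the program associated with the file's extension, or None."""
    _, extension = os.path.splitext(filename)
    # Compare case-insensitively, so NOTES.TXT and notes.txt behave alike.
    return associations.get(extension.lower())
```

In this sketch, changing the entry for ".htm" from "web-browser" to "text-editor" plays the role of changing the setting: the same file would then be shown as HTML code rather than rendered as a page.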
Many file formats, and probably most well-known file formats, have a published specification document (often with a reference implementation) that describes exactly how the data is to be encoded, and which can be used to determine whether or not a particular program treats a particular file format correctly. There are two kinds of exception to this, however. First, some file format developers view their specification documents as trade secrets, and therefore do not release them to the public. Second, some file format developers never bother to write a specification document; rather, the format is defined only implicitly, through the program(s) that manipulate data in the format.
Using file formats without a publicly available specification is usually costly. Learning how the format works requires either (a) reverse-engineering it from a reference implementation or (b) acquiring the specification document for a fee from the format developers. (The second option, possible only when a specification document exists, typically requires signing a non-disclosure agreement.) Both options require significant time, money, or both. Therefore, as a general rule, file formats with publicly available specifications are supported by a large number of programs, while non-public formats are supported by only a few programs.
Some file formats are designed to store very particular sorts of data; the JPEG format, for example, is designed only to store still images. Other file formats, however, are designed for storage of several different types of data; the GIF format supports storage of both pictures and simple animations, and the AVI format can support many different types of multimedia.
A text file stores text or numbers with a one-to-one correspondence between its bytes and ordinary readable characters, such as letters and digits, plus some control characters. Its extension may be .txt, or something more specific, such as .par for a parameter file or .pas for a Pascal program. At the lower level, an HTML file is a text file; the "text" is the code for a web page, so at a higher level the file is a web page file.
Since programs see files as streams of data, some method is required to indicate the format of a file. One way to convey this metadata is with a file extension. Another is with out-of-band data, if the filesystem supports it. A third is in-band, within the file itself, using a distinctive sequence of bytes (often called the magic number).
For example, a GIF file can be recognized by its extension ".gif", by filesystem metadata about its type, or by its first four bytes, "GIF8".
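In-band detection can be sketched in a few lines of Python. This is a minimal illustration, not a complete detector: the function names sniff_format and sniff_file are made up for this example, and the JPEG and PNG signatures are included alongside the "GIF8" prefix only to show how the table grows.

```python
# Leading byte sequences ("magic numbers") for a few well-known formats.
MAGIC_NUMBERS = [
    (b"GIF8", "GIF"),                # prefix shared by GIF87a and GIF89a
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
]


def sniff_format(data):
    """Guess a format name from the first bytes of a file, or return None."""
    for magic, name in MAGIC_NUMBERS:
        if data.startswith(magic):
            return name
    return None


def sniff_file(path):
    """Read just enough of the file at `path` to check its magic number."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

Because the check looks only at the contents, it gives the same answer whether or not the file name carries the matching extension.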
It is sometimes possible to cause a program to read a file encoded in one format as if it were encoded in another format. With a bit of work, for example, a music playing program can be used to play a (specially modified) Microsoft Word document as if it were a song. The result does not sound very musical, however. This is so because a sensible arrangement of bits in one format is almost always nonsensical in another.
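The same effect can be shown on a small scale in Python. The sketch below takes bytes that are sensible as text and reinterprets them as 16-bit integers, roughly as a program expecting raw audio samples might. The choice of little-endian signed 16-bit samples is an assumption made for illustration, not a description of how any particular music player reads files.

```python
import struct

data = b"Hello, world"  # 12 bytes that make sense as text

# Interpreted as ASCII text, the bytes are readable.
as_text = data.decode("ascii")

# Interpreted as six little-endian signed 16-bit integers, the very same
# bytes become numbers with no meaningful relationship to one another.
as_samples = struct.unpack("<6h", data)

print(as_text)
print(as_samples)
```

Nothing in the bytes themselves says which reading is the intended one; only the format assumed by the reading program does.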
It is very difficult to make a principled distinction between a file format and a programming language, or between a "normal program" and a programming language interpreter. A programming language can be seen as a file format for storing algorithms, while even a simple image file viewer can be seen as an "interpreter" for, say, the GIF "language."
The most useful part of intellectual property law for protecting ownership of a file format appears to be patent law. Although patents for file formats are not permitted, some formats require encoding data with patented algorithms. For example, the GIF file format requires use of a patented algorithm; at first the patent owner did not collect fees for use of the algorithm, but later began to do so. This has resulted in a significant decrease in the use of GIFs.
See also: list of file formats, graphics file format, audio file format, video file format, object file format






