Filename extension
A filename extension is a suffix of (usually) alphanumeric characters appended to the end of a filename so that computer users, as well as software on the computer system, can quickly determine the type of data stored in the file. It is one of several popular methods for distinguishing between file formats.
With the advent of the GUI, the question arose of how file management should tie into interface behavior. The Windows platform allowed multiple applications to be associated with a given file type, with different file "actions" (opening, editing, viewing, and so forth) offered by means of a context menu.
File managers such as Windows Explorer can have applications assigned for almost every filename extension: for example, a text editor for .txt, a word processor for .doc or .odt, a web browser for .htm or .html, a PDF viewer or editor for .pdf, a graphics program for .png, .gif or .jpg, and a spreadsheet program for .xls or .ods.
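The association between extensions, actions, and applications can be pictured as a lookup table. The following is a minimal sketch only: the table contents, the action names, and the function `application_for` are hypothetical and do not reflect how any particular file manager stores its associations.

```python
from pathlib import PurePath

# Hypothetical association table: extension -> {action: application}.
# The entries mirror the examples in the text; the "edit" action is an assumption.
ASSOCIATIONS = {
    ".txt": {"open": "text editor"},
    ".doc": {"open": "word processor"},
    ".odt": {"open": "word processor"},
    ".htm": {"open": "web browser", "edit": "text editor"},
    ".html": {"open": "web browser", "edit": "text editor"},
    ".pdf": {"open": "PDF viewer"},
    ".png": {"open": "graphics program"},
    ".xls": {"open": "spreadsheet program"},
}


def application_for(filename, action="open"):
    """Return the application registered for the file's extension and action, or None."""
    # Extensions are compared case-insensitively, as on DOS and Windows.
    ext = PurePath(filename).suffix.lower()
    return ASSOCIATIONS.get(ext, {}).get(action)


print(application_for("Report.DOC"))
print(application_for("index.html", action="edit"))
print(application_for("README"))
```

A file with no extension, or with an unregistered one, simply has no associated application in this sketch.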
Under Microsoft's operating systems DOS and Windows, some extensions, including .exe, .com, .bat, and .cmd, indicate that a file is an executable. This differs from Unix-like operating systems, where filename extensions are optional for executables and file permissions determine whether a file is executable.
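The two rules can be contrasted in a short sketch. The function names and the extension set below are illustrative assumptions (the set contains only the extensions named in the text), and the mode check looks at the permission bits alone, ignoring everything else an operating system considers when running a file.

```python
import os
import stat

# Only the extensions named in the text; real systems recognize more.
WINDOWS_EXECUTABLE_EXTENSIONS = {".exe", ".com", ".bat", ".cmd"}


def looks_executable_by_extension(filename):
    """DOS/Windows-style rule: decide from the filename extension."""
    _, ext = os.path.splitext(filename)
    return ext.lower() in WINDOWS_EXECUTABLE_EXTENSIONS


def is_executable_by_mode(mode):
    """Unix-style rule: True if any execute bit is set in the file mode."""
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


print(looks_executable_by_extension("SETUP.EXE"))
print(is_executable_by_mode(0o755))
print(is_executable_by_mode(0o644))
```

For a real file on a Unix-like system, the mode would come from `os.stat(path).st_mode`; the name of the file plays no part in the second check.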
Filename extensions have been in use for decades, but they became ubiquitous largely because the file systems included with DOS and Windows imposed severe limitations on filenames for many years, which strongly encouraged their use. A filename extension can be considered a type of metadata, and it is one of the most visible pieces of such information on modern computer systems.
Mac OS dispensed with filename extensions entirely, instead using a file type code to identify the file format. Additionally, a creator code was specified to determine which application would be launched when the file's icon was double-clicked.
Mac OS X, however, uses filename extensions as a consequence of being derived from the Unix-like NEXTSTEP, which did not have type or creator code support in its file system.
Filename extensions were used in Digital Equipment Corporation (DEC) operating systems (for example, TOPS-10, OS/8 and RT-11). CP/M adopted the convention and MS-DOS, as a re-implementation of CP/M, did so as well.
The DEC operating systems internally split the filename into a "base name" and a filename extension, with the base name limited to five to eight characters and the extension limited to two or three characters; when a name and extension were typed in commands, a period (.) was placed between them. CP/M worked the same way: the filename was limited to eight characters and the filename extension to three, with a period between them. Early versions of the FAT filesystem used in MS-DOS and Microsoft Windows imposed the same limitations. This is sometimes referred to as the "8.3" convention, and since the word filename is eight letters long and ext is a reasonable abbreviation for extension, it can be generalized as:
FILENAME.EXT
In a file listing, the base name and extension would be separated by spaces, much like this:
 Volume in drive A: is LINUX BOOT
 Volume Serial Number is 2410-07EF
 Directory for A:\
LDLINUX  SYS      5480 1999-04-19 23:24
VMLINUZ         530921 1999-04-19 23:24
BOOT     MSG       559 1999-04-19 23:24
EXPERT   MSG       668 1999-04-19 23:24
GENERAL  MSG       986 1999-04-19 23:24
KICKIT   MSG       979 1999-04-19 23:24
PARAM    MSG       875 1999-04-19 23:24
RESCUE   MSG      1020 1999-04-19 23:24
SYSLINUX CFG       420 1999-04-19 23:24
INITRD   IMG    878502 1999-04-19 23:24
       10 files      1,420,410 bytes
                        35,840 bytes free
This use of spaces often led to confusion among novice DOS users, who thought of the "." as part of the file's identifier, rather than merely a convention for separating the two components of that identifier.
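The relationship between the dotted form typed in commands and the space-padded form shown in a listing can be sketched as follows. This is an illustration under simplifying assumptions: the function names are hypothetical, only the length limits are checked (not the set of characters allowed in names), and names are folded to upper case.

```python
def split_8_3(name):
    """Split 'BASE.EXT' into (base, ext) under the 8.3 length limits.

    Raises ValueError if the name does not fit the convention.
    """
    base, _dot, ext = name.partition(".")
    # At most one dot, 1-8 characters of base name, 0-3 characters of extension.
    if not base or len(base) > 8 or len(ext) > 3 or "." in ext:
        raise ValueError(f"not a valid 8.3 name: {name!r}")
    return base.upper(), ext.upper()


def listing_entry(name):
    """Format a name as two space-padded fields, with no dot between them."""
    base, ext = split_8_3(name)
    return f"{base:<8} {ext:<3}"


for name in ["ldlinux.sys", "boot.msg", "syslinux.cfg"]:
    print(listing_entry(name))
```

The dot never appears in the two fields; it is only the separator used when the name is written on one line, which is why a name such as `index.html` cannot be represented under these limits.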
The filename extension was originally used to easily determine the file's generic type. The need to condense a file's type into three characters frequently led to terse, sometimes cryptic extensions. Examples include .GFX for graphics files, .TXT for plain text, and .MUS for music. However, because many different programs have been written that handle these data types (and others) in a variety of ways, filename extensions became closely associated with particular products, and even with specific product versions. For example, early WordStar files used .WS or .WSn, where n was the program's version number. Extensions also began to collide between unrelated formats. One example is .rpm, used by both the RPM Package Manager and RealPlayer (for Real Player Media files); another is .qif, shared by Quicken Interchange Format files (financial ledgers) and QuickTime Image Format files (pictures).
As time went on, hundreds of different extensions came into use, as software developers invented more and more file formats. This led to reference manuals being published, devoted entirely to listing the extensions and the type (or types) of data that might be found in files so named. These issues led to the need for alternative systems with significantly lower chances of conflicts.
Some other operating systems that used filename extensions, such as Multics and Unix, generally had much more liberal rules for filenames. Many allowed full filename lengths of 14 or more characters, and maximum name lengths of up to 255 characters were not uncommon. The file systems on those operating systems stored the file name as a single string, not split into base name and extension components, with the '.' being just another character allowed in file names. Those systems therefore generally allowed variable-length filename extensions, and also tended to allow more than one dot. Some components of Multics and Unix, and applications running on them, used extensions in some cases to indicate file types, but they relied on them less: for example, programs and ordinary text files had no extensions in their names.
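Because the name is a single string on such systems, any notion of an "extension" is imposed by the software reading the name. As a small illustration, Python's standard `pathlib` module applies one such interpretation; the helper `describe` and the sample names are hypothetical.

```python
from pathlib import PurePosixPath


def describe(name):
    """Return (last suffix, list of all suffixes) for a filename."""
    p = PurePosixPath(name)
    return p.suffix, p.suffixes


# Variable-length extensions, several dots, and no extension at all
# are all ordinary names when the dot is just another character.
for name in ["archive.tar.gz", "notes.markdown", "Makefile"]:
    suffix, suffixes = describe(name)
    print(f"{name!r}: suffix={suffix!r} suffixes={suffixes!r}")
```

Here `archive.tar.gz` yields two suffixes and `Makefile` yields none; neither case could be expressed under a fixed name/extension split.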
The High Performance File System (HPFS), used in Microsoft and IBM's OS/2, also supported long file names and did not divide the file name into a name and an extension. However, the convention of using extensions continued, even though HPFS supported extended attributes for files, allowing a file's type to be stored with the file as an extended attribute.
In addition, NTFS, the native file system of Microsoft's Windows NT, supported long file names and did not divide the file name into a name and an extension. Again, the convention of using extensions continued, for compatibility with existing versions of Windows.
Eventually, Microsoft introduced support for long file names, and removed the 8.3 name/extension split in file names, in an extended version of the commonly used FAT file system called VFAT. VFAT first appeared in Windows NT 3.5 and Windows 95. The internal implementation of long file names in VFAT is widely regarded as a kludge, but it removed the important length restriction and allowed files to have a mix of upper case and lower case letters on machines that would not run Windows NT well. However, the use of three-character extensions under Windows has continued, originally for backward compatibility with older versions of Windows and now by habit, along with the problems it creates.
As the Internet age arrived, it was possible to tell who was using Windows systems to edit their web pages and who used Macintosh or Unix computers, since the Windows users were generally restricted to ending their web page filenames in .HTM (instead of .html). The three-character limit also became a problem for programmers experimenting with the Java programming language, since it required source code files to have the four-letter extension .java and compiled output files to have the five-letter .class extension.
Depending on the settings of the shell or file browser, the file extension may not be shown. Malicious users who spread a computer virus or computer worm may use a file name like LOVE-LETTER-FOR-YOU.TXT.vbs, which then shows up as LOVE-LETTER-FOR-YOU.TXT if the user has file extensions hidden (which is the default behavior of Windows Explorer). To such a user, the file may look like a harmless text file rather than a potentially dangerous computer program written in VBScript.
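The disguise relies on the displayed name still appearing to carry an extension once the real one is hidden. A minimal sketch of a check for this pattern follows; the function names and the set of script and program extensions are illustrative assumptions, not a list used by Windows or by any security product, and the sketch hides the final extension unconditionally.

```python
from pathlib import PurePath

# Illustrative assumption: a few extensions that denote runnable content.
RUNNABLE_EXTENSIONS = {".vbs", ".js", ".bat", ".cmd", ".exe", ".com"}


def displayed_name(filename):
    """Name as shown when the final extension is hidden."""
    return PurePath(filename).stem


def is_disguised(filename):
    """True if the real extension is runnable but the displayed name
    still appears to end in some other extension."""
    p = PurePath(filename)
    real_ext = p.suffix.lower()
    apparent_ext = PurePath(p.stem).suffix
    return real_ext in RUNNABLE_EXTENSIONS and apparent_ext != ""


name = "LOVE-LETTER-FOR-YOU.TXT.vbs"
print(displayed_name(name))
print(is_disguised(name))
```

A mail client or file browser could use a check of this kind to show the full name, or a warning, for files that match.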
Later Windows versions (starting with Windows XP Service Pack 2 and Windows Server 2003) include a customizable database of file types that can be considered dangerous in certain zones (including, but not limited to, downloads from the WWW and e-mail attachments). Applications can query this database, and a common API is provided for invoking antivirus programs. These mechanisms are meant to replace the often inconsistent, conflicting or weak mechanisms that existing applications already have in place, such as antivirus software that blacklists scripts as intrinsically dangerous, more so even than native executables. That approach works around a well-known weakness of blacklist-based (as opposed to heuristic) antivirus software: malware can evade detection by changing into a semantically equivalent form that differs enough from what the antivirus expects to go undetected. This technique, usually called polymorphism, is easier and more effective with scripting languages. In short, most antivirus software can only block known malware, which makes it ineffective against custom or not yet catalogued malware.
In network contexts, files are regarded as streams of bits and do not have filenames or filename extensions. In the internet protocol suite, the type of a given bitstream is encoded in the MIME Content-type of the stream, represented by a line of text in the header block that precedes the stream, such as:
Content-type: text/plain
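As an illustration of how a receiver can recover the type from such a header block rather than from a filename, the following sketch uses the parser in Python's standard `email` package. The sample stream and the helper `parse_stream` are made up for the example.

```python
from email import message_from_string


def parse_stream(raw):
    """Return (content type, body) for a header block followed by data."""
    msg = message_from_string(raw)
    return msg.get_content_type(), msg.get_payload()


# A made-up stream: one header line, a blank line, then the data.
raw = "Content-type: text/plain\n\nHello, world.\n"
content_type, body = parse_stream(raw)
print(content_type)
print(body)
```

No filename is involved at any point; the receiver learns the type from the header alone.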
BeOS, whose BFS file system supports extended attributes, would tag a file with its MIME Content-type as an extended attribute. The KDE and GNOME desktop environments associate a MIME Content-type with a file by examining the filename extension and examining the contents of the file, in the fashion of the file command, as a heuristic. They choose the application to launch when a file is opened based on the MIME Content-type, reducing the dependency on filename extensions.
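A heuristic of this general kind can be sketched by combining a check of the leading bytes with an extension lookup. This is not the algorithm used by KDE, GNOME, or the file command: the signature table is a small illustrative subset, the function `guess_mime_type` is hypothetical, and giving contents priority over the extension is an assumption made for the example.

```python
import mimetypes

# A few well-known leading byte signatures ("magic numbers").
MAGIC_NUMBERS = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def guess_mime_type(filename, head=b""):
    """Guess a MIME type from the first bytes of a file, then from its name."""
    # Step 1: contents. A recognized signature wins over the extension.
    for magic, mime in MAGIC_NUMBERS:
        if head.startswith(magic):
            return mime
    # Step 2: fall back to the extension via the standard library's table.
    mime, _encoding = mimetypes.guess_type(filename)
    # Step 3: unknown data is treated as an opaque byte stream.
    return mime or "application/octet-stream"


print(guess_mime_type("picture.txt", b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
print(guess_mime_type("notes.txt"))
print(guess_mime_type("mystery"))
```

In practice, `head` would hold the first few bytes read from the file. A PNG image misnamed `picture.txt` is still identified as an image in this sketch, which is the sense in which content inspection reduces the dependency on extensions.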
* List of file formats
* File format
* File (Unix)
* Metadata
* Type code
* Creator code
* Listings of common filename extensions:
** File Extension Archives
** File extensions database
** dotwhat
** Exhaustive list / related applications
** FILExt
** FileInfo.net
** Wotsit's Format
** Saugus.net's Filename Extensions Glossary
** PRONOM technical registry
* Microsoft Application Search (Add &ext= followed by an extension to search for that extension)
* Online TrID File Identifier
* DROID automatic format identification tool