We are often asked which file formats are best suited to long-term preservation and access. While the Borthwick Institute for Archives places no restrictions on the types of files that we collect, it is important to consider how your choices today will affect whether your files will be accessible and understandable in the future. In particular, the file format that you choose can have a significant impact on how easily your files can be shared and used. The following guidance is intended to help you identify file formats suitable for long-term preservation. Generally speaking, these file formats are open (their specifications are publicly available), widely used, and either uncompressed or losslessly compressed.
Please note: This information is intended to provide some guidance as potential donors create and manage their files. We do not ask that donors migrate all files to conform with these recommendations before transfer to the archives. Instead, staff at the Borthwick will work with you to make decisions regarding file formats.
Every file format has a specification detailing how its data is encoded and how software should interpret it. Where a format is owned and managed commercially, its specification may not be publicly available. As a result, continued access to proprietary file formats depends on ongoing support from the organisation that develops and manages them. If that support ends, whether because the organisation's priorities change or because it goes out of business, the lack of access to the specification increases the risk that the format will become obsolete. In contrast, organisations or communities committed to maintaining open formats make specifications and related documentation openly available to users. What’s more, open formats are often supported by multiple software applications and platforms.
The Borthwick does not recommend the use of proprietary file formats, as these formats may become difficult or impossible to access in the future. Examples of proprietary file formats include Photoshop's .PSD files and Paint Shop Pro's .PSP files (for still images) and RealMedia's .RM files (for audio and video).
To save storage space, many file formats use compression algorithms to reduce file size. Files are compressed in one of two ways: using either lossless or lossy compression. Lossless compression is reversible: it encodes the data more efficiently, typically by removing redundancy, so that the original can be reconstructed exactly when the file is opened. This reduces file size without compromising the quality of the file. In contrast, lossy compression permanently discards data that is judged to matter little to the end-user. Because discarded data cannot be recovered, the quality of a lossy file can deteriorate further each time it is edited and re-saved (re-encoded) in a lossy format.
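The difference between the two approaches can be illustrated in a few lines of Python. The sketch below uses the standard-library `zlib` module (DEFLATE, a lossless method) to show that decompression returns exactly the original bytes. The `toy_lossy` function is hypothetical and deliberately simplified, not a real codec: it rounds values down to a coarser step, so the original values cannot be recovered from its output. Real lossy formats such as JPEG or MP3 are far more sophisticated, but share this irreversibility.

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress with zlib (DEFLATE, a lossless method), then decompress again."""
    compressed = zlib.compress(data, level=9)
    return zlib.decompress(compressed)


def toy_lossy(samples: list[int], step: int) -> list[int]:
    """Illustration only, not a real codec: round each value down to a multiple
    of `step`. The finer detail is thrown away and cannot be restored."""
    return [(s // step) * step for s in samples]


original = b"Borthwick Institute for Archives. " * 50
print(len(original), "bytes before,", len(zlib.compress(original, level=9)), "bytes after")
assert lossless_round_trip(original) == original  # every byte is recovered

print(toy_lossy([12, 47, 95, 130], 10))  # → [10, 40, 90, 130]; 12, 47 and 95 are lost
```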
The file formats that you choose will depend largely on what is most suitable for the use and display of your content. The following recommendations are intended to help guide your decision-making.
| Content type | Format | Description |
| --- | --- | --- |
| Office documents and text-based files | PDF/A | Portable Document Format (Archival) |
| | PDF | Portable Document Format |
| | DOCX | MS Word Open XML Document (created in MS Office 2007 onwards) |
| | PPTX | MS PowerPoint Open XML Document (created in MS Office 2007 onwards) |
| | XLSX | MS Excel Open XML Document (created in MS Office 2007 onwards) |
| | ODT | OpenDocument Text Document (e.g. created in OpenOffice or LibreOffice) |
| | ODS | OpenDocument Spreadsheet (e.g. created in OpenOffice or LibreOffice) |
| | ODP | OpenDocument Presentation (e.g. created in OpenOffice or LibreOffice) |
| | TXT | Plain Text File (ANSI or UTF-8 encoded) |
| | RTF | Rich Text Format File |
| | XML | Extensible Markup Language Data File |
| | CSV | Comma-Separated Values File |
| | TSV | Tab-Separated Values File |
| Raster (or bitmap) image files | TIFF | Tagged Image File Format |
| | JPEG/JFIF | Joint Photographic Experts Group / JPEG File Interchange Format (lossy compression) |
| | JPEG 2000 | Joint Photographic Experts Group 2000 (with lossless compression) |
| | GIF | Graphics Interchange Format |
| | PNG | Portable Network Graphics |
| Vector image files | SVG | Scalable Vector Graphics File |
| Audio files | WAV | Waveform Audio File Format |
| | FLAC | Free Lossless Audio Codec File |
| | AIFF | Audio Interchange File Format |
| | MP3 | MPEG Audio Layer III (lossy compression) |
| Video files | AVI | Audio Video Interleave File (uncompressed) |
| | MOV | QuickTime Movie (uncompressed) |
| | MXF | Material Exchange Format |
| | MP4 | MPEG-4 Part 14 (with H.264 encoding) |
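For a large collection, a quick first check is to list files whose extensions do not match any of the formats above. The Python sketch below does this with a hypothetical `unlisted_files` helper. It looks only at file extensions, so it assumes files are correctly named, and it cannot tell PDF/A from ordinary PDF, confirm that an AVI or MOV file is uncompressed, or check that an MP4 uses H.264; a dedicated format identification tool is needed for that.

```python
from pathlib import PurePath

# Extensions for the recommended formats listed above. The alternative
# spellings (".jpg"/".jpeg", ".tif"/".tiff", ".aif"/".aiff") are assumptions
# about how such files are commonly named.
RECOMMENDED_EXTENSIONS = {
    # Office documents and text-based files (PDF/A also uses ".pdf")
    ".pdf", ".docx", ".pptx", ".xlsx", ".odt", ".ods", ".odp",
    ".txt", ".rtf", ".xml", ".csv", ".tsv",
    # Raster and vector images
    ".tif", ".tiff", ".jpg", ".jpeg", ".jp2", ".gif", ".png", ".svg",
    # Audio
    ".wav", ".flac", ".aif", ".aiff", ".mp3",
    # Video
    ".avi", ".mov", ".mxf", ".mp4",
}


def unlisted_files(names: list[str]) -> list[str]:
    """Return the names whose extension (compared case-insensitively)
    is not in RECOMMENDED_EXTENSIONS."""
    return [
        name for name in names
        if PurePath(name).suffix.lower() not in RECOMMENDED_EXTENSIONS
    ]


print(unlisted_files(["scan_001.TIF", "logo.psd", "notes.txt", "clip.rm"]))
# → ['logo.psd', 'clip.rm']
```

Files flagged this way are not necessarily a problem; as noted above, staff at the Borthwick will work with you to decide what, if anything, needs to change.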
Organise your files into meaningfully named folders with logical relationships.
Using unique, concise, and descriptive file and folder names will help ensure that you and other users are able to find the information that you need when you need it. Choose a file-naming system that is easy for you to use and manage, and which provides a short but meaningful description of the file or folder’s contents. Once you have chosen a file-naming system, be consistent in its application.
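Once you have chosen a naming system, a short script can help you apply it consistently. The Python sketch below checks a name against rules of the kind recommended in this guidance: at most 25 characters (assumed here to include the extension), no spaces, no special characters other than hyphens, underscores and a single full stop before the extension, and, if the name starts with a date, a valid YYYY-MM-DD date. The function names `check_file_name` and `dated_name` are hypothetical, and only unaccented ASCII letters are accepted, which may be stricter than you need.

```python
import re
from datetime import date

# Assumption: the 25-character limit applies to the whole name, extension included.
MAX_LENGTH = 25
# Matches anything other than ASCII letters, digits, full stops, underscores,
# spaces and hyphens (spaces are reported separately below).
SPECIAL = re.compile(r"[^A-Za-z0-9._ \-]")
DATE_PREFIX = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def check_file_name(name: str) -> list[str]:
    """Return the problems found in `name`; an empty list means none."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if " " in name:
        problems.append("contains spaces")
    special = sorted(set(SPECIAL.findall(name)))
    if special:
        problems.append("contains special characters: " + " ".join(special))
    if name.count(".") > 1:
        problems.append("contains more than one full stop")
    match = DATE_PREFIX.match(name)
    if match:
        try:
            # date() raises ValueError for impossible dates such as 2023-02-30.
            date(*(int(part) for part in match.groups()))
        except ValueError:
            problems.append("starts with an invalid YYYY-MM-DD date")
    return problems


def dated_name(day: date, description: str, extension: str) -> str:
    """Build a name such as '2023-02-02_minutes.docx' (hypothetical convention)."""
    return f"{day.isoformat()}_{description}.{extension}"


print(check_file_name("2023-02-02_minutes.docx"))       # → []
print(check_file_name("meeting notes.docx"))            # → ['contains spaces']
print(dated_name(date(2023, 2, 2), "minutes", "docx"))  # → 2023-02-02_minutes.docx
```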
Keep file names short: Long file names can cause problems for some software and operating systems, particularly when files are stored inside several nested folders. Keep file names under 25 characters. Where possible, abbreviate, truncate, and use acronyms, but only if those abbreviations and acronyms will be understood by other users.
Avoid certain special characters: Special characters (such as ^ ~ \ / : * < > | ! # % & £ $ , . ‘) are reserved or have special meanings in many operating systems and applications, so they may cause problems if they appear in your file names. The exceptions are hyphens ( - ) and underscores ( _ ), which are safe to use in file names, and the full stop that separates a file name from its extension (as in “report.docx”).
Do not put spaces in file names: Spaces in file names can also cause problems. Instead, use hyphens (“file-name”), underscores (“file_name”) or camel case (“fileName”).
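Dates at the beginning of a file name will enable chronological sorting: This provides a quick and easy way to sort your files by date, where appropriate. The recommended format for dates is YYYY-MM-DD (for example, 2nd February 2023 would be represented as 2023-02-02). Because the year comes first and each part has a fixed number of digits, sorting these file names alphabetically also sorts them chronologically.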
Where relevant, use file names for version control: Where you are creating multiple versions of a file, use the file name to keep track of those versions. For example: