Remove Identifying Metadata From Files

Categories: Defensive

Metadata is 'data about data' or 'information about information'. In the context of files, it is information that is automatically embedded in the file, and it can be used to deanonymize you. For example, an image file often contains metadata about when and where it was taken, what camera it was taken with, and so on. A PDF file may contain information about which program created it, on which computer, and so on. Investigators can use this to link a photo to the camera that took it, a video to the computer on which it was edited, and so on. To learn more about how metadata can be used to identify and reveal personal information, see Behind the Data: Investigating metadata. Before you put a sensitive file on the Internet, clean it of metadata.
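To make this concrete, here is a minimal sketch of where some of this metadata sits inside a JPEG file. It walks the segments that precede the image data and reports EXIF blocks, XMP blocks, and comments. The function name list_jpeg_metadata_segments is hypothetical and used only for illustration, and the parser is deliberately simplified (it assumes every segment before the image data carries a length field). It is not a cleaning tool, and an empty result does not prove that a file is free of metadata.

```python
import struct

EXIF_PREFIX = b"Exif\x00\x00"
XMP_PREFIX = b"http://ns.adobe.com/xap/1.0/\x00"


def list_jpeg_metadata_segments(data: bytes) -> list[str]:
    """Return labels for metadata-bearing segments found before the image data.

    Illustrative only: real files can hide metadata in places this
    simplified scan never looks at.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing start-of-image marker")

    found = []
    pos = 2  # skip the start-of-image marker
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # not at a marker, so stop rather than guess
        marker = data[pos + 1]
        if marker == 0xDA:
            break  # start of scan: compressed image data follows
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(EXIF_PREFIX):
            found.append("EXIF")  # camera model, timestamps, GPS, ...
        elif marker == 0xE1 and payload.startswith(XMP_PREFIX):
            found.append("XMP")  # editing software, history, ...
        elif marker == 0xFE:
            found.append("COMMENT")  # free-text comment segment
        pos += 2 + length
    return found
```

For a file on disk, the function could be called as list_jpeg_metadata_segments(open("photo.jpg", "rb").read()), where photo.jpg is a placeholder name.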

Metadata Anonymization Toolkit

Fortunately, there is a tool that comprehensively cleans metadata, and it is available as both a command line interface and a graphical user interface. The command line version is called mat2 and is open-source, and the graphical version is called Metadata Cleaner and is also open-source. Both programs are included in Tails and Qubes-Whonix by default.
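If you prefer to script the command line version, the following is a minimal sketch of a wrapper around the mat2 executable. It assumes that mat2 is on your PATH, that it accepts the --show and --lightweight options, and that it writes the cleaned copy next to the original with ".cleaned" inserted before the extension. The helper names (mat2_command, expected_cleaned_path, clean_file) are hypothetical. Check mat2 --help on your system before relying on any of these assumptions.

```python
import shutil
import subprocess
from pathlib import Path


def mat2_command(path, *, show=False, lightweight=False):
    """Build the argument list for a mat2 call (assumed flags, see above)."""
    if show and lightweight:
        raise ValueError("choose either show or lightweight, not both")
    cmd = ["mat2"]
    if show:
        cmd.append("--show")  # only list the metadata mat2 detects
    elif lightweight:
        cmd.append("--lightweight")  # superficial cleaning only
    cmd.append(str(path))
    return cmd


def expected_cleaned_path(path):
    """Guess the output name: photo.jpg -> photo.cleaned.jpg (assumption)."""
    p = Path(path)
    return p.with_name(f"{p.stem}.cleaned{p.suffix}")


def clean_file(path, *, lightweight=False):
    """Run mat2 on one file and return the path where the copy is expected."""
    if shutil.which("mat2") is None:
        raise RuntimeError("mat2 is not installed or not on PATH")
    subprocess.run(mat2_command(path, lightweight=lightweight), check=True)
    return expected_cleaned_path(path)
```

Calling clean_file("photo.jpg") would then run mat2 on a placeholder file named photo.jpg and return the path where the cleaned copy is expected. The original file is left in place, so remember to publish the cleaned copy and not the original.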

Using the Metadata Cleaner

If you are not comfortable with the command line, we recommend using Metadata Cleaner. It uses mat2 under the hood, so it has all the same functionality. The mat2 documentation argues that it is more thorough than Exiftool and other metadata removal software; see the comparison docs.

Metadata Cleaner shows the metadata it detects, but "it doesn't mean that a file is clean from any metadata if mat2 doesn't show any. There is no reliable way to detect every single possible metadata for complex file formats." You should clean the file even if no metadata is displayed.

To use the Metadata Cleaner, first add a file. When you click the file, its current metadata is displayed. Select the file, then select Clean, followed by Save. You can check the result by re-adding the cleaned file and viewing its metadata.

When you clean a PDF file, it is converted to images, so you can no longer select the text in it. If you want to retain this ability, there is a lightweight cleaning mode that cleans only the superficial metadata of the file, but not the metadata of embedded resources (such as images in the PDF). You can enable "lightweight mode" in the Metadata Cleaner settings. To avoid embedded resources that carry metadata, run Metadata Cleaner on the images before importing them into the layout software, and use layout software, such as Scribus, on Tails or Qubes-Whonix, where the metadata it writes is generic to those operating systems.

Note the limitations of Metadata Cleaner: "mat2 only removes metadata from your files, it does not anonymise their content, nor can it handle watermarking, steganography, or any too custom metadata field/system. If you really want to be anonymous, use file formats that do not contain any metadata, or better: use plain-text."

Photo and Video Forensics

While it is possible to remove all metadata from an image or video, forensic examination may still reveal what device was used to capture it. As the Whonix docs note:

Every camera's sensor has a unique noise signature because of subtle hardware differences. The sensor noise is detectable in the pixels of every image and video shot with the camera and could be fingerprinted. In the same way ballistics forensics can trace a bullet to the barrel it came from, the same can be accomplished with adversarial digital forensics for all images and videos. Note this effect is different from file metadata.

Multiple photos or videos from the same camera can be tied together in this way, and if the camera is recovered, it can be confirmed as the source of the files. For pictures or videos that require high security, you can buy a cheap camera from a store that sells refurbished devices and use it only once.
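The idea can be illustrated with a toy simulation. This is a sketch under strong simplifying assumptions, not a real forensic method: each "camera" is a fixed random per-pixel gain pattern, each "photo" is a flat scene in one dimension, and the "denoiser" is a simple moving average. All names and parameters are hypothetical. It shows why removing file metadata does not help here: the fingerprint is recovered from the pixel values alone.

```python
import random


def make_sensor_pattern(n, rng):
    """Fixed per-pixel gain error that stands in for one camera's sensor."""
    return [rng.gauss(0.0, 0.02) for _ in range(n)]


def take_shot(pattern, rng):
    """Simulate a flat scene seen through the sensor, plus random noise."""
    level = rng.uniform(80.0, 160.0)  # scene brightness for this shot
    return [level * (1.0 + k) + rng.gauss(0.0, 1.0) for k in pattern]


def noise_residual(pixels, radius=2):
    """Subtract a moving-average 'denoised' version to keep only the noise."""
    n = len(pixels)
    residual = []
    for i in range(n):
        window = pixels[max(0, i - radius):min(n, i + radius + 1)]
        residual.append(pixels[i] - sum(window) / len(window))
    return residual


def average(signals):
    """Average several residuals pixel by pixel to estimate a fingerprint."""
    return [sum(column) / len(column) for column in zip(*signals)]


def correlation(a, b):
    """Pearson correlation between two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)
camera_a = make_sensor_pattern(2000, rng)
camera_b = make_sensor_pattern(2000, rng)

# Fingerprints estimated from 20 known shots per camera.
fingerprint_a = average([noise_residual(take_shot(camera_a, rng)) for _ in range(20)])
fingerprint_b = average([noise_residual(take_shot(camera_b, rng)) for _ in range(20)])

# A new shot of unknown origin (secretly taken with camera A).
unknown = noise_residual(take_shot(camera_a, rng))
score_a = correlation(unknown, fingerprint_a)
score_b = correlation(unknown, fingerprint_b)
print(f"match with camera A: {score_a:.2f}, with camera B: {score_b:.2f}")
```

In this toy model the unknown shot correlates strongly with camera A's fingerprint and hardly at all with camera B's. Real sensor fingerprinting works on two-dimensional images with far more careful noise extraction, but the principle is the same.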

Printer Forensics

Many modern printers, in particular color laser printers, leave nearly invisible watermarks on each page that encode information such as the printer's serial number and when the page was printed. When printed material is scanned, these marks are present in the resulting file. To learn more, see Revealing Traces in Printouts and Scans and the Whonix documentation on printing and scanning.

Further Reading