Information and good practices for image files

Recommendations on how to handle and name your image files.

Why is this important?

When you handle many different files on a daily basis, their types, names and properties have a large impact on your workflow: while PDF and TXT files can be opened on any computer with most software, DOCX or PPTX files are hard to open without their proprietary software. As you have probably experienced many times, opening a DOCX document in Microsoft Word is not the same as opening it in Google Docs. Some formatting, fonts and other information might be lost or, even worse, misinterpreted.

Did you know, for example, that Microsoft Excel has a hard limit of 1,048,576 rows by 16,384 columns (and that if you open a file with more rows or columns than this limit, the excess is silently dropped from what Excel loads, so it is lost as soon as you save over the file)? Did you know that including a PNG image in a PDF file is pointless since they work on two different rendering technologies? Or did you know that converting a gray-scale TIFF image to an RGB color image for a presentation completely changes the structure and values of the image?

Given these few examples, you can see why it’s important to understand the properties of your files and how your software handles them. In this post we will focus on image files, and microscopy images more specifically, but some principles, as illustrated above, apply to any file type.

No lost information!

We should strive to preserve all the original information: data and metadata.

Image data
The data is composed of all the pixel values and structure contained in your image.
Image metadata
The metadata is composed of all the additional information about the image, usually revolving around the acquisition parameters of the microscope and some information regarding the sample or experimental conditions.

Of course we should aim to preserve all the information, but how could we lose it? Well, as explained above, this might happen when we save or open a file in an inappropriate manner!

When we save an image in JPEG format, no matter how careful we are, the JPEG algorithm will compress our data and it will change the pixel values (even if our eyes might not see it!). Similarly, exporting a CZI (Zeiss microscope proprietary format) file to a TIFF image will preserve all of our data but might actually lose some of the metadata in the process. For example we might not be able to keep track of how tiles should be put together or what the exact pixel size might be.

TIFFs and proprietary formats

There are many file formats available in fluorescence microscopy, but this variety mostly comes from how different companies have decided to store the metadata of their images, not so much the data.

Given that the data itself has the same structure in most formats, it is usually safe to export it to TIFF, the most widely used standard image file format and a good fit for small to medium files. TIFFs are basically tables in layers that contain as many rows and columns as there are pixels in our image. Each ‘cell’ contains a value equal to the pixel value at that row and column. If there are more channels, time-points or Z-slices, TIFF images contain extra layers, and each new kind of information adds a new dimension (3D for XYZ, 4D for XYZC, 5D for XYZCT etc..). TIFFs retain the raw pixel values of your images and are pretty fast to read (up to some size limits), so they are ideal to work with when analyzing your image data.
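To make the layered-table picture concrete, here is a minimal sketch that uses nested Python lists as a stand-in for the image array. The axis order (T, C, Z, row, column) and the tiny dimensions are assumptions chosen for illustration; real files and array libraries may order the axes differently.

```python
# Hypothetical tiny image: 2 time points, 3 channels, 1 Z-slice, 4 x 5 pixels.
T, C, Z, Y, X = 2, 3, 1, 4, 5

def make_stack(t, c, z, y, x, fill=0):
    """Build a nested list indexed as stack[t][c][z][row][col]."""
    return [[[[[fill for _ in range(x)]
               for _ in range(y)]
              for _ in range(z)]
             for _ in range(c)]
            for _ in range(t)]

stack = make_stack(T, C, Z, Y, X)

# Each 'cell' holds one pixel value: set row 2, column 3 of channel 1 at time 0.
stack[0][1][0][2][3] = 1200

plane = stack[0][1][0]          # one 2D layer (a table of rows and columns)
n_planes = T * C * Z            # number of 2D layers in the image
n_pixels = n_planes * Y * X     # total number of stored pixel values
print(len(plane), len(plane[0]), n_planes, n_pixels)
```

Every extra channel, time point or Z-slice adds whole 2D layers, which is why the number of stored values grows as a product of all dimensions.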

TIFF image structure
Export options
Make sure you don’t accidentally compress or normalize your data when exporting from other software. A TIFF can also hold compressed data, and lossy compression in particular must be avoided for quantitative analysis!

But of course there are exceptions: some file formats (.ims, .czi etc..) are arranged in a pyramidal structure, which means that they store your data at several nested levels, from low resolution to high resolution. The advantage of this approach is that you can load into memory and display only what’s needed: it’s the same principle Google Maps uses to display only the level of detail you need when you look at a map! This saves memory and time while preserving the possibility of browsing your images. However, due to this structure, pyramidal image formats are not always straightforward to open. For analysis, unless the image is too large, the easiest approach is to export the highest-resolution pyramid layer to TIFF.
Pyramidal image structure
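The pyramid idea can be sketched in a few lines. This is only an illustration under assumptions: each level halves the previous one by averaging 2x2 pixel blocks, and the function names are hypothetical. Actual formats such as .ims or .czi may use other downsampling factors and methods.

```python
def downsample_2x(image):
    """Halve width and height by averaging each 2x2 block of pixels."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1]
              + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) // 4
             for c in range(cols)]
            for r in range(rows)]

def build_pyramid(image, min_size=1):
    """Return levels from full resolution down to min_size pixels per side."""
    levels = [image]
    while min(len(levels[-1]), len(levels[-1][0])) // 2 >= min_size:
        levels.append(downsample_2x(levels[-1]))
    return levels

# Hypothetical 8 x 8 image with increasing pixel values.
full = [[r * 8 + c for c in range(8)] for r in range(8)]
pyramid = build_pyramid(full)
sizes = [(len(level), len(level[0])) for level in pyramid]
print(sizes)
```

A viewer would read a small level for the overview and fetch pieces of `pyramid[0]`, the full-resolution level, only when zooming in. For analysis it is this first level that you would export.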

The main risk of exporting images as TIFFs is the possible loss of metadata, since the TIFF format might not interpret or store correctly the metadata coming from the proprietary format. For this reason, it is strongly recommended to export as TIFF only the images needed for analysis, and to keep the original proprietary files so that the correct metadata remain available.

Keep original file
Always keep the original proprietary image file to preserve the metadata (.czi, .lif, .nd2 etc..). TIFFs are mostly for analysis unless you make sure the metadata are stored properly and correctly!

Bio-formats and OME-TIFF

Since there are so many different formats and standards, a large community has gathered around the Open Microscopy Environment (OME) initiative. They created the OME-TIFF format, which has the same advantages as a TIFF file but with a completely standardized metadata structure. This means that if the metadata structure can be translated correctly and completely from a proprietary file format to an OME-TIFF, then all metadata information is retained. This is basically the only really standardized approach to metadata organization, so it should be preferred if possible.

Bio-Formats provides tools to read many proprietary formats and convert them to other formats (including OME-TIFF, of course) through Fiji, CellProfiler, Icy, Python and many more. This means that most open-source software can open proprietary format images and could potentially convert them to a more standardized format. However, reading the metadata correctly is still a problem, so keeping the original file is still preferred.

OME also built OMERO, a pretty popular system to store, tag and handle images reproducibly and collaboratively. Finally, a next-generation file format (OME-NGFF) is in development to address the remaining disadvantages of the other formats and to cope better with large amounts of data.

Raster vs. Vector images

When we talk about image formats, we don’t talk only about microscopy formats but also about commonly found color image formats such as PNG, JPEG, GIF and SVG. Even if none of these formats is ideal for image analysis of microscopy images, they are often used in publications or public presentations. We cannot go into the details of all these formats, but a few differences are pretty obvious:

  • JPEG compression is lossy -> saves a lot of space but loses information
  • PNG compression is lossless -> takes up more space but no loss of information
  • GIFs are 8-bit images that can hold several frames to create animations
  • All above are examples of raster images while SVG, AI, EPS and PDF are vector images.
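The difference between lossless and lossy storage can be checked on a handful of pixel values. The sketch below uses Python's `zlib` module, which implements DEFLATE, the compression method PNG uses. The lossy part is only a stand-in: it rounds values down to a multiple of 8, which is not the JPEG algorithm but shows the same consequence, namely pixel values that cannot be recovered. The pixel values and the step size are assumptions made for the example.

```python
import zlib

# Hypothetical 8-bit pixel values standing in for one image row.
pixels = bytes([12, 13, 13, 14, 200, 201, 199, 200, 50, 51, 52, 53])

# Lossless: a DEFLATE round trip returns exactly the original bytes.
restored = zlib.decompress(zlib.compress(pixels))

# Lossy stand-in (not real JPEG): round every value down to a multiple of 8.
STEP = 8
quantized = bytes((v // STEP) * STEP for v in pixels)

# Count how many pixel values were altered by the lossy step.
changed = sum(1 for a, b in zip(pixels, quantized) if a != b)
print(restored == pixels, quantized == pixels, changed)
```

Comparing pixel values before and after a save, as done here with `changed`, is a quick way to check whether an export step altered your data.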

Raster images are what we commonly think of as images: a grid of pixels in which some values are stored. TIFFs, as described above, are raster images. In the case of a color image (usually RGB) we have 3 values for each pixel: a red, a green and a blue value. As you can see from the image below, if we zoom into a raster image we will eventually see the single pixels in a grid. If the grid has many pixels, the image will appear at higher resolution (with more detail), and if it has fewer it will appear blurrier (with less detail). That’s why some key parameters for raster images are the image size and the dpi/ppi (dots/pixels per inch).

Vector graphics are very different! They are not a grid of pixels but they store sets of coordinates for each shape in the image and they ‘reconstruct’ the final image by rendering those coordinates onto your screen. That’s why if you zoom in on a PDF file you will see the text and figures re-rendered at a higher resolution. In a vector image you will never see the individual pixels because the shapes are constantly re-rendered according to the zoom factor you are currently viewing them under.
Raster vs. vector comparison. *Note that this is a PNG image!
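As a rough illustration of the difference, the sketch below keeps a circle as a ‘vector’ description (three numbers) and renders it onto pixel grids of two sizes. The function name, the unit coordinate system and the grid sizes are assumptions for this example; real vector renderers are far more sophisticated.

```python
def rasterize_circle(cx, cy, radius, grid_size):
    """Render a circle given in unit coordinates (0..1) onto a square pixel grid."""
    grid = []
    for row in range(grid_size):
        y = (row + 0.5) / grid_size          # pixel centre in unit coordinates
        line = []
        for col in range(grid_size):
            x = (col + 0.5) / grid_size
            inside = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
            line.append(1 if inside else 0)
        grid.append(line)
    return grid

# The 'vector image' is just three numbers, independent of any resolution.
circle = {"cx": 0.5, "cy": 0.5, "radius": 0.4}

small = rasterize_circle(**circle, grid_size=8)    # coarse rendering
large = rasterize_circle(**circle, grid_size=64)   # same shape, re-rendered finer

for line in small:
    print("".join("#" if v else "." for v in line))
```

The description in `circle` never changes; only the rendering does. A raster file stores something like `small` or `large` and has no way back to the original shape.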
Recommended use of raster and vector graphics
  • Use PNG or JPEG for presentations or websites. Larger images are heavier but look better.
  • Use GIF for small animations in presentations or websites.
  • Use SVG or PDF for illustrations and figures in journals and other publications. If a raster format is required, use TIFF.
  • Don’t mix raster and vector images (for example in Illustrator or Inkscape). You will either have a nice-looking vector figure with a low-resolution raster image or you might as well export both as a raster image.

For further information and clarifications on file formats and how to use them see the excellent bioimagebook from Pete Bankhead!

Descriptive but simply-formatted names

Descriptive names

Non-descriptive names
  • test1 / test2
  • cells-nuclei-transfected
  • KO-mouse-immunostain
Descriptive names
  • Dish1-HeLa-plasmidX-10uMstimulation
  • Slide05-HumanHepatocytes-AbXXX-ATTO488-WGA-Alexa647
  • WellC2-KOgeneX-DAPI-2hIncubation

It’s important to keep the most relevant information in the file name, both because it is convenient to see it at a glance without having to open the file, and because some important information might not be stored in the file at all! For example, we might be able to find all the imaging settings in an image’s metadata, but nowhere would we find which cell line we used for the experiment!

Try to find a criterion and a structure for your file names and try to stick with them across multiple experiments.

Simple formatting names

Wrongly-formatted names
  • Josè KO/WT comparison 5.2µM PMA
  • HEKcells-fixée-5%PFA-replicate#3
  • Organoid-DAPI+JF585-20%488nm+50%561nm
Properly-formatted names
  • jose-KOgeneX-PMA5p2uM
  • HEKcells-5pPFAfixation-replicate3
  • organoid-DAPI-20p488nm-JF585-50p561nm

While descriptive, file names should be formatted as simply as possible. This means using only plain upper- and lower-case letters, digits, dashes (‘-’) and underscores (‘_’), and avoiding special letters (accented, superscript, etc..), special characters (‘.’, ‘/’, ‘%’, ‘@’, ‘$’, ‘#’, etc..) and spaces.

The main reason to avoid special characters is simple: they often already mean something to the system, or they are not recognized or read properly by some software. For example, slashes (‘/’ or ‘\’) are used in file paths (‘home/user/Documents’ or ‘C:\User\Documents’), while dots (‘.’) are used for file extensions (‘.png’, ‘.tif’). Spaces are also dangerous because they usually separate different strings of text: a string that should be a single filename is split into multiple strings, and only the first part might be recognized as the filename. Other, more unusual characters can be risky simply because they are not accepted by many processing or analysis tools (e.g. Excel cells or Python/R scripts etc..); a filename containing these characters would force you to rename files later on during processing, or to change the naming convention along the way, which is inconvenient and error-prone.
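A quick way to spot risky names is to list every character that falls outside a small whitelist. The sketch below assumes the whitelist is plain letters, digits, dashes and underscores, and that the name is given without its extension; the function name is hypothetical.

```python
import re

# Letters, digits, dashes and underscores only (an assumed whitelist).
SAFE_CHARACTER = re.compile(r"[A-Za-z0-9_-]")

def problem_characters(name):
    """Return the sorted list of characters in name outside the whitelist."""
    return sorted({ch for ch in name if not SAFE_CHARACTER.fullmatch(ch)})

print(problem_characters("HEKcells-fixée-5%PFA-replicate#3"))
print(problem_characters("HEKcells-5pPFAfixation-replicate3"))
```

An empty list means the name only uses characters from the whitelist. Running such a check over a folder before analysis catches problems while renaming is still cheap.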

Guidelines on file names
  • Try to always have a unique identifier for each sample/experiment (ID)
  • Try to group information between dashes or underscores (e.g. ID-StainingInfo-StimulationInfo-ImagingInfo)
  • Avoid too many dashes/underscores (and certainly spaces) by using the CamelCase notation (i.e. SequentialWordsButSeparatedByCapitalLetters)
  • Replace symbols and special characters with literal strings (+ -> plus, % -> perc, . -> p, etc..)
  • Include name and date either in file names or upstream folders
  • Always use the same structure across different files: it will be so much easier to parse names with analysis software later on
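The guidelines above can be sketched as a pair of helper functions: one that builds a name from its fields and one that parses it back. The four field names follow the ID-StainingInfo-StimulationInfo-ImagingInfo example; the function names and the sample values are hypothetical, and the sketch assumes no field contains a dash of its own.

```python
# Replacements suggested in the guidelines above.
REPLACEMENTS = {"+": "plus", "%": "perc", ".": "p"}

def clean_field(text):
    """Replace symbols with literal strings and drop spaces."""
    for symbol, literal in REPLACEMENTS.items():
        text = text.replace(symbol, literal)
    return text.replace(" ", "")

def build_name(sample_id, staining, stimulation, imaging):
    """Join the four fields with dashes: ID-Staining-Stimulation-Imaging."""
    fields = [sample_id, staining, stimulation, imaging]
    return "-".join(clean_field(f) for f in fields)

def parse_name(name):
    """Split a name built by build_name back into its labelled fields."""
    keys = ["id", "staining", "stimulation", "imaging"]
    return dict(zip(keys, name.split("-")))

name = build_name("WellC2", "DAPI", "PMA 5.2uM", "20% 488nm")
print(name)
print(parse_name(name))
```

Because every file follows the same structure, an analysis script can recover the experimental conditions from the name alone with a single `split`.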

Bottom line, keep file names simple and descriptive. It might be a bit harder to read but for a computer it’s much clearer.

An eye on the file size

Managing storage space for our data is always a struggle, so it might be tempting to delete some data or to opt to use file formats that save us some space. Let’s understand some basics.

Every pixel in a typical fluorescence microscopy image has an integer value (we cannot have fractions or negative amounts of photons) that goes between 0 (no photons) and $2^{\text{bit depth}} - 1$. The bit depth of an image is the number of bits used to represent each pixel: this can be seen as the “color range (or depth)” of each pixel. For example, an 8-bit image can have values between 0 and 255 in each of its pixels; overall it can ‘contain’ $2^8=256$ shades of gray. Similarly, a 16-bit image can contain $2^{16}=65536$ shades of gray, with values between 0 and 65535.

Now we have to take into account how large the image is and how many channels, time points, tiles and Z-slices it has. A typical 16-bit image ($D$ is the number of bytes per pixel: 1 for 8-bit, 2 for 16-bit etc..) might be 1024x1024 pixels ($S$ is the side length in pixels), with 3 channels ($C$) (DAPI, GFP and brightfield), 10 time points ($T$) (if it’s a live cell time-series) and one Z-slice ($Z$) (in this case we are not interested in a 3D volume but just in a single 2D plane of our sample). Knowing this, we can calculate the total size in bytes that this image should occupy on our disk.

$$ Size = ((S \times S) \times D) \times C \times T \times Z$$

Which in our example image described above would be: $((1024 \times 1024)\times 2) \times 3 \times 10 \times 1 = 62914560$ bytes, that is $62914560/1024 = 61440$ KB, equivalent to $61440/1024 = 60$ MB. So a 16-bit image 1024x1024 large with 3 channels and 10 time points will occupy 60 MB on our disk. Usually this will be a little more because the metadata will also take some space. Go ahead! Create an empty image in ImageJ (File->New->Image..) and test it yourself! Check out this Excel sheet to see some more examples of file size.
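The same calculation is easy to script. The function below is a minimal sketch of the formula above, with hypothetical parameter names; it returns the size of the uncompressed pixel data only and ignores metadata.

```python
def expected_size_bytes(side, bytes_per_pixel, channels, timepoints, z_slices):
    """Uncompressed pixel data size: ((S x S) x D) x C x T x Z, in bytes."""
    return side * side * bytes_per_pixel * channels * timepoints * z_slices

# The example from the text: 16-bit, 1024 x 1024, 3 channels, 10 time points, 1 Z-slice.
size = expected_size_bytes(side=1024, bytes_per_pixel=2,
                           channels=3, timepoints=10, z_slices=1)
print(size, "bytes")
print(size / 1024, "KB")
print(size / 1024 ** 2, "MB")
```

Doubling the side length to 2048 pixels quadruples the result, since the number of pixels grows with the square of the side.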

This whole explanation serves multiple purposes:

  1. If a file you have saved on your disk occupies less than the calculated space, something is wrong! Typically the file is compressed somehow. Compressed images occupy less space and can often appear of similar quality (JPEG), which is why they are very often used in photography. However, compression often means alteration of our data, and it’s usually very bad for quantitative fluorescence microscopy!
  2. If you save a 16-bit image at 8-bit depth, you squeeze the range from 65536 to 256 possible shades, which can notably reduce the contrast or clip (remove) values!
  3. If an image contains negative or non-integer values, it probably means it has been manipulated already somehow!
  4. Acquire only what you really need! It’s often tempting to acquire many channels, time points, tiles and Z-slices, but each of them multiplies the final file size. You can now calculate what the impact of your decisions will be on your final data storage strategy.
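Point 2 can be illustrated with a few hypothetical 16-bit values. The two conversions below, rescaling the full range and clipping at 255, are assumed strategies for the sake of the example; the software you use may map values differently, so check its export options.

```python
def max_value(bit_depth):
    """Largest integer an unsigned pixel of this bit depth can hold."""
    return 2 ** bit_depth - 1

def rescale_16_to_8(values):
    """Map the full 16-bit range onto 0..255 by integer division by 257."""
    return [v // 257 for v in values]

def clip_16_to_8(values):
    """Keep values as they are but cut everything above 255."""
    return [min(v, max_value(8)) for v in values]

# Hypothetical 16-bit pixel values.
pixels16 = [0, 100, 200, 300, 1000, 40000, 65535]

print(max_value(8), max_value(16))
print(rescale_16_to_8(pixels16))   # dim values collapse onto the same shade
print(clip_16_to_8(pixels16))      # bright values all become 255
```

Either way, distinct original values end up identical, and that information cannot be recovered from the 8-bit file.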

With light-sheet microscopes becoming the norm, data size skyrockets! Check out this illustration to get an idea of the scale of file sizes we are going to encounter soon!