Tuesday, November 5, 2013

Color Subsampling Notation

It's time to return to this blog. I'm getting started by updating some of the most popular posts and featuring content that still sparks conversations, like one I had today with a friend and colleague that inspired me to revisit this post (I originally posted this topic in 2008).

Color subsampling gets confused with color precision (8 bits per channel, 10 bits per channel, etc.) and with the number of color channels (does 4:2:0 mean there aren't ANY Cr channel samples?).

In actuality, the number is really best characterized as a ratio. (click on the chart for a large visual)


The “4” in the first slot is easiest to think of as a baseline of four pixels (and these ratios only apply to digital video signals). The first number represents the first channel of the RGB or Y'CbCr group.

The second and third numbers are frequently thought to represent the remaining two channels. Actually, the second number refers to the horizontal sampling frequency of both the second and third channels. The third number was originally intended to indicate the vertical sampling frequency of both, though the system was developed without really considering vertically subsampled schemes like 4:2:0.

In the current system, the third number is either the same as the second number, as in 4:2:2 and 4:1:1, or zero. When it matches the second number, there is no vertical subsampling: every column that has a horizontal color difference sample has one on every line. When the third number is zero, the “0” indicates that there is a 2:1 vertical subsample in addition to the horizontal color difference subsample.
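As a rough illustration of the rule just described, here is a minimal Python sketch that turns a notation string into horizontal and vertical subsampling factors for the color difference channels. The function name `chroma_factors` is made up for this example and does not come from any standard or library; it only handles the cases covered in this post (third number equal to the second, or zero).

```python
def chroma_factors(notation):
    """Return (horizontal, vertical) color difference subsampling factors.

    A factor of 1 means no subsampling; 2 means one sample per two pixels.
    """
    parts = [int(p) for p in notation.split(":")]
    if len(parts) < 3:
        raise ValueError("expected at least three numbers, e.g. '4:2:2'")
    baseline, second, third = parts[0], parts[1], parts[2]
    if second == 0 or baseline % second != 0:
        raise ValueError("second number must divide the baseline evenly")

    # Second number: horizontal sampling of BOTH color difference channels.
    horizontal = baseline // second

    # Third number: equal to the second means no vertical subsampling,
    # zero means a 2:1 vertical subsample.
    if third == second:
        vertical = 1
    elif third == 0:
        vertical = 2
    else:
        raise ValueError("unsupported third number in " + notation)
    return horizontal, vertical


for name in ("4:4:4", "4:2:2", "4:1:1", "4:2:0"):
    print(name, chroma_factors(name))
```

Read this way, 4:2:0 is not "no Cr samples" but a factor of two in each direction for both Cb and Cr.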



4:4:4
A designation of 4:4:4 means that there is a discrete sample for each of the three color channels at every pixel. 4:4:4 is most often seen with an RGB signal, but it can also refer to a Y'CbCr color sampling scheme.

RGB does not subsample one color channel in relation to another, so 4:2:2 (or 4:1:1, etc...) would never refer to RGB.

4:2:2
This ratio is most prevalent in high-end video formats. It refers to a discrete sample for Y’ (luma) on every pixel and one sample of each color difference signal for every two pixels. While in theory this sounds like the elimination of a lot of information (a third, actually) compared to 4:4:4, the human eye prioritizes the detail in the luma portion of the image, and most people would be hard pressed to see the difference between a Y'CbCr image in 4:4:4 and one in 4:2:2. In fact, 4:2:2 is good enough that most video types designated as “uncompressed” are actually color subsampled at 4:2:2.
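The "a third" figure is easy to check by counting samples. The sketch below is only arithmetic on the ratios, not a model of any real format; `samples_per_pixel` is a name invented for this example, and it assumes equal bit depth for every sample.

```python
from fractions import Fraction


def samples_per_pixel(baseline, second, third):
    """Average number of samples per pixel: one luma plus two color difference."""
    horizontal_rate = Fraction(second, baseline)
    # A zero in the third slot means a 2:1 vertical subsample as well.
    vertical_rate = Fraction(1, 2) if third == 0 else Fraction(1)
    return 1 + 2 * horizontal_rate * vertical_rate


full = samples_per_pixel(4, 4, 4)
for scheme in [(4, 2, 2), (4, 1, 1), (4, 2, 0)]:
    kept = samples_per_pixel(*scheme)
    saved = 1 - kept / full
    print(scheme, "samples per pixel:", kept, "reduction vs 4:4:4:", saved)
```

By the same count, 4:1:1 and 4:2:0 both come out at half the samples of 4:4:4.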

4:1:1
NTSC DV introduced us to this aggressive, lossy color subsampling scheme. For every four Y’ samples horizontally, there is only one sample each for Cb and Cr. This creates a 4x1 horizontal “block” of four pixels with common color difference values, though each pixel has a discrete Y’ value, so the pixels aren’t identical.
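To picture the 4x1 block, here is a toy Python sketch on a single row of Cb values. It simply keeps the first sample of each group of four and repeats it on reconstruction. That is an assumption made to keep the example short: real DV encoders and decoders filter and site their samples in their own ways, and the function names and sample values here are hypothetical.

```python
def subsample_row(chroma_row, factor=4):
    """Keep one color difference sample for every `factor` pixels."""
    return chroma_row[::factor]


def replicate_row(stored, factor=4, width=None):
    """Rebuild a full-width row by repeating each stored sample."""
    rebuilt = [value for value in stored for _ in range(factor)]
    return rebuilt if width is None else rebuilt[:width]


# Eight pixels of made-up Cb values with a gradual ramp.
cb = [100, 104, 108, 112, 120, 124, 128, 132]

stored = subsample_row(cb)                       # what 4:1:1 keeps
rebuilt = replicate_row(stored, width=len(cb))   # what a naive decoder shows

print("stored: ", stored)
print("rebuilt:", rebuilt)
```

The rebuilt row holds one Cb value across each group of four pixels, while the Y’ values (not shown) would still differ pixel by pixel.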

While DV footage was used extensively, even in broadcasting, it can be a challenge for special effects and compositing, as chroma keying (green and blue screen work) requires a lot of subtle tonal variation to recognize irregular vertical edges. Canopus and Matrox each created custom DV decoding methods to attempt to improve the four-pixel horizontal spread for better keying (effectively interpolating to 4:2:2), and many software keyers have similar measures in place.

It's interesting to note that even though 4:2:0 subsampling is thought by many to be somewhat inferior to 4:1:1, 4:2:0 (setting compression aside and looking only at color subsampling) can actually be slightly easier to composite or key, as there is only one pixel of interpolated value in either the vertical or horizontal direction, while 4:1:1 interpolates three pixel values horizontally.

4:2:0
PAL DV users and anyone who outputs to MPEG have seen this number. Many people find it confusing at first, as the notation appears to describe a Y’ sample for each pixel, a Cb sample for every two pixels, and no samples whatsoever for Cr. In reality, there are the same number of color difference samples as in NTSC DV's 4:1:1, with the samples arranged differently.

Also confusing: the color difference sample sites are not standardized across the various approaches to 4:2:0 (see chart). JPEG/MPEG-1 structures the samples so that they’re sited in the center of the four-pixel block. MPEG-2 sites the samples between pixels vertically, and PAL DV sites the difference samples on alternating lines. Even with the color difference samples sited differently for different applications of 4:2:0, you could say there are still four-pixel blocks that net out to the same number of color difference samples as 4:1:1. Simply picture these four-pixel “blocks” as square (2x2) instead of a horizontal line (4x1) like NTSC DV’s 4:1:1.
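The "same amount, different shape" point can be checked with a little arithmetic. The Python sketch below compares the color difference plane sizes for 4:1:1 and 4:2:0 on the same frame. Using 720x480 for both is an assumption made purely for comparison (PAL DV frames are actually 720x576), and `chroma_plane_size` is a name invented for this example.

```python
def chroma_plane_size(width, height, h_factor, v_factor):
    """Size of one color difference plane for the given subsampling factors."""
    return width // h_factor, height // v_factor


WIDTH, HEIGHT = 720, 480  # same frame for both schemes, for comparison only

plane_411 = chroma_plane_size(WIDTH, HEIGHT, h_factor=4, v_factor=1)  # 4x1 blocks
plane_420 = chroma_plane_size(WIDTH, HEIGHT, h_factor=2, v_factor=2)  # 2x2 blocks

count_411 = plane_411[0] * plane_411[1]
count_420 = plane_420[0] * plane_420[1]

print("4:1:1 plane:", plane_411, "samples per channel:", count_411)
print("4:2:0 plane:", plane_420, "samples per channel:", count_420)
```

Both schemes store one Cb and one Cr sample per four pixels; only the shape of the four-pixel block changes.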

4:2:2:4, 4:4:4:4
As if all this isn’t complicated enough…you could add a number. 4:2:2:4 and 4:4:4:4 refer to 4:2:2 and 4:4:4 color sampling with the addition of an alpha channel for keying purposes. The fourth channel carries an 8-bit or 10-bit (depending on the image format) grayscale map indicating the relative transparency of each pixel in the image. The alpha number always matches the first (Y') number.

3:1:1
This ratio appeared around the time of HDCAM's introduction. HDCAM plays back at 1920x1080 but actually records 1440x1080 to tape. In my opinion the most confusing aspect is not so much that there is a different baseline number, but whether or not that number is itself a proportion of “4”, as 1440/1920 is 3 of 4.

The interpolation to 1920x1080 4:2:2 (this is how the manufacturer presents the specs on the playout picture) and the color difference subsampling ratio of 3:1:1 are separate issues, and their mathematical relationship to the full 1920x1080 raster is most likely coincidental. The 3 equates to 1440 horizontal Y’ samples, and the 1 is in ratio to that 3, designating 480 horizontal color difference samples. This notation is NOT on the chart as it does not exist anywhere but in file storage, and the end user can only access HDCAM footage as 4:2:2 SDI output without a proprietary post solution.
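The numbers in this section work out as follows. This Python sketch only restates the arithmetic from the text using the widths quoted above; the function name `hdcam_widths` is made up for this example.

```python
from fractions import Fraction


def hdcam_widths(full_width=1920, luma_width=1440, ratio=(3, 1, 1)):
    """Return (recorded luma width as a fraction of full raster, chroma width)."""
    luma_part, chroma_part, _ = ratio
    # The "1" is read relative to the "3": one color difference sample
    # for every three recorded luma samples.
    chroma_width = luma_width * chroma_part // luma_part
    return Fraction(luma_width, full_width), chroma_width


luma_fraction, chroma_width = hdcam_widths()
print("recorded luma width / full raster:", luma_fraction)
print("horizontal color difference samples:", chroma_width)
```

The 3/4 relationship between 1440 and 1920 and the 3:1 relationship between 1440 and 480 are computed separately here, which matches the point that the two are separate issues.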

As we continue to see new formats and frame sizes, we'll continue to see new approaches to storing and encoding images, but the color subsampling notations will likely stay in place for the foreseeable future.

TimK