Color Science for the Unintended

By mw @ 2024-10-20 01:14 Updated @ 2025-01-16 22:42

Introduction

Most of us are not color scientists, yet we encounter and work with colors all the time. Maybe you are buying a new TV, and the manufacturer says that the TV covers “99% of P3”. “What does it mean?” you ask the clerk in the electronics store. “Oh, it means it can display far more colors than the average TV,” the clerk replies, and shows you this diagram:

“Yes, but what does it really mean? Why does this…thing…look like this? Why are all the colors squeezed into the lower-left corner of the positive xy plane? What happened to the upper right half?”

“Ah well, because our TVs are really good! You don’t need to care about all the details; just enjoy the intense colors!”

Ok, maybe you are not buying the TV after all. Maybe you are just an innocent programmer. At some point you decide to build your own website. You have all the content and structure figured out; only the styling remains. Now you are trying to decide the color of the <a> element. The color #2e3735 looks promising.

But what is #2e3735, really? Why does it look like that? Who decided that this particular hex sequence should look like this yellow-green-ish color? Maybe you know that all colors on the web are defined in the sRGB space, but then what is sRGB? How is that defined? Is there an end to this chain of whys and hows?

This article tries to answer these questions.

Huge disclaimer: I am not a color scientist, either. I decided to dig into this topic because I was that TV buyer, and I was that innocent programmer. And I still am. I still have a lot of gaps in my knowledge, so this article will probably contain mistakes. But I’ll do my best :→ Generally I will have references for important statements. The ones that do not are most likely just my unsophisticated understanding of the subject (aka. “trust me bro”, aka. “I guess”).

The CIE XYZ color space

Although you almost never use this space directly, or encounter an image in XYZ space (the notable exception being DCP), you will see this space get referred to all the time. Most color spaces you will work in are practically defined by some transformation from the XYZ space, including the all-important sRGB. But before talking about the XYZ space, we need to briefly look at an extremely important concept: color matching functions.

Under normal lighting conditions, the cone cells in the retina take care of sensing colors. There are three kinds of cone cells, with distinct responses to the spectrum. The responses are, of course, very hard (or impossible) to measure directly. So we have to find something close that is easier to measure, and that something is color matching functions.

[w&s] Imagine looking at a color produced by a monochromatic light, for example, orange. But as you know, some combination of red, green, and blue light (these are also monochromatic, but from now on I will only call the former “monochromatic” to prevent confusion) can also produce orange (red + green, with little to no blue). Therefore we can do an experiment, where we sweep the frequency range of visible light, and try to match the resulting color with combinations of red, green, and blue lights at fixed frequencies specified beforehand. In other words, with a monochromatic light of a given frequency, we can ask ourselves: can we mix the red, green, and blue light and adjust their intensities, so that the mix looks exactly the same color as the monochromatic light? The answer is usually “yes” (I will explain the “unusual” part later), and people have done this experiment. For each frequency of that monochromatic light, we get three numbers, which are the intensities of the red, green, and blue light. After the experiment, we get three curves like these:

Figure 1. The RGB color matching functions

We will now call these three curves \(\bar{r}\), \(\bar{g}\), and \(\bar{b}\), respectively. They are functions of wavelength (or frequency). We will also call these the “color matching functions”, because… well… we get to these by matching colors. Using these functions as a basis, a linear vector space can be constructed, with the inner product

\[f \cdot g = \int_0^{\infty}\!\! \mathrm{d}\lambda\, f(\lambda) g(\lambda).\]

This is extremely useful, because now if we are given some kind of light, with a power distribution \(E(\lambda)\), we can just calculate how much intensity of the red, green, and blue light we need to mix, in order to match the color of that light:

\[R = E \cdot \bar{r},\quad G = E \cdot \bar{g},\quad B = E \cdot \bar{b}.\]

In the equations above, R, G, and B are the intensities of the red, green, and blue light we need to mix in order to match the color of E, respectively. This totally makes sense. Let us go back to our orange light. Using the equations above, we can figure out how to get the color orange by mixing red, green, and blue light. Let us suppose the orange light is monochromatic at \(\lambda_0 = 600 \mathrm{nm}\). Its power distribution is then \(E = A \delta(\lambda - \lambda_0)\), where δ is the Dirac delta function, and A is just some arbitrary number denoting how strong the light is. Now we can calculate R, G, and B by simply evaluating the integrals:

\[R = E \cdot \bar{r} = \int_0^{\infty} \! A\delta(\lambda - \lambda_0) \bar{r}(\lambda)\, \mathrm{d} \lambda = A\bar{r}(\lambda_0).\]

This is just A times the value of \(\bar{r}\) at 600 nm. Similarly \(G = A\bar{g}(\lambda_0)\), \(B = A\bar{b}(\lambda_0)\). Now we can just read the values off the color matching functions plot to get \(R \approx 0.31\), \(G \approx 0.07\), \(B \approx 0\) (assuming A = 1), which basically just confirms our common sense that we can mix red light and green light to get orange. As you can see, once we have the color matching functions, we never need to ask a human to match colors again; we can just do some calculation!
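The calculation above can be checked numerically. The following is only a sketch: the three curves are made-up Gaussians standing in for \(\bar{r}\), \(\bar{g}\), and \(\bar{b}\) (they are not the real CIE data), the delta function is approximated by a very narrow normalized Gaussian, and the names `inner` and `narrow_line` are my own.

```python
import math

def gaussian(lam, center, width):
    """Unnormalized bell curve, used only to build placeholder curves."""
    return math.exp(-0.5 * ((lam - center) / width) ** 2)

# Placeholder curves standing in for r-bar, g-bar, b-bar. NOT real CIE data.
def r_bar(lam): return gaussian(lam, 600.0, 40.0)
def g_bar(lam): return gaussian(lam, 550.0, 40.0)
def b_bar(lam): return gaussian(lam, 450.0, 30.0)

def inner(f, g, lo=380.0, hi=780.0, step=0.1):
    """Approximate the integral of f(lam) * g(lam) over [lo, hi] (trapezoidal rule)."""
    n = int(round((hi - lo) / step))
    total = 0.0
    for i in range(n + 1):
        lam = lo + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(lam) * g(lam)
    return total * step

def narrow_line(lam0, amplitude, sigma=0.5):
    """Narrow normalized Gaussian approximating amplitude * delta(lam - lam0)."""
    norm = amplitude / (sigma * math.sqrt(2.0 * math.pi))
    return lambda lam: norm * gaussian(lam, lam0, sigma)

# "Monochromatic" light at 600 nm with A = 1.
E = narrow_line(600.0, amplitude=1.0)
R, G, B = inner(E, r_bar), inner(E, g_bar), inner(E, b_bar)

# Each integral should be close to the curve value at 600 nm.
print("R =", R, " r_bar(600) =", r_bar(600.0))
print("G =", G, " g_bar(600) =", g_bar(600.0))
print("B =", B, " b_bar(600) =", b_bar(600.0))
```

With real tabulated color matching functions, one would replace the three placeholder functions with interpolated table lookups; the inner product itself stays the same.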

It is worth noting, however, that unlike the usual Cartesian coordinate systems, or a Hilbert space with an orthonormal basis and a similar definition of inner product, this basis is not orthogonal, because \(\bar{r} \cdot \bar{g} \neq 0\), \(\bar{g} \cdot \bar{b} \neq 0\), \(\bar{b} \cdot \bar{r} \neq 0\). There is no physical law that imposes such a constraint. More importantly, the three inner products compress an entire function \(E(\lambda)\) into just three numbers, so there is no guarantee that different power distributions would always produce different R, G, and B. In other words, given two lights with different power distributions \(E_1 \neq E_2\), it is entirely possible that

\[E_1 \cdot \bar{r} = E_2 \cdot \bar{r},\quad E_1 \cdot \bar{g} = E_2 \cdot \bar{g},\quad E_1 \cdot \bar{b} = E_2 \cdot \bar{b}.\]

This may seem strange at first, but think about it: this basically just means that lights with different power distributions can appear to be the same color to the human eye, which is, of course, completely natural! The experiment of matching colors would not be possible if that were not the case.
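Here is a small sketch of how such a pair of lights can be constructed. It assumes the same made-up Gaussian placeholder curves as before (not real CIE data) and restricts the lights to four monochromatic lines, so that a "light" is just four nonnegative powers. With three equations and four unknowns, there is always a direction in which the powers can change without changing R, G, and B.

```python
import math

def gaussian(lam, center, width):
    return math.exp(-0.5 * ((lam - center) / width) ** 2)

# Placeholder matching functions (NOT real CIE data).
CMFS = [
    lambda lam: gaussian(lam, 600.0, 40.0),  # stands in for r-bar
    lambda lam: gaussian(lam, 550.0, 40.0),  # stands in for g-bar
    lambda lam: gaussian(lam, 450.0, 30.0),  # stands in for b-bar
]

LINES = [460.0, 520.0, 580.0, 640.0]  # four monochromatic lines, in nm

def tristimulus(powers):
    """R, G, B of a light made of delta lines at LINES with the given powers."""
    return [sum(p * f(lam) for p, lam in zip(powers, LINES)) for f in CMFS]

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def null_vector():
    """Vector n with M n = 0, where M[k][j] = CMFS[k](LINES[j]) is 3 x 4.

    Component j is the signed 3 x 3 minor obtained by deleting column j.
    """
    M = [[f(lam) for lam in LINES] for f in CMFS]
    n = []
    for j in range(4):
        minor = [[row[c] for c in range(4) if c != j] for row in M]
        n.append((-1) ** j * det3(minor))
    return n

n = null_vector()
scale = 0.5 / max(abs(v) for v in n)   # keep every power nonnegative
E1 = [1.0, 1.0, 1.0, 1.0]
E2 = [p + scale * v for p, v in zip(E1, n)]

print("E1 powers:", E1, "->", tristimulus(E1))
print("E2 powers:", E2, "->", tristimulus(E2))
```

The two printed triples agree up to rounding error, even though the two sets of powers differ.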

At this point, I must address the “elephant in the room” regarding the color matching functions. Recall that these functions are intensities of the red, green, and blue light. But some portion of \(\bar{r}\) is clearly negative! How on earth can one have a light with negative intensity?

Let us look at, for example, λ = 510 nm, where \(\bar{r} < 0\). This is a kind of green color. What happened here was that the participants of the experiment were unable to match this monochromatic light with any combination of the red, green (which is not 510 nm, by the way), and blue light. The only way they could make the match was to mix the 510 nm light with some red. Therefore, in the resulting color matching functions, they had to take this red out, resulting in a negative value.

Now maybe this makes sense to you, but it is undeniable that having negative values here is both inconvenient and weird. Therefore people have defined another set of color matching functions, which are just a linear transformation of the real color matching functions, chosen so that no value is negative. They look like this:

Figure 2. The XYZ color matching functions

Similar to the real color matching functions, these also form a linear vector space, with the same definition of inner product. The basis functions are called \(\bar{x}\), \(\bar{y}\), and \(\bar{z}\), and we also have

\[X = E \cdot \bar{x}, \quad Y = E \cdot \bar{y}, \quad Z = E \cdot \bar{z}.\]

This is called the XYZ space. Together with the real color matching functions, they form the CIE 1931 Standard Colorimetric System. As I mentioned, the XYZ space is the starting point of practically all other color spaces.

We can go one step further, and simplify this even more. We can define the so-called chromaticity x, y, and z as dimensionless variables

\[x = \frac{X}{X + Y + Z}, \quad y = \frac{Y}{X + Y + Z}, \quad z = 1 - x - y.\]
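As a quick illustration of these definitions, here is a minimal sketch of the conversion. The function name `chromaticity` is my own, and the test values are the commonly quoted XYZ coordinates of the D65 white point normalized to Y = 1.

```python
def chromaticity(X, Y, Z):
    """Return (x, y, z) for a color given in XYZ."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("chromaticity is undefined for X = Y = Z = 0")
    x = X / total
    y = Y / total
    return x, y, 1.0 - x - y

# D65 white point, normalized so that Y = 1.
x, y, z = chromaticity(0.95047, 1.0, 1.08883)
print(round(x, 4), round(y, 4), round(z, 4))

# Scaling the light by a constant factor does not change its chromaticity.
x2, y2, z2 = chromaticity(2 * 0.95047, 2 * 1.0, 2 * 1.08883)
print(round(x2, 4), round(y2, 4), round(z2, 4))
```

The second call shows why chromaticity only needs two numbers: the overall intensity has been divided out.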

We can then let x and y form a 2-dimensional space (z is not independent anymore). With this we can do another experiment: given a monochromatic light with some frequency, we can calculate the values of x and y, and put a point on the xy plane. If we sweep the whole spectrum, we should end up with a curve. What does it look like? The answer is the following graph:

Figure 3. The chromaticity line

This is called the chromaticity line (more commonly known as the spectral locus). Imagine drawing a straight line connecting the two ends of this curve; this line is called the line of purples, because all colors on it look like some kind of purple. This also closes the chromaticity line. All the colors visible to humans are enclosed within this region. (I am not sure exactly why, though. Is this mathematical, physical, or biological? I have not found an answer yet in [w&s]. There is a Wikipedia entry, but I am not fully convinced.)
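One observation that may help with this question: because X, Y, and Z are inner products, they simply add when two lights are mixed. It then follows from the definition of x and y that the chromaticity of the mix is a weighted average of the chromaticities of the two lights, so it lies on the straight segment between them. The sketch below checks this with two arbitrary made-up XYZ triples; if any light can be treated as a mix of monochromatic lights with nonnegative powers, this would suggest the enclosure is largely a mathematical consequence of linearity.

```python
def chromaticity(X, Y, Z):
    """Return (x, y) for a color given in XYZ."""
    s = X + Y + Z
    return X / s, Y / s

def mix(a, b):
    """Additive mix of two lights given as XYZ triples."""
    return tuple(p + q for p, q in zip(a, b))

light_1 = (0.2, 0.1, 0.7)   # arbitrary made-up XYZ values
light_2 = (0.5, 0.6, 0.1)

x1, y1 = chromaticity(*light_1)
x2, y2 = chromaticity(*light_2)
xm, ym = chromaticity(*mix(light_1, light_2))

# Weight of light_2 in the average: its share of the total X + Y + Z.
t = sum(light_2) / (sum(light_1) + sum(light_2))

print("mix:              ", (xm, ym))
print("weighted average: ", ((1 - t) * x1 + t * x2, (1 - t) * y1 + t * y2))
```

Both printed pairs agree up to rounding, and since 0 < t < 1, the mix sits strictly between the two original points.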

The RGB color space

// WIP

An RGB color space is spanned by a set of red, green, and blue primaries. The definition of those colors is specific to each RGB space. Figure 4 shows a linear RGB space “embedded” in the XYZ space.

Figure 4. The RGB color space

The red, green, and blue vectors coincide with those of sRGB and Rec.709 under “white point D65”. The big grey triangle is the \(x+y+z = 1\) plane. The chromaticity line is drawn as the red curve. If one projects it onto the x-y plane, one recovers the curve in Figure 3. The RGB vectors define a triangle (shown in black, the smaller one) on the \(x+y+z = 1\) plane; this is the gamut of the RGB color space. The light grey dashed lines visualize the entirety of the linearized version of the RGB color space, inside which are all the colors that can be expressed with this linearized RGB color space.
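Since the gamut is a triangle on the chromaticity plane, checking whether a chromaticity can be expressed in a given RGB space comes down to a point-in-triangle test. The sketch below uses the standard sRGB / Rec.709 primary chromaticities. The helper names are my own, and the “saturated green” test point is only a rough value near the green part of the chromaticity line.

```python
# xy chromaticities of the sRGB / Rec.709 primaries.
SRGB_PRIMARIES = {"R": (0.64, 0.33), "G": (0.30, 0.60), "B": (0.15, 0.06)}

def cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_gamut(xy, primaries=SRGB_PRIMARIES):
    """True if the chromaticity xy lies inside (or on) the primaries' triangle."""
    r, g, b = primaries["R"], primaries["G"], primaries["B"]
    d1, d2, d3 = cross(r, g, xy), cross(g, b, xy), cross(b, r, xy)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # Inside means the point is on the same side of all three edges.
    return not (has_neg and has_pos)

print(in_gamut((0.3127, 0.3290)))  # D65 white point
print(in_gamut((0.08, 0.83)))      # rough chromaticity of a very saturated green
```

This only tests chromaticity. A full gamut check would also need the brightness, that is, whether the linear R, G, and B all fall between 0 and 1.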

The sRGB space

Color space sRGB is widely used as a “standard” space on the web. When a program doesn’t support color management, it usually assumes that all the colors it needs to display are in sRGB (I’m looking at you, Chrome). However, sRGB is almost never used in professional production workflows, because of its limited gamut. Therefore it is important to know what sRGB is, and how to convert between it and other spaces.

The basis vectors and gamut of sRGB are shown in Figure 4. It is then easy to find that there is a linear transformation from XYZ to linearized sRGB:

\[\begin{pmatrix} R_\mathrm{linear}\\G_\mathrm{linear}\\B_\mathrm{linear}\end{pmatrix}= \begin{pmatrix} 3.2406&-1.5372&-0.4986\\ -0.9689&1.8758&0.0415\\ 0.0557&-0.2040&1.0570 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}.\]

However, regular (encoded) sRGB is related to the linearized version shown in the figure by a nonlinear gamma map:

\[C_\mathrm{linear}= \begin{cases}\frac{C_\mathrm{srgb}}{12.92}, & C_\mathrm{srgb}\le0.04045\\ \left(\frac{C_\mathrm{srgb}+a}{1+a}\right)^{2.4}, & C_\mathrm{srgb}>0.04045, \end{cases}\]

in which \(a = 0.055\). Note that this map contains a linear section in the dark region, and a power-law section over the rest of the domain. Therefore, if one wants to convert a color from sRGB to XYZ, one needs to first use the nonlinear map to convert the sRGB color to linearized sRGB, and then use the inverse of the matrix above to convert it to XYZ. One can use the following ImageMagick command to achieve this (its last step also applies a scaling and a gamma of 2.6 for DCI-style output):

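# Step 1 (-fx): undo the piecewise sRGB gamma. Step 2 (-color-matrix): linear sRGB to XYZ.
# Step 3 (-evaluate, -gamma): scale and apply the 1/2.6 power for DCI-style output.
# Optional flags from the original, left disabled: -depth 12 -quality 0
convert src -alpha off -fx "p <= 0.04045 ? p / 12.92 : ((p + 0.055) / 1.055) ^ 2.4" \
   -color-matrix "0.4124564 0.3575761 0.1804375 0.2126729 0.7151522 0.0721750 0.0193339 0.1191920 0.9503041" \
   -evaluate multiply 0.9166 -gamma 2.6 \
   dest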
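For a single color rather than a whole image, the same two steps (piecewise gamma decode, then the matrix) can be sketched in a few lines. This uses the same matrix as the command above and stops at XYZ, without the final DCI scaling and gamma. The function names are my own.

```python
# Linear sRGB -> XYZ, the same matrix as in the ImageMagick command.
SRGB_TO_XYZ = [
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
]

def srgb_decode(c):
    """Piecewise sRGB gamma map: encoded value in [0, 1] -> linear value."""
    a = 0.055
    if c <= 0.04045:
        return c / 12.92
    return ((c + a) / (1 + a)) ** 2.4

def hex_to_srgb(code):
    """'#rrggbb' -> three encoded sRGB values in [0, 1]."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) / 255.0 for i in (0, 2, 4))

def srgb_to_xyz(rgb):
    """Encoded sRGB triple -> XYZ triple (white has Y close to 1)."""
    linear = [srgb_decode(c) for c in rgb]
    return tuple(sum(m * v for m, v in zip(row, linear)) for row in SRGB_TO_XYZ)

X, Y, Z = srgb_to_xyz(hex_to_srgb("#ffffff"))
print("white:  ", X, Y, Z)
print("#2e3735:", srgb_to_xyz(hex_to_srgb("#2e3735")))
```

White comes out close to the D65 white point, which is a handy sanity check that the matrix and the decode step are applied in the right order.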

One scenario in which this command is useful is when one wants to convert frames of a movie in sRGB to the DCI format, which consists of JPEG 2000 files in XYZ space. However, if one uses any encoder (for example FFmpeg) to encode the frames back to video (and thus convert the color space back to sRGB), one will find that the color is wrong in comparison to the original frames. This is because even though sRGB uses the piecewise nonlinear map mentioned previously, major video editing software like Premiere and Vegas only uses a lazy approximate version, which is a simple power law with \(\gamma = 2.2\) across the whole domain (hence the -gamma 0.4545454545, that is 1/2.2, in the command that follows). Therefore encoders have to use this map to linearize sRGB in order to achieve consistent color. Thus the ImageMagick command can be simplified to

# -gamma 0.4545454545 raises each value to the power 2.2 (the simple power-law decode).
# Optional flags from the original, left disabled: -depth 12 -quality 0
convert src -alpha off -gamma 0.4545454545 \
  -color-matrix "0.412390799265959 0.357584339383878 0.180480788401834 0.212639005871510 0.715168678767756 0.072192315360734 0.019330818715592 0.119194779794626 0.950532152249661" \
  -evaluate multiply 0.91655527974 -gamma 2.6 \
  dest
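To get a feel for how much the two linearization maps differ, and for what the last line of the command does, here is a small sketch. The decode functions follow the formulas in this section. The DCI part is my reading of the command: I assume the factor 0.9166 is 48/52.37, and that the result is stored as a 12-bit code value after a 1/2.6 power. The function names are my own.

```python
def srgb_decode(c):
    """Piecewise sRGB gamma map: encoded value in [0, 1] -> linear value."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

def gamma22_decode(c):
    """The simple power law that `-gamma 0.4545454545` applies (exponent 2.2)."""
    return c ** 2.2

def dci_encode(v, bits=12):
    """Linear X, Y, or Z (1.0 = white) -> integer code value.

    Assumption: scale by 48/52.37, then apply the 1/2.6 power, as in the
    `-evaluate multiply 0.9166 -gamma 2.6` step of the command.
    """
    normalized = max(0.0, v) * 48.0 / 52.37
    return round((2 ** bits - 1) * normalized ** (1 / 2.6))

# Where do the two decode maps disagree the most, over all 8-bit values?
worst = max(range(256), key=lambda n: abs(srgb_decode(n / 255) - gamma22_decode(n / 255)))
print("largest difference at 8-bit value", worst)
print("  piecewise:", srgb_decode(worst / 255))
print("  power 2.2:", gamma22_decode(worst / 255))

# Code values for white, computed from each decode of the encoded value 1.0.
print("white code value:", dci_encode(srgb_decode(1.0)))
```

The two decode maps agree exactly at 0 and 1 and drift apart in between, which is why mixing them in one round trip gives visibly different colors.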

About HDR (WIP)

In the context of home entertainment/personal computing, “HDR” refers to the technology that allows the monitor to display scenes with higher contrast. The current HDR technologies are, in my opinion, weird and wrong.

Usually, the values in the RGB channels determine how bright a pixel is. If I were to design a standard that allows brighter content, what would I do? Naturally I would arrive at either of the following:

Let there be a switch, which, when turned on, tells the monitor that 255 in the RGB channels means, for example, 1000 nits of brightness. Of course 0 still means black, and the rest of the values interpolate between these extremes according to some formula. Depending on the formula, this approach would result in banding in various brightness regions. To fight that I could ask the content and monitor to support an extended range of 0 to 1023.
Or I could extend the range of values of the RGB channels to, for example, 0 to 1023. I would then define that the 0 to 255 range should work exactly as in the non-HDR case, but for 256 to 1023, the monitor should raise the brightness as needed.

Of course the industry ended up with a completely different approach. Today we have a number of standards for HDR, but for the most part they work in the same way. Basically they ask the content and the monitor to be 10-bit, which is necessary to avoid banding, among other things. But they also require a fourth channel besides RGB, which is exclusively for brightness. This is completely insane! The RGB values already carry brightness information. Why do they need a separate channel? What if I ask for #111111 and a high brightness? Would it just look the same as #ffffff?

I did not really dig into the history of these standards, but my guess is that this design works better with current hardware technology. The vast majority of HDR monitors and TVs are “mini LED”. A non-HDR LCD display has an LCD layer and a backlight layer, which is a thin sheet of diffuser that diffuses the light from the LEDs at the edge of the screen. A mini-LED display has the same LCD layer, but the backlight is provided by a number of LEDs, each of which lights up a small portion of the screen. Therefore the whole screen can have several different brightness levels at the same time, because these LED backlights can have different brightness. In this case the monitor does have 4 channels: the RGB channels that drive the LCD, and a brightness channel that drives the LED backlight.

References

[w&s] G. Wyszecki & W. S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae