Co-authored by Tyler Wu and Lucy Cui
What is Data Visualization?
Data visualization, like many forms of communication, is a more compact and efficient representation of a larger, more complex concept. In language, we use words to simplify and represent complex thoughts. In counting, we use numbers to represent large quantities that our memory could not possibly hold on a one-by-one basis. Data visualization is essentially a “visual” counting of data. Take the bar chart below, for example. It shows the populations of the 10 most populous countries in 2022, with each country’s population encoded as the length of its bar.
Visualizing the data this way has many benefits over showing the data in a spreadsheet alone. For one, it lets us compare quantities more efficiently and perform more complex tasks, like understanding the distribution or comparing the populations of continents. In addition, the larger the dataset, the slower it becomes to read values out of a spreadsheet, sometimes to the point of being impractical.
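To make the idea of “representing a quantity with bar length” concrete, here is a minimal sketch of a linear scale that turns values into text bars. The category names and values are hypothetical placeholders, not real population figures, and `linearScale` and `renderBars` are illustrative helper names.

```javascript
// Hypothetical values for illustration only; these are not real population figures.
const data = [
  { name: "A", value: 100 },
  { name: "B", value: 50 },
  { name: "C", value: 25 },
];

// Map a value in [d0, d1] linearly onto a length in [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

// The largest value gets the full width of 20 characters.
const maxValue = Math.max(...data.map((d) => d.value));
const barLength = linearScale([0, maxValue], [0, 20]);

// Draw each row as its name followed by a bar of "#" characters.
function renderBars(rows) {
  return rows.map((d) => `${d.name} ${"#".repeat(Math.round(barLength(d.value)))}`);
}

console.log(renderBars(data).join("\n"));
```

Because the scale is linear and starts at zero, a bar that is twice as long always means a value that is twice as large, which is what lets us compare quantities at a glance.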
The Role of Color in Data Visualization
When we first choose a color scale for our visualization, the choice might seem pretty straightforward and intuitive. You might think something like: “I should choose a set of visually appealing colors, and they should have some connotative relation to the type of data we are presenting”. While these aesthetic considerations are certainly important, there are also aspects of perception that need to be considered beforehand. For example, a continuous scale tends to make more sense for numerical, continuous data, while a discrete scale makes more sense for nominal or ordinal data. In addition, the property of color we choose to vary in a scale (hue, saturation, or lightness) also influences the readability and effectiveness of a visualization.
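As a rough sketch of the difference between the two kinds of scale, the snippet below builds a continuous scale that blends between two colors and a discrete scale that simply looks up one color per category. The helper names, the color pair, and the category names are assumptions chosen for illustration.

```javascript
// Parse "#rrggbb" into [r, g, b] with channels in 0..255.
function hexToRgb(hex) {
  const n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// Format [r, g, b] back into a lowercase "#rrggbb" string.
function rgbToHex(rgb) {
  return "#" + rgb.map((c) => c.toString(16).padStart(2, "0")).join("");
}

// Continuous scale: every value in the domain gets its own blended color.
function continuousScale([d0, d1], [fromHex, toHex]) {
  const a = hexToRgb(fromHex);
  const b = hexToRgb(toHex);
  return (v) => {
    const t = Math.min(1, Math.max(0, (v - d0) / (d1 - d0))); // clamp to [0, 1]
    return rgbToHex(a.map((c, i) => Math.round(c + (b[i] - c) * t)));
  };
}

// Discrete scale: each category maps to exactly one color.
function discreteScale(categories, colors) {
  return (category) => colors[categories.indexOf(category)];
}

const temperature = continuousScale([0, 100], ["#ffffff", "#2774ae"]);
const fruit = discreteScale(["apple", "banana"], ["#e25a42", "#f2da57"]);

console.log(temperature(50)); // a blue halfway between white and #2774ae
console.log(fruit("banana"));
```

The continuous scale can return any in-between color, while the discrete scale can only ever return one of the colors it was given.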
A Crash Course in Color Theory
Before jumping into the use cases of color scales, let’s review some basic color theory. There are three properties of colors to consider: hue, saturation, and lightness.
- Hue is typically what we think of when we think of colors (for example, red, yellow, and blue).
- Saturation represents how pure the color is (for example, a vibrant orange vs. a duller brown with more grey).
- Lastly, lightness reflects the intensity of the light in the color (for example, light and dark green).
Here is a quick video demonstrating what each of these properties looks like:
(You can try adjusting the sliders yourself on colorizer.org.)
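If you would rather experiment in code than with sliders, the sketch below converts a hue, saturation, and lightness triple into red, green, and blue channels using the standard HSL-to-RGB formula. The function name `hslToRgb` and the sample colors are illustrative choices.

```javascript
// h is in degrees, s and l are in [0, 1]; returns [r, g, b] with channels in 0..255.
function hslToRgb(h, s, l) {
  const c = (1 - Math.abs(2 * l - 1)) * s;     // chroma: how far from grey
  const hp = (((h % 360) + 360) % 360) / 60;   // position on the six-sector hue wheel
  const x = c * (1 - Math.abs((hp % 2) - 1));  // second-largest channel
  const m = l - c / 2;                         // offset that sets the lightness
  const [r, g, b] =
    hp < 1 ? [c, x, 0] :
    hp < 2 ? [x, c, 0] :
    hp < 3 ? [0, c, x] :
    hp < 4 ? [0, x, c] :
    hp < 5 ? [x, 0, c] : [c, 0, x];
  return [r, g, b].map((v) => Math.round((v + m) * 255));
}

// Change one property at a time, starting from a vibrant orange.
console.log(hslToRgb(30, 1.0, 0.5)); // the base color
console.log(hslToRgb(210, 1.0, 0.5)); // different hue
console.log(hslToRgb(30, 0.3, 0.5)); // lower saturation: duller, closer to grey
console.log(hslToRgb(30, 1.0, 0.8)); // higher lightness: a pale orange
```

Setting saturation to zero removes the hue entirely and leaves a grey whose shade depends only on lightness.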
Note that lightness and saturation have an innate ordering: we can quickly identify which of two colors is “lighter” or “more saturated”. With hue, however, we can’t say that one hue has more or less “hue” than another. (While there is a physical difference in wavelength, this is something we learn through education, not an ordering we know innately from comparing two hues.) This distinction informs how we use different color scales for different purposes. The ability to rank colors aligns with numerical data that needs to be ranked, while the ability to differentiate colors lends itself to categorical data that needs to be differentiated.
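This ordering can be checked numerically. The sketch below computes the HSL lightness of each color in the two palettes used by the heat map script later in this article (with "white" written as #FFFFFF) and tests whether the palette runs steadily from light to dark. The helper names are illustrative.

```javascript
// HSL lightness of a "#rrggbb" color, in [0, 1]: the midpoint of the
// largest and smallest RGB channels.
function hslLightness(hex) {
  const n = parseInt(hex.slice(1), 16);
  const rgb = [(n >> 16) & 255, (n >> 8) & 255, n & 255];
  return (Math.max(...rgb) + Math.min(...rgb)) / (2 * 255);
}

// True when every entry is smaller than the one before it.
function isStrictlyDecreasing(xs) {
  return xs.every((x, i) => i === 0 || x < xs[i - 1]);
}

const lightnessPalette = ["#FFFFFF", "#C5D9E9", "#89B3D3", "#5894C1", "#2774AE"];
const huePalette = ["#F2DA57", "#E25A42", "#B396AD", "#33B6D0", "#A0B700"];

console.log(isStrictlyDecreasing(lightnessPalette.map(hslLightness))); // true
console.log(isStrictlyDecreasing(huePalette.map(hslLightness)));       // false
```

The lightness palette gets darker at every step, so it carries a built-in ranking. The hue palette does not, which is fine for telling categories apart but gives the viewer no natural order.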
When to Use a Hue Scale Versus a Brightness Scale
There are two common tasks we perform when viewing a visualization:
- identifying the category of a graphical mark (e.g. a point or bar) in a visualization
- comparing the numerical values of two or more graphical marks.
The Case for Using a Hue Scale
Let’s use Google Maps as an example. A typical identification task one might have when using Google Maps could be identifying where a park is on the map. Ignoring labeling and other symbols for the moment, notice that the identification of parks in general is made easier by each type of location having a hue (color) distinct from that of other marks. Then, the task of identifying a park is simplified to locating a mark with the green hue on the map.
If, hypothetically, Google decided to color freeways with a hue similar to the green of the parks, the task of finding a certain park would be much more difficult. Instead of only other parks competing with the “target” park for attention, the similarly green freeways would also become “distractors”.
This concept is related to visual search studies, which show that the more dissimilar the distractors are from the target, the faster individuals can locate it. Take the diagrams below, for example. The red T should be easier to locate in the left diagram than in the right. In the left diagram, the distractors differ from the target in both features (blue is different from red, and L is different from T). In the right diagram, the red L’s and blue T’s each share either color or shape with the red T. The distinct features of the target on the left lead to a “pop out” effect, where the target stands out from the distractors (Radvansky & Ashcraft, 2016).
The same concept applies to an identification task in a visualization. The more distinct the colors used for the marks, the faster the viewer will be able to make the identification. A large volume of studies have found that “a multicolored [hue] scale vastly outperforms a brightness scale” with identification tasks (Breslow et al., 2009).
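One simple way to get distinct colors for an identification task is to spread the categories evenly around the hue wheel. The sketch below does this with CSS `hsl()` strings. The category names, the starting hue of 120 degrees (green), and the fixed saturation and lightness are assumptions for illustration, not the colors Google Maps actually uses.

```javascript
// Give each category a hue spaced evenly around the 360-degree color wheel,
// keeping saturation and lightness fixed so that only hue tells them apart.
function categoricalHues(categories, startHue = 120, saturation = 70, lightness = 50) {
  const step = 360 / categories.length;
  return new Map(
    categories.map((name, i) => {
      const hue = Math.round((startHue + i * step) % 360);
      return [name, `hsl(${hue}, ${saturation}%, ${lightness}%)`];
    })
  );
}

const mapColors = categoricalHues(["park", "freeway", "water"]);
for (const [name, color] of mapColors) {
  console.log(name, color);
}
```

With three categories the hues land 120 degrees apart, so no mark type is easily confused with another on hue alone.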
The Case for Using a Lightness Scale
The second common task is comparing the values of two marks. In the heat map below, we might have an intuitive idea that the darker blue areas are greater in value than the lighter areas. In fact, one study found that we have a “robust bias to interpret darker colors as mapping onto larger quantities” (Silverman et al., 2016). You can verify values by hovering over a square. However, if we swap this scale for a hue scale, we notice immediately that the intuitive ordering is lost. You can try swapping the scales with the button below:
// console.log("script running")
// Sizes are derived from the browser window
var config = {
  "vw": window.innerWidth * 0.8,
  "vh": window.innerHeight * 0.8,
  "anim_speed": 1000
};
var margin = {top: 0, right: 50, bottom: 50, left: 30},
    width = config.vh - margin.left - margin.right,
    big_width = config.vh * 1.5 - margin.left - margin.right,
    height = config.vh - margin.top - margin.bottom;
let rect_data;
const t = d3.transition().duration(config.anim_speed).ease(d3.easeCubic);
var svg = d3.select("#color-props-svg");
svg
  .attr("width", big_width + margin.left + margin.right)
  .attr("height", height + margin.bottom);
// Rectangles
var rect_group = svg.append("g")
  .attr("class", "colorRects");
// Labels of row and columns
var myGroups = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"];
var myVars = ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10"];
// Build X scales and axis:
var x = d3.scaleBand()
  .range([margin.left, width])
  .domain(myGroups)
  .padding(0.01);
svg.append("g")
  .attr("transform", "translate(0," + height + ")")
  .call(d3.axisBottom(x));
// Build Y scales and axis:
var y = d3.scaleBand()
  .range([height, 0])
  .domain(myVars)
  .padding(0.01);
svg.append("g")
  .attr("transform", "translate(" + margin.left + ",0)")
  .call(d3.axisLeft(y));
// var brightnessScale = d3.scaleLinear()
//   .domain([0,100])
//   .range(["white", "#2774AE"]); // light to dark
// console.log("brightnessScale",brightnessScale.range());
// brightness color scale
var brightnessQuant = d3.scaleQuantize()
  .domain([0, 100])
  .range(["white", "#C5D9E9", "#89B3D3", "#5894C1", "#2774AE"]);
// hue color scale
var hueQuant = d3.scaleQuantize()
  .domain([0, 100])
  .range(["#F2DA57", "#E25A42", "#B396AD", "#33B6D0", "#A0B700"]); // selected from sunlight-styleguide
// create a tooltip
var tooltip = d3.select("#color-properties-div")
  .append("div")
  .style("opacity", 0)
  .style("position", "absolute")
  .attr("class", "tooltip")
  .style("background-color", "white")
  .style("border", "solid")
  .style("border-width", "2px")
  .style("border-radius", "5px")
  .style("padding", "5px");
// show the tooltip when the pointer enters a cell
var mouseover = function(d) {
  tooltip.style("opacity", 1);
};
// keep the tooltip next to the pointer and show the cell's value
var mousemove = function(event, d) {
  tooltip
    .html("The value of this cell is: " + d.value)
    .style("left", (event.pageX + 10) + "px")
    .style("top", (event.pageY) + "px");
};
// hide the tooltip when the pointer leaves a cell
var mouseleave = function(d) {
  tooltip.style("opacity", 0);
};
// Draw (or recolor) one rectangle per data row using the given color scale
function createRects(data, g, scale) {
  g.selectAll("rect")
    .data(data, function(d) { return d.group + ':' + d.variable; })
    .join(
      enter => enter.append("rect"))
    .attr("x", function(d) { return x(d.group); })
    .attr("y", function(d) { return y(d.variable); })
    .attr("width", x.bandwidth())
    .attr("height", y.bandwidth())
    .style("fill", function(d) { return scale(d.value); })
    .on("mouseover", mouseover)
    .on("mousemove", (event, d) => mousemove(event, d))
    .on("mouseleave", mouseleave);
}
d3.csv("https://raw.githubusercontent.com/holtzy/D3-graph-gallery/master/DATA/heatmap_data.csv").then(data => {
  rect_data = data;
  createRects(rect_data, rect_group, brightnessQuant);
}); // d3 csv then
svg.append("g")
  .attr("class", "legendQuant")
  .attr("curr-scale", "bright-scale")
  .attr("transform", "translate(" + x.bandwidth() * 11.5 + ",20)");
var brightnessLegend = d3.legendColor()
  .labelFormat(d3.format(".0f"))
  .scale(brightnessQuant);
// add scale to group
svg.select(".legendQuant")
  .call(brightnessLegend);
var hueLegend = d3.legendColor()
  .labelFormat(d3.format(".0f"))
  .scale(hueQuant);
// Toggle the legend and the rectangles between the two color scales
function swapScales() {
  let legend = svg.select(".legendQuant");
  if (legend.attr("curr-scale") == "bright-scale") {
    legend
      .attr("curr-scale", "hue-scale")
      .call(hueLegend);
    createRects(rect_data, rect_group, hueQuant);
  }
  else {
    legend
      .attr("curr-scale", "bright-scale")
      .call(brightnessLegend);
    createRects(rect_data, rect_group, brightnessQuant);
  }
}
// swap color button
var swap_button = document.getElementById("swap-button");
swap_button.onclick = swapScales;
A study by Breslow et al. (2009) found that lightness scales are both faster and more accurate to use for comparison tasks like these.
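The heat map script above builds both of its color scales with `d3.scaleQuantize`. As a rough, dependency-free sketch of what a quantize scale does (this is not D3’s actual implementation, and `quantizeScale` is an illustrative name), the function below splits the domain into equal-width bins and returns one color per bin.

```javascript
// Split [d0, d1] into range.length equal-width bins and return the color
// for the bin a value falls in. Values outside the domain are clamped
// to the first or last bin.
function quantizeScale([d0, d1], range) {
  const n = range.length;
  return (v) => {
    const i = Math.floor(((v - d0) * n) / (d1 - d0));
    return range[Math.min(n - 1, Math.max(0, i))];
  };
}

// Same domain and colors as the lightness scale in the heat map script.
const lightnessScale = quantizeScale(
  [0, 100],
  ["white", "#C5D9E9", "#89B3D3", "#5894C1", "#2774AE"]
);

console.log(lightnessScale(5));  // lightest bin
console.log(lightnessScale(50)); // middle bin
console.log(lightnessScale(95)); // darkest bin
```

Because the colors are listed from light to dark, a larger value can never map to a lighter color than a smaller value, which is what makes comparisons on this scale quick.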
What This Means for Rainbow Scales
Does this mean that you can no longer use rainbow scales to visualize your numerical data? Not necessarily. In fact, rainbow scales are usually the de facto scale for climate visualizations. The main issue, however, is that many rainbow scales are not “‘perceptually uniform’ – they create sharp artificial boundaries between colors (particularly involving yellow) that are not necessarily representative of the underlying data” (Hawkins, 2016).
Color is a very powerful tool for any complex visualization with multiple categories and values. If your visualization is just for fun, and communicating the information efficiently and with the utmost accuracy is not the top priority, then using a rainbow scale is probably fine. However, if efficiency and accuracy are vital to your visualization, then most of the time there are better scales than a rainbow scale. In fact, there are many popular, attractive multi-hue scales, like Viridis and Magma, that are still perceptually uniform and can be used for your visualization.
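One rough way to see the problem with a naive rainbow is to compute the CIELAB lightness (L*) of its colors, which approximates how light a color looks to us. The sketch below uses the standard sRGB-to-luminance and luminance-to-L* formulas. The five fully saturated colors standing in for a rainbow scale are an assumption for illustration, not any particular published colormap.

```javascript
// Undo the sRGB gamma curve for one 0..255 channel.
function srgbToLinear(c8) {
  const c = c8 / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// CIELAB lightness L* (0 = black, 100 = white) of an sRGB color.
function labLightness([r, g, b]) {
  // Relative luminance with the standard sRGB weights.
  const y = 0.2126 * srgbToLinear(r) + 0.7152 * srgbToLinear(g) + 0.0722 * srgbToLinear(b);
  return y > 216 / 24389 ? 116 * Math.cbrt(y) - 16 : (24389 / 27) * y;
}

// A naive rainbow built from fully saturated colors (illustrative only).
const rainbow = {
  red: [255, 0, 0],
  yellow: [255, 255, 0],
  green: [0, 255, 0],
  cyan: [0, 255, 255],
  blue: [0, 0, 255],
};

for (const [name, rgb] of Object.entries(rainbow)) {
  console.log(name, labLightness(rgb).toFixed(1));
}
```

Running this shows lightness jumping up sharply at yellow and dropping sharply at blue instead of changing steadily, so equal steps in the data do not look like equal steps in color.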
Some Other Small Color-related Visualization Tips
- Yellow and green are perceived as brighter than other hues, so be careful when using them alongside other hues (Siggy, 2016).
- We perceive marks with similar hues as being more similar to each other, so if your data is categorical, make sure your colors are “equidistant” in wavelength from each other. For example, if we choose red, orange, and blue as category colors, the red and orange categories might be subconsciously perceived as more related to each other than to blue.
- Sometimes using multiple colors isn’t necessary at all. If other attributes of the graphical marks, like shape or size, are enough to differentiate categories, adding color could be distracting or confusing.
- Lastly, have fun choosing your colors! You can consider the connotations of different colors and what you want to convey with color in your graph, then select your colors around those ideas.
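To check how evenly spaced a set of category colors is, one option is to measure the distance between their hues around the color wheel. The sketch below uses HSL hue angles as a stand-in for the “equidistant” idea in the tips above (an assumption, since hue angle is not the same thing as wavelength), with red, orange, and blue taken as roughly 0, 30, and 240 degrees.

```javascript
// Shortest distance between two hue angles on the 360-degree wheel.
function hueDistance(a, b) {
  const d = Math.abs(a - b) % 360;
  return Math.min(d, 360 - d);
}

// Smallest distance between any two hues in a palette.
// A small value means at least two categories will look related.
function minHueDistance(hues) {
  let min = Infinity;
  for (let i = 0; i < hues.length; i++) {
    for (let j = i + 1; j < hues.length; j++) {
      min = Math.min(min, hueDistance(hues[i], hues[j]));
    }
  }
  return min;
}

const redOrangeBlue = [0, 30, 240]; // approximate hue angles, for illustration
const evenlySpaced = [0, 120, 240];

console.log(minHueDistance(redOrangeBlue)); // red and orange sit close together
console.log(minHueDistance(evenlySpaced));  // every pair is equally far apart
```

A palette whose smallest hue distance is close to 360 divided by the number of categories is about as evenly spread as hue alone allows.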
References
Breslow, L. A., Trafton, J. G., & Ratwani, R. M. (2009). A perceptual process approach to selecting color scales for complex visualizations. Journal of Experimental Psychology: Applied, 15(1), 25-34. https://www.proquest.com/scholarly-journals/perceptual-process-approach-selecting-color/docview/614497204/se-2?accountid=14512. doi: http://dx.doi.org/10.1037/a0015085
Google. (n.d.). [Google Maps image of Los Angeles area]. Retrieved April 22, 2022, from https://www.google.com/maps/place/Los+Angeles,+CA/@34.0514389,-118.2423557,13.5z/data=!4m5!3m4!1s0x80c2c75ddc27da13:0xe22fdf6f254608f4!8m2!3d34.0522342!4d-118.2436849
Radvansky, G. A., & Ashcraft, M. H. (2016). Cognition (6th ed.). Pearson Education, Inc.
Siggy. (2016). Why is yellow the brightest color? https://atrivialknot.wordpress.com/2016/03/04/why-is-yellow-the-brightest-color/
Silverman, A., Gramazio, C., & Schloss, K. (2016). The dark is more (Dark+) bias in colormap data visualizations with legends. Journal of Vision, 16(12), 628. doi: https://doi.org/10.1167/16.12.628
The Trustees of Indiana University. (n.d.). Visual Search. Cognitive Science Software. Retrieved April 22, 2022, from https://pcl.sitehost.iu.edu/CogsciSoftware/VisualSearch/