4 Using Legends, Colors, Fonts, and Axes to Improve Visualizations
4.1 Introduction
In the last section, we learned how to use bar charts and dot charts to communicate “how much” of something has been observed across categorical groups.
As we saw, small modifications to our ggplot2 code, such as color and plot themes, can substantially improve the interpretability and aesthetic quality of a visualization.
In this section, we’re going to take that a step further by learning how to use ggplot2 code to annotate plots with text, control fonts and colors, modify axis elements, and create and customize legends and color palettes.
Let’s begin by using some text to improve our NYC Garbage visualization:
4.2 Annotating Visualizations with Text
Recall where we left off with our horizontal bar chart:
We can generally see from this visualization that Staten Island produced the least amount of garbage for the month of September 2011 and Brooklyn produced the most.
We can also generally determine the amount of garbage collected. For instance, Staten Island was generally around 20K tons, whereas Queens and Brooklyn were somewhat more than 60K tons.
It might be helpful if we put the actual amount associated with each borough on the bars themselves to increase the amount of information the reader can glean from the visualization.
To do this, we can make use of a new geom: geom_text
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash))) +
geom_bar(stat='identity',color='black',fill='white') +
geom_text(aes(label=Sum_Trash)) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011") +
theme_classic()
Okay cool! But what is the most obvious problem?
The label is centered on the end of the bar, making the text difficult to read. We can change the justification of the text by using the hjust argument, which horizontally adjusts the alignment of our text labels relative to their anchor point (here, the end of each bar). hjust typically takes a value between 0 and 1: a value of 0 left-justifies the label, so the text starts at the anchor and extends to the right, and a value of 1 right-justifies it, so the text ends at the anchor and extends to the left. The default, 0.5, centers the label on the anchor.
Note, hjust and its counterpart vjust can assume values outside of this interval if we want to move our labels further away from the point they are anchored upon.
Let’s try hjust = 1 to move the text labels inside of the bars:
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash))) +
geom_bar(stat='identity',color='black',fill='white') +
geom_text(aes(label=Sum_Trash),hjust = 1) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011") +
theme_classic()
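To see how hjust positions a label relative to its anchor, a minimal sketch on toy data can help. The data frame hjust_demo and its values are made up for illustration; the dashed line marks the shared anchor at x = 1.

```r
library(ggplot2)

# Hypothetical toy data: three labels anchored at the same x position
hjust_demo <- data.frame(
  x = 1,
  y = 3:1,
  hjust = c(0, 0.5, 1),
  label = c("hjust = 0", "hjust = 0.5", "hjust = 1")
)

p_hjust <- hjust_demo |>
  ggplot(aes(x, y)) +
  geom_vline(xintercept = 1, linetype = "dashed") +  # marks the anchor
  geom_text(aes(label = label, hjust = hjust)) +     # hjust mapped per row
  scale_x_continuous(limits = c(0, 2))               # room on both sides
p_hjust
```

The label with hjust = 0 sits to the right of the dashed line, the label with hjust = 1 sits to its left, and hjust = 0.5 straddles it.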
Looks better! But now notice that the label is displayed to one decimal place, which is more precision than we need here. We can fix this by rounding to the nearest ton, and also move the label further inside the bar, directly within geom_text:
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash))) +
geom_bar(stat='identity',color='black',fill='white') +
geom_text(aes(label=round(Sum_Trash)),hjust=1.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011") +
theme_classic()
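If thousands separators are wanted on the labels as well, one option (not used in this chapter) is a labelling function from the scales package. The sketch below assumes scales is installed; the name fmt_tons and the value passed to it are made up for illustration.

```r
library(scales)

# label_comma() returns a function that formats numbers as text;
# accuracy = 1 rounds to the nearest whole number
fmt_tons <- label_comma(accuracy = 1)

# A made-up tonnage value, for illustration only
fmt_tons(61234.56)

# Inside the plot, the same function could stand in for round():
# geom_text(aes(label = fmt_tons(Sum_Trash)), hjust = 1.25)
```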
4.2.1 Modifying Font Characteristics
Our visualization is much improved over what we had originally created! But consider the font style. Right now, our visualization uses a sans serif style font in a black color by default.
What if we wanted to change that?
4.2.1.1 Font Family
While we can specify a wide variety of fonts, the main three guaranteed to work everywhere in a ggplot2 visualization are sans (default), serif (like Times New Roman), and mono (like typewriter font):
df <- data.frame(x = 1, y = 3:1, family = c("sans", "serif", "mono"))
df |>
ggplot(aes(x, y)) +
geom_text(aes(label = family, family = family))
4.2.1.2 Font Face
We can also make our fonts bold, italic, bold.italic, or plain:
df <- data.frame(x = 1:4, fontface = c("plain", "bold", "italic", "bold.italic"))
df |>
ggplot(aes(1, x)) +
geom_text(aes(label = fontface, fontface = fontface))
4.2.1.3 Font Color
We can modify the color of our text uniformly by using the color argument within the geom_text function by either using a named color (see the colors() function for the full list) or using hex codes:
## Salmon Font ##
df |>
ggplot(aes(1, x)) +
geom_text(aes(label = fontface, fontface = fontface),
color='salmon')
## Manchester City Light Blue ##
df |>
ggplot(aes(1, x)) +
geom_text(aes(label = fontface, fontface = fontface),
color = "#6CABDD")
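Named colors and hex codes are two spellings of the same thing. As a small sketch using only base R, the snippet below checks that a name is recognized and converts a hex code to its red, green, and blue components and back. The variable name city_blue is just for illustration.

```r
# colors() lists every color name that base R recognizes
head(colors())
"salmon" %in% colors()

# col2rgb() converts a name or hex code to red/green/blue values (0-255),
# and rgb() converts those values back to a hex code
city_blue <- col2rgb("#6CABDD")
city_blue
rgb(city_blue[1], city_blue[2], city_blue[3], maxColorValue = 255)
```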
For our NYC Garbage example, suppose I want the text to be in a serif style, with a bold italic font face, and in Atlanta Braves red (hex code #CE1141):
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash))) +
geom_bar(stat='identity',color='black',fill='white') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=1.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011") +
theme_classic()
4.3 Modifying Axes Elements
4.3.1 Axis Length
Still using our NYC Garbage example, let’s suppose I’d rather have the text labels outside of the bars to the right rather than inside the bars to the left. Remember, we can make a simple change to our hjust argument to do this:
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash))) +
geom_bar(stat='identity',color='black',fill='white') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=-0.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011") +
theme_classic()
Whoops! Now I can’t see Brooklyn’s label! It’s being truncated by the size of our viewing window.
One way this can be modified is by increasing the x-axis length. We can do this by using the limits argument within the scale_x_continuous function.
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash))) +
geom_bar(stat='identity',color='black',fill='white') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=-0.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011") +
theme_classic() +
scale_x_continuous(limits = c(0,75000))
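A hard-coded upper limit has to be revisited whenever the data change. An alternative sketch, assuming ggplot2 3.3.0 or later, pads the right side of the axis proportionally with expansion() instead. The trash_demo data frame holds made-up totals for illustration, not the real values.

```r
library(ggplot2)

# Made-up totals standing in for trash_tot (illustrative values only)
trash_demo <- data.frame(
  BOROUGH = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
  Sum_Trash = c(35000, 64000, 38000, 61000, 20000)
)

p_expand <- trash_demo |>
  ggplot(aes(x = Sum_Trash, y = reorder(BOROUGH, Sum_Trash))) +
  geom_bar(stat = "identity", color = "black", fill = "white") +
  geom_text(aes(label = round(Sum_Trash)), hjust = -0.25) +
  # No padding on the left; 15% of the data range added on the right
  scale_x_continuous(expand = expansion(mult = c(0, 0.15))) +
  theme_classic()
p_expand
```

The 15% figure is a guess that leaves room for five-digit labels; longer labels or larger fonts would need more.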
Awesome! That solved the problem!
4.3.2 Modifying Tick Marks
Notice that our tick marks on the x-axis are in increments of 20,000.
What if we want to increase the number of tick marks to be in increments of 10,000 instead? We can again use scale_x_continuous, this time making use of the breaks argument:
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash))) +
geom_bar(stat='identity',color='black',fill='white') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=-0.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011") +
theme_classic() +
scale_x_continuous(limits = c(0,75000),
breaks = seq(0,80000,by=10000))
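The breaks argument takes either a vector of positions or a function that computes them. As a hedged sketch, the snippet below prints the vector used above and shows breaks_width() from the scales package as one function-based alternative. The name every_10k is made up.

```r
# The break positions used above, printed on their own
seq(0, 80000, by = 10000)

# scales::breaks_width() returns a function that builds evenly spaced
# breaks from whatever axis range it is given
every_10k <- scales::breaks_width(10000)
every_10k(c(0, 75000))

# In the plot, this could be written as:
# scale_x_continuous(breaks = scales::breaks_width(10000))
```

Breaks that fall outside the axis limits are simply not drawn, so a sequence that runs slightly past the limit is harmless.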
4.3.3 Formatting Tick Mark Labels
In the above visualization, we note that each tick mark represents a unit measured in the thousands as we can see by the three trailing zeros in each tick mark label.
We may perhaps wish to represent “thousand” by the common label “K” so that 10000 = 10K.
To do this using ggplot2, we can once again use the very useful scale_x_continuous function, this time with the labels argument:
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash))) +
geom_bar(stat='identity',color='black',fill='white') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=-0.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011") +
theme_classic() +
scale_x_continuous(limits = c(0,75000),
breaks = seq(0,80000,by=10000),
labels = scales::label_number(suffix = "K",
scale = 1e-3))
Notice in the above code, we are using the label_number function from the scales package to add the “K” suffix to the labels and scale (or multiply) the numeric labels by \(1/1000\).
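Because label_number returns a function, it can be called directly on a few break values to preview the text that will appear on the axis. A small sketch, where the name k_label is made up:

```r
library(scales)

# Build the labelling function once, then apply it to some break values
k_label <- label_number(suffix = "K", scale = 1e-3)
k_label(c(0, 10000, 20000, 60000))
```

Each value is multiplied by 0.001 before the suffix is appended, so 10000 becomes "10K".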
4.3.4 Modifying Axis Font Styles
We already learned how to modify font styles in the context of geom_text, but we can use the exact same logic and syntax to modify font in our axes as well as titles!
So suppose we want our tick mark labels and axis titles to match the formatting of our data labels. We can do this using the theme function:
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash))) +
geom_bar(stat='identity',color='black',fill='white') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=-0.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011") +
theme_classic() +
theme(axis.text.x = element_text(family = 'serif',
face = 'bold.italic',
color = '#CE1141'),
axis.title.x = element_text(family = 'serif',
face = 'bold.italic',
color = '#CE1141'),
axis.text.y = element_text(family = 'serif',
face = 'bold.italic',
color = '#CE1141'),
axis.title.y = element_text(family = 'serif',
face = 'bold.italic',
color = '#CE1141')) +
scale_x_continuous(limits = c(0,75000),
breaks = seq(0,80000,by=10000),
labels = scales::label_number(suffix = "K",
scale = 1e-3))
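The four element_text() calls above are identical, so the repetition can be trimmed. One possible sketch stores the element once and assigns it to the parent theme elements axis.text and axis.title, from which the x- and y-specific elements inherit. The name label_style and the trash_demo values are made up for illustration.

```r
library(ggplot2)

# Made-up totals standing in for trash_tot (illustrative values only)
trash_demo <- data.frame(
  BOROUGH = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
  Sum_Trash = c(35000, 64000, 38000, 61000, 20000)
)

# Define the shared text style once
label_style <- element_text(family = "serif",
                            face = "bold.italic",
                            color = "#CE1141")

p_theme <- trash_demo |>
  ggplot(aes(x = Sum_Trash, y = reorder(BOROUGH, Sum_Trash))) +
  geom_bar(stat = "identity", color = "black", fill = "white") +
  labs(y = "NYC Borough", x = "Total Refuse Collected (in tons)") +
  theme_classic() +
  # axis.text covers both sets of tick labels; axis.title covers both titles
  theme(axis.text = label_style,
        axis.title = label_style)
p_theme
```

Setting the x- and y-specific elements individually, as in the chapter's code, remains the better choice when the two axes should differ.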
4.4 Legends
Rather than having the bars all be a uniform color (white in this case), suppose I want to have the colors of the bars differ by the particular borough they’re representing. We can do so with a very slight modification to the existing code.
Inside the aes() call of the global ggplot function, let’s add fill=BOROUGH:
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash),fill=BOROUGH)) +
geom_bar(stat='identity',color='black') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=-0.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011") +
theme_classic() +
theme(axis.text.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.title.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.text.y = element_text(family = 'serif',
face = 'bold.italic'),
axis.title.y = element_text(family = 'serif',
face = 'bold.italic')) +
scale_x_continuous(limits = c(0,80000),
breaks = seq(0,80000,by=10000),
labels = scales::label_number(suffix = "K",
scale = 1e-3))
Cool, right? But now that the legend identifies each borough, we don’t really have a need for the y-axis labels. We can suppress those and the tick marks using the theme function:
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash),fill=BOROUGH)) +
geom_bar(stat='identity',color='black') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=-0.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011") +
theme_classic() +
theme(axis.text.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.title.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.text.y = element_blank(),
axis.title.y = element_text(family = 'serif',
face = 'bold.italic'),
axis.ticks.y = element_blank()) +
scale_x_continuous(limits = c(0,80000),
breaks = seq(0,80000,by=10000),
labels = scales::label_number(suffix = "K",
scale = 1e-3))
Notice the legend title is all caps. We can modify the legend title in the labs function:
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash),fill=BOROUGH)) +
geom_bar(stat='identity',color='black') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=-0.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011",
fill = "Borough") +
theme_classic() +
theme(axis.text.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.title.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.text.y = element_blank(),
axis.title.y = element_text(family = 'serif',
face = 'bold.italic'),
axis.ticks.y = element_blank()) +
scale_x_continuous(limits = c(0,80000),
breaks = seq(0,80000,by=10000),
labels = scales::label_number(suffix = "K",
scale = 1e-3))
We can also control the position of the legend within the visualization through the legend.position argument within the theme function (default is legend.position='right'):
## Top ##
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash),fill=BOROUGH)) +
geom_bar(stat='identity',color='black') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=-0.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011",
fill = "Borough") +
theme_classic() +
theme(axis.text.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.title.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.text.y = element_blank(),
axis.title.y = element_text(family = 'serif',
face = 'bold.italic'),
axis.ticks.y = element_blank(),
legend.position = "top") +
scale_x_continuous(limits = c(0,80000),
breaks = seq(0,80000,by=10000),
labels = scales::label_number(suffix = "K",
scale = 1e-3))
## Bottom ##
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash),fill=BOROUGH)) +
geom_bar(stat='identity',color='black') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=-0.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011",
fill = "Borough") +
theme_classic() +
theme(axis.text.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.title.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.text.y = element_blank(),
axis.title.y = element_text(family = 'serif',
face = 'bold.italic'),
axis.ticks.y = element_blank(),
legend.position = "bottom") +
scale_x_continuous(limits = c(0,80000),
breaks = seq(0,80000,by=10000),
labels = scales::label_number(suffix = "K",
scale = 1e-3))
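Besides "top" and "bottom", legend.position also accepts "left", "right", and "none", the last of which removes the legend entirely. A minimal sketch on made-up data (trash_demo and its values are illustrative, not the real totals):

```r
library(ggplot2)

# Made-up totals standing in for trash_tot (illustrative values only)
trash_demo <- data.frame(
  BOROUGH = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
  Sum_Trash = c(35000, 64000, 38000, 61000, 20000)
)

# A stripped-down version of the chapter's plot, saved for reuse
p_base <- trash_demo |>
  ggplot(aes(x = Sum_Trash, y = reorder(BOROUGH, Sum_Trash), fill = BOROUGH)) +
  geom_bar(stat = "identity", color = "black") +
  labs(fill = "Borough") +
  theme_classic()

p_left <- p_base + theme(legend.position = "left")  # legend on the left
p_none <- p_base + theme(legend.position = "none")  # legend hidden
p_left
p_none
```

Hiding the legend makes sense when the y-axis labels are kept, since both would carry the same information.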
4.4.1 Changing Color Palettes
4.4.1.1 Viridis
In the above plot, the generated colors are the defaults. We can change the palette we use either manually or by using palettes within packages such as viridis, which provides colorblind-friendly palettes.
library(viridis)
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash),fill=BOROUGH)) +
geom_bar(stat='identity',color='black') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=-0.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011",
fill = "Borough") +
theme_classic() +
theme(axis.text.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.title.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.text.y = element_blank(),
axis.title.y = element_text(family = 'serif',
face = 'bold.italic'),
axis.ticks.y = element_blank(),
legend.position = "bottom") +
scale_x_continuous(limits = c(0,80000),
breaks = seq(0,80000,by=10000),
labels = scales::label_number(suffix = "K",
scale = 1e-3)) +
scale_fill_viridis(discrete = TRUE)
Within scale_fill_viridis, we have eight different palettes we can specify through the option argument (“A” through “H”). So for example, if I want to employ the “turbo” option (“H”):
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash),fill=BOROUGH)) +
geom_bar(stat='identity',color='black') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=-0.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011",
fill = "Borough") +
theme_classic() +
theme(axis.text.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.title.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.text.y = element_blank(),
axis.title.y = element_text(family = 'serif',
face = 'bold.italic'),
axis.ticks.y = element_blank(),
legend.position = "bottom") +
scale_x_continuous(limits = c(0,80000),
breaks = seq(0,80000,by=10000),
labels = scales::label_number(suffix = "K",
scale = 1e-3)) +
scale_fill_viridis(discrete = TRUE,
option = "H")
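ggplot2 also ships its own viridis scales, so the same palettes are reachable without loading the viridis package. The sketch below assumes a viridisLite version recent enough to include the turbo palette (0.4.0 or later) and uses made-up data; trash_demo, turbo_cols, and p_turbo are illustrative names.

```r
library(ggplot2)

# Five hex codes drawn from the "turbo" palette (option "H")
turbo_cols <- viridisLite::viridis(5, option = "H")
turbo_cols

# Made-up totals standing in for trash_tot (illustrative values only)
trash_demo <- data.frame(
  BOROUGH = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
  Sum_Trash = c(35000, 64000, 38000, 61000, 20000)
)

p_turbo <- trash_demo |>
  ggplot(aes(x = Sum_Trash, y = reorder(BOROUGH, Sum_Trash), fill = BOROUGH)) +
  geom_bar(stat = "identity", color = "black") +
  # The _d suffix marks the discrete version of the scale
  scale_fill_viridis_d(option = "H") +
  theme_classic()
p_turbo
```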
4.4.1.2 RColorBrewer
The RColorBrewer package also provides a nice list of palettes we can use to customize our visualization. Let’s take a look at all of our possibilities:
library(RColorBrewer)
print(brewer.pal.info)
maxcolors category colorblind
BrBG 11 div TRUE
PiYG 11 div TRUE
PRGn 11 div TRUE
PuOr 11 div TRUE
RdBu 11 div TRUE
RdGy 11 div FALSE
RdYlBu 11 div TRUE
RdYlGn 11 div FALSE
Spectral 11 div FALSE
Accent 8 qual FALSE
Dark2 8 qual TRUE
Paired 12 qual TRUE
Pastel1 9 qual FALSE
Pastel2 8 qual FALSE
Set1 9 qual FALSE
Set2 8 qual TRUE
Set3 12 qual FALSE
Blues 9 seq TRUE
BuGn 9 seq TRUE
BuPu 9 seq TRUE
GnBu 9 seq TRUE
Greens 9 seq TRUE
Greys 9 seq TRUE
Oranges 9 seq TRUE
OrRd 9 seq TRUE
PuBu 9 seq TRUE
PuBuGn 9 seq TRUE
PuRd 9 seq TRUE
Purples 9 seq TRUE
RdPu 9 seq TRUE
Reds 9 seq TRUE
YlGn 9 seq TRUE
YlGnBu 9 seq TRUE
YlOrBr 9 seq TRUE
YlOrRd 9 seq TRUE
The first set of palettes (labeled “div”) are diverging palettes, best for quantitative data that spread in two directions from a meaningful midpoint.
The third set (labeled “seq”) are sequential palettes, best for quantitative data ordered from low to high.
The middle set (labeled “qual”) is what would be most appropriate for us: the qualitative palettes, designed for unordered categories such as boroughs.
Let’s try Pastel1:
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash),fill=BOROUGH)) +
geom_bar(stat='identity',color='black') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=-0.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011",
fill = "Borough") +
theme_classic() +
theme(axis.text.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.title.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.text.y = element_blank(),
axis.title.y = element_text(family = 'serif',
face = 'bold.italic'),
axis.ticks.y = element_blank(),
legend.position = "bottom") +
scale_x_continuous(limits = c(0,80000),
breaks = seq(0,80000,by=10000),
labels = scales::label_number(suffix = "K",
scale = 1e-3)) +
scale_fill_brewer(palette = "Pastel1")
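Since brewer.pal.info is an ordinary data frame, the palette table can be filtered rather than read by eye. As a sketch, the snippet below lists the qualitative palettes flagged as colorblind-friendly and prints the five hex codes that Pastel1 supplies for five categories. The names qual_cb and pastel_five are made up.

```r
library(RColorBrewer)

# Qualitative palettes flagged as colorblind-friendly
qual_cb <- subset(brewer.pal.info, category == "qual" & colorblind)
rownames(qual_cb)

# The first five colors of Pastel1, one per borough
pastel_five <- brewer.pal(5, "Pastel1")
pastel_five
```

Note that Pastel1 itself is not flagged as colorblind-friendly in the table, so one of the palettes returned by the filter may be a better pick when accessibility matters.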
4.4.1.3 Custom Palettes
Being able to use existing color palettes in R packages like viridis and RColorBrewer is nice! But there may be instances where we need to use a custom palette (consider colors for branding!).
To do this, we will employ the scale_fill_manual function after creating a named vector called borough_colors, which specifies which borough is assigned which specific color. Note, we can also use hex colors here rather than these specific named colors.
borough_colors <- c("Bronx" = 'red',
"Brooklyn" = 'blue',
"Manhattan" = "orange",
"Queens" = "yellow",
"Staten Island" = 'violet')
trash_tot |>
ggplot(aes(x=Sum_Trash,y=reorder(BOROUGH,Sum_Trash),fill=BOROUGH)) +
geom_bar(stat='identity',color='black') +
geom_text(aes(label=round(Sum_Trash)),family='serif',
fontface='bold.italic',color='#CE1141',hjust=-0.25) +
labs(y = "NYC Borough",
x = "Total Refuse Collected (in tons)",
title = "Trash Collected in NYC by Borough",
subtitle = "September 2011",
fill = "Borough") +
theme_classic() +
theme(axis.text.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.title.x = element_text(family = 'serif',
face = 'bold.italic'),
axis.text.y = element_blank(),
axis.title.y = element_text(family = 'serif',
face = 'bold.italic'),
axis.ticks.y = element_blank(),
legend.position = "bottom") +
scale_x_continuous(limits = c(0,80000),
breaks = seq(0,80000,by=10000),
labels = scales::label_number(suffix = "K",
scale = 1e-3)) +
scale_fill_manual(values = borough_colors)
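With a named vector, the names must match the values in the fill column exactly; a borough whose name is missing or misspelled falls back to the scale's NA color (grey by default). The sketch below uses hex codes and checks the names before plotting. The hex values are arbitrary picks and trash_demo holds made-up totals, both for illustration only.

```r
library(ggplot2)

# Made-up totals standing in for trash_tot (illustrative values only)
trash_demo <- data.frame(
  BOROUGH = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
  Sum_Trash = c(35000, 64000, 38000, 61000, 20000)
)

# Arbitrary hex codes, one per borough
borough_hex <- c("Bronx" = "#E41A1C",
                 "Brooklyn" = "#377EB8",
                 "Manhattan" = "#FF7F00",
                 "Queens" = "#FFFF33",
                 "Staten Island" = "#984EA3")

# Boroughs in the data that have no color assigned (ideally none)
missing_colors <- setdiff(unique(trash_demo$BOROUGH), names(borough_hex))
missing_colors

p_manual <- trash_demo |>
  ggplot(aes(x = Sum_Trash, y = reorder(BOROUGH, Sum_Trash), fill = BOROUGH)) +
  geom_bar(stat = "identity", color = "black") +
  scale_fill_manual(values = borough_hex) +
  theme_classic()
p_manual
```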