A custom scatterplot with auto-positioned labels to explore the
palmerpenguins dataset, made with R and the tidyverse. This blogpost
guides you through a highly customized scatterplot that includes a
variety of custom colors, markers, and fonts. The library ggrepel is
used to automatically adjust the position of labels in the plots.
This page showcases the work of Tuo Wang, which introduces packages
that make ggplot2 plots more beautiful. You can find the original code
on Tuo’s blog here.
Thanks to him for agreeing to share his work here! Thanks also to Tomás Capretto, who split the original code into this step-by-step guide!
As usual, it is first necessary to load some packages before
building the figure. ggrepel provides geoms for ggplot2 to repel
overlapping text labels. Text labels repel away from each other, away
from data points, and away from edges of the plotting area in an
automatic fashion. Also, randomNames is used to generate random names
that will be the text labels in the chart.
Note: randomNames is only available for R > 4.0.0.
library(ggrepel)
library(palmerpenguins)
library(randomNames)
library(tidyverse)
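The plotting code further down asks for the font families "Poppins", "Lobster Two", and "Roboto". If they are not installed on your system, the text will most likely be drawn with a fallback font. One possible way to make them available is the showtext package, which can fetch fonts from Google Fonts. This is not necessarily what the original author did. It is only a sketch, and it assumes showtext is installed and an internet connection is available.

```r
# Assumption: showtext is installed; attaching it also attaches sysfonts
library(showtext)

# Download each family from Google Fonts and register it under the same name
font_add_google("Poppins", "Poppins")
font_add_google("Lobster Two", "Lobster Two")
font_add_google("Roboto", "Roboto")

# Ask graphics devices opened after this call to render text with showtext
showtext_auto()
```

If the fonts are already installed locally and visible to your graphics device, this step can be skipped.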
The palmerpenguins data were collected and made available by
Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member
of the Long Term Ecological Research Network. This dataset was
popularized by Allison Horst in her R package palmerpenguins, with the
goal of offering an alternative to the iris dataset for data
exploration and visualization.
data("penguins", package = "palmerpenguins")
First of all, observations with missing values are discarded from the dataset.
penguins <- drop_na(penguins)
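To see what this step does, here is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not part of the original code). When no columns are specified, `drop_na()` keeps only the rows without any missing value, which matches what base R's `complete.cases()` selects.

```r
library(tidyr)

# Hypothetical toy data: two of the four rows contain an NA
toy <- data.frame(
  bill_length_mm = c(39.1, NA, 40.3, 36.7),
  sex = c("male", "female", NA, "female")
)

# Drop every row that has at least one missing value
toy_clean <- drop_na(toy)

# Base R equivalent, shown for comparison
toy_base <- toy[complete.cases(toy), ]

nrow(toy)       # 4 rows before
nrow(toy_clean) # 2 rows after
```

Comparing `nrow()` before and after is a quick way to check how many observations are lost.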
Then, each observation is assigned a random name. A new variable,
highlight, is added to the dataset to indicate which names are
highlighted in the plot. These are the names starting with the
letter "C".
## Generate random names
# The results of set.seed may depend on the R version.
set.seed(2021+03+27)
name_vector <- randomNames(nrow(penguins), which.names = "first")

## Create 'highlight' indicator variable
# 'highlight' holds the name if it starts with "C", and "" otherwise.
penguins <- penguins %>%
  mutate(
    name = name_vector,
    highlight = case_when(
      str_starts(name, "C") ~ name,
      TRUE ~ ""
    )
  )
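The highlight logic is easier to follow on a handful of names. The sketch below uses a hand-written vector (`toy_names` is hypothetical and stands in for the output of `randomNames()`). Names that do not start with "C" become the empty string rather than `NA`. ggrepel does not draw empty labels, but the corresponding points still push the visible labels away.

```r
library(dplyr)
library(stringr)

# Hypothetical names, standing in for the output of randomNames()
toy_names <- c("Carla", "Diego", "Chen", "Amina")

# Keep the name when it starts with "C", otherwise use an empty string
toy_highlight <- case_when(
  str_starts(toy_names, "C") ~ toy_names,
  TRUE ~ ""
)

toy_highlight
# "Carla" ""      "Chen"  ""

# Number of labels that would actually be drawn
sum(toy_highlight != "")
# 2
```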
Today’s chart is a scatterplot that shows the association between
the flipper length and the bill length of the penguins in the dataset.
Points are colored according to the species to add an extra layer of
information to the visualization. The first step is to create a
basic colored scatterplot with ggplot2. Let’s get started!
# Note `color = species` and `shape = species`.
# This means each species will have BOTH a different color and shape.
plt <- ggplot(penguins, aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(
    aes(color = species, shape = species),
    size = 1.5,
    alpha = 0.8 # It's nice to add some transparency because there may be overlap.
  ) +
  # Use custom colors
  scale_color_manual(
    values = c("#386cb0", "#fdb462", "#7fc97f")
  )
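With an unnamed vector, `scale_color_manual()` hands out the colors in the order of the factor levels of `species` (Adelie, Chinstrap, Gentoo). A named vector makes that pairing explicit. The sketch below is an optional variation, not part of the original code, and `species_colors` is a hypothetical name.

```r
library(ggplot2)

# Assumption: this pairing mirrors what the unnamed vector produces,
# because the species levels are ordered Adelie, Chinstrap, Gentoo
species_colors <- c(
  Adelie    = "#386cb0",
  Chinstrap = "#fdb462",
  Gentoo    = "#7fc97f"
)

# Scale object that could replace the unnamed version in the plot
color_scale <- scale_color_manual(values = species_colors)
```

Naming the values keeps each species tied to its color even if a species is filtered out or the factor levels are reordered.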
That’s a pretty good start, but let’s make it better!
What’s truly missing here are labels. It’s very frustrating not knowing which item is hidden under a data point, isn’t it?
It is pretty challenging to add many labels on a plot since labels
tend to overlap each other, making the figure unreadable.
Fortunately, the ggrepel package is here to help us. It provides an
algorithm that will automatically place the labels for us. Let’s do it!
plt <- plt +
  geom_text_repel(
    aes(label = highlight),
    family = "Poppins",
    size = 3,
    min.segment.length = 0,
    seed = 42,
    box.padding = 0.5,
    max.overlaps = Inf,
    arrow = arrow(length = unit(0.010, "npc")),
    nudge_x = .15,
    nudge_y = .5,
    color = "grey50"
  )
Isn’t it wonderful how well ggrepel works?
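To get a feel for the main `geom_text_repel()` arguments used above, here is a small sketch on made-up data (`toy` and `toy_plot` are hypothetical names). The comments summarize what each argument does according to the ggrepel documentation.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical toy data: only two of the five points carry a label
toy <- data.frame(
  x = c(1, 1.1, 1.2, 3, 3.1),
  y = c(1, 1.1, 0.9, 2, 2.1),
  label = c("Carla", "", "Chen", "", "")
)

toy_plot <- ggplot(toy, aes(x, y)) +
  geom_point() +
  geom_text_repel(
    aes(label = label),
    min.segment.length = 0, # draw the connecting segment however short it is
    max.overlaps = Inf,     # never drop a label because of crowding
    box.padding = 0.5,      # padding around each label (in lines by default)
    seed = 42               # fix the random start so the layout is reproducible
  )
```

Printing `toy_plot` draws the chart. Changing `seed` or `box.padding` and printing it again is a simple way to see how the placement reacts.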
The chart above is pretty close to being publication-ready. What’s needed now is a good title, a legend to make colors and shapes more insightful, and some axis customization:
plt <- plt +
  # Add axes labels, title, and subtitle
  labs(
    title = "Palmer Penguins Data Visualization",
    subtitle = "Scatter plot of flipper length vs bill length",
    x = "flipper length (mm)",
    y = "bill length (mm)"
  ) +
  theme(
    # The default font when not explicitly specified
    text = element_text(family = "Lobster Two", size = 8, color = "black"),
    # Customize legend text, position, and background.
    legend.text = element_text(size = 9, family = "Roboto"),
    legend.title = element_text(face = "bold", size = 12, family = "Roboto"),
    legend.position = c(1, 0),
    legend.justification = c(1, 0),
    legend.background = element_blank(),
    # This one removes the background behind each key in the legend
    legend.key = element_blank(),
    # Customize title and subtitle font/size/color
    plot.title = element_text(
      family = "Lobster Two",
      size = 20,
      face = "bold",
      color = "#2a475e"
    ),
    plot.subtitle = element_text(
      family = "Lobster Two",
      size = 15,
      face = "bold",
      color = "#1b2838"
    ),
    plot.title.position = "plot",
    # Adjust axis parameters such as size and color.
    axis.text = element_text(size = 10, color = "black"),
    axis.title = element_text(size = 12),
    axis.ticks = element_blank(),
    # Axis lines are now lighter than default
    axis.line = element_line(colour = "grey50"),
    # Only keep y-axis major grid lines, with a grey color and dashed type.
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_line(color = "#b4aea9", linetype = "dashed"),
    # Use a light color for the background of the plot and the panel.
    panel.background = element_rect(fill = "#fbf9f4", color = "#fbf9f4"),
    plot.background = element_rect(fill = "#fbf9f4", color = "#fbf9f4")
  )
What a lovely plot!