Data visualization has become an indispensable component of modern psychology research, enabling researchers to transform complex datasets into meaningful visual narratives that facilitate both analysis and communication. R offers significant advantages for data visualization through its reproducibility and transparency, benefits that align with the growing emphasis on open science practices in psychological research. This comprehensive guide explores how psychologists can harness the power of R to create sophisticated, publication-ready visualizations that enhance research quality and impact.
The Critical Role of Data Visualization in Psychology Research
In the wake of psychology's replication crisis, the field has increasingly recognized the importance of transparent and reproducible research practices. Writing code for data visualization, rather than using point-and-click software, brings the same advantages as writing code for statistical analysis: reproducibility and transparency. There is also a practical benefit: code can be reused and adapted for future projects rather than starting from scratch each time.
Focusing on data visualization early in the research process allows researchers to get a good grasp of their data and any general patterns within those data prior to running any inferential tests. This exploratory approach helps identify outliers, understand distributions, and detect potential issues before conducting formal statistical analyses.
Data visualization has always been a vital component of any research project, as it facilitates the extraction of meaningful insights regardless of the data's complexity, and enables researchers to effectively communicate their results within their own discipline, across interdisciplinary boundaries and with a lay audience. The ability to create compelling visualizations has become increasingly valuable as research findings are shared through diverse channels, from academic journals to social media platforms.
Why R Stands Out for Data Visualization in Psychology
Unparalleled Flexibility and Customization
R's core graphics capabilities are already strong, but its specialized packages are what set it apart. Among them, ggplot2 is the foundational tool, offering a versatile platform for creating a wide array of plots. This flexibility is particularly valuable in psychology research, where diverse data types and research questions demand tailored visualization approaches.
Using R for data visualization gives the researcher almost total control over each element of the plot, and although this flexibility can seem daunting at first, the ability to write reusable code recipes and use recipes created by others is highly advantageous. This level of customization enables psychologists to create visualizations that precisely match their research needs and publication requirements.
Professional-Quality Output
ggplot2 produces polished plots with sensible defaults, and Hadley Wickham drew on the work of Edward Tufte when developing it. The level of customization and the professional output available in R have led news outlets such as the BBC and The New York Times to use R for data visualization. This professional pedigree demonstrates R's capability to produce publication-ready graphics suitable for the most demanding contexts.
Open-Source Ecosystem and Community Support
One of the advantages of using R is that researchers have a much larger range of fully customizable data visualization options than are typically available in point-and-click software because of the open-source nature of R. The R community continuously develops new packages and extensions, ensuring that researchers have access to cutting-edge visualization techniques.
The CRAN repository of R packages has now topped a dizzying 18,000+ packages, meaning there are packages for practically any data visualization task you can imagine, from visualizing cancer genomes to graphing the action of a book. This extensive ecosystem provides psychologists with specialized tools for virtually any visualization challenge they might encounter.
Integration with Statistical Analysis
R's strength lies in its seamless integration of statistical analysis and visualization within a single environment. Unlike workflows that require switching between different software packages, R allows researchers to import data, conduct analyses, create visualizations, and generate reports all within one cohesive framework. This integration reduces errors, improves efficiency, and enhances the reproducibility of research findings.
Getting Started: Setting Up Your R Environment
Installing R and RStudio
R is the programming language, whereas RStudio is an integrated development environment (IDE) that makes working with R easier. To begin your data visualization journey, you'll need to install both.
First, download and install R from the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/. Choose the version appropriate for your operating system (Windows, Mac, or Linux). After installing R, download and install RStudio Desktop from https://posit.co/download/rstudio-desktop/, which provides a more user-friendly interface for working with R.
Essential Packages for Psychology Research
The tidyverse is an opinionated collection of R packages designed for data science that share an underlying design philosophy, grammar, and data structures, including many packages for data import, tidying, and visualization. For psychology researchers, the tidyverse ecosystem provides a coherent set of tools that work seamlessly together.
To install the essential packages for data visualization, open RStudio and run the following commands in the console:
# Install the tidyverse collection (includes ggplot2, dplyr, tidyr, and more)
install.packages("tidyverse")
# Install additional visualization packages
install.packages("plotly") # For interactive visualizations
install.packages("patchwork") # For combining multiple plots
install.packages("ggpubr") # Publication-ready plots
install.packages("viridis") # Color-blind friendly palettes
Once installed, you'll need to load these packages at the beginning of each R session:
library(tidyverse) # Loads ggplot2 and other core packages
library(plotly)
library(patchwork)
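Later sections of this guide also load reshape2, RColorBrewer, survival, and survminer, and geom_hex() relies on the hexbin package. The following is a minimal sketch for checking which packages are missing before you start; missing_packages() is a hypothetical helper written for this example, not part of any package:

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used somewhere in this guide
needed <- c("tidyverse", "plotly", "patchwork", "ggpubr", "viridis",
            "reshape2", "RColorBrewer", "survival", "survminer", "hexbin")

to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing
# if (length(to_install) > 0) install.packages(to_install)
```

Keeping the install step commented out means the check can be run safely on a shared or managed machine.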
Understanding the Grammar of Graphics
ggplot2 is built on the ideas set out in "A Layered Grammar of Graphics" by Hadley Wickham (2010). Effective plots also draw on the principles of Gestalt psychology, formulated nearly 100 years ago, which remain useful for developing and evaluating visualizations. This theoretical foundation is crucial for creating effective visualizations.
ggplot2 uses a grammar-based approach of describing a plot that makes it conceptually different from most other software such as Matlab, Matplotlib in Python, etc. This grammar-based approach breaks down visualizations into distinct components that can be combined and layered to create complex graphics.
The Core Components of ggplot2
ggplot2 builds plots in layers: the ggplot() function defines the dataset and the default aesthetic mappings, and additional layers, including geoms, are added using the + operator. Understanding these components is essential for creating effective visualizations:
- Data: The dataset you want to visualize
- Aesthetics (aes): How variables map to visual properties (x-axis, y-axis, color, size, shape)
- Geometries (geom): The type of plot (points, lines, bars, boxes)
- Scales: Control how data values map to visual properties
- Themes: Overall visual appearance of the plot
- Facets: Creating multiple plots based on subgroups
The initial ggplot() call specifies the most important part: how individual variables map onto visual properties, even before you tell ggplot2 which geoms will draw the data. For example, once a variable is mapped to the x-axis, ggplot2 works out the range of values and the axis label on its own. Because so much is handled automatically, researchers can focus on the substance of their visualizations.
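To make the components listed above concrete, here is a minimal sketch that uses one of each in a single plot. The data are simulated, and the variable names (sleep_hours, mood, group) are assumptions made up for illustration:

```r
library(ggplot2)

set.seed(1)
# Simulated data; names and values are illustrative only
demo <- data.frame(
  sleep_hours = runif(60, min = 4, max = 9),
  group = rep(c("Control", "Treatment"), each = 30)
)
demo$mood <- 2 + 0.5 * demo$sleep_hours + rnorm(60)

p <- ggplot(demo, aes(x = sleep_hours, y = mood, color = group)) +  # data + aesthetics
  geom_point(size = 2) +                                            # geometry
  scale_color_viridis_d(end = 0.8, name = "Group") +                # scale
  facet_wrap(~ group) +                                             # facets
  labs(x = "Sleep (hours)", y = "Mood rating") +
  theme_minimal()                                                   # theme

p

# The x positions ggplot2 computed come straight from the mapped variable
range(layer_data(p, 1)$x)
range(demo$sleep_hours)
```

Each line after the ggplot() call adds one component, so removing or swapping a line changes only that part of the plot.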
Creating Your First Visualizations
Building a Basic Scatter Plot
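Scatter plots are fundamental for examining relationships between continuous variables, a common task in psychology research. The example below uses simulated data standing in for a study of anxiety and stress levels. The scores are drawn independently from normal distributions, so they are illustrative only: they do not respect the real GAD-7 or PSS-10 score ranges, and any trend that appears is due to chance.
library(ggplot2)
# Create example dataset (seed set so the simulated values are reproducible)
set.seed(123)
psych_data <- data.frame(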
participant_id = 1:50,
anxiety = rnorm(50, mean = 45, sd = 10),
stress = rnorm(50, mean = 38, sd = 8),
group = rep(c("Control", "Treatment"), each = 25)
)
# Create basic scatter plot
ggplot(psych_data, aes(x = anxiety, y = stress)) +
geom_point() +
labs(
title = "Relationship Between Anxiety and Stress Levels",
x = "Anxiety Score (GAD-7)",
y = "Perceived Stress Score (PSS-10)"
)
# Enhanced scatter plot with color and trend line
ggplot(psych_data, aes(x = anxiety, y = stress, color = group)) +
geom_point(size = 3, alpha = 0.7) +
geom_smooth(method = "lm", se = TRUE) +
scale_color_manual(values = c("Control" = "#2E86AB", "Treatment" = "#A23B72")) +
labs(
title = "Anxiety-Stress Relationship by Treatment Group",
x = "Anxiety Score (GAD-7)",
y = "Perceived Stress Score (PSS-10)",
color = "Group"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold"),
legend.position = "bottom"
)
Creating Distribution Plots
Understanding the distribution of your data is crucial before conducting statistical analyses. Here are several approaches to visualizing distributions:
# Histogram
ggplot(psych_data, aes(x = anxiety)) +
geom_histogram(binwidth = 5, fill = "#2E86AB", color = "white", alpha = 0.8) +
labs(
title = "Distribution of Anxiety Scores",
x = "Anxiety Score (GAD-7)",
y = "Frequency"
) +
theme_minimal()
# Density plot
ggplot(psych_data, aes(x = anxiety, fill = group)) +
geom_density(alpha = 0.6) +
scale_fill_manual(values = c("Control" = "#2E86AB", "Treatment" = "#A23B72")) +
labs(
title = "Anxiety Score Distribution by Group",
x = "Anxiety Score (GAD-7)",
y = "Density",
fill = "Group"
) +
theme_minimal()
Advanced Visualization Techniques for Psychology Research
Box Plots and Violin Plots
Box plots are useful for visualizing the distribution of a continuous variable. They are among the plots commonly available in point-and-click software, but in R the code for these "basic" plots can easily be extended to less commonly available options, such as violin plots with embedded box plots.
Box plots provide a concise summary of data distribution, showing the median, quartiles, and potential outliers. Violin plots extend this by showing the full distribution shape:
# Basic box plot
ggplot(psych_data, aes(x = group, y = stress, fill = group)) +
geom_boxplot(alpha = 0.7) +
scale_fill_manual(values = c("Control" = "#2E86AB", "Treatment" = "#A23B72")) +
labs(
title = "Stress Levels by Treatment Group",
x = "Group",
y = "Perceived Stress Score (PSS-10)"
) +
theme_minimal() +
theme(legend.position = "none")
# Violin plot with individual points
ggplot(psych_data, aes(x = group, y = stress, fill = group)) +
geom_violin(alpha = 0.6, trim = FALSE) +
geom_boxplot(width = 0.2, alpha = 0.8, outlier.shape = NA) +
geom_jitter(width = 0.1, alpha = 0.3, size = 2) +
scale_fill_manual(values = c("Control" = "#2E86AB", "Treatment" = "#A23B72")) +
labs(
title = "Distribution of Stress Scores by Group",
subtitle = "Violin plot with overlaid boxplot and individual data points",
x = "Group",
y = "Perceived Stress Score (PSS-10)"
) +
theme_minimal() +
theme(legend.position = "none")
Bar Plots for Categorical Data
Bar plots are among the most common plots in psychology. They are most useful for representing counts of data that are divided into categories, such as the number of male and female participants in a study. However, it's important to use bar plots appropriately and to consider alternatives when visualizing continuous data.
# Create categorical data
response_data <- data.frame(
condition = rep(c("Baseline", "Post-Treatment", "Follow-up"), each = 3),
response = rep(c("Improved", "No Change", "Worsened"), 3),
count = c(45, 30, 15, 60, 20, 10, 55, 25, 10)
)
# Stacked bar plot
ggplot(response_data, aes(x = condition, y = count, fill = response)) +
geom_bar(stat = "identity", position = "stack") +
scale_fill_manual(values = c("Improved" = "#06A77D",
"No Change" = "#F5B700",
"Worsened" = "#D62246")) +
labs(
title = "Treatment Response Across Time Points",
x = "Assessment Time",
y = "Number of Participants",
fill = "Response Category"
) +
theme_minimal()
# Grouped bar plot
ggplot(response_data, aes(x = condition, y = count, fill = response)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = c("Improved" = "#06A77D",
"No Change" = "#F5B700",
"Worsened" = "#D62246")) +
labs(
title = "Treatment Response Comparison",
x = "Assessment Time",
y = "Number of Participants",
fill = "Response Category"
) +
theme_minimal()
Faceting for Complex Comparisons
Faceting allows you to generate multiple plots based on a categorical variable. This technique is particularly valuable in psychology research when examining how relationships vary across different subgroups or conditions.
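# Create a more complex dataset: 30 participants x 3 time points
set.seed(123)
longitudinal_data <- data.frame(
participant_id = rep(1:30, each = 3),
time_point = rep(c("Baseline", "Week 4", "Week 8"), times = 30),
depression = rnorm(90, mean = 25, sd = 8),
anxiety = rnorm(90, mean = 30, sd = 7),
# Treatment is constant within a participant: 10 participants per arm
treatment = rep(c("CBT", "Medication", "Combined"), each = 30),
# Age group is constant within a participant and crossed with treatment
age_group = rep(rep(c("18-35", "36-50", "51+"), length.out = 30), each = 3)
)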
# Faceted scatter plot by treatment type
ggplot(longitudinal_data, aes(x = depression, y = anxiety, color = time_point)) +
geom_point(size = 2, alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~ treatment, ncol = 3) +
scale_color_viridis_d(option = "plasma", end = 0.8) +
labs(
title = "Depression-Anxiety Relationship Across Treatment Types",
x = "Depression Score (PHQ-9)",
y = "Anxiety Score (GAD-7)",
color = "Time Point"
) +
theme_minimal() +
theme(
strip.text = element_text(size = 11, face = "bold"),
legend.position = "bottom"
)
# Faceted by multiple variables
ggplot(longitudinal_data, aes(x = time_point, y = depression, fill = treatment)) +
geom_boxplot(alpha = 0.7) +
facet_grid(age_group ~ treatment) +
scale_fill_brewer(palette = "Set2") +
labs(
title = "Depression Scores by Treatment, Time, and Age Group",
x = "Time Point",
y = "Depression Score (PHQ-9)",
fill = "Treatment"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none"
)
Correlation Matrices and Heatmaps
Heatmaps are excellent for visualizing correlation matrices, which are common in psychology research when examining relationships among multiple variables:
library(reshape2)
# Create correlation matrix
psych_measures <- data.frame(
Depression = rnorm(100, 25, 8),
Anxiety = rnorm(100, 30, 7),
Stress = rnorm(100, 35, 9),
Sleep_Quality = rnorm(100, 6, 2),
Life_Satisfaction = rnorm(100, 5, 1.5)
)
# Calculate correlations
cor_matrix <- cor(psych_measures)
cor_melted <- melt(cor_matrix)
# Create heatmap
ggplot(cor_melted, aes(x = Var1, y = Var2, fill = value)) +
geom_tile(color = "white") +
geom_text(aes(label = round(value, 2)), color = "black", size = 4) +
scale_fill_gradient2(
low = "#2E86AB",
mid = "white",
high = "#A23B72",
midpoint = 0,
limits = c(-1, 1)
) +
labs(
title = "Correlation Matrix of Psychological Measures",
x = "",
y = "",
fill = "Correlation"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(size = 14, face = "bold")
)
Working with Tidy Data
Traditionally, psychologists have been taught data skills using wide-format data, which typically has one row of data for each participant with separate columns for each score or variable, and for repeated measures variables, the dependent variable is split across different columns. However, ggplot2 works best with data in "long" or "tidy" format.
Converting Wide to Long Format
Understanding how to reshape data is essential for effective visualization in R:
library(tidyr)
# Wide format data (typical psychology data structure)
wide_data <- data.frame(
participant = 1:20,
baseline_depression = rnorm(20, 25, 5),
week4_depression = rnorm(20, 20, 5),
week8_depression = rnorm(20, 15, 5),
baseline_anxiety = rnorm(20, 30, 6),
week4_anxiety = rnorm(20, 25, 6),
week8_anxiety = rnorm(20, 20, 6)
)
# Convert to long format
long_data <- wide_data %>%
pivot_longer(
cols = -participant,
names_to = c("time", "measure"),
names_pattern = "(.*)_(.*)",
values_to = "score"
)
# Now easy to visualize
ggplot(long_data, aes(x = time, y = score, color = measure, group = interaction(participant, measure))) +
geom_line(alpha = 0.3) +
geom_point(alpha = 0.5) +
stat_summary(aes(group = measure), fun = mean, geom = "line", linewidth = 2) +
stat_summary(aes(group = measure), fun = mean, geom = "point", size = 4) +
facet_wrap(~ measure, scales = "free_y") +
scale_color_manual(values = c("depression" = "#2E86AB", "anxiety" = "#A23B72")) +
labs(
title = "Individual and Mean Trajectories of Depression and Anxiety",
x = "Time Point",
y = "Score",
color = "Measure"
) +
theme_minimal() +
theme(legend.position = "none")
Interactive Visualizations with Plotly
The plotly package creates interactive plots with minimal code. Its ggplotly() function converts an existing ggplot2 object into an interactive plot, and its native plot_ly() interface offers some charts you won't find in most packages, like contour plots, candlestick charts, and 3D charts.
Interactive visualizations are particularly valuable for exploratory data analysis and for presentations where you want to allow your audience to explore the data:
library(plotly)
# Create a ggplot2 plot
p <- ggplot(psych_data, aes(x = anxiety, y = stress, color = group,
text = paste("Participant:", participant_id,
"<br>Anxiety:", round(anxiety, 1),
"<br>Stress:", round(stress, 1)))) +
geom_point(size = 3, alpha = 0.7) +
scale_color_manual(values = c("Control" = "#2E86AB", "Treatment" = "#A23B72")) +
labs(
title = "Interactive Anxiety-Stress Relationship",
x = "Anxiety Score (GAD-7)",
y = "Perceived Stress Score (PSS-10)",
color = "Group"
) +
theme_minimal()
# Convert to interactive plotly plot
ggplotly(p, tooltip = "text")
# Create native plotly 3D scatter plot
plot_ly(
data = psych_data,
x = ~anxiety,
y = ~stress,
z = ~rnorm(50, 40, 8), # Adding a third dimension
color = ~group,
colors = c("#2E86AB", "#A23B72"),
type = "scatter3d",
mode = "markers",
marker = list(size = 5)
) %>%
layout(
title = "3D Visualization of Psychological Measures",
scene = list(
xaxis = list(title = "Anxiety"),
yaxis = list(title = "Stress"),
zaxis = list(title = "Depression")
)
)
Customizing Visualizations for Publication
Themes and Styling
ggplot2 has a number of built-in visual themes that you can apply as an extra layer, and each part of a theme can be independently customized, which may be necessary if you have journal guidelines on fonts for publication. Creating publication-ready visualizations requires attention to detail in styling and formatting.
# Create a custom theme for publication
theme_publication <- function(base_size = 12) {
theme_minimal(base_size = base_size) +
theme(
# Text elements
plot.title = element_text(size = base_size + 2, face = "bold", hjust = 0),
plot.subtitle = element_text(size = base_size, color = "gray40", hjust = 0),
axis.title = element_text(size = base_size, face = "bold"),
axis.text = element_text(size = base_size - 1),
legend.title = element_text(size = base_size, face = "bold"),
legend.text = element_text(size = base_size - 1),
# Grid and background
panel.grid.major = element_line(color = "gray90", linewidth = 0.3),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "white", color = NA),
plot.background = element_rect(fill = "white", color = NA),
# Legend
legend.position = "bottom",
legend.background = element_rect(fill = "white", color = "gray80"),
legend.key = element_rect(fill = "white", color = NA),
# Margins
plot.margin = margin(10, 10, 10, 10)
)
}
# Apply custom theme
ggplot(psych_data, aes(x = anxiety, y = stress, color = group)) +
geom_point(size = 3, alpha = 0.7) +
geom_smooth(method = "lm", se = TRUE, alpha = 0.2) +
scale_color_manual(values = c("Control" = "#2E86AB", "Treatment" = "#A23B72")) +
labs(
title = "Relationship Between Anxiety and Stress by Treatment Group",
subtitle = "N = 50 participants (25 per group)",
x = "Anxiety Score (GAD-7)",
y = "Perceived Stress Score (PSS-10)",
color = "Group",
caption = "Error bands represent 95% confidence intervals"
) +
theme_publication()
Color Considerations for Accessibility
One issue with ggplot2 is that its default color scheme is not fully accessible. The default hues (for example, red, green, and blue when there are three groups) can be difficult for color-blind readers to tell apart and do not display well in grayscale. You can instead specify exact custom colors for your plots or use a dedicated color palette.
For categorical colors, the "Set2", "Dark2", and "Paired" palettes from the brewer scale functions are color-blind-safe. Using accessible color palettes ensures your visualizations can be interpreted by all readers:
library(viridis)
library(RColorBrewer)
# Using viridis color scales (color-blind friendly)
ggplot(longitudinal_data, aes(x = time_point, y = depression, fill = treatment)) +
geom_boxplot(alpha = 0.8) +
scale_fill_viridis_d(option = "plasma", end = 0.8) +
labs(
title = "Depression Scores Across Treatment Types",
x = "Time Point",
y = "Depression Score (PHQ-9)",
fill = "Treatment Type"
) +
theme_publication()
# Using ColorBrewer palettes
ggplot(response_data, aes(x = condition, y = count, fill = response)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_brewer(palette = "Set2") +
labs(
title = "Treatment Response Distribution",
x = "Assessment Time",
y = "Number of Participants",
fill = "Response"
) +
theme_publication()
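To check which ColorBrewer palettes are flagged as color-blind friendly rather than relying on memory, you can query the brewer.pal.info data frame that ships with RColorBrewer. A short sketch:

```r
library(RColorBrewer)

# brewer.pal.info has one row per palette; the `colorblind` column
# flags palettes considered color-blind friendly
qual <- brewer.pal.info[brewer.pal.info$category == "qual", ]
cb_safe <- rownames(qual)[qual$colorblind]
cb_safe

# Preview only the color-blind-friendly palettes
display.brewer.all(colorblindFriendly = TRUE)
```

The same filter works for the sequential ("seq") and diverging ("div") categories if you need a palette for continuous data.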
Combining Multiple Plots
The patchwork package combines multiple ggplot2 plots into a single figure using simple operators: | places plots side by side and / stacks them vertically. This is particularly useful for creating comprehensive figures for publications:
library(patchwork)
# Create individual plots
p1 <- ggplot(psych_data, aes(x = group, y = anxiety, fill = group)) +
geom_violin(alpha = 0.6) +
geom_boxplot(width = 0.2, alpha = 0.8) +
scale_fill_manual(values = c("Control" = "#2E86AB", "Treatment" = "#A23B72")) +
labs(title = "A) Anxiety Scores", x = "", y = "Anxiety (GAD-7)") +
theme_publication() +
theme(legend.position = "none")
p2 <- ggplot(psych_data, aes(x = group, y = stress, fill = group)) +
geom_violin(alpha = 0.6) +
geom_boxplot(width = 0.2, alpha = 0.8) +
scale_fill_manual(values = c("Control" = "#2E86AB", "Treatment" = "#A23B72")) +
labs(title = "B) Stress Scores", x = "", y = "Stress (PSS-10)") +
theme_publication() +
theme(legend.position = "none")
p3 <- ggplot(psych_data, aes(x = anxiety, y = stress, color = group)) +
geom_point(size = 2, alpha = 0.6) +
geom_smooth(method = "lm", se = TRUE) +
scale_color_manual(values = c("Control" = "#2E86AB", "Treatment" = "#A23B72")) +
labs(title = "C) Anxiety-Stress Relationship",
x = "Anxiety (GAD-7)",
y = "Stress (PSS-10)",
color = "Group") +
theme_publication()
# Combine plots with patchwork
combined_plot <- (p1 | p2) / p3 +
plot_annotation(
title = "Comprehensive Analysis of Anxiety and Stress by Treatment Group",
subtitle = "Violin plots show distribution, scatter plot shows relationship",
caption = "N = 50 participants; Error bands represent 95% CI",
theme = theme(plot.title = element_text(size = 16, face = "bold"))
)
combined_plot
Specialized Visualizations for Psychology Research
Effect Size Visualizations
Visualizing effect sizes and confidence intervals is increasingly important in psychology research, particularly in the context of the replication crisis and emphasis on effect size reporting:
library(ggpubr)
# Create effect size data
effect_data <- data.frame(
study = c("Study 1", "Study 2", "Study 3", "Study 4", "Meta-Analysis"),
effect_size = c(0.45, 0.38, 0.52, 0.41, 0.44),
lower_ci = c(0.25, 0.18, 0.32, 0.21, 0.35),
upper_ci = c(0.65, 0.58, 0.72, 0.61, 0.53),
type = c(rep("Individual", 4), "Summary")
)
# Forest plot
ggplot(effect_data, aes(x = effect_size, y = study, color = type)) +
geom_point(size = 4) +
geom_errorbarh(aes(xmin = lower_ci, xmax = upper_ci), height = 0.2, linewidth = 1) +
geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
scale_color_manual(values = c("Individual" = "#2E86AB", "Summary" = "#A23B72")) +
labs(
title = "Effect Sizes Across Studies: CBT for Depression",
subtitle = "Cohen's d with 95% confidence intervals",
x = "Effect Size (Cohen's d)",
y = "",
color = "Study Type"
) +
theme_publication() +
theme(
panel.grid.major.y = element_blank(),
legend.position = "bottom"
)
Likert Scale Visualizations
Likert scales are ubiquitous in psychology research, and specialized visualizations can effectively communicate response patterns:
# Create Likert scale data
likert_data <- data.frame(
item = rep(c("I feel confident in my abilities",
"I can handle difficult situations",
"I believe in my decision-making",
"I trust my judgment"), each = 5),
response = rep(c("Strongly Disagree", "Disagree", "Neutral",
"Agree", "Strongly Agree"), 4),
percentage = c(5, 10, 15, 45, 25,
3, 8, 20, 50, 19,
7, 12, 18, 40, 23,
4, 9, 22, 48, 17)
)
# Convert response to factor with correct order
likert_data$response <- factor(likert_data$response,
levels = c("Strongly Disagree", "Disagree",
"Neutral", "Agree", "Strongly Agree"))
# Create stacked bar chart (responses ordered from disagree to agree)
ggplot(likert_data, aes(x = item, y = percentage, fill = response)) +
geom_bar(stat = "identity", position = "stack") +
coord_flip() +
scale_fill_manual(
values = c("Strongly Disagree" = "#D62246",
"Disagree" = "#F5B700",
"Neutral" = "#E8E8E8",
"Agree" = "#06A77D",
"Strongly Agree" = "#005F73")
) +
labs(
title = "Self-Efficacy Scale Responses",
subtitle = "Percentage distribution across items (N = 200)",
x = "",
y = "Percentage",
fill = "Response"
) +
theme_publication() +
theme(
axis.text.y = element_text(hjust = 0),
legend.position = "bottom"
)
Survival Curves and Time-to-Event Data
For longitudinal studies examining time to relapse, dropout, or other events, survival curves provide valuable insights:
library(survival)
library(survminer)
# Create survival data
surv_data <- data.frame(
time = c(rexp(50, 0.1), rexp(50, 0.05)),
status = c(rbinom(50, 1, 0.6), rbinom(50, 1, 0.4)),
treatment = rep(c("Standard Care", "Enhanced Treatment"), each = 50)
)
# Fit survival model
fit <- survfit(Surv(time, status) ~ treatment, data = surv_data)
# Create survival plot
ggsurvplot(
fit,
data = surv_data,
pval = TRUE,
conf.int = TRUE,
risk.table = TRUE,
risk.table.height = 0.25,
palette = c("#2E86AB", "#A23B72"),
title = "Time to Relapse by Treatment Type",
xlab = "Time (Months)",
ylab = "Relapse-Free Probability",
legend.title = "Treatment",
# Strata are ordered alphabetically, so the labels must follow that order
legend.labs = c("Enhanced Treatment", "Standard Care"),
ggtheme = theme_publication()
)
Best Practices for Data Visualization in Psychology
Show the Data, Not Just Summaries
One of the most important principles in modern data visualization is to show the underlying data whenever possible, rather than relying solely on summary statistics. Bar charts showing only means and error bars can hide important distributional information. Consider using violin plots, jitter plots, or individual data points overlaid on summary statistics.
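As a minimal sketch of this principle, the plot below overlays the mean and standard error on the raw data points. The data are simulated so that the two groups have similar means but very different shapes; all names and values are assumptions for illustration:

```r
library(ggplot2)

set.seed(42)
# Simulated scores: Control is unimodal, Treatment is bimodal
scores <- data.frame(
  group = rep(c("Control", "Treatment"), each = 40),
  score = c(rnorm(40, mean = 50, sd = 5),
            rnorm(20, mean = 40, sd = 3), rnorm(20, mean = 60, sd = 3))
)

p_raw <- ggplot(scores, aes(x = group, y = score)) +
  geom_jitter(width = 0.1, alpha = 0.4) +              # every observation
  stat_summary(fun.data = mean_se, geom = "pointrange",
               color = "#A23B72") +                    # mean +/- 1 SE
  labs(x = "Group", y = "Score") +
  theme_minimal()

p_raw
```

A bar chart of these means would look almost identical for both groups, whereas the raw points reveal the two clusters in the Treatment group.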
Ensure Reproducibility
One of R's greatest strengths is reproducibility. Always save your visualization code in well-documented scripts. Use relative file paths, include session information, and consider using R Markdown or Quarto documents to integrate your code, visualizations, and narrative text into a single reproducible document.
# Example of well-documented visualization code
# Project: Depression Treatment Study
# Author: [Your Name]
# Date: 2024-04-07
# Purpose: Create publication figures for main analysis
# Load required packages
library(tidyverse)
library(patchwork)
# Set random seed for reproducibility
set.seed(12345)
# Load data
data <- read_csv("data/treatment_study.csv")
# Create visualization
plot <- ggplot(data, aes(x = time, y = depression_score, color = treatment)) +
geom_point() +
geom_smooth(method = "lm") +
theme_minimal()
# Save plot
ggsave("figures/figure1_treatment_effects.png",
plot = plot,
width = 8,
height = 6,
dpi = 300)
# Print session information for reproducibility
sessionInfo()
Consider Your Audience
Different audiences require different visualization approaches. Academic papers may benefit from more technical, detailed visualizations, while presentations to stakeholders or the public may require simpler, more intuitive graphics. Interactive visualizations work well for online presentations but need static alternatives for print publications.
Avoid Chart Junk and Maintain Clarity
Following principles from Edward Tufte's work on data visualization, focus on maximizing the data-to-ink ratio. Remove unnecessary gridlines, decorative elements, and redundant labels. Every element in your visualization should serve a purpose in communicating your data.
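One way to apply this in ggplot2 is to start from a sparse theme and switch off the elements that carry no information. The sketch below uses simulated reaction-time data, and the variable names are assumptions:

```r
library(ggplot2)

set.seed(3)
# Simulated data; names and values are illustrative only
rt_data <- data.frame(
  condition = rep(c("Congruent", "Incongruent"), each = 30),
  rt_ms = c(rnorm(30, mean = 520, sd = 60), rnorm(30, mean = 610, sd = 70))
)

p_clean <- ggplot(rt_data, aes(x = condition, y = rt_ms)) +
  geom_boxplot(width = 0.4, outlier.shape = NA) +
  geom_jitter(width = 0.08, alpha = 0.4) +
  labs(x = NULL, y = "Reaction time (ms)") +  # condition names already label the axis
  theme_minimal() +
  theme(
    panel.grid.major.x = element_blank(),     # vertical gridlines add nothing on a categorical axis
    panel.grid.minor = element_blank()
  )

p_clean
```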
Saving and Exporting Visualizations
Creating high-quality visualizations is only part of the process—you also need to export them in appropriate formats for different uses:
# Save as high-resolution PNG for presentations
ggsave("figure1.png",
plot = last_plot(),
width = 10,
height = 6,
dpi = 300,
bg = "white")
# Save as PDF for publications (vector format, scalable)
ggsave("figure1.pdf",
plot = last_plot(),
width = 10,
height = 6,
device = cairo_pdf)
# Save as TIFF for some journal requirements
ggsave("figure1.tiff",
plot = last_plot(),
width = 10,
height = 6,
dpi = 600,
compression = "lzw")
# Save with specific dimensions in inches (common journal requirement)
ggsave("figure1_journal.pdf",
plot = last_plot(),
width = 7, # Single column width
height = 5,
units = "in")
Resources for Continued Learning
The R visualization ecosystem is vast and continuously evolving. Here are valuable resources for deepening your knowledge:
Essential Books and Tutorials
- R for Data Science by Hadley Wickham and Garrett Grolemund - A comprehensive introduction to the tidyverse ecosystem, available free online at https://r4ds.had.co.nz/
- ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham - The definitive guide to ggplot2
- Fundamentals of Data Visualization by Claus O. Wilke - Excellent principles for creating effective visualizations, available at https://clauswilke.com/dataviz/
- The Visual Display of Quantitative Information by Edward Tufte - Classic text on visualization principles
Online Resources and Communities
- R Graph Gallery (https://r-graph-gallery.com/) - Hundreds of chart examples with reproducible code
- Posit Community (formerly RStudio Community) - Active forum for asking questions and sharing solutions
- Stack Overflow - Searchable database of R programming questions and answers
- Twitter/X #rstats community - Active community sharing tips, visualizations, and resources
Specialized Packages Worth Exploring
- ggpubr - Publication-ready plots with statistical comparisons
- gganimate - Create animated visualizations
- ggridges - Ridge plots for visualizing distributions
- ggraph - Network and graph visualizations
- corrplot - Specialized correlation matrix visualizations
- waffle - Waffle charts and pictograms
- ggalluvial - Alluvial diagrams for categorical data flows
Common Pitfalls and How to Avoid Them
Overplotting in Scatter Plots
When you have many data points, they can overlap and obscure patterns. Solutions include using transparency (alpha parameter), jittering points slightly, using 2D density plots, or creating hexbin plots:
# Problem: Overplotting
ggplot(large_dataset, aes(x = var1, y = var2)) +
  geom_point()
# Solution 1: Add transparency
ggplot(large_dataset, aes(x = var1, y = var2)) +
  geom_point(alpha = 0.3)
# Solution 2: Use 2D density
ggplot(large_dataset, aes(x = var1, y = var2)) +
  geom_density_2d_filled() +
  scale_fill_viridis_d()
# Solution 3: Hexbin plot (geom_hex() requires the hexbin package to be installed)
ggplot(large_dataset, aes(x = var1, y = var2)) +
  geom_hex() +
  scale_fill_viridis_c()
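Jittering, mentioned above, is most useful when both variables take a small number of discrete values, such as Likert ratings, so that many points sit exactly on top of each other. A minimal sketch follows; the data frame `likert_data` and its columns are simulated and purely illustrative.

```r
library(ggplot2)

set.seed(123)
# Simulated (hypothetical) data: two 1-7 ratings from 500 respondents
likert_data <- data.frame(
  item1 = sample(1:7, 500, replace = TRUE),
  item2 = sample(1:7, 500, replace = TRUE)
)

# Jittering: nudge each point by a small random amount so ties become visible
p_jitter <- ggplot(likert_data, aes(x = item1, y = item2)) +
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.4) +
  scale_x_continuous(breaks = 1:7) +
  scale_y_continuous(breaks = 1:7)
p_jitter
```

Keep the jitter small relative to the spacing between values so that readers can still tell which response category each point belongs to.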
Inappropriate Use of Dual Y-Axes
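Dual y-axes can be misleading and are generally discouraged: the relative scaling of the two axes is arbitrary, so the same data can be made to look strongly related or unrelated depending on how the axes are set. Instead, consider faceting, using different plot types, or standardizing your variables before plotting.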
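As a rough illustration of the faceting and standardizing alternatives, the sketch below plots two measures on very different scales. The data are simulated, and the names `perf`, `measure`, and `value` are assumptions made for this example only.

```r
library(ggplot2)

set.seed(42)
sessions <- 1:10
# Simulated (hypothetical) data: reaction time and accuracy across 10 sessions
perf <- data.frame(
  session = rep(sessions, times = 2),
  measure = rep(c("Reaction time (ms)", "Accuracy (proportion)"), each = 10),
  value   = c(650 - 15 * sessions + rnorm(10, sd = 10),
              0.70 + 0.02 * sessions + rnorm(10, sd = 0.01))
)

# Option 1: one panel per measure, each with its own y-axis
p_facets <- ggplot(perf, aes(x = session, y = value)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ measure, ncol = 1, scales = "free_y") +
  scale_x_continuous(breaks = sessions) +
  labs(x = "Session", y = NULL)

# Option 2: z-score each measure so both can share a single axis
perf$z <- ave(perf$value, perf$measure,
              FUN = function(v) (v - mean(v)) / sd(v))
p_z <- ggplot(perf, aes(x = session, y = z, colour = measure)) +
  geom_line() +
  scale_x_continuous(breaks = sessions) +
  labs(x = "Session", y = "Standardized score (z)", colour = NULL)
```

Faceting keeps the original units readable, while standardizing makes the shapes of the two trends directly comparable at the cost of losing those units.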
Ignoring Data Structure
Psychology data often has hierarchical or nested structures (participants within groups, repeated measures within participants). Ensure your visualizations account for this structure, for example by drawing one line per participant, using different colors for different levels, or creating separate panels for each group.
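One common way to show nested structure is a "spaghetti plot": a faint line for each participant with the group mean drawn on top, split into one panel per group. The sketch below uses simulated repeated-measures data; all variable names and effect sizes are made up for illustration, and the `linewidth` argument assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

set.seed(2024)
n_participants <- 20
time_points <- 1:4

# Simulated (hypothetical) data: 20 participants, 2 groups, 4 time points each
long_data <- expand.grid(participant = factor(1:n_participants),
                         time = time_points)
pid <- as.integer(long_data$participant)
long_data$group <- ifelse(pid <= 10, "Control", "Treatment")

intercepts <- rnorm(n_participants, mean = 50, sd = 8)   # participant baselines
slope <- ifelse(long_data$group == "Treatment", 3, 0.5)  # change per time point
long_data$score <- intercepts[pid] + slope * long_data$time +
  rnorm(nrow(long_data), sd = 3)

p_nested <- ggplot(long_data, aes(x = time, y = score)) +
  # One faint line per participant shows the within-person trajectories
  geom_line(aes(group = participant), alpha = 0.3) +
  # Thick line shows the mean at each time point within each panel
  stat_summary(fun = mean, geom = "line", linewidth = 1.2) +
  facet_wrap(~ group) +
  labs(x = "Time point", y = "Score")
p_nested
```

Without `group = participant`, ggplot2 would connect all observations in a panel into a single zigzag line, which is a frequent symptom of ignoring the repeated-measures structure.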
Integrating Visualizations into Your Research Workflow
Effective data visualization should be integrated throughout your research process, not just added at the end for publication. Use visualizations during data cleaning to identify errors and outliers, during exploratory analysis to understand patterns and relationships, during model building to assess assumptions and fit, and during communication to present findings clearly.
Consider creating a standardized visualization script template for your lab or research group. This ensures consistency across projects and makes it easier to generate visualizations quickly. Include sections for data loading, preprocessing, exploratory plots, publication figures, and supplementary materials.
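A template along these lines might look like the skeleton below. This is only a sketch of one possible layout: the section names follow the list above, while the folder name, file names, theme settings, and the use of the built-in `sleep` dataset are placeholders to be replaced with a lab's own conventions.

```r
# ---- 0. Setup ----
library(ggplot2)
theme_set(theme_minimal(base_size = 12))     # shared look for all figures (assumed default)
fig_dir <- file.path(tempdir(), "figures")   # placeholder; use a project folder in practice
dir.create(fig_dir, showWarnings = FALSE, recursive = TRUE)

# ---- 1. Data loading ----
# Replace with your own import step (e.g. read.csv() on a project data file);
# the built-in sleep dataset is used here only so the script runs as-is.
raw <- datasets::sleep

# ---- 2. Preprocessing ----
dat <- raw
names(dat) <- c("extra_sleep", "drug", "participant")
dat$drug <- factor(dat$drug, labels = c("Drug 1", "Drug 2"))

# ---- 3. Exploratory plots ----
p_explore <- ggplot(dat, aes(x = extra_sleep)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ drug)

# ---- 4. Publication figures ----
p_main <- ggplot(dat, aes(x = drug, y = extra_sleep)) +
  geom_boxplot(outlier.shape = NA) +         # hide outliers; raw points are drawn below
  geom_jitter(width = 0.1, alpha = 0.6) +
  labs(x = NULL, y = "Increase in sleep (hours)")
ggsave(file.path(fig_dir, "figure1.pdf"), p_main, width = 7, height = 5)

# ---- 5. Supplementary materials ----
ggsave(file.path(fig_dir, "figureS1.pdf"), p_explore, width = 7, height = 5)
```

Setting the theme once in the setup section, rather than in each plot, is what keeps figures visually consistent across a project.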
Conclusion
This guide has taken a practical approach to data visualization aimed specifically at researchers, outlining the rationale for using R and introducing the "grammar of graphics" that underlies the ggplot2 package. The investment in learning R for data visualization pays substantial dividends for psychology researchers through enhanced reproducibility, greater flexibility, and the ability to create publication-quality graphics.
While the initial learning curve may seem steep, the grammar of graphics approach ultimately makes creating complex visualizations more intuitive and systematic, because every plot is assembled from the same components: data, aesthetic mappings, geoms, scales, and themes. By mastering these tools, psychologists can transform their data into compelling visual narratives that advance scientific understanding and effectively communicate research findings to diverse audiences.
The key to success is practice and iteration. Start with simple visualizations and gradually incorporate more advanced techniques as you become comfortable with the basics. Engage with the R community, explore examples in the R Graph Gallery, and don't hesitate to adapt code from others to suit your needs. Remember that every expert R user started as a beginner, and the supportive R community is always ready to help.
As psychology continues to embrace open science practices and reproducible research, proficiency in R data visualization will become increasingly valuable. Whether you're creating exploratory plots to understand your data, diagnostic plots to check statistical assumptions, or publication-ready figures to communicate your findings, R provides the tools you need to turn data into insights. The journey from raw data to meaningful visualization is both a technical skill and an art form—one that enhances not only how we present research but how we think about and understand psychological phenomena.