9  Factor Ordering

9.1 Introduction

R’s factor system is both powerful and treacherous. Default alphabetical level ordering makes categorical plots hard to read, and calling as.numeric() on a factor with numeric labels silently returns the internal level codes instead of the values. This chapter teaches you how to control factor ordering and avoid common pitfalls.

Learning Objectives

By the end of this chapter, you will:

  • Understand why alphabetical factor ordering is rarely useful
  • Learn to reorder factors by frequency, by another variable, or manually
  • Master the forcats package for factor manipulation
  • Know how to properly convert factors to numeric without data loss
  • Avoid the “numeric factor” trap (numbered categories vs. actual numbers)
  • Create publication-ready plots with meaningful orderings

9.2 The Problem: Alphabetical Ordering

library(ggplot2)

# Create data with logical categories
category_df <- data.frame(
  category = c("Low", "Medium", "High", "Very High", "Low", "High"),
  value = c(10, 20, 15, 30, 12, 25)
)

# R defaults to alphabetical!
p1 <- ggplot(category_df, aes(x = category, y = value)) +
  geom_col(fill = "coral") +
  theme_classic(base_size = 12) +
  labs(title = "Alphabetical (Wrong!)")

# Specify levels explicitly
category_df$category <- factor(category_df$category,
                       levels = c("Low", "Medium", "High", "Very High"))

# Now plots use logical order!
p2 <- ggplot(category_df, aes(x = category, y = value)) +
  geom_col(fill = "steelblue") +
  theme_classic(base_size = 12) +
  labs(title = "Logical Order (Correct!)")

Alphabetical order (High, Low, Medium, Very High) makes no sense here!

Much better - the explicit order (Low, Medium, High, Very High) makes sense!
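To see why the two plots differ without drawing anything, it can help to inspect the levels directly. The following is a minimal base R sketch; the variable names (sizes, default_levels, explicit_levels) are illustrative and not part of the chapter's data.

```r
# Same labels as the example data frame above
sizes <- c("Low", "Medium", "High", "Very High", "Low", "High")

# factor() sorts the unique values, so the levels come out alphabetical
default_levels <- levels(factor(sizes))

# Supplying levels = ... fixes the order explicitly
explicit_levels <- levels(
  factor(sizes, levels = c("Low", "Medium", "High", "Very High"))
)

print(default_levels)   # "High" "Low" "Medium" "Very High"
print(explicit_levels)  # "Low" "Medium" "High" "Very High"
```

ggplot2 draws discrete axes in level order, which is why changing the levels is enough to change the plot.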

9.3 Solution: forcats Package

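The forcats package is part of the tidyverse and is designed for factor manipulation.

library(forcats)
library(ggplot2)  # mpg dataset and plotting
library(dplyr)    # %>% pipe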

# Reorder car classes by median highway mpg
p <- mpg %>%
  ggplot(aes(x = fct_reorder(class, hwy, median),
             y = hwy)) +
  geom_boxplot(fill = "lightblue") +
  coord_flip() +
  labs(x = "Vehicle Class",
       y = "Highway MPG",
       title = "Ordered by median MPG")

Boxplots now ordered by median value!

9.4 fct_reorder(): Order by Another Variable

# Order diamond cuts by mean price
p <- diamonds %>%
  group_by(cut) %>%
  summarise(mean_price = mean(price)) %>%
  ggplot(aes(x = fct_reorder(cut, mean_price),
             y = mean_price,
             fill = cut)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  labs(x = "Diamond Cut",
       y = "Mean Price ($)",
       title = "Cuts ordered by average price")

Bars ordered by length!
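The effect of fct_reorder() is easiest to verify on a tiny vector where the summaries can be computed by hand. This is a sketch on made-up data (group and score are hypothetical names); it assumes forcats is installed.

```r
library(forcats)

group <- factor(c("a", "b", "c", "a", "b", "c"))
score <- c(5, 1, 3, 7, 2, 4)

# Medians per group: a = 6, b = 1.5, c = 3.5
asc  <- levels(fct_reorder(group, score, .fun = median))
desc <- levels(fct_reorder(group, score, .fun = median, .desc = TRUE))

print(asc)   # "b" "c" "a"  (smallest median first)
print(desc)  # "a" "c" "b"  (largest median first)
```

Only the level order changes; the values stored in the factor stay the same.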

9.5 fct_infreq(): Order by Frequency

# Order by how common each vehicle class is
p <- ggplot(mpg,
            aes(x = fct_infreq(class))) +
  geom_bar(fill = "steelblue") +
  coord_flip() +
  labs(x = "Vehicle Class",
       y = "Count",
       title = "Most common classes first")

Most common category first!

Great for survey data
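As a rough check of what fct_infreq() does, the sketch below compares it with a base R equivalent built from table(). The vehicle vector is made up for illustration and deliberately has no tied counts, since tie handling is not covered here.

```r
library(forcats)

vehicle <- factor(c("suv", "compact", "suv", "pickup", "suv", "compact"))

# forcats: most frequent level first
by_freq <- levels(fct_infreq(vehicle))

# Base R sketch of the same idea: sort the counts, reuse the names as levels
counts <- sort(table(vehicle), decreasing = TRUE)
by_freq_base <- levels(factor(vehicle, levels = names(counts)))

print(by_freq)       # "suv" "compact" "pickup"
print(by_freq_base)  # "suv" "compact" "pickup"
```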

9.6 fct_inorder(): Order by Appearance

# Create data with specific order
treatment_data <- data.frame(
  treatment = c("Control", "Low Dose",
                "Medium Dose", "High Dose",
                "Control", "Low Dose",
                "Medium Dose", "High Dose"),
  response = c(10, 12, 15, 18,
               11, 13, 16, 19)
)

# Keep order as they appear in data
p <- ggplot(treatment_data,
       aes(x = fct_inorder(treatment),
           y = response)) +
  geom_boxplot(fill = "lightblue") +
  labs(x = "Treatment", y = "Response")

Preserves the order from your data!
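A small sketch of the same idea outside a plot, assuming forcats is installed. The base R line using unique() is an equivalent way to get first-appearance order and is shown only for comparison.

```r
library(forcats)

treatment <- c("Control", "Low Dose", "Medium Dose", "High Dose",
               "Control", "Low Dose")

# Default: alphabetical levels
alphabetical <- levels(factor(treatment))

# forcats: levels in order of first appearance
appearance <- levels(fct_inorder(factor(treatment)))

# Base R equivalent: unique() keeps first-appearance order
appearance_base <- levels(factor(treatment, levels = unique(treatment)))

print(alphabetical)  # "Control" "High Dose" "Low Dose" "Medium Dose"
print(appearance)    # "Control" "Low Dose" "Medium Dose" "High Dose"
```

This only helps if the rows are already stored in a meaningful order; sorting the data frame first changes the result.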

9.7 Natural Sorting: var1, var2, …, var10

The problem with alphabetical sorting:

# Create data with numbered variables
var_data <- data.frame(
  variable = rep(c("var1", "var2", "var10", "var20"), each = 5),
  value = rnorm(20, mean = rep(c(10, 15, 20, 25), each = 5), sd = 2)
)

# Alphabetical order: var1, var10, var2, var20 (wrong!)
p <- ggplot(var_data, aes(x = variable, y = value)) +
  geom_boxplot(fill = "coral") +
  labs(title = "Alphabetical: var1, var10, var2, var20")

9.8 Natural Sorting: The Solution

# Use gtools::mixedsort() for natural/alphanumeric sorting
library(gtools)

var_data$variable <- factor(var_data$variable,
                            levels = mixedsort(unique(var_data$variable)))

# Natural order: var1, var2, var10, var20 (correct!)
p <- ggplot(var_data, aes(x = variable, y = value)) +
  geom_boxplot(fill = "steelblue") +
  labs(title = "Natural sort: var1, var2, var10, var20")

Note: forcats::fct_inseq() only works if the factor levels are purely numeric strings (e.g., “1”, “2”, “10”), not mixed alphanumeric labels like “var1”, “var10”. For mixed labels, use mixedsort() as shown above.
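The two cases can be sketched side by side. The first half uses fct_inseq() on purely numeric labels. The second half is a base R fallback for labels of the form prefix plus number; it is an illustrative alternative to gtools::mixedsort(), and it assumes every label contains exactly one run of digits.

```r
library(forcats)

# Case 1: purely numeric labels
num_labels <- factor(c("10", "2", "1", "20"))
char_sorted <- levels(num_labels)             # sorted as text
num_sorted  <- levels(fct_inseq(num_labels))  # sorted as numbers

print(char_sorted)  # "1" "10" "2" "20"
print(num_sorted)   # "1" "2" "10" "20"

# Case 2: mixed labels, base R sketch (assumes one digit run per label)
vars <- c("var1", "var10", "var2", "var20")
digits  <- as.integer(gsub("\\D", "", vars))  # strip non-digits
natural <- vars[order(digits)]

print(natural)  # "var1" "var2" "var10" "var20"
```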

9.9 fct_rev(): Reverse Order

# Reverse the frequency order
p <- ggplot(mpg, aes(x = fct_rev(fct_infreq(class)))) +
  geom_bar(fill = "coral") +
  coord_flip() +
  labs(x = "Vehicle Class", y = "Count",
       title = "Least common classes first (reversed)")

Now least common first instead of most common!
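One reason fct_rev() pairs well with coord_flip(): on a flipped discrete axis the first level is drawn at the bottom, so reversing a frequency order puts the most common class at the top of the chart. A minimal sketch of the level orders involved, on made-up data:

```r
library(forcats)

vehicle <- factor(c("suv", "compact", "suv", "pickup", "suv", "compact"))

freq_order <- levels(fct_infreq(vehicle))
reversed   <- levels(fct_rev(fct_infreq(vehicle)))

print(freq_order)  # "suv" "compact" "pickup"
print(reversed)    # "pickup" "compact" "suv"
```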

9.10 fct_relevel(): Move Specific Levels

# Move "Control" to front for treatment groups
treatment_data <- data.frame(
  treatment = c("Low Dose", "High Dose", "Control",
                "Medium Dose", "Low Dose", "Control"),
  response = c(12, 18, 10, 15, 13, 11)
)

p <- treatment_data %>%
  mutate(treatment = fct_relevel(treatment, "Control")) %>%
  ggplot(aes(treatment, response)) +
  geom_boxplot(fill = "lightblue") +
  labs(x = "Treatment", y = "Response")

Control always shown first

Common in experimental data!
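fct_relevel() can move more than one level, and its after argument controls where the moved levels land. The sketch below uses the same treatment labels as above; the object names are illustrative.

```r
library(forcats)

treatment <- factor(c("Low Dose", "High Dose", "Control", "Medium Dose"))

# Several levels can be listed; unlisted levels keep their relative order
dose_order <- levels(
  fct_relevel(treatment, "Control", "Low Dose", "Medium Dose")
)

# after = Inf moves the named level to the end instead of the front
control_last <- levels(fct_relevel(treatment, "Control", after = Inf))

print(dose_order)    # "Control" "Low Dose" "Medium Dose" "High Dose"
print(control_last)  # "High Dose" "Low Dose" "Medium Dose" "Control"
```

Note that "Control" happens to sort first alphabetically among these labels, so listing the dose levels explicitly is what produces the Low/Medium/High progression.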

9.11 Facet Ordering

# Order facets by median highway mpg for each vehicle class
p <- mpg %>%
  filter(class %in% c("pickup", "minivan", "compact")) %>%
  mutate(class = fct_reorder(class, hwy, median)) %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(color = "steelblue", alpha = 0.6) +
  facet_wrap(~class) +
  labs(x = "Engine Displacement (L)", y = "Highway MPG")

Facet panels in meaningful order!

9.12 ⚠️ The Danger of Numeric Factors

Converting between factor and numeric can destroy your data!


# Original numeric data
doses <- c(10, 20, 50, 100, 200)
print(doses)
[1]  10  20  50 100 200
# Convert to factor (common in data import!)
dose_factor <- factor(doses)
print(dose_factor)
[1] 10  20  50  100 200
Levels: 10 20 50 100 200


# Looks fine, right? But look at the internal structure:
str(dose_factor)
 Factor w/ 5 levels "10","20","50",..: 1 2 3 4 5
levels(dose_factor)
[1] "10"  "20"  "50"  "100" "200"


# Try to convert back to numeric - WRONG!
as.numeric(dose_factor)
[1] 1 2 3 4 5
# Correct way: convert via character
as.numeric(as.character(dose_factor))
[1]  10  20  50 100 200


The danger: Many functions silently convert factors to integers!
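Besides going through as.character(), base R offers a second safe idiom: convert the levels once and index them by the factor. The sketch below compares the three conversions on a vector with a repeated dose; it uses base R only.

```r
doses <- c(10, 20, 50, 100, 200, 50)
dose_factor <- factor(doses)

# Wrong: returns the internal level codes
codes <- as.numeric(dose_factor)

# Safe: convert labels, then numbers
via_character <- as.numeric(as.character(dose_factor))

# Also safe: convert each level once, then look up by code
via_levels <- as.numeric(levels(dose_factor))[dose_factor]

print(codes)          # 1 2 3 4 5 3
print(via_character)  # 10 20 50 100 200 50
print(via_levels)     # 10 20 50 100 200 50
```

The levels-based form converts each distinct label only once, which can matter for long vectors with few levels.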

9.13 ⚠️ Missing Levels: The Silent Data Corruption

Even worse: missing levels get renumbered!

  subject response
1       1       10
2       2       15
3       4       25
4       5       30
5       1       11
6       2       16
7       4       24
8       5       29
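# Subject IDs from the table above (note: there is no subject 3)
subject <- c(1, 2, 4, 5, 1, 2, 4, 5)
# Convert to factor (happens during import!)
subject_factor <- factor(subject)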
str(subject_factor)
 Factor w/ 4 levels "1","2","4","5": 1 2 3 4 1 2 3 4

Levels: “1” “2” “4” “5” - looks OK…


# Try to convert back - DATA CORRUPTION!
as.numeric(subject_factor)
[1] 1 2 3 4 1 2 3 4

as.numeric() returns the internal level codes, not the subject IDs:

Your subject 4 became 3!

Your subject 5 became 4!


This is catastrophic for analysis!

Your statistical models and plots will use the wrong subject numbers. Always use as.numeric(as.character(factor)), not as.numeric(factor).
Or better yet: never use bare numbers as labels for categorical data!
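One way to make the safe conversion hard to get wrong is to wrap it in a small function that also refuses non-numeric levels. The helper below, factor_to_numeric(), is hypothetical: it is not part of base R or forcats, just a sketch of the idea.

```r
# Hypothetical helper: convert a factor with numeric labels back to numbers
factor_to_numeric <- function(f) {
  stopifnot(is.factor(f))
  values <- suppressWarnings(as.numeric(levels(f)))
  if (anyNA(values)) {
    bad <- levels(f)[is.na(values)]
    stop("Non-numeric levels: ", paste(bad, collapse = ", "))
  }
  values[f]  # a factor used as an index acts as its integer codes
}

subject_factor <- factor(c(1, 2, 4, 5, 1, 2, 4, 5))
restored <- factor_to_numeric(subject_factor)

print(restored)  # 1 2 4 5 1 2 4 5
```

Failing loudly on labels such as "Group_1" is a design choice here; returning NA with a warning would be an equally reasonable alternative.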

9.14 ⚠️ Numeric Factors After Reordering

dose_data <- tibble(
  dose = factor(c(0, 10, 50, 100, 200)),
  response = c(5, 25, 80, 70, 30)
) %>%
  mutate(dose_ordered =
         fct_reorder(dose, response))

dose_data
# A tibble: 5 × 3
  dose  response dose_ordered
  <fct>    <dbl> <fct>       
1 0            5 0           
2 10          25 10          
3 50          80 50          
4 100         70 100         
5 200         30 200         
# Try to use numerically - DISASTER!
dose_data <- dose_data %>%
  mutate(
    wrong = as.numeric(dose_ordered),
    correct = as.numeric(as.character(dose_ordered))
  )
# A tibble: 5 × 5
  dose  response dose_ordered wrong correct
  <fct>    <dbl> <fct>        <dbl>   <dbl>
1 0            5 0                1       0
2 10          25 10               2      10
3 50          80 50               5      50
4 100         70 100              4     100
5 200         30 200              3     200
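The wrong column stops looking arbitrary once the reordered levels are printed. A short sketch using the same doses and responses as above, assuming forcats is installed:

```r
library(forcats)

dose <- factor(c(0, 10, 50, 100, 200))
response <- c(5, 25, 80, 70, 30)

dose_ordered <- fct_reorder(dose, response)

# Levels are now sorted by response: 5, 25, 30, 70, 80
new_levels <- levels(dose_ordered)
codes <- as.integer(dose_ordered)

print(new_levels)  # "0" "10" "200" "100" "50"
print(codes)       # 1 2 5 4 3
```

Dose 50 has the largest response, so it becomes the fifth level and as.numeric() reports it as 5.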

9.15 Recommendations: Avoid Numeric Factors

Best Practices
  1. Never store numeric values as factors unless they represent categories
  2. Prefix categorical numbers: Use “Group_1”, “Group_2” instead of “1”, “2”
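  3. Check imported data: with stringsAsFactors = TRUE (the read.csv() default before R 4.0.0), a numeric column containing even one non-numeric entry is imported as a factor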
  4. Use readr::read_csv() instead of read.csv() - better type detection and no implicit conversion to factors.
  5. Always convert via character: as.numeric(as.character(factor)) not as.numeric(factor)
  6. Check with str() before analysis to verify data types

Bad: Numeric categories

# DON'T do this - ambiguous!
groups <- factor(c(1, 2, 4, 5))
str(groups)
 Factor w/ 4 levels "1","2","4","5": 1 2 3 4
# Are these numbers or categories?

Good: Prefixed categories

# DO this - clearly categorical!
groups <- factor(c("Group_1", "Group_2",
                   "Group_4", "Group_5"))
str(groups)
 Factor w/ 4 levels "Group_1","Group_2",..: 1 2 3 4
# Obviously categories, safe!
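The import pitfall can be reproduced without a file by passing text to read.csv(). In this sketch the CSV content is made up, and stringsAsFactors = TRUE is set explicitly to mimic the default of R versions before 4.0.0.

```r
csv_text <- "dose,response\n10,5\n20,8\nN/A,7\n50,12\n"

# One stray "N/A" makes the whole column text, which then becomes a factor
raw <- read.csv(text = csv_text, stringsAsFactors = TRUE)
print(class(raw$dose))       # "factor"
print(as.numeric(raw$dose))  # level codes, not doses

# Declaring the missing-value marker keeps the column numeric
fixed <- read.csv(text = csv_text, na.strings = "N/A")
print(class(fixed$dose))     # "integer"
print(fixed$dose)            # 10 20 NA 50
```

Running str() on the imported data frame right after reading it is usually enough to spot a column that came in as the wrong type.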

9.16 forcats Cheat Sheet

library(forcats)

fct_reorder(f, x, .fun)  # Order by another variable (.fun defaults to median)
fct_infreq(f)            # Order by frequency
fct_inorder(f)           # Order by appearance in data
fct_inseq(f)             # Order by numeric value (if purely numeric)
fct_rev(f)               # Reverse current order
fct_relevel(f, "A", "B") # Move specific levels to front
fct_recode(f, new = "old") # Rename levels
fct_lump_n(f, n = 5)     # Keep top n, lump others as "Other"
fct_na_value_to_level(f) # Make NA a visible level (fct_explicit_na() before forcats 1.0.0)

# For natural sort (var1, var2, var10):
factor(x, levels = gtools::mixedsort(unique(x)))
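Two cheat-sheet entries that are not demonstrated elsewhere in this chapter are fct_recode() and fct_lump_n(). The sketch below runs them on a made-up vector (fruit) so the resulting levels can be checked by eye.

```r
library(forcats)

fruit <- factor(c("a", "a", "a", "b", "b", "c", "d"))

# Rename one level: new name on the left, old name on the right
recoded <- levels(fct_recode(fruit, apple = "a"))

# Keep the 2 most frequent levels, pool the rest
lumped <- fct_lump_n(fruit, n = 2)

print(recoded)         # "apple" "b" "c" "d"
print(levels(lumped))  # "a" "b" "Other"
print(table(lumped))   # a: 3, b: 2, Other: 2
```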

9.17 Debugging Factor Issues

# Example: vehicle classes in mpg dataset
vehicle_class <- factor(mpg$class)

# Check level order
vehicle_class %>% levels()
[1] "2seater"    "compact"    "midsize"    "minivan"    "pickup"    
[6] "subcompact" "suv"       
# See factor structure (shows integer encoding)
str(vehicle_class)
 Factor w/ 7 levels "2seater","compact",..: 2 2 2 2 2 2 2 2 2 2 ...
# Check how many observations per level
table(vehicle_class)
vehicle_class
   2seater    compact    midsize    minivan     pickup subcompact        suv 
         5         47         41         11         33         35         62 
# Convert to character if needed for text operations
class_char <- as.character(vehicle_class)
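Another issue worth checking for while debugging is unused levels: subsetting a factor keeps every original level, so empty categories can still appear in tables and on plot axes. A base R sketch on a made-up vector:

```r
vehicle <- factor(c("compact", "suv", "pickup", "suv"))

# Drop the pickup rows; the level itself survives
no_pickup <- vehicle[vehicle != "pickup"]
kept_levels <- levels(no_pickup)
print(kept_levels)       # "compact" "pickup" "suv"
print(table(no_pickup))  # pickup has a count of 0

# droplevels() removes levels with no observations
cleaned <- droplevels(no_pickup)
print(levels(cleaned))   # "compact" "suv"
print(nlevels(cleaned))  # 2
```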

9.18 Key Takeaways & Best Practices

Remember

The Problem:

  • R defaults to alphabetical factor ordering - usually wrong!
  • Numeric factors are dangerous - converting destroys your data
  • Missing levels get renumbered - group 4 becomes 3!

The Solutions:

  • forcats package provides powerful ordering tools:
    • fct_reorder() orders by another variable (most useful!)
    • fct_infreq() orders by frequency
    • fct_relevel() to put control/baseline first
    • fct_rev() to reverse order
  • Manual levels for logical ordering (Low/Med/High, months, etc.)
  • Prefix categorical numbers: “Group_1” not “1”

Always:

  • Check factor order before plotting with str() or levels()
  • Convert via character: as.numeric(as.character(f)) not as.numeric(f)
  • Think about your reader - what order makes sense?

9.19 Summary

Key Takeaways
  1. Alphabetical order is rarely useful - readers can’t interpret patterns

  2. forcats package provides essential tools:

    • fct_reorder(category, value) - order by another variable (most common!)
    • fct_infreq() - order by frequency (highest first)
    • fct_rev() - reverse current order
    • fct_relevel(f, "Control") - move specific level first
  3. Manual ordering for logical sequences:

    factor(x, levels = c("Low", "Medium", "High"))
    factor(month, levels = month.name)  # built-in constant
  4. The numeric factor trap:

    • as.numeric(factor(c("1", "2", "10"))) gives 1, 3, 2 (WRONG! These are level codes; the levels sort as text: "1", "10", "2")
    • as.numeric(as.character(factor(c("1", "2", "10")))) gives 1, 2, 10 (correct)
  5. Prefix numbered categories: Use “Group_1” not “1” to prevent accidental conversion

  6. Always check: str(data) and levels(variable) before plotting

  7. Think about your reader: What ordering tells the story best?

9.20 Exercises

Try It Yourself
  1. Create a dataset with unordered categories and fix the ordering:

    df <- data.frame(
      treatment = c("High", "Low", "Medium", "High", "Low"),
      response = c(8.2, 3.1, 5.5, 9.1, 2.8)
    )
    # Fix: factor(treatment, levels = c("Low", "Medium", "High"))
  2. Practice fct_reorder():

    starwars %>%
      mutate(species = fct_reorder(species, height, .fun = median, na.rm = TRUE)) %>%
      ggplot(aes(height, species)) + geom_boxplot()
  3. Create a factor from numbers (“1”, “2”, “10”) and practice safe conversion

  4. Compare plots with alphabetical vs. reordered factors - which is easier to read?

9.21 Further Reading