Tabular Structure

Tabular Data in R: The Foundation

Author

Raju Rimal

Published

November 30, 2024

Modified

March 19, 2025

If you’ve just started your journey with R, welcome! One of the first tools you’ll encounter is the mighty data.frame. It’s not flashy, but it’s the backbone of data manipulation in R. From analyzing survey results to crunching genomics data, data.frame is where the magic begins. But let’s not just scratch the surface—let’s dig deeper, uncover its quirks, and discover its hidden powers.


Introduction to Tabular Data in R

Imagine a table in Excel: rows are individual observations, and columns are variables. That’s tabular data. In R, the data.frame represents this structure. But here’s the twist: a data.frame is more than just a table.


The Secret Identity of data.frame: A Fancy List

Surprise! A data.frame is actually a list. Yes, it’s a list where:

  • Each column is a vector.
  • All vectors are of the same length.
  • Different columns can store different types of data (numbers, text, logical values, you name it), although each column holds only one type.

Here’s a quick example:

# Creating a data.frame
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30), Married = c(TRUE, FALSE))
str(df)  # Shows its structure
'data.frame':   2 obs. of  3 variables:
 $ Name   : chr  "Alice" "Bob"
 $ Age    : num  25 30
 $ Married: logi  TRUE FALSE

Cool, right? Understanding this “list-like” structure is key to mastering R’s tabular data.
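To see the list identity for yourself, a few base R functions are enough. This minimal sketch recreates the `df` from the example above so it runs on its own:

```r
# Recreate the small data.frame from the example above
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30), Married = c(TRUE, FALSE))

typeof(df)          # "list": the underlying storage type
class(df)           # "data.frame": the class layered on top of the list
is.list(df)         # TRUE
length(df)          # 3: the length of the list is the number of columns
df[["Age"]]         # list-style extraction returns the column vector
lapply(df, class)   # list tools such as lapply() work column by column
```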


Working with data.frame: The Basics

Creating a data.frame

You can build a data.frame from scratch using vectors or lists:

# Creating a data.frame from vectors
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(23, 25, 22),
  Grade = c("A", "B", "A")
)
print(students)
     Name Age Grade
1   Alice  23     A
2     Bob  25     B
3 Charlie  22     A
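Building from a list works the same way, because a named list of equal-length vectors is already most of the way to a data.frame. A minimal sketch using `as.data.frame()` (the name `student_list` is just for illustration):

```r
# A named list whose elements all have the same length
student_list <- list(
  Name  = c("Alice", "Bob", "Charlie"),
  Age   = c(23, 25, 22),
  Grade = c("A", "B", "A")
)

# Each list element becomes one column
students_from_list <- as.data.frame(student_list)
print(students_from_list)
```

If the elements have different lengths, R stops with an error such as "arguments imply differing number of rows", unless the longer length is an exact multiple of the shorter one, in which case the shorter element is recycled.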

Exploring Your Data

These functions are your go-to tools for peeking under the hood:

  • str(): Understand the structure.
  • summary(): Get a statistical overview.
  • head(): See the first few rows.

str(students)
'data.frame':   3 obs. of  3 variables:
 $ Name : chr  "Alice" "Bob" "Charlie"
 $ Age  : num  23 25 22
 $ Grade: chr  "A" "B" "A"
summary(students)
     Name                Age           Grade          
 Length:3           Min.   :22.00   Length:3          
 Class :character   1st Qu.:22.50   Class :character  
 Mode  :character   Median :23.00   Mode  :character  
                    Mean   :23.33                     
                    3rd Qu.:24.00                     
                    Max.   :25.00                     
head(students)
     Name Age Grade
1   Alice  23     A
2     Bob  25     B
3 Charlie  22     A

Try these on built-in datasets like mtcars or iris to get comfortable.
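For example, a first look at `mtcars` and `iris` might go like this. `dim()` and `names()` are base R companions to the functions above, and `head()` accepts `n` to change how many rows it shows:

```r
# Dimensions: rows (observations) and columns (variables)
dim(mtcars)       # 32 11
dim(iris)         # 150 5

# Column names and a compact structural overview
names(iris)
str(iris)

# Only the first 3 rows instead of the default 6
head(mtcars, n = 3)

# Per-column statistics; the factor column Species is tabulated by level
summary(iris)
```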


Interacting with data.frame: From Basic to Pro-Level

Selecting Data

Use $ to pull out a single column, or [row, column] indexing to select rows and columns (leave a position empty to keep everything along it):

# Selecting columns
students$Name       # Single column
[1] "Alice"   "Bob"     "Charlie"
students[, c("Name", "Grade")]  # Multiple columns
     Name Grade
1   Alice     A
2     Bob     B
3 Charlie     A
# Selecting rows
students[students$Grade == "A", ]  # Filter rows where Grade is 'A'
     Name Age Grade
1   Alice  23     A
3 Charlie  22     A
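One quirk worth knowing: selecting a single column with [row, column] drops the result to a plain vector, while list-style indexing with a single index keeps a data.frame. This sketch recreates `students` so it runs standalone, and also shows base R's `subset()`, which filters rows and picks columns in one call:

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(23, 25, 22),
  Grade = c("A", "B", "A")
)

students[, "Name"]                # character vector (dimension dropped)
students[, "Name", drop = FALSE]  # still a one-column data.frame
students["Name"]                  # list-style indexing: also a data.frame
students[["Name"]]                # same as students$Name: a vector

# Rows with Grade "A", keeping only the Name and Age columns
subset(students, Grade == "A", select = c(Name, Age))
```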

Want to add a column? Easy!

students$Passed <- students$Grade != "F"

Renaming columns is a bit more verbose:

names(students)[names(students) == "Age"] <- "Years"
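The renaming line works in two steps: `names(students) == "Age"` builds a logical vector marking the matching position, and assigning into that position of `names(students)` changes only that one name. A self-contained sketch of adding and renaming, with the data recreated:

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(23, 25, 22),
  Grade = c("A", "B", "A")
)

# Add a column: a logical vector with one value per row
students$Passed <- students$Grade != "F"

# Which names match? A logical vector as long as names(students)
names(students) == "Age"      # FALSE TRUE FALSE FALSE

# Replace only the matching name
names(students)[names(students) == "Age"] <- "Years"
names(students)               # "Name" "Years" "Grade" "Passed"
```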

The Real Fun: Merging and Reshaping Data

Combining Data

Imagine you have two class lists and need to combine them. Use rbind() to stack rows (both data frames need the same column names) or merge() to join data frames on a shared key column such as Name.

class1 <- data.frame(Name = c("Alice", "Bob"), Grade = c("A", "B"))
class2 <- data.frame(Name = c("Charlie", "David"), Grade = c("B", "C"))

# Stacking rows
all_classes <- rbind(class1, class2)

# Merging with another dataset
extra_info <- data.frame(Name = c("Alice", "Charlie"), Age = c(23, 22))
merged <- merge(all_classes, extra_info, by = "Name", all.x = TRUE)
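It helps to see what `merged` contains and how the `all` arguments of `merge()` change the result. `all.x = TRUE` keeps every row of the first (left) data frame and fills missing matches with `NA`; the default keeps only names found in both. A sketch that recreates the inputs (`left` and `inner` are illustrative names):

```r
class1 <- data.frame(Name = c("Alice", "Bob"), Grade = c("A", "B"))
class2 <- data.frame(Name = c("Charlie", "David"), Grade = c("B", "C"))
all_classes <- rbind(class1, class2)   # 4 rows, same two columns
extra_info <- data.frame(Name = c("Alice", "Charlie"), Age = c(23, 22))

# Left join: all 4 students kept; Bob and David get Age = NA
left <- merge(all_classes, extra_info, by = "Name", all.x = TRUE)
print(left)

# Inner join (default): only names present in both, i.e. Alice and Charlie
inner <- merge(all_classes, extra_info, by = "Name")
print(inner)
```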

Reshaping Data

Data often needs to switch between wide and long formats. For example, with the reshape2 package (an add-on, installed once with install.packages("reshape2")):

library(reshape2)

# Melting: Wide to Long
long <- melt(iris, id.vars = "Species")

# Casting: Long to Wide
wide <- dcast(long, Species ~ variable, mean)
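Melting `iris` and casting it back with `mean` summarizes each measurement by species. Base R's `aggregate()` should produce the same summary without any add-on package, which makes a handy cross-check. A minimal sketch using only base R:

```r
# Mean of every measurement column, grouped by Species ("." means all other columns)
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)   # one row per species, one column per measurement
```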

Using R’s Built-In Magic

Let’s level up with some lesser-known R functions that are perfect for data.frame wrangling.

Transforming Data

Add or modify columns on the fly:

students <- transform(students, FullName = paste(Name, "Smith"))
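One subtlety: `transform()` evaluates every expression against the original columns, not against columns changed earlier in the same call. A small self-contained sketch (the `people` data and the `OlderThan23` column are illustrative):

```r
people <- data.frame(Name = c("Alice", "Bob", "Charlie"), Age = c(23, 25, 22))

people <- transform(
  people,
  Age = Age + 1,            # modify an existing column
  OlderThan23 = Age > 23    # still sees the ORIGINAL Age values
)
print(people)
# Age is now 24, 26, 23, but OlderThan23 is FALSE, TRUE, FALSE
# because it was computed from the old ages 23, 25, 22
```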

Aggregating Data

Summarize data by groups:

aggregate(mpg ~ cyl, data = mtcars, mean)  # Average MPG by cylinder count
  cyl      mpg
1   4 26.66364
2   6 19.74286
3   8 15.10000
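The formula interface scales to several summary variables and several groups: `cbind()` on the left lists the columns to summarize, and `+` on the right adds grouping variables. For example, average MPG and horsepower by cylinder count and transmission (`am`: 0 = automatic, 1 = manual):

```r
by_cyl_am <- aggregate(cbind(mpg, hp) ~ cyl + am, data = mtcars, FUN = mean)
print(by_cyl_am)   # one row per cyl/am combination present in the data
```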

Stack and Unstack

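stack() collapses several columns into one long values column plus an ind column naming each value's source column; unstack() reverses it: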

stacked <- stack(mtcars[, 1:3])
unstacked <- unstack(stacked)
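A quick check of what the round trip produces: `stack()` returns two columns, `values` and `ind` (a factor), and `unstack()` restores one column per variable:

```r
stacked <- stack(mtcars[, 1:3])   # mpg, cyl and disp: 3 x 32 = 96 rows
dim(stacked)                      # 96 2
names(stacked)                    # "values" "ind"
head(stacked, 3)

unstacked <- unstack(stacked)     # back to one column per variable
# The values survive the round trip; the car names (row names) are not restored
all(unstacked$mpg == mtcars$mpg)
```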

Filtering with Filter()

Because a data.frame is a list, Filter() works on its columns: it keeps the columns for which a predicate function returns TRUE.

filtered <- Filter(function(x) mean(x) > 20, mtcars)  # Columns with mean > 20
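Since `Filter()` sees a list of columns, row filtering needs logical indexing or `subset()` instead. Side by side (the object names are illustrative):

```r
# Columns whose mean exceeds 20 (Filter() treats the data.frame as a list of columns)
high_mean_cols <- Filter(function(x) mean(x) > 20, mtcars)
names(high_mean_cols)   # "mpg" "disp" "hp"

# Rows (cars) with mpg above 30: logical indexing or subset()
efficient  <- mtcars[mtcars$mpg > 30, ]
efficient2 <- subset(mtcars, mpg > 30)
```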

Powerful Function Application

Apply functions column-wise with Map() or combine results across columns with Reduce():

# Apply mean to each column
means <- Map(mean, mtcars)

# Add the first three columns element-wise (one sum per row)
sum_result <- Reduce(`+`, mtcars[, 1:3])
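It can help to know what these two calls return. With a single data.frame, `Map(mean, mtcars)` behaves like `lapply()` and gives a named list of column means, while `Reduce()` adds the columns element by element, so its result has one value per row. Base R's `colMeans()` and `rowSums()` make good cross-checks:

```r
means <- Map(mean, mtcars)                # named list, one mean per column
sum_result <- Reduce(`+`, mtcars[, 1:3])  # mpg + cyl + disp, row by row

length(means)        # 11
length(sum_result)   # 32

# The same results from dedicated base functions
all.equal(unlist(means), colMeans(mtcars))
all.equal(sum_result, unname(rowSums(mtcars[, 1:3])))

# Map() shines with parallel inputs, e.g. pairing each column with its name
Map(function(col, nm) paste(nm, "max:", max(col)), mtcars[, 1:2], names(mtcars)[1:2])
```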

Why Use data.frame?

Strengths

  • It’s simple and intuitive.
  • Supports mixed data types.
  • Fully compatible with R’s base functions.

Limitations

  • Can be slow and memory-hungry for large datasets.
  • No dedicated grammar of data verbs: since R 4.1 the native pipe |> can chain base functions, but packages like dplyr add verbs designed for chaining.
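For plain left-to-right chaining, R 4.1 and later include a native pipe, `|>`, that works with base functions such as `subset()` and `transform()`. A minimal sketch, assuming R 4.1 or newer (`hp_per_mpg` is an illustrative column name):

```r
# Requires R >= 4.1 for the native |> pipe
result <- mtcars |>
  subset(cyl == 4, select = c(mpg, hp)) |>   # 4-cylinder cars, two columns
  transform(hp_per_mpg = hp / mpg) |>        # add a derived column
  head(3)                                    # keep the first three rows
print(result)
```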

Conclusion

The data.frame is where you start, but it’s far from where you’ll stop. It’s a fantastic tool for small to moderate datasets and a perfect introduction to tabular data in R. Once you’ve mastered it, you’ll be ready to tackle more advanced tools like data.table, tibble, or even the hybrid tidytable.