Data Structures

Vectors, Lists, DataFrames, and Beyond

Author

Raju Rimal

Published

November 30, 2024

Modified

March 19, 2025

R is a powerful language for data analysis, and at its core lies a variety of data structures designed to handle diverse types of data. Understanding these structures is crucial for effectively managing, manipulating, and analyzing data in R. This guide introduces you to R’s primary data structures, their properties, and how to work with them.


What Are Data Structures?

Data structures are containers that organize and store data. R offers several data structures, each optimized for specific tasks:

  1. Vectors
  2. Matrices
  3. Arrays
  4. Data Frames
  5. Lists
  6. Factors

Let’s explore each in detail.


1. Vectors: The Building Blocks of R

Vectors are the most basic data structure in R. A vector is a one-dimensional sequence of elements that all share the same type, such as numeric, character, or logical.

Creating Vectors

# Numeric vector
numbers <- c(1, 2, 3, 4)

# Character vector
names <- c("Alice", "Bob", "Charlie")

# Logical vector
flags <- c(TRUE, FALSE, TRUE)
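Every element of a vector must share one type, so R silently converts (coerces) mixed inputs to the most flexible type among them. The short sketch below illustrates this. The variable names `mixed` and `num_logical` are only illustrative.

```r
# Mixing a number, a string, and a logical: everything becomes character
mixed <- c(1, "two", TRUE)
class(mixed)          # "character"
print(mixed)          # "1" "two" "TRUE"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
num_logical <- c(1.5, TRUE, FALSE)
print(num_logical)    # 1.5 1.0 0.0
```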

Accessing Vector Elements

numbers[1]   # Access the first element
[1] 1
names[2:3]   # Access the second and third elements
[1] "Bob"     "Charlie"
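Positive positions are only one way to index a vector. The sketch below also shows negative indices, logical conditions, and names. The named vector `ages` is a hypothetical example built from the ages used later in this guide.

```r
numbers <- c(1, 2, 3, 4)

numbers[-1]            # Negative index drops the first element: 2 3 4
numbers[numbers > 2]   # Logical condition keeps matching elements: 3 4

# Elements can also be named and then selected by name
ages <- c(Alice = 25, Bob = 30, Charlie = 35)
ages["Bob"]            # Returns 30, labelled Bob
```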

Vectorized Operations

R allows operations to be applied to entire vectors at once:

numbers * 2       # Multiplies each element by 2
[1] 2 4 6 8
numbers + 10      # Adds 10 to each element (the single value is recycled)
[1] 11 12 13 14
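Vectorized arithmetic works element by element. A shorter vector is reused ("recycled") to match the longer one. The sketch below shows recycling and checks that a vectorized expression matches an explicit loop. The variable `doubled` is illustrative.

```r
numbers <- c(1, 2, 3, 4)

# Two vectors of equal length combine element by element
numbers + numbers      # 2 4 6 8

# A shorter vector is recycled: c(10, 20) is reused as 10 20 10 20
numbers + c(10, 20)    # 11 22 13 24

# The vectorized expression gives the same result as an explicit loop
doubled <- numeric(length(numbers))
for (i in seq_along(numbers)) {
  doubled[i] <- numbers[i] * 2
}
identical(doubled, numbers * 2)   # TRUE
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.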

2. Matrices: Two-Dimensional Arrays

Matrices are two-dimensional arrays where all elements are of the same type.

Creating Matrices

# Create a 3x3 matrix
matrix_data <- matrix(1:9, nrow = 3, byrow = TRUE)
print(matrix_data)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
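The example above passes `byrow = TRUE`. Without it, `matrix()` fills values column by column. A minimal comparison:

```r
# Without byrow = TRUE, matrix() fills column by column (the default)
by_col <- matrix(1:9, nrow = 3)
by_col[1, ]    # 1 4 7
dim(by_col)    # 3 3

# Transposing the column-filled matrix gives the row-filled version
by_row <- matrix(1:9, nrow = 3, byrow = TRUE)
identical(t(by_col), by_row)   # TRUE
```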

Accessing Matrix Elements

matrix_data[1, 2]     # Element in row 1, column 2
[1] 2
matrix_data[, 2]      # Entire second column
[1] 2 5 8
matrix_data[1, ]      # Entire first row
[1] 1 2 3
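Selecting a single row or column simplifies the result to a plain vector. To keep the matrix shape, pass `drop = FALSE`. The names `col_vec` and `col_mat` below are illustrative.

```r
matrix_data <- matrix(1:9, nrow = 3, byrow = TRUE)

col_vec <- matrix_data[, 2]                 # Simplified to a plain vector
col_mat <- matrix_data[, 2, drop = FALSE]   # Kept as a 3x1 matrix

is.matrix(col_vec)   # FALSE
dim(col_mat)         # 3 1
```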

Matrix Operations

# Element-wise addition: adds 2 to every entry
matrix_data + 2
     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    6    7    8
[3,]    9   10   11
# Matrix multiplication
matrix_data %*% t(matrix_data)  # Multiply with its transpose
     [,1] [,2] [,3]
[1,]   14   32   50
[2,]   32   77  122
[3,]   50  122  194
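A common source of confusion is that `*` multiplies matrices element by element, while `%*%` performs true matrix multiplication. A small 2x2 sketch makes the difference visible:

```r
m <- matrix(1:4, nrow = 2)   # Columns are (1, 2) and (3, 4)

m * m      # Element-wise product: each entry squared
#      [,1] [,2]
# [1,]    1    9
# [2,]    4   16

m %*% m    # Matrix product: rows of the left times columns of the right
#      [,1] [,2]
# [1,]    7   15
# [2,]   10   22
```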

3. Arrays: Multi-Dimensional Data

Arrays extend matrices to more than two dimensions.

Creating Arrays

# Create a 3D array
array_data <- array(1:24, dim = c(3, 4, 2))
print(array_data)
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

Accessing Array Elements

array_data[1, 2, 1]    # Element at [row, column, layer]
[1] 4
array_data[, , 2]      # All rows and columns from the second layer
     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24
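Because an array has several dimensions, `apply()` can summarize it along any of them. In the sketch below, the `MARGIN` value `3` means "one result per layer", and `c(1, 2)` means "one result per row and column cell".

```r
array_data <- array(1:24, dim = c(3, 4, 2))

dim(array_data)                  # 3 4 2: rows, columns, layers
apply(array_data, 3, sum)        # Sum of each layer: 78 222
apply(array_data, c(1, 2), sum)  # Add the two layers cell by cell (a 3x4 matrix)
```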

4. Data Frames: The Workhorse for Data Analysis

Data frames are tabular structures in which all columns have the same length but each column can hold a different data type. This makes them ideal for datasets where rows are observations and columns are variables.

Creating Data Frames

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(90, 85, 88)
)
print(data)
     Name Age Score
1   Alice  25    90
2     Bob  30    85
3 Charlie  35    88
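To confirm that each column kept its own type, inspect the data frame's size and column classes. This sketch assumes R 4.0 or later, where `data.frame()` no longer converts strings to factors by default.

```r
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(90, 85, 88)
)

nrow(data)             # 3 rows
ncol(data)             # 3 columns
sapply(data, class)    # Name: "character", Age: "numeric", Score: "numeric"
str(data)              # Compact summary of every column
```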

Accessing Data Frame Elements

data$Name        # Access a column by name
[1] "Alice"   "Bob"     "Charlie"
data[1, ]        # Access the first row
   Name Age Score
1 Alice  25    90
data[1:2, "Age"] # Access specific rows and columns
[1] 25 30
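Rows can also be selected by a condition rather than by position. The sketch below shows a logical row index and the base R `subset()` function. The thresholds 86 and 30 are arbitrary illustrative values.

```r
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(90, 85, 88)
)

# Logical row index: keep rows whose Score exceeds 86 (Alice and Charlie)
data[data$Score > 86, ]

# subset() expresses the same idea, using column names directly
subset(data, Age >= 30, select = c(Name, Score))   # Bob and Charlie
```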

Adding and Removing Columns

# Add a logical column: TRUE where Score is greater than 85
data$Passed <- data$Score > 85

# Remove a column
data$Passed <- NULL
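Rows can be appended in a similar spirit with `rbind()`, as long as the new rows have the same column names. The record for "Dana" below is hypothetical and is only used for illustration.

```r
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(90, 85, 88)
)

# Hypothetical new record; its column names must match the existing ones
new_row <- data.frame(Name = "Dana", Age = 28, Score = 92)
data <- rbind(data, new_row)
nrow(data)   # 4
```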

5. Lists: Storing Diverse Data

Lists can contain elements of different types and sizes, including other lists or data frames.

Creating Lists

my_list <- list(
  Name = "Alice",
  Age = 25,
  Scores = c(90, 85, 88)
)
print(my_list)
$Name
[1] "Alice"

$Age
[1] 25

$Scores
[1] 90 85 88
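Lists can also contain other lists or data frames. In the sketch below, the `Address` and `Tests` fields are hypothetical and only illustrate nesting.

```r
profile <- list(
  Name = "Alice",
  Scores = c(90, 85, 88),
  Address = list(City = "New York"),                         # A list inside a list
  Tests = data.frame(Test = c("A", "B"), Score = c(90, 85))  # A data frame inside a list
)

length(profile)        # 4 top-level elements
profile$Address$City   # "New York"
profile$Tests$Score    # 90 85
```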

Accessing List Elements

my_list$Name       # Access by name
[1] "Alice"
my_list[[2]]       # Access by position
[1] 25
my_list[["Scores"]][1]  # First value inside the Scores element
[1] 90
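Single and double brackets behave differently on lists. `[` returns a smaller list, while `[[` (and `$`) return the element itself. The names `sub` and `elem` below are illustrative.

```r
my_list <- list(Name = "Alice", Age = 25, Scores = c(90, 85, 88))

sub <- my_list["Age"]      # Single brackets return a list containing Age
elem <- my_list[["Age"]]   # Double brackets return the value stored in Age

class(sub)    # "list"
class(elem)   # "numeric"
```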

Modifying Lists

my_list$City <- "New York"   # Add a new element
my_list$Age <- NULL          # Remove an element
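Because assigning `NULL` removes an element, storing a `NULL` value on purpose needs a slightly different form. The sketch below shows both cases. The `Note` element is a hypothetical name.

```r
my_list <- list(Name = "Alice", Age = 25, Scores = c(90, 85, 88))

my_list$City <- "New York"   # Add an element
my_list$Age <- NULL          # Remove an element
names(my_list)               # "Name" "Scores" "City"

# To keep an element whose value is NULL, assign a list containing NULL
my_list["Note"] <- list(NULL)
names(my_list)               # "Name" "Scores" "City" "Note"
```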

6. Factors: Categorical Data

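Factors represent categorical data. A factor stores each value as an integer code that refers to a set of unique category labels, called levels. By default, the levels are sorted alphabetically.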

Creating Factors

colors <- factor(c("Red", "Blue", "Red", "Green", "Blue"))
print(colors)
[1] Red   Blue  Red   Green Blue 
Levels: Blue Green Red
levels(colors)     # Check the levels
[1] "Blue"  "Green" "Red"  
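Under the hood, each factor value is an integer pointing into `levels()`. The sketch below shows those codes and counts the values in each level with `table()`.

```r
colors <- factor(c("Red", "Blue", "Red", "Green", "Blue"))

as.integer(colors)   # 3 1 3 2 1: each value's position in levels(colors)
nlevels(colors)      # 3
table(colors)        # Counts per level: Blue 2, Green 1, Red 2
```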

Reordering Levels

colors <- factor(colors, levels = c("Red", "Green", "Blue"))
print(colors)
[1] Red   Blue  Red   Green Blue 
Levels: Red Green Blue
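Reordering levels changes the underlying integer codes but not the values themselves. One pitfall to watch for: a value whose label is missing from `levels` (for example, because of a capitalization typo) silently becomes `NA`. A minimal sketch:

```r
colors <- factor(c("Red", "Blue", "Red", "Green", "Blue"))

# Reordering levels changes the integer codes, not the values
reordered <- factor(colors, levels = c("Red", "Green", "Blue"))
as.integer(reordered)   # 1 3 1 2 3

# "blue" does not match "Blue", so those values become NA
typo <- factor(colors, levels = c("Red", "Green", "blue"))
print(typo)             # Red <NA> Red Green <NA>
```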

Choosing the Right Data Structure

Data Structure   Use Case                                   Example
Vector           Simple one-dimensional data                A set of ages
Matrix           Two-dimensional data of a single type      Daily temperature readings for several cities
Array            Multi-dimensional data                     RGB values of an image
Data Frame       Tabular data with mixed column types       Survey results
List             Mixed data types and sizes                 User profiles
Factor           Categorical data                           Survey responses: “Yes”, “No”
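When you are unsure which structure an object is, `class()` reports it. The sketch below uses small illustrative objects. It assumes R 4.0 or later, where a matrix has the class `c("matrix", "array")`, so only the first entry is kept.

```r
examples <- list(
  vector     = c(25, 30, 35),
  matrix     = matrix(1:4, nrow = 2),
  array      = array(1:8, dim = c(2, 2, 2)),
  data_frame = data.frame(Age = c(25, 30)),
  list       = list(Name = "Alice", Age = 25),
  factor     = factor(c("Yes", "No", "Yes"))
)

# First class of each object
structure_classes <- sapply(examples, function(obj) class(obj)[1])
structure_classes
```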

Practice Exercise

Here’s a simple exercise to practice:

  1. Create a data frame containing the following columns:
    • Name: Names of three people.
    • Age: Their ages.
    • Score: Their scores in a test.
  2. Add a new column called Passed, indicating whether the Score is greater than 50.
  3. Convert the Name column into a factor.
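One possible solution to the exercise is sketched below. The names and ages reuse earlier examples, and the test scores are hypothetical values chosen so that one person falls below the pass mark of 50.

```r
# Step 1: create the data frame (scores are hypothetical)
people <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(72, 45, 88)
)

# Step 2: TRUE where Score is greater than 50
people$Passed <- people$Score > 50   # TRUE FALSE TRUE

# Step 3: convert Name from character to factor
people$Name <- factor(people$Name)

str(people)
```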

Conclusion

Understanding data structures is fundamental for efficient data manipulation and analysis in R. Each structure has its unique properties and use cases, and mastering them will significantly enhance your ability to handle data effectively.