Data Structures
Vectors, Lists, Data Frames, and Beyond
R is a powerful language for data analysis, and at its core lies a variety of data structures designed to handle diverse types of data. Understanding these structures is crucial for effectively managing, manipulating, and analyzing data in R. This guide introduces you to R’s primary data structures, their properties, and how to work with them.
What Are Data Structures?
Data structures are containers that organize and store data. R offers several data structures, each suited to specific tasks:
- Vectors
- Matrices
- Arrays
- Data Frames
- Lists
- Factors
Let’s explore each in detail.
1. Vectors: The Building Blocks of R
Vectors are the most basic data structure in R. They are one-dimensional and hold elements of a single type, such as numeric, character, or logical.
Creating Vectors
# Numeric vector
numbers <- c(1, 2, 3, 4)
# Character vector
names <- c("Alice", "Bob", "Charlie")
# Logical vector
flags <- c(TRUE, FALSE, TRUE)
Accessing Vector Elements
numbers[1] # Access the first element
[1] 1
names[2:3] # Access the second and third elements
[1] "Bob" "Charlie"
Vectorized Operations
R allows operations to be applied to entire vectors at once:
numbers * 2 # Multiplies each element by 2
[1] 2 4 6 8
numbers + 10 # Adds 10 to each element (the single value is recycled)
[1] 11 12 13 14
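Because a vector holds a single type, combining values of different types makes R convert them to a common one, and operations on vectors of unequal length recycle the shorter vector. The following is a minimal sketch of both behaviors; the names `mixed`, `long`, and `short` are illustrative and not part of the examples above.

```r
# Mixing types in c() coerces everything to the most flexible type
mixed <- c(1, "two", TRUE)
class(mixed)          # "character"
mixed                 # "1" "two" "TRUE"

# Logical values become numbers when combined with numerics
c(TRUE, FALSE, 3)     # 1 0 3

# Recycling: the shorter vector is repeated to match the longer one
long <- c(1, 2, 3, 4)
short <- c(10, 20)
long + short          # 11 22 13 24
```

If the longer length is not a multiple of the shorter one, R still recycles but emits a warning, which is usually a sign of a mistake.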
2. Matrices: Two-Dimensional Arrays
Matrices are two-dimensional arrays where all elements are of the same type.
Creating Matrices
# Create a 3x3 matrix, filled row by row
matrix_data <- matrix(1:9, nrow = 3, byrow = TRUE)
print(matrix_data)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
Accessing Matrix Elements
matrix_data[1, 2] # Element in row 1, column 2
[1] 2
matrix_data[, 2] # Entire second column
[1] 2 5 8
matrix_data[1, ] # Entire first row
[1] 1 2 3
Matrix Operations
# Scalar addition: add 2 to every element
matrix_data + 2
     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    6    7    8
[3,]    9   10   11
# Matrix multiplication
matrix_data %*% t(matrix_data) # Multiply with its transpose
     [,1] [,2] [,3]
[1,]   14   32   50
[2,]   32   77  122
[3,]   50  122  194
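A common source of confusion is the difference between `*`, which multiplies matching elements, and `%*%`, which performs a true matrix product. The sketch below contrasts the two on a small 2x2 matrix; the names `m`, `elementwise`, and `product` are chosen here for illustration.

```r
# A small 2x2 matrix, filled row by row:
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
m <- matrix(1:4, nrow = 2, byrow = TRUE)

dim(m)                    # 2 2

# Element-wise product: each entry times the matching entry
elementwise <- m * m      # rows: 1 4 / 9 16

# Matrix product: rows of the first times columns of the second
product <- m %*% m        # rows: 7 10 / 15 22
```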
3. Arrays: Multi-Dimensional Data
Arrays extend matrices to more than two dimensions.
Creating Arrays
# Create a 3D array with 3 rows, 4 columns, and 2 layers
array_data <- array(1:24, dim = c(3, 4, 2))
print(array_data)
, , 1
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
, , 2
     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24
Accessing Array Elements
array_data[1, 2, 1] # Element at [row, column, layer]
[1] 4
array_data[, , 2] # All rows and columns from the second layer
     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24
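Beyond indexing, it is often useful to summarize an array along one of its dimensions. As a rough sketch, `apply()` with a margin of 3 runs a function once per layer; `layer_sums` and `col_means_layer1` are names introduced here for illustration.

```r
# Same array as above: 3 rows, 4 columns, 2 layers
array_data <- array(1:24, dim = c(3, 4, 2))

dim(array_data)                                   # 3 4 2

# Sum of all values in each layer (margin 3 = the third dimension)
layer_sums <- apply(array_data, 3, sum)           # 78 222

# A single layer is an ordinary matrix, so matrix helpers work on it
col_means_layer1 <- colMeans(array_data[, , 1])   # 2 5 8 11
```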
4. Data Frames: The Workhorse for Data Analysis
Data frames are tabular structures where each column can have a different data type, making them ideal for datasets.
Creating Data Frames
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(90, 85, 88)
)
print(data)
     Name Age Score
1   Alice  25    90
2     Bob  30    85
3 Charlie  35    88
Accessing Data Frame Elements
data$Name # Access a column by name
[1] "Alice" "Bob" "Charlie"
data[1, ] # Access the first row
   Name Age Score
1 Alice  25    90
data[1:2, "Age"] # Access specific rows and columns
[1] 25 30
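Row selection also works with logical conditions, which is how most filtering is done in base R. The sketch below rebuilds the same data frame so it runs on its own; `high_scores` and `older_names` are illustrative names, and `stringsAsFactors = FALSE` is set explicitly only so the result is the same on R versions before 4.0.

```r
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(90, 85, 88),
  stringsAsFactors = FALSE  # already the default from R 4.0 onward
)

# Inspect the column types at a glance
str(data)

# Keep only rows where Score exceeds 85 (Alice and Charlie)
high_scores <- data[data$Score > 85, ]

# Names of people aged 30 or older
older_names <- data$Name[data$Age >= 30]   # "Bob" "Charlie"
```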
Adding and Removing Columns
# Add a new column
data$Passed <- data$Score > 85
# Remove a column
data$Passed <- NULL
5. Lists: Storing Diverse Data
Lists can contain elements of different types and sizes, including other lists or data frames.
Creating Lists
my_list <- list(
  Name = "Alice",
  Age = 25,
  Scores = c(90, 85, 88)
)
print(my_list)
$Name
[1] "Alice"
$Age
[1] 25
$Scores
[1] 90 85 88
Accessing List Elements
my_list$Name # Access by name
[1] "Alice"
my_list[[2]] # Access by position
[1] 25
my_list[["Scores"]][1] # Access nested elements
[1] 90
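Single brackets and double brackets behave differently on lists: `[` returns a smaller list, while `[[` extracts the element itself. A minimal sketch, rebuilding the list so the block runs on its own (`sub_list` and `scores` are illustrative names):

```r
my_list <- list(
  Name = "Alice",
  Age = 25,
  Scores = c(90, 85, 88)
)

# Single brackets keep the list wrapper
sub_list <- my_list["Scores"]
class(sub_list)      # "list"

# Double brackets return the element itself
scores <- my_list[["Scores"]]
class(scores)        # "numeric"

# Only the extracted vector can be used directly in arithmetic
mean(scores)         # about 87.67

# List the element names
names(my_list)       # "Name" "Age" "Scores"
```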
Modifying Lists
my_list$City <- "New York" # Add a new element
my_list$Age <- NULL # Remove an element
6. Factors: Categorical Data
Factors represent categorical data. Internally, each value is stored as an integer code that points to one of a fixed set of unique levels.
Creating Factors
colors <- factor(c("Red", "Blue", "Red", "Green", "Blue"))
print(colors)
[1] Red   Blue  Red   Green Blue
Levels: Blue Green Red
levels(colors) # Check the levels (sorted alphabetically by default)
[1] "Blue" "Green" "Red"
Reordering Levels
colors <- factor(colors, levels = c("Red", "Green", "Blue"))
print(colors)
[1] Red   Blue  Red   Green Blue
Levels: Red Green Blue
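To see how level order affects a factor, it can help to look at the underlying integer codes and at a frequency table. This sketch builds the factor with the reordered levels directly; `codes` and `counts` are names introduced here for illustration.

```r
colors <- factor(
  c("Red", "Blue", "Red", "Green", "Blue"),
  levels = c("Red", "Green", "Blue")
)

# Integer codes follow the level order: Red = 1, Green = 2, Blue = 3
codes <- as.integer(colors)   # 1 3 1 2 3

# table() counts occurrences and reports them in level order
counts <- table(colors)       # Red 2, Green 1, Blue 2

nlevels(colors)               # 3
```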
Choosing the Right Data Structure
| Data Structure | Use Case | Example |
|---|---|---|
| Vector | Simple one-dimensional data of a single type | A set of ages |
| Matrix | Numerical data in 2D | Temperature readings over days |
| Array | Multi-dimensional data | RGB values of an image |
| Data Frame | Tabular data | Survey results |
| List | Mixed data types | User profiles |
| Factor | Categorical data | Survey responses: “Yes”, “No” |
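When it is unclear which structure an object is, `class()` and the `is.*()` predicates give a quick answer. The sketch below creates one small object of each kind; all variable names are illustrative.

```r
v  <- c(25, 30, 35)                        # vector
m  <- matrix(1:6, nrow = 2)                # matrix
a  <- array(1:24, dim = c(2, 3, 4))        # 3D array
df <- data.frame(Age = v)                  # data frame
l  <- list(Name = "Alice", Age = 25)       # list
f  <- factor(c("Yes", "No", "Yes"))        # factor

class(v)    # "numeric"
class(m)    # "matrix" "array" on R 4.0 or later, "matrix" before that
class(a)    # "array"
class(df)   # "data.frame"
class(l)    # "list"
class(f)    # "factor"

# Predicates return TRUE or FALSE and are convenient inside conditions
is.matrix(m)        # TRUE
is.data.frame(df)   # TRUE
is.factor(f)        # TRUE
```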
Practice Exercise
Here’s a simple exercise to practice:
- Create a data frame containing the following columns:
  - Name: Names of three people.
  - Age: Their ages.
  - Score: Their scores in a test.
- Add a new column called Passed, indicating whether the Score is greater than 50.
- Convert the Name column into a factor.
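If you get stuck, here is one possible way to approach the exercise. The names, ages, and scores are made-up sample values, and `people` is just an illustrative variable name; any three rows will do.

```r
# Step 1: build the data frame (sample values, invented for this sketch)
people <- data.frame(
  Name = c("Dana", "Eli", "Farah"),
  Age = c(22, 41, 35),
  Score = c(48, 75, 91),
  stringsAsFactors = FALSE  # already the default from R 4.0 onward
)

# Step 2: flag scores greater than 50
people$Passed <- people$Score > 50   # FALSE TRUE TRUE

# Step 3: convert the Name column into a factor
people$Name <- factor(people$Name)

# Check the resulting column types
str(people)
```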
Conclusion
Understanding data structures is fundamental for efficient data manipulation and analysis in R. Each structure has its unique properties and use cases, and mastering them will significantly enhance your ability to handle data effectively.