# Numeric vector
numbers <- c(1, 2, 3, 4)
# Character vector
names <- c("Alice", "Bob", "Charlie")
# Logical vector
flags <- c(TRUE, FALSE, TRUE)
Data Structures
Vectors, Lists, Data Frames, and Beyond
R is a powerful language for data analysis, and at its core lies a variety of data structures designed to handle diverse types of data. Understanding these structures is crucial for effectively managing, manipulating, and analyzing data in R. This guide introduces you to R’s primary data structures, their properties, and how to work with them.
What Are Data Structures?
Data structures are containers that organize and store data. R offers several data structures, each optimized for specific tasks:
- Vectors
- Matrices
- Arrays
- Data Frames
- Lists
- Factors
Let’s explore each in detail.
1. Vectors: The Building Blocks of R
Vectors are the most basic data structure in R. They are one-dimensional sequences whose elements all share a single type, such as numeric, character, or logical.
Creating Vectors
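As a minimal sketch of the same-type rule, the snippet below builds vectors with c() and shows what happens when the inputs are mixed: R coerces every element to one common type. The names ages and mixed are illustrative and not part of the original examples.

```r
# Illustrative names; not taken from the original text
ages  <- c(25, 30, 35)        # all numeric, so the vector stays numeric
mixed <- c(1, "two", TRUE)    # mixed input is coerced to character

class(ages)     # "numeric"
class(mixed)    # "character"
mixed           # "1" "two" "TRUE"
```

Checking class() after building a vector is a quick way to catch an accidental coercion.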
Accessing Vector Elements
numbers[1] # Access the first element
[1] 1
names[2:3] # Access the second and third elements
[1] "Bob" "Charlie"
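Beyond positive positions, vectors can also be indexed with negative positions and logical conditions. The sketch below assumes a small illustrative vector named scores, which does not appear in the original text.

```r
# Illustrative vector; not taken from the original text
scores <- c(90, 85, 88, 72)

scores[-1]            # every element except the first: 85 88 72
scores[c(1, 4)]       # positions 1 and 4: 90 72
scores[scores > 86]   # elements that satisfy a condition: 90 88
```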
Vectorized Operations
R allows operations to be applied to entire vectors at once:
numbers * 2 # Multiplies each element by 2
[1] 2 4 6 8
numbers + c(10) # Adds 10 to each element
[1] 11 12 13 14
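Vectorized operations also work between two vectors, element by element. When one vector is shorter, R recycles it to match the longer one. This is a small sketch with illustrative vectors a and b, which are not from the original text.

```r
# Illustrative vectors; not taken from the original text
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)

a + b         # element-wise sum: 11 22 33 44
a * c(1, 2)   # c(1, 2) is recycled to 1 2 1 2, giving 1 4 3 8
sum(a > 2)    # counts the elements greater than 2: 2
```

Recycling is silent when the longer length is a multiple of the shorter one, so it is worth checking lengths when the result looks unexpected.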
2. Matrices: Two-Dimensional Arrays
Matrices are two-dimensional arrays where all elements are of the same type.
Creating Matrices
# Create a 3x3 matrix
matrix_data <- matrix(1:9, nrow = 3, byrow = TRUE)
print(matrix_data)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
Accessing Matrix Elements
matrix_data[1, 2] # Element in row 1, column 2
[1] 2
matrix_data[, 2] # Entire second column
[1] 2 5 8
matrix_data[1, ] # Entire first row
[1] 1 2 3
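Selecting a single row or column returns a plain vector by default. A hedged sketch of how to keep the matrix shape with drop = FALSE follows; the names m, col_vec, and col_mat are illustrative.

```r
# Illustrative matrix; the names are not taken from the original text
m <- matrix(1:9, nrow = 3, byrow = TRUE)

col_vec <- m[, 2]                 # plain vector: 2 5 8
col_mat <- m[, 2, drop = FALSE]   # still a matrix, with 3 rows and 1 column

is.matrix(col_vec)   # FALSE
dim(col_mat)         # 3 1
```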
Matrix Operations
# Add 2 to every element of the matrix
matrix_data + 2
     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    6    7    8
[3,]    9   10   11
# Matrix multiplication
matrix_data %*% t(matrix_data) # Multiply with its transpose
     [,1] [,2] [,3]
[1,]   14   32   50
[2,]   32   77  122
[3,]   50  122  194
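The operators * and %*% are easy to confuse: * multiplies element by element, while %*% performs matrix multiplication. The following sketch contrasts them on a small illustrative 2x2 matrix named m.

```r
# Illustrative 2x2 matrix with rows (1, 2) and (3, 4)
m <- matrix(1:4, nrow = 2, byrow = TRUE)

elementwise <- m * m     # rows (1, 4) and (9, 16)
product     <- m %*% m   # rows (7, 10) and (15, 22)

elementwise
product
```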
3. Arrays: Multi-Dimensional Data
Arrays extend matrices to more than two dimensions.
Creating Arrays
# Create a 3D array
array_data <- array(1:24, dim = c(3, 4, 2))
print(array_data)
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24
Accessing Array Elements
array_data[1, 2, 1] # Element at [row, column, layer]
[1] 4
array_data[, , 2] # All rows and columns from the second layer
     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24
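To work along one dimension of an array, dim() reports its shape and apply() can summarize each slice. This is a minimal sketch; arr and layer_totals are illustrative names.

```r
# Illustrative array with 3 rows, 4 columns, and 2 layers
arr <- array(1:24, dim = c(3, 4, 2))

dim(arr)                             # 3 4 2
layer_totals <- apply(arr, 3, sum)   # one sum per layer: 78 222
row_two <- arr[2, , ]                # row 2 across all columns and layers
dim(row_two)                         # 4 2
```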
4. Data Frames: The Workhorse for Data Analysis
Data frames are tabular structures where each column can have a different data type, making them ideal for datasets.
Creating Data Frames
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(90, 85, 88)
)
print(data)
     Name Age Score
1   Alice  25    90
2     Bob  30    85
3 Charlie  35    88
Accessing Data Frame Elements
data$Name # Access a column by name
[1] "Alice" "Bob" "Charlie"
data[1, ] # Access the first row
   Name Age Score
1 Alice  25    90
data[1:2, "Age"] # Access specific rows and columns
[1] 25 30
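Rows can also be selected with a logical condition instead of positions. The sketch below rebuilds a small data frame under the illustrative name people so that it runs on its own.

```r
# Illustrative data frame; the name `people` is not from the original text
people <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(90, 85, 88)
)

older <- people[people$Age > 26, ]              # rows for Bob and Charlie
top_names <- people$Name[people$Score >= 88]    # "Alice" "Charlie"

nrow(older)   # 2
```

The comma in people[condition, ] matters: the condition selects rows, and the empty slot after the comma keeps every column.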
Adding and Removing Columns
# Add a new column
data$Passed <- data$Score > 85
# Remove a column
data$Passed <- NULL
5. Lists: Storing Diverse Data
Lists can contain elements of different types and sizes, including other lists or data frames.
Creating Lists
my_list <- list(
  Name = "Alice",
  Age = 25,
  Scores = c(90, 85, 88)
)
print(my_list)
$Name
[1] "Alice"

$Age
[1] 25

$Scores
[1] 90 85 88
Accessing List Elements
my_list$Name # Access by name
[1] "Alice"
my_list[[2]] # Access by position
[1] 25
my_list[["Scores"]][1] # Access nested elements
[1] 90
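Single brackets and double brackets behave differently on lists: [ returns a smaller list, while [[ extracts the element itself. A short sketch, using an illustrative list named profile:

```r
# Illustrative list; the name `profile` is not from the original text
profile <- list(Name = "Alice", Age = 25, Scores = c(90, 85, 88))

sub_list <- profile["Age"]     # a list of length 1 that contains Age
value    <- profile[["Age"]]   # the number 25 itself

class(sub_list)   # "list"
class(value)      # "numeric"
value + 1         # arithmetic works on the extracted value: 26
```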
Modifying Lists
my_list$City <- "New York" # Add a new element
my_list$Age <- NULL # Remove an element
6. Factors: Categorical Data
Factors represent categorical data. Each value is stored as an integer code that points to one of a fixed set of unique labels, called levels.
Creating Factors
colors <- factor(c("Red", "Blue", "Red", "Green", "Blue"))
print(colors)
[1] Red   Blue  Red   Green Blue
Levels: Blue Green Red
levels(colors) # Check the levels
[1] "Blue" "Green" "Red"
Reordering Levels
colors <- factor(colors, levels = c("Red", "Green", "Blue"))
print(colors)
[1] Red   Blue  Red   Green Blue
Levels: Red Green Blue
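To see how a factor is stored, the sketch below prints the integer codes behind the labels and counts each level with table(). It uses an illustrative factor named shades with the default alphabetical level order.

```r
# Illustrative factor; levels default to alphabetical order: Blue, Green, Red
shades <- factor(c("Red", "Blue", "Red", "Green", "Blue"))

as.integer(shades)   # codes that point into the levels: 3 1 3 2 1
nlevels(shades)      # 3
table(shades)        # counts per level: Blue 2, Green 1, Red 2
```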
Choosing the Right Data Structure
| Data Structure | Use Case | Example |
|---|---|---|
| Vector | Simple one-dimensional data | A list of ages |
| Matrix | Numerical data in 2D | Temperature readings over days |
| Array | Multi-dimensional data | RGB values of an image |
| Data Frame | Tabular data | Survey results |
| List | Mixed data types | User profiles |
| Factor | Categorical data | Survey responses: “Yes”, “No” |
Practice Exercise
Here’s a simple exercise to practice:
- Create a data frame containing the following columns:
  - Name: Names of three people.
  - Age: Their ages.
  - Score: Their scores in a test.
- Add a new column called Passed, indicating whether the Score is greater than 50.
- Convert the Name column into a factor.
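One possible solution is sketched below. The data frame name students and all of the values in it are made up for illustration, so any three people, ages, and scores would work equally well.

```r
# Made-up example data; the names and values are purely illustrative
students <- data.frame(
  Name = c("Dana", "Eli", "Farah"),
  Age = c(22, 27, 31),
  Score = c(45, 78, 62)
)

# Add a logical column: TRUE when the score is above 50
students$Passed <- students$Score > 50

# Convert the Name column into a factor
students$Name <- factor(students$Name)

str(students)   # inspect the resulting column types
```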
Conclusion
Understanding data structures is fundamental for efficient data manipulation and analysis in R. Each structure has its unique properties and use cases, and mastering them will significantly enhance your ability to handle data effectively.