String Manipulation and Regular Expressions with Practical Examples
Raju Rimal
November 30, 2024
March 19, 2025
Strings are a crucial part of data manipulation and analysis in R. From cleaning messy datasets to extracting specific information, the ability to work efficiently with text can save time and improve the quality of your results. This post covers string manipulation in R, focusing on regular expressions and string functions.
In R, strings are values of the character data type, written as text enclosed in single or double quotes.
# A simple string
my_string <- "Hello, R!"
print(my_string)
[1] "Hello, R!"
Strings are often stored in vectors, making them compatible with R’s vectorized operations:
fruits <- c("Apple", "Banana", "Cherry")
print(fruits)
[1] "Apple" "Banana" "Cherry"
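Because base R string functions are vectorized, one call applies to every element of a character vector. The following is a minimal sketch of nchar(), toupper(), tolower(), and substr(); the `fruits` vector and the result names are illustrative assumptions rather than code from the original post.

```r
# Hypothetical example vector; the names are illustrative only
fruits <- c("Apple", "Banana", "Cherry")

# nchar() counts the characters in each element
char_counts <- nchar(fruits)
print(char_counts)

# toupper() and tolower() change the case of every element
upper_fruits <- toupper(fruits)
lower_fruits <- tolower(fruits)
print(upper_fruits)
print(lower_fruits)

# substr() extracts the characters from position 1 to position 3
first_three <- substr(fruits, 1, 3)
print(first_three)

# The replacement form of substr() overwrites characters in place
patched <- fruits
substr(patched, 1, 1) <- "X"
print(patched)
```

Note that the replacement form of substr() never changes the length of a string; it only overwrites the characters in the given range.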
nchar(): Count Characters
toupper() and tolower(): Change Case
substr(): Extract or Replace Substrings
stringr
The stringr package, part of the tidyverse, simplifies string operations and introduces a consistent syntax.
str_detect(): Detect a Pattern
str_extract(): Extract the First Match
str_replace(): Replace the First Match
str_split(): Split a String into Pieces
[[1]]
[1] "Apple" "Banana" "Cherry"
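The four stringr functions listed above can be tried on a small vector. This is a sketch under assumptions: it requires the stringr package to be installed, and the `fruits` and `csv_line` inputs are hypothetical values chosen for illustration.

```r
library(stringr)

# Hypothetical input strings, chosen only for illustration
fruits <- c("Apple", "Banana", "Cherry")
csv_line <- "Apple,Banana,Cherry"

# str_detect() returns TRUE or FALSE for each element
has_an <- str_detect(fruits, "an")
print(has_an)

# str_extract() returns the first match, or NA when there is none
first_vowel <- str_extract(fruits, "[aeiou]")
print(first_vowel)

# str_replace() replaces only the first match in each element
replaced <- str_replace(fruits, "a", "@")
print(replaced)

# str_split() returns a list with one character vector per input string
pieces <- str_split(csv_line, ",")
print(pieces)
```

Matching is case-sensitive by default, which is why "Apple" is left unchanged by the replacement. To replace every match rather than only the first, stringr also provides str_replace_all().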
Regular expressions (regex) are powerful tools for pattern matching.
Symbol | Meaning | Example |
---|---|---|
. | Any character | "a.c" matches "abc" |
* | Zero or more occurrences | "a*" matches "aaa" |
+ | One or more occurrences | "a+" matches "aa" |
\\d | Any digit | "\\d" matches "1" |
^ | Start of a string | "^A" matches "Apple" |
$ | End of a string | "e$" matches "Apple" |
[1] "Banana" "Blueberry"
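Each symbol in the table can be checked with base R's grepl(), which returns TRUE where a pattern matches. The sketch below uses hypothetical test vectors that are not from the original post; the first filter is one way to obtain a result like the "Banana" "Blueberry" output above.

```r
# Hypothetical test vector; not taken from the original post
fruits <- c("Apple", "Banana", "Blueberry", "Cherry")

# ^ anchors the pattern to the start of the string
starts_with_b <- fruits[grepl("^B", fruits)]
print(starts_with_b)

# $ anchors the pattern to the end of the string
ends_with_e <- fruits[grepl("e$", fruits)]
print(ends_with_e)

# . matches any single character, so "a.c" matches "abc" but not "ac"
dot_match <- grepl("a.c", c("abc", "ac"))
print(dot_match)

# + needs at least one "a" after the "b"; * also accepts none
plus_match <- grepl("^ba+$", c("b", "ba", "baaa"))
star_match <- grepl("^ba*$", c("b", "ba", "baaa"))
print(plus_match)
print(star_match)

# "\\d" in an R string is the regex \d, which matches any digit
digit_match <- grepl("\\d", c("room 1", "room one"))
print(digit_match)
```

The doubled backslash is needed because R processes string escapes first: the regex engine only ever sees a single backslash.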
library(stringr)
# Remove leading and trailing spaces
dirty_text <- c(" Hello ", " World ")
cleaned_text <- str_trim(dirty_text)
print(cleaned_text)
[1] "Hello" "World"
For large datasets, consider stringi, the package that stringr is built on. Calling it directly can be faster and gives access to a wider set of functions for complex text processing.
library(stringi)
# Count occurrences of a pattern
stri_count_regex(c("apple", "banana", "cherry"), "a")
[1] 1 3 0
logs <- c("ERROR: Disk full", "INFO: Process started", "WARNING: Low memory")
error_logs <- str_subset(logs, "^ERROR")
print(error_logs)
[1] "ERROR: Disk full"
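Filtering is often only the first step; the same log lines can also be split into a level and a message. The sketch below uses base R's sub() with a capture group and assumes every line follows the "LEVEL: message" layout of the example above; the names `level`, `message_text`, and `parsed` are hypothetical.

```r
logs <- c("ERROR: Disk full", "INFO: Process started", "WARNING: Low memory")

# Capture the upper-case level before the first colon
level <- sub("^([A-Z]+):.*$", "\\1", logs)

# Drop the level, the colon, and any spaces that follow it
message_text <- sub("^[A-Z]+:\\s*", "", logs)

# Combine both pieces into a data frame for further analysis
parsed <- data.frame(level = level, message = message_text)
print(parsed)

# Count how many entries were logged at each level
level_counts <- table(parsed$level)
print(level_counts)
```

Lines that do not match the assumed layout are returned unchanged by sub(), so real log files would need a validation step before this split.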
Strings are more than just text: they are data waiting to be transformed. By mastering R’s string manipulation functions and regular expressions, you can efficiently clean, extract, and analyze text data.