install.packages("tidytable")
Tidytable in R
Tidyverse Simplicity with Data.Table Performance
Introduction
Data manipulation in R often requires balancing user-friendly syntax with computational efficiency. The tidytable
package bridges this gap, combining the clear, pipe-friendly syntax of the tidyverse with the speed of data.table
. If you’ve worked with dplyr
or data.table
, tidytable
offers a familiar yet high-performance alternative.
In this article, we’ll explore the core features of tidytable
, dive into its syntax, and demonstrate its utility through practical examples.
What is tidytable?
tidytable
integrates:
- Tidyverse Syntax: Use functions like
filter()
,mutate()
, andsummarize()
seamlessly. - data.table Backend: Achieve faster processing, especially on large datasets.
- Pipe Compatibility: Works perfectly with
%>%
or the native|>
pipe.
Getting Started with tidytable
Installation
Install tidytable
from CRAN:
Creating a Tidytable
You can create a tidytable directly or convert existing data frames:
library(tidytable)
# From vectors
<- tidytable(x = 1:5, y = letters[1:5])
tt
# Converting from a data.frame
<- data.frame(x = 1:5, y = letters[1:5])
df <- as_tidytable(df) tt
Core Features and Syntax
1. Selecting and Filtering Rows
Use filter()
for subsetting data:
<- as_tidytable(mtcars)
mtcars %>%
mtcars filter(cyl == 6)
# A tidytable: 7 × 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
4 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
5 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
6 17.8 6 168. 123 3.92 3.44 18.9 1 0 4 4
7 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
2. Adding or Modifying Columns
Add new variables with mutate()
:
%>%
mtcars mutate(mpg_double = mpg * 2)
# A tidytable: 32 × 12
mpg cyl disp hp drat wt qsec vs am gear carb mpg_double
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 42
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 42
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 45.6
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 42.8
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 37.4
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 36.2
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 28.6
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 48.8
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 45.6
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 38.4
# ℹ 22 more rows
Group operations can be directly specified with .by
:
%>%
mtcars mutate(avg_mpg = mean(mpg), .by = cyl)
# A tidytable: 32 × 12
mpg cyl disp hp drat wt qsec vs am gear carb avg_mpg
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 19.7
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 19.7
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 26.7
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 19.7
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 15.1
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 19.7
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 15.1
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 26.7
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 26.7
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 19.7
# ℹ 22 more rows
3. Summarizing Data
Generate grouped summaries:
%>%
mtcars summarize(avg_mpg = mean(mpg), max_hp = max(hp), .by = cyl)
# A tidytable: 3 × 3
cyl avg_mpg max_hp
<dbl> <dbl> <dbl>
1 4 26.7 113
2 6 19.7 175
3 8 15.1 335
4. Joining Tables
Combine data tables with joins:
<- mtcars %>%
table1 summarize(avg_mpg = mean(mpg), .by = cyl)
<- mtcars %>%
table2 summarize(median_hp = median(hp), .by = cyl)
left_join(table1, table2, by = "cyl")
# A tidytable: 3 × 3
cyl avg_mpg median_hp
<dbl> <dbl> <dbl>
1 4 26.7 91
2 6 19.7 110
3 8 15.1 192.
5. Reshaping Data
Tidytable makes pivoting intuitive:
<- as_tidytable(iris)
iris
%>%
iris pivot_longer(
cols = c(Sepal.Length, Sepal.Width),
names_to = "measurement",
values_to = "value"
)
# A tidytable: 300 × 5
Petal.Length Petal.Width Species measurement value
<dbl> <dbl> <fct> <chr> <dbl>
1 1.4 0.2 setosa Sepal.Length 5.1
2 1.4 0.2 setosa Sepal.Length 4.9
3 1.3 0.2 setosa Sepal.Length 4.7
4 1.5 0.2 setosa Sepal.Length 4.6
5 1.4 0.2 setosa Sepal.Length 5
6 1.7 0.4 setosa Sepal.Length 5.4
7 1.4 0.3 setosa Sepal.Length 4.6
8 1.5 0.2 setosa Sepal.Length 5
9 1.4 0.2 setosa Sepal.Length 4.4
10 1.5 0.1 setosa Sepal.Length 4.9
# ℹ 290 more rows
6. Row-wise Computations
Use c_across()
for row-wise operations:
%>%
iris rowwise() %>%
mutate(
Sepal.Sum = sum(c_across(Sepal.Length:Sepal.Width)),
Petal.Sum = sum(c_across(Petal.Length:Petal.Width))
)
# A rowwise tidytable: 150 × 7
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Sum Petal.Sum
<dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl>
1 5.1 3.5 1.4 0.2 setosa 8.6 1.6
2 4.9 3 1.4 0.2 setosa 7.9 1.6
3 4.7 3.2 1.3 0.2 setosa 7.9 1.5
4 4.6 3.1 1.5 0.2 setosa 7.7 1.7
5 5 3.6 1.4 0.2 setosa 8.6 1.6
6 5.4 3.9 1.7 0.4 setosa 9.3 2.1
7 4.6 3.4 1.4 0.3 setosa 8 1.7
8 5 3.4 1.5 0.2 setosa 8.4 1.7
9 4.4 2.9 1.4 0.2 setosa 7.3 1.6
10 4.9 3.1 1.5 0.1 setosa 8 1.6
# ℹ 140 more rows
Advanced Usage
Performance on Large Datasets
tiddytable
leverages data.table
for its backend, providing significant speed advantages over dplyr
:
- Faster grouped operations with
.by
. - Efficient memory usage for large datasets.
Integration with Pipes
Works seamlessly with %>%
or |>
:
%>%
mtcars filter(mpg > 20) %>%
summarize(avg_hp = mean(hp), .by = c(cyl))
# A tidytable: 2 × 2
cyl avg_hp
<dbl> <dbl>
1 4 82.6
2 6 110
Comparing tidytable to Alternatives
Feature | tidytable | tidyverse | data.table |
---|---|---|---|
Syntax Familiarity | Tidyverse-like | Intuitive and readable | Concise but unique |
Performance | Optimized (data.table ) |
Slower for large data | Highly optimized |
Pipe Compatibility | Fully supported | Fully supported | Requires %>% or native |
Practical Examples
Filtering and Transformation
%>%
mtcars filter(cyl == 6) %>%
mutate(mpg_ratio = mpg / wt)
# A tidytable: 7 × 12
mpg cyl disp hp drat wt qsec vs am gear carb mpg_ratio
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 8.02
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 7.30
3 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 6.66
4 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 5.23
5 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 5.58
6 17.8 6 168. 123 3.92 3.44 18.9 1 0 4 4 5.17
7 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6 7.11
Grouped Summaries
%>%
mtcars summarize(avg_mpg = mean(mpg), max_hp = max(hp), .by = cyl)
# A tidytable: 3 × 3
cyl avg_mpg max_hp
<dbl> <dbl> <dbl>
1 4 26.7 113
2 6 19.7 175
3 8 15.1 335
Pivoting Data
%>%
iris pivot_longer(cols = starts_with("Sepal"),
names_to = "measurement",
values_to = "value")
# A tidytable: 300 × 5
Petal.Length Petal.Width Species measurement value
<dbl> <dbl> <fct> <chr> <dbl>
1 1.4 0.2 setosa Sepal.Length 5.1
2 1.4 0.2 setosa Sepal.Length 4.9
3 1.3 0.2 setosa Sepal.Length 4.7
4 1.5 0.2 setosa Sepal.Length 4.6
5 1.4 0.2 setosa Sepal.Length 5
6 1.7 0.4 setosa Sepal.Length 5.4
7 1.4 0.3 setosa Sepal.Length 4.6
8 1.5 0.2 setosa Sepal.Length 5
9 1.4 0.2 setosa Sepal.Length 4.4
10 1.5 0.1 setosa Sepal.Length 4.9
# ℹ 290 more rows
Why Use tidytable?
- Ease of Use: Familiar tidyverse syntax reduces the learning curve.
- High Performance: Data processing is faster thanks to
data.table
. - Scalable: Handles both small and large datasets efficiently.
- Flexible Grouping: Inline grouping via
.by
simplifies operations.
Conclusion
tiddytable
blends the best of both worlds: intuitive syntax and high performance. It’s a perfect tool for R users seeking tidyverse simplicity with data.table efficiency. Whether you’re cleaning, transforming, or reshaping data, tidytable
has you covered.