Tidytable in R

Tidyverse Simplicity with Data.Table Performance

Author

Raju Rimal

Published

November 30, 2024

Modified

March 19, 2025

Introduction

Data manipulation in R often requires balancing user-friendly syntax with computational efficiency. The tidytable package bridges this gap, combining the clear, pipe-friendly syntax of the tidyverse with the speed of data.table. If you’ve worked with dplyr or data.table, tidytable offers a familiar yet high-performance alternative.

In this article, we’ll explore the core features of tidytable, dive into its syntax, and demonstrate its utility through practical examples.


What is tidytable?

tidytable integrates:

  1. Tidyverse Syntax: Use functions like filter(), mutate(), and summarize() seamlessly.
  2. data.table Backend: Achieve faster processing, especially on large datasets.
  3. Pipe Compatibility: Works perfectly with %>% or the native |> pipe.

Getting Started with tidytable

Installation

Install tidytable from CRAN:

install.packages("tidytable")

Creating a Tidytable

You can create a tidytable directly or convert existing data frames:

library(tidytable)

# From vectors
tt <- tidytable(x = 1:5, y = letters[1:5])

# Converting from a data.frame
df <- data.frame(x = 1:5, y = letters[1:5])
tt <- as_tidytable(df)

Core Features and Syntax

1. Selecting and Filtering Rows

Use filter() for subsetting data:

mtcars <- as_tidytable(mtcars)
mtcars %>%
  filter(cyl == 6)
# A tidytable: 7 × 11
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
3  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
4  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
5  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
6  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
7  19.7     6  145    175  3.62  2.77  15.5     0     1     5     6

2. Adding or Modifying Columns

Add new variables with mutate():

mtcars %>%
  mutate(mpg_double = mpg * 2)
# A tidytable: 32 × 12
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb mpg_double
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>      <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4       42  
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4       42  
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1       45.6
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1       42.8
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2       37.4
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1       36.2
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4       28.6
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2       48.8
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2       45.6
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4       38.4
# ℹ 22 more rows

Group operations can be directly specified with .by:

mtcars %>%
  mutate(avg_mpg = mean(mpg), .by = cyl)
# A tidytable: 32 × 12
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb avg_mpg
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4    19.7
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4    19.7
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1    26.7
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1    19.7
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2    15.1
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1    19.7
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4    15.1
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2    26.7
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2    26.7
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4    19.7
# ℹ 22 more rows

3. Summarizing Data

Generate grouped summaries:

mtcars %>%
  summarize(avg_mpg = mean(mpg), max_hp = max(hp), .by = cyl)
# A tidytable: 3 × 3
    cyl avg_mpg max_hp
  <dbl>   <dbl>  <dbl>
1     4    26.7    113
2     6    19.7    175
3     8    15.1    335

4. Joining Tables

Combine data tables with joins:

table1 <- mtcars %>% 
  summarize(avg_mpg = mean(mpg), .by = cyl)
table2 <- mtcars %>% 
  summarize(median_hp = median(hp), .by = cyl)

left_join(table1, table2, by = "cyl")
# A tidytable: 3 × 3
    cyl avg_mpg median_hp
  <dbl>   <dbl>     <dbl>
1     4    26.7       91 
2     6    19.7      110 
3     8    15.1      192.

5. Reshaping Data

Tidytable makes pivoting intuitive:

iris <- as_tidytable(iris)

iris %>%
  pivot_longer(
    cols = c(Sepal.Length, Sepal.Width),
    names_to = "measurement",
    values_to = "value"
  )
# A tidytable: 300 × 5
   Petal.Length Petal.Width Species measurement  value
          <dbl>       <dbl> <fct>   <chr>        <dbl>
 1          1.4         0.2 setosa  Sepal.Length   5.1
 2          1.4         0.2 setosa  Sepal.Length   4.9
 3          1.3         0.2 setosa  Sepal.Length   4.7
 4          1.5         0.2 setosa  Sepal.Length   4.6
 5          1.4         0.2 setosa  Sepal.Length   5  
 6          1.7         0.4 setosa  Sepal.Length   5.4
 7          1.4         0.3 setosa  Sepal.Length   4.6
 8          1.5         0.2 setosa  Sepal.Length   5  
 9          1.4         0.2 setosa  Sepal.Length   4.4
10          1.5         0.1 setosa  Sepal.Length   4.9
# ℹ 290 more rows

6. Row-wise Computations

Use c_across() for row-wise operations:

iris %>%
  rowwise() %>% 
  mutate(
    Sepal.Sum = sum(c_across(Sepal.Length:Sepal.Width)),
    Petal.Sum = sum(c_across(Petal.Length:Petal.Width))
  )
# A rowwise tidytable: 150 × 7
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Sum Petal.Sum
          <dbl>       <dbl>        <dbl>       <dbl> <fct>       <dbl>     <dbl>
 1          5.1         3.5          1.4         0.2 setosa        8.6       1.6
 2          4.9         3            1.4         0.2 setosa        7.9       1.6
 3          4.7         3.2          1.3         0.2 setosa        7.9       1.5
 4          4.6         3.1          1.5         0.2 setosa        7.7       1.7
 5          5           3.6          1.4         0.2 setosa        8.6       1.6
 6          5.4         3.9          1.7         0.4 setosa        9.3       2.1
 7          4.6         3.4          1.4         0.3 setosa        8         1.7
 8          5           3.4          1.5         0.2 setosa        8.4       1.7
 9          4.4         2.9          1.4         0.2 setosa        7.3       1.6
10          4.9         3.1          1.5         0.1 setosa        8         1.6
# ℹ 140 more rows

Advanced Usage

Performance on Large Datasets

tiddytable leverages data.table for its backend, providing significant speed advantages over dplyr:

  • Faster grouped operations with .by.
  • Efficient memory usage for large datasets.

Integration with Pipes

Works seamlessly with %>% or |>:

mtcars %>%
  filter(mpg > 20) %>%
  summarize(avg_hp = mean(hp), .by = c(cyl))
# A tidytable: 2 × 2
    cyl avg_hp
  <dbl>  <dbl>
1     4   82.6
2     6  110  

Comparing tidytable to Alternatives

Feature tidytable tidyverse data.table
Syntax Familiarity Tidyverse-like Intuitive and readable Concise but unique
Performance Optimized (data.table) Slower for large data Highly optimized
Pipe Compatibility Fully supported Fully supported Requires %>% or native

Practical Examples

Filtering and Transformation

mtcars %>%
  filter(cyl == 6) %>%
  mutate(mpg_ratio = mpg / wt)
# A tidytable: 7 × 12
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb mpg_ratio
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
1  21       6  160    110  3.9   2.62  16.5     0     1     4     4      8.02
2  21       6  160    110  3.9   2.88  17.0     0     1     4     4      7.30
3  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1      6.66
4  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1      5.23
5  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4      5.58
6  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4      5.17
7  19.7     6  145    175  3.62  2.77  15.5     0     1     5     6      7.11

Grouped Summaries

mtcars %>%
  summarize(avg_mpg = mean(mpg), max_hp = max(hp), .by = cyl)
# A tidytable: 3 × 3
    cyl avg_mpg max_hp
  <dbl>   <dbl>  <dbl>
1     4    26.7    113
2     6    19.7    175
3     8    15.1    335

Pivoting Data

iris %>%
  pivot_longer(cols = starts_with("Sepal"),
               names_to = "measurement",
               values_to = "value")
# A tidytable: 300 × 5
   Petal.Length Petal.Width Species measurement  value
          <dbl>       <dbl> <fct>   <chr>        <dbl>
 1          1.4         0.2 setosa  Sepal.Length   5.1
 2          1.4         0.2 setosa  Sepal.Length   4.9
 3          1.3         0.2 setosa  Sepal.Length   4.7
 4          1.5         0.2 setosa  Sepal.Length   4.6
 5          1.4         0.2 setosa  Sepal.Length   5  
 6          1.7         0.4 setosa  Sepal.Length   5.4
 7          1.4         0.3 setosa  Sepal.Length   4.6
 8          1.5         0.2 setosa  Sepal.Length   5  
 9          1.4         0.2 setosa  Sepal.Length   4.4
10          1.5         0.1 setosa  Sepal.Length   4.9
# ℹ 290 more rows

Why Use tidytable?

  1. Ease of Use: Familiar tidyverse syntax reduces the learning curve.
  2. High Performance: Data processing is faster thanks to data.table.
  3. Scalable: Handles both small and large datasets efficiently.
  4. Flexible Grouping: Inline grouping via .by simplifies operations.

Conclusion

tiddytable blends the best of both worlds: intuitive syntax and high performance. It’s a perfect tool for R users seeking tidyverse simplicity with data.table efficiency. Whether you’re cleaning, transforming, or reshaping data, tidytable has you covered.