Power of purrr

Iterating Over Lists Like a Pro

Author

Raju Rimal

Published

December 10, 2024

Modified

March 19, 2025

In R, iterating over lists or complex data structures can become tedious, especially when you have to write loops or call apply functions repeatedly. This is where the purrr package shines. Part of the tidyverse, purrr provides a consistent set of functions that make iterating over lists and other data structures more concise and more readable. In this guide, we’ll explore how to use purrr’s map functions for list operations and how to use them, together with helpers such as pluck(), to work with parsed JSON and other nested structures.


1. Using map Functions for Efficient List Operations

The map functions from purrr allow you to apply a function to each element of a list (or vector), similar to how lapply() works, but with added advantages like better handling of different types of outputs and more readable syntax.

The map() Family of Functions

The basic map() function applies a function to each element of a list. It returns a list of the same length as the input.

  • Example:
library(purrr)

# Create a list of numbers
num_list <- list(1, 2, 3, 4, 5)

# Use map() to square each number
squared_numbers <- map(num_list, function(x) x^2)
print(squared_numbers)
[[1]]
[1] 1

[[2]]
[1] 4

[[3]]
[1] 9

[[4]]
[1] 16

[[5]]
[1] 25

This will output a list of squared numbers.

Variations of map()

  • map_lgl(): Returns a logical vector.
  • map_int(): Returns an integer vector.
  • map_dbl(): Returns a double (numeric) vector.
  • map_chr(): Returns a character vector.

These functions are useful when you know the type of output you expect. Each call of the function must return a single value of that type; otherwise purrr raises an error rather than silently returning something else.
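As a minimal sketch of how the typed variants behave (the inputs list is made up for illustration), the calls below each return a plain vector of a known type. The last two lines contrast this with sapply(), whose return type depends on the input.

```r
library(purrr)

# Hypothetical example data: three vectors of different types
inputs <- list(a = 1:3, b = c("x", "y"), c = c(2.5, 3.5))

lengths_int <- map_int(inputs, length)      # integer vector
is_num      <- map_lgl(inputs, is.numeric)  # logical vector
classes     <- map_chr(inputs, class)       # character vector

# With an empty list, sapply() gives back a list,
# while map_int() still gives back an integer vector
empty_sapply <- sapply(list(), length)
empty_map    <- map_int(list(), length)
```

The names of the input list carry over to each result, just as they do with map().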

  • Example with map_dbl():
# Create a list of numbers
num_list <- list(1.1, 2.2, 3.3, 4.4)

# Use map_dbl() to round each number
rounded_numbers <- map_dbl(num_list, round)
print(rounded_numbers)
[1] 1 2 3 4

This will output a numeric vector of rounded values.
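The flip side of a typed variant is that it fails loudly when the function returns the wrong type. A small sketch, using a made-up describe() helper that returns text where a number is expected:

```r
library(purrr)

num_list <- list(1.1, 2.2, 3.3, 4.4)

# Hypothetical helper that returns a character string, not a number
describe <- function(x) paste("about", round(x))

# map_dbl() will not coerce a string to a double, so this raises an error;
# tryCatch() captures the error object instead of stopping the script
result <- tryCatch(
  map_dbl(num_list, describe),
  error = function(e) e
)

inherits(result, "error")
```

Here the right tool would be map_chr(), since describe() returns one string per element.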

Using map() with Named Lists

When working with named lists (e.g., lists containing data frames or more complex elements), map() retains the names of the list elements.

  • Example:
# Create a named list
named_list <- list(Alice = 25, Bob = 30, Charlie = 35)

# Use map() to add 5 to each person's age
updated_ages <- map(named_list, ~ .x + 5)
print(updated_ages)
$Alice
[1] 30

$Bob
[1] 35

$Charlie
[1] 40

The output will retain the names of the list elements, making the results more intuitive and easier to work with.
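The examples so far write the function in two ways: a full function(x) and the formula shorthand ~ .x. As a sketch, the three calls below are equivalent; the \(x) form assumes R 4.1 or later. When the names themselves are needed inside the function, imap_chr() passes the element name as a second argument.

```r
library(purrr)

named_list <- list(Alice = 25, Bob = 30, Charlie = 35)

# Three equivalent ways to add 5 to each age
ages_function <- map(named_list, function(x) x + 5)
ages_formula  <- map(named_list, ~ .x + 5)
ages_lambda   <- map(named_list, \(x) x + 5)  # requires R >= 4.1

# imap_chr() supplies the value (.x) and the element name (.y)
labels <- imap_chr(named_list, ~ paste0(.y, " is ", .x))
print(labels)
```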


2. Parsing JSON Files with Nested Structures

JSON (JavaScript Object Notation) is a common format for representing nested data, often used in web APIs or data exchange between systems. Parsing and working with JSON data in R can be tricky due to the nested nature of the data. However, purrr makes this much easier by allowing you to efficiently iterate through nested structures and extract relevant information.

Using purrr to Parse and Process JSON

Let’s walk through an example where we parse a JSON string containing user information and then extract specific details from the nested result.

  • Example:
library(purrr)
library(jsonlite)

Attaching package: 'jsonlite'
The following object is masked from 'package:purrr':

    flatten
# Example JSON string (mimicking data from an API)
json_data <- '{"user": {"name": "Alice", "age": 30, "location": {"city": "New York", "state": "NY"}}, "orders": [{"item": "Laptop", "price": 1200}, {"item": "Phone", "price": 800}]}'

# Parse the JSON string into a list
parsed_data <- fromJSON(json_data)

# Extract the user's location with pluck()
user_location <- pluck(parsed_data, "user", "location")
print(user_location)
$city
[1] "New York"

$state
[1] "NY"
# Extract the order prices (fromJSON() simplifies orders to a data frame)
# and format each price in thousands
price_labels <- map_chr(pluck(parsed_data, "orders", "price"), ~paste(.x/1000, "K"))
print(price_labels) # One label per order
[1] "1.2 K" "0.8 K"

In this example:

  • We parsed a JSON string using the fromJSON() function from the jsonlite package.
  • We used pluck() to reach into the nested list and extract the user’s location (city and state).
  • We used pluck() again to pull out the order prices and applied map_chr() to format each price in thousands.
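To compute an actual order total, one option is to keep the JSON as plain nested lists and pull the prices out with map_dbl(). This is a sketch that reuses the JSON string from the example above and assumes simplifyVector = FALSE is passed to fromJSON(), so that orders stays a list of lists rather than being simplified to a data frame.

```r
library(purrr)
library(jsonlite)

json_data <- '{"user": {"name": "Alice", "age": 30, "location": {"city": "New York", "state": "NY"}}, "orders": [{"item": "Laptop", "price": 1200}, {"item": "Phone", "price": 800}]}'

# Keep the nested list structure instead of simplifying to data frames
parsed_list <- fromJSON(json_data, simplifyVector = FALSE)

orders <- pluck(parsed_list, "orders")

# A string passed as the function extracts the element with that name
items  <- map_chr(orders, "item")
prices <- map_dbl(orders, "price")

total_price <- sum(prices)
print(total_price)
```

With the default simplification, orders is already a data frame, so summing its price column directly gives the same total.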

Working with More Complex Nested Structures

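In more complex JSON data, with deeper levels of nesting, you can combine pluck() to reach the nested element with purrr’s map functions to process it. In the example below, fromJSON() simplifies the employees array into a data frame, so pmap_chr() is used to iterate over its rows.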

  • Example:
# JSON with more nesting
json_data <- '{"company": {"name": "TechCorp", "employees": [{"name": "Alice", "role": "Developer"}, {"name": "Bob", "role": "Manager"}]}}'

# Parse the JSON data
parsed_data <- fromJSON(json_data)

# Combine each employee's name and role into one string with pmap_chr()
employee_info <- pmap_chr(
  pluck(parsed_data, "company", "employees"),
  paste, sep = ", "
)

print(employee_info)
[1] "Alice, Developer" "Bob, Manager"    

Here, the employees data, which sits two levels deep, is plucked out and each employee’s name and role are pasted into a single string.
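When fromJSON() is called with simplifyVector = FALSE, the employees stay a list of lists and the regular map functions apply directly. The sketch below assumes that setting; the email field is hypothetical and is included only to show how pluck() can supply a default for a missing element.

```r
library(purrr)
library(jsonlite)

json_data <- '{"company": {"name": "TechCorp", "employees": [{"name": "Alice", "role": "Developer"}, {"name": "Bob", "role": "Manager"}]}}'

# Keep employees as a list of lists
company   <- fromJSON(json_data, simplifyVector = FALSE)
employees <- pluck(company, "company", "employees")

# Extract one field from every employee
employee_names <- map_chr(employees, "name")

# Build one string per employee from several fields
employee_info <- map_chr(employees, function(e) paste(e$name, e$role, sep = ", "))

# "email" does not exist in this data; .default avoids returning NULL
employee_emails <- map_chr(
  employees,
  function(e) pluck(e, "email", .default = NA_character_)
)
```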


3. Why Use purrr for Iterating?

The purrr package simplifies the process of iterating over lists and other data structures in R by providing:

  • Cleaner syntax: purrr allows for more concise and readable code.
  • Consistency: map() always returns a list, and the typed variants (e.g., map_dbl() for numeric vectors) let you state the expected return type up front, so you don’t have to guess what shape the result will take.
  • Efficient handling of nested data: Working with complex nested structures is much easier with purrr than with base R loops or apply() functions.

Advantages of purrr over Traditional Loops

  • Readable and concise code: Rather than writing long for-loops, purrr functions make your code more declarative.
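  • Comparable performance: purrr’s map functions are not vectorized in the way arithmetic on whole vectors is; like lapply(), they call your function once per element. They usually run about as fast as a well-written loop, so the main gain is clarity rather than speed.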
  • Easier to debug: Using functional programming constructs in purrr often leads to fewer side effects, making it easier to debug.
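To make the comparison concrete, here is the rounding example from earlier written as a for loop and as a single map_dbl() call. It is only a sketch; both versions produce the same result.

```r
library(purrr)

num_list <- list(1.1, 2.2, 3.3, 4.4)

# Loop version: pre-allocate the output, then fill it element by element
rounded_loop <- numeric(length(num_list))
for (i in seq_along(num_list)) {
  rounded_loop[i] <- round(num_list[[i]])
}

# purrr version: the output type and the operation are stated in one line
rounded_map <- map_dbl(num_list, round)

identical(rounded_loop, rounded_map)
```

The loop needs an index variable and a pre-allocated result, while the map_dbl() call names only the input and the function.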

Summary

In this guide, we’ve covered how to use the purrr package to iterate over lists and other complex data structures efficiently:

  • Map functions: We used map() and its variations to apply functions to each element of a list or vector, with outputs tailored to specific data types.
  • Parsing JSON: We’ve demonstrated how purrr can be used to work with nested data structures, such as parsed JSON, making it easy to extract and manipulate data.
  • Advantages of purrr: We highlighted the benefits of using purrr over traditional loops, including cleaner syntax, predictable output types, and ease of working with nested data.