Rectangling is the art and craft of taking a deeply nested list (often sourced from wild caught JSON or XML) and taming it into a tidy data set of rows and columns. There are three functions from tidyr that are particularly useful for rectangling:
unnest_longer() takes each element of a list-column and
makes a new row.unnest_wider() takes each element of a list-column and
makes a new column.unnest_auto() guesses whether you want
unnest_longer() or unnest_wider().hoist() is similar to unnest_wider() but
only plucks out selected components, and can reach down multiple
levels.A very large number of data rectangling problems can be solved by
combining these functions with a splash of dplyr (largely eliminating
prior approaches that combined mutate() with multiple
purrr::map()s).
To illustrate these techniques, we’ll use the repurrrsive package, which provides a number deeply nested lists originally mostly captured from web APIs.
library(tidyr)
library(dplyr)
library(repurrrsive)We’ll start with gh_users, a list which contains
information about six GitHub users. To begin, we put the
gh_users list into a data frame:
users <- tibble(user = gh_users)This seems a bit counter-intuitive: why is the first step in making a list simpler to make it more complicated? But a data frame has a big advantage: it bundles together multiple vectors so that everything is tracked together in a single object.
Each user is a named list, where each element represents
a column.
names(users$user[[1]])
#> [1] "login" "id" "avatar_url"
#> [4] "gravatar_id" "url" "html_url"
#> [7] "followers_url" "following_url" "gists_url"
#> [10] "starred_url" "subscriptions_url" "organizations_url"
#> [13] "repos_url" "events_url" "received_events_url"
#> [16] "type" "site_admin" "name"
#> [19] "company" "blog" "location"
#> [22] "email" "hireable" "bio"
#> [25] "public_repos" "public_gists" "followers"
#> [28] "following" "created_at" "updated_at"There are two ways to turn the list components into columns.
unnest_wider() takes every component and makes a new
column:
users %>% unnest_wider(user)
#> # A tibble: 6 × 30
#> login id avata…¹ grava…² url html_…³ follo…⁴ follo…⁵ gists…⁶ starr…⁷
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 gaborcsa… 6.60e5 https:… "" http… https:… https:… https:… https:… https:…
#> 2 jennybc 5.99e5 https:… "" http… https:… https:… https:… https:… https:…
#> 3 jtleek 1.57e6 https:… "" http… https:… https:… https:… https:… https:…
#> 4 juliasil… 1.25e7 https:… "" http… https:… https:… https:… https:… https:…
#> 5 leeper 3.51e6 https:… "" http… https:… https:… https:… https:… https:…
#> 6 masalmon 8.36e6 https:… "" http… https:… https:… https:… https:… https:…
#> # … with 20 more variables: subscriptions_url <chr>, organizations_url <chr>,
#> # repos_url <chr>, events_url <chr>, received_events_url <chr>, type <chr>,
#> # site_admin <lgl>, name <chr>, company <chr>, blog <chr>, location <chr>,
#> # email <chr>, hireable <lgl>, bio <chr>, public_repos <int>,
#> # public_gists <int>, followers <int>, following <int>, created_at <chr>,
#> # updated_at <chr>, and abbreviated variable names ¹avatar_url, ²gravatar_id,
#> # ³html_url, ⁴followers_url, ⁵following_url, ⁶gists_url, ⁷starred_urlBut in this case, there are many components and we don’t need most of
them so we can instead use hoist(). hoist()
allows us to pull out selected components using the same syntax as
purrr::pluck():
users %>% hoist(user,
followers = "followers",
login = "login",
url = "html_url"
)
#> # A tibble: 6 × 4
#> followers login url user
#> <int> <chr> <chr> <list>
#> 1 303 gaborcsardi https://github.com/gaborcsardi <named list [27]>
#> 2 780 jennybc https://github.com/jennybc <named list [27]>
#> 3 3958 jtleek https://github.com/jtleek <named list [27]>
#> 4 115 juliasilge https://github.com/juliasilge <named list [27]>
#> 5 213 leeper https://github.com/leeper <named list [27]>
#> 6 34 masalmon https://github.com/masalmon <named list [27]>hoist() removes the named components from the
user list-column, so you can think of it as moving
components out of the inner list into the top-level data frame.
We start off gh_repos similarly, by putting it in a
tibble:
repos <- tibble(repo = gh_repos)
repos
#> # A tibble: 6 × 1
#> repo
#> <list>
#> 1 <list [30]>
#> 2 <list [30]>
#> 3 <list [30]>
#> 4 <list [26]>
#> 5 <list [30]>
#> 6 <list [30]>This time the elements of user are a list of
repositories that belong to that user. These are observations, so should
become new rows, so we use unnest_longer() rather than
unnest_wider():
repos <- repos %>% unnest_longer(repo)
repos
#> # A tibble: 176 × 1
#> repo
#> <list>
#> 1 <named list [68]>
#> 2 <named list [68]>
#> 3 <named list [68]>
#> 4 <named list [68]>
#> 5 <named list [68]>
#> 6 <named list [68]>
#> # … with 170 more rowsThen we can use unnest_wider() or
hoist():
repos %>% hoist(repo,
login = c("owner", "login"),
name = "name",
homepage = "homepage",
watchers = "watchers_count"
)
#> # A tibble: 176 × 5
#> login name homepage watchers repo
#> <chr> <chr> <chr> <int> <list>
#> 1 gaborcsardi after <NA> 5 <named list [65]>
#> 2 gaborcsardi argufy <NA> 19 <named list [65]>
#> 3 gaborcsardi ask <NA> 5 <named list [65]>
#> 4 gaborcsardi baseimports <NA> 0 <named list [65]>
#> 5 gaborcsardi citest <NA> 0 <named list [65]>
#> 6 gaborcsardi clisymbols "" 18 <named list [65]>
#> # … with 170 more rowsNote the use of c("owner", "login"): this allows us to
reach two levels deep inside of a list. An alternative approach would be
to pull out just owner and then put each element of it in a
column:
repos %>%
hoist(repo, owner = "owner") %>%
unnest_wider(owner)
#> # A tibble: 176 × 18
#> login id avata…¹ grava…² url html_…³ follo…⁴ follo…⁵ gists…⁶ starr…⁷
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 gaborcsa… 660288 https:… "" http… https:… https:… https:… https:… https:…
#> 2 gaborcsa… 660288 https:… "" http… https:… https:… https:… https:… https:…
#> 3 gaborcsa… 660288 https:… "" http… https:… https:… https:… https:… https:…
#> 4 gaborcsa… 660288 https:… "" http… https:… https:… https:… https:… https:…
#> 5 gaborcsa… 660288 https:… "" http… https:… https:… https:… https:… https:…
#> 6 gaborcsa… 660288 https:… "" http… https:… https:… https:… https:… https:…
#> # … with 170 more rows, 8 more variables: subscriptions_url <chr>,
#> # organizations_url <chr>, repos_url <chr>, events_url <chr>,
#> # received_events_url <chr>, type <chr>, site_admin <lgl>, repo <list>, and
#> # abbreviated variable names ¹avatar_url, ²gravatar_id, ³html_url,
#> # ⁴followers_url, ⁵following_url, ⁶gists_url, ⁷starred_urlInstead of looking at the list and carefully thinking about whether
it needs to become rows or columns, you can use
unnest_auto(). It uses a handful of heuristics to figure
out whether unnest_longer() or unnest_wider()
is appropriate, and tells you about its reasoning.
tibble(repo = gh_repos) %>%
unnest_auto(repo) %>%
unnest_auto(repo)
#> Using `unnest_longer(repo, indices_include = FALSE)`; no element has names
#> Using `unnest_wider(repo)`; elements have 68 names in common
#> # A tibble: 176 × 68
#> id name full_…¹ owner private html_…² descr…³ fork url forks…⁴
#> <int> <chr> <chr> <list> <lgl> <chr> <chr> <lgl> <chr> <chr>
#> 1 6.12e7 after gaborc… <named list> FALSE https:… Run Co… FALSE http… https:…
#> 2 4.05e7 argu… gaborc… <named list> FALSE https:… Declar… FALSE http… https:…
#> 3 3.64e7 ask gaborc… <named list> FALSE https:… Friend… FALSE http… https:…
#> 4 3.49e7 base… gaborc… <named list> FALSE https:… Do we … FALSE http… https:…
#> 5 6.16e7 cite… gaborc… <named list> FALSE https:… Test R… TRUE http… https:…
#> 6 3.39e7 clis… gaborc… <named list> FALSE https:… Unicod… FALSE http… https:…
#> # … with 170 more rows, 58 more variables: keys_url <chr>,
#> # collaborators_url <chr>, teams_url <chr>, hooks_url <chr>,
#> # issue_events_url <chr>, events_url <chr>, assignees_url <chr>,
#> # branches_url <chr>, tags_url <chr>, blobs_url <chr>, git_tags_url <chr>,
#> # git_refs_url <chr>, trees_url <chr>, statuses_url <chr>,
#> # languages_url <chr>, stargazers_url <chr>, contributors_url <chr>,
#> # subscribers_url <chr>, subscription_url <chr>, commits_url <chr>, …got_chars has a similar structure to
gh_users: it’s a list of named lists, where each element of
the inner list describes some attribute of a GoT character. We start in
the same way, first by creating a data frame and then by unnesting each
component into a column:
chars <- tibble(char = got_chars)
chars
#> # A tibble: 30 × 1
#> char
#> <list>
#> 1 <named list [18]>
#> 2 <named list [18]>
#> 3 <named list [18]>
#> 4 <named list [18]>
#> 5 <named list [18]>
#> 6 <named list [18]>
#> # … with 24 more rows
chars2 <- chars %>% unnest_wider(char)
chars2
#> # A tibble: 30 × 18
#> url id name gender culture born died alive titles aliases father
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <lgl> <list> <list> <chr>
#> 1 https://ww… 1022 Theo… Male "Ironb… "In … "" TRUE <chr> <chr> ""
#> 2 https://ww… 1052 Tyri… Male "" "In … "" TRUE <chr> <chr> ""
#> 3 https://ww… 1074 Vict… Male "Ironb… "In … "" TRUE <chr> <chr> ""
#> 4 https://ww… 1109 Will Male "" "" "In … FALSE <chr> <chr> ""
#> 5 https://ww… 1166 Areo… Male "Norvo… "In … "" TRUE <chr> <chr> ""
#> 6 https://ww… 1267 Chett Male "" "At … "In … FALSE <chr> <chr> ""
#> # … with 24 more rows, and 7 more variables: mother <chr>, spouse <chr>,
#> # allegiances <list>, books <list>, povBooks <list>, tvSeries <list>,
#> # playedBy <list>This is more complex than gh_users because some
component of char are themselves a list, giving us a
collection of list-columns:
chars2 %>% select_if(is.list)
#> # A tibble: 30 × 7
#> titles aliases allegiances books povBooks tvSeries playedBy
#> <list> <list> <list> <list> <list> <list> <list>
#> 1 <chr [3]> <chr [4]> <chr [1]> <chr [3]> <chr [2]> <chr [6]> <chr [1]>
#> 2 <chr [2]> <chr [11]> <chr [1]> <chr [2]> <chr [4]> <chr [6]> <chr [1]>
#> 3 <chr [2]> <chr [1]> <chr [1]> <chr [3]> <chr [2]> <chr [1]> <chr [1]>
#> 4 <chr [1]> <chr [1]> <NULL> <chr [1]> <chr [1]> <chr [1]> <chr [1]>
#> 5 <chr [1]> <chr [1]> <chr [1]> <chr [3]> <chr [2]> <chr [2]> <chr [1]>
#> 6 <chr [1]> <chr [1]> <NULL> <chr [2]> <chr [1]> <chr [1]> <chr [1]>
#> # … with 24 more rowsWhat you do next will depend on the purposes of the analysis. Maybe you want a row for every book and TV series that the character appears in:
chars2 %>%
select(name, books, tvSeries) %>%
pivot_longer(c(books, tvSeries), names_to = "media", values_to = "value") %>%
unnest_longer(value)
#> # A tibble: 180 × 3
#> name media value
#> <chr> <chr> <chr>
#> 1 Theon Greyjoy books A Game of Thrones
#> 2 Theon Greyjoy books A Storm of Swords
#> 3 Theon Greyjoy books A Feast for Crows
#> 4 Theon Greyjoy tvSeries Season 1
#> 5 Theon Greyjoy tvSeries Season 2
#> 6 Theon Greyjoy tvSeries Season 3
#> # … with 174 more rowsOr maybe you want to build a table that lets you match title to name:
chars2 %>%
select(name, title = titles) %>%
unnest_longer(title)
#> # A tibble: 60 × 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy Prince of Winterfell
#> 2 Theon Greyjoy Captain of Sea Bitch
#> 3 Theon Greyjoy Lord of the Iron Islands (by law of the green lands)
#> 4 Tyrion Lannister Acting Hand of the King (former)
#> 5 Tyrion Lannister Master of Coin (former)
#> 6 Victarion Greyjoy Lord Captain of the Iron Fleet
#> # … with 54 more rows(Note that the empty titles ("") are due to an
infelicity in the input got_chars: ideally people without
titles would have a title vector of length 0, not a title vector of
length 1 containing an empty string.)
Again, we could rewrite using unnest_auto(). This is
convenient for exploration, but I wouldn’t rely on it in the long term -
unnest_auto() has the undesirable property that it will
always succeed. That means if your data structure changes,
unnest_auto() will continue to work, but might give very
different output that causes cryptic failures from downstream
functions.
tibble(char = got_chars) %>%
unnest_auto(char) %>%
select(name, title = titles) %>%
unnest_auto(title)
#> Using `unnest_wider(char)`; elements have 18 names in common
#> Using `unnest_longer(title, indices_include = FALSE)`; no element has names
#> # A tibble: 60 × 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy Prince of Winterfell
#> 2 Theon Greyjoy Captain of Sea Bitch
#> 3 Theon Greyjoy Lord of the Iron Islands (by law of the green lands)
#> 4 Tyrion Lannister Acting Hand of the King (former)
#> 5 Tyrion Lannister Master of Coin (former)
#> 6 Victarion Greyjoy Lord Captain of the Iron Fleet
#> # … with 54 more rowsNext we’ll tackle a more complex form of data that comes from Google’s geocoding service. It’s against the terms of service to cache this data, so I first write a very simple wrapper around the API. This relies on having an Google maps API key stored in an environment; if that’s not available these code chunks won’t be run.
has_key <- !identical(Sys.getenv("GOOGLE_MAPS_API_KEY"), "")
if (!has_key) {
message("No Google Maps API key found; code chunks will not be run")
}
# https://developers.google.com/maps/documentation/geocoding
geocode <- function(address, api_key = Sys.getenv("GOOGLE_MAPS_API_KEY")) {
url <- "https://maps.googleapis.com/maps/api/geocode/json"
url <- paste0(url, "?address=", URLencode(address), "&key=", api_key)
jsonlite::read_json(url)
}The list that this function returns is quite complex:
houston <- geocode("Houston TX")
str(houston)
#> List of 2
#> $ results:List of 1
#> ..$ :List of 5
#> .. ..$ address_components:List of 4
#> .. .. ..$ :List of 3
#> .. .. .. ..$ long_name : chr "Houston"
#> .. .. .. ..$ short_name: chr "Houston"
#> .. .. .. ..$ types :List of 2
#> .. .. .. .. ..$ : chr "locality"
#> .. .. .. .. ..$ : chr "political"
#> .. .. ..$ :List of 3
#> .. .. .. ..$ long_name : chr "Harris County"
#> .. .. .. ..$ short_name: chr "Harris County"
#> .. .. .. ..$ types :List of 2
#> .. .. .. .. ..$ : chr "administrative_area_level_2"
#> .. .. .. .. ..$ : chr "political"
#> .. .. ..$ :List of 3
#> .. .. .. ..$ long_name : chr "Texas"
#> .. .. .. ..$ short_name: chr "TX"
#> .. .. .. ..$ types :List of 2
#> .. .. .. .. ..$ : chr "administrative_area_level_1"
#> .. .. .. .. ..$ : chr "political"
#> .. .. ..$ :List of 3
#> .. .. .. ..$ long_name : chr "United States"
#> .. .. .. ..$ short_name: chr "US"
#> .. .. .. ..$ types :List of 2
#> .. .. .. .. ..$ : chr "country"
#> .. .. .. .. ..$ : chr "political"
#> .. ..$ formatted_address : chr "Houston, TX, USA"
#> .. ..$ geometry :List of 4
#> .. .. ..$ bounds :List of 2
#> .. .. .. ..$ northeast:List of 2
#> .. .. .. .. ..$ lat: num 30.1
#> .. .. .. .. ..$ lng: num -95
#> .. .. .. ..$ southwest:List of 2
#> .. .. .. .. ..$ lat: num 29.5
#> .. .. .. .. ..$ lng: num -95.8
#> .. .. ..$ location :List of 2
#> .. .. .. ..$ lat: num 29.8
#> .. .. .. ..$ lng: num -95.4
#> .. .. ..$ location_type: chr "APPROXIMATE"
#> .. .. ..$ viewport :List of 2
#> .. .. .. ..$ northeast:List of 2
#> .. .. .. .. ..$ lat: num 30.1
#> .. .. .. .. ..$ lng: num -95
#> .. .. .. ..$ southwest:List of 2
#> .. .. .. .. ..$ lat: num 29.5
#> .. .. .. .. ..$ lng: num -95.8
#> .. ..$ place_id : chr "ChIJAYWNSLS4QIYROwVl894CDco"
#> .. ..$ types :List of 2
#> .. .. ..$ : chr "locality"
#> .. .. ..$ : chr "political"
#> $ status : chr "OK"Fortunately, we can attack the problem step by step with tidyr functions. To make the problem a bit harder (!) and more realistic, I’ll start by geocoding a few cities:
city <- c("Houston", "LA", "New York", "Chicago", "Springfield")
city_geo <- purrr::map(city, geocode)I’ll put these results in a tibble, next to the original city name:
loc <- tibble(city = city, json = city_geo)
loc
#> # A tibble: 5 × 2
#> city json
#> <chr> <list>
#> 1 Houston <named list [2]>
#> 2 LA <named list [2]>
#> 3 New York <named list [2]>
#> 4 Chicago <named list [2]>
#> 5 Springfield <named list [2]>The first level contains components status and
result, which we can reveal with
unnest_wider():
loc %>%
unnest_wider(json)
#> # A tibble: 5 × 3
#> city results status
#> <chr> <list> <chr>
#> 1 Houston <list [1]> OK
#> 2 LA <list [1]> OK
#> 3 New York <list [1]> OK
#> 4 Chicago <list [1]> OK
#> 5 Springfield <list [1]> OKNotice that results is a list of lists. Most of the
cities have 1 element (representing a unique match from the geocoding
API), but Springfield has two. We can pull these out into separate rows
with unnest_longer():
loc %>%
unnest_wider(json) %>%
unnest_longer(results)
#> # A tibble: 5 × 3
#> city results status
#> <chr> <list> <chr>
#> 1 Houston <named list [5]> OK
#> 2 LA <named list [5]> OK
#> 3 New York <named list [5]> OK
#> 4 Chicago <named list [5]> OK
#> 5 Springfield <named list [7]> OKNow these all have the same components, as revealed by
unnest_wider():
loc %>%
unnest_wider(json) %>%
unnest_longer(results) %>%
unnest_wider(results)
#> # A tibble: 5 × 9
#> city addre…¹ forma…² geometry place…³ types parti…⁴ plus_code status
#> <chr> <list> <chr> <list> <chr> <list> <lgl> <list> <chr>
#> 1 Houst… <list> Housto… <named list> ChIJAY… <list> NA <NULL> OK
#> 2 LA <list> Los An… <named list> ChIJE9… <list> NA <NULL> OK
#> 3 New Y… <list> New Yo… <named list> ChIJOw… <list> NA <NULL> OK
#> 4 Chica… <list> Chicag… <named list> ChIJ7c… <list> NA <NULL> OK
#> 5 Sprin… <list> Spring… <named list> ChIJ-x… <list> TRUE <named list> OK
#> # … with abbreviated variable names ¹address_components, ²formatted_address,
#> # ³place_id, ⁴partial_matchWe can find the lat and lon coordinates by unnesting
geometry:
loc %>%
unnest_wider(json) %>%
unnest_longer(results) %>%
unnest_wider(results) %>%
unnest_wider(geometry)
#> # A tibble: 5 × 12
#> city addre…¹ forma…² bounds location locat…³ viewport place…⁴
#> <chr> <list> <chr> <list> <list> <chr> <list> <chr>
#> 1 Houston <list> Housto… <named list> <named list> APPROX… <named list> ChIJAY…
#> 2 LA <list> Los An… <named list> <named list> APPROX… <named list> ChIJE9…
#> 3 New Yo… <list> New Yo… <named list> <named list> APPROX… <named list> ChIJOw…
#> 4 Chicago <list> Chicag… <named list> <named list> APPROX… <named list> ChIJ7c…
#> 5 Spring… <list> Spring… <NULL> <named list> ROOFTOP <named list> ChIJ-x…
#> # … with 4 more variables: types <list>, partial_match <lgl>, plus_code <list>,
#> # status <chr>, and abbreviated variable names ¹address_components,
#> # ²formatted_address, ³location_type, ⁴place_idAnd then location:
loc %>%
unnest_wider(json) %>%
unnest_longer(results) %>%
unnest_wider(results) %>%
unnest_wider(geometry) %>%
unnest_wider(location)
#> # A tibble: 5 × 13
#> city addre…¹ forma…² bounds lat lng locat…³ viewport place…⁴
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr> <list> <chr>
#> 1 Houston <list> Housto… <named list> 29.8 -95.4 APPROX… <named list> ChIJAY…
#> 2 LA <list> Los An… <named list> 34.1 -118. APPROX… <named list> ChIJE9…
#> 3 New Yo… <list> New Yo… <named list> 40.7 -74.0 APPROX… <named list> ChIJOw…
#> 4 Chicago <list> Chicag… <named list> 41.9 -87.6 APPROX… <named list> ChIJ7c…
#> 5 Spring… <list> Spring… <NULL> 42.1 -72.6 ROOFTOP <named list> ChIJ-x…
#> # … with 4 more variables: types <list>, partial_match <lgl>, plus_code <list>,
#> # status <chr>, and abbreviated variable names ¹address_components,
#> # ²formatted_address, ³location_type, ⁴place_idAgain, unnest_auto() makes this simpler with the small
risk of failing in unexpected ways if the input structure changes:
loc %>%
unnest_auto(json) %>%
unnest_auto(results) %>%
unnest_auto(results) %>%
unnest_auto(geometry) %>%
unnest_auto(location)
#> Using `unnest_wider(json)`; elements have 2 names in common
#> Using `unnest_longer(results, indices_include = FALSE)`; no element has names
#> Using `unnest_wider(results)`; elements have 5 names in common
#> Using `unnest_wider(geometry)`; elements have 3 names in common
#> Using `unnest_wider(location)`; elements have 2 names in common
#> # A tibble: 5 × 13
#> city addre…¹ forma…² bounds lat lng locat…³ viewport place…⁴
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr> <list> <chr>
#> 1 Houston <list> Housto… <named list> 29.8 -95.4 APPROX… <named list> ChIJAY…
#> 2 LA <list> Los An… <named list> 34.1 -118. APPROX… <named list> ChIJE9…
#> 3 New Yo… <list> New Yo… <named list> 40.7 -74.0 APPROX… <named list> ChIJOw…
#> 4 Chicago <list> Chicag… <named list> 41.9 -87.6 APPROX… <named list> ChIJ7c…
#> 5 Spring… <list> Spring… <NULL> 42.1 -72.6 ROOFTOP <named list> ChIJ-x…
#> # … with 4 more variables: types <list>, partial_match <lgl>, plus_code <list>,
#> # status <chr>, and abbreviated variable names ¹address_components,
#> # ²formatted_address, ³location_type, ⁴place_idWe could also just look at the first address for each city:
loc %>%
unnest_wider(json) %>%
hoist(results, first_result = 1) %>%
unnest_wider(first_result) %>%
unnest_wider(geometry) %>%
unnest_wider(location)
#> # A tibble: 5 × 13
#> city addre…¹ forma…² bounds lat lng locat…³ viewport place…⁴
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr> <list> <chr>
#> 1 Houston <list> Housto… <named list> 29.8 -95.4 APPROX… <named list> ChIJAY…
#> 2 LA <list> Los An… <named list> 34.1 -118. APPROX… <named list> ChIJE9…
#> 3 New Yo… <list> New Yo… <named list> 40.7 -74.0 APPROX… <named list> ChIJOw…
#> 4 Chicago <list> Chicag… <named list> 41.9 -87.6 APPROX… <named list> ChIJ7c…
#> 5 Spring… <list> Spring… <NULL> 42.1 -72.6 ROOFTOP <named list> ChIJ-x…
#> # … with 4 more variables: types <list>, partial_match <lgl>, plus_code <list>,
#> # status <chr>, and abbreviated variable names ¹address_components,
#> # ²formatted_address, ³location_type, ⁴place_idOr use hoist() to dive deeply to get directly to
lat and lng:
loc %>%
hoist(json,
lat = list("results", 1, "geometry", "location", "lat"),
lng = list("results", 1, "geometry", "location", "lng")
)
#> # A tibble: 5 × 4
#> city lat lng json
#> <chr> <dbl> <dbl> <list>
#> 1 Houston 29.8 -95.4 <named list [2]>
#> 2 LA 34.1 -118. <named list [2]>
#> 3 New York 40.7 -74.0 <named list [2]>
#> 4 Chicago 41.9 -87.6 <named list [2]>
#> 5 Springfield 42.1 -72.6 <named list [2]>I’d normally use readr::parse_datetime() or
lubridate::ymd_hms(), but I can’t here because it’s a
vignette and I don’t want to add a dependency to tidyr just to simplify
one example.↩︎