tidyverse
grids
adventdrob
⭐⭐
Author

Ella Kaye

Published

December 3, 2023

Part 1

The input for this puzzle represents a grid. When I was writing the aochelpers package, I came across David Robinson’s adventdrob package, which contains functions that he finds useful for working with grids in Advent of Code challenges, in particular grid_tidy() and adjacent_join(). I have struggled with grid puzzles in the past, so when I saw today’s, I immediately thought of using these functions. I didn’t have much time to dedicate to Advent of Code today, so instead of attempting to solve the puzzle myself, I decided to spend that time unpicking David’s solution. He had already posted it and, unsurprisingly, used his package to solve it. I hope that this makes it easier for me to approach future Advent of Code grid puzzles.

Since I’m writing up this post as a record of understanding someone else’s code, I’m going to work through it with the example input:

467..114..
...*......
..35..633.
......#...
617*......
.....+.58.
..592.....
......755.
...$.*....
.664.598..
The crux of the puzzle

Find the sum of all the numbers that are adjacent (including diagonally) to a symbol.

First, I used the “paste as tribble” RStudio addin from the datapasta package to get the example input into R:

input <- tibble::tribble(
  ~ X1,
   "467..114..",
   "...*......",
   "..35..633.",
   "......#...",
   "617*......",
   ".....+.58.",
   "..592.....",
   "......755.",
   "...$.*....",
   ".664.598.."
  )
Warning

The datapasta addin turns the first row into a column header, so I needed to adjust that. I called it X1 as that’ll be the column name when reading in the full input with aoc_input_data_frame().

Also, and this really caught me out, it interprets # as signifying a comment, so it doesn’t copy that character or anything to its right. That meant row 4 was pasted as "......" and I had to write in the rest of the line.

I’ll break down David’s solution more-or-less line by line. grid_tidy() takes a data frame and, for a given column, produces a data frame with one row per character, with the value of that character, and a column for each of its row and column positions in the grid:

library(tidyverse)
library(adventdrob)
g <- input |> 
  adventdrob::grid_tidy(X1) 
g
# A tibble: 100 × 3
     row value   col
   <int> <chr> <int>
 1     1 4         1
 2     1 6         2
 3     1 7         3
 4     1 .         4
 5     1 .         5
 6     1 1         6
 7     1 1         7
 8     1 4         8
 9     1 .         9
10     1 .        10
# ℹ 90 more rows
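
To convince myself I understood the reshaping, here is a rough base R sketch of the same idea. The helper name grid_tidy_sketch() is made up for illustration and is not part of adventdrob; it takes a character vector of lines rather than a data frame column, and I have not checked it against how grid_tidy() is actually implemented.

```r
# Hypothetical helper, not part of adventdrob: a base R sketch of the
# reshaping that grid_tidy() performs, here on a character vector of lines.
grid_tidy_sketch <- function(lines) {
  # Split each line into its individual characters
  chars <- strsplit(lines, "")
  n_cols <- lengths(chars)
  data.frame(
    row   = rep(seq_along(lines), times = n_cols), # line number of each character
    value = unlist(chars),                         # the character itself
    col   = sequence(n_cols)                       # position within its line
  )
}

small <- grid_tidy_sketch(c("467.", "...*"))
small
```

Each character ends up on its own row, with its row and column position alongside it, which is the shape the rest of the solution relies on.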

We need to be able to identify the numbers. In what follows, that means both knowing whether an individual character is a digit and knowing which part number that digit belongs to. The last line of the following is a clever trick for identifying which digits group together:

g <- g |> 
  mutate(is_digit = str_detect(value, "\\d")) |>
  group_by(row) |> 
  mutate(number_id = paste0(row, ".", consecutive_id(is_digit))) 
g
# A tibble: 100 × 5
# Groups:   row [10]
     row value   col is_digit number_id
   <int> <chr> <int> <lgl>    <chr>    
 1     1 4         1 TRUE     1.1      
 2     1 6         2 TRUE     1.1      
 3     1 7         3 TRUE     1.1      
 4     1 .         4 FALSE    1.2      
 5     1 .         5 FALSE    1.2      
 6     1 1         6 TRUE     1.3      
 7     1 1         7 TRUE     1.3      
 8     1 4         8 TRUE     1.3      
 9     1 .         9 FALSE    1.4      
10     1 .        10 FALSE    1.4      
# ℹ 90 more rows

The new consecutive_id() function from dplyr (added in version 1.1.0) is doing a lot of heavy lifting here. It generates a unique identifier that increments every time a variable (or combination of variables) changes.[1]
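
The idea of an identifier that increments at every change can be sketched in base R. The helper consecutive_id_sketch() below is hypothetical, handles only a single non-empty vector with no missing values, and is meant to illustrate the behaviour rather than reproduce how dplyr does it.

```r
# Hypothetical helper: a base R sketch of the idea behind consecutive_id(),
# for a single non-empty vector without missing values.
consecutive_id_sketch <- function(x) {
  # TRUE wherever an element differs from the one before it
  # (the first element always starts a new run)
  changed <- c(TRUE, x[-1] != x[-length(x)])
  # Each change starts a new run, so the running total is the run id
  cumsum(changed)
}

# is_digit for the first row of the example grid, "467..114.."
is_digit <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)
ids <- consecutive_id_sketch(is_digit)
ids
```

The ids come out as 1 1 1 2 2 3 3 3 4 4, which matches the part after the dot in the number_id column above.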

Now that we know which digits belong together, we need to keep track of what the actual part number is:

g <- g |> 
  group_by(number_id) |> 
  mutate(part_number = as.numeric(paste(value, collapse = ""))) |> 
  ungroup() 
g
# A tibble: 100 × 6
     row value   col is_digit number_id part_number
   <int> <chr> <int> <lgl>    <chr>           <dbl>
 1     1 4         1 TRUE     1.1               467
 2     1 6         2 TRUE     1.1               467
 3     1 7         3 TRUE     1.1               467
 4     1 .         4 FALSE    1.2                NA
 5     1 .         5 FALSE    1.2                NA
 6     1 1         6 TRUE     1.3               114
 7     1 1         7 TRUE     1.3               114
 8     1 4         8 TRUE     1.3               114
 9     1 .         9 FALSE    1.4                NA
10     1 .        10 FALSE    1.4                NA
# ℹ 90 more rows

This is the first time I’d seen this trick for combining the values in several rows into one, and I feel like it’s a good one to know now!
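
Stripped of the data frame, the trick looks something like the sketch below. It uses base R’s ave() in place of the grouped mutate(), which is my substitution for illustration, and the two vectors are typed in by hand from the first eight rows of g.

```r
value     <- c("4", "6", "7", ".", ".", "1", "1", "4")
number_id <- c("1.1", "1.1", "1.1", "1.2", "1.2", "1.3", "1.3", "1.3")

# Paste the characters of each group into one string and repeat that string
# on every row of the group, as the grouped mutate() does
pasted <- ave(value, number_id, FUN = function(v) paste(v, collapse = ""))

# Runs of non-digits such as ".." cannot be parsed as numbers, so they
# become NA; suppressWarnings() hides the coercion warning from as.numeric()
part_number <- suppressWarnings(as.numeric(pasted))
part_number
```

Every digit of 467 now carries the full number, and the non-digit runs carry NA, which is what makes the later filter on !is.na(part_number) work.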

This data frame, g, is what we need for both parts, so let’s leave it as that, and use it to start a new data frame for Part 1.

We want to find the part numbers that are adjacent to symbols. So first, we narrow our attention just to the rows that represent part numbers, then look at what’s around them. adjacent_join() from adventdrob is designed exactly for this:

p1 <- g |> 
  filter(!is.na(part_number)) |> 
  adventdrob::adjacent_join(g, diagonal = TRUE) |> 
  arrange(row, col)
p1
# A tibble: 183 × 12
     row value   col is_digit number_id part_number  row2  col2 value2 is_digit2
   <int> <chr> <int> <lgl>    <chr>           <dbl> <dbl> <dbl> <chr>  <lgl>    
 1     1 4         1 TRUE     1.1               467     1     2 6      TRUE     
 2     1 4         1 TRUE     1.1               467     2     1 .      FALSE    
 3     1 4         1 TRUE     1.1               467     2     2 .      FALSE    
 4     1 6         2 TRUE     1.1               467     1     1 4      TRUE     
 5     1 6         2 TRUE     1.1               467     1     3 7      TRUE     
 6     1 6         2 TRUE     1.1               467     2     1 .      FALSE    
 7     1 6         2 TRUE     1.1               467     2     2 .      FALSE    
 8     1 6         2 TRUE     1.1               467     2     3 .      FALSE    
 9     1 7         3 TRUE     1.1               467     1     2 6      TRUE     
10     1 7         3 TRUE     1.1               467     1     4 .      FALSE    
# ℹ 173 more rows
# ℹ 2 more variables: number_id2 <chr>, part_number2 <dbl>

It’s a little tricky to see how this works from the first 10 rows, so let’s look at, for example, all eight characters around the “3” in the third column of the third row, and their row and column positions in the grid:

p1 |> 
  filter(row == 3 & col == 3) |> 
  select(value2, row2, col2)
# A tibble: 8 × 3
  value2  row2  col2
  <chr>  <dbl> <dbl>
1 .          2     2
2 .          2     3
3 *          2     4
4 .          3     2
5 5          3     4
6 .          4     2
7 .          4     3
8 .          4     4
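
To see what a join like this involves, here is a base R sketch that pairs a cell with its eight surrounding positions and looks each one up in the grid. The helper adjacent_sketch() is hypothetical and is not how adventdrob implements adjacent_join(); it only covers the diagonal = TRUE case and the three columns row, value and col.

```r
# Hypothetical helper, not adventdrob's implementation: a base R sketch of
# joining each cell to its neighbours by offsetting row and col.
adjacent_sketch <- function(cells, grid) {
  # The eight (row, col) offsets around a cell, excluding the cell itself
  offsets <- expand.grid(d_row = -1:1, d_col = -1:1)
  offsets <- offsets[!(offsets$d_row == 0 & offsets$d_col == 0), ]

  # Pair every cell with every offset, then compute the neighbour positions
  pairs <- merge(cells, offsets, by = NULL)
  pairs$row2 <- pairs$row + pairs$d_row
  pairs$col2 <- pairs$col + pairs$d_col

  # Look up the neighbour values; positions off the grid find no match
  # and so drop out of the result
  neighbours <- setNames(grid, paste0(names(grid), "2"))
  out <- merge(pairs, neighbours, by = c("row2", "col2"))
  out <- out[order(out$row, out$col, out$row2, out$col2), ]
  out[, c("row", "col", "value", "row2", "col2", "value2")]
}

# The first three lines of the example, trimmed to four columns
lines <- c("467.", "...*", "..35")
grid <- data.frame(
  row   = rep(seq_along(lines), each = 4),
  value = unlist(strsplit(lines, "")),
  col   = rep(1:4, times = length(lines))
)

# Neighbours of the "3" at row 3, column 3
around_3 <- adjacent_sketch(grid[grid$row == 3 & grid$col == 3, ], grid)
around_3[, c("value2", "row2", "col2")]
```

Because this cut-down grid has no fourth row, the “3” has five neighbours here rather than eight, which also shows how cells on the edge of the grid are handled in this sketch.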

We only need to know which values are adjacent to symbols, so we discard the adjacent elements (value2) that are either digits or “.”:

p1 <- p1 |> 
  filter(value2 != ".", !is_digit2)
p1
# A tibble: 12 × 12
     row value   col is_digit number_id part_number  row2  col2 value2 is_digit2
   <int> <chr> <int> <lgl>    <chr>           <dbl> <dbl> <dbl> <chr>  <lgl>    
 1     1 7         3 TRUE     1.1               467     2     4 *      FALSE    
 2     3 3         3 TRUE     3.2                35     2     4 *      FALSE    
 3     3 5         4 TRUE     3.2                35     2     4 *      FALSE    
 4     3 6         7 TRUE     3.4               633     4     7 #      FALSE    
 5     3 3         8 TRUE     3.4               633     4     7 #      FALSE    
 6     5 7         3 TRUE     5.1               617     5     4 *      FALSE    
 7     7 2         5 TRUE     7.2               592     6     6 +      FALSE    
 8     8 7         7 TRUE     8.2               755     9     6 *      FALSE    
 9    10 6         3 TRUE     10.2              664     9     4 $      FALSE    
10    10 4         4 TRUE     10.2              664     9     4 $      FALSE    
11    10 5         6 TRUE     10.4              598     9     6 *      FALSE    
12    10 9         7 TRUE     10.4              598     9     6 *      FALSE    
# ℹ 2 more variables: number_id2 <chr>, part_number2 <dbl>

Almost there. Some part numbers have more than one character adjacent to a symbol, e.g. both the 3 and 5 of the 35 are adjacent to the *, so we need to discard duplicate part numbers, then sum over them:

p1 |> 
  distinct(number_id, .keep_all = TRUE) |> 
  summarise(sum_part_numbers = sum(part_number)) |> 
  pull(sum_part_numbers)  
[1] 4361
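
As an independent check on that answer, here is a sketch of a quite different approach in base R: find each number with a regular expression, then look for a symbol anywhere in the box of cells surrounding it. This is not David’s method, and the variable names are my own.

```r
lines <- c(
  "467..114..", "...*......", "..35..633.", "......#...", "617*......",
  ".....+.58.", "..592.....", "......755.", "...$.*....", ".664.598.."
)

# Character matrix of the grid, and a logical matrix marking the symbols
# (anything that is neither a digit nor a ".")
grid <- do.call(rbind, strsplit(lines, ""))
is_symbol <- matrix(!grepl("[0-9.]", grid), nrow = nrow(grid))

total <- 0
for (r in seq_along(lines)) {
  m <- gregexpr("[0-9]+", lines[r])[[1]]
  if (m[1] == -1) next                 # no numbers on this line
  starts <- as.integer(m)
  lens <- attr(m, "match.length")
  for (k in seq_along(starts)) {
    # The box around the number, clipped to the edges of the grid
    rows <- max(1, r - 1):min(nrow(grid), r + 1)
    cols <- max(1, starts[k] - 1):min(ncol(grid), starts[k] + lens[k])
    if (any(is_symbol[rows, cols])) {
      number <- substr(lines[r], starts[k], starts[k] + lens[k] - 1)
      total <- total + as.numeric(number)
    }
  }
}
total
```

On the example input this also gives 4361, with 114 and 58 being the two numbers that have no symbol next to them.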

Done! I’ll come back to my input at the end of this post, putting everything together, once we’ve gone through both parts.

Part 2

The crux of the puzzle

A gear is any * symbol that is adjacent to exactly two part numbers. Its gear ratio is the result of multiplying those two numbers together. Find the sum of all our gear ratios.

Let’s go back to g and this time start by finding everything that’s next to a *, since only a * can be a gear. I’m also going to discard columns that we don’t need (so all columns get printed):

p2 <- g |> 
  filter(value == "*") |> 
  adventdrob::adjacent_join(g, diagonal = TRUE) |> 
  select(-value, -part_number, -is_digit, -is_digit2)
p2
# A tibble: 24 × 8
     row   col number_id  row2  col2 value2 number_id2 part_number2
   <int> <int> <chr>     <dbl> <dbl> <chr>  <chr>             <dbl>
 1     2     4 2.1           1     3 7      1.1                 467
 2     2     4 2.1           1     4 .      1.2                  NA
 3     2     4 2.1           1     5 .      1.2                  NA
 4     2     4 2.1           2     3 .      2.1                  NA
 5     2     4 2.1           2     5 .      2.1                  NA
 6     2     4 2.1           3     3 3      3.2                  35
 7     2     4 2.1           3     4 5      3.2                  35
 8     2     4 2.1           3     5 .      3.3                  NA
 9     5     4 5.2           4     3 .      4.1                  NA
10     5     4 5.2           4     4 .      4.1                  NA
# ℹ 14 more rows

We only need the rows where the * is next to a digit of a part number:

p2 <- p2 |> 
  arrange(row, col) |> 
  filter(!is.na(part_number2)) 
p2
# A tibble: 7 × 8
    row   col number_id  row2  col2 value2 number_id2 part_number2
  <int> <int> <chr>     <dbl> <dbl> <chr>  <chr>             <dbl>
1     2     4 2.1           1     3 7      1.1                 467
2     2     4 2.1           3     3 3      3.2                  35
3     2     4 2.1           3     4 5      3.2                  35
4     5     4 5.2           5     3 7      5.1                 617
5     9     6 9.1           8     7 7      8.2                 755
6     9     6 9.1          10     6 5      10.4                598
7     9     6 9.1          10     7 9      10.4                598

There’s some duplication here, for example the * in the 2nd row is next to both the “3” and the “5” of 35, so we keep one row per * and adjacent number, then summarise to create the information we’ll need to filter on and use in computation later:

p2 <- p2 |> 
  distinct(row, col, number_id2, .keep_all = TRUE) |> 
  group_by(row, col) |> 
  summarise(n_adjacent_numbers = n(),
            gear_ratio = prod(part_number2),
            .groups = "drop")
p2
# A tibble: 3 × 4
    row   col n_adjacent_numbers gear_ratio
  <int> <int>              <int>      <dbl>
1     2     4                  2      16345
2     5     4                  1        617
3     9     6                  2     451490
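
The full-input pipeline later in this post deduplicates on row, col and number_id2 together rather than on number_id2 alone. The sketch below, using base R’s duplicated() and a made-up layout in which one number sits between two * symbols, shows why that distinction can matter. The data frame is invented for illustration and does not come from the puzzle input.

```r
# Hypothetical layout:  *12*
#                       3..5
# The 12 touches both stars; 3 touches the left star, 5 the right one.
adj <- data.frame(
  row          = c(1, 1, 1, 1),
  col          = c(1, 1, 4, 4),          # position of the * on each row
  number_id2   = c("1.2", "2.1", "1.2", "2.3"),
  part_number2 = c(12, 3, 12, 5)
)

# Deduplicating on number_id2 alone drops the 12 for the second star
by_number <- adj[!duplicated(adj["number_id2"]), ]

# Deduplicating on star position and number_id2 keeps it for both stars
by_star_and_number <- adj[!duplicated(adj[c("row", "col", "number_id2")]), ]

# Number of adjacent part numbers per star under each approach
table(by_number$col)
table(by_star_and_number$col)
```

With the first approach the star in column 4 appears to have only one neighbouring number and would not be counted as a gear. The example input has no number shared between two stars, so both approaches give the same result there.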

.groups is an experimental feature in dplyr that controls the grouping structure of the output, and "drop" says all grouping levels are dropped. I wondered why David had used group_by() rather than .by within summarise(), but it turns out you can’t use both .by and .groups in the same call. (With .by the result is always ungrouped, so .groups wouldn’t be needed there anyway.)

We finish by finding the gears (i.e. where the * is adjacent to exactly two part numbers), then summing their ratios:

p2 |> 
  filter(n_adjacent_numbers == 2) |>
  summarise(sum_gear_ratios = sum(gear_ratio)) |> 
  pull(sum_gear_ratios)  
[1] 467835
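
The arithmetic is easy to check by hand from the table above: only the * symbols at (2, 4) and (9, 6) have exactly two adjacent part numbers.

```r
# Gear ratios for the two gears in the example:
# 467 and 35 around the * at row 2, 755 and 598 around the * at row 9
gear_ratios <- c(467 * 35, 755 * 598)
gear_ratios

# Their sum is the Part 2 answer for the example
sum(gear_ratios)
```

The * at (5, 4) touches only 617, so it is not a gear and contributes nothing.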

My input

Let’s put that all together, using my input.

library(aochelpers)
input <- aoc_input_data_frame(3, 2023)
g <- input |> 
  adventdrob::grid_tidy(X1) |> 
  mutate(is_digit = str_detect(value, "\\d")) |>
  group_by(row) |> 
  mutate(number_id = paste0(row, ".", consecutive_id(is_digit))) |>
  group_by(number_id) |> 
  mutate(part_number = as.numeric(paste(value, collapse = ""))) |> 
  ungroup() 

Part 1

g |> 
  filter(!is.na(part_number)) |> 
  adventdrob::adjacent_join(g, diagonal = TRUE) |> 
  arrange(row, col) |> 
  filter(value2 != ".", !is_digit2) |> 
  distinct(number_id, .keep_all = TRUE) |> 
  summarise(sum_part_numbers = sum(part_number)) |> 
  pull(sum_part_numbers)
[1] 507214

Part 2

g |> 
  filter(value == "*") |> 
  adventdrob::adjacent_join(g, diagonal = TRUE) |> 
  arrange(row, col) |> 
  filter(!is.na(part_number2)) |>
  distinct(row, col, number_id2, .keep_all = TRUE) |>
  group_by(row, col) |> 
  summarise(n_adjacent_numbers = n(),
            gear_ratio = prod(part_number2),
            .groups = "drop") |> 
  filter(n_adjacent_numbers == 2) |>
  summarise(sum_gear_ratios = sum(gear_ratio)) |> 
  pull(sum_gear_ratios)
[1] 72553319

Session info

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS Sonoma 14.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/London
 date     2023-12-06
 pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
 quarto   1.4.515 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version    date (UTC) lib source
 adventdrob  * 0.0.1      2023-11-19 [1] Github (dgrtwo/adventdrob@491050f)
 aochelpers  * 0.1.0.9000 2023-12-06 [1] local
 dplyr       * 1.1.4      2023-11-17 [1] CRAN (R 4.3.1)
 forcats     * 1.0.0      2023-01-29 [1] CRAN (R 4.3.0)
 ggplot2     * 3.4.4      2023-10-12 [1] CRAN (R 4.3.1)
 lubridate   * 1.9.3      2023-09-27 [1] CRAN (R 4.3.1)
 purrr       * 1.0.2      2023-08-10 [1] CRAN (R 4.3.0)
 readr       * 2.1.4      2023-02-10 [1] CRAN (R 4.3.0)
 sessioninfo * 1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
 stringr     * 1.5.1      2023-11-14 [1] CRAN (R 4.3.1)
 tibble      * 3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
 tidyr       * 1.3.0      2023-01-24 [1] CRAN (R 4.3.0)
 tidyverse   * 2.0.0      2023-02-22 [1] CRAN (R 4.3.0)

 [1] /Users/ellakaye/Library/R/arm64/4.3/library
 [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────

Footnotes

  1. David actually did something much fiddlier to get the same thing. It was Tan Ho, in the Advent of Code channel on the R4DS Slack, who later suggested this much neater option.