input <- tibble::tribble(
  ~X1,
"467..114..",
"...*......",
"..35..633.",
"......#...",
"617*......",
".....+.58.",
"..592.....",
"......755.",
"...$.*....",
".664.598.."
)
Ella Kaye
December 3, 2023
The input for this puzzle represents a grid. When I was writing the aochelpers package, I came across David Robinson’s adventdrob package, which contains functions that he finds useful for working with grids in Advent of Code challenges, in particular `grid_tidy()` and `adjacent_join()`. I have struggled with grid puzzles in the past, so when I saw today’s, I immediately thought of using these functions. I didn’t have much time today to dedicate to Advent of Code, so instead of attempting to solve the puzzle myself, I decided to spend that time unpicking David’s solution. He had already posted it, and, unsurprisingly, used his package to solve it. I hope that this makes it easier for me to approach future Advent of Code grid puzzles.
Since I’m writing up this post as a record of understanding someone else’s code, I’m going to work through it with the example input:
467..114..
...*......
..35..633.
......#...
617*......
.....+.58.
..592.....
......755.
...$.*....
.664.598..
Find the sum of all the numbers that are adjacent (including diagonally) to a symbol.
First, I used the “paste as tribble” RStudio addin from the datapasta package to get the example input into R:
The datapasta addin turns the first row into a column header, so I needed to adjust that. I called it `X1` as that’ll be the colname when reading in the full input with `aoc_input_data_frame()`. Also, and this really caught me out, it interprets `#` as signifying a comment, so doesn’t copy that, or anything to its right. That meant row 4 was pasted as `"......"` and I had to write in the rest of the line.
I’ll break down David’s solution more-or-less line by line. `grid_tidy()` takes a data frame and, for a given column, produces a data frame with one row per character, with the value of that character, and a column for each of its row and column positions in the grid:
# A tibble: 100 × 3
row value col
<int> <chr> <int>
1 1 4 1
2 1 6 2
3 1 7 3
4 1 . 4
5 1 . 5
6 1 1 6
7 1 1 7
8 1 4 8
9 1 . 9
10 1 . 10
# ℹ 90 more rows
We need to be able to identify the numbers. At various points in what follows, that means identifying individual digit characters, but also knowing which part number each one belongs to. The last line of the following is a clever trick for identifying which digits group together:
# A tibble: 100 × 5
# Groups: row [10]
row value col is_digit number_id
<int> <chr> <int> <lgl> <chr>
1 1 4 1 TRUE 1.1
2 1 6 2 TRUE 1.1
3 1 7 3 TRUE 1.1
4 1 . 4 FALSE 1.2
5 1 . 5 FALSE 1.2
6 1 1 6 TRUE 1.3
7 1 1 7 TRUE 1.3
8 1 4 8 TRUE 1.3
9 1 . 9 FALSE 1.4
10 1 . 10 FALSE 1.4
# ℹ 90 more rows
The new `consecutive_id()` function from dplyr is doing a lot of heavy lifting here. It generates a unique identifier that increments every time a variable (or combination of variables) changes.¹
Now that we know which digits belong together, we need to keep track of what the actual part number is:
# A tibble: 100 × 6
row value col is_digit number_id part_number
<int> <chr> <int> <lgl> <chr> <dbl>
1 1 4 1 TRUE 1.1 467
2 1 6 2 TRUE 1.1 467
3 1 7 3 TRUE 1.1 467
4 1 . 4 FALSE 1.2 NA
5 1 . 5 FALSE 1.2 NA
6 1 1 6 TRUE 1.3 114
7 1 1 7 TRUE 1.3 114
8 1 4 8 TRUE 1.3 114
9 1 . 9 FALSE 1.4 NA
10 1 . 10 FALSE 1.4 NA
# ℹ 90 more rows
This is the first time I’d seen this trick to combine the values in several rows into one, and I feel like it’s a good one to know now!
This data frame, `g`, is what we need for both parts, so let’s leave it as that, and use it to start a new data frame for Part 1.
We want to find the part numbers that are adjacent to symbols. So first, we narrow our attention just to the rows that represent part numbers, then look at what’s around them. `adjacent_join()` from adventdrob is designed exactly for this:
# A tibble: 183 × 12
row value col is_digit number_id part_number row2 col2 value2 is_digit2
<int> <chr> <int> <lgl> <chr> <dbl> <dbl> <dbl> <chr> <lgl>
1 1 4 1 TRUE 1.1 467 1 2 6 TRUE
2 1 4 1 TRUE 1.1 467 2 1 . FALSE
3 1 4 1 TRUE 1.1 467 2 2 . FALSE
4 1 6 2 TRUE 1.1 467 1 1 4 TRUE
5 1 6 2 TRUE 1.1 467 1 3 7 TRUE
6 1 6 2 TRUE 1.1 467 2 1 . FALSE
7 1 6 2 TRUE 1.1 467 2 2 . FALSE
8 1 6 2 TRUE 1.1 467 2 3 . FALSE
9 1 7 3 TRUE 1.1 467 1 2 6 TRUE
10 1 7 3 TRUE 1.1 467 1 4 . FALSE
# ℹ 173 more rows
# ℹ 2 more variables: number_id2 <chr>, part_number2 <dbl>
It’s a little tricky to see how this works from the first 10 rows, so let’s look at, for example, all eight characters around the “3” in the third column of the third row, and their row and column positions in the grid:
# A tibble: 8 × 3
value2 row2 col2
<chr> <dbl> <dbl>
1 . 2 2
2 . 2 3
3 * 2 4
4 . 3 2
5 5 3 4
6 . 4 2
7 . 4 3
8 . 4 4
We only need to know which values are adjacent to symbols, so we discard the adjacent elements (`value2`) that are either digits or “.”:
# A tibble: 12 × 12
row value col is_digit number_id part_number row2 col2 value2 is_digit2
<int> <chr> <int> <lgl> <chr> <dbl> <dbl> <dbl> <chr> <lgl>
1 1 7 3 TRUE 1.1 467 2 4 * FALSE
2 3 3 3 TRUE 3.2 35 2 4 * FALSE
3 3 5 4 TRUE 3.2 35 2 4 * FALSE
4 3 6 7 TRUE 3.4 633 4 7 # FALSE
5 3 3 8 TRUE 3.4 633 4 7 # FALSE
6 5 7 3 TRUE 5.1 617 5 4 * FALSE
7 7 2 5 TRUE 7.2 592 6 6 + FALSE
8 8 7 7 TRUE 8.2 755 9 6 * FALSE
9 10 6 3 TRUE 10.2 664 9 4 $ FALSE
10 10 4 4 TRUE 10.2 664 9 4 $ FALSE
11 10 5 6 TRUE 10.4 598 9 6 * FALSE
12 10 9 7 TRUE 10.4 598 9 6 * FALSE
# ℹ 2 more variables: number_id2 <chr>, part_number2 <dbl>
Almost there. Some part numbers have more than one character adjacent to a symbol, e.g. both the 3 and 5 of the 35 are adjacent to the *, so we need to discard duplicate part numbers, then sum over them:
[1] 4361
Done! I’ll come back to my input at the end of this post, putting everything together, once we’ve gone through both parts.
A gear is any `*` symbol that is adjacent to exactly two part numbers. Its gear ratio is the result of multiplying those two numbers together. Find the sum of all our gear ratios.
Let’s go back to `g` and this time start by finding everything that’s next to a gear. I’m also going to discard columns that we don’t need (so all columns get printed):
# A tibble: 24 × 8
row col number_id row2 col2 value2 number_id2 part_number2
<int> <int> <chr> <dbl> <dbl> <chr> <chr> <dbl>
1 2 4 2.1 1 3 7 1.1 467
2 2 4 2.1 1 4 . 1.2 NA
3 2 4 2.1 1 5 . 1.2 NA
4 2 4 2.1 2 3 . 2.1 NA
5 2 4 2.1 2 5 . 2.1 NA
6 2 4 2.1 3 3 3 3.2 35
7 2 4 2.1 3 4 5 3.2 35
8 2 4 2.1 3 5 . 3.3 NA
9 5 4 5.2 4 3 . 4.1 NA
10 5 4 5.2 4 4 . 4.1 NA
# ℹ 14 more rows
We only need the gears that are next to numbers:
# A tibble: 7 × 8
row col number_id row2 col2 value2 number_id2 part_number2
<int> <int> <chr> <dbl> <dbl> <chr> <chr> <dbl>
1 2 4 2.1 1 3 7 1.1 467
2 2 4 2.1 3 3 3 3.2 35
3 2 4 2.1 3 4 5 3.2 35
4 5 4 5.2 5 3 7 5.1 617
5 9 6 9.1 8 7 7 8.2 755
6 9 6 9.1 10 6 5 10.4 598
7 9 6 9.1 10 7 9 10.4 598
There’s some duplication here, for example the gear in the 2nd row is next to both the “3” and the “5”, so we filter appropriately, then summarise to create the information we’ll need to filter on and use in computation later:
# A tibble: 3 × 4
row col n_adjacent_numbers gear_ratio
<int> <int> <int> <dbl>
1 2 4 2 16345
2 5 4 1 617
3 9 6 2 451490
`.groups` is an experimental feature in dplyr that controls the grouping structure of the output, and `"drop"` says all grouping levels are dropped. I wondered why David had used `group_by()` rather than `.by` within `summarise()`, but it turns out you can’t use both `.by` and `.groups` in the same call.
We finish by finding the gears (i.e. where the `*` is adjacent to exactly two part numbers), then summing their ratios:
Let’s put that all together, using my input.
g |>
filter(value == "*") |>
adventdrob::adjacent_join(g, diagonal = TRUE) |>
arrange(row, col) |>
filter(!is.na(part_number2)) |>
distinct(row, col, number_id2, .keep_all = TRUE) |>
group_by(row, col) |>
summarise(n_adjacent_numbers = n(),
gear_ratio = prod(part_number2),
.groups = "drop") |>
filter(n_adjacent_numbers == 2) |>
summarise(sum_gear_ratios = sum(gear_ratio)) |>
pull(sum_gear_ratios)
[1] 72553319
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31)
os macOS Sonoma 14.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/London
date 2023-12-06
pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
quarto 1.4.515 @ /usr/local/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
adventdrob * 0.0.1 2023-11-19 [1] Github (dgrtwo/adventdrob@491050f)
aochelpers * 0.1.0.9000 2023-12-06 [1] local
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.1)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.1)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
readr * 2.1.4 2023-02-10 [1] CRAN (R 4.3.0)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.1)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.0)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
[1] /Users/ellakaye/Library/R/arm64/4.3/library
[2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────
¹ David actually did something much fiddlier to get the same thing. It was Tan Ho, in the Advent of Code channel on the R4DS Slack, who later suggested this much neater option.
---
title: "2023: Day 3"
date: 2023-12-03
author:
- name: Ella Kaye
categories: [tidyverse, grids, adventdrob, ⭐⭐]
draft: false
---
## Setup
[The original challenge](https://adventofcode.com/2023/day/3)
[My data](input){target="_blank"}
## Part 1
The input for this puzzle represents a grid.
When I was writing the [**aochelpers**](https://ellakaye.github.io/aochelpers) package,
I came across David Robinson's [**adventdrob**](https://github.com/dgrtwo/adventdrob) package,
which contains functions that he finds useful for working with grids in Advent of Code challenges,
in particular `grid_tidy()` and `adjacent_join()`.
I have struggled with grid puzzles in the past, so when I saw today's,
I immediately thought of using these functions.
I didn't have much time today to dedicate to Advent of Code,
so instead of attempting to solve the puzzle myself,
I decided to spend that time unpicking David's [solution](https://fosstodon.org/@drob/111514861427443202).
He had already posted it, and, unsurprisingly, used his package to solve it.
I hope that this makes it easier for me to approach future Advent of Code grid puzzles.
Since I'm writing up this post as a record of understanding someone else's code,
I'm going to work through it with the example input:
```
467..114..
...*......
..35..633.
......#...
617*......
.....+.58.
..592.....
......755.
...$.*....
.664.598..
```
::: {.callout-note collapse="false" icon="false"}
## The crux of the puzzle
Find the sum of all the numbers that are adjacent (including diagonally) to a symbol.
:::
First, I used the "paste as tribble" RStudio addin from the [**datapasta**](https://milesmcbain.github.io/datapasta/) package to get the example input into R:
```{r}
input <- tibble::tribble(
~ X1,
"467..114..",
"...*......",
"..35..633.",
"......#...",
"617*......",
".....+.58.",
"..592.....",
"......755.",
"...$.*....",
".664.598.."
)
```
::: {.callout-warning}
The **datapasta** addin turns the first row into a column header,
so I needed to adjust that.
I called it `X1` as that'll be the colname when reading in the full input with `aoc_input_data_frame()`.
Also, and this really caught me out,
it interprets `#` as signifying a comment, so doesn't copy that,
or anything to its right. That meant row 4 was pasted as `"......"`
and I had to write in the rest of the line.
:::
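One way to sidestep the `#` problem, sketched here as an alternative rather than what I actually did, is to skip the addin and split a multi-line string into rows. The names `example_text` and `example_input` are just for illustration; the column is called `X1` to match the rest of the post.

```r
# The example grid as one string; inside quotes, `#` is an ordinary character
example_text <- "467..114..
...*......
..35..633.
......#...
617*......
.....+.58.
..592.....
......755.
...$.*....
.664.598.."

# One row per line of the grid
example_input <- tibble::tibble(
  X1 = strsplit(example_text, "\n", fixed = TRUE)[[1]]
)

example_input$X1[4]
#> [1] "......#..."
```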
I'll break down David's solution more-or-less line by line.
`grid_tidy()` takes a data frame and, for a given column,
produces a data frame with one row per character,
with the value of that character, and a column for each of its row and column positions in the grid:
```{r}
library(tidyverse)
library(adventdrob)
g <- input |>
adventdrob::grid_tidy(X1)
g
```
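To get a feel for what that reshaping involves, here is a rough equivalent using only **dplyr** and **tidyr** on a tiny two-line grid. This is my own sketch, not how **adventdrob** implements `grid_tidy()`, and `toy` and `toy_grid` are made-up names.

```r
library(dplyr)
library(tidyr)

toy <- tibble::tibble(X1 = c("4.*", ".35"))

toy_grid <- toy |>
  mutate(row = row_number(),
         value = strsplit(X1, "")) |>  # list column: one character vector per line
  unnest(value) |>                     # one row per character
  group_by(row) |>
  mutate(col = row_number()) |>        # position of the character within its line
  ungroup() |>
  select(row, value, col)

toy_grid
```

The result has six rows, one per character, with the same `row`, `value` and `col` columns as above.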
We need to be able to identify the numbers.
At various points in what follows, that means identifying individual digit characters,
but also knowing which part number each one belongs to.
The last line of the following is a clever trick for identifying which digits group together:
```{r}
g <- g |>
mutate(is_digit = str_detect(value, "\\d")) |>
group_by(row) |>
mutate(number_id = paste0(row, ".", consecutive_id(is_digit)))
g
```
The new `consecutive_id` function from **dplyr** is doing a lot of heavy lifting here.
It generates a unique identifier that increments every time a variable (or combination of variables) changes.^[David actually did something much fiddlier to get the same thing.
It was Tan Ho, in the Advent of Code channel on the R4DS Slack,
who later suggested this much neater option.]
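As a standalone illustration, here is `consecutive_id()` applied to the `is_digit` values for the first row of the example grid (`467..114..`), typed out by hand:

```r
library(dplyr)

# is_digit for "467..114.."
is_digit_row1 <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)

consecutive_id(is_digit_row1)
#> [1] 1 1 1 2 2 3 3 3 4 4
```

Each run of digits (and each run of non-digits) gets its own id, which is why pasting the row number in front gives `1.1`, `1.2`, `1.3` and so on.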
Now that we know which digits belong together,
we need to keep track of what the actual part number is:
```{r}
g <- g |>
group_by(number_id) |>
mutate(part_number = as.numeric(paste(value, collapse = ""))) |>
ungroup()
g
```
This is the first time I'd seen this trick to combine the values in several rows into one, and I feel like it's a good one to know now!
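Stripped of the grouping, the trick is just `paste()` with `collapse`, followed by a conversion to numeric. A minimal example with hand-typed vectors:

```r
digits <- c("4", "6", "7")

paste(digits, collapse = "")
#> [1] "467"

as.numeric(paste(digits, collapse = ""))
#> [1] 467

# A run of "." characters has no numeric value, so it becomes NA
# (as.numeric() warns about the coercion, suppressed here)
suppressWarnings(as.numeric(paste(c(".", "."), collapse = "")))
#> [1] NA
```

That last case is why the non-digit groups end up with `NA` in `part_number`.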
This data frame, `g`, is what we need for both parts,
so let's leave it as that, and use it to start a new data frame for Part 1.
We want to find the part numbers that are adjacent to symbols.
So first, we narrow our attention just to the rows that represent part numbers, then look at what's around them.
`adjacent_join()` from **adventdrob** is designed exactly for this:
```{r}
p1 <- g |>
filter(!is.na(part_number)) |>
adventdrob::adjacent_join(g, diagonal = TRUE) |>
arrange(row, col)
p1
```
It's a little tricky to see how this works from the first 10 rows, so let's look at, for example, all eight characters around the "3" in the third column of the third row, and their row and column positions in the grid:
```{r}
p1 |>
filter(row == 3 & col == 3) |>
select(value2, row2, col2)
```
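For intuition about what a join like this involves, here is a rough sketch using only **dplyr** and **tidyr** on a 2 × 2 toy grid. It is not **adventdrob**'s implementation, and `cells`, `offsets` and `neighbours` are made-up names: the idea is to pair every cell with the eight possible offsets, then keep the pairs that land on a real cell.

```r
library(dplyr)
library(tidyr)

cells <- tibble::tribble(
  ~row, ~col, ~value,
  1L,   1L,   "4",
  1L,   2L,   "*",
  2L,   1L,   ".",
  2L,   2L,   "7"
)

# The eight directions, including diagonals, excluding "stay put"
offsets <- expand_grid(d_row = -1:1, d_col = -1:1) |>
  filter(!(d_row == 0 & d_col == 0))

neighbours <- cells |>
  cross_join(offsets) |>
  mutate(row2 = row + d_row, col2 = col + d_col) |>
  # positions outside the grid have no match, so inner_join() drops them
  inner_join(
    cells |> rename(row2 = row, col2 = col, value2 = value),
    by = c("row2", "col2")
  ) |>
  select(row, col, value, row2, col2, value2)

neighbours |>
  filter(value == "*")
```

In a 2 × 2 grid every cell has three neighbours, so `neighbours` has 12 rows, and the `*` is paired with the "4", the "." and the "7".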
We only need to know which values are adjacent to symbols,
so we discard the adjacent elements (`value2`) that are either digits or ".":
```{r}
p1 <- p1 |>
filter(value2 != ".", !is_digit2)
p1
```
Almost there. Some part numbers have more than one character adjacent to a symbol,
e.g. both the 3 and 5 of the 35 are adjacent to the *, so we need to discard duplicate part numbers, then sum over them:
```{r}
p1 |>
distinct(number_id, .keep_all = TRUE) |>
summarise(sum_part_numbers = sum(part_number)) |>
pull(sum_part_numbers)
```
Done! I'll come back to my input at the end of this post,
putting everything together,
once we've gone through both parts.
## Part 2
::: {.callout-note collapse="false" icon="false"}
## The crux of the puzzle
A **gear** is any `*` symbol that is adjacent to **exactly two part numbers**. Its **gear ratio** is the result of multiplying those two numbers together. Find the sum of all our gear ratios.
:::
Let's go back to `g` and this time start by finding everything that's next to a gear.
I'm also going to discard columns that we don't need (so all columns get printed):
```{r}
p2 <- g |>
filter(value == "*") |>
adventdrob::adjacent_join(g, diagonal = TRUE) |>
select(-value, -part_number, -is_digit, -is_digit2)
p2
```
We only need the gears that are next to numbers:
```{r}
p2 <- p2 |>
arrange(row, col) |>
filter(!is.na(part_number2))
p2
```
There's some duplication here,
for example the gear in the 2nd row is next to both the "3" and the "5",
so we filter appropriately, then summarise to create the information we'll need to filter on and use in computation later:
```{r}
p2 <- p2 |>
distinct(row, col, number_id2, .keep_all = TRUE) |>
group_by(row, col) |>
summarise(n_adjacent_numbers = n(),
gear_ratio = prod(part_number2),
.groups = "drop")
p2
```
`.groups` is an experimental feature in **dplyr** that controls the grouping structure of the output, and `"drop"` says all grouping levels are dropped.
I wondered why David had used `group_by()` rather than `.by` within `summarise()`,
but it turns out you can't use both `.by` and `.groups` in the same call.
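As a small comparison on made-up data (the `toy_gears` tibble and its columns are hypothetical), both spellings below should give the same ungrouped summary, since `.by` groups only for the duration of the call:

```r
library(dplyr)

toy_gears <- tibble::tibble(
  gear = c("a", "a", "b"),
  part = c(2, 3, 5)
)

# Persistent grouping, explicitly dropped afterwards
via_group_by <- toy_gears |>
  group_by(gear) |>
  summarise(ratio = prod(part), .groups = "drop")

# Per-operation grouping: the result is never grouped
via_by <- toy_gears |>
  summarise(ratio = prod(part), .by = gear)

via_group_by
via_by
```

One difference to be aware of: `group_by()` sorts the output by the grouping keys, whereas `.by` keeps groups in order of first appearance. Here the data is already sorted, so the two agree.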
We finish by finding the gears (i.e. where the `*` is adjacent to exactly two part numbers),
then summing their ratios:
```{r}
p2 |>
filter(n_adjacent_numbers == 2) |>
summarise(sum_gear_ratios = sum(gear_ratio)) |>
pull(sum_gear_ratios)
```
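As a sanity check on the example, only two of the three `*` symbols touch exactly two part numbers: one is next to 467 and 35, the other next to 755 and 598. The arithmetic can be checked by hand:

```r
gear_ratios <- c(467 * 35, 755 * 598)

gear_ratios
#> [1]  16345 451490

sum(gear_ratios)
#> [1] 467835
```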
## My input
Let's put that all together, using my input.
```{r}
#| echo: false
OK <- "2023" < 3000
# Will only evaluate next code block if an actual year has been substituted for the placeholder
```
```{r}
#| eval: !expr OK
library(aochelpers)
input <- aoc_input_data_frame(3, 2023)
```
```{r}
g <- input |>
adventdrob::grid_tidy(X1) |>
mutate(is_digit = str_detect(value, "\\d")) |>
group_by(row) |>
mutate(number_id = paste0(row, ".", consecutive_id(is_digit))) |>
group_by(number_id) |>
mutate(part_number = as.numeric(paste(value, collapse = ""))) |>
ungroup()
```
### Part 1
```{r}
g |>
filter(!is.na(part_number)) |>
adventdrob::adjacent_join(g, diagonal = TRUE) |>
arrange(row, col) |>
filter(value2 != ".", !is_digit2) |>
distinct(number_id, .keep_all = TRUE) |>
summarise(sum_part_numbers = sum(part_number)) |>
pull(sum_part_numbers)
```
### Part 2
```{r}
g |>
filter(value == "*") |>
adventdrob::adjacent_join(g, diagonal = TRUE) |>
arrange(row, col) |>
filter(!is.na(part_number2)) |>
distinct(row, col, number_id2, .keep_all = TRUE) |>
group_by(row, col) |>
summarise(n_adjacent_numbers = n(),
gear_ratio = prod(part_number2),
.groups = "drop") |>
filter(n_adjacent_numbers == 2) |>
summarise(sum_gear_ratios = sum(gear_ratio)) |>
pull(sum_gear_ratios)
```
##### Session info {.appendix}
<details><summary>Toggle</summary>
```{r}
#| echo: false
library(sessioninfo)
# save the session info as an object
pkg_session <- session_info(pkgs = "attached")
# get the quarto version
quarto_version <- system("quarto --version", intern = TRUE)
# inject the quarto info
pkg_session$platform$quarto <- paste(
  quarto_version,
  "@",
  quarto::quarto_path()
)
# print it out
pkg_session
```
</details>