2020: Day 4
Setup
library(dplyr)
library(tidyr)
library(stringr)
Part 1
Reading the input with readr::read_tsv() drops the blank lines straight away, which makes it impossible to tell where one passport ends and the next begins. Reading the data with readLines() and then converting with as_tibble() preserves the blank lines and still lets us use tidyverse functions for the remaining tidying. Calling cumsum() on a logical vector takes advantage of FALSE having a numeric value of zero and TRUE having a numeric value of one, so the running total increases by one at every blank line and can serve as a passport ID.
passports <- readLines(here::here("2020", "day", "4", "input")) %>%
  as_tibble() %>%
  # one row per key:value pair; blank lines remain as ""
  separate_rows(value, sep = " ") %>%
  # a blank line marks the start of a new passport
  mutate(new_passport = value == "") %>%
  mutate(ID = cumsum(new_passport) + 1) %>%
  filter(!new_passport) %>%
  select(-new_passport) %>%
  separate(value, c("key", "value"), sep = ":") %>%
  relocate(ID)
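To see how the cumsum() step turns blank lines into passport IDs, here is a minimal base R sketch on a made-up vector. The input lines are invented for illustration and are not taken from the puzzle input.

```r
# Toy input (made up): two records separated by one blank line
lines <- c("a:1 b:2", "c:3", "", "a:4", "b:5 c:6")

# TRUE marks a blank line, i.e. the boundary between records
is_blank <- lines == ""

# cumsum() treats FALSE as 0 and TRUE as 1, so the running total
# steps up by one at each boundary; + 1 makes the first record ID 1
id <- cumsum(is_blank) + 1

data.frame(line = lines, is_blank, id)
```

The blank line itself is given the ID of the record that follows it, which is harmless because blank rows are filtered out afterwards.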
Our data is now in three columns, with ID, key and value, so now we need to find the number of passports with all seven fields once cid is excluded:
passports %>%
  filter(key != "cid") %>%
  count(ID) %>%
  filter(n == 7) %>%
  nrow()
[1] 210
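Counting rows per ID equals counting fields only if no key appears twice within a passport. As a hedged variation, the sketch below counts distinct keys instead. The toy data and the three-field required vector are made up for illustration; the real puzzle has seven required fields.

```r
library(dplyr)

# Toy data (made up): passport 1 has all three required fields,
# passport 2 has two of them plus the optional cid
toy <- data.frame(
  ID  = c(1, 1, 1, 2, 2, 2),
  key = c("byr", "iyr", "eyr", "byr", "cid", "iyr")
)

# Hypothetical, shortened list of required fields for this toy example
required <- c("byr", "iyr", "eyr")

n_complete <- toy %>%
  filter(key %in% required) %>%
  group_by(ID) %>%
  summarise(n_fields = n_distinct(key)) %>%
  filter(n_fields == length(required)) %>%
  nrow()

n_complete
```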
Part 2: Valid passports
Now we need to add data validation checks:
- byr (Birth Year) - four digits; at least 1920 and at most 2002.
- iyr (Issue Year) - four digits; at least 2010 and at most 2020.
- eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
- hgt (Height) - a number followed by either cm or in:
  - If cm, the number must be at least 150 and at most 193.
  - If in, the number must be at least 59 and at most 76.
- hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.
- ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.
- pid (Passport ID) - a nine-digit number, including leading zeroes.
- cid (Country ID) - ignored, missing or not.
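The hcl and pid rules map naturally onto anchored regular expressions. The base R sketch below, run on made-up test strings, shows why the ^ and $ anchors matter: without them, a ten-digit string would still contain a nine-digit match.

```r
# Made-up test values, not taken from the puzzle input
hcl <- c("#123abc", "#123abz", "123abc")
pid <- c("000000001", "0123456789")

# Anchored patterns: the whole string must match
hcl_ok <- grepl("^#[0-9a-f]{6}$", hcl)
pid_ok <- grepl("^[0-9]{9}$", pid)

# Unanchored pattern: any run of nine digits inside the string matches
pid_unanchored <- grepl("[0-9]{9}", pid)

hcl_ok
pid_ok
pid_unanchored
```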
Ignoring the cid field, we narrow down on passports that at least have the right number of fields, and extract the number from the hgt column:
complete_passports <- passports %>%
  filter(key != "cid") %>%
  add_count(ID) %>%
  filter(n == 7) %>%
  select(-n) %>%
  mutate(hgt_value = case_when(
    key == "hgt" ~ readr::parse_number(value),
    TRUE ~ NA_real_)) %>%
  ungroup()
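As an alternative way to handle heights, here is a base R sketch that splits each value into a number and a unit with a single anchored pattern and then applies the range rules. This is not the approach used in this post; the test values and variable names are made up for illustration.

```r
# Made-up height strings: two well formed, one without a unit, one malformed
hgt <- c("183cm", "59in", "190", "cm150")

# Digits followed by exactly one of the two allowed units
pattern <- "^([0-9]+)(cm|in)$"
well_formed <- grepl(pattern, hgt)

# Extract number and unit only where the pattern matched
number <- rep(NA_real_, length(hgt))
unit   <- rep(NA_character_, length(hgt))
number[well_formed] <- as.numeric(sub(pattern, "\\1", hgt[well_formed]))
unit[well_formed]   <- sub(pattern, "\\2", hgt[well_formed])

# Apply the unit-specific ranges; malformed values come out FALSE
valid <- well_formed & (
  (unit == "cm" & number >= 150 & number <= 193) |
  (unit == "in" & number >= 59 & number <= 76)
)

data.frame(hgt, number, unit, valid)
```

Because the pattern is anchored, a string such as "cm150" is rejected outright rather than having its digits extracted.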
Then we create a check column, which is TRUE when the value for each key meets the required conditions. Passports with seven TRUEs are valid. Note that with case_when() we’ve left the check column as NA when no condition is met, requiring na.rm = TRUE in the call to sum(). We could get round that by adding a final line to case_when() stating TRUE ~ FALSE. There, TRUE is a catch-all for all remaining rows not covered by the conditions above, which are then set to FALSE, but I find the line TRUE ~ FALSE unintuitive.
complete_passports %>%
  # value is a character column, so the year comparisons below are string
  # comparisons; these behave as intended for four-digit years
  mutate(check = case_when(
    (key == "byr" & value >= 1920) & (key == "byr" & value <= 2002) ~ TRUE,
    (key == "iyr" & value >= 2010) & (key == "iyr" & value <= 2020) ~ TRUE,
    (key == "eyr" & value >= 2020) & (key == "eyr" & value <= 2030) ~ TRUE,
    key == "hgt" & str_detect(value, "cm") & hgt_value >= 150 & hgt_value <= 193 ~ TRUE,
    key == "hgt" & str_detect(value, "in") & hgt_value >= 59 & hgt_value <= 76 ~ TRUE,
    key == "hcl" & str_detect(value, "^#[a-f0-9]{6}$") ~ TRUE,
    key == "ecl" & value %in% c("amb", "blu", "brn", "gry", "grn", "hzl", "oth") ~ TRUE,
    key == "pid" & str_detect(value, "^[0-9]{9}$") ~ TRUE
  )) %>%
  group_by(ID) %>%
  summarise(check_all = sum(check, na.rm = TRUE)) %>%
  filter(check_all == 7) %>%
  nrow()
[1] 131
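For readers who also find TRUE ~ FALSE awkward, here is a small sketch of the NA behaviour of case_when() and of one possible alternative, the .default argument, which is available from dplyr 1.1.0 onwards. The vector x is made up for illustration.

```r
library(dplyr)

# Made-up values for illustration
x <- c(1, 5, 3)

# With no catch-all, elements that match no condition are left as NA,
# so sum() needs na.rm = TRUE
with_na <- case_when(x > 2 ~ TRUE)
sum(with_na, na.rm = TRUE)

# Supplying a default fills the unmatched elements instead
# (the .default argument requires dplyr >= 1.1.0)
with_default <- case_when(x > 2 ~ TRUE, .default = FALSE)
sum(with_default)
```

Both calls count the same two matching elements; the second simply avoids carrying NA values into the summary step.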
Session info
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16)
os macOS Sonoma 14.0
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/London
date 2023-11-06
pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
quarto 1.4.466 @ /usr/local/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
dplyr * 1.1.2 2023-04-20 [1] CRAN (R 4.3.0)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.3.0)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.0)
[1] /Users/ellakaye/Library/R/arm64/4.3/library
[2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────