I'm using tidyr::separate_rows to split my data into rows, but the strings hold unequal numbers of items. See the example data below:
`
Df <- data.frame(Id = c(1, 2, 3),
                 suburb = c("orange, yellow", "blue", "green, yellow"),
                 postcodes = c("a9, b9", "c9, a9", "d9, b9, a9"))`
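Is there a way to drop strings from the postcodes column when there are more of them than there are strings in the suburb column? For example, if a row has 1 suburb and 3 postcodes, can the 2 extra postcodes be removed? I have searched other answers but haven't found anything similar.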
I'm assuming the data is
Df <- structure(list(Id = c(1, 2, 3), suburb = c("orange, yellow", "blue", "green, yellow"), postcodes = c("a9, b9", "c9, a9", "d9, b9, a9")), class = "data.frame", row.names = c(NA, -3L))
Df
# Id suburb postcodes
# 1 1 orange, yellow a9, b9
# 2 2 blue c9, a9
# 3 3 green, yellow d9, b9, a9
str(Df)
# 'data.frame': 3 obs. of 3 variables:
# $ Id : num 1 2 3
# $ suburb : chr "orange, yellow" "blue" "green, yellow"
# $ postcodes: chr "a9, b9" "c9, a9" "d9, b9, a9"
If so, and if you intend to expand suburb <-> postcodes even where the two are unbalanced, then
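library(dplyr)
library(tidyr)

Df %>%
  mutate(
    # split each string on commas and/or whitespace into a list-column of character vectors
    across(c(suburb, postcodes), ~ lapply(strsplit(., "[,[:space:]]+"), trimws)),
    # build every suburb x postcode combination within each row
    stuff = Map(crossing, suburb = suburb, postcodes = postcodes)
  ) %>%
  select(Id, stuff) %>%
  unnest(stuff)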
# # A tibble: 12 x 3
# Id suburb postcodes
# <dbl> <chr> <chr>
# 1 1 orange a9
# 2 1 orange b9
# 3 1 yellow a9
# 4 1 yellow b9
# 5 2 blue a9
# 6 2 blue c9
# 7 3 green a9
# 8 3 green b9
# 9 3 green d9
# 10 3 yellow a9
# 11 3 yellow b9
# 12 3 yellow d9
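If the goal is instead what the question literally asks for (dropping the surplus postcodes), one possible base-R sketch follows. It assumes that suburbs and postcodes should be paired by position and that the postcodes listed first are the ones to keep; neither is stated in the question. `split_items` is a hypothetical helper name introduced only for this example.

```r
Df <- data.frame(
  Id = c(1, 2, 3),
  suburb = c("orange, yellow", "blue", "green, yellow"),
  postcodes = c("a9, b9", "c9, a9", "d9, b9, a9"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split comma-separated strings and trim whitespace.
split_items <- function(x) lapply(strsplit(x, ",", fixed = TRUE), trimws)

suburbs   <- split_items(Df$suburb)
postcodes <- split_items(Df$postcodes)

# Assumption: pair by position. Indexing with seq_along(s) keeps the first
# length(s) postcodes and pads with NA if a row has fewer postcodes than suburbs.
trimmed <- Map(function(s, p) p[seq_along(s)], suburbs, postcodes)

result <- data.frame(
  Id = rep(Df$Id, lengths(suburbs)),
  suburb = unlist(suburbs),
  postcodes = unlist(trimmed),
  stringsAsFactors = FALSE
)
result
#   Id suburb postcodes
# 1  1 orange        a9
# 2  1 yellow        b9
# 3  2   blue        c9
# 4  3  green        d9
# 5  3 yellow        b9
```

Because each row ends up with equally long suburb and postcode vectors, the same truncation could also be done before calling tidyr::separate_rows on both columns, if you prefer to stay in the tidyverse.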