R - tidyr - separating rows of unequal length - removing the extra strings



I'm using tidyr::separate_rows to split my data into rows, but the comma-separated strings have unequal lengths. See the example data below:

Df <- data.frame(Id = c(1, 2, 3),
                 suburb = c("orange, yellow", "blue", "green, yellow"),
                 postcodes = c("a9,  b9", "c9, a9", "d9, b9, a9"))

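Is there a way to remove strings from the postcodes column when a row has more of them than it has entries in the suburb column? For example, if a row has one suburb and three postcodes, can I drop the two extra postcodes? I have searched other answers but haven't found anything similar.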
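Read literally, the question asks to keep only as many postcodes as there are suburbs in each row. A minimal sketch, assuming the elements pair up by position (first suburb with first postcode, and so on) and that the surplus postcodes at the end are the ones to drop. `split_trim()` is a hypothetical helper introduced only for this example; if a row had fewer postcodes than suburbs, `p[seq_along(s)]` would pad it with `NA`.

```r
library(dplyr)
library(tidyr)

Df <- data.frame(Id = c(1, 2, 3),
                 suburb = c("orange, yellow", "blue", "green, yellow"),
                 postcodes = c("a9,  b9", "c9, a9", "d9, b9, a9"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: split each comma-separated string into a
# character vector and trim the surrounding whitespace
split_trim <- function(x) lapply(strsplit(x, ","), trimws)

paired <- Df %>%
  mutate(
    suburb = split_trim(suburb),
    # Keep only the first length(s) postcodes of each row; extra ones are
    # dropped (a row with too few postcodes would be padded with NA)
    postcodes = Map(function(s, p) p[seq_along(s)], suburb, split_trim(postcodes))
  ) %>%
  # Both list columns now have equal lengths per row, so they unnest together
  unnest(c(suburb, postcodes))

paired
```

Row 2 keeps only `c9` and row 3 keeps `d9, b9`, giving 5 rows instead of the 12 combinations a full expansion produces.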

I think your data is:

Df <- structure(list(Id = c(1, 2, 3), suburb = c("orange, yellow", "blue", "green, yellow"), postcodes = c("a9,  b9", "c9, a9", "d9, b9, a9")), class = "data.frame", row.names = c(NA, -3L))
Df
#   Id         suburb  postcodes
# 1  1 orange, yellow    a9,  b9
# 2  2           blue     c9, a9
# 3  3  green, yellow d9, b9, a9
str(Df)
# 'data.frame': 3 obs. of  3 variables:
#  $ Id       : num  1 2 3
#  $ suburb   : chr  "orange, yellow" "blue" "green, yellow"
#  $ postcodes: chr  "a9,  b9" "c9, a9" "d9, b9, a9"

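If so, and you want to expand suburb <-> postcodes even when they are unbalanced (every suburb paired with every postcode of the same Id), then: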

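library(dplyr)
library(tidyr)

Df %>%
  mutate(
    # split each string into a character vector (list columns)
    across(c(suburb, postcodes), ~ lapply(strsplit(., "[,\\s]+"), trimws)),
    # all suburb x postcode combinations within each row
    stuff = Map(crossing, suburb = suburb, postcodes = postcodes)
  ) %>%
  select(Id, stuff) %>%
  unnest(stuff)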
# # A tibble: 12 x 3
#       Id suburb postcodes
#    <dbl> <chr>  <chr>    
#  1     1 orange a9       
#  2     1 orange b9       
#  3     1 yellow a9       
#  4     1 yellow b9       
#  5     2 blue   a9       
#  6     2 blue   c9       
#  7     3 green  a9       
#  8     3 green  b9       
#  9     3 green  d9       
# 10     3 yellow a9       
# 11     3 yellow b9       
# 12     3 yellow d9       
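For comparison, a sketch (not part of the answer above) that produces the same 12 combinations with two chained `separate_rows()` calls, the function the question started from. Each call splits one column independently, so splitting suburb first and postcodes second yields a per-row cross product. The `sep = ",\\s*"` pattern is an assumption chosen for this data; unlike `crossing()`, this keeps the input order instead of sorting.

```r
library(dplyr)
library(tidyr)

Df <- data.frame(Id = c(1, 2, 3),
                 suburb = c("orange, yellow", "blue", "green, yellow"),
                 postcodes = c("a9,  b9", "c9, a9", "d9, b9, a9"),
                 stringsAsFactors = FALSE)

all_pairs <- Df %>%
  # one row per suburb; each copy still carries the full postcode string
  separate_rows(suburb, sep = ",\\s*") %>%
  # one row per postcode, pairing it with every suburb of the same Id
  separate_rows(postcodes, sep = ",\\s*")

all_pairs
```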
