r - Compare across multiple columns within a row, remove mismatches and create new rows



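I am trying to count identical addresses and group them by row. I am fairly close, but there are slight differences between the columns for a given address. The aim is to remove any non-matching addresses from a row and add them to the df as new rows. The differences are usually in the street number or the block number. I have already extracted these numbers into their own columns and am trying to find the ones that do not match, remove them, rearrange the row, and adjust the count accordingly. The count can be changed afterwards, simply by checking each row for missing values.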

The real dataset has 5000 rows, with up to 50 buildings in a single row. Here is a sample.

df<-data.frame(bldg1 = c("26 this street, big district","block8, fancy estate, small district", "11 normal lane, district"),
bldg2 = c("27 this street, big district","block8, fancy estate, small district", "11 normal lane, district"),
bldg3 = c("26 this street, big district","block6, fancy estate, small district", "11 normal lane, district"),
bldg4 = c("26 this street, big district","block8, fancy estate, small district", NA),
bldg5 = c("26 this street, big district","block6, fancy estate, small district", "11 normal lane, district"),
bldg1strnum = c("26",NA, "11"),
bldg2strnum = c("27",NA, "11"),
bldg3strnum = c("26",NA, "11"),
bldg4strnum = c("26",NA, "11"),
bldg5strnum = c("26",NA, "11"),
bldg1blck = c(NA,"8", NA),
bldg2blck = c(NA,"8", NA),
bldg3blck = c(NA,"6", NA),
bldg4blck = c(NA,"8", NA),
bldg5blck = c(NA,"6", NA),
count = c(5,5,4))

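I thought of using dplyr's across() together with length(unique()), but I don't know how to run it correctly, and in particular how to turn the mutate into a long format with new rows.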

The outcome I would like is this. (The street number and name columns are not needed after the mutate.)

df<-data.frame(bldg1 = c("26 this street, big district","block8, fancy estate, small district", "11 normal lane, district", "27 this street, big district","block6, fancy estate, small district"),
bldg2 = c(NA, "block8, fancy estate, small district", "11 normal lane, district",NA,"block6, fancy estate, small district"),
bldg3 = c("26 this street, big district",NA, "11 normal lane, district", NA, NA),
bldg4 = c("26 this street, big district","block8, fancy estate, small district", NA,NA,NA),
bldg5 = c("26 this street, big district",NA, "11 normal lane, district",NA,NA),
count = c("4","3","4","1","2"))

This may be what you are looking for:

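library(dplyr)
library(tidyr)

df %>%
  select(bldg1, bldg2, bldg3, bldg4, bldg5) %>%  # keep only the address columns
  pivot_longer(cols = everything()) %>%          # long format: one row per address
  arrange(value) %>%
  add_count(value)                               # n = occurrences of each address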

Output:

   name  value                                    n
   <chr> <chr>                                <int>
 1 bldg1 11 normal lane, district                 4
 2 bldg2 11 normal lane, district                 4
 3 bldg3 11 normal lane, district                 4
 4 bldg5 11 normal lane, district                 4
 5 bldg1 26 this street, big district             4
 6 bldg3 26 this street, big district             4
 7 bldg4 26 this street, big district             4
 8 bldg5 26 this street, big district             4
 9 bldg2 27 this street, big district             1
10 bldg3 block6, fancy estate, small district     2
11 bldg5 block6, fancy estate, small district     2
12 bldg1 block8, fancy estate, small district     3
13 bldg2 block8, fancy estate, small district     3
14 bldg4 block8, fancy estate, small district     3
15 bldg4 NA                                       1
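
Note that add_count(value) counts each address across the whole data set, not within a single original row. The two happen to agree on this sample. If the counts should be per original row, and the result should go back to one row per distinct address as in the desired output, something along these lines might work. It is only a sketch under assumptions: row_id, address, long and wide are names made up for illustration, only the five address columns from the sample are used, and each address stays in its original bldg column rather than being shifted left as in the desired output.

```r
library(dplyr)
library(tidyr)

# Address columns from the question's sample data
df <- data.frame(
  bldg1 = c("26 this street, big district", "block8, fancy estate, small district", "11 normal lane, district"),
  bldg2 = c("27 this street, big district", "block8, fancy estate, small district", "11 normal lane, district"),
  bldg3 = c("26 this street, big district", "block6, fancy estate, small district", "11 normal lane, district"),
  bldg4 = c("26 this street, big district", "block8, fancy estate, small district", NA),
  bldg5 = c("26 this street, big district", "block6, fancy estate, small district", "11 normal lane, district"),
  stringsAsFactors = FALSE
)

long <- df %>%
  mutate(row_id = row_number()) %>%                # remember the source row (hypothetical helper column)
  pivot_longer(-row_id, names_to = "name", values_to = "value",
               values_drop_na = TRUE) %>%          # one row per non-missing address
  add_count(row_id, value, name = "count") %>%     # count each address within its source row
  mutate(address = value)                          # copy used as a grouping key when widening

wide <- long %>%
  pivot_wider(id_cols = c(row_id, address, count),
              names_from = name, values_from = value) %>%  # one row per (source row, address)
  select(starts_with("bldg"), count)

wide
```

On the sample this should give five rows, one per distinct address in each source row, with NA wherever a building held a different address. Whether values should then be shifted to the leftmost columns is a separate step that is not attempted here.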
