r - dplyr: how to group observations by code, count them and create a summary variable, then add a new variable based on a name within the group



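I have multiple addresses that I want to group together and count. However, they are formatted inconsistently. I have geocoded the addresses and plan to group them by their geocodes. When grouping, though, I want to create a new variable that keeps at least one version of the address (or several variables, one per address in the group, in wide format, although a single variable holding one address per group would be enough).

Here is some sample data.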

address <- c("big fake plaza, 12 this street,district, city",
             "Green mansion, district, city",
             "Block 7 of orange building  district, city",
             "98 main street block a blue plaza, city",
             "blue red mansion, 46 pearl street, city",
             "12 this street, big fake plaza, district, city",
             "Green mansion, district, city",
             "orange building Block 7 district, city",
             "block a 98 main street blue plaza, city",
             "blue red mansion, 46 pearl street, city",
             "big fake plaza, district, city",
             "Green mansion,city")
long <- c("112.8838", "111.9154", "114.9318", "116.9318", "112.9320", "111.9324",
          "112.8838", "111.9154", "114.9318", "116.9318", "112.9320", "111.9324",
          "112.8838", "111.9154")
lat <- c("21.22177", "12.22177", "26.27743", "23.17651", "23.24769", "23.24771",
         "21.22177", "12.22177", "26.27743", "23.17651", "23.24769", "23.24771",
         "21.22177", "12.22177")
# address has 12 entries but there are 14 coordinate pairs; rep_len() recycles
# it explicitly, which is what cbind() did implicitly (with a warning).
# data.frame() is used because group_by() needs a data frame, not a matrix.
df <- data.frame(address = rep_len(address, length(long)), lat, long)

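What I want to do is group and count, but I do not know how to mutate so that the named variable is built from just one address in each group.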

df_agg <- df %>%
  group_by(long, lat) %>%
  summarise(count = n()) %>%
  mutate(bldg = ifelse(address[address==1], address, NA))  # ??????? this is the step I cannot work out

I would like the result to look like this:

long  lat  count    bldg
<dbl> <dbl> <int>   <chr>
1  112.  21.2     3    "big fake plaza, 12 this street,district, city"
2  114.  12.2     3    "Green mansion, district, city"
3  116.  26.3     2    "98 main street block a blue plaza, city"
4  112.  23.5     2    "Block 7 of orange building  district, city"
5  111.  23.5     2    "blue red mansion, 46 pearl street, city"

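Obviously we cannot group by the address string itself, because the strings differ. If there is a better option, I am happy to hear any other suggestions. It would be nice to create new variables bldg1, bldg2, etc. for each distinct building name within each group, but that is not a priority.

Thanks in advance.

You can pick the first address at each location.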

library(dplyr)
library(tidyr)
df %>%
  group_by(long, lat) %>%
  summarise(count = n(),
            address = first(address)) %>%
  ungroup()
#  long     lat      count address                                       
#  <chr>    <chr>    <int> <chr>                                         
#1 111.9154 12.22177     3 Green mansion, district, city                 
#2 111.9324 23.24771     2 12 this street, big fake plaza, district, city
#3 112.8838 21.22177     3 big fake plaza, 12 this street,district, city 
#4 112.9320 23.24769     2 blue red mansion, 46 pearl street, city       
#5 114.9318 26.27743     2 Block 7 of orange building  district, city    
#6 116.9318 23.17651     2 98 main street block a blue plaza, city      

If you want separate columns such as bldg1, bldg2, etc., reshape the data to wide format.

df %>%
  group_by(long, lat) %>%
  mutate(row = paste0('bldg', row_number()),
         count = n()) %>%
  ungroup() %>%
  pivot_wider(names_from = row, values_from = address)
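
The question asks for one column per distinct building name, whereas the wide reshape numbers every row, so an address spelled identically twice at the same location would fill two columns. A minimal sketch of one possible variant follows, assuming exact string equality is an acceptable definition of "distinct". The `toy` data frame and the `wide` result are hypothetical names made up for illustration, not the asker's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the asker's): one location with a repeated
# spelling and one location with a single address.
toy <- data.frame(
  address = c("Green mansion, district, city",
              "Green mansion, district, city",
              "Green mansion,city",
              "blue red mansion, 46 pearl street, city"),
  lat  = c("12.22177", "12.22177", "12.22177", "23.24769"),
  long = c("111.9154", "111.9154", "111.9154", "112.9320")
)

wide <- toy %>%
  add_count(long, lat, name = "count") %>%            # rows per location, duplicates included
  distinct(long, lat, address, .keep_all = TRUE) %>%  # keep one row per distinct spelling
  group_by(long, lat) %>%
  mutate(row = paste0("bldg", row_number())) %>%      # bldg1, bldg2, ... within each location
  ungroup() %>%
  pivot_wider(names_from = row, values_from = address)

print(wide)
```

Here `count` is computed before the duplicates are dropped, so it still reflects every observation at the location, while the bldg columns hold only the distinct spellings. Locations with fewer spellings than the widest group get NA in the unused columns.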
