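I have multiple addresses that I want to group together and count. However, they are not formatted consistently. I have geocoded the addresses and plan to group them by geocode. When grouping, though, I'd like to create a new variable that keeps at least one version of the address. Alternatively, in wide format, there could be several variables with one per address in the group, but I'd settle for a single variable per group that keeps one address.
Here is some sample data: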
address=c("big fake plaza, 12 this street,district, city",
"Green mansion, district, city",
"Block 7 of orange building district, city",
"98 main street block a blue plaza, city",
"blue red mansion, 46 pearl street, city",
"12 this street, big fake plaza, district, city",
"Green mansion, district, city",
"orange building Block 7 district, city",
"block a 98 main street blue plaza, city",
"blue red mansion, 46 pearl street, city",
"big fake plaza, district, city",
"Green mansion,city")
long =c("112.8838", "111.9154", "114.9318", "116.9318", "112.9320","111.9324",
"112.8838", "111.9154", "114.9318", "116.9318", "112.9320","111.9324",
"112.8838", "111.9154")
lat = c("21.22177", "12.22177", "26.27743", "23.17651", "23.24769", "23.24771",
"21.22177", "12.22177", "26.27743", "23.17651", "23.24769", "23.24771",
"21.22177", "12.22177")
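# address has 12 elements while lat and long have 14, so cbind() recycles address (with a warning)
# cbind() returns a character matrix; as.data.frame() makes it something dplyr can group
df <- as.data.frame(cbind(address, lat, long))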
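What I'm trying to do is group and count, but I don't know how to use mutate to create a named variable from just one of the addresses in each group.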
df_agg<- df %>%
group_by(long,lat) %>%
summarise(count = n()) %>%
mutate(bldg = ifelse(address[address==1],address, NA )) # ??? this is the step I can't figure out
I'd like the result to look like this:
long lat count bldg
<dbl> <dbl> <int> <chr>
1 112. 21.2 3 "big fake plaza, 12 this street,district, city"
2 114. 12.2 3 "Green mansion, district, city"
3 116. 26.3 2 "98 main street block a blue plaza, city"
4 112. 23.5 2 "Block 7 of orange building district, city"
5 111. 23.5 2 "blue red mansion, 46 pearl street, city"
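Obviously, we can't group on the address names because the strings differ. If there is a better option, I'm happy to hear other suggestions. It would be nice to create new variables bldg1, bldg2, etc. for each distinct building name in each group, but that's not a priority.
Thanks in advance.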
You can pick the first address at each location (each unique long/lat pair).
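library(dplyr)
library(tidyr)   # only needed for pivot_wider() further down
df %>%
  group_by(long,lat) %>%
  summarise(count = n(),                    # number of rows at each geocode
            address = first(address)) %>%   # keep the first address string in the group
  ungroup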
# long lat count address
# <chr> <chr> <int> <chr>
#1 111.9154 12.22177 3 Green mansion, district, city
#2 111.9324 23.24771 2 12 this street, big fake plaza, district, city
#3 112.8838 21.22177 3 big fake plaza, 12 this street,district, city
#4 112.9320 23.24769 2 blue red mansion, 46 pearl street, city
#5 114.9318 26.27743 2 Block 7 of orange building district, city
#6 116.9318 23.17651 2 98 main street block a blue plaza, city
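If the first address in a group happens to be an unusual spelling, a variant is to keep the most frequent spelling instead. This is a minimal sketch, assuming the same character `long`/`lat` columns as in the question; `df_toy` is a hypothetical five-row example, not the question's data. Ties are resolved by `which.max()`, which returns the first of the tied names in `table()`'s sorted order.

```r
library(dplyr)

# Hypothetical toy data: three rows share one geocode, two share another
df_toy <- data.frame(
  address = c("Green mansion, district, city",
              "Green mansion,city",
              "Green mansion, district, city",
              "blue red mansion, 46 pearl street, city",
              "blue red mansion, 46 pearl street, city"),
  lat  = c("12.22177", "12.22177", "12.22177", "23.24769", "23.24769"),
  long = c("111.9154", "111.9154", "111.9154", "112.9320", "112.9320")
)

most_common <- df_toy %>%
  group_by(long, lat) %>%
  summarise(
    count = n(),  # total rows at this geocode
    # table() counts each distinct spelling; which.max() picks the first
    # most frequent one, and names() returns that spelling as a string
    address = names(which.max(table(address))),
    .groups = "drop"
  )

most_common
```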
If you want separate columns such as bldg1, bldg2, etc., convert the data to wide format.
df %>%
  group_by(long,lat) %>%
  mutate(row = paste0('bldg', row_number()),   # bldg1, bldg2, ... within each location
         count = n()) %>%                      # rows per location
  ungroup %>%
  pivot_wider(names_from = row, values_from = address)   # one column per bldgN label
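The wide version above creates one column per row, so identical address strings at the same location each get their own bldgN column. If you only want one column per distinct address string (closer to "each distinct building name"), one option is to count first, drop repeated spellings, and then pivot. This is a sketch under the same assumptions, again using the hypothetical `df_toy` data rather than the question's:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: three rows share one geocode, two share another
df_toy <- data.frame(
  address = c("Green mansion, district, city",
              "Green mansion,city",
              "Green mansion, district, city",
              "blue red mansion, 46 pearl street, city",
              "blue red mansion, 46 pearl street, city"),
  lat  = c("12.22177", "12.22177", "12.22177", "23.24769", "23.24769"),
  long = c("111.9154", "111.9154", "111.9154", "112.9320", "112.9320")
)

wide_distinct <- df_toy %>%
  add_count(long, lat, name = "count") %>%             # total rows per geocode, before de-duplicating
  distinct(long, lat, address, .keep_all = TRUE) %>%   # one row per distinct spelling per geocode
  group_by(long, lat) %>%
  mutate(row = paste0("bldg", row_number())) %>%       # bldg1, bldg2, ... per location
  ungroup() %>%
  pivot_wider(names_from = row, values_from = address) # locations with fewer spellings get NA

wide_distinct
```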