我有数据集
x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
Longitude = c(100, 101, 102, 102, 103, 104),
Exposure = c(1, 2, 3, 4, 5, 6))
我正在尝试操纵x内的数据
x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
Longitude = c(100, 101, 102, 102, 103, 104),
Exposure = c(1, 2, 3, 4, 5, 6),
coords = c("3.1, 100", "3.2, 101", "3.3, 102", "3.3, 102",
"3.4, 103", "3.4, 104"),
postcode = c("1", "2", "3,4", "3,4", "5", "6"),
exposure = c(1, 2, 7, 7, 5, 6))
新列postcode
将粘贴具有相同Latitude
和Longitude
的Postcode
。coords
将粘贴Latitude
和Longitude
,而exposure
将总和具有相同coords
的Exposure
,即相同的Latitude
和Longitude
。
我可以使用dplyr
软件包和for
循环
x <- mutate(x, coords = paste(Latitude, Longitude, sep = ", "))
x <- cbind(x, postcode = rep(0, nrow(x)), exposure = rep(0, nrow(x)))
for(i in unique(x$coords)){
x$postcode[x$coords == i] <- paste(x$Postcode[x$coords == i], collapse = ", ")
x$exposure[x$coords == i] <- sum(x$Exposure[x$coords == i])
}
如何仅使用dplyr
软件包,而不使用for
循环来实现这一目标?也许其他方法比使用for
循环更有效,因为我的实际数据集很大
library(dplyr)
library(tidyr) # unite() was used to join Lat, Lon
x %>% unite(coords, Latitude, Longitude, sep = ",", remove = FALSE) %>%
group_by(coords) %>% mutate(exposure = sum(Postcode), postcode = toString(Postcode))
这是您可以使用dplyr
:
library(dplyr)
x %>%
group_by(coords = paste(Latitude, Longitude, sep = ", ")) %>%
mutate(postcode = toString(Postcode), exposure = sum(Exposure))
# Source: local data frame [6 x 7]
# Groups: coords [5]
#
# Postcode Latitude Longitude Exposure coords postcode exposure
# <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl>
# 1 1 3.1 100 1 3.1, 100 1 1
# 2 2 3.2 101 2 3.2, 101 2 2
# 3 3 3.3 102 3 3.3, 102 3, 4 7
# 4 4 3.3 102 4 3.3, 102 3, 4 7
# 5 5 3.4 103 5 3.4, 103 5 5
# 6 6 3.4 104 6 3.4, 104 6 6
我们可以使用 data.table
library(data.table)
setDT(x)[, coords := paste(Latitude, Longitude, sep="," )
][, c("exposure", "postcode") :=.(sum(Postcode), toString(Postcode)), coords]
x
# Postcode Latitude Longitude Exposure coords exposure postcode
#1: 1 3.1 100 1 3.1,100 1 1
#2: 2 3.2 101 2 3.2,101 2 2
#3: 3 3.3 102 3 3.3,102 7 3, 4
#4: 4 3.3 102 4 3.3,102 7 3, 4
#5: 5 3.4 103 5 3.4,103 5 5
#6: 6 3.4 104 6 3.4,104 6 6