I have a dataset that looks like this:
site lat long
bras2 41.21 -115.11
tex4 45.3 -112.31
bras2 41.15 -115.15
bras2 41.12 -115.19
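For samples that share the same site name, I want to calculate their centre point and add it to the dataset as columns. Some site names occur twice, others three times, and others four times.
Like this: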
site lat long centre_lat centre_long
bras2 41.21 -115.11 value here value here
tex4 45.3 -112.31 45.3 -112.31
bras2 41.15 -115.15 value here value here
bras2 41.12 -115.19 value here value here
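How can I do this?
If you are working with spatial data, you should consider the sf package. It handles geometries, and the functions that operate on them, well.
The code below shows both sf::st_centroid and geosphere::centroid. I prefer the sf way of doing things.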
df <- read.table(header=TRUE, text= "site lat long
bras2 41.21 -115.11
tex4 45.3 -112.31
bras2 41.15 -115.15
bras2 41.12 -115.19")
library(dplyr)
library(geosphere)
library(sf)
# Using sf's st_centroid
df_sf <- st_as_sf(df, coords = c('long', 'lat'))
centroids_sf <- df_sf %>%
  group_by(site) %>%
  summarize(geometry = st_union(geometry)) %>%
  st_centroid()
# Using geosphere::centroid
centroids_geosphere <- df_sf %>%
  group_by(site) %>%
  filter(n() > 2) %>% ## geosphere needs polygons, therefore 3+ points
  st_union() %>%
  st_cast('POLYGON') %>%
  as('Spatial') %>% # geosphere expects SpatialPolygons objects
  centroid()
centroids_geosphere
#> [,1] [,2]
#> [1,] -115.15 41.16001
centroids_sf
#> Simple feature collection with 2 features and 1 field
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -115.15 ymin: 41.16 xmax: -112.31 ymax: 45.3
#> CRS: NA
#> # A tibble: 2 x 2
#> site geometry
#> * <chr> <POINT>
#> 1 bras2 (-115.15 41.16)
#> 2 tex4 (-112.31 45.3)
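The two results look close enough to be the same point. I don't think geosphere::centroid can give a centroid for a single point, but I may be wrong. sf::st_centroid has no problem with 1, 2 or more points.
Created on 2020-12-20 by the reprex package (v0.3.0)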
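The question asks for the centroids as extra columns on the original rows, while the sf result above has one row per site. A minimal sketch of one way to join it back, assuming the column names centre_lat and centre_long from the question (the objects xy, centre and df_out are hypothetical names used only for illustration):

```r
library(dplyr)
library(sf)

df <- read.table(header = TRUE, text = "site lat long
bras2 41.21 -115.11
tex4 45.3 -112.31
bras2 41.15 -115.15
bras2 41.12 -115.19")

# Same per-site centroid computation as in the answer above
centroids_sf <- st_as_sf(df, coords = c("long", "lat")) %>%
  group_by(site) %>%
  summarize(geometry = st_union(geometry)) %>%
  st_centroid()

# st_coordinates() returns a matrix with columns X (long) and Y (lat)
xy <- st_coordinates(centroids_sf)
centre <- data.frame(site        = centroids_sf$site,
                     centre_lat  = unname(xy[, "Y"]),
                     centre_long = unname(xy[, "X"]))

# left_join() keeps the original row order and repeats each
# site's centroid on every row of that site
df_out <- left_join(df, centre, by = "site")
df_out
```

A site with a single row, such as tex4, simply gets its own coordinates back as its centroid.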
After stripping the site numbers with gsub, you can use ave to compute the means grouped by site name.
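within(dat, {
  g <- gsub("\\d", "", site)  # drop digits: "bras2" -> "bras", "foo11" -> "foo"
  mid.lat <- ave(lat, g)      # group mean, repeated on every row of the group
  mid.long <- ave(long, g)
  rm(g)
})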
# site lat long mid.long mid.lat
# 1 bras2 41.21 -115.11 -115.150 41.160
# 2 tex4 45.30 -112.31 -112.310 45.300
# 3 bras2 41.15 -115.15 -115.150 41.160
# 4 bras2 41.12 -115.19 -115.150 41.160
# 5 foo1 42.10 -123.10 -123.225 42.225
# 6 foo2 42.20 -123.20 -123.225 42.225
# 7 foo11 42.30 -123.30 -123.225 42.225
# 8 foo12 42.30 -123.30 -123.225 42.225
Or, if you want NA for sites that occur only once:
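within(dat, {
  g <- gsub("\\d", "", site)
  # group sizes as integers (ave() on the character column would return strings)
  n <- ave(seq_along(site), g, FUN = length)
  mid.lat <- rep(NA_real_, length(lat))
  mid.long <- rep(NA_real_, length(long))
  mid.lat[n > 1] <- ave(lat[n > 1], g[n > 1])
  mid.long[n > 1] <- ave(long[n > 1], g[n > 1])
  rm(g, n)
})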
# site lat long mid.long mid.lat
# 1 bras2 41.21 -115.11 -115.150 41.160
# 2 tex4 45.30 -112.31 NA NA
# 3 bras2 41.15 -115.15 -115.150 41.160
# 4 bras2 41.12 -115.19 -115.150 41.160
# 5 foo1 42.10 -123.10 -123.225 42.225
# 6 foo2 42.20 -123.20 -123.225 42.225
# 7 foo11 42.30 -123.30 -123.225 42.225
# 8 foo12 42.30 -123.30 -123.225 42.225
Data:
dat <- structure(list(site = c("bras2", "tex4", "bras2", "bras2", "foo1",
"foo2", "foo11", "foo12"), lat = c(41.21, 45.3, 41.15, 41.12,
42.1, 42.2, 42.3, 42.3), long = c(-115.11, -112.31, -115.15,
-115.19, -123.1, -123.2, -123.3, -123.3)), class = "data.frame", row.names = c(NA,
-8L))
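Note that stripping the digits merges foo1, foo2, foo11 and foo12 into a single group. If, as in the question, only rows with exactly the same site name should be pooled, a minimal sketch is to pass site itself as the grouping variable to ave. This is the plain arithmetic mean of the coordinates, not a geodesic centroid, and the column names centre_lat and centre_long are taken from the question; out is a hypothetical name.

```r
dat <- data.frame(
  site = c("bras2", "tex4", "bras2", "bras2", "foo1", "foo2", "foo11", "foo12"),
  lat  = c(41.21, 45.3, 41.15, 41.12, 42.1, 42.2, 42.3, 42.3),
  long = c(-115.11, -112.31, -115.15, -115.19, -123.1, -123.2, -123.3, -123.3)
)

# ave() returns the group mean repeated for every row of the group,
# so the result lines up with the original rows
out <- transform(dat,
                 centre_lat  = ave(lat, site),
                 centre_long = ave(long, site))
out
```

Here each foo site keeps its own coordinates because every foo name occurs only once.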
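The geosphere package has a function, centroid, for this kind of problem.
It is straightforward as long as the shape has more than one point. Most of the code below deals with the single-point case from the example above.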
df <- read.table(header=TRUE, text= "site lat long
bras2 41.21 -115.11
tex4 45.3 -112.31
bras2 41.15 -115.15
bras2 41.12 -115.19")
library(dplyr)
library(geosphere)
# centroid() works on one set of points at a time, so split the data by site
sites <- split(df, df$site)
results <- lapply(sites, function(x) {
  if (nrow(x) > 1) {
    # columns 3 and 2 are long and lat; centroid() expects longitude first
    value <- as.data.frame(centroid(x[, c(3, 2)]))
  } else {
    # single point: use the point itself as the centroid
    value <- x[1, c(3, 2)]
    names(value) <- c("lon", "lat")
  }
  value$site <- x$site[1]
  value
})
answer <- bind_rows(results)
answer
#       lon      lat  site
# 1 -115.15 41.16001 bras2
# 2 -112.31 45.30000 tex4
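The result above has one row per site. To get the layout asked for in the question, with the centroid repeated on every original row, the per-site table can be joined back onto the data. A minimal sketch, assuming the column names centre_lat and centre_long from the question (per_site and df_out are hypothetical names):

```r
library(dplyr)
library(geosphere)

df <- read.table(header = TRUE, text = "site lat long
bras2 41.21 -115.11
tex4 45.3 -112.31
bras2 41.15 -115.15
bras2 41.12 -115.19")

# One centroid per site, using the same logic as the answer above
per_site <- lapply(split(df, df$site), function(x) {
  if (nrow(x) > 1) {
    value <- as.data.frame(centroid(as.matrix(x[, c("long", "lat")])))
  } else {
    value <- x[1, c("long", "lat")]  # single point: keep it as is
  }
  names(value) <- c("centre_long", "centre_lat")
  value$site <- x$site[1]
  value
})

# Repeat each site's centroid on every row belonging to that site
df_out <- left_join(df, bind_rows(per_site), by = "site")
df_out
```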