R: Calculate the centre point of multiple lat/long points in a data frame



I have a dataset that looks like this:

site   lat      long 
bras2  41.21   -115.11
tex4   45.3    -112.31
bras2  41.15   -115.15 
bras2  41.12   -115.19

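For samples with the same site name, I want to calculate their centre point and then add it to the dataset as columns. Some site names are repeated twice, others three times, and others four times.

Like this: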

site   lat      long    centre_lat  centre_long 
bras2  41.21   -115.11  value here     value here
tex4   45.3    -112.31  45.3           -112.31 
bras2  41.15   -115.15  value here     value here
bras2  41.12   -115.19  value here     value here

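How can I do this?

If you are working with spatial data, you should consider using the sf package. It handles geometries, and the functions that operate on them, well.

The code below uses both sf::st_centroid and geosphere::centroid. I prefer sf's way of doing things.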

df <- read.table(header=TRUE, text= "site   lat      long 
bras2  41.21   -115.11
tex4   45.3    -112.31
bras2  41.15   -115.15 
bras2  41.12   -115.19")

library(dplyr)
library(geosphere)
library(sf)
# Using sf's st_centroid
df_sf <- st_as_sf(df, coords = c('long', 'lat'))
centroids_sf <- df_sf %>%
  group_by(site) %>%
  summarize(geometry = st_union(geometry)) %>%
  st_centroid()

# Using geosphere::centroid
centroids_geosphere <- df_sf %>%
  group_by(site) %>%
  filter(n() > 2) %>% # geosphere needs polygons, therefore 3+ points
  st_union() %>%
  st_cast('POLYGON') %>%
  as('Spatial') %>% # geosphere expects SpatialPolygons objects
  centroid()

centroids_geosphere
#>         [,1]     [,2]
#> [1,] -115.15 41.16001
centroids_sf
#> Simple feature collection with 2 features and 1 field
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -115.15 ymin: 41.16 xmax: -112.31 ymax: 45.3
#> CRS:            NA
#> # A tibble: 2 x 2
#>   site         geometry
#> * <chr>         <POINT>
#> 1 bras2 (-115.15 41.16)
#> 2 tex4   (-112.31 45.3)

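The two results look close enough to the same point. I don't think geosphere::centroid can give the centroid of a single point, but I may be wrong. sf::st_centroid has no problem with 1, 2, or more points.

Created on 2020-12-20 by the reprex package (v0.3.0)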
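The question asks for the centre as two extra columns on the original rows, which the code above stops short of. The following is a minimal sketch of one way to get there, assuming the same `df` and the same sf pipeline as above. The names `xy`, `centres` and `result` are illustrative, not part of the original answer.

```r
library(dplyr)
library(sf)

df <- read.table(header = TRUE, text = "site lat long
bras2 41.21 -115.11
tex4 45.3 -112.31
bras2 41.15 -115.15
bras2 41.12 -115.19")

# One centroid per site, as in the sf pipeline above
centroids_sf <- st_as_sf(df, coords = c("long", "lat")) %>%
  group_by(site) %>%
  summarize(geometry = st_union(geometry)) %>%
  st_centroid()

# st_coordinates() returns a matrix with columns X (long) and Y (lat)
xy <- st_coordinates(centroids_sf)
centres <- data.frame(site = centroids_sf$site,
                      centre_lat = xy[, "Y"],
                      centre_long = xy[, "X"])

# Attach the per-site centre to every original row
result <- left_join(df, centres, by = "site")
result
```

A site with a single row, such as tex4, simply gets its own coordinates back as the centre.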

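After stripping the site number with gsub, you can use ave to compute the mean grouped by site name.

within(dat, {
  g <- gsub("\\d", "", site)  # strip digits, so foo1 and foo11 share a group
  mid.lat <- ave(lat, g)      # group mean, repeated on every row of the group
  mid.long <- ave(long, g)
  rm(g)                       # drop the helper so it is not added as a column
})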
#    site   lat    long mid.long mid.lat
# 1 bras2 41.21 -115.11 -115.150  41.160
# 2  tex4 45.30 -112.31 -112.310  45.300
# 3 bras2 41.15 -115.15 -115.150  41.160
# 4 bras2 41.12 -115.19 -115.150  41.160
# 5  foo1 42.10 -123.10 -123.225  42.225
# 6  foo2 42.20 -123.20 -123.225  42.225
# 7 foo11 42.30 -123.30 -123.225  42.225
# 8 foo12 42.30 -123.30 -123.225  42.225

Or, if you would rather have NA for sites that have only a single point:

within(dat, {
  g <- gsub("\\d", "", site)
  n <- ave(seq_along(site), g, FUN=length)  # numeric group sizes
  mid.lat <- NA
  mid.long <- NA
  mid.lat[n > 1] <- ave(lat[n > 1], g[n > 1])
  mid.long[n > 1] <- ave(long[n > 1], g[n > 1])
  rm(g, n)
})
#    site   lat    long mid.long mid.lat
# 1 bras2 41.21 -115.11 -115.150  41.160
# 2  tex4 45.30 -112.31       NA      NA
# 3 bras2 41.15 -115.15 -115.150  41.160
# 4 bras2 41.12 -115.19 -115.150  41.160
# 5  foo1 42.10 -123.10 -123.225  42.225
# 6  foo2 42.20 -123.20 -123.225  42.225
# 7 foo11 42.30 -123.30 -123.225  42.225
# 8 foo12 42.30 -123.30 -123.225  42.225

Data:

dat <- structure(list(site = c("bras2", "tex4", "bras2", "bras2", "foo1", 
"foo2", "foo11", "foo12"), lat = c(41.21, 45.3, 41.15, 41.12, 
42.1, 42.2, 42.3, 42.3), long = c(-115.11, -112.31, -115.15, 
-115.19, -123.1, -123.2, -123.3, -123.3)), class = "data.frame", row.names = c(NA, 
-8L))

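The geosphere package has a function, centroid, for this kind of problem. It is straightforward as long as the shape has more than one point. Most of the code below deals with the single-point case in the example above.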

df <- read.table(header=TRUE, text= "site   lat      long 
bras2  41.21   -115.11
tex4   45.3    -112.31
bras2  41.15   -115.15 
bras2  41.12   -115.19")

library(dplyr)
library(geosphere)

# Handle each site separately
sites <- split(df, df$site)
results <- lapply(sites, function(x) {
  if (nrow(x) > 1) {
    # columns 3 and 2 are long and lat, the order centroid() expects
    value <- as.data.frame(centroid(x[, c(3, 2)]))
  } else {
    # a single point is its own centre
    value <- x[1, c(3, 2)]
    names(value) <- c("lon", "lat")
  }
  value$site <- x$site[1]
  value
})
answer <- bind_rows(results)
      lon      lat  site
1 -115.15 41.16001 bras2
2 -112.31 45.30000  tex4
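For points this close together, a plain average of lat and long lands on almost the same spot as the centroid above. As a rough illustration of why a spherical calculation can still matter, the sketch below averages the points as 3D unit vectors and converts back. `sphere_mean` is a hypothetical helper written for this example, not a geosphere function, and it computes the mean of the points rather than the centroid of the polygon they outline.

```r
# Hypothetical helper: mean position of points on a unit sphere
sphere_mean <- function(lat, long) {
  phi <- lat * pi / 180   # latitude in radians
  lam <- long * pi / 180  # longitude in radians
  # Average the Cartesian coordinates of the points
  x <- mean(cos(phi) * cos(lam))
  y <- mean(cos(phi) * sin(lam))
  z <- mean(sin(phi))
  # Convert the averaged vector back to degrees
  c(lat  = atan2(z, sqrt(x^2 + y^2)) * 180 / pi,
    long = atan2(y, x) * 180 / pi)
}

# The three bras2 points: close to the arithmetic mean (41.16, -115.15)
bras2 <- sphere_mean(c(41.21, 41.15, 41.12), c(-115.11, -115.15, -115.19))
bras2

# Two points straddling the antimeridian: the result sits near +/-180,
# whereas a plain mean of 179 and -179 would give 0
wrap <- sphere_mean(c(0, 0), c(179, -179))
wrap
```

For sites whose points all sit within a fraction of a degree of each other, as in the example data, the simpler averages are likely to be sufficient.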
