Spatial point distance analysis by zone/group in R

I have a dataset that looks like this, though much larger:

## Fake data for stack
exdb <- data.frame(
  zone = c(1, 1, 1, 2, 2, 2),
  site = c("study", "collect", "collect", "study", "collect", "collect"),
  x = c(53.307726, 53.310660, 53.307089, 53.313831, 53.319087, 53.318792),
  y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815)
)

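I need to run a point analysis between the study sites and the collection sites to get the distances in metres. The catch is that I have many different zones, or groups, and they are all independent (i.e. the distance from a point in zone 1 to a point in zone 2 is irrelevant).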

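So I need to do two things:

a point analysis that calculates the distance in metres between the one study point and the multiple collection points in each zone,

and then a FOREACH or LOOP function that computes this distance for every group in the dataset.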

The ideal output would be:

exdb <- data.frame(zone = c(1,1,1,2,2,2),
site = c("study", "collect", "collect", "study", "collect", "collect"),
x = c(53.307726, 53.310660, 53.307089, 53.313831, 53.319087, 53.318792),
y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815),
dist = c(0, 10.3, 30.4, 0, 12.5, 11.2))

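The study site in each zone is always 0, because that is its distance to itself, and the distance for each collection site is only calculated to the study site of its own zone.

Many thanks.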


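A simple base R version that needs no packages beyond geosphere (for distGeo).

Start with exdb from above.

First add a new column called dist with the value "study", because the plan is to self-merge on zone and site == "study":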

> exdb$dist = "study"

Self-merge, keeping only the coordinate columns:

> MM = merge(exdb, exdb,
by.x=c("zone","site"),
by.y=c("zone","dist"))[,c("x.x","y.x","x.y","y.y")]

Overwrite the dist column using distGeo, which keeps things tidy:

> library(geosphere)  # provides distGeo
> exdb$dist = distGeo(MM[,2:1], MM[,4:3])
> exdb
  zone    site        x         y     dist
1    1   study 53.30773 -6.222291   0.0000
2    1 collect 53.31066 -6.217151 473.2943
3    1 collect 53.30709 -6.215080 485.8806
4    2   study 53.31383 -6.214152   0.0000
5    2 collect 53.31909 -6.218723 659.5238
6    2 collect 53.31879 -6.215815 563.1349

This returns the same answer as @wimpel's, but with no extra dependencies and fewer lines of code.
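As a hedged alternative to the self-merge, here is a minimal sketch that looks up each row's study site with `match()` and needs no packages at all. The helper `haversine_m` is a hypothetical name, not a library function. It assumes a spherical Earth of radius 6371 km, so its results differ slightly from the ellipsoidal `distGeo` (roughly 472 m rather than 473.29 m for the first collect site). It also assumes exactly one study row per zone.

```r
# Example data from the question (x = latitude, y = longitude).
exdb <- data.frame(
  zone = c(1, 1, 1, 2, 2, 2),
  site = c("study", "collect", "collect", "study", "collect", "collect"),
  x = c(53.307726, 53.310660, 53.307089, 53.313831, 53.319087, 53.318792),
  y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815)
)

# Hypothetical helper: great-circle distance in metres on a sphere.
# The radius is an assumption; distGeo uses the WGS84 ellipsoid instead.
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(a))
}

# For every row, find the study site of the same zone.
# Assumes exactly one "study" row per zone.
study <- exdb[exdb$site == "study", ]
idx <- match(exdb$zone, study$zone)

# Vectorised over all rows; study rows get a distance of 0 to themselves.
exdb$dist <- haversine_m(study$x[idx], study$y[idx], exdb$x, exdb$y)
exdb
```

Because the lookup is vectorised, no explicit loop over zones is needed; rows in different zones are never compared with each other.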

I'm still learning the spatial side of things, but does this work?

library(sf)
library(tidyverse)
exdb %>%
  arrange(zone, desc(site)) %>% # ensure study is first
  # coords must be given as (longitude, latitude); in exdb, x is latitude and y is longitude
  st_as_sf(coords = c("y", "x"), crs = 4326) %>%
  group_by(zone) %>%
  mutate(
    study_coord = geometry[1],
    dist = st_distance(geometry, study_coord, by_element = TRUE)
  )

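Maybe something like this?

Assuming x and y are latitude and longitude, we can use the haversine function to get the distance in metres, after pivoting the table so that both points sit in the same row where the distance is computed: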

library(tidyverse)
library(pracma)
#> 
#> Attaching package: 'pracma'
#> The following object is masked from 'package:purrr':
#> 
#>     cross
data <- data.frame(
  zone = c(1, 1, 1, 2, 2, 2),
  site = c("study", "collect", "collect", "study", "collect", "collect"),
  x = c(53.307726, 53.310660, 53.307089, 53.313831, 53.319087, 53.318792),
  y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815)
)
data %>%
  # one row per zone; values_fn = list keeps the duplicate collect sites as list-columns
  pivot_wider(names_from = site, values_from = c(x, y), values_fn = list) %>%
  # expand the list-columns; the single study point is recycled for each collect point
  unnest(c(y_collect, y_study, x_collect, x_study)) %>%
  mutate(
    # haversine() returns kilometres, so multiply by 1000 for metres
    dist = list(x_study, y_study, x_collect, y_collect) %>%
      pmap_dbl(~ haversine(c(..1, ..2), c(..3, ..4)) * 1000)
  )
#> # A tibble: 4 x 6
#>    zone x_study x_collect y_study y_collect  dist
#>   <dbl>   <dbl>     <dbl>   <dbl>     <dbl> <dbl>
#> 1     1    53.3      53.3   -6.22     -6.22  472.
#> 2     1    53.3      53.3   -6.22     -6.22  484.
#> 3     2    53.3      53.3   -6.21     -6.22  659.
#> 4     2    53.3      53.3   -6.21     -6.22  563.

Created on 2021-09-13 by the reprex package (v2.0.1)
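For reference, the haversine distance used in this answer is the great-circle distance on a sphere. A sketch of the formula, with latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$ in radians and a sphere radius $R$ (assumed here to be about 6371 km, hence the multiplication by 1000 to get metres):

```latex
d = 2R \arcsin\sqrt{
      \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
      + \cos\varphi_1 \cos\varphi_2
        \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)
    }
```

Because this treats the Earth as a sphere, the results differ by about a metre from the ellipsoidal distGeo values in the other answers (472 m versus 473.29 m for the first collect site).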

I believe this should work, but I could not reproduce the distances in your desired output.

library(data.table)
library(purrr) # Or tidyverse
library(geosphere)
# Make your data a data.table (mydata is your data frame, e.g. exdb from the question)
setDT(mydata)
# Split to a nested list based on zone and site
L <- split(mydata, by = c("zone", "site"), flatten = FALSE)
# Loop over list
L <- lapply(L, function(zone) {
  # Get reference point to take distance from, as (longitude, latitude)
  point.study <- c(zone$study$y, zone$study$x)
  zone$study$dist <- 0
  # Calculate distance from the study point to each collect point
  zone$collect$dist <- unlist(purrr::pmap(
    list(a = zone$collect$y, b = zone$collect$x),
    ~ (geosphere::distGeo(point.study, c(..1, ..2)))
  ))
  return(zone)
})
# Rowbind the results together
data.table::rbindlist(lapply(L, data.table::rbindlist))
#    zone    site        x         y     dist
# 1:    1   study 53.30773 -6.222291   0.0000
# 2:    1 collect 53.31066 -6.217151 473.2943
# 3:    1 collect 53.30709 -6.215080 485.8806
# 4:    2   study 53.31383 -6.214152   0.0000
# 5:    2 collect 53.31909 -6.218723 659.5238
# 6:    2 collect 53.31879 -6.215815 563.1349
