我有一个看起来像这样的数据集,虽然大得多
### ##Fake data for stack exdb <- data.frame(zone =
c(1,1,1,2,2,2), site = c("study", "collect", "collect", "study",
"collect", "collect"), x = c(53.307726, 53.310660, 53.307089,
53.313831, 53.319087, 53.318792), y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815))
我需要在研究站点和收集站点之间运行一个点分析,以查看以米为单位的距离。问题是我有许多不同的区域或组,它们都是独立的(即从区域1中的点到区域2中的点的距离是无关的)。
因此我需要做两件事,
点分析,以米为单位计算每个区域一个研究点与多个收集点之间的距离,
,然后写一个FOREACH或LOOP函数来计算数据集中每个组的这个距离。
最优输出应该是
exdb <- data.frame(zone = c(1,1,1,2,2,2),
site = c("study", "collect", "collect", "study", "collect", "collect"),
x = c(53.307726, 53.310660, 53.307089, 53.313831, 53.319087, 53.318792),
y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815),
dist = c(0, 10.3, 30.4, 0, 12.5, 11.2))
每个区域的研究地点总是0,因为它是到该地点的距离,并且到每个收集地点的距离只计算到每个独特区域的研究地点。
非常感谢。
杀
简单的Base R版本,不需要其他软件包。
从上面的exdb
开始。
首先添加一个名为dist
的新列,其值为"study"
,因为计划在zone
和site=="study"
上进行自合并:
> exdb$dist = "study"
自合并,只保留坐标列:
> MM = merge(exdb, exdb,
by.x=c("zone","site"),
by.y=c("zone","dist"))[,c("x.x","y.x","x.y","y.y")]
使用distGeo
覆盖dist
列。保持整洁:
> exdb$dist = distGeo(MM[,2:1],MM[,4:3])
> exdb
zone site x y dist
1 1 study 53.30773 -6.222291 0.0000
2 1 collect 53.31066 -6.217151 473.2943
3 1 collect 53.30709 -6.215080 485.8806
4 2 study 53.31383 -6.214152 0.0000
5 2 collect 53.31909 -6.218723 659.5238
6 2 collect 53.31879 -6.215815 563.1349
返回与@wimpel相同的答案,但没有额外的依赖关系,并且代码行更少。
我还在学习空间方面,但这是否有效?
library(sf)
library(tidyverse)
exdb %>%
arrange(zone, desc(site)) %>% #ensure study is first
st_as_sf(coords = c("x", "y"), crs = 4326) %>%
group_by(zone) %>%
mutate(
study_coord = geometry[1],
dist = st_distance(geometry, study_coord, by_element = T),
)
也许是这样的?
假设x和y是纬度和经度,我们可以使用haversine
函数在旋转表格后获得以米为单位的距离,使两个点在计算距离的行中(以米为单位):
library(tidyverse)
library(pracma)
#>
#> Attaching package: 'pracma'
#> The following object is masked from 'package:purrr':
#>
#> cross
data <- data.frame(zone = c(1, 1, 1, 2, 2, 2), site = c(
"study", "collect", "collect", "study",
"collect", "collect"
), x = c(
53.307726, 53.310660, 53.307089,
53.313831, 53.319087, 53.318792
), y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815))
data %>%
pivot_wider(names_from = site, values_from = c(x, y)) %>%
unnest(y_collect, y_study, x_collect, x_study) %>%
mutate(
dist = list(x_study, y_study, x_collect, y_collect) %>% pmap_dbl(~haversine(c(..1, ..2), c(..3, ..4)) * 1000)
)
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> Warning: unnest() has a new interface. See ?unnest for details.
#> Try `df %>% unnest(c(y_collect, y_study, x_collect, x_study))`, with `mutate()` if needed
#> # A tibble: 4 x 6
#> zone x_study x_collect y_study y_collect dist
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 53.3 53.3 -6.22 -6.22 472.
#> 2 1 53.3 53.3 -6.22 -6.22 484.
#> 3 2 53.3 53.3 -6.21 -6.22 659.
#> 4 2 53.3 53.3 -6.21 -6.22 563.
由reprex包(v2.0.1)于2021-09-13创建
我相信这应该行得通。但我无法在期望的输出中再现您的距离。
library(data.table)
library(purrr) # Or tidyverse
library(geosphere)
# Make your data a data.table
setDT(mydata)
# Split to a list based on zone and site
L <- split(mydata, by = c("zone", "site"), flatten = FALSE)
# Loop over list
L <- lapply(L, function(zone) {
#get reference point to take dustance from
point.study <- c(zone$study$y,zone$study$x)
zone$study$dist <- 0
# Calculate distance
zone$collect$dist <- unlist(purrr::pmap( list(a = zone$collect$y,
b = zone$collect$x ),
~(geosphere::distGeo( point.study, c(..1, ..2)))))
return(zone)
})
# Rowbind the results together
data.table::rbindlist(lapply(L, data.table::rbindlist))
# zone site x y dist
# 1: 1 study 53.30773 -6.222291 0.0000
# 2: 1 collect 53.31066 -6.217151 473.2943
# 3: 1 collect 53.30709 -6.215080 485.8806
# 4: 2 study 53.31383 -6.214152 0.0000
# 5: 2 collect 53.31909 -6.218723 659.5238
# 6: 2 collect 53.31879 -6.215815 563.1349