我有一个数据集,我想将这些分组的行转换为列,并计算距离。
假设我有一个数据集,如下所示其中";订单;列只在组内相关,并且有许多种mod,一个组只使用一个mod。当然,数据帧有更多的带有组的行。
Group1 | order | Area_id | >start_date区域名称 | lat | 长 | |
---|---|---|---|---|---|---|
A | 1 | 0012022-01-01 | <2022-0102>>trk | <1d>California>36.778261-119.4179324 | ||
A | 2 | 0002 | 2022-01-04 | 2022-01-07 | rk>加利福尼亚td> | -119.4179325 |
A | 3 | 0003 | 2022-01-10 | 2022-01-12 | >trkalifornia>36.778264 | -119.4179329|
A | 4 | 0004 | 2022-01-16 | 2022-01-20 | >trk利福尼亚td>-119.4179330 | |
B | 1 | 0012 | <2022-02-11>022-02-12bus | >Barcelona | 41.385063 | 2.1734036 |
B | 2 | 0013 | <2022-02-14>022-02-18>bus | >Barcelonb | <1td>41.3850642.1734037 | |
C | 1 | 0020 | <2022-02-27>2022-01-29 | ar>-122.2711133 | ||
C | 2 | 0012 | <2022-02-02>022-02-04>car | 2.1734036 | ||
C | 3 | 0009 | <2022-02-07>022-02-10car | >oaklandc | 37.8043637 | -122.2711237 |
澄清后更新:参见注释:
library(dplyr)
library(geosphere)
df %>%
group_by(Group1) %>%
mutate(end_date1 = lead(start_date)) %>%
mutate(start_date = end_date, .keep="unused") %>%
mutate(Area_id = str_pad(as.character(Area_id), 4, pad="0")) %>%
mutate(To_area_id = lead(Area_id), .after="Area_id") %>%
mutate(To_area = lead(Area_name), .after="Area_name") %>%
mutate(across(c(lat, long), ~lead(.), .names = "To_{.col}")) %>%
na.omit() %>%
rename(end_date = end_date1, From_area_id = Area_id, From_area = Area_name, From_lat=lat, From_long = long) %>%
rowwise() %>%
mutate(Distance = distHaversine(c(From_long, From_lat),
c(To_long, To_lat)), .keep="unused") %>%
select(Group = Group1, From_area_id, To_area_id,
From_area, To_area, start_date, end_date, mod, Distance)
Group From_area_id To_area_id From_area To_area start_date end_date mod Distance
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 A 0001 0002 California Californib 2022-01-02 2022-01-04 trk 0.112
2 A 0002 0003 Californib Californic 2022-01-07 2022-01-10 trk 0.225
3 A 0003 0004 Californic Californid 2022-01-12 2022-01-16 trk 0.112
4 B 0012 0013 Barcelona Barcelonb 2022-02-12 2022-02-14 bus 0.112
5 C 0020 0012 oaklanda Barcelona 2022-01-29 2022-02-02 car 9572260.
6 C 0012 0009 Barcelona oaklandc 2022-02-04 2022-02-07 car 9572257.
第一个答案:
类似这样的东西:
library(dplyr)
library(geosphere)
df %>%
group_by(Group1) %>%
mutate(Area_id = str_pad(as.character(Area_id), 4, pad="0")) %>%
mutate(To_area_id = lead(Area_id), .after="Area_id") %>%
mutate(To_area = lead(Area_name), .after="Area_name") %>%
mutate(across(c(lat, long), ~lead(.), .names = "To_{.col}")) %>%
na.omit() %>%
rename(From_area_id = Area_id, From_area = Area_name, From_lat=lat, From_long = long) %>%
rowwise() %>%
mutate(Distance = distHaversine(c(From_long, From_lat),
c(To_long, To_lat)), .keep="unused")
Group1 order From_area_id To_area_id start_date end_date mod From_area To_area Distance
<chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 A 1 0001 0002 2022-01-01 2022-01-02 trk California Californib 0.112
2 A 2 0002 0003 2022-01-04 2022-01-07 trk Californib Californic 0.225
3 A 3 0003 0004 2022-01-10 2022-01-12 trk Californic Californid 0.112
4 B 1 0012 0013 2022-02-11 2022-02-12 bus Barcelona Barcelonb 0.112
5 C 1 0020 0012 2022-01-27 2022-01-29 car oaklanda Barcelona 9572260.
6 C 2 0012 0009 2022-02-02 2022-02-04 car Barcelona oaklandc 9572257.