将分组的行转置为r中的从到列

  • 本文关键字:转置 r row multiple-columns
  • 更新时间 :
  • 英文 :


我有一个数据集,我想将这些分组的行转换为列,并计算距离。

假设我有一个数据集,如下所示其中";订单;列只在组内相关,并且有许多种mod,一个组只使用一个mod。当然,数据帧有更多的带有组的行。

>001<2022-0102>><1d>California>36.778261rk>>trkalifornia>-119.4179329>trk利福尼亚td><2022-02-11>022-02-12><2022-02-14>022-02-18>><1td>41.385064<2022-02-27>ar><2022-02-02>022-02-04><2022-02-07>022-02-10>
Group1 order Area_idstart_date区域名称lat
A 12022-01-01trk-119.4179324
A 2 0002 2022-01-04 2022-01-07加利福尼亚td>-119.4179325
A 3 0003 2022-01-10 2022-01-1236.778264
A 4 0004 2022-01-16 2022-01-20-119.4179330
B 1 0012busBarcelona41.3850632.1734036
B 2 0013busBarcelonb2.1734037
C 1 00202022-01-29-122.2711133
C 2 0012car2.1734036
C 3 0009caroaklandc37.8043637-122.2711237

澄清后更新:参见注释:

library(dplyr)
library(geosphere)
df %>% 
group_by(Group1) %>% 
mutate(end_date1 = lead(start_date)) %>% 
mutate(start_date = end_date, .keep="unused") %>% 
mutate(Area_id = str_pad(as.character(Area_id), 4, pad="0")) %>%
mutate(To_area_id = lead(Area_id), .after="Area_id") %>% 
mutate(To_area = lead(Area_name), .after="Area_name") %>% 
mutate(across(c(lat, long), ~lead(.), .names = "To_{.col}")) %>% 
na.omit() %>% 
rename(end_date = end_date1, From_area_id = Area_id, From_area = Area_name, From_lat=lat, From_long = long) %>% 
rowwise() %>% 
mutate(Distance = distHaversine(c(From_long, From_lat), 
c(To_long, To_lat)), .keep="unused") %>% 
select(Group = Group1, From_area_id, To_area_id,
From_area, To_area, start_date, end_date, mod, Distance)
Group From_area_id To_area_id From_area  To_area    start_date end_date   mod      Distance
<chr> <chr>        <chr>      <chr>      <chr>      <chr>      <chr>      <chr>       <dbl>
1 A     0001         0002       California Californib 2022-01-02 2022-01-04 trk         0.112
2 A     0002         0003       Californib Californic 2022-01-07 2022-01-10 trk         0.225
3 A     0003         0004       Californic Californid 2022-01-12 2022-01-16 trk         0.112
4 B     0012         0013       Barcelona  Barcelonb  2022-02-12 2022-02-14 bus         0.112
5 C     0020         0012       oaklanda   Barcelona  2022-01-29 2022-02-02 car   9572260.   
6 C     0012         0009       Barcelona  oaklandc   2022-02-04 2022-02-07 car   9572257.   

第一个答案:

类似这样的东西:

library(dplyr)
library(geosphere)
df %>% 
group_by(Group1) %>% 
mutate(Area_id = str_pad(as.character(Area_id), 4, pad="0")) %>%
mutate(To_area_id = lead(Area_id), .after="Area_id") %>% 
mutate(To_area = lead(Area_name), .after="Area_name") %>% 
mutate(across(c(lat, long), ~lead(.), .names = "To_{.col}")) %>% 
na.omit() %>% 
rename(From_area_id = Area_id, From_area = Area_name, From_lat=lat, From_long = long) %>% 
rowwise() %>% 
mutate(Distance = distHaversine(c(From_long, From_lat), 
c(To_long, To_lat)), .keep="unused")
Group1 order From_area_id To_area_id start_date end_date   mod   From_area  To_area       Distance
<chr>  <int> <chr>        <chr>      <chr>      <chr>      <chr> <chr>      <chr>            <dbl>
1 A          1 0001         0002       2022-01-01 2022-01-02 trk   California Californib       0.112
2 A          2 0002         0003       2022-01-04 2022-01-07 trk   Californib Californic       0.225
3 A          3 0003         0004       2022-01-10 2022-01-12 trk   Californic Californid       0.112
4 B          1 0012         0013       2022-02-11 2022-02-12 bus   Barcelona  Barcelonb        0.112
5 C          1 0020         0012       2022-01-27 2022-01-29 car   oaklanda   Barcelona  9572260.   
6 C          2 0012         0009       2022-02-02 2022-02-04 car   Barcelona  oaklandc   9572257.  

相关内容

  • 没有找到相关文章

最新更新