R - Rearranging three similar tables by ID and stacking values



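Hello, I have three tables, and I need to work out a process for restructuring them so they are ready to merge. Once I figure this out, I think it will help me for a long time.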

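In the first table, for each vehicleID involved in a traffic crash, I have a list of the damage sustained in the crash (fender, rear end, etc.). For each VEHICLEID I want to create three new IMPACTCODE variables that show up to three of the IMPACTTYPE codes listed in the table below (format char):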

itemindex    damageid    impacttype    vehicleid
1            2433        1             VEH1
2            2434        11            VEH1
3            2435        13            VEH1
1            2521        10            VEH012
2            2522        8             VEH012

I would like help restructuring it into the following, with one row per vehicleID:

vehicleid    impacttype_1     impacttype_2    impacttype_3    
VEH1         1                11              13
VEH012       10               8               NA

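The second table lists the personIDs involved in a traffic crash, along with several contributing circumstance codes. These codes describe whether the personID was reported as texting, speeding, behaving normally, or doing something else that contributed to the crash. There is no limit on the number of codes that can be assigned, but I only want to keep the first four. I want to do something similar to the first table, but here I am dealing with more variables, so I would like to understand how the two processes differ. Note that, as with PERSON_06, the codes do not always start in the first position.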

itemindex    circumstancecode    circumstanceid    circumstancetype    personid    reportnumber    
1            18                  4231703           Person              PERSON_01   REPORT1
2            17                  4231704           Person              PERSON_01   REPORT1
3            40.88               4231705           Person              PERSON_01   REPORT1
4            70.88               4231706           Person              PERSON_01   REPORT1
1            0                   4231707           Person              PERSON_02   REPORT1
1            61                  4231708           Person              PERSON_03   REPORT2
1            17                  4231709           Person              PERSON_04   REPORT3
1            40.88               4231710           Person              PERSON_05   REPORT4
1            0                   4231711           Person              PERSON_06   REPORT4
2            0                   4231712           Person              PERSON_06   REPORT4
3            55.88               4231713           Person              PERSON_06   REPORT4

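I want to structure this table as shown below (codes 1 to 4 are based on circumstancecode). Note that I put an NA wherever there is no code to fill one of the code_1 to code_4 variables: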

personid    reportnumber     circumstancetype    code_1    code_2   code_3    code_4
PERSON_01   REPORT1          PERSON              18        17       40.88     70.88
PERSON_02   REPORT1          PERSON              0         NA       NA        NA
PERSON_03   REPORT2          PERSON              61        NA       NA        NA
PERSON_04   REPORT3          PERSON              17        NA       NA        NA
PERSON_05   REPORT4          PERSON              40.88     NA       NA        NA
PERSON_06   REPORT4          PERSON              0         0        55.88     NA

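I want to keep all code_ values as character, because some values contain letters and I want to preserve the exact formatting of the raw data (so 40.88 is a character value).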

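For the last table, the requirement is slightly different. I want to reshape this table, which lists the personIDs of people who witnessed the crash.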

itemindex    personid    reportnumber
1            PERSON_55   REPORT_1
2            PERSON_56   REPORT_1
1            PERSON_44   REPORT_2
1            PERSON_32   REPORT_3
2            PERSON_34   REPORT_3
3            PERSON_35   REPORT_3
4            PERSON_36   REPORT_3
1            PERSON_21   REPORT_4
2            PERSON_22   REPORT_4
3            PERSON_23   REPORT_4

I want each row to be one crash reportnumber, with up to three variables listing up to three witnesses:

reportnumber    witness1    witness2    witness3
REPORT_1        PERSON_55   PERSON_56   NA
REPORT_2        PERSON_44   NA          NA
REPORT_3        PERSON_32   PERSON_34   PERSON_35
REPORT_4        PERSON_21   PERSON_22   PERSON_23

Thank you very much for your help!!

Your three examples are very similar. I will give a general explanation first, followed by code for each example.

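With the tidyverse, you can use pivot_wider to reshape the data into wide format. In this case, you first want to enumerate the rows within each group, which can be done with row_number(). Then, in pivot_wider, you specify vehicleid as id_cols. The column names are derived from the row number, with the prefix "impacttype_" added in front of it. The values are taken from impacttype.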

library(tidyverse)
df1 %>%
  group_by(vehicleid) %>%
  mutate(rn = row_number()) %>%  # number the rows within each vehicle
  pivot_wider(id_cols = vehicleid, names_from = rn,
              values_from = impacttype, names_prefix = "impacttype_")

Output

vehicleid impacttype_1 impacttype_2 impacttype_3
<chr>            <int>        <int>        <int>
1 VEH1                 1           11           13
2 VEH012              10            8           NA

In the second example, you can use slice_head to keep the first 4 rows in each group. Otherwise, the pivot_wider call is much the same.

df2 %>%
  group_by(personid) %>%
  slice_head(n = 4) %>%          # keep at most the first four codes per person
  mutate(rn = row_number()) %>%  # number the remaining rows within each person
  pivot_wider(id_cols = c(personid, reportnumber, circumstancetype), names_from = rn,
              values_from = circumstancecode, names_prefix = "code_")

Output

personid  reportnumber circumstancetype code_1 code_2 code_3 code_4
<chr>     <chr>        <chr>             <dbl>  <dbl>  <dbl>  <dbl>
1 PERSON_01 REPORT1      Person             18       17   40.9   70.9
2 PERSON_02 REPORT1      Person              0       NA   NA     NA  
3 PERSON_03 REPORT2      Person             61       NA   NA     NA  
4 PERSON_04 REPORT3      Person             17       NA   NA     NA  
5 PERSON_05 REPORT4      Person             40.9     NA   NA     NA  
6 PERSON_06 REPORT4      Person              0        0   55.9   NA 
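The question asks for the code_ columns to stay character, while the output above is numeric. A minimal sketch of one way to handle that, assuming the codes are converted with as.character() before pivoting; df2_demo is a hypothetical two-person subset of the second table, built inline so the snippet runs on its own:

```r
library(dplyr)
library(tidyr)

# Hypothetical inline sample mirroring the second table (PERSON_01 and PERSON_06 only)
df2_demo <- data.frame(
  circumstancecode = c(18, 17, 40.88, 70.88, 0, 0, 55.88),
  circumstancetype = "Person",
  personid = c(rep("PERSON_01", 4), rep("PERSON_06", 3)),
  reportnumber = c(rep("REPORT1", 4), rep("REPORT4", 3))
)

wide2 <- df2_demo %>%
  # Convert before pivoting so every widened code_ column is character
  mutate(circumstancecode = as.character(circumstancecode)) %>%
  group_by(personid) %>%
  slice_head(n = 4) %>%
  mutate(rn = row_number()) %>%
  ungroup() %>%
  pivot_wider(
    id_cols = c(personid, reportnumber, circumstancetype),
    names_from = rn,
    values_from = circumstancecode,
    names_prefix = "code_"
  )
```

If the raw file contains codes with letters or trailing zeros, converting after import cannot restore them, so it is probably safer to read that column as character in the first place.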

In the third example, you already have an index to use with pivot_wider.

df3 %>%
pivot_wider(id_cols = reportnumber, names_from = itemindex, values_from = personid, names_prefix = "witness")

Output

reportnumber witness1  witness2  witness3  witness4 
<chr>        <chr>     <chr>     <chr>     <chr>    
1 REPORT_1     PERSON_55 PERSON_56 NA        NA       
2 REPORT_2     PERSON_44 NA        NA        NA       
3 REPORT_3     PERSON_32 PERSON_34 PERSON_35 PERSON_36
4 REPORT_4     PERSON_21 PERSON_22 PERSON_23 NA 
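The output above includes a witness4 column, whereas the question asks for at most three witnesses per report. A possible sketch, assuming itemindex numbers the witnesses consecutively from 1 within each report (if it does not, the row_number() approach from the first example could be used instead); df3_demo is a hypothetical subset of the third table:

```r
library(dplyr)
library(tidyr)

# Hypothetical inline sample: one report with four witnesses, one with a single witness
df3_demo <- data.frame(
  itemindex = c(1L, 2L, 3L, 4L, 1L),
  personid = c("PERSON_32", "PERSON_34", "PERSON_35", "PERSON_36", "PERSON_44"),
  reportnumber = c(rep("REPORT_3", 4), "REPORT_2")
)

wide3 <- df3_demo %>%
  # Drop the fourth and later witnesses before widening
  filter(itemindex <= 3) %>%
  pivot_wider(
    id_cols = reportnumber,
    names_from = itemindex,
    values_from = personid,
    names_prefix = "witness"
  )
```

Filtering before the pivot keeps the result at three witness columns regardless of how many witnesses a single report has.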

Data 1

library(dplyr)
library(tidyr)
dat1 %>%
  select(-damageid) %>%
  # id_cols is named explicitly; recent tidyr versions warn when it is passed by position
  pivot_wider(id_cols = vehicleid, names_from = "itemindex", names_prefix = "impacttype_",
              values_from = "impacttype")
# # A tibble: 2 x 4
#   vehicleid impacttype_1 impacttype_2 impacttype_3
#   <chr>            <int>        <int>        <int>
# 1 VEH1                 1           11           13
# 2 VEH012              10            8           NA

As a bonus (if you want to keep damageid), you can widen two columns at once:

dat1 %>%
  pivot_wider(id_cols = vehicleid, names_from = "itemindex",
              values_from = c("impacttype", "damageid"))
# # A tibble: 2 x 7
#   vehicleid impacttype_1 impacttype_2 impacttype_3 damageid_1 damageid_2 damageid_3
#   <chr>            <int>        <int>        <int>      <int>      <int>      <int>
# 1 VEH1                 1           11           13       2433       2434       2435
# 2 VEH012              10            8           NA       2521       2522         NA

Data 2

dat2 %>%
  select(personid, reportnumber, circumstancetype,
         itemindex, circumstancecode) %>%
  pivot_wider(id_cols = personid:circumstancetype,
              names_from = "itemindex", names_prefix = "code_",
              values_from = "circumstancecode")
# # A tibble: 6 x 7
#   personid  reportnumber circumstancetype code_1 code_2 code_3 code_4
#   <chr>     <chr>        <chr>             <dbl>  <dbl>  <dbl>  <dbl>
# 1 PERSON_01 REPORT1      Person             18       17   40.9   70.9
# 2 PERSON_02 REPORT1      Person              0       NA   NA     NA  
# 3 PERSON_03 REPORT2      Person             61       NA   NA     NA  
# 4 PERSON_04 REPORT3      Person             17       NA   NA     NA  
# 5 PERSON_05 REPORT4      Person             40.9     NA   NA     NA  
# 6 PERSON_06 REPORT4      Person              0        0   55.9   NA  

Data 3

dat3 %>%
  pivot_wider(id_cols = reportnumber,
              names_from = "itemindex", names_prefix = "witness",
              values_from = "personid")
# # A tibble: 4 x 5
#   reportnumber witness1  witness2  witness3  witness4 
#   <chr>        <chr>     <chr>     <chr>     <chr>    
# 1 REPORT_1     PERSON_55 PERSON_56 <NA>      <NA>     
# 2 REPORT_2     PERSON_44 <NA>      <NA>      <NA>     
# 3 REPORT_3     PERSON_32 PERSON_34 PERSON_35 PERSON_36
# 4 REPORT_4     PERSON_21 PERSON_22 PERSON_23 <NA>     

Raw data:

dat1 <- structure(list(itemindex = c(1L, 2L, 3L, 1L, 2L), damageid = c(2433L, 2434L, 2435L, 2521L, 2522L), impacttype = c(1L, 11L, 13L, 10L, 8L), vehicleid = c("VEH1", "VEH1", "VEH1", "VEH012", "VEH012")), class = "data.frame", row.names = c(NA, -5L))
dat2 <- structure(list(itemindex = c(1L, 2L, 3L, 4L, 1L, 1L, 1L, 1L, 1L, 2L, 3L), circumstancecode = c(18, 17, 40.88, 70.88, 0, 61, 17, 40.88, 0, 0, 55.88), circumstanceid = 4231703:4231713, circumstancetype = c("Person", "Person", "Person", "Person", "Person", "Person", "Person", "Person", "Person", "Person", "Person"), personid = c("PERSON_01", "PERSON_01", "PERSON_01", "PERSON_01", "PERSON_02", "PERSON_03", "PERSON_04", "PERSON_05", "PERSON_06", "PERSON_06", "PERSON_06"), reportnumber = c("REPORT1", "REPORT1", "REPORT1", "REPORT1", "REPORT1", "REPORT2", "REPORT3", "REPORT4", "REPORT4", "REPORT4", "REPORT4")), class = "data.frame", row.names = c(NA, -11L))
dat3 <- structure(list(itemindex = c(1L, 2L, 1L, 1L, 2L, 3L, 4L, 1L, 2L, 3L), personid = c("PERSON_55", "PERSON_56", "PERSON_44", "PERSON_32", "PERSON_34", "PERSON_35", "PERSON_36", "PERSON_21", "PERSON_22", "PERSON_23"), reportnumber = c("REPORT_1", "REPORT_1", "REPORT_2", "REPORT_3", "REPORT_3", "REPORT_3", "REPORT_3", "REPORT_4", "REPORT_4", "REPORT_4")), class = "data.frame", row.names = c(NA, -10L))
