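Hello, I have three tables and need to work out a process for restructuring each of them in preparation for a merge. Once I understand how to do this, I expect it will help me for a long time.
For the first table, each vehicleID involved in a traffic crash has the damage it sustained listed (fender, rear end, etc.). For each vehicleID I want to create three new IMPACTCODE variables that show up to three of the IMPACTTYPE codes listed in the table below (format char):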
itemindex damageid impacttype vehicleid
1 2433 1 VEH1
2 2434 11 VEH1
3 2435 13 VEH1
1 2521 10 VEH012
2 2522 8 VEH012
I would like help restructuring it into the following, with one row per vehicleID:
vehicleid impacttype_1 impacttype_2 impacttype_3
VEH1 1 11 13
VEH012 10 8 NA
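For the second table, I have a list of the personIDs involved in traffic crashes, along with several contributing circumstance codes. These codes describe whether that personID was reported as texting, speeding, acting normally, or doing anything else that contributed to the crash. The number of codes that can be assigned is unlimited, but I only want to keep the first four. I want to do something similar to the first table, but here I am dealing with more variables, so I would like to understand how the two processes differ. Note that, as with PERSON_06, the codes do not always start in the first position: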
itemindex circumstancecode circumstanceid circumstancetype personid reportnumber
1 18 4231703 Person PERSON_01 REPORT1
2 17 4231704 Person PERSON_01 REPORT1
3 40.88 4231705 Person PERSON_01 REPORT1
4 70.88 4231706 Person PERSON_01 REPORT1
1 0 4231707 Person PERSON_02 REPORT1
1 61 4231708 Person PERSON_03 REPORT2
1 17 4231709 Person PERSON_04 REPORT3
1 40.88 4231710 Person PERSON_05 REPORT4
1 0 4231711 Person PERSON_06 REPORT4
2 0 4231712 Person PERSON_06 REPORT4
3 55.88 4231713 Person PERSON_06 REPORT4
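I would like to structure this table as shown below (codes 1 to 4 are based on circumstancecode). Note that I placed an NA wherever there is no code to fill the code_1 to code_4 variables:
personid reportnumber circumstancetype code_1 code_2 code_3 code_4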
PERSON_01 REPORT1 PERSON 18 17 40.88 70.88
PERSON_02 REPORT1 PERSON 0 NA NA NA
PERSON_03 REPORT2 PERSON 61 NA NA NA
PERSON_04 REPORT3 PERSON 17 NA NA NA
PERSON_05 REPORT4 PERSON 40.88 NA NA NA
PERSON_06 REPORT4 PERSON 0 0 55.88 NA
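I want to keep all code_ values as character, because some values contain letters and I want to preserve the exact formatting of the raw data (so 40.88 is character).
For the last table, the requirement is slightly different. I want to reshape this table, which lists the personIDs of the people who witnessed a crash.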
itemindex personid reportnumber
1 PERSON_55 REPORT_1
2 PERSON_56 REPORT_1
1 PERSON_44 REPORT_2
1 PERSON_32 REPORT_3
2 PERSON_34 REPORT_3
3 PERSON_35 REPORT_3
4 PERSON_36 REPORT_3
1 PERSON_21 REPORT_4
2 PERSON_22 REPORT_4
3 PERSON_23 REPORT_4
I want each row to be one crash reportnumber, with at most three variables listing up to three witnesses:
reportnumber witness1 witness2 witness3
REPORT_1 PERSON_55 PERSON_56 NA
REPORT_2 PERSON_44 NA NA
REPORT_3 PERSON_32 PERSON_34 PERSON_35
REPORT_4 PERSON_21 PERSON_22 PERSON_23
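Thank you very much for your help!
Your three examples are similar. I will give a general explanation first, then the code for each example.
With tidyverse, you can use pivot_wider to reshape the data into wide format. In this case you first number the rows within each group, which row_number() does. Then, in pivot_wider, you specify vehicleid as id_cols. The column names are derived from the row numbers, with the prefix "impacttype_" added in front. The values are taken from impacttype.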
library(tidyverse)
df1 %>%
  group_by(vehicleid) %>%
  mutate(rn = row_number()) %>%
  pivot_wider(id_cols = vehicleid, names_from = rn,
              values_from = impacttype, names_prefix = "impacttype_")
Output
vehicleid impacttype_1 impacttype_2 impacttype_3
<chr> <int> <int> <int>
1 VEH1 1 11 13
2 VEH012 10 8 NA
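The question asks for at most three impact codes per vehicle, stored as character. A minimal sketch of one way to do that, assuming the column is converted with as.character() and that slice_head(n = 3) is an acceptable way to cap the rows (both are my additions, not part of the answer's code above); df1 is rebuilt here from the values in the question so the block runs on its own:

```r
library(dplyr)
library(tidyr)

# Same values as the first table in the question
df1 <- data.frame(
  itemindex  = c(1L, 2L, 3L, 1L, 2L),
  damageid   = c(2433L, 2434L, 2435L, 2521L, 2522L),
  impacttype = c(1L, 11L, 13L, 10L, 8L),
  vehicleid  = c("VEH1", "VEH1", "VEH1", "VEH012", "VEH012"),
  stringsAsFactors = FALSE
)

wide1 <- df1 %>%
  mutate(impacttype = as.character(impacttype)) %>%  # keep the codes as text
  group_by(vehicleid) %>%
  slice_head(n = 3) %>%                               # keep at most three rows per vehicle
  mutate(rn = row_number()) %>%                       # position within each vehicle
  ungroup() %>%
  pivot_wider(id_cols = vehicleid, names_from = rn,
              values_from = impacttype, names_prefix = "impacttype_")

print(wide1)
```

Note that slice_head() on grouped data returns the rows ordered by group, so the row order of the result may differ from the original data.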
In the second example, you can use slice_head to take the first 4 rows in each group. Otherwise, the pivot_wider call is much the same.
df2 %>%
  group_by(personid) %>%
  slice_head(n = 4) %>%
  mutate(rn = row_number()) %>%
  pivot_wider(id_cols = c(personid, reportnumber, circumstancetype), names_from = rn,
              values_from = circumstancecode, names_prefix = "code_")
Output
personid reportnumber circumstancetype code_1 code_2 code_3 code_4
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 PERSON_01 REPORT1 Person 18 17 40.9 70.9
2 PERSON_02 REPORT1 Person 0 NA NA NA
3 PERSON_03 REPORT2 Person 61 NA NA NA
4 PERSON_04 REPORT3 Person 17 NA NA NA
5 PERSON_05 REPORT4 Person 40.9 NA NA NA
6 PERSON_06 REPORT4 Person 0 0 55.9 NA
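The 40.9 in the printout above is, as far as I know, only tibble's display rounding of a numeric column; the stored value is still 40.88. Since the question wants the codes kept as character, here is a small sketch under the assumption that circumstancecode is read in as text in the first place (the reduced df2 below is a hypothetical subset of the question's data, typed as character by me):

```r
library(dplyr)
library(tidyr)

# A few rows of the second table, with the codes stored as character (assumption)
df2 <- data.frame(
  itemindex        = c(1L, 2L, 3L, 4L, 1L),
  circumstancecode = c("18", "17", "40.88", "70.88", "0"),
  circumstancetype = "Person",
  personid         = c(rep("PERSON_01", 4), "PERSON_02"),
  reportnumber     = "REPORT1",
  stringsAsFactors = FALSE
)

wide2 <- df2 %>%
  group_by(personid) %>%
  slice_head(n = 4) %>%            # keep only the first four codes per person
  mutate(rn = row_number()) %>%
  ungroup() %>%
  pivot_wider(id_cols = c(personid, reportnumber, circumstancetype),
              names_from = rn, values_from = circumstancecode,
              names_prefix = "code_")

print(wide2)
```

Because the input column is character, the code_ columns come out as character too, so a value such as "40.88" keeps its original formatting.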
In the third example, you already have an index to use with pivot_wider.
df3 %>%
  pivot_wider(id_cols = reportnumber, names_from = itemindex,
              values_from = personid, names_prefix = "witness")
Output
reportnumber witness1 witness2 witness3 witness4
<chr> <chr> <chr> <chr> <chr>
1 REPORT_1 PERSON_55 PERSON_56 NA NA
2 REPORT_2 PERSON_44 NA NA NA
3 REPORT_3 PERSON_32 PERSON_34 PERSON_35 PERSON_36
4 REPORT_4 PERSON_21 PERSON_22 PERSON_23 NA
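The output above has a witness4 column, while the question asks for at most three witnesses. One possible way to cap it, sketched under the assumption that itemindex always counts 1, 2, 3, ... within a report, is to filter before pivoting (the filter step is my addition; df3 is rebuilt from the question's values):

```r
library(dplyr)
library(tidyr)

# Same values as the third table in the question
df3 <- data.frame(
  itemindex    = c(1L, 2L, 1L, 1L, 2L, 3L, 4L, 1L, 2L, 3L),
  personid     = c("PERSON_55", "PERSON_56", "PERSON_44", "PERSON_32", "PERSON_34",
                   "PERSON_35", "PERSON_36", "PERSON_21", "PERSON_22", "PERSON_23"),
  reportnumber = c("REPORT_1", "REPORT_1", "REPORT_2", "REPORT_3", "REPORT_3",
                   "REPORT_3", "REPORT_3", "REPORT_4", "REPORT_4", "REPORT_4"),
  stringsAsFactors = FALSE
)

wide3 <- df3 %>%
  filter(itemindex <= 3) %>%       # drop the fourth and later witnesses
  pivot_wider(id_cols = reportnumber, names_from = itemindex,
              values_from = personid, names_prefix = "witness")

print(wide3)
```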
Data 1
library(dplyr)
library(tidyr)
dat1 %>%
  select(-damageid) %>%
  pivot_wider(id_cols = vehicleid, names_from = "itemindex", names_prefix = "impacttype_",
              values_from = "impacttype")
# # A tibble: 2 x 4
# vehicleid impacttype_1 impacttype_2 impacttype_3
# <chr> <int> <int> <int>
# 1 VEH1 1 11 13
# 2 VEH012 10 8 NA
As a bonus (if you want to keep damageid), you can widen two columns at once:
dat1 %>%
  pivot_wider(id_cols = vehicleid, names_from = "itemindex",
              values_from = c("impacttype", "damageid"))
# # A tibble: 2 x 7
# vehicleid impacttype_1 impacttype_2 impacttype_3 damageid_1 damageid_2 damageid_3
# <chr> <int> <int> <int> <int> <int> <int>
# 1 VEH1 1 11 13 2433 2434 2435
# 2 VEH012 10 8 NA 2521 2522 NA
Data 2
dat2 %>%
  select(personid, reportnumber, circumstancetype,
         itemindex, circumstancecode) %>%
  pivot_wider(id_cols = personid:circumstancetype,
              names_from = "itemindex", names_prefix = "code_",
              values_from = "circumstancecode")
# # A tibble: 6 x 7
# personid reportnumber circumstancetype code_1 code_2 code_3 code_4
# <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 PERSON_01 REPORT1 Person 18 17 40.9 70.9
# 2 PERSON_02 REPORT1 Person 0 NA NA NA
# 3 PERSON_03 REPORT2 Person 61 NA NA NA
# 4 PERSON_04 REPORT3 Person 17 NA NA NA
# 5 PERSON_05 REPORT4 Person 40.9 NA NA NA
# 6 PERSON_06 REPORT4 Person 0 0 55.9 NA
Data 3
dat3 %>%
  pivot_wider(id_cols = reportnumber,
              names_from = "itemindex", names_prefix = "witness",
              values_from = "personid")
# # A tibble: 4 x 5
# reportnumber witness1 witness2 witness3 witness4
# <chr> <chr> <chr> <chr> <chr>
# 1 REPORT_1 PERSON_55 PERSON_56 <NA> <NA>
# 2 REPORT_2 PERSON_44 <NA> <NA> <NA>
# 3 REPORT_3 PERSON_32 PERSON_34 PERSON_35 PERSON_36
# 4 REPORT_4 PERSON_21 PERSON_22 PERSON_23 <NA>
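Pivoting on itemindex directly and pivoting on a fresh row_number() give the same result only when itemindex has no gaps. The toy data below is hypothetical (report "B" has no itemindex 1) and is only meant to sketch how the two approaches can diverge:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: report "B" starts at itemindex 2
toy <- data.frame(
  itemindex    = c(1L, 2L, 2L, 3L),
  personid     = c("P1", "P2", "P3", "P4"),
  reportnumber = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Using itemindex as-is: the gap shows up as an NA in witness1
by_index <- toy %>%
  pivot_wider(id_cols = reportnumber, names_from = itemindex,
              values_from = personid, names_prefix = "witness")

# Renumbering within each report: values are packed to the left
by_rank <- toy %>%
  arrange(reportnumber, itemindex) %>%
  group_by(reportnumber) %>%
  mutate(rn = row_number()) %>%
  ungroup() %>%
  pivot_wider(id_cols = reportnumber, names_from = rn,
              values_from = personid, names_prefix = "witness")

print(by_index)
print(by_rank)
```

Which behaviour is preferable depends on whether the position in itemindex carries meaning in your data.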
Raw data:
dat1 <- structure(list(itemindex = c(1L, 2L, 3L, 1L, 2L), damageid = c(2433L, 2434L, 2435L, 2521L, 2522L), impacttype = c(1L, 11L, 13L, 10L, 8L), vehicleid = c("VEH1", "VEH1", "VEH1", "VEH012", "VEH012")), class = "data.frame", row.names = c(NA, -5L))
dat2 <- structure(list(itemindex = c(1L, 2L, 3L, 4L, 1L, 1L, 1L, 1L, 1L, 2L, 3L), circumstancecode = c(18, 17, 40.88, 70.88, 0, 61, 17, 40.88, 0, 0, 55.88), circumstanceid = 4231703:4231713, circumstancetype = c("Person", "Person", "Person", "Person", "Person", "Person", "Person", "Person", "Person", "Person", "Person"), personid = c("PERSON_01", "PERSON_01", "PERSON_01", "PERSON_01", "PERSON_02", "PERSON_03", "PERSON_04", "PERSON_05", "PERSON_06", "PERSON_06", "PERSON_06"), reportnumber = c("REPORT1", "REPORT1", "REPORT1", "REPORT1", "REPORT1", "REPORT2", "REPORT3", "REPORT4", "REPORT4", "REPORT4", "REPORT4")), class = "data.frame", row.names = c(NA, -11L))
dat3 <- structure(list(itemindex = c(1L, 2L, 1L, 1L, 2L, 3L, 4L, 1L, 2L, 3L), personid = c("PERSON_55", "PERSON_56", "PERSON_44", "PERSON_32", "PERSON_34", "PERSON_35", "PERSON_36", "PERSON_21", "PERSON_22", "PERSON_23"), reportnumber = c("REPORT_1", "REPORT_1", "REPORT_2", "REPORT_3", "REPORT_3", "REPORT_3", "REPORT_3", "REPORT_4", "REPORT_4", "REPORT_4")), class = "data.frame", row.names = c(NA, -10L))