I have two datasets that look like this (this is a subset of them).
data1 <- data.frame(
  "Metal" = c("Al", "Al", "Al", "Al", "Al", "Al", "Al"),
  "Type" = c("F", "F", "F", "F", "F", "F", "F"),
  "Date" = c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-03",
             "2000-01-03", "2000-01-07", "2000-01-07"),
  "Time" = c("11:00:00", "12:00:00", "15:00:00", "13:00:00",
             "17:00:00", "20:00:00", "20:00:00"),
  "Value" = c(100, 200, 300, 100, 400, 500, 500))
data2 <- data.frame(
  "Metal" = c("Al", "Al", "Al", "Al", "Al", "Al", "Al"),
  "Type" = c("P", "P", "P", "P", "P", "P", "P"),
  "Date" = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-03",
             "2000-01-03", "2000-01-04", "2000-01-07"),
  "Time" = c("11:00:00", "11:00:00", "14:00:00", "17:00:00",
             "13:00:00", "16:00:00", "20:00:00"),
  "Value" = c(100, 100, 200, 900, 100, 400, 999))
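I want to keep the rows whose Date and Time are the same in both tables and create a new table (data3). Sometimes data1 or data2 contains duplicates; I don't want those duplicates in data3, only one of them together with its pair from the other table. I would also like the output table sorted so that each pair appears together (so my "Type" column will alternate F, P, F, P, etc.).
This is my desired output: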
data3 <- data.frame(
  "Metal" = c("Al", "Al", "Al", "Al", "Al", "Al", "Al", "Al"),
  "Type" = c("F", "P", "F", "P", "F", "P", "F", "P"),
  "Date" = c("2000-01-01", "2000-01-01", "2000-01-03", "2000-01-03",
             "2000-01-03", "2000-01-03", "2001-01-07", "2001-01-07"),
  "Time" = c("11:00:00", "11:00:00", "13:00:00", "13:00:00",
             "17:00:00", "17:00:00", "20:00:00", "20:00:00"),
  "Value" = c(100, 100, 100, 100, 400, 900, 500, 999))
I have tried the various types of joins in dplyr, but none of them join the tables the way I want.
Thank you for your help!!
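We may need to bind the datasets and then, after grouping, filter to pick out the Date/Time combinations that occur more than once.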
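library(dplyr)
library(data.table)  # for rowid()
# stack both tables; grp records which table each row came from
bind_rows(data1, data2, .id = 'grp') %>%
  group_by(Metal, Date, Time) %>%
  # keep only Date/Time combinations that occur more than once
  filter(n() > 1) %>%
  arrange(Date, Time, rowid(grp)) %>%
  # within each group, take the first "F" row and the first "P" row
  slice(match(c("F", "P"), Type)) %>%
  ungroup() %>%
  select(-grp)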
-output
# A tibble: 8 × 5
Metal Type Date Time Value
<chr> <chr> <chr> <chr> <dbl>
1 Al F 2000-01-01 11:00:00 100
2 Al P 2000-01-01 11:00:00 100
3 Al F 2000-01-03 13:00:00 100
4 Al P 2000-01-03 13:00:00 100
5 Al F 2000-01-03 17:00:00 400
6 Al P 2000-01-03 17:00:00 900
7 Al F 2000-01-07 20:00:00 500
8 Al P 2000-01-07 20:00:00 999
-OP's data
> data3
Metal Type Date Time Value
1 Al F 2000-01-01 11:00:00 100
2 Al P 2000-01-01 11:00:00 100
3 Al F 2000-01-03 13:00:00 100
4 Al P 2000-01-03 13:00:00 100
5 Al F 2000-01-03 17:00:00 400
6 Al P 2000-01-03 17:00:00 900
7 Al F 2001-01-07 20:00:00 500
8 Al P 2001-01-07 20:00:00 999
This one was not easy :-)
library(dplyr)
bind_rows(data1, data2) %>%
group_by(Date, Time) %>%
filter(n()>1) %>%
ungroup() %>%
group_by(Type) %>%
arrange(Time) %>%
ungroup() %>%
mutate(Flag = ifelse(Type == "P" & lag(Type, default = last(Type)) == "F", 1, NA)) %>%
mutate(Flag1 = lead(Flag)) %>%
filter(if_any(.cols = starts_with("Flag"), .fns = ~ . == 1)) %>%
select(-starts_with("Flag"))
Metal Type Date Time Value
<chr> <chr> <chr> <chr> <dbl>
1 Al F 2000-01-01 11:00:00 100
2 Al P 2000-01-01 11:00:00 100
3 Al F 2000-01-03 13:00:00 100
4 Al P 2000-01-03 13:00:00 100
5 Al F 2000-01-03 17:00:00 400
6 Al P 2000-01-03 17:00:00 900
7 Al F 2000-01-07 20:00:00 500
8 Al P 2000-01-07 20:00:00 999
An approach with inner_join
The difficulty here is getting the right format; the filtering of the data itself is already done once the inner_join has run.
library(dplyr)
library(tidyr)
joined <- inner_join(data1 %>% distinct(), data2 %>% distinct(),
                     by = c("Metal", "Date", "Time"))
joined
Metal Type.x Date Time Value.x Type.y Value.y
1 Al F 2000-01-01 11:00:00 100 P 100
2 Al F 2000-01-03 13:00:00 100 P 100
3 Al F 2000-01-03 17:00:00 400 P 900
4 Al F 2000-01-07 20:00:00 500 P 999
Arranging the data
joined %>%
pivot_longer(starts_with("Type"), values_to="Type") %>%
rowwise() %>%
mutate(Value = c_across(starts_with("Value"))[c(F=1, P=2)[Type]]) %>%
select(-contains("."), -name) %>%
ungroup()
# A tibble: 8 × 5
Metal Date Time Type Value
<chr> <chr> <chr> <chr> <dbl>
1 Al 2000-01-01 11:00:00 F 100
2 Al 2000-01-01 11:00:00 P 100
3 Al 2000-01-03 13:00:00 F 100
4 Al 2000-01-03 13:00:00 P 100
5 Al 2000-01-03 17:00:00 F 400
6 Al 2000-01-03 17:00:00 P 900
7 Al 2000-01-07 20:00:00 F 500
8 Al 2000-01-07 20:00:00 P 999
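If the wide-to-long reshaping feels heavy, the same idea (deduplicate, keep only keys present in both tables, then interleave) can probably be sketched with semi_join instead. This is only a sketch under a few assumptions that are not in the question: that "duplicate" means the same Metal/Date/Time key, that keeping the first row per key is acceptable, and that sorting on Type is enough to put F before P. The names d1 and d2 are just illustrative helpers.

```r
library(dplyr)

data1 <- data.frame(
  Metal = rep("Al", 7), Type = rep("F", 7),
  Date = c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-03",
           "2000-01-03", "2000-01-07", "2000-01-07"),
  Time = c("11:00:00", "12:00:00", "15:00:00", "13:00:00",
           "17:00:00", "20:00:00", "20:00:00"),
  Value = c(100, 200, 300, 100, 400, 500, 500))
data2 <- data.frame(
  Metal = rep("Al", 7), Type = rep("P", 7),
  Date = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-03",
           "2000-01-03", "2000-01-04", "2000-01-07"),
  Time = c("11:00:00", "11:00:00", "14:00:00", "17:00:00",
           "13:00:00", "16:00:00", "20:00:00"),
  Value = c(100, 100, 200, 900, 100, 400, 999))

# Assumption: a duplicate is a repeated Metal/Date/Time key;
# .keep_all = TRUE keeps the first row found for each key.
d1 <- distinct(data1, Metal, Date, Time, .keep_all = TRUE)
d2 <- distinct(data2, Metal, Date, Time, .keep_all = TRUE)

keys <- c("Metal", "Date", "Time")

# semi_join keeps rows of the first table whose key also exists in the
# second, without adding any columns, so no reshaping is needed afterwards.
data3 <- bind_rows(
  semi_join(d1, d2, by = keys),
  semi_join(d2, d1, by = keys)
) %>%
  arrange(Date, Time, Type)  # "F" sorts before "P", so the pairs alternate

data3
```

Because semi_join never brings in the other table's columns, each row keeps its own Type and Value, and the final arrange is what produces the F, P, F, P ordering.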