我有一个相当大的销售信息数据集。其中一些有负的salesvalue
(信用票据)。我需要将这些信贷票据与原始销售票据联系起来,同时保留原始销售日期。没有单一的变量表明信用证票据属于哪个销售票据,所以我必须确定所有其他变量是相等的。
示例数据:
data <- data.frame(number = 1:4,
Date=c("02-01-2022", "06-01-2022", "07-01-2022", "06-01-2022"),
state = c("filet", "filet", "whole", "filet"),
specie = c("haddock", "cod", "haddock", "cod"),
salesvalue= c(300,500,200,-400),
salesdate=c("01-01-2022", "05-01-2022", "06-01-2022", "12-01-2022"))
data
# number Date state specie value salesdate
# 1 02-01-2022 filet haddock 300 01-01-2022
# 2 06-01-2022 filet cod 500 05-01-2022
# 3 07-01-2022 whole haddock 200 06-01-2022
# 4 06-01-2022 filet cod -400 12-01-2022
我想把salesvalue < 0
的salesdate
改为salesdate
,其中salesvalue > 0
和其他所有元素相等,并将它们聚合在一起。
在本例中,我想将编号4的salesdate
从"12-01-2022"05 - 01 - 2022"(与2相同,其他值相等)。
这是我尝试过的:
datanew <- data %>%
group_by(Date, state, specie, value) %>%
mutate(date_sale=if_else(value < 0,
first(salesdate[value>0]), salesdate)) %>%
group_by(Date, state, specie, value, salesdate, date_sale) %>%
summarise(value_new=sum(value))
datanew
# Date state specie value salesdate date_sale value_new
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
# 02-01-2022 filet haddock 300 01-01-2022 01-01-2022 300
# 06-01-2022 filet cod -400 12-01-2022 NA -400
# 06-01-2022 filet cod 500 05-01-2022 05-01-2022 500
# 07-01-2022 whole haddock 200 06-01-2022 06-01-2022 200
但这不能解决我的问题。我没有我知道如何解决这个问题,并将感谢任何建议!
提前感谢!
您可以使用rows_update()
:
library(dplyr)
data %>%
filter(salesvalue >= 0) %>%
select(-c(number, salesvalue)) %>%
rows_update(data, ., by = c("Date", "state", "specie"))
# number Date state specie salesvalue salesdate
# 1 1 02-01-2022 filet haddock 300 01-01-2022
# 2 2 06-01-2022 filet cod 500 05-01-2022
# 3 3 07-01-2022 whole haddock 200 06-01-2022
# 4 4 06-01-2022 filet cod -400 05-01-2022
的另一个选择:
data %>%
filter(salesvalue >= 0) %>%
select(-c(number, salesvalue), x = salesdate) %>%
left_join(data, .) %>%
mutate(salesdate = ifelse(salesvalue < 0, x, salesdate)) %>%
select(-x)
请检查一下这是否适用于您:Cave:它可能强烈影响原始数据!
这个想法是通过分组和安排来把数据放在合适的位置:
library(dplyr)
data %>%
group_by(Date, state, specie) %>%
arrange(salesvalue, .by_group = TRUE) %>%
ungroup() %>%
mutate(salesdate = ifelse(salesvalue < 0, lead(salesdate), salesdate)) %>%
arrange(number)
number Date state specie salesvalue salesdate
<int> <chr> <chr> <chr> <dbl> <chr>
1 1 02-01-2022 filet haddock 300 01-01-2022
2 2 06-01-2022 filet cod 500 05-01-2022
3 3 07-01-2022 whole haddock 200 06-01-2022
4 4 06-01-2022 filet cod -400 05-01-2022