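I have this data, in which every variable except Product_Code is duplicated across rows. I would like to create new variables, for example Prod_1, Prod_2, ..., by transposing Product_Code into those new columns and eliminating the duplicate rows.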
ID DATE DAYS MONTH Product_Code
1 00003600B 2018-06-30 854 6 83648
2 00003600B 2018-06-30 854 6 40984
3 00003600B 2018-06-30 854 6 14534
4 00003600B 2018-06-30 854 6 18708
5 00003600B 2018-06-30 854 6 18710
I tried the spread and transpose functions, but neither worked.
spread(data = Tickets, key = ID, value = Product_Code)
I also tried transposing, but that did not give a useful result either:
Tickets.t = t(Tickets)
Any ideas on how I should do this?
I am expecting something like this:
ID DATE DAYS MONTH PROD_1 PROD_2 PROD_3 PROD_4 PROD_5
00003600B 2018-06-30 854 6 83648 40984 14534 18708 18710
00003600B 2016-02-27 280 2 999195 999154 999339 0 0
00003600B 2015-05-23 77 5 999026 999339 999021 27640 999195
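Here, we need a sequence column. Group by 'ID', 'DATE', 'DAYS', 'MONTH', create the 'PROD' column by pasting the string "PROD_" together with row_number(), and then use that column to spread the 'Product_Code' values.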
library(tidyverse)
Tickets %>%
group_by(ID, DATE, DAYS, MONTH) %>%
mutate(PROD = str_c("PROD_", row_number())) %>%
spread(PROD, Product_Code)
# A tibble: 1 x 9
# Groups: ID, DATE, DAYS, MONTH [1]
# ID DATE DAYS MONTH PROD_1 PROD_2 PROD_3 PROD_4 PROD_5
# <chr> <chr> <int> <int> <int> <int> <int> <int> <int>
#1 00003600B 2018-06-30 854 6 83648 40984 14534 18708 18710
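The expected output in the question also has dates with fewer than five products, padded with 0. As a minimal sketch of how that could be handled, the example below uses pivot_wider() from tidyr in place of spread(). The data frame Tickets2 is a hypothetical sample built for illustration: the first date is the question's data, and the second date reuses three product codes from the expected output. The 0 fill value is an assumption based on that expected output.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-date sample (not the asker's full data set)
Tickets2 <- data.frame(
  ID = rep("00003600B", 8),
  DATE = c(rep("2018-06-30", 5), rep("2016-02-27", 3)),
  DAYS = c(rep(854L, 5), rep(280L, 3)),
  MONTH = c(rep(6L, 5), rep(2L, 3)),
  Product_Code = c(83648L, 40984L, 14534L, 18708L, 18710L,
                   999195L, 999154L, 999339L),
  stringsAsFactors = FALSE
)

wide <- Tickets2 %>%
  group_by(ID, DATE, DAYS, MONTH) %>%
  mutate(PROD = row_number()) %>%          # 1, 2, 3, ... within each group
  ungroup() %>%
  pivot_wider(
    names_from = PROD,                     # index becomes the column suffix
    values_from = Product_Code,
    names_prefix = "PROD_",                # columns PROD_1, PROD_2, ...
    values_fill = list(Product_Code = 0L)  # assumed: pad missing products with 0
  )

wide
```

Groups with fewer products than the largest group get 0 in the trailing PROD_ columns; drop values_fill to leave them as NA instead.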
Data
Tickets <- structure(list(ID = c("00003600B", "00003600B", "00003600B",
"00003600B", "00003600B"), DATE = c("2018-06-30", "2018-06-30",
"2018-06-30", "2018-06-30", "2018-06-30"), DAYS = c(854L, 854L,
854L, 854L, 854L), MONTH = c(6L, 6L, 6L, 6L, 6L), Product_Code = c(83648L,
40984L, 14534L, 18708L, 18710L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))
Before using spread, you need to add a variable that numbers the products within each group.
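library(tidyverse)
# Number the products within each ID/DATE/DAYS/MONTH group, then spread on that index
Tickets %>%
  group_by(ID, DATE, DAYS, MONTH) %>%
  mutate(PROD = 1:n()) %>%
  # the new columns are named after the index values: 1, 2, 3, ...
  spread(key = PROD, value = Product_Code)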
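Spreading on a plain integer index gives columns named 1, 2, 3 and so on. If the PROD_1, PROD_2, ... names from the question are wanted, one option is the sep argument of spread(), which prefixes each new column with the key's name. The sketch below rebuilds the five sample rows inline so that it runs on its own; res is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# The five sample rows from the question, rebuilt inline
Tickets <- data.frame(
  ID = "00003600B", DATE = "2018-06-30", DAYS = 854L, MONTH = 6L,
  Product_Code = c(83648L, 40984L, 14534L, 18708L, 18710L),
  stringsAsFactors = FALSE
)

res <- Tickets %>%
  group_by(ID, DATE, DAYS, MONTH) %>%
  mutate(PROD = row_number()) %>%   # integer index within each group
  ungroup() %>%
  # sep = "_" names the new columns <key name>_<key value>, i.e. PROD_1, PROD_2, ...
  spread(key = PROD, value = Product_Code, sep = "_")

names(res)
```

spread() also accepts a fill argument (for example fill = 0) for groups that have fewer products than the widest one.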