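I have a df with 1000+ date columns (from 2017-01-01 to 2020-01-01) and 500+ rows of providers, filled with each provider's daily sales. I want to reshape it into 3 columns (Provider, Date and daily Sales), but I can't understand how `pivot_longer` works when the output has more than 2 columns.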
What I have:
+------------+------------+------------+------------+------------+
| Provider | 2017-01-01 | 2017-01-02 | 2017-01-03 | 2017-01-04 |
+------------+------------+------------+------------+------------+
| Nestle | 12 | 10 | 8 | 12 |
| Heineken | - | 3 | 1 | 3 |
| Hagen Dazs | 5 | - | - | 2 |
+------------+------------+------------+------------+------------+
Desired output:
+------------+------------+-------+
| Provider   | Date       | Sales |
+------------+------------+-------+
| Nestle     | 2017-01-01 | 12    |
| Nestle     | 2017-01-02 | 10    |
| Nestle     | 2017-01-03 | 8     |
| Nestle     | 2017-01-04 | 12    |
| Heineken   | 2017-01-02 | 3     |
| Heineken   | 2017-01-03 | 1     |
| Heineken   | 2017-01-04 | 3     |
| Hagen Dazs | 2017-01-01 | 5     |
| Hagen Dazs | 2017-01-04 | 2     |
+------------+------------+-------+
Technically, @AllanCameron's great suggestion does the job. You only need to add a filter for `-` and, if needed, convert the variable to numeric:
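library(dplyr)
library(tidyr)
# Reshape every column except Provider into name/value pairs,
# drop the "-" placeholders and convert the sales to numbers
newdf <- df %>%
  pivot_longer(-Provider) %>%
  filter(value != '-') %>%
  mutate(value = as.numeric(value))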
Output:
# A tibble: 9 x 3
Provider name value
<chr> <chr> <dbl>
1 Nestle 2017-01-01 12
2 Nestle 2017-01-02 10
3 Nestle 2017-01-03 8
4 Nestle 2017-01-04 12
5 Heineken 2017-01-02 3
6 Heineken 2017-01-03 1
7 Heineken 2017-01-04 3
8 Hagen Dazs 2017-01-01 5
9 Hagen Dazs 2017-01-04 2
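The output above keeps the default `name`/`value` column names. A possible variation, sketched here rather than taken from the original answer, passes `names_to` and `values_to` so the result carries the Provider/Date/Sales names from the question, and parses the former column headers as real dates with `as.Date()`. This assumes every date header is in `YYYY-MM-DD` form, as in the sample data:

```r
library(dplyr)
library(tidyr)

# Same sample data as in this answer
df <- structure(list(Provider = c("Nestle", "Heineken", "Hagen Dazs"),
                     `2017-01-01` = c("12", "-", "5"),
                     `2017-01-02` = c("10", "3", "-"),
                     `2017-01-03` = c("8", "1", "-"),
                     `2017-01-04` = c("12", "3", "2")),
                row.names = c(NA, -3L), class = "data.frame")

newdf <- df %>%
  # Column headers go to "Date", cell contents go to "Sales"
  pivot_longer(-Provider, names_to = "Date", values_to = "Sales") %>%
  # "-" marks a day without sales, so drop those rows
  filter(Sales != "-") %>%
  # Assumes headers look like "2017-01-01" (the default as.Date format)
  mutate(Sales = as.numeric(Sales), Date = as.Date(Date))
```

`Provider` is the id column and is repeated as-is, while `names_to` and `values_to` name the two columns created from the reshaped ones, so the result has three columns regardless of how many date columns the input has.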
Sample data used:
#Data
df <- structure(list(Provider = c("Nestle", "Heineken", "Hagen Dazs"
), `2017-01-01` = c("12", "-", "5"), `2017-01-02` = c("10", "3",
"-"), `2017-01-03` = c("8", "1", "-"), `2017-01-04` = c("12",
"3", "2")), row.names = c(NA, -3L), class = "data.frame")
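With this sample data, another option (a sketch, not part of the original answer) is to turn `-` into `NA` before reshaping and let `pivot_longer()` drop those cells through its `values_drop_na` argument instead of filtering afterwards. It assumes dplyr 1.0 or later for `across()`:

```r
library(dplyr)
library(tidyr)

df <- structure(list(Provider = c("Nestle", "Heineken", "Hagen Dazs"),
                     `2017-01-01` = c("12", "-", "5"),
                     `2017-01-02` = c("10", "3", "-"),
                     `2017-01-03` = c("8", "1", "-"),
                     `2017-01-04` = c("12", "3", "2")),
                row.names = c(NA, -3L), class = "data.frame")

df_long <- df %>%
  # In every date column, "-" becomes NA and the rest becomes numeric
  mutate(across(-Provider, ~ as.numeric(na_if(.x, "-")))) %>%
  # values_drop_na = TRUE removes the rows whose Sales is NA
  pivot_longer(-Provider, names_to = "Date", values_to = "Sales",
               values_drop_na = TRUE)
```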
An option with `melt`:
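library(reshape2)
library(dplyr)
# Melt to long format, keeping Provider as the id column
df %>%
  melt(id.vars = 'Provider') %>%
  # "-" marks a day without sales; turn it into NA
  mutate(value = replace(value, value == '-', NA)) %>%
  # Drop the missing days, as in the desired output
  filter(!is.na(value)) %>%
  # Convert columns to their natural types (value becomes integer)
  type.convert(as.is = TRUE)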
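A base-R-plus-reshape2 sketch of the same idea, assuming the `df` shown above. It uses `melt()`'s `variable.name`, `value.name` and `na.rm` arguments to get the question's column names and drop the missing days in one call. Note that `melt()` stacks column by column, so rows come out ordered by date rather than by provider:

```r
library(reshape2)

df <- structure(list(Provider = c("Nestle", "Heineken", "Hagen Dazs"),
                     `2017-01-01` = c("12", "-", "5"),
                     `2017-01-02` = c("10", "3", "-"),
                     `2017-01-03` = c("8", "1", "-"),
                     `2017-01-04` = c("12", "3", "2")),
                row.names = c(NA, -3L), class = "data.frame")

# Mark "-" as missing before melting
df[df == "-"] <- NA

long <- melt(df, id.vars = "Provider", variable.name = "Date",
             value.name = "Sales", na.rm = TRUE)

# melt() returns the former column names as a factor; convert via character
long$Date  <- as.Date(as.character(long$Date))
long$Sales <- as.numeric(long$Sales)
```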