r - Pivot longer data into 3 columns



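I have a df with 1000+ date columns (from 2017-01-01 to 2020-01-01) and rows for 500+ providers, filled with each provider's daily sales. I want to reshape it into 3 columns (Provider, Date, and daily Sales), but I can't figure out how pivot_longer works when the output has more than 2 columns.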

What I have:

+------------+------------+------------+------------+------------+
|  Provider  | 2017-01-01 | 2017-01-02 | 2017-01-03 | 2017-01-04 |
+------------+------------+------------+------------+------------+
| Nestle     |         12 |         10 |          8 |         12 |
| Heineken   |          - |          3 |          1 |          3 |
| Hagen Dazs |          5 |          - |          - |          2 |
+------------+------------+------------+------------+------------+

My desired output:

+------------+------------+-------+
|  Provider  |    Date    | Sales |
+------------+------------+-------+
| Nestle     | 2017-01-01 |    12 |
| Nestle     | 2017-01-02 |    10 |
| Nestle     | 2017-01-03 |     8 |
| Nestle     | 2017-01-04 |    12 |
| Heineken   | 2017-01-02 |     3 |
| Heineken   | 2017-01-03 |     1 |
| Heineken   | 2017-01-04 |     3 |
| Hagen Dazs | 2017-01-01 |     5 |
| Hagen Dazs | 2017-01-04 |     2 |
+------------+------------+-------+

Technically, the great suggestion from @AllanCameron does the job. You only need to add a filter for `-` and, if needed, convert the variable to numeric:

library(dplyr)
library(tidyr)
#Code
# Reshape to long format, drop the "-" placeholders, then convert to numeric
newdf <- df %>%
  pivot_longer(-Provider) %>%
  filter(value != '-') %>%
  mutate(value = as.numeric(value))

Output:

# A tibble: 9 x 3
  Provider   name       value
  <chr>      <chr>      <dbl>
1 Nestle     2017-01-01    12
2 Nestle     2017-01-02    10
3 Nestle     2017-01-03     8
4 Nestle     2017-01-04    12
5 Heineken   2017-01-02     3
6 Heineken   2017-01-03     1
7 Heineken   2017-01-04     3
8 Hagen Dazs 2017-01-01     5
9 Hagen Dazs 2017-01-04     2
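
If you want the column names from the question (Provider, Date, Sales) rather than the default name and value, something along these lines might work. This is only a sketch: it assumes the names_to and values_to arguments of pivot_longer, and that the column headers are ISO-formatted dates that as.Date can parse. The object name long_df is just illustrative.

```r
library(dplyr)
library(tidyr)

# Same sample data as in this answer; check.names = FALSE keeps the date headers intact
df <- data.frame(
  Provider = c("Nestle", "Heineken", "Hagen Dazs"),
  `2017-01-01` = c("12", "-", "5"),
  `2017-01-02` = c("10", "3", "-"),
  `2017-01-03` = c("8", "1", "-"),
  `2017-01-04` = c("12", "3", "2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

long_df <- df %>%
  # Every column except Provider becomes a (Date, Sales) pair
  pivot_longer(-Provider, names_to = "Date", values_to = "Sales") %>%
  # Drop the "-" placeholders before converting, to avoid coercion warnings
  filter(Sales != "-") %>%
  # Assumption: headers look like "2017-01-01", so as.Date() parses them directly
  mutate(Date = as.Date(Date), Sales = as.numeric(Sales))
```

With real dates in the Date column you can then filter or group by month or year without further string handling.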

Some data used:

#Data
df <- structure(list(Provider = c("Nestle", "Heineken", "Hagen Dazs"
), `2017-01-01` = c("12", "-", "5"), `2017-01-02` = c("10", "3", 
"-"), `2017-01-03` = c("8", "1", "-"), `2017-01-04` = c("12", 
"3", "2")), row.names = c(NA, -3L), class = "data.frame")

An option with melt

library(reshape2)
library(dplyr)
# Melt to long format, turn "-" into NA, then let type.convert() infer column types
df %>%
  melt(id.var = 'Provider') %>%
  mutate(value = replace(value, value == '-', NA)) %>%
  type.convert(as.is = TRUE)
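
Note that this version keeps the missing days as NA rows instead of dropping them. If you would rather stay in tidyr and choose between the two behaviours, a rough equivalent could look like the sketch below. It assumes dplyr's across() and na_if() and the values_drop_na argument of pivot_longer; the names long_na and long_dropped are only for illustration.

```r
library(dplyr)
library(tidyr)

# Same sample data as above
df <- structure(list(Provider = c("Nestle", "Heineken", "Hagen Dazs"),
  `2017-01-01` = c("12", "-", "5"), `2017-01-02` = c("10", "3", "-"),
  `2017-01-03` = c("8", "1", "-"), `2017-01-04` = c("12", "3", "2")),
  row.names = c(NA, -3L), class = "data.frame")

# Replace the "-" placeholder with NA in every date column first
df_na <- df %>%
  mutate(across(-Provider, ~ na_if(.x, "-")))

# Keep the missing days as NA rows (similar to the melt result)
long_na <- df_na %>%
  pivot_longer(-Provider, names_to = "Date", values_to = "Sales") %>%
  mutate(Sales = as.numeric(Sales))

# Or drop them during the pivot (similar to the filter-based result)
long_dropped <- df_na %>%
  pivot_longer(-Provider, names_to = "Date", values_to = "Sales",
               values_drop_na = TRUE) %>%
  mutate(Sales = as.numeric(Sales))
```

Which one you want depends on whether a missing day should show up explicitly in later summaries or not.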

