R - Error when merging similar rows with plyr - what am I doing wrong?



I have a data frame (dtetags.df) whose Date column contains many repeated dates:

dtetags.df$Date
 "2016-07-22" "2016-07-22" "2016-07-21" "2016-07-21" "2016-07-20" "2016-07-20" "2016-07-19" "2016-07-19" "2016-07-18" "2016-07-18" "2016-07-15" "2016-07-15" "2016-07-15" "2016-07-14"
 "2016-07-14" "2016-07-13" "2016-07-13" "2016-07-13" "2016-07-12" "2016-07-12" "2016-07-12" "2016-07-12" "2016-07-11" "2016-07-11" "2016-07-11" "2016-07-11" "2016-07-08" "2016-07-08"
 "2016-07-08" "2016-07-07" "2016-07-07" "2016-07-07" "2016-07-07" "2016-07-06" "2016-07-06" "2016-07-05" "2016-07-05" "2016-07-05" "2016-07-05" "2016-07-01" "2016-07-01" "2016-06-30"
 "2016-06-30" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-28" "2016-06-28" "2016-06-28" "2016-06-27" "2016-06-27" "2016-06-27" "2016-06-24" "2016-06-24"
 "2016-06-23" "2016-06-23" "2016-06-22" "2016-06-22" "2016-06-21" "2016-06-21" "2016-06-20" "2016-06-20" "2016-06-17" "2016-06-17" "2016-06-16" "2016-06-16" "2016-06-15" "2016-06-15"
 "2016-06-14" "2016-06-13" "2016-06-13" "2016-06-10" "2016-06-10" "2016-06-09" "2016-06-09" "2016-06-09" "2016-06-09" "2016-06-08" "2016-06-08" "2016-06-07" "2016-06-07" "2016-06-06"
 "2016-06-06" "2016-06-06" "2016-06-01" "2016-06-01" "2016-05-29" "2016-05-29" "2016-05-27" "2016-05-27" "2016-05-26" "2016-05-26" "2016-05-25" "2016-05-25" "2016-05-24" "2016-05-23"
 "2016-05-23" "2016-05-20"

It also has several binary tag columns that show whether a post with that tag was made on that date, for example:

dtetags.df$Technology
 "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "1" "1" "0" "1" "0" "1"
 "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
 "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"

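Based on this question, I tried to use ddply(dtetags.df, "Date", numcolwise(sum)), but it returns an error. I have tried many different ways of formatting the ddply command, but I can't get it to work.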

The ideal output would be:

               Date            Technology
1        2016-07-22                     0
2        2016-07-21                     0
3        2016-07-20                     0
4        2016-07-19                     0
5        2016-07-18                     0
6        2016-07-15                     0
7        2016-07-14                     0
8        2016-07-13                     0
9        2016-07-12                     0
10       2016-07-11                     0
11       2016-07-08                     0
12       2016-07-07                     0
13       2016-07-06                     1
14       2016-07-05                     0
15       2016-07-01                     2
16       2016-06-30                     1
17       2016-06-29                     1
18       2016-06-28                     0
19       2016-06-27                     0
20       2016-06-24                     1
21       2016-06-23                     0
22       2016-06-22                     0
23       2016-06-21                     0
24       2016-06-20                     0
25       2016-06-17                     0
26       2016-06-16                     0
27       2016-06-15                     0
28       2016-06-14                     1
29       2016-06-13                     0
30       2016-06-10                     0
31       2016-06-09                     0
32       2016-06-08                     0
33       2016-06-07                     0
34       2016-06-06                     0
35       2016-06-01                     0
36       2016-05-29                     0
37       2016-05-27                     0
38       2016-05-26                     0
39       2016-05-25                     0
40       2016-05-24                     0
41       2016-05-23                     0
42       2016-05-20                     0

Is there something obvious that I'm doing wrong?

Converting from factor to numeric

I dropped the Date column, applied data.frame(apply(dtetags.df, 2, function(x) as.numeric(as.character(x)))) to the rest of the data frame, and then added the Date column back in.

dput(dtetags.df)
structure(list(Date = c("2016-07-22", "2016-07-22", "2016-07-21", 
"2016-07-21", "2016-07-20", "2016-07-20", "2016-07-19", "2016-07-19", 
"2016-07-18", "2016-07-18", "2016-07-15", "2016-07-15", "2016-07-15", 
"2016-07-14", "2016-07-14", "2016-07-13", "2016-07-13", "2016-07-13", 
"2016-07-12", "2016-07-12", "2016-07-12", "2016-07-12", "2016-07-11", 
"2016-07-11", "2016-07-11", "2016-07-11", "2016-07-08", "2016-07-08", 
"2016-07-08", "2016-07-07", "2016-07-07", "2016-07-07", "2016-07-07", 
"2016-07-06", "2016-07-06", "2016-07-05", "2016-07-05", "2016-07-05", 
"2016-07-05", "2016-07-01", "2016-07-01", "2016-06-30", "2016-06-30", 
"2016-06-29", "2016-06-29", "2016-06-29", "2016-06-29", "2016-06-29", 
"2016-06-28", "2016-06-28", "2016-06-28", "2016-06-27", "2016-06-27", 
"2016-06-27", "2016-06-24", "2016-06-24", "2016-06-23", "2016-06-23", 
"2016-06-22", "2016-06-22", "2016-06-21", "2016-06-21", "2016-06-20", 
"2016-06-20", "2016-06-17", "2016-06-17", "2016-06-16", "2016-06-16", 
"2016-06-15", "2016-06-15", "2016-06-14", "2016-06-13", "2016-06-13", 
"2016-06-10", "2016-06-10", "2016-06-09", "2016-06-09", "2016-06-09", 
"2016-06-09", "2016-06-08", "2016-06-08", "2016-06-07", "2016-06-07", 
"2016-06-06", "2016-06-06", "2016-06-06", "2016-06-01", "2016-06-01", 
"2016-05-29", "2016-05-29", "2016-05-27", "2016-05-27", "2016-05-26", 
"2016-05-26", "2016-05-25", "2016-05-25", "2016-05-24", "2016-05-23", 
"2016-05-23", "2016-05-20"), `Technology` = c(0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("Date", 
"Technology"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -100L))

To get the result you want, you can use the dplyr package:

library(dplyr)
out <- dtetags.df %>% group_by(Date) %>% summarise_each(funs(sum)) %>% arrange(desc(Date))

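Notes:

  1. group_by(Date) means that the subsequent operations run on each group of rows sharing the same date
  2. summarise_each(funs(sum)) sums every column (except Date) within each group
  3. arrange(desc(Date)) sorts the result by date in descending order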
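Newer dplyr releases deprecate summarise_each() and funs(). The same three steps can be sketched with across(), assuming dplyr 1.0.0 or later is installed; mini.df is a hypothetical four-row stand-in for dtetags.df, not the original data:

```r
library(dplyr)

# Hypothetical miniature of dtetags.df with a numeric tag column
mini.df <- tibble(
  Date = c("2016-07-01", "2016-07-01", "2016-06-30", "2016-06-30"),
  Technology = c(1, 1, 0, 1)
)

out <- mini.df %>%
  group_by(Date) %>%                      # step 1: one group per date
  summarise(across(everything(), sum)) %>% # step 2: sum every non-grouping column
  arrange(desc(Date))                     # step 3: newest date first

print(out)
```

everything() skips the grouping column inside a grouped summarise(), so Date is left alone and only the tag columns are summed.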

Given the input data, the output is as expected:

print(out)
# A tibble: 42 x 2
     Date     Technology
    <chr>          <dbl>
1  2016-07-22          0
2  2016-07-21          0
3  2016-07-20          0
4  2016-07-19          0
5  2016-07-18          0
6  2016-07-15          0
7  2016-07-14          0
8  2016-07-13          0
9  2016-07-12          0
10 2016-07-11          0
11 2016-07-08          0
12 2016-07-07          0
13 2016-07-06          1
14 2016-07-05          0
15 2016-07-01          2
16 2016-06-30          1
17 2016-06-29          1
18 2016-06-28          0
19 2016-06-27          0
20 2016-06-24          1
21 2016-06-23          0
22 2016-06-22          0
23 2016-06-21          0
24 2016-06-20          0
25 2016-06-17          0
26 2016-06-16          0
27 2016-06-15          0
28 2016-06-14          1
29 2016-06-13          0
30 2016-06-10          0
31 2016-06-09          0
32 2016-06-08          0
33 2016-06-07          0
34 2016-06-06          0
35 2016-06-01          0
36 2016-05-29          0
37 2016-05-27          0
38 2016-05-26          0
39 2016-05-25          0
40 2016-05-24          0
41 2016-05-23          0
42 2016-05-20          0

Note: this requires every column of dtetags.df other than Date to be numeric. If any are not, convert them before applying this code. This can be done using the answers found here.
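A minimal sketch of that conversion in base R, assuming the tag columns arrived as factors (the same as.character() step also handles character columns); mini.df is a hypothetical stand-in for dtetags.df:

```r
# Hypothetical miniature of dtetags.df whose tag column is a factor
mini.df <- data.frame(
  Date = c("2016-07-01", "2016-07-01", "2016-06-30"),
  Technology = factor(c("1", "1", "0")),
  stringsAsFactors = FALSE
)

# Every column except Date is treated as a tag column
tag_cols <- setdiff(names(mini.df), "Date")

# Go through character first: as.numeric() on a factor alone would
# return the internal level codes rather than the printed values
mini.df[tag_cols] <- lapply(
  mini.df[tag_cols],
  function(x) as.numeric(as.character(x))
)

str(mini.df)
```

Assigning into mini.df[tag_cols] keeps Date in place, so there is no need to drop the column and re-attach it afterwards.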

Hope this helps.
