I really can't figure out how to do this without using a for loop:
x <- c('a', 'b', 'c', 'd')
> x
[1] "a" "b" "c" "d"
data <- data.frame(
x=c('a', 'b', 'a', 'b', 'c', 'a', 'a', 'b', 'c', 'd'),
name=c('one','one', 'two','two','two', 'three', 'four','four','four','four'),
other=c(1, 4, 5, 3, 2, 4, 5, 6, 3, 2)
)
> data
x name other
1 a one 1
2 b one 4
3 a two 5
4 b two 3
5 c two 2
6 a three 4
7 a four 5
8 b four 6
9 c four 3
10 d four 2
I want to split data by the values of name, and merge each subgroup with x to fill in the "missing rows", giving a result like this:
> data
x name other
1 a one 1
2 b one 4
c one 0 <- missing row added
d one 0 <- missing row added
3 a two 5
4 b two 3
5 c two 2
d two 0 <- missing row added
6 a three 4
b three 0 <- missing row added
c three 0 <- missing row added
d three 0 <- missing row added
7 a four 5
8 b four 6
9 c four 3
10 d four 2
Finally, reshape the data.frame like this:
> data
x one two three four
1 a 1 5 4 5
2 b 4 3 0 6
3 c 0 2 0 3
4 d 0 0 0 2
I can do this with a for loop, but I'm sure there must be a better solution involving *apply, by, split, or something similar. Any suggestions?
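**Update**
I ended up slightly modifying the accepted answer (thanks again, mate!), because I don't much like relying on levels and I don't care about the order of the columns: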
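library(reshape2)  # provides dcast
grid <- expand.grid(x, unique(data$name))  # every x/name combination
colnames(grid) <- c("x", "name")
data <- merge(grid, data, all.x = TRUE)  # combinations absent from data get NA
data[is.na(data)] <- 0
dcast(data, x ~ name, value.var = 'other')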
Try xtabs. No packages are needed.
First put the levels of name in order, so that the columns come out sorted:
data$name <- factor(data$name, levels = c("one", "two", "three", "four"))
tab <- xtabs(other ~., data)
This gives output of class c("xtabs", "table"):
> tab
name
x one two three four
a 1 5 4 5
b 4 3 0 6
c 0 2 0 3
d 0 0 0 2
Or, if you need output of class "data.frame", use as.data.frame.matrix(tab).
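As a minimal sketch of that last conversion, assuming the same data as in the question: the variable name wide and the step that copies the row names into an x column are my own additions for illustration, not part of the answer above.

```r
# Same data as in the question
data <- data.frame(
  x = c('a', 'b', 'a', 'b', 'c', 'a', 'a', 'b', 'c', 'd'),
  name = c('one', 'one', 'two', 'two', 'two', 'three', 'four', 'four', 'four', 'four'),
  other = c(1, 4, 5, 3, 2, 4, 5, 6, 3, 2)
)

# Order the levels so the columns come out as one, two, three, four
data$name <- factor(data$name, levels = c("one", "two", "three", "four"))

# Cross-tabulate: sums `other` for each x/name pair, 0 where a pair is absent
tab <- xtabs(other ~ ., data)

# Convert the table to a data.frame; x initially lives in the row names
wide <- as.data.frame.matrix(tab)

# Assumption: you want x as an ordinary column, as in the question's target
wide <- data.frame(x = rownames(wide), wide, row.names = NULL)
wide
```

Note that xtabs sums values, so this only reproduces the target table when each x/name pair occurs at most once, as it does here.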
More directly:
What you really need is reshape2::dcast:
# clean up factor levels for prettier results
data$name <- factor(data$name, levels = c('one', 'two', 'three', 'four'))
library(reshape2)
dcast(data, x ~ name, value.var = 'other', fill = 0)
# x one two three four
# 1 a 1 5 4 5
# 2 b 4 3 0 6
# 3 c 0 2 0 3
# 4 d 0 0 0 2
As asked:
To follow the steps you laid out, first use expand.grid to get the combinations, then merge with all = TRUE, and then rearrange with reshape2::dcast:
df <- merge(data, expand.grid(x, levels(data$name)),
by.x = c('x', 'name'), by.y = c('Var1', 'Var2'), all = TRUE)
df[is.na(df)] <- 0 # replace `NA`s with 0
df$name <- factor(df$name, levels = c('one', 'two', 'three', 'four')) # fix order of levels
library(reshape2)
dcast(df, x ~ name, value.var = 'other')
# x one two three four
# 1 a 1 5 4 5
# 2 b 4 3 0 6
# 3 c 0 2 0 3
# 4 d 0 0 0 2
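For comparison, the literal split-and-merge route described in the question could be sketched in base R roughly as follows. This is only an illustration under my own assumptions: the names pieces, piece, and filled are hypothetical, and stringsAsFactors = FALSE is set so that name stays a character column.

```r
x <- c('a', 'b', 'c', 'd')
data <- data.frame(
  x = c('a', 'b', 'a', 'b', 'c', 'a', 'a', 'b', 'c', 'd'),
  name = c('one', 'one', 'two', 'two', 'two', 'three', 'four', 'four', 'four', 'four'),
  other = c(1, 4, 5, 3, 2, 4, 5, 6, 3, 2),
  stringsAsFactors = FALSE
)

# Split by name, then left-join each piece onto the full set of x values
pieces <- lapply(split(data, data$name), function(piece) {
  filled <- merge(data.frame(x = x), piece, by = "x", all.x = TRUE)
  filled$name <- piece$name[1]             # restore the group label on added rows
  filled$other[is.na(filled$other)] <- 0   # missing rows get 0
  filled
})

# Stack the pieces back together (groups come out in alphabetical order of name)
filled <- do.call(rbind, pieces)
rownames(filled) <- NULL
filled
```

The result has one row per x/name combination and can be passed to dcast or xtabs as shown in the answers. The expand.grid approach reaches the same table with a single merge, which is why it is usually the simpler choice.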
To answer the first part of your question, you can use expand.grid. The logic applied here is as follows.
Your data:
x=c('a', 'b', 'a', 'b', 'c', 'a', 'a', 'b', 'c', 'd')
name=c('one','one', 'two','two','two', 'three', 'four','four','four','four')
other=c(1, 4, 5, 3, 2, 4, 5, 6, 3, 2)
Make this a data frame:
ee<-data.frame(x,name,other)
Now use expand.grid to generate all combinations of x and name:
dd<-expand.grid(unique(x), unique(name))
This looks like:
Var1 Var2
1 a one
2 b one
3 c one
4 d one
5 a two
6 b two
7 c two
8 d two
9 a three
10 b three
11 c three
12 d three
13 a four
14 b four
15 c four
16 d four
All your combinations have now been created. Now use sqldf or any other way of merging:
library(sqldf)
ff<-sqldf("select Var1, Var2, ifnull(c.other,0) as other from dd left join ee c on x=Var1 and name=Var2")
So your output is:
Var1 Var2 other
1 a one 1
2 b one 4
3 c one 0
4 d one 0
5 a two 5
6 b two 3
7 c two 2
8 d two 0
9 a three 4
10 b three 0
11 c three 0
12 d three 0
13 a four 5
14 b four 6
15 c four 3
16 d four 2
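If you would rather not depend on sqldf, the same left join can be sketched with base R's merge. This is a sketch rather than the answer's own method; stringsAsFactors = FALSE and the explicit Var1/Var2 names in expand.grid are assumptions added for clarity.

```r
x <- c('a', 'b', 'a', 'b', 'c', 'a', 'a', 'b', 'c', 'd')
name <- c('one', 'one', 'two', 'two', 'two', 'three', 'four', 'four', 'four', 'four')
other <- c(1, 4, 5, 3, 2, 4, 5, 6, 3, 2)

ee <- data.frame(x, name, other, stringsAsFactors = FALSE)

# All combinations of the distinct x and name values
dd <- expand.grid(Var1 = unique(x), Var2 = unique(name), stringsAsFactors = FALSE)

# Left join: keep every row of dd, attach `other` where a match exists
ff <- merge(dd, ee, by.x = c("Var1", "Var2"), by.y = c("x", "name"), all.x = TRUE)

# Unmatched combinations come back as NA; replace them with 0
ff$other[is.na(ff$other)] <- 0
ff
```

By default merge sorts the result by the key columns, so the rows appear in a different order than in the sqldf output above, but the 16 combinations and their values are the same.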