Operating between columns and classifying values by group in R


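I'm trying to get percentage values grouped with respect to one variable.

To do this, I use sapply to get the percentage of each column relative to another column, but I don't know how to group these values by type (another variable).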

x <- data.frame("A" = c(0,0,1,1,1,1,1), "B" = c(0,1,0,1,0,1,1), "C" = c(1,0,1,1,0,0,1),
                "type" = c("x","x","x","y","y","y","x"), "yes" = c(0,0,1,1,0,1,1))
x
  A B C type yes
1 0 0 1    x   0
2 0 1 0    x   0
3 1 0 1    x   1
4 1 1 1    y   1
5 1 0 0    y   0
6 1 1 0    y   1
7 1 1 1    x   1

I need the following value (as a percentage): sum(A==1 & yes==1) / sum(A==1), and likewise for B and C. For that I use the following code:

result <- as.data.frame(sapply(x[,1:3],
                               function(i) (sum(i & x$yes)/sum(i))*100))
result
  sapply(x[, 1:3], function(i) (sum(i & x$yes)/sum(i)) * 100)
A                                                          80
B                                                          75
C                                                          75

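Now I need the same calculation, but taking the variable "type" into account. That is, I want the same percentages, broken down by type. So the table I expect is: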

   type   sapply(x[, 1:3], function(i) (sum(i & x$yes)/sum(i)) * 100)
A  x      40
A  y      40
B  x      25
B  y      50
C  x      50
C  y      25

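In this example you can see that, for each letter, the percentages add up to the value obtained in the first result; here they are simply split by type. Thanks.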

You can do the following using data.table:

Code

library(data.table)

setDT(df)
cols = c('A', 'B', 'C')
mat = df[yes == 1, lapply(.SD, function(x){
  100 * sum(x)/df[, lapply(.SD, sum), .SDcols = cols][[substitute(x)]]
  # Here, the numerator is sum(x) over the rows where yes == 1, for x in columns A, B, C.
  # The denominator equals sum(x) over all rows, for x in columns A, B, C.
  # The reason why we need substitute(x) is that df[, lapply(.SD, sum)]
  # generates a list of column sums, i.e. list(A = sum(A), B = sum(B), ...).
  # Hence, for each x in the column names we must subset the list above using [[substitute(x)]].
  # Ultimately, the operation equals sum(x where yes == 1) / sum(x) for A, B, C.
}), .(type), .SDcols = cols]
# '.(type)' simply means that we apply this for each type group,
# i.e. once for x and once for y, for each of the columns A, B, C.
# The dot is just shorthand for 'list()'.
# .SDcols selects the subset of columns that the lapply statement is applied to.

Result

> mat
   type  A  B  C
1:    x 40 25 50
2:    y 40 50 25

Long format (your example)

> melt(mat, id.vars = "type")
   type variable value
1:    x        A    40
2:    y        A    40
3:    x        B    25
4:    y        B    50
5:    x        C    50
6:    y        C    25
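
As a quick sanity check on the property mentioned in the question (for each letter, the per-type percentages add up to the ungrouped percentage), the long-format values can be summed per letter. This is only an illustrative sketch: the values are retyped by hand from the output above, and the names `long` and `per_letter` are made up for the example.

```r
# Long-format values copied by hand from the melted result above.
long <- data.frame(type     = c("x", "y", "x", "y", "x", "y"),
                   variable = c("A", "A", "B", "B", "C", "C"),
                   value    = c(40, 40, 25, 50, 50, 25))

# Add up the per-type percentages for each letter.
per_letter <- tapply(long$value, long$variable, sum)
per_letter
```

The sums can then be compared with the 80, 75 and 75 that the ungrouped sapply call in the question returns for A, B and C.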

Data

df <- data.frame("A" = c(0,0,1,1,1,1,1), "B" = c(0,1,0,1,0,1,1), "C" = c(1,0,1,1,0,0,1),
                 "type" = c("x","x","x","y","y","y","x"), "yes" = c(0,0,1,1,0,1,1))
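
For comparison, here is a minimal base R sketch of the same calculation that computes the column totals once instead of looking them up with substitute(x). It is an alternative, not the method used in the answer above, and the names `totals`, `hits` and `pct` are made up for illustration. It assumes the same example data as the question.

```r
# Same example data as in the question.
df <- data.frame(A = c(0,0,1,1,1,1,1), B = c(0,1,0,1,0,1,1), C = c(1,0,1,1,0,0,1),
                 type = c("x","x","x","y","y","y","x"), yes = c(0,0,1,1,0,1,1))
cols <- c("A", "B", "C")

# Denominator: column sums over all rows, ignoring type.
totals <- colSums(df[cols])

# Numerator: column sums per type, restricted to rows where yes == 1.
keep <- df$yes == 1
hits <- rowsum(df[keep, cols], group = df$type[keep])

# Divide each column by its overall total and convert to a percentage.
pct <- sweep(as.matrix(hits), 2, totals, "/") * 100
pct
```

One caveat of this sketch: a type with no rows where yes == 1 would not appear in `hits` at all, so it would be missing from `pct` rather than shown as 0.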
