r - How to loop through all columns, compare them with a specific column, and plot the frequency counts



I have a data frame that looks like this:

 y<-c("A1","B1", "C2", "A1", "B1","C1", "A1","B2", "C3", "A1", "B1", "C4", "A1", "B1","C4", "A1","B2", "C4", "A1","B1", "C4", "A1", "B1", "C4")
 test<- data.frame(matrix(y, nrow = 3, ncol = 8))
 colnames(test) <- c("Learn_1", "Car_1", "Car_2", "Fan_1", "Fan_2", "Fan_3","Kart_1", "God_1")
 test
   Learn_1 Car_1 Car_2 Fan_1 Fan_2 Fan_3 Kart_1 God_1
 1      A1    A1    A1    A1    A1    A1     A1    A1
 2      B1    B1    B2    B1    B1    B2     B1    B1
 3      C2    C1    C3    C4    C4    C4     C4    C4

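My real data has 13 columns of unequal length, with thousands of rows and mixed values. For each value in God_1, I want to determine how often it occurs in all the other columns. Columns that share the same word come from the same study (e.g. the Fan columns and the Car columns), so if a value appears more than once within such a group, it should count only once. Then I want to plot the percentage of values that appear 5, 4, 3, 2, and 1 times against the total of the values available in God_1 (100%). I'm picturing one box representing the total number of values, shaded in different tones for the percentage at each frequency (1, 2, 3, 4, 5). The plot should run from a minimum of 1 to a maximum of 5 (there are 5 unique column words).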
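The key step is grouping the columns by their study word. Here is a minimal base-R sketch, which assumes every column name has the form `<study>_<number>` as in the example. It strips the numeric suffix and splits the columns by the remaining prefix. The number of groups is then the maximum possible frequency (5 here).

```r
y <- c("A1","B1","C2", "A1","B1","C1", "A1","B2","C3", "A1","B1","C4",
       "A1","B1","C4", "A1","B2","C4", "A1","B1","C4", "A1","B1","C4")
test <- data.frame(matrix(y, nrow = 3, ncol = 8), stringsAsFactors = FALSE)
colnames(test) <- c("Learn_1", "Car_1", "Car_2", "Fan_1", "Fan_2", "Fan_3", "Kart_1", "God_1")

# Study word = column name without its trailing "_<digits>" suffix
study <- sub("_[0-9]+$", "", names(test))

# One data frame per study (split.default treats the data frame as a list of columns)
studies <- split.default(test, study)

# names(studies) is c("Car", "Fan", "God", "Kart", "Learn")
names(studies)
```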

My problem is that I don't know how to start, even though I've been thinking about this for the past few days. Does anyone have an idea?

For this example, the frequencies I expect are:

A1 = 5
B1 = 5
C4 = 3
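You can cross-check these expected counts in base R without any packages. For each value in God_1, count how many study groups contain it at least once (the God group itself included). This is a sketch that assumes the same `<study>_<number>` column naming as the example.

```r
y <- c("A1","B1","C2", "A1","B1","C1", "A1","B2","C3", "A1","B1","C4",
       "A1","B1","C4", "A1","B2","C4", "A1","B1","C4", "A1","B1","C4")
test <- data.frame(matrix(y, nrow = 3, ncol = 8), stringsAsFactors = FALSE)
colnames(test) <- c("Learn_1", "Car_1", "Car_2", "Fan_1", "Fan_2", "Fan_3", "Kart_1", "God_1")

study <- sub("_[0-9]+$", "", names(test))

# For each study, the set of distinct values across all of its columns
study_values <- lapply(split.default(test, study), function(cols) unique(unlist(cols)))

# Number of studies in which each God_1 value occurs (each study counted once)
freq <- vapply(test$God_1, function(v) {
  sum(vapply(study_values, function(vals) v %in% vals, logical(1)))
}, integer(1))

# freq is c(A1 = 5, B1 = 5, C4 = 3)
freq
```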

This is str() of my example. My real data looks like this, but with 2366 obs. of 13 variables, each a factor with a number of levels (ranging from 200 to 3000):

str(test)
'data.frame':   3 obs. of  8 variables:
 $ Learn_1: Factor w/ 3 levels "A1","B1","C2": 1 2 3
 $ Car_1  : Factor w/ 3 levels "A1","B1","C1": 1 2 3
 $ Car_2  : Factor w/ 3 levels "A1","B2","C3": 1 2 3
 $ Fan_1  : Factor w/ 3 levels "A1","B1","C4": 1 2 3
 $ Fan_2  : Factor w/ 3 levels "A1","B1","C4": 1 2 3
 $ Fan_3  : Factor w/ 3 levels "A1","B2","C4": 1 2 3
 $ Kart_1 : Factor w/ 3 levels "A1","B1","C4": 1 2 3
 $ God_1  : Factor w/ 3 levels "A1","B1","C4": 1 2 3
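The real columns are factors with different level sets. Stacking them into one column (for example with tidyr's gather()) typically drops the factor attributes with a warning. One option, offered here as an assumption about how you might prepare the data, is to convert every column to character up front. The small two-column data frame below is only an illustration.

```r
# Illustrative factor columns with different level sets
test <- data.frame(Learn_1 = factor(c("A1", "B1", "C2")),
                   God_1   = factor(c("A1", "B1", "C4")))

# Convert all columns to character while keeping the data.frame structure
test[] <- lapply(test, as.character)

str(test)
```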

We can use dplyr and tidyr.

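First, gather converts the data to long format. Then separate splits the numeric suffix off each column label, and distinct removes duplicate (study, value) pairs. Next, count tallies how many studies contain each value, and left_join keeps only the values that appear in the God_1 column.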

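library(dplyr)
library(tidyr)

dat <- test  # the example data frame from the question

dat %>% gather(key, val) %>%                   # wide -> long; gather() is superseded by pivot_longer() but still available
        separate(key, c("id", "num")) %>%      # "Fan_2" -> id = "Fan", num = "2"
        distinct(id, val) %>%                  # keep each value at most once per study
        count(val) %>%                         # number of studies containing each value (column `n`)
        left_join(dat["God_1"], ., by = c(God_1 = "val"))  # restrict to the values in God_1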

Source: local data frame [3 x 2]
   God_1   out
  (fctr) (dbl)
1     A1     5
2     B1     5
3     C4     3
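The answer stops at the counts. Below is a hedged sketch of the plot described in the question: the share of God_1 values seen in 1 to 5 studies, drawn as a single shaded bar with base graphics. It rests on three assumptions. The per-row counts are recomputed in base R so the block runs on its own. Percentages are taken over the rows of God_1. Frequencies with no values stay in the table as 0% so the scale always runs from 1 to 5.

```r
y <- c("A1","B1","C2", "A1","B1","C1", "A1","B2","C3", "A1","B1","C4",
       "A1","B1","C4", "A1","B2","C4", "A1","B1","C4", "A1","B1","C4")
test <- data.frame(matrix(y, nrow = 3, ncol = 8), stringsAsFactors = FALSE)
colnames(test) <- c("Learn_1", "Car_1", "Car_2", "Fan_1", "Fan_2", "Fan_3", "Kart_1", "God_1")

# Distinct values per study, then the number of studies containing each God_1 value
study <- sub("_[0-9]+$", "", names(test))
study_values <- lapply(split.default(test, study), function(cols) unique(unlist(cols)))
freq <- vapply(test$God_1, function(v) {
  sum(vapply(study_values, function(vals) v %in% vals, logical(1)))
}, integer(1))

# Tabulate frequencies 1..n_studies; fixed levels keep absent frequencies as 0
n_studies <- length(study_values)
tab <- table(factor(freq, levels = seq_len(n_studies)))
pct <- 100 * as.vector(tab) / length(freq)
names(pct) <- names(tab)

# One stacked bar: each shade is the share of God_1 values with that frequency
barplot(matrix(pct, ncol = 1, dimnames = list(names(pct), "God_1")),
        col = gray.colors(n_studies, start = 0.9, end = 0.2),
        ylab = "% of God_1 values", ylim = c(0, 100),
        legend.text = paste("seen in", names(pct), "studies"))
```

A single-column matrix makes barplot() stack the segments, which gives the "one box, different shades" layout. Set beside = TRUE if side-by-side bars turn out easier to read.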
