R: creating ordered quintile factors across rows



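>I have a data frame, and within each row I want to classify the columns as belonging to the first, second, third, fourth, or fifth quintile (a bit confusing, I know, but the example should make it clear). I have done this, but there are two problems: first, not all factor levels are present in every variable, and second, the factors are not ordered in the most logical way. Here is some test data.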

x.df<-structure(list(Location = structure(1:6, .Label = c("Site A", 
"Site B", "Site C", "Site D", "Site E", "Site F"), class = "factor"), 
Var1 = c(78L, 5L, 85L, 87L, 89L, 82L), Var2 = c(98L, 5L, 
67L, 92L, 3L, 44L), Var3 = c(30L, 54L, 56L, 3L, 31L, 58L), 
Var4 = c(63L, 96L, 14L, 95L, 90L, 99L), Var5 = c(71L, 52L, 
78L, 93L, 74L, 26L), Var6 = c(21L, 66L, 57L, 42L, 39L, 69L
), Var7 = c(97L, 42L, 84L, 46L, 86L, 46L), Var8 = c(100L, 
99L, 6L, 41L, 94L, 20L), Var9 = c(84L, 82L, 26L, 91L, 38L, 
80L), Var10 = c(8L, 50L, 23L, 92L, 46L, 1L)), .Names = c("Location",
"Var1", "Var2", "Var3", "Var4", "Var5", "Var6", "Var7", "Var8", 
"Var9", "Var10"), class = "data.frame", row.names = c(NA, -6L
))
cut_fn <- function(x) {
  cut(x, quantile(x, c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)), include.lowest = TRUE,
      labels = c("lowest", "low", "middle", "high", "highest"))
}
r.df<-data.frame(t(apply(x.df[,-1], 1, cut_fn)))

As a result, each row contains two each of "highest", "high", ..., "lowest".

r.df
       X1      X2     X3      X4      X5     X6      X7      X8     X9    X10
1  middle highest    low     low  middle lowest    high highest   high lowest
2  lowest  lowest middle highest  middle   high     low highest   high    low
3 highest    high middle  lowest    high middle highest  lowest    low    low
4  middle    high lowest highest highest    low     low  lowest middle   high
5    high  lowest lowest highest  middle    low    high highest    low middle
6 highest     low middle highest     low   high  middle  lowest   high lowest
str(r.df)
'data.frame':   6 obs. of  10 variables:
 $ X1 : Factor w/ 4 levels "high","highest",..: 4 3 2 4 1 2
 $ X2 : Factor w/ 4 levels "high","highest",..: 2 4 1 1 4 3
 $ X3 : Factor w/ 3 levels "low","lowest",..: 1 3 3 2 2 3
 $ X4 : Factor w/ 3 levels "highest","low",..: 2 1 3 1 1 1
 $ X5 : Factor w/ 4 levels "high","highest",..: 4 4 1 2 4 3
 $ X6 : Factor w/ 4 levels "high","low","lowest",..: 3 1 4 2 2 1
 $ X7 : Factor w/ 4 levels "high","highest",..: 1 3 2 3 1 4
 $ X8 : Factor w/ 2 levels "highest","lowest": 1 1 2 2 1 2
 $ X9 : Factor w/ 3 levels "high","low","middle": 1 1 2 3 2 1
 $ X10: Factor w/ 4 levels "high","low","lowest",..: 3 2 2 1 4 3

Ideally, I would like every variable to have the same (ordered) structure:

 $ X1 : Factor w/ 5 levels "highest","high",..: 

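If I understand your question correctly, you want each column to be an ordered factor. The simplest way to do this is to loop over all of the columns and convert each one with the factor function, using the ordered=TRUE option.
Try this: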

# First create r.df with stringsAsFactors = FALSE so the columns stay character
r.df <- data.frame(t(apply(x.df[,-1], 1, cut_fn)), stringsAsFactors = FALSE)
# Now loop over all of the columns, converting each to an ordered factor
# (lowest = 1, highest = 5)
for (x in names(r.df)) {
  r.df[[x]] <- factor(r.df[[x]], levels = c("lowest","low","middle","high","highest"), ordered = TRUE)
}

Now every column has all five levels, in the correct order.
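
For a quick check of this approach, here is a minimal, self-contained sketch. The matrix `m` is made-up illustration data (not the data from the question), the name `lv` is introduced here for the level vector, and `lapply` is used in place of the explicit loop; that swap is a stylistic choice and should behave the same way.

```r
# Made-up input for illustration: 2 rows, 10 values each (not the question's data).
m <- rbind(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
           c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1))

# Full set of levels, from lowest to highest.
lv <- c("lowest", "low", "middle", "high", "highest")

# Same binning as cut_fn in the question, with the labels argument named.
cut_fn <- function(x) {
  cut(x, quantile(x, c(0, 0.2, 0.4, 0.6, 0.8, 1)),
      labels = lv, include.lowest = TRUE)
}

# apply() coerces the factor results to a character matrix,
# so the level information is lost at this step.
r.df <- data.frame(t(apply(m, 1, cut_fn)), stringsAsFactors = FALSE)

# Rebuild every column as an ordered factor with the full level set.
r.df[] <- lapply(r.df, factor, levels = lv, ordered = TRUE)

str(r.df)            # each column should show as an ordered factor with 5 levels
r.df$X1 < "middle"   # comparisons follow lowest < low < middle < high < highest
```

Assigning into `r.df[]` keeps the object a data frame while replacing every column, so no column names need to be listed by hand.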
