r - Include all factor combinations in a contingency table to create a square probability table/matrix



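I am trying to create a 9 x 9 probability matrix from a contingency/frequency table.

The table contains the frequencies of a pair of values (x1, x2) transitioning to a pair of values (y1, y2). x1 and y1 take the values A, B and C; x2 and y2 take the values D, E and F.

Transitions do not exist between all xy pairs. However, I want these "missing" transitions to appear as zeros in the table/matrix so that it is square (9 x 9) and can be used in other analyses.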

df <- structure(list(x1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
                    3L, 1L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
                    y1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
                    2L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
                    x2 = structure(c(1L,2L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 1L), 
                    .Label = c("D", "E", "F"), class = "factor"), 
                    y2 = structure(c(1L, 2L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L), 
                    .Label = c("D", "E", "F"), class = "factor"), 
                    x = c("AD", "BE", "CF", "AD", "BD", "CD", "AE", "BE", "CE", "AE", "BF", "CD"), 
                    y = c("AD", "BE", "CF", "AE", "BD", "CD", "AD", "BD", "CD", "AE", "BE", "CF")),
                    .Names = c("x1", "y1", "x2", "y2", "x", "y"), row.names = c(NA, -12L), class = "data.frame")
# df$x <- paste0(df$x1, df$x2) # included in the dput
# df$y <- paste0(df$y1,df$y2)
# convert to factor to include all transitions http://stackoverflow.com/a/13705236/1670053
df$x <- factor(df$x, levels = c("AD", "AE", "AF", "BD", "BE", "BF", "CD", "CE", "CF"))
df$y <- factor(df$y,levels = c("AD", "AE", "AF", "BD", "BE", "BF", "CD", "CE", "CF") )
t1 <- with(df,(table(x,y)))
# t1m <- as.data.frame.matrix(t1)
t2 <- t1/(colSums(t1))
dfm <- as.data.frame.matrix(t2)
#dm <- as.matrix(dfm)
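Without using factor on x and y, the resulting dfm has the correct values, but of course it does not include the full 9 x 9 set of transitions. The desired result, DFMd, is given below.

However, when I do apply factor to x and y, the result is not the desired one: NA (NaN) and Inf values are introduced.

Is there a way to evaluate table/colSums(table) with the "missing" factor levels included and still get the desired result?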

DFMd <- structure(list(AD = c(0.5, 0.5, 0, 0, 0, 0, 0, 0, 0), AE = c(0.5, 
0.5, 0, 0, 0, 0, 0, 0, 0), AF = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), BD = c(0, 0, 0, 0.5, 0.5, 0, 0, 0, 0), BE = c(0, 0, 
0, 0, 0.5, 0.5, 0, 0, 0), BF = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), CD = c(0, 0, 0, 0, 0, 0, 0.5, 0.5, 0), CE = c(0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L), CF = c(0, 0, 0, 0, 0, 0, 0.5, 0, 
0.5)), .Names = c("AD", "AE", "AF", "BD", "BE", "BF", "CD", "CE", 
"CF"), class = "data.frame", row.names = c("AD", "AE", "AF", 
"BD", "BE", "BF", "CD", "CE", "CF"))

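I am still not sure why the code above produces Inf or otherwise wrong values, but the code below produces the desired output. It does seem rather convoluted.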

t1 <- with(df,(table(x,y))) # contingency table
tcc <- as.matrix(colSums(t1)) # get col sums
tc <- as.data.frame.matrix(tcc) # store as a data.frame so the rep() code below can be used
tct <- t(tc) # transpose to build matrix of colsums
tcx <- tct[rep(seq_len(nrow(tct)), each=9),] # http://stackoverflow.com/a/11121463/1670053 repeat the colsums row 9 times to get a 9x9 matrix
pmat <- t1/tcx # transition matrix
pmat[is.na(pmat)] <- 0 # replace the NaN produced by 0/0 with 0
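
A likely explanation for the Inf and NaN values, offered as a sketch rather than a definitive diagnosis: when a matrix is divided by a plain vector, R recycles the vector down the rows, so `t1 / colSums(t1)` divides row i by column sum i instead of dividing each column by its own sum. Without the extra factor levels every column sum in this data happens to be 2, which hides the problem. With all nine levels, three column sums are 0, and the matching rows become 0/0 or 1/0. The names `lv`, `bad` and `pmat2` below are illustrative and not from the original post; `prop.table()` and `sweep()` are base R functions.

```r
# Same x/y pairs as in the question, with all nine levels declared up front.
lv <- c("AD", "AE", "AF", "BD", "BE", "BF", "CD", "CE", "CF")
x <- factor(c("AD", "BE", "CF", "AD", "BD", "CD",
              "AE", "BE", "CE", "AE", "BF", "CD"), levels = lv)
y <- factor(c("AD", "BE", "CF", "AE", "BD", "CD",
              "AD", "BD", "CD", "AE", "BE", "CF"), levels = lv)
t1 <- table(x, y)  # 9 x 9 contingency table, zeros for unobserved pairs

# The vector is recycled down the rows: row i is divided by column sum i.
# Rows AF, BF and CE are therefore divided by 0, giving NaN or Inf.
bad <- t1 / colSums(t1)

# prop.table() with margin = 2 divides every column by its own sum.
pmat <- prop.table(t1, margin = 2)
pmat[is.nan(pmat)] <- 0  # never-observed columns (AF, BF, CE) give 0/0

# Equivalent formulation with sweep(), which makes the margin explicit.
pmat2 <- sweep(t1, 2, colSums(t1), "/")
pmat2[is.nan(pmat2)] <- 0

# Optional: convert to a data.frame, as in the question.
dfm <- as.data.frame.matrix(pmat)
```

Either form replaces the transpose-and-repeat steps with a single call; the final NaN replacement is still needed because columns that were never observed have a sum of zero.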
