R Markdown contingency tables: tabulating column variables with selected values



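I am new to R, coming from Stata. Below is an R Markdown chunk with a reproducible data example. These data are representative of the data I am working with, except that the real data contain many more binary (logical) and factor variables.

Libraries and data: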

# Setup and load packages:
library(dplyr)
library(expss)
library(hablar)
library(kableExtra)
library(summarytools)
# Load data:
data("mtcars")
raw_df <- select(mtcars, c(wt, cyl, gear, vs, am))
# Data prep and labelling:
df <- raw_df %>%
  apply_labels(wt = "Facility ID",
               cyl = "Geographical Area",
               cyl = c("Area A" = 4, "Area B" = 6, "Area C" = 8),
               gear = "Tier",
               gear = c("Tier 1" = 3, "Tier 2" = 4, "Tier 3" = 5),
               vs = "E.coli",
               am = "V.choleri") %>%
  convert(chr(wt),
          fct(cyl, gear),
          lgl(vs, am))

Note that my actual data contain many more categorical and logical variables. I have managed to produce the following table in R Markdown (HTML output):


df %>%
  tab_cells(cyl, gear) %>%
  tab_total_row_position("below") %>%
  tab_total_statistic("u_rpct") %>%
  tab_total_label("Total hosts (Row proportions)") %>%
  tab_cols(vs, am) %>%
  tab_stat_rpct() %>%
  tab_cols(total(label = "Number of hosts")) %>%
  tab_stat_cases() %>%
  tab_pivot(stat_position = "outside_columns") %>%
  recode(as.criterion(is.numeric) & is.na ~ 0, TRUE ~ copy) %>%
  split_table_to_df() %>%
  kable(align = "c", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "condensed", "responsive"),
                full_width = FALSE, position = "center") %>%
  row_spec(1:2, bold = TRUE)

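Questions:

1. I would like to include only the "TRUE" columns and drop the "FALSE" columns from the table, while keeping the first row of column labels ("E.coli", "V.choleri") intact. In fact, I do not need the second header row ("TRUE", "FALSE") at all.
2. I have labelled the total row "Total hosts (Row proportions)", but I cannot remove the leading "#" sign. Also, the rightmost cell of that row shows "100". I tried to make it the sum of the cells in that column instead, but failed. The "100" is completely misleading.
3. I also tried to get the table I want with the ctable function of the "summarytools" package, because it has a nice structure and also shows the number of observations inside the proportion cells: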

print(ctable(df$cyl,df$am), method = 'render')

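The problem is that it seems to accept only a single pair of categorical variables, and "FALSE" cannot be omitted either. The row totals (number of observations) in the last column, however, are exactly what I want.

Details: R 4.0.0, RStudio 1.2.5042. All packages are up to date.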

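Tables in expss are ordinary data frames. Column labels are just column names in which the "|" symbol separates the header rows, so you can manipulate them like any other column names. Row labels are stored in the column row_labels, and the "#" sign can be removed from them with a search-and-replace operation. The "Total hosts (Row proportions)" row shows "100" in the last column because you set the total statistic to row percent at the start, and the row percent of a single column is always 100. Taking all of the above into account: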

library(dplyr)
library(expss)
library(hablar)
library(kableExtra)
library(summarytools)
# Load data:
data("mtcars")
raw_df <- select(mtcars, c(wt, cyl, gear, vs, am))
# Data prep and labelling:
df <- raw_df %>%
  apply_labels(wt = "Facility ID",
               cyl = "Geographical Area",
               cyl = c("Area A" = 4, "Area B" = 6, "Area C" = 8),
               gear = "Tier",
               gear = c("Tier 1" = 3, "Tier 2" = 4, "Tier 3" = 5),
               vs = "E.coli",
               am = "V.choleri") %>%
  convert(chr(wt),
          fct(cyl, gear),
          lgl(vs, am))

tbl = df %>%
  tab_cells(cyl, gear) %>%
  tab_total_row_position("below") %>%
  tab_total_statistic("u_rpct") %>%
  tab_total_label("Total hosts (Row proportions)") %>%
  tab_cols(vs, am) %>%
  tab_stat_rpct() %>%
  tab_cols(total(label = "Number of hosts")) %>%
  # specify total statistic for last column
  tab_stat_cases(total_statistic = "u_cases") %>%
  tab_pivot(stat_position = "outside_columns") %>%
  recode(as.criterion(is.numeric) & is.na ~ 0, TRUE ~ copy) %>%
  # remove columns with FALSE
  except(contains("FALSE")) %>%
  compute(
    # remove '#' sign from row labels
    row_labels = gsub("#", "", row_labels)
  )
# remove the "|TRUE" suffix from column labels
# (the backslash must be doubled inside an R string literal)
colnames(tbl) = gsub("\\|TRUE", "", colnames(tbl))
tbl %>%
  split_table_to_df() %>%
  kable(align = "c", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "condensed", "responsive"),
                full_width = FALSE, position = "center") %>%
  row_spec(1:2, bold = TRUE)
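
To see the label manipulation in isolation, here is a minimal base R sketch that needs no packages. The data frame `tbl_demo` is a hypothetical stand-in for an expss table (it is not produced by expss), and the numbers in it are illustrative only. It assumes the same naming layout described above: header rows joined by "|" in the column names, and row labels stored in `row_labels`.

```r
# Hypothetical stand-in for an expss table: a plain data frame whose column
# names use "|" to separate header rows. The values are illustrative only.
tbl_demo <- data.frame(
  row_labels = c("Geographical Area|Area A",
                 "Geographical Area|#Total hosts (Row proportions)"),
  stringsAsFactors = FALSE
)
tbl_demo[["E.coli|TRUE"]]     <- c(90.9, 43.8)
tbl_demo[["E.coli|FALSE"]]    <- c(9.1, 56.2)
tbl_demo[["V.choleri|TRUE"]]  <- c(72.7, 40.6)
tbl_demo[["V.choleri|FALSE"]] <- c(27.3, 59.4)

# Drop every column whose name contains "|FALSE".
# fixed = TRUE treats "|" literally, so no regex escaping is needed.
keep <- !grepl("|FALSE", colnames(tbl_demo), fixed = TRUE)
tbl_demo <- tbl_demo[, keep, drop = FALSE]

# Strip the "|TRUE" suffix so only the variable label remains.
colnames(tbl_demo) <- sub("|TRUE", "", colnames(tbl_demo), fixed = TRUE)

# Remove the leading "#" from the total row label.
tbl_demo$row_labels <- gsub("#", "", tbl_demo$row_labels, fixed = TRUE)

print(tbl_demo)
```

Using `fixed = TRUE` is one way to avoid escaping "|" altogether; the regex form with a doubled backslash, as in the answer, works as well.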

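The point about the "100" can also be checked with base R alone. The sketch below is only an analogy using `table()` and `prop.table()`, not a description of how expss computes its totals internally; the variable `total_col` is a hypothetical single-category column standing in for `total()`.

```r
data("mtcars")

# Row percentages across two columns (vs = 0 / 1): each row sums to 100.
rpct_two <- prop.table(table(mtcars$cyl, mtcars$vs), margin = 1) * 100

# A "total" column puts every case into one category, so the row
# percentage of that single column can only ever be 100.
total_col <- rep("Number of hosts", nrow(mtcars))
rpct_one <- prop.table(table(mtcars$cyl, total_col), margin = 1) * 100

# Counting cases instead gives the informative figure for that column.
cases_one <- table(mtcars$cyl, total_col)

print(rpct_one)
print(cases_one)
```

This is why switching the total statistic of the last column from row percent to case counts replaces the misleading "100" with the number of hosts.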