r - How do I apply chisq.test for independence to each data frame in a list of data frames?



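I want to apply the chi-squared test of independence (chisq.test(x)) to each df in a list in turn, but I keep getting an error message telling me "all entries of 'x' must be nonnegative and finite", even though they are in fact all non-negative and finite. I think it is because it treats all my data as character, or because my columns contain characters, but I really don't know how to solve it… (I'm fairly new to R).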

# Creating the list of dataframes
install.packages("datapasta")
my_data <- tibble::tribble(
~UC, ~Fr_term, ~Fr_other_terms,
"reference scenario",                  0L,                    2168L,
"reference scenario",                224L,                    5158L,
"reference scenario",                 19L,                    2247L,
"capacity building",                 65L,                    2168L,
"capacity building",                 52L,                    5158L,
"capacity building",                  0L,                    2247L,
"evolution scenario",                184L,                    2168L,
"evolution scenario",                273L,                    5158L,
"evolution scenario",                  0L,                    2247L,
"carbon market",                 37L,                    2168L,
"carbon market",                  0L,                    5158L,
"carbon market",                 17L,                    2247L
)

my_data <- split(my_data, my_data$UC) # I split the dataframe into a list of several df 
lapply(my_data, chisq.test) # try to apply chi2 of independence...
#> Error in FUN(X[[i]], ...): all entries of 'x' must be nonnegative and finite

Clarification: my data is already in the form of a contingency table, which is why I did not specify variables when calling chisq.test().
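
A likely cause of the error, sketched on a single group: each data frame in the split list still carries the character column UC, and as.matrix() on a data frame with mixed column types returns a character matrix, which chisq.test() cannot treat as counts. The data frame `one_group` below is a hypothetical stand-in for one element of the list, built from the "reference scenario" rows.

```r
# Hypothetical stand-in for one element of the split list
one_group <- data.frame(
  UC             = rep("reference scenario", 3),
  Fr_term        = c(0L, 224L, 19L),
  Fr_other_terms = c(2168L, 5158L, 2247L)
)

# With the character column present, the matrix is character, not numeric
has_chr_matrix <- is.character(as.matrix(one_group))

# Keeping only the count columns gives a numeric 3 x 2 contingency table
counts <- as.matrix(one_group[, c("Fr_term", "Fr_other_terms")])
res    <- chisq.test(counts)
res
```

If this reading is right, dropping UC before the test should be enough to make the call succeed.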

Specify the variables in chisq.test

result <- lapply(my_data, function(x) chisq.test(x$Fr_term, x$Fr_other_terms))

Solution 1 - save the statistics in a data.frame

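library(dplyr)
library(tidyr)
library(purrr)

# my_data here is the original tibble, before split()
my_data %>%
  nest(data = -UC) %>%   # one nested tibble per UC
  group_by(UC) %>%
  mutate(
    test = map(data, ~ broom::tidy(chisq.test(.x$Fr_term, .x$Fr_other_terms)))
  ) %>%
  unnest(test)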

Output
# A tibble: 4 x 6
# Groups:   UC [4]
UC                 data             statistic p.value parameter method                    
<chr>              <list>               <dbl>   <dbl>     <int> <chr>                     
1 reference scenario <tibble [3 x 2]>         6   0.199         4 Pearson's Chi-squared test
2 capacity building  <tibble [3 x 2]>         6   0.199         4 Pearson's Chi-squared test
3 evolution scenario <tibble [3 x 2]>         6   0.199         4 Pearson's Chi-squared test
4 carbon market      <tibble [3 x 2]>         6   0.199         4 Pearson's Chi-squared test

Solution 2 - print the test for each UC

my_data %>% 
group_split(UC) %>% 
map(.f = function(x) chisq.test(x$Fr_term, x$Fr_other_terms))

Output
[[1]]
Pearson's Chi-squared test
data:  x$Fr_term and x$Fr_other_terms
X-squared = 6, df = 4, p-value = 0.1991

[[2]]
Pearson's Chi-squared test
data:  x$Fr_term and x$Fr_other_terms
X-squared = 6, df = 4, p-value = 0.1991

[[3]]
Pearson's Chi-squared test
data:  x$Fr_term and x$Fr_other_terms
X-squared = 6, df = 4, p-value = 0.1991

[[4]]
Pearson's Chi-squared test
data:  x$Fr_term and x$Fr_other_terms
X-squared = 6, df = 4, p-value = 0.1991
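
The identical results for every group (X-squared = 6, df = 4) can be explained by how chisq.test() handles two vectors: it coerces them to factors and cross-tabulates them, rather than reading them as counts. A minimal sketch using the "reference scenario" values (the names `term`, `other` and `tab` are illustrative only):

```r
term  <- c(0L, 224L, 19L)
other <- c(2168L, 5158L, 2247L)

# Cross-tabulating three distinct values against three distinct values
# gives a 3 x 3 table with a single 1 in each row and each column
tab <- table(term, other)

# Both calls test that 3 x 3 table, not the original counts
res_vec <- suppressWarnings(chisq.test(term, other))
res_tab <- suppressWarnings(chisq.test(tab))
```

Because any three rows with distinct values produce the same 3 x 3 pattern, the statistic does not depend on the actual frequencies, so this form only suits raw, one-row-per-observation data.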

Another approach that passes the whole data.frame to chisq.test() as a contingency table

library(dplyr)
lapply(unique(my_data$UC), function(x) {
my_data %>% 
filter(UC == x) %>% 
select(-UC) %>% 
chisq.test(.)
})
#> [[1]]
#> 
#>  Pearson's Chi-squared test
#> 
#> data:  .
#> X-squared = 143.59, df = 2, p-value < 2.2e-16
#> 
#> 
#> [[2]]
#> 
#>  Pearson's Chi-squared test
#> 
#> data:  .
#> X-squared = 83.697, df = 2, p-value < 2.2e-16
#> 
#> 
#> [[3]]
#> 
#>  Pearson's Chi-squared test
#> 
#> data:  .
#> X-squared = 167.75, df = 2, p-value < 2.2e-16
#> 
#> 
#> [[4]]
#> 
#>  Pearson's Chi-squared test
#> 
#> data:  .
#> X-squared = 79.891, df = 2, p-value < 2.2e-16

A tidier solution that uses tidy() and rbindlist() to combine the list

lapply(unique(my_data$UC), function(x) {
my_data %>% 
filter(UC == x) %>% 
select(-UC) %>% 
chisq.test(.) %>% 
broom::tidy() %>% 
mutate(UC = x, .before = 1)
}) %>%
data.table::rbindlist()
#>                    UC statistic      p.value parameter
#> 1: reference scenario 143.59031 6.603277e-32         2
#> 2:  capacity building  83.69708 6.689745e-19         2
#> 3: evolution scenario 167.75043 3.745051e-37         2
#> 4:      carbon market  79.89116 4.485962e-18         2
#>                        method
#> 1: Pearson's Chi-squared test
#> 2: Pearson's Chi-squared test
#> 3: Pearson's Chi-squared test
#> 4: Pearson's Chi-squared test

If the data is already in the form of a contingency table, I apply chisq.test without specifying variables, as follows:

library(dplyr)
lapply(unique(my_data$UC), function(x) {
my_data %>% 
filter(UC == x) %>% 
select(-UC) %>% 
chisq.test(.)
})

Or as follows, if I want to have my results in a df:

lapply(unique(my_data$UC), function(x) {
my_data %>% 
filter(UC == x) %>% 
select(-UC) %>% 
chisq.test(.) %>% 
broom::tidy() %>% 
mutate(UC = x, .before = 1)
}) %>%
data.table::rbindlist()

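If the data is not in the form of a contingency table, I need to make sure to specify the variables when applying chisq.test to the list, i.e. chisq.test(x$Fr_term, x$Fr_other_terms).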
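
For completeness, a base-R sketch that stays close to the split() and lapply() attempt from the question: only the two count columns are split by UC, so each list element is a purely numeric contingency table. The data is re-entered as a plain data.frame so the block is self-contained, and `by_uc`, `tests` and `stats` are illustrative names.

```r
my_data <- data.frame(
  UC = rep(c("reference scenario", "capacity building",
             "evolution scenario", "carbon market"), each = 3),
  Fr_term        = c(0L, 224L, 19L, 65L, 52L, 0L, 184L, 273L, 0L, 37L, 0L, 17L),
  Fr_other_terms = rep(c(2168L, 5158L, 2247L), times = 4)
)

# Split only the numeric columns; UC is used as the grouping key
by_uc <- split(my_data[, c("Fr_term", "Fr_other_terms")], my_data$UC)

# Each element is now a 3 x 2 table of counts
tests <- lapply(by_uc, chisq.test)

# Collect the test statistics into a named numeric vector
stats <- vapply(tests, function(t) unname(t$statistic), numeric(1))
stats
```

The list keeps the UC values as names (in alphabetical order, since split() converts the key to a factor), and the statistics should agree with the X-squared values reported earlier, for example about 143.59 for "reference scenario".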
