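I want to apply chisq.test() to every df in a list: for each df, in turn, I run a test of independence (chisq.test(x)), but I always get the error message 'all entries of 'x' must be nonnegative and finite', even though the entries are in fact all nonnegative and finite. I think this happens because it treats all of my data as characters, or because my columns are made of characters, but I really don't know how to fix it... (I'm fairly new to R).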
# Creating the list of dataframes
# install.packages("datapasta")  # only used to paste the data as a tribble; the code below needs tibble
my_data <- tibble::tribble(
~UC, ~Fr_term, ~Fr_other_terms,
"reference scenario", 0L, 2168L,
"reference scenario", 224L, 5158L,
"reference scenario", 19L, 2247L,
"capacity building", 65L, 2168L,
"capacity building", 52L, 5158L,
"capacity building", 0L, 2247L,
"evolution scenario", 184L, 2168L,
"evolution scenario", 273L, 5158L,
"evolution scenario", 0L, 2247L,
"carbon market", 37L, 2168L,
"carbon market", 0L, 5158L,
"carbon market", 17L, 2247L
)
my_data <- split(my_data, my_data$UC) # I split the dataframe into a list of several df
lapply(my_data, chisq.test) # try to apply chi2 of independence...
#> Error in FUN(X[[i]], ...): all entries of 'x' must be nonnegative and finite
Clarification: my data is already in the form of a contingency table, so I did not specify any variables when applying chisq.test().
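A minimal base-R sketch of what is probably going on, assuming the error comes from the character UC column that split() leaves in every data frame: chisq.test() converts a data.frame with as.matrix(), and a single character column turns the whole matrix into character. The data frame `carbon` below is a hypothetical one-group excerpt of the question's data.

```r
# One group of the question's data (hypothetical excerpt, base R only)
carbon <- data.frame(
  UC = "carbon market",
  Fr_term = c(37L, 0L, 17L),
  Fr_other_terms = c(2168L, 5158L, 2247L),
  stringsAsFactors = FALSE
)

# as.matrix() on a data.frame with a character column gives a character matrix
m_all <- as.matrix(carbon)
print(typeof(m_all))  # "character"

# chisq.test() therefore fails on the full data frame
# (the exact error message can depend on the locale)
err <- tryCatch(chisq.test(carbon), error = function(e) e)
print(conditionMessage(err))

# Dropping the UC column leaves a purely numeric 3 x 2 contingency table
counts <- as.matrix(carbon[, c("Fr_term", "Fr_other_terms")])
print(is.numeric(counts))  # TRUE
res <- chisq.test(counts)
print(res)
```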
Specify the variables in chisq.test
result <- lapply(my_data, function(x) chisq.test(x$Fr_term, x$Fr_other_terms))
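Why does this form give the same result for every UC? When chisq.test() receives two vectors, it converts them to factors and cross-tabulates their values, one observation per position, instead of reading them as counts. A minimal sketch with one group (values taken from the question's "carbon market" rows): three distinct values in each column give a 3 x 3 table with a single 1 in every row and column, whatever the counts are, hence X-squared = 6 and df = 4.

```r
# One group's two count columns, as plain vectors
fr_term <- c(37L, 0L, 17L)
fr_other <- c(2168L, 5158L, 2247L)

# chisq.test(x, y) builds this table internally: it counts co-occurring
# values, so each of the 3 rows is treated as one single observation
tab <- table(factor(fr_term), factor(fr_other))
print(tab)

# The approximation warning is expected: every expected count is 1/3
two_vec <- suppressWarnings(chisq.test(fr_term, fr_other))
print(unname(two_vec$statistic))  # 6
print(unname(two_vec$parameter))  # 4
```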
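Solution 1 - save the statistics in a data.frame (here my_data is the original tibble, before split())
library(dplyr)
library(tidyr)
library(purrr)
my_data %>%
nest(data = -UC) %>%
group_by(UC) %>%
mutate(
test = map(.x = data, ~broom::tidy(chisq.test(.x$Fr_term, .x$Fr_other_terms)))
) %>%
unnest(test)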
Output:
# A tibble: 4 x 6
# Groups: UC [4]
UC data statistic p.value parameter method
<chr> <list> <dbl> <dbl> <int> <chr>
1 reference scenario <tibble [3 x 2]> 6 0.199 4 Pearson's Chi-squared test
2 capacity building <tibble [3 x 2]> 6 0.199 4 Pearson's Chi-squared test
3 evolution scenario <tibble [3 x 2]> 6 0.199 4 Pearson's Chi-squared test
4 carbon market <tibble [3 x 2]> 6 0.199 4 Pearson's Chi-squared test
Solution 2 - print the test for each UC
my_data %>%
group_split(UC) %>%
map(.f = function(x) chisq.test(x$Fr_term, x$Fr_other_terms))
Output:
[[1]]
Pearson's Chi-squared test
data: x$Fr_term and x$Fr_other_terms
X-squared = 6, df = 4, p-value = 0.1991
[[2]]
Pearson's Chi-squared test
data: x$Fr_term and x$Fr_other_terms
X-squared = 6, df = 4, p-value = 0.1991
[[3]]
Pearson's Chi-squared test
data: x$Fr_term and x$Fr_other_terms
X-squared = 6, df = 4, p-value = 0.1991
[[4]]
Pearson's Chi-squared test
data: x$Fr_term and x$Fr_other_terms
X-squared = 6, df = 4, p-value = 0.1991
Another way: pass the whole data.frame to chisq.test() as a contingency table
library(dplyr)
lapply(unique(my_data$UC), function(x) {
my_data %>%
filter(UC == x) %>%
select(-UC) %>%
chisq.test(.)
})
#> [[1]]
#>
#> Pearson's Chi-squared test
#>
#> data: .
#> X-squared = 143.59, df = 2, p-value < 2.2e-16
#>
#>
#> [[2]]
#>
#> Pearson's Chi-squared test
#>
#> data: .
#> X-squared = 83.697, df = 2, p-value < 2.2e-16
#>
#>
#> [[3]]
#>
#> Pearson's Chi-squared test
#>
#> data: .
#> X-squared = 167.75, df = 2, p-value < 2.2e-16
#>
#>
#> [[4]]
#>
#> Pearson's Chi-squared test
#>
#> data: .
#> X-squared = 79.891, df = 2, p-value < 2.2e-16
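The same contingency-table approach also repairs the original lapply() call without dplyr: drop UC before split(), so every list element holds only the two count columns. A base-R sketch, assuming a plain data.frame copy of the question's data (`my_df`, `tables` and `results` are hypothetical names):

```r
# The question's data as a base data.frame (no tibble needed)
my_df <- data.frame(
  UC = rep(c("reference scenario", "capacity building",
             "evolution scenario", "carbon market"), each = 3),
  Fr_term = c(0L, 224L, 19L, 65L, 52L, 0L, 184L, 273L, 0L, 37L, 0L, 17L),
  Fr_other_terms = rep(c(2168L, 5158L, 2247L), times = 4),
  stringsAsFactors = FALSE
)

# Split only the count columns by UC; the list names come from UC
# (split() orders the groups alphabetically)
tables <- split(my_df[, c("Fr_term", "Fr_other_terms")], my_df$UC)

# Each element is now a purely numeric 3 x 2 table, so chisq.test() works
results <- lapply(tables, chisq.test)
print(names(results))
print(results[["reference scenario"]])
```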
A tidier solution that uses tidy() and rbindlist to combine the list
lapply(unique(my_data$UC), function(x) {
my_data %>%
filter(UC == x) %>%
select(-UC) %>%
chisq.test(.) %>%
broom::tidy() %>%
mutate(UC = x, .before = 1)
}) %>%
data.table::rbindlist()
#> UC statistic p.value parameter
#> 1: reference scenario 143.59031 6.603277e-32 2
#> 2: capacity building 83.69708 6.689745e-19 2
#> 3: evolution scenario 167.75043 3.745051e-37 2
#> 4: carbon market 79.89116 4.485962e-18 2
#> method
#> 1: Pearson's Chi-squared test
#> 2: Pearson's Chi-squared test
#> 3: Pearson's Chi-squared test
#> 4: Pearson's Chi-squared test
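If broom and data.table are not available, a rough base-R equivalent is to pull the fields out of each htest object yourself and row-bind with do.call(rbind, ...). This is a sketch, not what broom::tidy() does internally; it only mirrors the statistic, p.value, parameter and method columns shown in the output above. `my_df`, `tables`, `rows` and `summary_df` are hypothetical names.

```r
# The question's data as a base data.frame
my_df <- data.frame(
  UC = rep(c("reference scenario", "capacity building",
             "evolution scenario", "carbon market"), each = 3),
  Fr_term = c(0L, 224L, 19L, 65L, 52L, 0L, 184L, 273L, 0L, 37L, 0L, 17L),
  Fr_other_terms = rep(c(2168L, 5158L, 2247L), times = 4),
  stringsAsFactors = FALSE
)
tables <- split(my_df[, c("Fr_term", "Fr_other_terms")], my_df$UC)

# One single-row data.frame per UC, built from the htest fields
rows <- lapply(names(tables), function(uc) {
  res <- chisq.test(tables[[uc]])
  data.frame(
    UC = uc,
    statistic = unname(res$statistic),
    p.value = res$p.value,
    parameter = unname(res$parameter),
    method = res$method,
    stringsAsFactors = FALSE
  )
})

# Base-R stand-in for data.table::rbindlist()
summary_df <- do.call(rbind, rows)
print(summary_df)
```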
If the data is already in the form of a contingency table, I apply chisq.test without specifying the variables, as follows:
library(dplyr)
lapply(unique(my_data$UC), function(x) {
my_data %>%
filter(UC == x) %>%
select(-UC) %>%
chisq.test(.)
})
Or like this, if I want to collect my results in a df:
lapply(unique(my_data$UC), function(x) {
my_data %>%
filter(UC == x) %>%
select(-UC) %>%
chisq.test(.) %>%
broom::tidy() %>%
mutate(UC = x, .before = 1)
}) %>%
data.table::rbindlist()
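If the data is not in the form of a contingency table (i.e. one row per observation, with each categorical variable in its own column), I need to make sure to specify the variables when applying chisq.test to the list, i.e. chisq.test(x$Fr_term, x$Fr_other_terms).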
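To check that the two calling forms agree when each is used on the right kind of data, here is a sketch that rebuilds hypothetical raw observations (one element per observation) from the "carbon market" counts, then runs chisq.test() once on the two observation vectors and once on the count matrix. The row labels `row1` to `row3` and all object names are invented for the illustration.

```r
# Counts for one group; the row labels are hypothetical
counts <- data.frame(
  row = c("row1", "row2", "row3"),
  Fr_term = c(37L, 0L, 17L),
  Fr_other_terms = c(2168L, 5158L, 2247L),
  stringsAsFactors = FALSE
)

# Expand every cell count into that many observations (zero counts give none)
obs_row <- rep(rep(counts$row, times = 2),
               times = c(counts$Fr_term, counts$Fr_other_terms))
obs_col <- rep(c("Fr_term", "Fr_other_terms"),
               times = c(sum(counts$Fr_term), sum(counts$Fr_other_terms)))

# Raw observations: pass both variables
from_obs <- chisq.test(obs_row, obs_col)

# Contingency table: pass the count matrix only
from_counts <- chisq.test(as.matrix(counts[, c("Fr_term", "Fr_other_terms")]))

# Both describe the same 3 x 2 table, so the statistics agree
print(c(unname(from_obs$statistic), unname(from_counts$statistic)))
```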