Multiple imputation in R (Error in solve.default(xtx + diag(pen)): system is computationally singular: reciprocal condition number =)



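I want to analyze COVID-19 data. I have done part of the data cleaning and ended up with this dataset (160260 rows, 34 columns). I have converted the variables continent, location, and tests_units to factors. To check for missing values, I calculated the percentage of missing values per column, and the result is: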

> (colMeans(is.na(dataset1)))*100
continent                location                    date             total_cases 
0.0000000               0.0000000               0.0000000               1.9699239 
new_cases            total_deaths              new_deaths       reproduction_rate 
2.0366904               8.0094846               8.1130663              14.0078622 
icu_patients           hosp_patients   weekly_icu_admissions  weekly_hosp_admissions 
84.7747410              83.7021091              96.2386123              92.5851741 
total_tests               new_tests           positive_rate          tests_per_case 
54.4465244              56.6966180              43.9292400              44.7154624 
tests_units people_fully_vaccinated        new_vaccinations        stringency_index 
38.0974666              73.6390865              76.2298765              15.7138400 
population      population_density              median_age           aged_70_older 
0.0000000               4.3073755              10.5291401              11.0077374 
gdp_per_capita         extreme_poverty   cardiovasc_death_rate     diabetes_prevalence 
11.9381006              42.0897292              11.0077374               6.7003619 
female_smokers            male_smokers  handwashing_facilities         life_expectancy 
32.9963809              33.9535754              55.9690503               0.4785973 
human_development_index        excess_mortality
13.3738924                    96.1225509 

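I don't want to analyze a dataset with missing values, so I searched a lot for ways to fill in these NAs. I found that I can use the mice function to impute them. My goals are: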

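  1. Use the mice function in such a way that the variable date is not used as a predictor
  2. Do not impute values in the variables continent, location, date, and population, because they have no NAs
  3. Impute values in the variables total_cases, new_cases, total_deaths, new_deaths, reproduction_rate, icu_patients, hosp_patients, weekly_icu_admissions, weekly_hosp_admissions, total_tests, new_tests, positive_rate, tests_per_case, people_fully_vaccinated, new_vaccinations, stringency_index, population_density, median_age, aged_70_older, gdp_per_capita, extreme_poverty, cardiovasc_death_rate, diabetes_prevalence, female_smokers, male_smokers, handwashing_facilities, life_expectancy, human_development_index, and excess_mortality using pmm, because these variables are all numeric
  4. Impute values in the variable tests_units using the polyreg (polytomous logistic regression) method, because this variable is a factor with 4 levels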

I followed every step in this link and ran the following code:

library(mice)
init = mice(dataset1,maxit = 0)
meth = init$method
predM = init$predictorMatrix
predM[, c("date")] = 0 #goal number 1
meth[c("continent","location","date","population")] = "" #goal number 2
meth[c("total_cases","new_cases","total_deaths","new_deaths","reproduction_rate",
"icu_patients","hosp_patients","weekly_icu_admissions",
"weekly_hosp_admissions","total_tests","new_tests","positive_rate",
"tests_per_case","people_fully_vaccinated",
"new_vaccinations","stringency_index","population_density","median_age",
"aged_70_older","gdp_per_capita","extreme_poverty",
"cardiovasc_death_rate","diabetes_prevalence","female_smokers",
"male_smokers","handwashing_facilities","life_expectancy",
"human_development_index","excess_mortality")]="pmm" #goal number 3
meth[c("tests_units")] = "polyreg" #goal number 4
set.seed(103)
imputed = mice(dataset1, method=meth, predictorMatrix=predM, m=5)

The result I got is:

> library(mice)
> init = mice(dataset1,maxit = 0)
Warning message:
Number of logged events: 1 
> meth = init$method
> predM = init$predictorMatrix
> predM[, c("date")] = 0
> meth[c("continent","location","date","population")] = ""
> meth[c("total_cases","new_cases","total_deaths","new_deaths","reproduction_rate",
+        "icu_patients","hosp_patients","weekly_icu_admissions",
+        "weekly_hosp_admissions","total_tests","new_tests","positive_rate",
+        "tests_per_case","people_fully_vaccinated",
+        "new_vaccinations","stringency_index","population_density","median_age",
+        "aged_70_older","gdp_per_capita","extreme_poverty",
+        "cardiovasc_death_rate","diabetes_prevalence","female_smokers",
+        "male_smokers","handwashing_facilities","life_expectancy",
+        "human_development_index","excess_mortality")]="pmm"
> meth[c("tests_units")] = "polyreg"
> 
> set.seed(103)
> imputed = mice(dataset1, method=meth, predictorMatrix=predM, m=5)
iter imp variable
1   1  total_casesError in solve.default(xtx + diag(pen)) : 
system is computationally singular: reciprocal condition number = 2.80783e-24

This is not very pleasant. What should I change, or what code should I run instead?

Thanks in advance!

Have you checked the logged events?

View(init$loggedEvents)
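
As a minimal sketch of what that check can show, the toy example below (hypothetical data, not the COVID dataset) includes a constant column and an exact linear copy of another column. In my experience mice records such columns in loggedEvents, with the meth column giving the reason and the out column naming the variable that was dropped, but the exact contents may differ between mice versions.

```r
library(mice)

# Hypothetical toy data: z is constant, x_copy is an exact multiple of x
toy <- data.frame(
  x = c(1, 2, NA, 4, 5, 6, 7, NA, 9, 10),
  y = c(2.1, 3.9, 6.2, NA, 9.8, 12.1, 13.8, 16.2, 18.1, 19.9),
  z = rep(3, 10)
)
toy$x_copy <- toy$x * 2

# maxit = 0 only sets up the imputation model, as in the question
init_toy <- mice(toy, maxit = 0)

# Print the logged events instead of opening a viewer window
print(init_toy$loggedEvents)
```

Printing works in non-interactive sessions too, which is handy when the script is run with Rscript.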

Maybe it is because of the imputation method you used ("polyreg"). Have you tried a more robust method such as pmm?
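
Note that the error appears while imputing total_cases, which already uses pmm, so another thing worth checking (this is my assumption, not something confirmed in this thread) is whether some predictors are linearly dependent. A "computationally singular" message from solve generally means the matrix being inverted is (nearly) rank deficient. A rough base R sketch on hypothetical toy data:

```r
# Hypothetical toy data; b is an exact multiple of a
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 4, 6, 8, 10, 12),
  c = c(5, 3, 8, 1, 9, 2)
)

# Pairwise correlations; values near 1 or -1 flag near-duplicate predictors
r <- cor(toy, use = "pairwise.complete.obs")
high <- which(abs(r) > 0.99 & upper.tri(r), arr.ind = TRUE)
collinear_pairs <- data.frame(
  var1 = rownames(r)[high[, "row"]],
  var2 = colnames(r)[high[, "col"]]
)
print(collinear_pairs)

# Reciprocal condition number of X'X, the quantity named in the error message
xtx <- crossprod(as.matrix(toy))
print(rcond(xtx))
```

On the real data this would be run on the numeric columns only. A column flagged this way can be excluded as a predictor the same way the question excludes date, by setting its column in predM to 0.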
