插补数据帧列表:使用调查包计算分位数或在 R 中计算 MIcomb?



我正在使用R并遵循Anthony Damico(@AnthonyDamico(的代码(http://asdfree.com/survey-of-consumer-finances-scf.html(来计算消费者金融调查的分位数。代码如下:

## Load necessary libraries (Note: should be installed in the OS)
library(mitools)    # allows analysis of multiply-imputed survey data
library(survey)     # load survey package (analyzes complex design surveys)
library(downloader) # downloads and then runs the source() function on scripts from github
library(foreign)    # load foreign package (converts data files into R)
library(Hmisc)      # load Hmisc package (loads a simple wtd.quantile function)
library(convey)
library(lodown)
library(devtools)
library(srvyr)
## Read SCF data: wave 2016, path.expand("~") is default working directory  
scf_imp <- readRDS("scf 2016.rds" )
scf_rw <- readRDS("scf 2016 rw.rds" )
# define which variables from the five imputed iterations to keep
vars.to.keep <- c( 'y1' , 'yy1' , 'wgt' , 'one' , 'houses', 'oresre', 'mrthel', 'liq', 'income', 'age' )
# restrict each `scf_imp#` data frame to only those variables
scf_imp[[1]] <- scf_imp[[1]][ , vars.to.keep ]
scf_imp[[2]] <- scf_imp[[2]][ , vars.to.keep ]
scf_imp[[3]] <- scf_imp[[3]][ , vars.to.keep ]
scf_imp[[4]] <- scf_imp[[4]][ , vars.to.keep ]
scf_imp[[5]] <- scf_imp[[5]][ , vars.to.keep ]
# clear up RAM
gc()
# turn off scientific notation in most output
options( scipen = 20 )
scf_design <- 
svrepdesign( 
weights = ~wgt , 
repweights = scf_rw[ , -1 ] , 
data = imputationList( scf_imp ) , 
scale = 1 ,
rscales = rep( 1 / 998 , 999 ) ,
mse = TRUE ,
type = "other" ,
combined.weights = TRUE
)
scf_design <-
update(
scf_design,
wealth = liq + houses + oresre - mrthel,
hwealth = houses + oresre,
htoinc  = (houses + oresre)/income,
ltoinc = mrthel / income,
ltv = mrthel + hwealth,
wtoinc = wealth + income
)
### gini coefficient (whole sample) ###
scf_design$designs <- lapply( scf_design$designs , convey_prep )
giniindex <- scf_MIcombine( with( scf_design , svygini( ~ wealth) ) )

在这段代码之后,我尝试了几种方法,包括 1(:

q1.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.01, 
method='constant',interval.type='quantile',se=TRUE)))
q5.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.05, method='constant',interval.type='quantile',se=TRUE)))
q20.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.20, method='constant',interval.type='quantile',se=TRUE)))
q40.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.40, method='constant',interval.type='quantile',se=TRUE)))
q60.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.60, method='constant',interval.type='quantile',se=TRUE)))
q80.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.80, method='constant',interval.type='quantile',se=TRUE)))
q90.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.90, method='constant',interval.type='quantile',se=TRUE)))
q95.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.95, method='constant',interval.type='quantile',se=TRUE)))
q99.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.99, method='constant',interval.type='quantile',se=TRUE)))
maxi.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 1, method='constant',interval.type='quantile',se=TRUE)))  

和以下方法 2(:

quantile.wealth <- scf_MIcombine(with(scf_design,svyquantile(~networth, c(0.01,0.2,0.4,0.6,0.8,0.9,0.99,1), method='constant',interval.type='quantile',se=TRUE)))  

方法1(给我数字,但不准确。首先,它给出了以下警告标志:

Warning messages:
1: In svyquantile.svyrep.design(~networth, 1, method = "constant",  :
Not all replicate weight designs give valid standard errors for quantiles.
2: In svyquantile.svyrep.design(~networth, 1, method = "constant",  :
Not all replicate weight designs give valid standard errors for quantiles.
3: In svyquantile.svyrep.design(~networth, 1, method = "constant",  :
Not all replicate weight designs give valid standard errors for quantiles.
4: In svyquantile.svyrep.design(~networth, 1, method = "constant",  :
Not all replicate weight designs give valid standard errors for quantiles.
5: In svyquantile.svyrep.design(~networth, 1, method = "constant",  :
Not all replicate weight designs give valid standard errors for quantiles.

也就是说,标准误差并不准确...所以我想平均值也不可信...(不确定这个(。我尝试使用方法 2(,这只是对方法 1 的一点调整(,使用调查包参数 c[ , ] 在单个命令中计算分位数。然后我收到以下错误。

Error in (1 + 1/m) * evar/vbar : non-conformable arrays
In addition: Warning messages:
1: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8,  :
Not all replicate weight designs give valid standard errors for quantiles.
2: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8,  :
Not all replicate weight designs give valid standard errors for quantiles.
3: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8,  :
Not all replicate weight designs give valid standard errors for quantiles.
4: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8,  :
Not all replicate weight designs give valid standard errors for quantiles.
5: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8,  :
Not all replicate weight designs give valid standard errors for quantiles.

(加法(方法3(我使用直接引用 (https://github.com/cran/survey/blob/master/man/svyquantile.Rd( 的调查包并尝试像下面这样计算分位数:

q <- svyquantile(~networth,scf_design, c(.25,.5,.75),ci=TRUE)

但是我得到了以下错误。

Error in UseMethod("svyquantile", design) : 
no applicable method for 'svyquantile' applied to an object of class 
"svyimputationList"

长话短说,我需要以正确的方式计算分位数和基尼系数,以便我可以将数字与模型进行比较。有没有办法纠正代码,或使用其他方法?

请记住,这是数据框的插补列表,由 5 个数据框组成。 而且... 值得注意的是,数据框不包含相同数量的样本(行(。

我可以看到 3 个问题。第一个很简单。变量networth不包含在vars.to.keep中,因此以后无法使用,并且会引发错误。

其次,警告实际上不是问题。警告在源代码中的位置在这里。只要您使用interval.type='quantile'调用svyquantile,并且调查设计是使用type = 'other'或使用千斤顶复制权重的任何类型的方法构建的,就会引发此警告。不应更改调查设计的类型。该警告提醒您,interval.type='quantile'产生的标准误差在 jacknife 调查设计或不寻常的自举设计中无效。SCF 数据集使用引导,但不清楚标准错误是否有效。或者,interval.type='probability'不会引发错误,但也许这不是您要找的。

q1.wealth <- scf_MIcombine(with(scf_design,
svyquantile(~wealth,0.01,
method='constant',
interval.type='probability',
se=TRUE, na.rm=T)));

第三,最后一个错误是令人讨厌的。我指的是这个

> quantile.wealth <- scf_MIcombine(with(
scf_design,
svyquantile(~networth,
c(0.01,0.2,0.4,0.6,0.8,0.9,0.99,1),
method='constant',
interval.type='quantile',
se=T,na.rm=T)));
# Error in (1 + 1/m) * evar/vbar : non-conformable arrays

似乎scf_MIcombineMIcombine无法将svyquantile调用的结果与多个分位数相结合。有一种解决方法,函数svyby不受此错误的影响。

scf_MIcombine( with(scf_design,
svyby(~networth,~as.factor(1),
svyquantile,
c(0.01,0.2,0.4,0.6,0.8,0.9,0.99,1),
se=T, na.rm=T,
method='constant',
interval.type='quantile')))

在上面的示例中,函数svyby用于计算调查数据子集的统计数据。在这种情况下,我使用了包含所有样本的假子集~as.factor(1)

最新更新