Creating new variables from several variables that list the same countries



In Stata, I split a variable that contained up to 20 countries separated by commas. I now have 20 separate variables (country1 to country20), but the same country can be listed in several of them.

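For example, Uganda might appear in country1, country2 and country5. I now want to create one indicator variable per country (1 if the country is listed, 0 if not). In other words, I want one variable for every country that appears anywhere in these 20 variables. I tried the following, but it did not work.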

local N = _N
forvalues i = 1/`N' {
local s1 = Countryies1 [`i']
local s2 = Countryies2   [`i']   
local s3 = Countryies3 [`i']
local s4 = Countryies4   [`i'] 
local s5 = Countryies5 [`i']
local s6 = Countryies6   [`i'] 
local s7 = Countryies7 [`i']
local s8 = Countryies8   [`i'] 
local s9 = Countryies9 [`i']
local s10 = Countryies10   [`i'] 
local s11 = Countryies11 [`i']
local s12 = Countryies12   [`i']   
local s13 = Countryies13 [`i']
local s14 = Countryies14   [`i'] 
local s15 = Countryies15 [`i']
local s16 = Countryies16   [`i'] 
local s17 = Countryies17 [`i']
local s18 = Countryies18   [`i'] 
local s19 = Countryies19 [`i']
local s20 = Countryies20   [`i'] 

local intersection: list s1 & s2 & s3 & s4 & s5 & s6 & s7 & s8 & s9 & s10 & s11 & s12 & s13 & s14 & s15 & s16 & s17 & s18 & s19 & s20
replace country ="`intersection'" in `i'
}

This seems to work, and it does not rule out other solutions in any sense.

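clear
* Example data: one comma-separated string of countries per observation
input str42 countries
"Uganda"
"Uganda, Kenya"
"Uganda, Kenya, Tanzania"
"South Africa"
end
gen id = _n
save datasofar, replace
keep id countries
* Split on commas into countries1, countries2, ..., then reshape to long
split countries, parse(,)
drop countries
reshape long countries, i(id) j(which)
drop if missing(countries)
replace countries = trim(countries)
* Turn each country into a legal variable-name suffix, e.g. South_Africa
gen name = strtoname(countries)
levelsof name, local(names)
gen new_id = _n
foreach n of local names {
    * Indicator: 1 if this row holds the current country, else 0
    gen is_`n' = name == "`n'"
    * Label the indicator with the original spelling of the country
    su new_id if is_`n', meanonly
    label var is_`n' "`=countries[r(min)]'"
    local vars `vars' is_`n'
}
* Back to one row per original observation: 1 if the country appeared anywhere
collapse (max) `vars', by(id)
merge 1:1 id using datasofar
list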
     +----------------------------------------------------------------------------------------+
     | id   is_Kenya   is_Sou~a   is_Tan~a   is_Uga~a                 countries        _merge |
     |----------------------------------------------------------------------------------------|
  1. |  1          0          0          0          1                    Uganda   Matched (3) |
  2. |  2          1          0          0          1             Uganda, Kenya   Matched (3) |
  3. |  3          1          0          1          1   Uganda, Kenya, Tanzania   Matched (3) |
  4. |  4          0          1          0          0              South Africa   Matched (3) |
     +----------------------------------------------------------------------------------------+

Another solution is to loop over the country names directly:

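foreach c in Uganda Kenya Tanzania "South Africa" {
    * Legal variable-name suffix, e.g. South_Africa
    local C = strtoname("`c'")
    * 1 if the name occurs anywhere in the string (substring match)
    gen is_`C' = strpos(countries, "`c'") > 0
}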

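But be careful: spelling variations will bite you, and because strpos() matches substrings, a name such as "Niger" would also match "Nigeria". This loop also clashes with the previous code: if you have already run it, the is_* variables exist and generate will fail.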
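One way to reduce the substring and name-clash problems is to pad the list with delimiters so that only whole list elements can match, and to drop any existing indicator before recreating it. The sketch below assumes Stata 14 or later (for ustrregexra()) and that entries are separated by commas with optional spaces; the rows "Nigeria,Kenya" and "Niger" are hypothetical example data added only to show the substring pitfall. It does not fix genuine spelling variants, which still need to be harmonized by hand.

```stata
clear
* Hypothetical example data; "Niger" vs "Nigeria" shows the substring pitfall
input str42 countries
"Uganda"
"Uganda, Kenya"
"Nigeria,Kenya"
"Niger"
"South Africa"
end

* Normalize separators to a bare comma (assumes Stata 14+ for ustrregexra)
* and pad both ends, so every country appears as ,Name,
gen padded = "," + ustrregexra(trim(countries), " *, *", ",") + ","

foreach c in Uganda Kenya Niger Nigeria "South Africa" {
    local C = strtoname("`c'")
    * Drop any indicator left over from earlier code to avoid a name clash
    capture drop is_`C'
    * Match a complete list element only, not a substring of a longer name
    gen byte is_`C' = strpos(padded, ",`c',") > 0
}
drop padded
list
```

With this padding, is_Niger is 1 only for the "Niger" row, not for "Nigeria,Kenya".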
