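In Stata, I split a variable that held up to 20 countries separated by commas, so I now have 20 different variables (country1 to country20), but the same country can be listed in more than one of country1 to country20.
For example, Uganda could be in country1, country2 and country5. I now want to create one variable per country (1 for true, 0 for false), so basically one indicator variable for each of these 20 countries. I tried the following, but it did not work.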
local N = _N
forvalues i = 1/`N' {
local s1 = Countryies1 [`i']
local s2 = Countryies2 [`i']
local s3 = Countryies3 [`i']
local s4 = Countryies4 [`i']
local s5 = Countryies5 [`i']
local s6 = Countryies6 [`i']
local s7 = Countryies7 [`i']
local s8 = Countryies8 [`i']
local s9 = Countryies9 [`i']
local s10 = Countryies10 [`i']
local s11 = Countryies11 [`i']
local s12 = Countryies12 [`i']
local s13 = Countryies13 [`i']
local s14 = Countryies14 [`i']
local s15 = Countryies15 [`i']
local s16 = Countryies16 [`i']
local s17 = Countryies17 [`i']
local s18 = Countryies18 [`i']
local s19 = Countryies19 [`i']
local s20 = Countryies20 [`i']
local intersection: list s1 & s2 & s3 & s4 & s5 & s6 & s7 & s8 & s9 & s10 & s11 & s12 & s13 & s14 & s15 & s16 & s17 & s18 & s19 & s20
replace country ="`intersection'" in `i'
}
This seems to work, and it does not rule out other solutions in any sense.
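* toy data: one comma-separated string of countries per observation
clear
input str42 countries
"Uganda"
"Uganda, Kenya"
"Uganda, Kenya, Tanzania"
"South Africa"
end
* keep an identifier so the indicators can be merged back later
gen id = _n
save datasofar, replace
keep id countries
* one variable per listed country, then one observation per (id, country)
split countries, parse(,)
drop countries
reshape long countries, i(id) j(which)
drop if missing(countries)
replace countries = trim(countries)
* legal variable-name versions of the country names
gen name = strtoname(countries)
levelsof name, local(names)
gen new_id = _n
* one 0/1 indicator per country, labelled with the original spelling
foreach n of local names {
    gen is_`n' = name == "`n'"
    su new_id if is_`n', meanonly
    label var is_`n' "`=countries[r(min)]'"
    local vars `vars' is_`n'
}
* back to one observation per id: 1 if the country appears at all
collapse (max) `vars', by(id)
merge 1:1 id using datasofar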
     +----------------------------------------------------------------------------------------+
     | id   is_Kenya   is_Sou~a   is_Tan~a   is_Uga~a                 countries        _merge |
     |----------------------------------------------------------------------------------------|
  1. |  1          0          0          0          1                    Uganda   Matched (3) |
  2. |  2          1          0          0          1             Uganda, Kenya   Matched (3) |
  3. |  3          1          0          1          1   Uganda, Kenya, Tanzania   Matched (3) |
  4. |  4          0          1          0          0              South Africa   Matched (3) |
     +----------------------------------------------------------------------------------------+
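If the split variables already exist, as in the question, the `split` step can presumably be skipped and the same reshape idea applied to them directly. The following is only a sketch under assumptions: it supposes string variables named country1 to country20 sit next to each other in the dataset, that no variable called id exists yet, and that the country names are short enough for is_ plus the name to fit in a 32-character variable name. It should be run as a single do-file so the local macros survive.

```stata
* Sketch only: assumes string variables country1-country20 are in memory,
* are adjacent in variable order, and that no variable named id exists.
gen long id = _n

preserve
keep id country1-country20
reshape long country, i(id) j(which)

* tidy up, then drop the empty slots
replace country = trim(country)
drop if missing(country)

* legal variable-name versions of the country names
gen name = strtoname(country)
levelsof name, local(names)

* one 0/1 indicator per distinct country
local vars
foreach n of local names {
    gen byte is_`n' = name == "`n'"
    local vars `vars' is_`n'
}

* back to one observation per id, then attach to the original data
collapse (max) `vars', by(id)
tempfile flags
save `flags'
restore
merge 1:1 id using `flags', nogenerate
```

Observations with no country at all drop out before the `collapse`, so their indicators come back missing rather than 0 after the merge.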
Another solution is to loop over the names, thus
foreach c in Uganda Kenya Tanzania "South Africa" {
local C = strtoname("`c'")
gen is_`C' = strpos(countries, "`c'") > 0
}
But watch out: variations in spelling will bite you. They would cause trouble for the earlier code, too.
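One way to soften the spelling problem is to normalize the string before matching, and to match whole comma-delimited items rather than substrings (with `strpos()` alone, a name such as "Niger" would also match "Nigeria"). The sketch below assumes the original comma-separated variable countries is in memory. The variable clean and the has_ prefix are hypothetical names chosen here to avoid clashing with the is_ variables above. It only handles differences in case and spacing, not genuine misspellings.

```stata
* Sketch only: assumes the string variable countries exists;
* clean and the has_ prefix are illustrative names, not from the original.

* normalise: lower case, trim, collapse repeated blanks
gen clean = lower(itrim(trim(countries)))

* remove blanks around the commas, then wrap in commas
* so every item is delimited on both sides
replace clean = subinstr(clean, ", ", ",", .)
replace clean = subinstr(clean, " ,", ",", .)
replace clean = "," + clean + ","

foreach c in Uganda Kenya Tanzania "South Africa" {
    local C = strtoname("`c'")
    local key = lower("`c'")
    * match the whole item, including its delimiters
    gen byte has_`C' = strpos(clean, ",`key',") > 0
}
```

A quick `tabulate` of the distinct names after `reshape long` is probably still the most reliable way to spot spellings that need fixing by hand.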