转换Stata中的数据变量名称



你好,我想使用美国国家科学基金会数据集,但原始excel文件变量名称没有正确转置数据。有没有人有关于如何转置数据集,使其适合分析的数据格式的任何示例代码。这是原始的excel文件,这样你就可以理解问题了。

来自NSF网站的Excel原始文件

*编辑添加导入视图

记住Nick关于SO本质的恰当观点,我在这里确实有一些示例代码可能会有所帮助。不知道你所说的"调换"是什么意思我不能给出一个确切的答案,但是您可以根据您的目的调整下面的reshape命令。

导入和查看

// Import Excel File defining TL and BR of table
import excel "nsb20197-tabs02-012.xlsx", cellrange(A6:K177) clear
list in 1/15
+----------------------------------------------------------------------------------------------------------------------------------------+
|                                A       B       C       D      E                   F       G       H       I                   J      K |
|----------------------------------------------------------------------------------------------------------------------------------------|
1. |                Associate's level                                                                                                       |
2. | American Indian or Alaska Native                                                                                                       |
3. |                       All fields    6282    4131    2151   65.8                34.2    8935    5697    3238                63.8   36.2 |
4. |                              S&E     608     386     222   63.5                36.5     942     495     447                52.5   47.5 |
5. |                      Engineering      17       4      13   23.5                76.5      50       8      42                  16     84 |
|----------------------------------------------------------------------------------------------------------------------------------------|
6. |                 Natural sciences     384     220     164   57.3                42.7     453     174     279                38.4   61.6 |
7. |   Social and behavioral sciences     207     162      45   78.3                21.7     439     313     126                71.3   28.7 |
8. |                          Non-S&E    5674    3745    1929     66                  34    7993    5202    2791   65.09999999999999   34.9 |
9. |        Asian or Pacific Islander                                                                                                       |
10. |                       All fields   27313   15522   11791   56.8                43.2   54809   30916   23893                56.4   43.6 |
|----------------------------------------------------------------------------------------------------------------------------------------|
11. |                              S&E    2649    1284    1365   48.5                51.5    7862    3492    4370                44.4   55.6 |
12. |                      Engineering     160      23     137   14.4   85.59999999999999     574     111     463                19.3   80.7 |
13. |                 Natural sciences    2010     939    1071   46.7                53.3    4419    1562    2857                35.3   64.7 |
14. |   Social and behavioral sciences     479     322     157   67.2                32.8    2869    1819    1050                63.4   36.6 |
15. |                          Non-S&E   24664   14238   10426   57.7                42.3   46947   27424   19523                58.4   41.6 |

构造数据

// Create supercategories

* level = Column A if column A contains the word level (ignoring case)
gen level = word(A,1) if ustrregexm(A, "level", 1), before(A)

* demographic = Column A if next ob in column A contains the word all fields (ignoring case)
gen demographic = A if ustrregexm(A[_n+1], "all fields", 1), before(A)

* fill down demographic and level, and drop blank rows
foreach v of varlist level demographic {
replace `v' = `v'[_n-1] if missing(`v')
}
drop if mi(demographic) | demographic == A | regexm(A, level)

// rename variables 
* rename A
rename A field

* rename count columns
local list "B C D E F" 
local year = 2000
rename (`list') (all_`year' female_`year' male_`year' perc_female_`year' perc_male_`year' )

local list "G H I J K" 
local year = 2017
rename (`list') (all_`year' female_`year' male_`year' perc_female_`year' perc_male_`year' )

* destring 
destring *_2000 *_2017, replace

重塑为长

* reshape long
drop perc*
reshape long all_ male_ female_, i(level demographic field) j(year)

rename *_ degrees*
reshape long degrees,  i(level demographic field year) j(gender) string

一个如何重塑wide by field的例子

* test reshape wide by field + ensure the variable name is less than 32 characters post reshape 
replace field = lower(strtoname(field))
replace field = substr(field, 1, 32 - strlen("degrees") - 1) 
reshape wide degrees, i(level demographic year gender) j(field) string

相关内容

  • 没有找到相关文章

最新更新