Returning values from a Stata do-file to Python



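I can successfully call a Stata do-file from Python, but what is the best way to get a local macro from Stata back into Python? I intend to run the do-file inside a loop in Python.

What I have so far: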

Python:

import subprocess
InputParams = [' -3','0',' -3','0',' -3','0']
# /e makes it run quietly, i.e., Stata doesn't open a window
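# dofile holds the path to the do-file (defined elsewhere); the raw string keeps the backslashes literal
cmd = [r'C:\Program Files (x86)\Stata14\StataMP-64.exe', '/e', 'do', dofile] + InputParams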
subprocess.call(cmd,shell=True)

In Stata, I run a regression and end up with a local macro that holds the mean squared error, for example

local MSE = 0.0045

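What is the best way to return the local macro to Python? Write it to a file? I could not find anything on writing a macro to a file.

Bonus question: if I put InputParams = ['-3', '0'] in Python (that is, I remove the space in front of the minus three), Stata gives the error /3 invalid name. Why?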
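A possible explanation for the bonus question, offered as an assumption rather than something confirmed against Stata's documentation: on Windows, subprocess joins the argument list into one command-line string, and only arguments containing whitespace get quoted. With the leading space the value reaches Stata as the quoted string " -3"; without it, a bare -3 arrives, which Stata may be reading as a command-line switch (its switches look like /e). The sketch below only shows what subprocess builds; the do-file name my.do is hypothetical.

```python
import subprocess

# list2cmdline is the helper subprocess uses on Windows to turn an
# argument list into the single command-line string the program receives.
with_space = subprocess.list2cmdline(["do", "my.do", " -3", "0"])
without_space = subprocess.list2cmdline(["do", "my.do", "-3", "0"])

print(with_space)     # do my.do " -3" 0
print(without_space)  # do my.do -3 0
```

If this is the cause, keeping the leading space (as in the working version) is a reasonable workaround, since Stata evaluates `1' as a numeric expression either way.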

Edit

Adding the Stata do-file. This is not the actual script, just a representation of what I do in the actual script.

quietly {
capture log close
clear all
cls
version 14.2
set more off
cd "<path here>"
local datestamp: di %tdCCYY-NN-DD daily("$S_DATE","DMY")
local timestamp = subinstr("$S_TIME",":","-",2)
log using "Logslog_`datestamp'_`timestamp'_UTC.log"
set matsize 10000
use "<dataset path here>"
gen date = dofc(TimeVar)
encode ID, generate(uuid)
xtset uuid date
gen double DepVarLagSum = 0
gen double IndVar1LagMax = 0
gen double IndVar2LagMax = 0
local DepVar1LagStart = `1' // INPUT PARAMS GO HERE
local DepVar1LagEnd = `2'
local IndVar1LagStart = `3' 
local IndVar1LagEnd = `4'
local IndVar2LagStart = `5'
local IndVar2LagEnd = `6'
** number of folds for cross validation
scalar kfold = 5
set seed 42
gen byte randint = runiformint(1,kfold) // integer fold id in 1..kfold
** thanks to Álvaro A. Gutiérrez-Vargas for the matrix operations
matrix results = J(kfold,4,.)
matrix colnames results = "R2_fold" "MSE_fold" "R2_hold" "MSE_hold"
matrix rownames results = "1" "2" "3" "4" "5"
local MSE = 0
** rolling sum, thanks to Nick Cox for the algorithm
forval k = `DepVar1LagStart'(1)`DepVar1LagEnd' {
if `k' < 0 {
local k1 = -(`k')
replace DepVarLagSum = DepVarLagSum + L`k1'.DepVar
}
else replace DepVarLagSum = DepVarLagSum + F`k'.DepVar
}
** rolling max, thanks to Nick Cox for the algorithm
local IndVar1_arg IndVar1 
forval k = `IndVar1LagStart'(1)`IndVar1LagEnd' {
if `k' <= 0 {
local k1 = -(`k')
local IndVar1_arg `IndVar1_arg', L`k1'.IndVar1
}    
}
local IndVar2_arg IndVar2 
forval k = `IndVar2LagStart'(1)`IndVar2LagEnd' {
if `k' <= 0 {
local k1 = -(`k')
local IndVar2_arg `IndVar2_arg', L`k1'.IndVar2
}    
}
gen resid_squared = .
forval b = 1(1)`=kfold' {
** perform regression on 4/5 parts
xtreg c.DepVarLagSum ///
c.IndVar1LagMax ///
c.IndVar2LagMax ///
if randint != `b' ///
, fe vce(cluster uuid)
** store results
matrix results[`b',1] = e(r2)
matrix results[`b',2] = e(rmse)*e(rmse) // to get MSE

** test set
predict predDepVarLagSum if randint == `b', xb
predict residDepVarLagSum if randint == `b', residuals

** get R-squared
corr DepVarLagSum predDepVarLagSum if randint == `b'
matrix results[`b',3] = r(rho)^2

** calculate squared residuals
replace resid_squared = residDepVarLagSum*residDepVarLagSum
summarize resid_squared if randint == `b'
matrix results[`b',4] = r(mean)
drop predDepVarLagSum
drop residDepVarLagSum
mat U = J(rowsof(results),1,1)
mat sum = U'*results
mat mean_results = sum/rowsof(results)
local MSE = mean_results[1,4]
}
}

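I want to feed MSE back into Python.

Apologies if I missed any small typos; I cannot copy the code directly from the machine that runs Stata.

The idea is to supply input parameters that determine the lag periods, run the regression on the new variables, obtain the average test-set mean squared error, and feed it back into Python.

Edit 2

I added more items to the InputParams list to reflect the number of inputs the Stata do-file expects.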

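Stata 16.1 offers better integration between Python and Stata, but a practical solution for earlier versions is to write the Stata matrix holding the results to disk (here I use an Excel file) and then read it from Python. Here is an example of the lines you could put at the end of the do-file to write out the matrix you want.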

* standalone example; inside your own do-file, skip "clear all" (it would drop your results matrix)
clear all
version 14.1
matrix M = J(5,2,999)
matrix colnames M = "col1" "col2"
matrix rownames M = "1" "2" "3" "4" "5"
global route = "C:/Users/route_to_your_working_directory"
putexcel set "${route}/M.xlsx", sheet("M") replace
putexcel A1 = matrix(M), names
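
If only the single MSE value is needed, a plain-text file is a lighter alternative to the Excel route. The sketch below is an assumption-laden illustration, not the method from the answer above: it assumes the do-file ends with Stata's file open / file write / file close commands writing the macro to a file, and the file name mse.txt and the helper names read_mse and run_dofile are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumed Stata side (hypothetical file name), placed at the end of the do-file:
#     file open fh using "mse.txt", write replace
#     file write fh "`MSE'"
#     file close fh


def read_mse(result_file):
    """Return the single number Stata wrote to result_file."""
    text = Path(result_file).read_text().strip()
    if text == ".":  # Stata prints a missing value as a lone dot
        return float("nan")
    return float(text)  # Stata may write ".0045"; float() accepts that


def run_dofile(stata_exe, dofile, params, result_file, runner=subprocess.call):
    """Run the do-file once in batch mode and return the MSE it wrote to disk.

    runner is injectable so the function can be exercised without Stata.
    """
    cmd = [stata_exe, "/e", "do", dofile] + list(params)
    runner(cmd)
    return read_mse(result_file)
```

Inside the Python loop, each iteration would call run_dofile with a different parameter list and collect the returned values. The Excel file written by putexcel could instead be read with pandas.read_excel, which needs pandas and an Excel reader such as openpyxl installed.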
