How to run a separate linear model for each unique ID and collect the results in a single DataFrame keyed by unique ID



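How can I fit a regression for each unique ID in a DataFrame and collect the intercepts and coefficients into a single DataFrame, where each row holds the UID, the intercept, and the coefficients?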

Here is a snippet of my raw data. Future data may have more UIDs and more fields (independent variables).

UID  A1  A2  A3  A4  Rating
[14 sample rows: 6 for UID 1, 5 for UID 2, 3 for UID 3; A1 to A4 are decimals between 0 and 1, Rating is an integer from 1 to 5]
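
For anyone who wants to run the code in this thread, a stand-in DataFrame with the same column layout can be generated. This is only a sketch: the values are random and synthetic, not the asker's data, and the seed and the use of `numpy.random.default_rng` are assumptions made for repeatability.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (an assumption) so the sketch is repeatable

# Row counts per UID follow the snippet: 6, 5 and 3 rows for UIDs 1, 2 and 3
uids = np.repeat([1, 2, 3], [6, 5, 3])

# Synthetic predictors in [0, 1); these are NOT the original values
df = pd.DataFrame(rng.random((len(uids), 4)), columns=["A1", "A2", "A3", "A4"])
df.insert(0, "UID", uids)

# Synthetic integer ratings from 1 to 5 (upper bound of integers() is exclusive)
df["Rating"] = rng.integers(1, 6, size=len(uids))

print(df.head())
```

Note that UID 3 has only three rows but four predictors, so a per-UID least-squares fit for that group is underdetermined and its coefficients are not unique.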

Here is what I came up with. I only need to add the UID, but I don't know how to add it to each row.

import pandas as pd
from sklearn.linear_model import LinearRegression

# df is the DataFrame shown above (columns: UID, A1..A4, Rating)
ids = df.UID.unique()
intercept = []
coefficients = []
UID = []
for i in ids:
    # Fit one model on the rows that belong to this UID
    df_i = df[df.UID == i]
    X = df_i.drop(['UID', 'Rating'], axis=1)
    y = df_i['Rating']
    reg = LinearRegression().fit(X, y)
    unique_id = df_i['UID'].unique()
    const = reg.intercept_
    coef = reg.coef_
    UID.append(unique_id)
    intercept.append(const)
    coefficients.append(coef)

intercep_new = pd.DataFrame(intercept)
coefficients_new = pd.DataFrame(coefficients)
UID_new = pd.DataFrame(UID)
# Output columns: UID, Const, then one column per predictor
colNames = df.drop(['Rating'], axis=1).columns
colNames = colNames.insert(1, 'Const')
op = pd.concat([UID_new, intercep_new, coefficients_new], axis=1)
op.columns = colNames

See the changes below:

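import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

ids = df.UID.unique()
rows = []
for i in ids:
    df_i = df[df.UID == i]
    X = df_i.drop(['UID', 'Rating'], axis=1)
    y = df_i['Rating']
    reg = LinearRegression().fit(X, y)
    const = reg.intercept_
    coef = reg.coef_
    # One row per UID: coefficients first, then the intercept, then the UID
    # (np.append builds a single numeric array, so this assumes numeric UIDs)
    array = np.append(coef, const)
    array = np.append(array, i)
    rows.append(array.reshape(1, len(array)))

# DataFrame.append was removed in pandas 2.0, so stack the rows once after the loop
op = pd.DataFrame(np.vstack(rows))
op.columns = ['A' + str(k) for k in range(1, len(op.columns) + 1)]
op.rename(columns={op.columns[-1]: "UID"}, inplace=True)
op.rename(columns={op.columns[-2]: "Intercept"}, inplace=True)
# The UID went through a float array; restore its original dtype
op["UID"] = op["UID"].astype(df.UID.dtype)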
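
The same result can probably be reached more directly with `groupby`, building one dictionary per UID so that the output columns keep the predictor names and the UID keeps its original type. The following is a minimal sketch under stated assumptions: `fit_per_uid` is a hypothetical helper name, and the `demo` frame is synthetic data chosen so that the expected coefficients are easy to verify by hand.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def fit_per_uid(df, id_col="UID", target_col="Rating"):
    """Fit one LinearRegression per group and return one row per group."""
    # Every column except the ID and the target is treated as a predictor,
    # so extra fields in future data are picked up automatically
    feature_cols = [c for c in df.columns if c not in (id_col, target_col)]
    rows = []
    for uid, group in df.groupby(id_col):
        reg = LinearRegression().fit(group[feature_cols], group[target_col])
        row = {id_col: uid, "Intercept": reg.intercept_}
        row.update(dict(zip(feature_cols, reg.coef_)))
        rows.append(row)
    return pd.DataFrame(rows, columns=[id_col, "Intercept", *feature_cols])


# Synthetic demo: within each UID, Rating is an exact linear function of A1 and A2
#   UID 1: Rating = 1 + 2*A1 + 3*A2
#   UID 2: Rating = 5 - 1*A1 + 2*A2
demo = pd.DataFrame({
    "UID": [1, 1, 1, 2, 2, 2],
    "A1": [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    "A2": [0.0, 0.0, 1.0, 0.0, 0.0, 1.0],
    "Rating": [1.0, 3.0, 4.0, 5.0, 4.0, 7.0],
})

result = fit_per_uid(demo)
print(result)
```

Collecting plain dictionaries and constructing the DataFrame once avoids growing a DataFrame inside the loop, and it also works when the UIDs are strings, because the UID is never pushed through a numeric array.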
