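How can I collect the regression intercept and coefficients for each unique ID in a DataFrame into a single DataFrame, where each row holds the UID, the intercept, and the coefficients?
Here is a snippet of my original data. Future data may have more UIDs and more fields (independent variables).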
UID | A1 | A2 | A3 | A4 | Rating
---|---|---|---|---|---
1 | 0.377489423 | 0.950311846 | 0.892135293 | 0.077054085 | 4
1 | 0.595570737 | 0.824334482 | 0.388634543 | 0.947936483 | 4
1 | 0.585703124 | 0.825486315 | 0.569809886 | 0.321117521 | 3
1 | 0.386968371 | 0.594556911 | 0.260187376 | 0.394238102 | 4
1 | 0.532731866 | 0.219741858 | 0.865710517 | 0.173044631 | 3
1 | 0.16565561 | 0.125096015 | 0.881841651 | 0.494690133 | 4
2 | 0.42418965 | 0.814894214 | 0.98942645 | 0.871014023 | 1
2 | 0.742640457 | 0.571780036 | 0.247811255 | 0.468820653 | 2
2 | 0.401989919 | 0.375134173 | 0.539599593 | 0.443260146 | 3
2 | 0.167910365 | 0.940073739 | 0.490081723 | 0.803074574 | 5
2 | 0.614160221 | 0.045817359 | 0.077645469 | 0.367456074 | 4
3 | 0.866397055 | 0.2932472 | 0.968410252 | 0.348542304 | 5
3 | 0.141680391 | 0.998446121 | 0.201506356 | 0.689863785 | 1
3 | 0.407182414 | 0.721650663 | 0.174277013 | 0.922810374 | 1
Below is what I have come up with so far. I just need to add the UID, and I don't know how to attach the UID to each row.
ids = df.UID.unique()
op = pd.DataFrame
intercept = []
coefficients = []
UID = []
for i in ids:
    df_i = df[df.UID == i]
    X = df_i.drop(['UID', 'Rating'], axis=1)
    y = df_i['Rating']
    reg = LinearRegression().fit(X, y)
    reg.score(X, y)
    unique_id = df_i['UID'].unique()
    const = reg.intercept_
    coef = reg.coef_
    UID.append(unique_id)
    intercept.append(const)
    coefficients.append(coef)
intercep_new = pd.DataFrame(intercept)
coefficients_new = pd.DataFrame(coefficients)
UID_new = pd.DataFrame(UID)
colNames = df.drop(['Rating',], axis=1).columns
colNames = colNames.insert(1, 'Const')
colNames
op = pd.concat([UID_new, intercep_new, coefficients_new], axis=1)
op.columns = colNames
Please see the changes below:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

ids = df.UID.unique()
frames = []  # one single-row DataFrame per UID
for i in ids:
    df_i = df[df.UID == i]
    X = df_i.drop(['UID', 'Rating'], axis=1)
    y = df_i['Rating']
    reg = LinearRegression().fit(X, y)
    const = reg.intercept_
    coef = reg.coef_
    uid = i
    # Row layout: coefficients first, then the intercept, then the UID
    array = np.append(coef, const)
    array = np.append(array, uid)
    array = array.reshape(1, len(array))
    frames.append(pd.DataFrame(array))
# DataFrame.append was removed in pandas 2.0, so concatenate once after the loop
op = pd.concat(frames, ignore_index=True)
op.columns = ['A' + str(k) for k in range(1, len(op.columns) + 1)]
op.rename(columns={op.columns[-1]: "UID"}, inplace=True)
op.rename(columns={op.columns[-2]: "Intercept"}, inplace=True)
op = op.drop_duplicates()
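As a possible variation on the loop above, here is a minimal sketch that builds each row as a dict, so the coefficient columns keep their original names and the UID keeps its original dtype instead of being cast to float by `np.append`. The helper name `fit_per_uid`, its parameters, and the small `demo` frame are made up for illustration; they are not part of the question's data or of any library.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def fit_per_uid(df, id_col="UID", target_col="Rating"):
    """Fit one linear regression per ID and return one row per ID.

    Hypothetical helper: every column other than id_col and target_col
    is treated as an independent variable.
    """
    feature_cols = [c for c in df.columns if c not in (id_col, target_col)]
    rows = []
    for uid, group in df.groupby(id_col):
        reg = LinearRegression().fit(group[feature_cols], group[target_col])
        # Start the row with the ID and intercept, then add named coefficients
        row = {id_col: uid, "Intercept": reg.intercept_}
        row.update(zip(feature_cols, reg.coef_))
        rows.append(row)
    return pd.DataFrame(rows, columns=[id_col, "Intercept", *feature_cols])


# Made-up demo data (not the data from the question):
# UID 1 follows Rating = 1 + 2*A1 + 3*A2, UID 2 follows Rating = 5 - A1 + 0.5*A2
demo = pd.DataFrame({
    "UID": [1, 1, 1, 1, 2, 2, 2, 2],
    "A1": [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0],
    "A2": [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    "Rating": [4.0, 3.0, 8.0, 7.0, 5.0, 4.5, 3.0, 2.5],
})
result = fit_per_uid(demo)
print(result)
```

Because the feature columns are discovered from the frame itself, this should keep working if future data has more UIDs or more independent variables, as long as the ID and target columns keep their names.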