Expressing a DataFrame column as a linear combination of other columns



I have a DataFrame with 10 data columns and 2000 rows. It also has other columns that should be ignored:

df1 = pd.DataFrame(np.random.randint(0,100,size=(2000, 10)), columns=list('ABCDEFGHIJ'))
df1['Company Name']=stringlist1

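Because different files have different column names, the column names can change from run to run. The only constant is that the data to consider starts at the 7th column and spans 10 columns. I have several lists, each containing 10 weights; some are zero, others are non-zero, and they sum to 1. Example: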

wt1=[0.0,0.34,0.05,0.0,0.1,0.01,0.0,0.0,0.5,0.0]

I need to define a new column in df1 that is a linear combination of those 10 columns, with the weights given in wt1.

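How can I do this? Note that the column names (A, B, C, D, ...) cannot appear in the summation expression, because the code must be reusable for data whose column names may differ (they are read from Excel worksheets).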

I tried:

icollist1=[icol1 for icol1,val1 in enumerate(wt1) if val1>0.0]
for icol1 in icollist1:
    df1['Weighted Sum']+=np.asarray(wt1[icol1])*df1[colnames1[icol1]]

where colnames1 is the list of column names extracted from the Excel file that the DataFrame was read from.

I get this error:

TypeError: can't multiply sequence by non-int of type 'float'
...
During handling of the above exception, another exception occurred:
...
TypeError: can't multiply sequence by non-int of type 'float'

Try this on the example provided:

df1 = pd.DataFrame(np.random.randint(0,100,size=(2000, 10)), columns=list('ABCDEFGHIJ'))
wt1=[0.0,0.34,0.05,0.0,0.1,0.01,0.0,0.0,0.5,0.0]
df1.mul(wt1, axis=1).sum(axis=1)
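
The expression above only returns the weighted sum; it does not store it. A minimal sketch of assigning the result to a new column, with a cross-check against a plain matrix-vector product, might look like this. The column name "Weighted Sum" is taken from the question; the seeded random generator and the variable name check are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an assumption, not in the original)
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.integers(0, 100, size=(2000, 10)), columns=list("ABCDEFGHIJ"))
wt1 = [0.0, 0.34, 0.05, 0.0, 0.1, 0.01, 0.0, 0.0, 0.5, 0.0]

# Scale each column by its weight (matched by position), then sum across each row.
# This is computed before the new column exists, so df1 still has exactly 10 columns here.
df1["Weighted Sum"] = df1.mul(wt1, axis=1).sum(axis=1)

# Cross-check: the same quantity as a matrix-vector product over the first 10 columns
check = df1.iloc[:, :10].to_numpy() @ np.asarray(wt1)
```

Because the weights are matched by position rather than by label, no column name appears in the expression.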

If you have more than 10 columns and want the multiplication to start at the 7th column:

df1 = pd.DataFrame(np.random.randint(0,100,size=(2000, 20)))
wt1=[0.0,0.34,0.05,0.0,0.1,0.01,0.0,0.0,0.5,0.0]
df1.iloc[:,6:16].mul(wt1, axis=1).sum(axis=1)
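
As for the TypeError in the question, one plausible cause (an assumption, since the actual Excel data is not shown) is that some of the selected columns hold strings: colnames1[icol1] indexes from the first column of the file rather than from the 7th, and cells read from Excel can also arrive as text. The sketch below selects the block purely by position and coerces it to numeric before weighting. The helper name add_weighted_sum and its parameters start and name are hypothetical.

```python
import numpy as np
import pandas as pd

def add_weighted_sum(df, weights, start=6, name="Weighted Sum"):
    """Hypothetical helper: return a copy of df with a new column holding the
    weighted sum of len(weights) consecutive columns, beginning at the
    0-based position `start` (6 means the 7th column)."""
    block = df.iloc[:, start:start + len(weights)]
    if block.shape[1] != len(weights):
        raise ValueError("not enough columns for the given weights")
    # Convert text such as "42" to numbers; cells that cannot be parsed become NaN
    block = block.apply(pd.to_numeric, errors="coerce")
    out = df.copy()
    # Matrix-vector product; a NaN cell makes that row's result NaN
    out[name] = block.to_numpy(dtype=float) @ np.asarray(weights, dtype=float)
    return out

# Usage: six text columns to ignore, then ten data columns stored as strings
rng = np.random.default_rng(0)
meta = pd.DataFrame({f"meta{i}": ["x"] * 5 for i in range(6)})
data = pd.DataFrame(rng.integers(0, 100, size=(5, 10)), columns=list("ABCDEFGHIJ")).astype(str)
df = pd.concat([meta, data], axis=1)
wt1 = [0.0, 0.34, 0.05, 0.0, 0.1, 0.01, 0.0, 0.0, 0.5, 0.0]
result = add_weighted_sum(df, wt1)
```

Note one difference from the .sum(axis=1) version: sum skips NaN by default, whereas the matrix product propagates it, so unparsable cells show up as NaN in the result rather than being silently dropped.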
