Python / Pandas - Dynamically calculating yearly ratios in a DataFrame using the year in the column names



I have this DataFrame:

df:
Co_Name  . 2014 Revenues . Address . 2012 Profits . 2014 Profits...
1 .  Apple      1231            Gjud St   20             23          ...
2 .  Orange     84894           Uinjs St  712            313         ...
3 .  Squirrel   9192            Iusaa St  4312           123         ...
...

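For example, "2014 Revenues" is the revenue of a given company in 2014.

I need to calculate different ratios, building each formula from values of the same year. For example, to calculate the profit margin for 2014, I would write: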

df['2014 ProfitMg'] = df['2014 Profits']/df['2014 Revenues']

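However, I have many years and many ratios to compute from this (huge) DataFrame, so I would like to do it in a dynamic and Pythonic way. I want to say something like: "Mr. Python, please calculate Profits / Revenues for the columns whose names contain 'Profits' and 'Revenues' and start with the same 4-character string."

The result should look like this: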

Co_Name  . 2014 Revenues . Address . 2012 Profits . 2014 Profits . 2014 ProfitMg  ...
1 .  Apple      1231            Gjud St   20             23             0.019
2 .  Orange     84894           Uinjs St  712            313            0.008
3 .  Squirrel   9192            Iusaa St  4312           123            0.0133
...

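Can someone help me do this dynamically instead of writing df['2014 ProfitMg'] = df['2014 Profits']/df['2014 Revenues'] for every year?

Sure, you can find the matching columns and apply the formula to them: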

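import re

# Extract the 4-digit year from every column named "<year> Revenues";
# columns that do not match yield an empty list
years = [re.findall(r"(\d{4})\sRevenues", col) for col in df.columns]
for year in years:
    if year:
        df['{} ProfitMg'.format(year[0])] = (df['{} Profits'.format(year[0])]
                                             / df['{} Revenues'.format(year[0])])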

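This solution assumes that every "Revenues" column has a matching "Profits" column. If that is not the case, get the set of "Profits" years and the set of "Revenues" years and take their intersection.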
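The intersection idea could be sketched roughly as follows. This assumes the column names follow the "<year> Profits" / "<year> Revenues" pattern from the question; the sample data and the helper name years_with are hypothetical and only serve the illustration.

```python
import re

import pandas as pd

# Hypothetical sample data: 2012 has Profits only, 2013 has Revenues only,
# so 2014 is the only year for which a ratio can be computed.
df = pd.DataFrame({
    "Co_Name": ["Apple", "Orange"],
    "2014 Revenues": [1000, 2000],
    "2014 Profits": [100, 500],
    "2012 Profits": [20, 712],
    "2013 Revenues": [400, 800],
})


def years_with(frame, suffix):
    """Return the set of 4-digit years found in columns named '<year> <suffix>'."""
    pattern = re.compile(r"(\d{4})\s" + re.escape(suffix) + r"$")
    matches = (pattern.match(col) for col in frame.columns)
    return {m.group(1) for m in matches if m}


# Only years that have both a Profits and a Revenues column are usable.
common_years = sorted(years_with(df, "Profits") & years_with(df, "Revenues"))

for year in common_years:
    df[f"{year} ProfitMg"] = df[f"{year} Profits"] / df[f"{year} Revenues"]
```

Years that appear in only one of the two sets are skipped silently here; whether to skip them or raise an error is a design choice that depends on the data.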

import pandas as pd
import numpy as np

# create some data: 100 years of '. <year> Profits' and '. <year> Revenues' columns
profit_nm = ['. '+str(i)+' Profits' for i in range(1951,2051)]
revenue_nm = ['. '+str(i)+' Revenues' for i in range(1951,2051)]
column_nm = profit_nm+revenue_nm
column_nm.sort()
data = np.random.randint(100,1000,size=(1000,200))
df = pd.DataFrame(data,columns=column_nm)

# function that will return the ratios
def func(pd_series):
    # the first six characters of the column name are the year prefix, e.g. '. 1951'
    year = pd_series.name[:6]
    ret = df.loc[:,year+' Profits']/df.loc[:,year+' Revenues']
    return ret

# extract names of profit columns from dataframe
profit_cols = [i for i in df.columns.tolist() if i.find(' Profits')!=-1]
# get ratios and store in df2
df2 = df.loc[:,profit_cols].apply(func, axis=0)
# change column names before joining as column names are same in df and df2
df2.columns = [year[:6]+' PftPct' for year in profit_cols]
df = df.join(df2)
