Replicating rows in Pandas and adding a new (month) column



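I'm sure this is a simple question. I have a DataFrame with about 1000 rows, all of them unique.

It shows expenses for the year by category, by location. Every location has the same set of categories.

I want to create a monthly budget column for each expense at each site.

I also want to create a year-to-date budget column, which takes the total budget for the year and divides it by 12 to get a monthly figure, then multiplies that by the month number (April = 1) to give the year-to-date value. For example, May would be the monthly figure * 2, and so on.

I'm trying to do this with pandas. I tried: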

pd.DataFrame(np.repeat(budget.values,12,axis=0)) #replicate each row by 12

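My plan was to iterate over each row in each group to add the month, but I'm struggling to get anywhere.

Any help would be much appreciated.

(Apologies, I couldn't paste the tables properly; please see the image.)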

Current

+------------+-------------+--------+
|  Location  |  Expense    | Amount |
+------------+-------------+--------+
| Sheffield  | Electricity |  10000 |
| Sheffield  | Gas         |  12000 |
| Manchester | Electricity |  15000 |
| Manchester | Electricity |  13000 |
+------------+-------------+--------+

Desired

+------------+-------------+--------+--------+---------+-------+
|  Location  |  Expense    | Amount | Budget |  Month  |  YTD  |
+------------+-------------+--------+--------+---------+-------+
| Sheffield  | Electricity |  10000 |  10000 | April   |  1000 |
| Sheffield  | Electricity |  10000 |  10000 | May     |  2000 |
| Sheffield  | Electricity |  10000 |  10000 | June    |  3000 |
| Sheffield  | Electricity |  10000 |  10000 | July    |  4000 |
| Sheffield  | Electricity |  10000 |  10000 | August  |  5000 |
| Sheffield  | Electricity |  10000 |  10000 | Sep     |  6000 |
| Sheffield  | Electricity |  10000 |  10000 | Oct     |  7000 |
| Sheffield  | Electricity |  10000 |  10000 | Dec     |  8000 |
| Sheffield  | Electricity |  10000 |  10000 | Jan     |  9000 |
| Sheffield  | Electricity |  10000 |  10000 | Feb     | 10000 |
| Sheffield  | Electricity |  10000 |  10000 | March   | 11000 |
| Sheffield  | Gas         |  12000 |  20000 | April   |  2000 |
| Sheffield  | Gas         |  12000 |  20000 | May     |  4000 |
| Sheffield  | Gas         |  12000 |  20000 | June... |  6000 |
| Sheffield  | Gas         |  12000 |  20000 | ..March |  8000 |
| Manchester | Electricity |  15000 |  36000 | April   |  4000 |
| Manchester | Electricity |  15000 |  36000 | May     |  8000 |
+------------+-------------+--------+--------+---------+-------+

You can create a dedicated months table in which April is number 1, the first month of the fiscal year.

import pandas as pd

# Initialise the month data; April is month 1 of the fiscal year.
data = {
    'Month': ['April', 'May', 'June', 'July', 'August', 'September',
              'October', 'November', 'December', 'January', 'February', 'March'],
    'Number': range(1, 13),
    'key': [1] * 12,  # constant key used later for the join
}

# Create the DataFrame
df_months = pd.DataFrame(data)

This produces a table like this:

+------------+--------+-----+
| Month      | Number | key |
+------------+--------+-----+
| April      |  1     | 1   |
| May        |  2     | 1   |
| June       |  3     | 1   |
| July       |  4     | 1   |
| August     |  5     | 1   |
| September  |  6     | 1   |
| October    |  7     | 1   |
| November   |  8     | 1   |
| December   |  9     | 1   |
| January    |  10    | 1   |
| February   |  11    | 1   |
| March      |  12    | 1   |
+------------+--------+-----+
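If the fiscal year might start in a different month, the same table could be generated rather than typed out. The following is only a sketch: `fiscal_months` and its `start_month` parameter are hypothetical names, not part of pandas, and the month names come from the standard `calendar` module, so they depend on the active locale.

```python
import calendar

import pandas as pd


def fiscal_months(start_month: int = 4) -> pd.DataFrame:
    """Hypothetical helper: months ordered from the fiscal start month."""
    # calendar.month_name[0] is an empty string, so skip it
    names = list(calendar.month_name)[1:]
    # Rotate the list so that the start month comes first
    ordered = names[start_month - 1:] + names[:start_month - 1]
    return pd.DataFrame({
        "Month": ordered,
        "Number": range(1, 13),  # 1 = first month of the fiscal year
        "key": 1,                # constant key for the join
    })


df_months = fiscal_months(4)  # April = 1, ..., March = 12
```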

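Now adjust your second table, let's call it df_amounts, so that it has a constant key column (key), which ensures that every month is joined to every Location/Expense combination: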

df_amounts['key'] = 1

df_amounts:

+------------+-------------+--------+-----+
|  Location  |  Expense    | Amount | key |
+------------+-------------+--------+-----+
| Sheffield  | Electricity |  10000 | 1   |
| Sheffield  | Gas         |  12000 | 1   |
| Manchester | Electricity |  15000 | 1   |
| Manchester | Electricity |  13000 | 1   |
+------------+-------------+--------+-----+

Then join the tables on key:

df = pd.merge(df_amounts, df_months, on="key", how="left")

This gives a table with the following columns:

+------------+-------------+--------+-----+-------+--------+
|  Location  |  Expense    | Amount | key | Month | Number | 
+------------+-------------+--------+-----+-------+--------+
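As a side note, pandas 1.2 and later can do the same row-by-month pairing with `how="cross"`, which avoids the helper key column. This is a sketch using the sample rows from the question, assuming a recent enough pandas version:

```python
import pandas as pd

# Sample rows copied from the question
df_amounts = pd.DataFrame({
    "Location": ["Sheffield", "Sheffield", "Manchester", "Manchester"],
    "Expense": ["Electricity", "Gas", "Electricity", "Electricity"],
    "Amount": [10000, 12000, 15000, 13000],
})

df_months = pd.DataFrame({
    "Month": ["April", "May", "June", "July", "August", "September",
              "October", "November", "December", "January", "February", "March"],
    "Number": range(1, 13),
})

# Cross join: every row of df_amounts is paired with every month.
# Requires pandas >= 1.2; no 'key' column is needed on either side.
df = pd.merge(df_amounts, df_months, how="cross")
```

The result has 12 rows per original row and no leftover key column to drop.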

Now divide the Number column by 12 and multiply that value by Amount to get the new YTD column:

df['YTD']= df['Amount'] * (df['Number'] / 12)
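To see what this formula produces, here is a small check on a single made-up expense of 12000 per year (the figure is an assumption chosen so that the monthly amount is a round 1000):

```python
import pandas as pd

# Hypothetical annual amount, chosen only for easy arithmetic
amount = 12000

# Fiscal month numbers: April = 1, May = 2, ..., March = 12
number = pd.Series(range(1, 13), name="Number")

# Same formula as above: a fraction of the year multiplied by the annual amount
ytd = amount * (number / 12)

# The result is a float Series; April is about 1000, May about 2000,
# and March (month 12) reaches the full annual amount, up to float rounding.
print(ytd.round(2).tolist())
```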

Your monthly Budget column works similarly:

df['Budget']= df['Amount'] / 12
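For comparison, the row-repeating idea from the question can also be completed without a join. This is a sketch rather than a recommendation: it uses `Index.repeat` instead of `np.repeat(budget.values, ...)` so that column names and dtypes are kept, and `np.tile` to lay the months over the repeated rows. The `budget` frame below is rebuilt from the sample data in the question.

```python
import numpy as np
import pandas as pd

months = ["April", "May", "June", "July", "August", "September",
          "October", "November", "December", "January", "February", "March"]

# Sample rows copied from the question
budget = pd.DataFrame({
    "Location": ["Sheffield", "Sheffield", "Manchester", "Manchester"],
    "Expense": ["Electricity", "Gas", "Electricity", "Electricity"],
    "Amount": [10000, 12000, 15000, 13000],
})

# Repeat every row 12 times, keeping column names and dtypes
out = budget.loc[budget.index.repeat(len(months))].reset_index(drop=True)

# Tile the months so each original row gets April..March in order
out["Month"] = np.tile(months, len(budget))
out["Number"] = np.tile(np.arange(1, 13), len(budget))

# Monthly budget and year-to-date value, as in the answer above
out["Budget"] = out["Amount"] / 12
out["YTD"] = out["Budget"] * out["Number"]
```

The merge approach is arguably easier to read, while this version avoids building a second table.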
