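I'm sure this is a simple question. I have a DataFrame with about 1,000 rows, all of them unique.
It shows the expenses for the year by location and by category. Every location has the same set of categories.
I want to create a monthly budget column for each expense at each site.
I also want to create a year-to-date budget column that takes the total budget for the year and divides it by 12 to get a monthly figure, then multiplies that by the month number (April = 1) to get the year-to-date value. For example, May would be the monthly figure * 2, and so on.
I'm trying to do this with pandas. I tried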
pd.DataFrame(np.repeat(budget.values,12,axis=0)) #replicate each row by 12
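My plan was to iterate over every row in each group to add the month, but I'm struggling to get anywhere.
Any help would be much appreciated.
(Sorry, I couldn't paste the tables properly; please see the image.)
Current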
+------------+-------------+--------+
| Location | Expense | Amount |
+------------+-------------+--------+
| Sheffield | Electricity | 10000 |
| Sheffield | Gas | 12000 |
| Manchester | Electricity | 15000 |
| Manchester | Electricity | 13000 |
+------------+-------------+--------+
Desired
+------------+-------------+--------+--------+---------+-------+
| Location | Expense | Amount | Budget | Month | YTD |
+------------+-------------+--------+--------+---------+-------+
| Sheffield | Electricity | 10000 | 10000 | April | 1000 |
| Sheffield | Electricity | 10000 | 10000 | May | 2000 |
| Sheffield | Electricity | 10000 | 10000 | June | 3000 |
| Sheffield | Electricity | 10000 | 10000 | July | 4000 |
| Sheffield | Electricity | 10000 | 10000 | August | 5000 |
| Sheffield | Electricity | 10000 | 10000 | Sep | 6000 |
| Sheffield | Electricity | 10000 | 10000 | Oct | 7000 |
| Sheffield | Electricity | 10000 | 10000 | Dec | 8000 |
| Sheffield | Electricity | 10000 | 10000 | Jan | 9000 |
| Sheffield | Electricity | 10000 | 10000 | Feb | 10000 |
| Sheffield | Electricity | 10000 | 10000 | March | 11000 |
| Sheffield | Gas | 12000 | 20000 | April | 2000 |
| Sheffield | Gas | 12000 | 20000 | May | 4000 |
| Sheffield | Gas | 12000 | 20000 | June... | 6000 |
| Sheffield | Gas | 12000 | 20000 | ..March | 8000 |
| Manchester | Electricity | 15000 | 36000 | April | 4000 |
| Manchester | Electricity | 15000 | 36000 | May | 8000 |
+------------+-------------+--------+--------+---------+-------+
You can create a dedicated months table in which April is number 1, the first month of the fiscal year.
import pandas as pd
# Initialise the data from lists; 'key' is a constant helper column used for the join.
data = {'Month':['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March'],
'Number':range(1,13),
'key': [1] * 12 }
# Create DataFrame
df_months = pd.DataFrame(data)
This produces the following table:
+------------+--------+-----+
| Month | Number | key |
+------------+--------+-----+
| April | 1 | 1 |
| May | 2 | 1 |
| June | 3 | 1 |
| July | 4 | 1 |
| August | 5 | 1 |
| September | 6 | 1 |
| October | 7 | 1 |
| November | 8 | 1 |
| December | 9 | 1 |
| January | 10 | 1 |
| February | 11 | 1 |
| March | 12 | 1 |
+------------+--------+-----+
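Now adjust your own table, let's call it df_amounts, so that it has a matching key column (key). This ensures that every month is joined to every location/expense combination: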
df_amounts['key'] = 1
df_amounts:
+------------+-------------+--------+-----+
| Location | Expense | Amount | key |
+------------+-------------+--------+-----+
| Sheffield | Electricity | 10000 | 1 |
| Sheffield | Gas | 12000 | 1 |
| Manchester | Electricity | 15000 | 1 |
| Manchester | Electricity | 13000 | 1 |
+------------+-------------+--------+-----+
Then join the two tables on key:
df = pd.merge(df_amounts, df_months, on="key", how="left")
to get a table with the following columns:
+------------+-------------+--------+-----+-------+--------+
| Location | Expense | Amount | key | Month | Number |
+------------+-------------+--------+-----+-------+--------+
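If you are on pandas 1.2 or later, the helper key column is not strictly needed, because merge accepts how="cross". A minimal sketch of that variant, assuming the four sample rows from the question (the names months and df_cross are only illustrative):

```python
import pandas as pd

# Sample rows copied from the question's "Current" table.
df_amounts = pd.DataFrame({
    "Location": ["Sheffield", "Sheffield", "Manchester", "Manchester"],
    "Expense": ["Electricity", "Gas", "Electricity", "Electricity"],
    "Amount": [10000, 12000, 15000, 13000],
})

# Fiscal-year months, April = 1.
months = ["April", "May", "June", "July", "August", "September",
          "October", "November", "December", "January", "February", "March"]
df_months = pd.DataFrame({"Month": months, "Number": range(1, 13)})

# Cross join: every amount row is paired with every month (requires pandas >= 1.2).
df_cross = df_amounts.merge(df_months, how="cross")

print(df_cross.shape)  # 4 rows x 12 months, no 'key' column
```

The result has the same rows as the key-based merge, just without the extra key column to drop afterwards.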
Now divide the Number column by 12 and multiply the result by Amount to get the new YTD column:
df['YTD']= df['Amount'] * (df['Number'] / 12)
Your monthly Budget column works in a similar way:
df['Budget']= df['Amount'] / 12
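Putting the steps together, a minimal end-to-end sketch might look like this. It assumes the four sample rows from the question and that Amount holds the annual figure; the helper name add_monthly_budget is hypothetical, not part of pandas:

```python
import pandas as pd

# Fiscal-year months, April = 1.
MONTHS = ["April", "May", "June", "July", "August", "September",
          "October", "November", "December", "January", "February", "March"]


def add_monthly_budget(df_amounts: pd.DataFrame) -> pd.DataFrame:
    """Repeat every row once per fiscal month and add Budget and YTD columns."""
    df_months = pd.DataFrame({"Month": MONTHS, "Number": range(1, 13), "key": 1})
    # assign() returns a copy, so the caller's DataFrame is left unchanged.
    df = df_amounts.assign(key=1).merge(df_months, on="key", how="left")
    df["Budget"] = df["Amount"] / 12          # monthly figure
    df["YTD"] = df["Budget"] * df["Number"]   # monthly figure * month number
    return df.drop(columns=["key"])


# Sample rows copied from the question's "Current" table.
df_amounts = pd.DataFrame({
    "Location": ["Sheffield", "Sheffield", "Manchester", "Manchester"],
    "Expense": ["Electricity", "Gas", "Electricity", "Electricity"],
    "Amount": [10000, 12000, 15000, 13000],
})

result = add_monthly_budget(df_amounts)
print(result.head(3))
```

Because the merge is vectorised, there is no need to iterate over rows or groups, which should also scale comfortably to the roughly 1,000 rows mentioned in the question.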