I have the following DataFrame of items with expiry dates:
Item       Expiry Date  Stock
Voucher 1  1-Mar-2022   3
Voucher 2  31-Apr-2022  2
Voucher 3  1-Feb-2022   1
I want to build an ageing dashboard that shows each item's stock quantity under the month in which it expires:
           Jan  Feb  Mar  Apr
Voucher 1            3
Voucher 2                 2
Voucher 3       1
Any ideas or guidance on how to do this? I have searched a lot of resources and could not find anything. I am new to building dashboards. Thanks.
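You can extract the month name from the 'Expiry Date' column and pivot (NB: your dates are invalid; 31 April does not exist). If needed, reindex with the list of month names: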
from calendar import month_abbr
cols = month_abbr[1:] # first item is empty string
(df.assign(month=df['Expiry Date'].str.extract(r'-(\D+)-', expand=False))
   .pivot(index='Item', columns='month', values='Stock')
   .reindex(columns=cols)
)
If you expect duplicate items, use pivot_table with sum as the aggregation function.
Output:
month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Item
Voucher 1 NaN NaN 3.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
Voucher 2 NaN NaN NaN 2.0 NaN NaN NaN NaN NaN NaN NaN NaN
Voucher 3 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
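The pivot_table variant mentioned above can be sketched as follows. The fourth row (a second batch of Voucher 1 expiring in March) is a hypothetical addition so that there is a duplicate to aggregate; it is not part of the original data. The month names from calendar are assumed to come from an English locale.

```python
from calendar import month_abbr

import pandas as pd

# Sample data from the question, plus one hypothetical duplicate row
# (a second batch of 'Voucher 1' expiring in March) for illustration.
df = pd.DataFrame({
    'Item': ['Voucher 1', 'Voucher 2', 'Voucher 3', 'Voucher 1'],
    'Expiry Date': ['1-Mar-2022', '31-Apr-2022', '1-Feb-2022', '15-Mar-2022'],
    'Stock': [3, 2, 1, 4],
})

cols = month_abbr[1:]  # 'Jan' .. 'Dec'; the first item is an empty string

# Take the month abbreviation from between the two hyphens.
df['month'] = df['Expiry Date'].str.extract(r'-(\D+)-', expand=False)

# pivot() would raise on the duplicated (Item, month) pair;
# pivot_table() aggregates the duplicates, here by summing the stock.
aged = (df.pivot_table(index='Item', columns='month', values='Stock', aggfunc='sum')
          .reindex(columns=cols))
print(aged)
```

With this data the two March rows for Voucher 1 are combined into a single cell, while the other items are unaffected.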
You can try it like this:
import pandas as pd
# Item Expiry Date Stock
# Voucher 1 1-Mar-2022 3
# Voucher 2 31-Apr-2022 2
# Voucher 3 1-Feb-2022 1
data = {'Item': ['Voucher 1', 'Voucher 2', 'Voucher 3'],
        'Expiry Date': ['1-Mar-2022', '31-Apr-2022', '1-Feb-2022'],
        'Stock': [3, 2, 1]}
df = pd.DataFrame(data)
# Using pandas apply method, get the month from each row using axis=1 and store it in new column 'Month'
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
df['Month'] = df.apply(lambda x: x['Expiry Date'].split('-')[1], axis=1)
# Using pandas pivot method, set 'Item' column as index,
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot.html
# set unique values in 'Month' column as separate columns
# set values in 'Stock' column as values for respective month columns
# and using 'rename_axis' method, remove the row name 'Month'
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename_axis.html
new_df = df.pivot(index='Item', columns='Month', values='Stock').rename_axis(None, axis=1)
# Sort the month column names by first converting each one to a pandas Timestamp object
# then using it as a key in a sorted function on all columns
new_df = new_df[sorted(new_df.columns, key=lambda x: pd.to_datetime(x, format='%b'))]
print(new_df)
This is the output I get:
Feb Mar Apr
Item
Voucher 1 NaN 3.0 NaN
Voucher 2 NaN NaN 2.0
Voucher 3 1.0 NaN NaN
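Both approaches read the month as plain text, so an impossible date such as 31-Apr-2022 passes through without complaint. Below is a minimal sketch of one way to flag such rows before pivoting, assuming the day-month-year layout shown in the question. The 'parsed' column name is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Voucher 1', 'Voucher 2', 'Voucher 3'],
    'Expiry Date': ['1-Mar-2022', '31-Apr-2022', '1-Feb-2022'],
    'Stock': [3, 2, 1],
})

# errors='coerce' turns unparseable or impossible dates into NaT
# instead of raising, so the offending rows can be inspected.
df['parsed'] = pd.to_datetime(df['Expiry Date'], format='%d-%b-%Y', errors='coerce')

# Rows whose expiry date could not be parsed as a real calendar date.
invalid = df[df['parsed'].isna()]
print(invalid[['Item', 'Expiry Date']])
```

Rows that show up in invalid can be corrected at the source before the pivot is built.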