Timetable: columns = hours, rows = weekdays, data = subjects
[weekday x hour]
Name       1                      2                      3                 4             5                 6                      7
Monday     Project                Project                Project           Data Science  Embedded Systems  Data Mining            Industrial Psychology
Tuesday    Project                Project                Project           Project       Data Science      Industrial Psychology  Embedded Systems
Wednesday  Data Science           Project                Project           Project       Project           Project                Project
Thursday   Data Mining            Industrial Psychology  Embedded Systems  Data Mining   Project           Project                Project
Friday     Industrial Psychology  Embedded Systems       Data Science      Data Mining   Project           Project                Project
Frequency table: rows = weekdays, columns = subjects, data = how often each subject occurs on that weekday
[weekday x subject]
Name       Data Mining  Data Science  Embedded Systems  Industrial Psychology  Project
Friday     1            1             1                 1                      3
Monday     1            1             1                 1                      3
Thursday   2            0             1                 1                      3
Tuesday    0            1             1                 1                      4
Wednesday  0            1             0                 0                      6
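As a rough sketch of how the frequency table can be derived from the timetable, the snippet below rebuilds the timetable in memory (instead of reading timetable.csv) and counts subjects per weekday with pd.crosstab. The names timetable, slots and freq, and the short aliases P, DS, ES, DM, IP, are illustrative only and do not come from the original code.

```python
import pandas as pd

# Short aliases (hypothetical) so the timetable rows stay readable.
P, DS, ES, DM, IP = ("Project", "Data Science", "Embedded Systems",
                     "Data Mining", "Industrial Psychology")

# Timetable copied from the table above: one row per weekday, one column per hour.
timetable = pd.DataFrame.from_dict(
    {
        "Monday":    [P, P, P, DS, ES, DM, IP],
        "Tuesday":   [P, P, P, P, DS, IP, ES],
        "Wednesday": [DS, P, P, P, P, P, P],
        "Thursday":  [DM, IP, ES, DM, P, P, P],
        "Friday":    [IP, ES, DS, DM, P, P, P],
    },
    orient="index",
    columns=list(range(1, 8)),
).rename_axis("Name")

# Long form: one row per (weekday, hour) slot.
slots = timetable.reset_index().melt(
    id_vars="Name", var_name="hour", value_name="subject"
)

# Count how often each subject occurs on each weekday.
freq = pd.crosstab(slots["Name"], slots["subject"])
print(freq.to_string())
```

pd.crosstab sorts both axes alphabetically, which is why the weekdays come out as Friday, Monday, Thursday, Tuesday, Wednesday rather than in calendar order.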
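Code
from datetime import datetime

import pandas as pd

# The lines below run inside a method of the class that holds the timetable.
self.start = datetime(2022, 1, 1)
self.end = datetime(2022, 3, 31)
self.file = 'timetable.csv'
self.sdf = pd.read_csv(self.file, header=0, index_col="Name")
# Count each subject per hour column (pd.value_counts is deprecated in recent pandas)
self.subject_frequency = self.sdf.apply(pd.Series.value_counts).fillna(0)
print(self.subject_frequency.to_string())
self.subject_frequency["sum"] = self.subject_frequency.sum(axis=1)
# Reshape to long form, then count each subject per weekday
self.p = (self.sdf.melt(var_name='Freq', value_name='Data', ignore_index=False)
          .assign(variable=1)
          .pivot_table('Freq', 'Name', 'Data', fill_value=0, aggfunc='count'))
print(self.p.to_string())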
Desired table
classes ...
Data Mining 32
Data Science 32
Embedded Systems 32
Industrial Psychology 32
Project 146
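More columns will be added later, such as current attendance, the percentage drop for each missed class, and the percentage lost by taking leave on Monday, Tuesday, ..., so that they can be subtracted from the attendance percentage.
The end goal is to work out which day is safe to take off and to keep track of my percentage. If there is a better way to approach this, please let me know.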
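One way to start on the "percentage lost per weekday" idea is sketched below. It rests on an assumed definition that is not in the original post: the loss for a weekday is taken to be the share of each subject's weekly classes that fall on that day. The names freqtable, weekly_total and weekly_share are illustrative.

```python
import pandas as pd

subjects = ["Data Mining", "Data Science", "Embedded Systems",
            "Industrial Psychology", "Project"]

# Classes per weekday, copied from the frequency table in the question.
freqtable = pd.DataFrame(
    [[1, 1, 1, 1, 3],
     [1, 1, 1, 1, 3],
     [2, 0, 1, 1, 3],
     [0, 1, 1, 1, 4],
     [0, 1, 0, 0, 6]],
    index=["Friday", "Monday", "Thursday", "Tuesday", "Wednesday"],
    columns=subjects,
)

# Classes per subject in one full week.
weekly_total = freqtable.sum()

# Assumed definition: percentage of a subject's weekly classes that are
# missed when the whole of that weekday is skipped.
weekly_share = freqtable.div(weekly_total, axis=1).mul(100).round(1)
print(weekly_share.to_string())
```

Under this definition, a zero means the weekday has no classes in that subject, so leave on that day costs nothing there. A per-term figure would need the number of times each weekday actually occurs between the start and end dates.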
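One possible approach is to use bdate_range together with weekday to select the business days (numbered 0-4), map those numbers to their weekday names, and then reindex the frequency table with the result. This gives a DataFrame in which each row corresponds to one business day between 2022-1-1 and 2022-3-31. Finally, sum down the rows to get the total for each class: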
out = (freqtable.reindex(pd.bdate_range('2022-1-1', '2022-3-31').weekday
                         .map(dict(enumerate(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']))))
       .sum()
       .rename_axis(['classes']).reset_index(name='count'))
Output:
classes count
0 Data Mining 51
1 Data Science 51
2 Embedded Systems 51
3 Industrial Psychology 51
4 Project 244
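These totals can be cross-checked with a slightly different route, sketched here: count how many times each weekday occurs among the business days in the range, then weight each row of the frequency table by that count. This is an alternative to the reindex approach, not the same code, and it assumes no holidays fall in the range. The names day_counts and totals are illustrative.

```python
import pandas as pd

subjects = ["Data Mining", "Data Science", "Embedded Systems",
            "Industrial Psychology", "Project"]

# Frequency table copied from the question.
freqtable = pd.DataFrame(
    [[1, 1, 1, 1, 3],
     [1, 1, 1, 1, 3],
     [2, 0, 1, 1, 3],
     [0, 1, 1, 1, 4],
     [0, 1, 0, 0, 6]],
    index=["Friday", "Monday", "Thursday", "Tuesday", "Wednesday"],
    columns=subjects,
)

# How many times each weekday occurs among the business days in the range.
days = pd.bdate_range("2022-01-01", "2022-03-31")
day_counts = pd.Series(days.day_name()).value_counts()

# Weight each weekday's row by its number of occurrences, then add up per class.
totals = freqtable.mul(day_counts, axis=0).sum()
print(totals.to_string())
```

The range contains 13 each of Monday through Thursday and 12 Fridays, so for example Project comes to 3*13 + 4*13 + 6*13 + 3*13 + 3*12 = 244, matching the output above.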
select_rows = [date.strftime("%A") for date in pd.bdate_range(self.start, self.end)]
r = self.p.loc[select_rows, :]
print(r.to_string())
print(r.sum())
Feel free to suggest simpler code; design suggestions are appreciated too!