我有一个问题陈述如下:
在每个考试中心,考试将分两班进行;批次II(报告时间上午9:00和下午2点(。考试可以在2020年12月1日至30日期间的任何一天在一个地区进行,具体取决于该地区的考生人数。请注意,每个地区只能有一个考试中心,一个班次最多可以有20名学生参加。根据上述信息,通过分配完成考试数据库:
- Rollno:候选人的卷号将从NL2000001开始(例如:NL2000001、NL2000002、NL2000003……(
- cent_allot:通过输入考试城市代码来分配中心
- cent_add:放置NL";地区名称";作为每个位置的中心地址(例如,如果地区名称为ADI,则中心地址为NL ADI(
- 考试日期:分配2020年12月12020日至12月30日之间的任何考试日期,保持最低考试天数,且不违反上述任何条件
- 批次:分配批次I或II,确保满足上述所有条件
- rep_time:第一批报告时间为上午9点,第二批报告时间是下午2点
根据以上描述,我需要制作一个满足上述条件的表。我已经制作了Rollno、cent_allot和cent_add列,但我正在努力制作examDate栏,因为每40个地区值都应该有相同的日期。
以下是地区及其发生频率的列表:
Dist Count
WGL 299
MAHB 289
KUN 249
GUN 198
KARN 196
KRS 171
CTT 169
VIZ 150
PRA 145
NALG 130
MED 128
ADI 123
KPM 119
TRI 107
ANA 107
KHAM 85
NEL 85
VIZI 84
EGOD 84
SOA 84
SIR 80
NIZA 73
PUD 70
KRK 69
WGOD 56
以下是数据帧的前25行:
Rollno cent_allot cent_add examDate batch rep_time
NL2000001 WGL NL WGL NaN NaN NaN
NL2000002 WGL NL WGL NaN NaN NaN
NL2000003 WGL NL WGL NaN NaN NaN
NL2000004 KUN NL KUN NaN NaN NaN
NL2000005 KUN NL KUN NaN NaN NaN
NL2000006 KUN NL KUN NaN NaN NaN
NL2000007 GUN NL GUN NaN NaN NaN
NL2000008 GUN NL GUN NaN NaN NaN
NL2000009 GUN NL GUN NaN NaN NaN
NL2000010 GUN NL GUN NaN NaN NaN
NL2000011 VIZ NL VIZ NaN NaN NaN
NL2000012 VIZ NL VIZ NaN NaN NaN
NL2000013 VIZ NL VIZ NaN NaN NaN
NL2000014 VIZ NL VIZ NaN NaN NaN
NL2000015 MAHB NL MAHB NaN NaN NaN
NL2000016 MAHB NL MAHB NaN NaN NaN
NL2000017 MAHB NL MAHB NaN NaN NaN
NL2000018 WGOD NL WGOD NaN NaN NaN
NL2000019 WGOD NL WGOD NaN NaN NaN
NL2000020 WGOD NL WGOD NaN NaN NaN
NL2000021 WGOD NL WGOD NaN NaN NaN
NL2000022 EGOD NL EGOD NaN NaN NaN
NL2000023 EGOD NL EGOD NaN NaN NaN
NL2000024 EGOD NL EGOD NaN NaN NaN
NL2000025 EGOD NL EGOD NaN NaN NaN
最后3列都是NaN,因为这三列尚未制作。
让我们以WGL
为例。根据以上描述,每个地区每班最多允许20名候选人,这意味着每个地区将分配40次相同的日期,每个地区需要分配20次相同的批次和相同的报告时间。
有人知道怎么做吗?
关键是使用.groupby().cumcount()
首先获取运行数。CCD_ 3和CCD_。
数据
使用每个CCD_ 5的给定总计数来生成随机行。
import numpy as np
import pandas as pd
import io
import datetime
df_count = pd.read_csv(io.StringIO("""
Dist Count
WGL 299
MAHB 289
KUN 249
GUN 198
KARN 196
KRS 171
CTT 169
VIZ 150
PRA 145
NALG 130
MED 128
ADI 123
KPM 119
TRI 107
ANA 107
KHAM 85
NEL 85
VIZI 84
EGOD 84
SOA 84
SIR 80
NIZA 73
PUD 70
KRK 69
WGOD 56
"""), sep=r"s{2,}", engine="python")
# generate random cent_allot
df = df_count.loc[np.repeat(df_count.index.values, df_count["Count"]), "Dist"]
.sample(frac=1)
.reset_index(drop=True)
.to_frame()
.rename(columns={"Dist": "cent_allot"})
df["Rollno"] = df.index.map(lambda s: f"NL2{s+1:06}")
df["cent_add"] = df["cent_allot"].map(lambda s: f"NL {s}")
df
到现在应该和你的一样。
代码
# Assign the first examDate
first_day = datetime.date(2020, 12, 1)
# running no. grouped by "cent_allot" (i.e. "Dist")
df["gp_no"] = df.groupby("cent_allot").cumcount()
# increase one day for every 40 records
df["examDate"] = df["gp_no"].apply(lambda x: first_day + datetime.timedelta(days=int(x / 40)))
# batch - can be determined by the even-ness of int(no. / 20)
df["batch"] = df["gp_no"].apply(lambda x: 1 + int(x / 20) % 2)
# map batch to time (or "9 AM" / "2 PM" as you'd like)
df["rep_time"] = df["batch"].apply(lambda x: datetime.time(9, 0) if x == 1 else datetime.time(14, 0))
输出
print(df[["Rollno", "cent_allot", "cent_add", "examDate", "batch", "rep_time"]])
Rollno cent_allot cent_add examDate batch rep_time
0 NL2000001 CTT NL CTT 2020-12-01 1 09:00:00
1 NL2000002 MAHB NL MAHB 2020-12-01 1 09:00:00
2 NL2000003 CTT NL CTT 2020-12-01 1 09:00:00
3 NL2000004 SOA NL SOA 2020-12-01 1 09:00:00
4 NL2000005 PUD NL PUD 2020-12-01 1 09:00:00
... ... ... ... ... ...
3345 NL2003346 KHAM NL KHAM 2020-12-03 1 09:00:00
3346 NL2003347 ADI NL ADI 2020-12-04 1 09:00:00
3347 NL2003348 KARN NL KARN 2020-12-05 2 14:00:00
3348 NL2003349 SIR NL SIR 2020-12-02 2 14:00:00
3349 NL2003350 ADI NL ADI 2020-12-04 1 09:00:00
[3350 rows x 6 columns]
我为获得解决方案而苦苦挣扎,但最终在那天结束时,当我问这个问题时,我找到了一个解决方案:
# examDate column
n_stud = 20 # mention the number of students per batch here
n_batch = 2 # mention the number of batches per day here
temp = data['TH_CENT_CH'].value_counts().sort_index().reset_index() # storing centers and their counts in a temp variable
cent = temp['index'].to_list() # storing centers in a list
cnt = temp['TH_CENT_CH'].to_list() # storing counts in a list
cent1 = []
cnt1 = []
j = 0
# for loops to repeat each center by count times
for c in cent:
for i in range(1, cnt[j] + 1):
cent1.append(c)
cnt1.append(i)
j += 1
df1 = pd.DataFrame(list(zip(cent1, cnt1)), columns = ['cent','cnt']) # dataframe to store the centers and new count list
counts = df1['cnt'].to_list() # storing the new counts in a list
helper = {} # helper dictionary
max_no = max(cnt)
# for-while loops to map helper number to each counts number
for i in counts:
j = 0
while(j < (round(max_no / (n_stud * n_batch)) + 1)):
if((i > (n_stud * n_batch * j)) & (i < (n_stud * n_batch * (i + 1)))):
helper[i] = j
j += 1
# mapping the helper with counts
counts = pd.Series(counts)
helper = pd.Series(helper)
hel = counts.map(helper).to_list()
df1['helper'] = hel
examDate = {} # dictionary to store exam dates
# for loop to map dates to each helper number
for i in hel:
examDate[i] = pd.to_datetime(date(2020, 12, 1) + timedelta(days = (2 * i)))
# mapping the dates with helpers
hel = pd.Series(hel)
examDate = pd.Series(examDate)
exam = hel.map(examDate).to_list()
df1['examDate'] = exam
# adding the dates to the original dataframe
examDate = df1['examDate'].to_list()
data['examDate'] = examDate
data['examDate']
这里TH_CENT_CH
是指原始数据帧中的区域列。当我运行data.head()
时,我得到了我需要的输出,即40名学生的一次约会。我对剩下的两个专栏做了类似的事情,我需要为20名学生提供相同的批次。所以我得到的输出是:
Rollno cent_allot cent_add examDate batch rep_time
0 NL2000001 ADI NL ADI 2020-12-01 1 09:00:00
1 NL2000002 ADI NL ADI 2020-12-01 1 09:00:00
2 NL2000003 ADI NL ADI 2020-12-01 1 09:00:00
3 NL2000004 ADI NL ADI 2020-12-01 1 09:00:00
4 NL2000005 ADI NL ADI 2020-12-01 1 09:00:00
... ... ... ... ... ...
3345 NL2003346 WGOD NL WGOD 2020-12-03 1 09:00:00
3346 NL2003347 WGOD NL WGOD 2020-12-04 1 09:00:00
3347 NL2003348 KRS NL KRS 2020-12-05 1 09:00:00
3348 NL2003349 WGOD NL WGOD 2020-12-02 1 09:00:00
3349 NL2003350 WGOD NL WGOD 2020-12-04 1 09:00:00
请找到其余两列的代码:
# batch column
counts = df1['cnt'].to_list() # storing the new counts in a list
helper2 = {} # helper dictionary
# for-while loops to map helper number to each counts number
for i in counts:
j = 0
while(j < (round(max_no / (n_stud)) + 1)):
if((i > (n_stud * j)) & (i < (n_stud * (i + 1)))):
helper2[i] = j
j += 1
# mapping the helper with counts
counts = pd.Series(counts)
helper2 = pd.Series(helper2)
hel2 = counts.map(helper2).to_list()
df1['helper2'] = hel2
batch = {} # dictionary to store batch numbers
# for loop to map batch numbers to each helper number
for i in hel2:
if(i % 2 == 0):
batch[i] = 1
else:
batch[i] = 2
# mapping the batches with helpers
hel2 = pd.Series(hel2)
batch = pd.Series(batch)
bat = hel2.map(batch).to_list()
df1['batch'] = bat
# adding the batches to the original dataframe
batch = df1['batch'].to_list()
data['batch'] = batch
data['batch'].unique()
# rep_time column
data.loc[data['batch'] == 1, 'rep_time'] = '9:00 AM'
data.loc[data['batch'] == 2, 'rep_time'] = '2:00 PM'
data['rep_time'].unique()