A txt file contains two different types of data: how can I use pandas to go through each row and append the corresponding data?



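I recently scraped data for my local gym, and I am trying to normalize the data for the "gym sign-up" object, which contains everyone signed up for a given session.

The text file looks like this: https://pastebin.com/YcnSJiA7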

Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM
JD  John Doe    
AW  Alice Wonderland    
IM  Iron Man
Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM
JD  John Doe    
AW  Alice Wonderland    
IM  Iron Man

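I have been able to use pandas to split the sign-ups into columns (initials, name), but I don't know how to detect when a row corresponds to a time slot rather than to a person who signed up.

So, after the program runs, each row should consist of the columns [initials, name, time slot].

The easiest format for me to work with would be this: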


JD  John Doe    Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM
AW  Alice Wonderland    Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM
IM  Iron Man    Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM
JD  John Doe    Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM
AW  Alice Wonderland    Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM
IM  Iron Man      Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM

I tried iterating over each line: once a time-slot line appears, I append that line to every following line until a new time slot appears.

    def testSort():
        with open("1-weak-gym.txt") as fp:
            id= []
            totalSheet=[]
            timeSlot = []
            lastLine=[]
            for ln in fp:
                if ln.startswith("Sep"): ##this is a time slot
                    timeSlot.clear()
                    timeSlot.append(ln[0:]) ##save that time slot as the lastDate variable
                else:
                    if (timeSlot):
                        totalSheet.append(timeSlot) ##append the time slot
                        totalSheet.append(ln[0:]) ##append the name line
                    else:
                        print('Hello eror')
        print(totalSheet, file=open("newOuput.txt","a"))

You can try this approach (it works if the header lines follow a strong pattern, namely a time at the end):

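    import re

    def is_time_format(s):
        # Matches a 12-hour time such as "9:00AM" or "10:00pm" at the start of s.
        time_re = re.compile(r'\b((1[0-2]|0?[1-9]):([0-5][0-9])([AaPp][Mm]))')
        return bool(time_re.match(s))

    with open("1-weak-gym.txt") as fp:
        new_lines = []
        extra_info = ''
        for line in fp:
            # A time-slot line ends with a time; a name line does not.
            last_bit = line.split(' ')[-1]
            if is_time_format(last_bit):
                extra_info = line  # remember the current time slot
                continue
            # Name line: append the current time slot, separated by a tab.
            new_lines.append(line.rstrip() + '\t' + extra_info)

    with open("newOutput", 'w') as out:
        out.writelines(new_lines)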

You will then get a file in the correct format.
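If the goal is the three columns [initials, name, time slot] rather than a text file, the same idea can be sketched as a function that returns rows directly. This is only a sketch under a few assumptions: the names `parse_signups` and `TIME_AT_END` are made up for illustration, a time-slot line is assumed to be any line that ends with a 12-hour time, and the initials are assumed to contain no spaces.

```python
import re

# Assumption: a time-slot line is any line ending in a 12-hour time, e.g. "10:00AM".
TIME_AT_END = re.compile(r'\b(1[0-2]|0?[1-9]):[0-5][0-9][AaPp][Mm]\s*$')

def parse_signups(lines):
    """Return (initials, name, time_slot) tuples from an iterable of text lines."""
    rows = []
    time_slot = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if TIME_AT_END.search(line):
            time_slot = line  # header line: remember the current time slot
            continue
        if time_slot is None:
            raise ValueError(f"name line before any time slot: {line!r}")
        # Assumption: initials come first and contain no whitespace;
        # everything after the first run of whitespace is the name.
        initials, name = line.split(None, 1)
        rows.append((initials, name, time_slot))
    return rows

# Sample lines taken from the question.
sample = [
    "Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM",
    "JD  John Doe    ",
    "AW  Alice Wonderland    ",
    "IM  Iron Man",
    "Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM",
    "JD  John Doe    ",
]

rows = parse_signups(sample)
for row in rows:
    print(row)
```

If pandas is installed, passing the result to `pd.DataFrame(rows, columns=["initials", "name", "time_slot"])` should give the table described in the question. When reading from disk, the open file object can be passed to `parse_signups` in place of the list.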
