A txt file contains two different types of lines: how can I use pandas to go through each line and append the corresponding data?

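I recently collected data for my local gym, and I am trying to normalize the "gym sign-up" objects, each of which contains everyone who signed up for a given session.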

The text file looks like this: https://pastebin.com/YcnSJiA7

Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM
JD  John Doe    
AW  Alice Wonderland    
IM  Iron Man
Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM
JD  John Doe    
AW  Alice Wonderland    
IM  Iron Man

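I have been able to use pandas to split the sign-ups into columns (initials, name), but I don't know how to detect when a line corresponds to a time slot rather than to a person who signed up.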

So, after the program runs, each row should consist of the columns [initials, name, time slot].

For me, the easiest format to work with would be this:

JD  John Doe    Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM
AW  Alice Wonderland    Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM
IM  Iron Man    Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM
JD  John Doe    Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM
AW  Alice Wonderland    Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM
IM  Iron Man      Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM

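I tried iterating over every line: once a time-slot line appears, I append that line to each of the following lines until a new time slot appears.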

def testSort():
    with open("1-weak-gym.txt") as fp:
        id = []
        totalSheet = []
        timeSlot = []
        lastLine = []
        for ln in fp:
            if ln.startswith("Sep"):  ## this is a time slot
                timeSlot.clear()
                timeSlot.append(ln[0:])  ## save that time slot as the lastDate variable
            else:
                if (timeSlot):
                    totalSheet.append(timeSlot)  ## append the time slot
                    totalSheet.append(ln[0:])  ## append the name line
                else:
                    print('Hello eror')
        print(totalSheet, file=open("newOuput.txt","a"))

You could try this approach (it works if the header lines follow a strong pattern, namely that they end with a time):

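import re

# A 12-hour clock time such as 9:00AM or 10:00pm, compiled once.
TIME_RE = re.compile(r'\b((1[0-2]|0?[1-9]):([0-5][0-9])([AaPp][Mm]))')

def is_time_format(s):
    return bool(TIME_RE.match(s))

with open("1-weak-gym.txt") as fp:
    new_lines = []
    extra_info = ''
    for line in fp:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        # A time-slot (header) line is one whose last token is a time.
        if is_time_format(tokens[-1]):
            extra_info = line.strip()
            continue
        # Otherwise it is a sign-up line: append the current time slot to it.
        new_lines.append(line.rstrip() + '\t' + extra_info + '\n')

with open("newOutput", 'w') as out:
    out.writelines(new_lines)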

You will then get a file in the format you want, with the time slot appended to each sign-up line after a tab.
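Since the question asks about pandas specifically, here is a minimal sketch of the same idea with a forward fill. It relies on the same assumption as above (a time-slot line ends with a 12-hour time such as 10:00AM) and on every sign-up line having initials followed by a name. The function name signups_to_frame and the column names initials, name and time_slot are made up for illustration, and the sample text is inlined through io.StringIO so that the snippet runs without the real file.

```python
import io

import pandas as pd

# Assumption: a time-slot line ends with a 12-hour time such as 9:00AM.
TIME_AT_END = r'\b(?:1[0-2]|0?[1-9]):[0-5][0-9][AaPp][Mm]\s*$'


def signups_to_frame(lines):
    """Return one row per sign-up with columns initials, name, time_slot."""
    # Drop blank lines and surrounding whitespace.
    raw = pd.Series([ln.strip() for ln in lines if ln.strip()], dtype="object")
    is_slot = raw.str.contains(TIME_AT_END)
    # Keep the text only on time-slot rows, then carry it forward
    # onto the sign-up rows that follow each time slot.
    time_slot = raw.where(is_slot).ffill()
    people = raw[~is_slot]
    # The first token is the initials; the rest of the line is the name.
    parts = people.str.split(n=1, expand=True)
    return pd.DataFrame({
        "initials": parts[0],
        "name": parts[1],
        "time_slot": time_slot[~is_slot],
    }).reset_index(drop=True)


sample = io.StringIO(
    "Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM\n"
    "JD  John Doe    \n"
    "AW  Alice Wonderland    \n"
    "IM  Iron Man\n"
    "Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM\n"
    "JD  John Doe    \n"
    "AW  Alice Wonderland    \n"
    "IM  Iron Man\n"
)
df = signups_to_frame(sample)
print(df)
```

To run it on the real data, pass the open file instead of the sample, for example signups_to_frame(open("1-weak-gym.txt")). Any sign-up line that appears before the first time slot ends up with a missing time_slot, so you may want to drop those rows with df.dropna(subset=["time_slot"]).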
