Going through a DataFrame line by line



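I have a text file that contains multiple tables, and I'd like to know a good way to read each table into its own DataFrame. Here is an example of the text file: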

Employee Table:
Name Description Type
Bob  Employee    Standard
Jim  Employee    Standard
james Employee   Standard

Tools:
Item    Serial  Tag
Battery  0101    B.
Drill    9292    D.
Phone    8464    P.
Locations:
Station code len
West     12   9
North     1   9
East     21   9

I initially tried slicing by index:

instance_of_employee=df.loc[df['x'].str.contains("Employee table", case=True, na=False)]
employees=df.loc[instance_of_employee.index[0]:Instance_of_tools.index[0]-1 ]

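But I found that the tables in these files can appear in any order. The names, however, are always the same: "Employee Table", "Tools", and "Locations".

Is it possible to read the DataFrame line by line and, wherever one of these headings occurs, start a new DataFrame?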

This comes very close to your idea of "reading the file line by line and creating a new DataFrame for each section":

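import pandas as pd
from io import StringIO


def parse_file(path):
    data = {}
    section, content = None, ""

    def flush():
        # Parse the lines collected so far into a DataFrame for the current section.
        if section and content.strip():
            data[section] = pd.read_csv(StringIO(content), sep=r"\s+")

    with open(path) as fp:
        for line in fp:
            stripped = line.strip()
            if stripped.endswith(":"):
                # A heading such as "Tools:" closes the previous section and starts a new one.
                flush()
                section, content = stripped[:-1], ""
            elif stripped:
                # Blank lines are skipped; everything else belongs to the current section.
                content += line
    flush()  # the last section has no heading after it

    return data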

The function returns a dictionary whose keys are the section names and whose values are the DataFrames for those sections:

data = parse_file("data.txt")
data["Employee Table"] # returns the Employee Table section
data["Tools"]          # returns the Tools section
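Two details from the question are worth checking on a small in-memory sample: the sections can come in any order, and a serial such as 0101 loses its leading zero if pandas infers a numeric column. The following is only a sketch under those assumptions. `parse_lines` and `sample` are hypothetical names that do not appear in the answer above, and `dtype=str` is an optional choice, not something the original answer uses.

```python
from io import StringIO

import pandas as pd


def parse_lines(lines, **read_csv_kwargs):
    """Group lines under headings ending in ':' and parse each group into a DataFrame."""
    blocks = {}
    section = None
    for line in lines:
        stripped = line.strip()
        if stripped.endswith(":"):
            # A heading starts a new section, whether or not a blank line came before it.
            section = stripped[:-1]
            blocks[section] = []
        elif stripped and section is not None:
            blocks[section].append(stripped)
    return {
        name: pd.read_csv(StringIO("\n".join(rows)), sep=r"\s+", **read_csv_kwargs)
        for name, rows in blocks.items()
    }


# Hypothetical sample: same tables as in the question, shortened and reordered.
sample = """\
Locations:
Station code len
West     12   9

Tools:
Item    Serial  Tag
Battery  0101    B.
Drill    9292    D.
Employee Table:
Name Description Type
Bob  Employee    Standard
"""

# dtype=str keeps every column as text, so "0101" is not turned into 101.
frames = parse_lines(sample.splitlines(), dtype=str)
print(frames["Tools"]["Serial"].tolist())  # ['0101', '9292']
```

Because the lookup is by section name, the order of the tables in the file does not matter. Drop `dtype=str` if you would rather have pandas infer numeric columns.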

https://regex101.com/r/WELf2x/1

import re
import pandas as pd
from io import StringIO

path = "data.txt"  # path to the text file
data = []
with open(path) as fp:
    s = fp.read()

# A section is a name, a colon, then everything up to the last line end before the next colon.
regex = r"(?P<name>\w[^:]+):(?P<df>[^:]+)$"
for match in re.finditer(regex, s, re.MULTILINE):
    name = match['name']
    df = pd.read_csv(StringIO(match['df']), sep=r"\s+")
    data.append({name: df})
print(data)

Output:
[{'Employee Table':     Name Description      Type
0    Bob    Employee  Standard
1    Jim    Employee  Standard
2  james    Employee  Standard},
{'Tools':       Item  Serial Tag
0  Battery     101  B.
1    Drill    9292  D.
2    Phone    8464  P.},
{'Locations':   Station  code  len
0    West    12    9
1   North     1    9
2    East    21    9}]
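A list of single-key dictionaries is awkward to index by table name. As a rough sketch, the same regex can fill one dictionary keyed by section name instead. `SECTION_RE`, `parse_text`, and `sample` are hypothetical names introduced here for illustration. Since the pattern relies on `[^:]+`, it assumes no table cell contains a colon.

```python
import re
from io import StringIO

import pandas as pd

# Same pattern as in the regex answer: a name, a colon, then the table body.
SECTION_RE = re.compile(r"(?P<name>\w[^:]+):(?P<df>[^:]+)$", re.MULTILINE)


def parse_text(text):
    """Return {section name: DataFrame} for every 'Name:' section found in text."""
    return {
        m["name"]: pd.read_csv(StringIO(m["df"]), sep=r"\s+")
        for m in SECTION_RE.finditer(text)
    }


# The sample from the question, including the missing blank line before "Locations:".
sample = """\
Employee Table:
Name Description Type
Bob  Employee    Standard
Jim  Employee    Standard
james Employee   Standard

Tools:
Item    Serial  Tag
Battery  0101    B.
Drill    9292    D.
Phone    8464    P.
Locations:
Station code len
West     12   9
North     1   9
East     21   9
"""

tables = parse_text(sample)
print(list(tables))         # section names in file order
print(tables["Locations"])  # look up a table directly by name
```

With this shape, `tables["Tools"]` works regardless of where the Tools section sits in the file.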
