I'm new to Python and want to write a script that takes a .txt file as input and writes its results to a .csv file.
The .txt file looks like this:
text:eub1
region:euboea
μενανδρεσεμεεποισε
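I want to write a script that creates a new row for each instance of μ or ν in the third line above. I also want every row to include the text and region identifiers. So the result should look like this: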
text,region,letter
eub1,euboea,μ
eub1,euboea,ν
eub1,euboea,μ
I really don't know where to start coding this, so I'd be grateful for any advice on how to do it.
Try:
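import pandas as pd

data = {}
# The file contains Greek letters; this assumes it is saved as UTF-8.
with open("your_file.txt", "r", encoding="utf-8") as f_in:
    for line in map(str.strip, f_in):
        if line == "":
            continue  # skip blank lines
        if line.startswith("text:"):
            data["text"] = line.split(":", maxsplit=1)[-1]
        elif line.startswith("region:"):
            data["region"] = line.split(":", maxsplit=1)[-1]
        else:
            # keep only the letters of interest, one item per occurrence
            data["letter"] = [ch for ch in line if ch in "μν"]

# the scalar text/region values are repeated for every letter
df = pd.DataFrame(data)
print(df)
df.to_csv("data.csv", index=False)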
Prints:
   text  region letter
0  eub1  euboea      μ
1  eub1  euboea      ν
2  eub1  euboea      ν
3  eub1  euboea      μ
and saves data.csv:
text,region,letter
eub1,euboea,μ
eub1,euboea,ν
eub1,euboea,ν
eub1,euboea,μ
Contents of your_file.txt:
text:eub1
region:euboea
μενανδρεσεμεεποισε
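If pandas is not otherwise needed, the same idea can be sketched with only the standard library's csv module. This is a minimal sketch rather than a drop-in replacement: the helper name extract_rows and its letters parameter are made up for illustration, and an in-memory buffer stands in for the real output file.

```python
import csv
import io


def extract_rows(lines, letters="μν"):
    """Return one [text, region, letter] row per matching character.

    Hypothetical helper: `lines` is any iterable of strings, for example
    an open file object.
    """
    text = region = ""
    rows = []
    for line in map(str.strip, lines):
        if not line:
            continue  # skip blank lines
        if line.startswith("text:"):
            text = line.split(":", maxsplit=1)[-1]
        elif line.startswith("region:"):
            region = line.split(":", maxsplit=1)[-1]
        else:
            # one output row per occurrence of a wanted letter
            rows.extend([text, region, ch] for ch in line if ch in letters)
    return rows


sample = ["text:eub1", "region:euboea", "μενανδρεσεμεεποισε"]
rows = extract_rows(sample)

# The buffer stands in for:
# open("data.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text", "region", "letter"])
writer.writerows(rows)
print(buffer.getvalue())
```

Unlike the pandas version, a line with no matching letters simply contributes no rows here.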
EDIT: To load from a file like this:
text:eub1
region:euboea
μενανδρεσεμεεποισε
text:eub2
region:xxx
μμμ
text:eub3
region:zzz
abc
you can try:
import pandas as pd

data = {}
# The file contains Greek letters; this assumes it is saved as UTF-8.
with open("your_file.txt", "r", encoding="utf-8") as f_in:
    for line in map(str.strip, f_in):
        if line == "":
            continue  # skip blank lines
        if line.startswith("text:"):
            data.setdefault("text", []).append(line.split(":", maxsplit=1)[-1])
        elif line.startswith("region:"):
            data.setdefault("region", []).append(
                line.split(":", maxsplit=1)[-1]
            )
        else:
            # one list of matching letters per record
            data.setdefault("letter", []).append(
                [ch for ch in line if ch in "μν"]
            )

# explode() turns each list of letters into one row per letter
df = pd.DataFrame(data).explode("letter")
print(df)
df.to_csv("data.csv", index=False)
Prints:
   text  region letter
0  eub1  euboea      μ
0  eub1  euboea      ν
0  eub1  euboea      ν
0  eub1  euboea      μ
1  eub2     xxx      μ
1  eub2     xxx      μ
1  eub2     xxx      μ
2  eub3     zzz    NaN
and saves data.csv:
text,region,letter
eub1,euboea,μ
eub1,euboea,ν
eub1,euboea,ν
eub1,euboea,μ
eub2,xxx,μ
eub2,xxx,μ
eub2,xxx,μ
eub3,zzz,
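Note that the eub3 record has no μ or ν, so explode() leaves a NaN there and the CSV gets a row with an empty letter field. If such rows are unwanted, one possible approach (a sketch, assuming the same column names as above and using inline data in place of the parsed file) is to drop them before saving:

```python
import pandas as pd

# Inline stand-in for the dictionary built while parsing the file.
data = {
    "text": ["eub1", "eub2", "eub3"],
    "region": ["euboea", "xxx", "zzz"],
    "letter": [["μ", "ν", "ν", "μ"], ["μ", "μ", "μ"], []],
}

# An empty list explodes to a single NaN row for that record.
df = pd.DataFrame(data).explode("letter")

# Drop records without any matching letter and renumber the rows.
cleaned = df.dropna(subset=["letter"]).reset_index(drop=True)
print(cleaned)
```

Calling cleaned.to_csv("data.csv", index=False) would then write only the rows that carry a letter.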