Convert text to CSV using Python



I have a txt file that looks like this:

Quod equidem non reprehendo;
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quibus natura iure responderit non esse verum aliunde finem beate vivendi, a se principia rei gerendae peti; Quae enim adhuc protulisti, popularia sunt, ego autem a te elegantiora desidero. Duo Reges: constructio interrete. Tum Lucius: Mihi vero ista valde probata sunt, quod item fratri puto. Bestiarum vero nullum iudicium puto. Nihil enim iam habes, quod ad corpus referas; Deinde prima illa, quae in congressu solemus: Quid tu, inquit, huc? Et homini, qui ceteris animantibus plurimum praestat, praecipue a natura nihil datum esse dicemus?
=========================================================================
Planet   Number   festival   animal
colour     book
Mercury  First    firecrack  phone
Venus    Last     kite       computer
Earth    Country  rangoli    tv
Jupiter  C.COD     bomb       
---------------------------------------------------------------------
11      4526      diwali      dog
holi        bigb
12      Joe       diwali      111
45      Doe       sankaranti  acer
65      UK        diwali      pan
67      22        diwali      
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Planet   Number   festival   animal
colour     book
Mercury  First    firecrack  phone
Venus    Last     kite       computer
Earth    Country  rangoli    tv
Jupiter  C.COD     bomb     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
45       5637      ganesh    tiger
holi      cinema
67       micael    holi      222
78       john      diwali    xamoi
90       france    diwali    hp
34       34        diwali

I want to convert this text file into CSV format. The output I want:

My code:

from itertools import groupby, chain

with open("file.txt", "r") as fin, open("file.csv", "w") as fout:
    for key, group in groupby(fin, key=lambda line: bool(line.strip())):
        if key:
            zipped = zip(*(line.rstrip().split() for line in group))
            fout.write(",".join(chain(*zipped)) + "\n")

This will do what you want. It is just a matter of gathering fields until we get the trigger to write them out, AND ignoring the text at the start, AND ignoring all headers except the first.

fin = open('file.txt')
fout = open('file.csv', 'w')
gather = []
skipping = True
first = True
for line in fin:
    if skipping:
        skipping = line.find('====') < 0
    elif line.find('----') >= 0:
        if gather and (first or gather[0] != 'Planet'):
            print(','.join(gather), file=fout)
        gather = []
        first = False
    else:
        gather.extend(line.strip().split())
if gather:
    print(','.join(gather), file=fout)
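The gather/trigger idea above can be sketched on a small in-memory sample (the sample lines below are invented for illustration, and the header-skipping details are left out):

```python
# Minimal sketch: accumulate fields until a '----' separator triggers a write
sample = [
    "preamble\n",
    "====\n",
    "Planet Number\n",
    "colour\n",
    "----\n",
    "11 4526\n",
    "holi\n",
]

gather, rows, skipping = [], [], True
for line in sample:
    if skipping:
        # stay in skip mode until the '====' line is seen
        skipping = line.find('====') < 0
    elif line.find('----') >= 0:
        # separator: flush the gathered fields as one CSV row
        if gather:
            rows.append(','.join(gather))
        gather = []
    else:
        gather.extend(line.strip().split())
if gather:
    rows.append(','.join(gather))

print(rows)  # ['Planet,Number,colour', '11,4526,holi']
```

Note how wrapped lines ("colour", "holi") simply extend the current row, which is what makes this approach robust to the two-line records in the file.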

The relevant blocks of the file appear to have a roughly fixed-width column structure, so you could try pandas.read_fwf on them:

from io import StringIO
from itertools import groupby

import pandas as pd

def keep(line):
    return bool(line.strip()) and not line.startswith("---")

with open('file.txt', 'r') as fin, open('file.csv', 'w') as fout:
    # skip the preamble up to and including the '===' line
    while True:
        if next(fin).startswith("==="):
            break
    first = True
    for key, group in groupby(fin, key=keep):
        if key:
            line = ",".join(
                pd.read_fwf(StringIO("".join(group)), header=None)
                .stack().sort_index(level=1).dropna().astype(str)
                .str.replace(r"^(-?\d+)\.0+$", r"\1", regex=True)
            ) + "\n"
            if first:
                header, first = line, False
                fout.write(line)
            elif line != header:
                fout.write(line)

The result in file.csv:

Planet,Mercury,Venus,Earth,Jupiter,Number,First,Last,Country,C.COD,festival,colour,firecrack,kite,rangoli,bomb,animal,book,phone,computer,tv
11,12,45,65,67,4526,Joe,Doe,UK,22,diwali,holi,diwali,sankaranti,diwali,diwali,dog,bigb,111,acer,pan
45,67,78,90,34,5637,micael,john,france,34,ganesh,holi,holi,diwali,diwali,diwali,tiger,cinema,222,xamoi,hp

If you don't care about the number formatting, you can drop the .str.replace(r"^(-?\d+)\.0+$", r"\1", regex=True).
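To see what that replace is for: read_fwf infers whole-number columns as floats, so a value like 11 comes back as "11.0" once stringified. The same pattern can be tried standalone with the re module (the sample strings below are made up):

```python
import re

# Match a whole number rendered as a float ("11.0", "-3.00") and keep only
# the integer part via the \1 backreference; anything else is left alone
pattern = re.compile(r"^(-?\d+)\.0+$")

cleaned = [pattern.sub(r"\1", s) for s in ["11.0", "4526.0", "diwali", "1.5"]]
print(cleaned)  # ['11', '4526', 'diwali', '1.5']
```

Note "1.5" is untouched: the pattern only strips a trailing run of zeros after the decimal point, so genuine fractional values survive.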

But: is this really the actual format of your file?

I believe you can convert the txt file to CSV using the pandas library:

# importing the pandas library
import pandas as pd

# reading the given file
# and creating a dataframe
dataframe1 = pd.read_csv("input_file.txt")

# storing this dataframe in a csv file
dataframe1.to_csv('output_file.csv', index=None)
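Note that pd.read_csv assumes comma-separated input by default, which the whitespace-aligned file above is not; passing sep=r"\s+" is one way to split on runs of whitespace instead. A minimal sketch on an invented three-line sample:

```python
from io import StringIO

import pandas as pd

# Invented whitespace-delimited sample; sep=r"\s+" splits on runs of spaces
txt = "Planet Number\nMercury First\nVenus Last\n"
df = pd.read_csv(StringIO(txt), sep=r"\s+")

# index=False omits the row index from the CSV output
print(df.to_csv(index=False))  # Planet,Number\nMercury,First\nVenus,Last\n
```

This only works when every record fits on one line, so it would still not handle the wrapped two-line records in the file shown in the question.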
