Python script to extract part of a txt file into an Excel worksheet



I have a txt file that looks like this:


category test_1
aaa.com; test info - tw
bbb.com; test info - al
category test_2
ccc.com; test info - al
ddd.com; test info - tw
eee.com; test info - tw
category test_3
fff.com; test info - tw
ggg.com; test info - al
hhh.com; test info - tw
iii.com; test info - al

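I need help writing a Python script that extracts one part of the txt file and exports it to an Excel file. For example, if I want to export the entries in category "test_1", the script should produce the following output in the Excel file: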


   |    A    |       B        |  C  |
---+---------+----------------+-----+
 1 | aaa.com | test info - tw |     |
 2 | bbb.com | test info - al |     |
 3 |         |                |     |
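A minimal sketch of the extraction itself, assuming every section starts with a line of the form `category <name>` and every entry separates its columns with `;`. The function name `extract_category` is hypothetical; it only parses the text and returns one list per entry, so writing those rows to Excel (first entry in row 1) is left to whichever Excel library you use.

```python
def extract_category(text: str, category: str) -> list[list[str]]:
    """Return the entries under 'category <category>' as lists of column values.

    Assumes each section starts with a line 'category <name>' and each
    entry line separates its columns with ';'.
    """
    rows: list[list[str]] = []
    in_section = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("category "):
            # A header line starts a new section; only collect the requested one
            in_section = line.split(maxsplit=1)[1] == category
            continue
        if in_section:
            # 'aaa.com; test info - tw' -> ['aaa.com', 'test info - tw']
            rows.append([part.strip() for part in line.split(";")])
    return rows


sample = """category test_1
aaa.com; test info - tw
bbb.com; test info - al
category test_2
ccc.com; test info - al
"""
print(extract_category(sample, "test_1"))
# [['aaa.com', 'test info - tw'], ['bbb.com', 'test info - al']]
```

With a real file, something like `with open(path) as f: rows = extract_category(f.read(), "test_1")` would feed it the whole text. Scanning line by line (rather than splitting on the substring "category") means an entry that happens to contain that word inside a line is not mistaken for a header.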

I tried the following code (my txt file is saved on the desktop as autotest.txt):


import pandas as pd
df = pd.read_csv('C:\Users\A12345\Desktop\autotest.txt')
df.to_excel('output.xlsx', 'Sheet1')

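When I run this code, it doesn't create an Excel file. I also tried putting an Excel file named "output.xlsx" on the desktop, but running the script didn't add the text to that file either.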
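Two things in an attempt like this are worth checking, offered as likely culprits rather than a confirmed diagnosis. First, if the path is written as a normal (non-raw) string literal such as `'C:\Users\A12345\Desktop\autotest.txt'`, Python 3 reads `\U` as the start of an 8-hex-digit escape, so the line does not even compile; a raw string `r'...'` or forward slashes avoid that. Second, `'output.xlsx'` is a relative path, so the file is created in the current working directory, which is not necessarily the Desktop. (Separately, `read_csv` splits on commas by default, and these lines contain none, so each line would land in a single column.) The stdlib-only sketch below demonstrates the first two points; `compiles` is a hypothetical helper name.

```python
import os


def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# The outer strings are raw, so the code being compiled contains single backslashes.
# In a normal string, '\U' must be followed by 8 hex digits, so this fails to compile:
bad = r"path = 'C:\Users\A12345\Desktop\autotest.txt'"
# A raw string keeps the backslashes literally and compiles fine:
good = r"path = r'C:\Users\A12345\Desktop\autotest.txt'"

print(compiles(bad))   # False
print(compiles(good))  # True

# A relative file name is resolved against the current working directory,
# which is where a call like to_excel('output.xlsx', ...) would create the file
print(os.path.abspath("output.xlsx"))
```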

You can convert this custom format to CSV by using "category" as the keyword that separates the sections:


# Read the whole file, then split it on the keyword 'category'
with open("text_file.txt", 'r') as file:
    data = file.read()
categories = data.split('category')  # one approach, if a 'category' string is present
dict_format = {}
for categor_data in categories:
    if not categor_data.strip():
        continue  # skip the empty chunk before the first 'category'
    items = categor_data.split('\n')  # split into lines
    # the first line is the category name; remove its spaces (" test_1" -> "test_1")
    dict_format[items[0].replace(" ", "")] = items[1:]

for name in dict_format:
    print(name)
print("which category to export to.csv format?")
answer = input()
with open(answer + ".csv", 'w') as csv_file:
    for row in dict_format[answer][:-1]:
        if row != "":  # skip empty lines
            csv_file.write(row.replace(";", ",") + "\n")
    csv_file.write(dict_format[answer][-1].replace(";", ","))
# The with block closes the file. Now you should be able to convert that csv file to xlsx using pandas

Console output:

>>>run.py
test_1
test_2
test_3
which category to export to.csv format?
test_1
>>> 

The resulting test_1.csv file contains the following text:


aaa.com, test info - tw
bbb.com, test info - al
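The answer leaves the last step (CSV to .xlsx) to pandas. A minimal sketch of that step, assuming pandas plus an Excel writer engine (openpyxl or XlsxWriter) are installed; `csv_to_xlsx` is a hypothetical helper name. `header=None` stops pandas from treating the first entry as column names, and `skipinitialspace=True` drops the space after each comma in the file shown above.

```python
import pandas as pd


def csv_to_xlsx(csv_path: str, xlsx_path: str) -> pd.DataFrame:
    """Copy a header-less CSV file into the first sheet of an .xlsx workbook."""
    # header=None: the file has no header row; skipinitialspace drops the ', ' padding
    df = pd.read_csv(csv_path, header=None, skipinitialspace=True)
    # index=False and header=False keep the sheet to just the data, starting at A1
    df.to_excel(xlsx_path, sheet_name="Sheet1", index=False, header=False)
    return df
```

For example, `csv_to_xlsx("test_1.csv", "test_1.xlsx")` should put aaa.com in A1 and test info - tw in B1, matching the layout in the question.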

I used the XlsxWriter module, which you can install with pip3 install XlsxWriter. The code I wrote works as expected:

import xlsxwriter

# This is used to filter. The code expects the category number, such as 1, 2 or 3
num = input('Give me category number: ')
# you can do checks here if the input should be something different
num = int(num)
start_portion_line = 'category test_{}'.format(num)
end_portion_line = 'category test_{}'.format(num + 1)
start_index = None
end_index = None
with open('path/to/your/txt/file', 'r') as f:
    lines = f.readlines()
# find the indexes that delimit the wanted portion
for i, line in enumerate(lines):
    if line.strip() == start_portion_line:
        start_index = i + 1  # the first entry is the line after the header
    elif line.strip() == end_portion_line:
        end_index = i  # slice end is exclusive, so this stops just before the next header
if start_index is None:
    raise SystemExit('Category {} not found'.format(num))
if end_index is None:
    end_index = len(lines)  # last category: read to the end of the file
# getting only the wanted lines, without blank ones
lines = [line.strip() for line in lines[start_index:end_index] if line.strip()]
workbook = xlsxwriter.Workbook('output.xlsx')
worksheet = workbook.add_worksheet()
for i, line in enumerate(lines):
    # separating tokens and removing the space after ';'
    columns = [col.strip() for col in line.split(';')]
    # writing: row i, column j (both zero-based, so row 0 is Excel row 1)
    for j, col in enumerate(columns):
        worksheet.write(i, j, col)
workbook.close()
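`worksheet.write(row, col, value)` in XlsxWriter takes zero-based indices, so `write(0, 0, ...)` fills cell A1, which is why the first entry lands in row 1 of the expected output. The hypothetical helper below spells out that conversion in plain Python; XlsxWriter ships its own version as `xlsxwriter.utility.xl_rowcol_to_cell`.

```python
def rowcol_to_cell(row: int, col: int) -> str:
    """Convert zero-based (row, col) indices to an A1-style cell reference."""
    letters = ""
    col_number = col + 1  # Excel column numbers are 1-based: A=1, ..., Z=26, AA=27
    while col_number > 0:
        col_number, remainder = divmod(col_number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return f"{letters}{row + 1}"


print(rowcol_to_cell(0, 0))   # A1
print(rowcol_to_cell(1, 1))   # B2
print(rowcol_to_cell(0, 26))  # AA1
```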
