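I have about 3,000 text files in a folder. I want to iterate over each one, get its file name, copy the first two lines, transpose them, and then stack each result under the previous one. How can I do that?
The fields in one file look like this.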
IDRSSD RIAD0497 RIAD4042 RIAD4136 RIAD4141 RIAD4146 RIAD4461
ADVERTISING & MARKETING EXPENSES RENT & OTHER INCOME FR OTHR REAL EST DIRECTORS FEES LEGAL FEES & EXPENSES FDIC DEPOSIT INSURANCE ASSESSMENTS 1ST ITEMIZED AMT OV25% OF ITEM 4078
I want to turn it into this.
file code field
C:\Users\ryans\Downloads\FFIEC CDR Call Schedule RIE 03312001.txt IDRSSD
C:\Users\ryans\Downloads\FFIEC CDR Call Schedule RIE 03312001.txt RIAD0497 ADVERTISING & MARKETING EXPENSES
C:\Users\ryans\Downloads\FFIEC CDR Call Schedule RIE 03312001.txt RIAD4042 RENT & OTHER INCOME FR OTHR REAL EST
C:\Users\ryans\Downloads\FFIEC CDR Call Schedule RIE 03312001.txt RIAD4136 DIRECTORS FEES
C:\Users\ryans\Downloads\FFIEC CDR Call Schedule RIE 03312001.txt RIAD4141 LEGAL FEES & EXPENSES
C:\Users\ryans\Downloads\FFIEC CDR Call Schedule RIE 03312001.txt RIAD4146 FDIC DEPOSIT INSURANCE ASSESSMENTS
C:\Users\ryans\Downloads\FFIEC CDR Call Schedule RIE 03312001.txt RIAD4461 1ST ITEMIZED AMT OV25% OF ITEM 4078
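I have some sample code that copies the first two lines from each file, but it does not transpose them. I think the final version of the code should look something like this...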
### mapping table for regulatory line items
import pandas as pd
import csv
import glob
import os
# Use a list here rather than a dataframe
results = []
# Raw string so the backslashes in the Windows path are not read as escapes
filelist = glob.glob(r"C:\Users\ryans\Downloads\*.txt")
number_of_lines = 2
for filename in filelist:
    with open(filename) as myfile:
        lines = myfile.readlines()  # you can add strip() or other methods here
        file_lines = []
        print(file_lines)
        for line in lines[:2]:
            df = pd.DataFrame(lines[:2])
            transposed = df.T
            file_lines.append(transposed)
        results.append([filename, *file_lines])
# You can build a dataframe from that list at the end if you desire
results_df = pd.DataFrame.from_records(results, columns=['filename', 'file_lines_1', 'file_lines_2'])
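But something is off here. It seems to produce a bunch of empty lists, and I'm not sure what is going on. Any ideas on how to get the result I want? Thanks.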
for filename in filelist:
    with open(filename) as myfile:
        lines = myfile.readlines()  # you can add strip() or other methods here
        file_lines = []
        for line in lines[:2]:
            file_lines.append(line)
        results.append([filename, *file_lines])
number_of_lines = 2 <- this variable is not needed
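For the transposed layout asked for above (one row per file/code/field), here is a rough sketch of one way it could be done. It assumes the files are tab-delimited, which the sample does not actually confirm, so change `sep` if yours differ. The names `header_rows` and `collect_headers` are made up for illustration. The idea is to read only the first two lines, split each on the delimiter, and pair the codes with their descriptions using `zip_longest`.

```python
import glob
from itertools import islice, zip_longest


def header_rows(path, sep="\t"):
    """Return (file, code, field) tuples from the first two lines of one file."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        # islice stops after two lines, so the rest of the file is never read
        first_two = [line.rstrip("\r\n") for line in islice(fh, 2)]
    # Pad so that files with fewer than two lines do not raise an IndexError
    first_two += [""] * (2 - len(first_two))
    codes = first_two[0].split(sep)    # assumption: tab-delimited columns
    fields = first_two[1].split(sep)
    # Pairing code i with description i is the "transpose" step
    return [
        (path, code, field)
        for code, field in zip_longest(codes, fields, fillvalue="")
        if code or field  # skip pairs where both sides are empty
    ]


def collect_headers(pattern, sep="\t"):
    """Stack the rows of every file matching the glob pattern into one list."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        rows.extend(header_rows(path, sep))
    return rows
```

Calling `collect_headers(r"C:\Users\ryans\Downloads\*.txt")` should then give one flat list, and `pd.DataFrame(rows, columns=["file", "code", "field"])` turns it into the table. Building a plain list first and creating the DataFrame once at the end tends to be much faster than appending to a DataFrame inside the loop.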