I have a data file like this:
# coating file for detector A/R
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
14.2000 0.531000 0.0618000 0.938200
14.2000 0.532000 0.0790500 0.920950
14.2000 0.533000 0.0998900 0.900110
# it has lots of other lines
# datafile can be obtained from pastebin
The link to the input data file is: http://pastebin.com/nanbem3e
I would like to create 20 files from this input so that every file contains the comment lines, that is:
# out1.txt
# comments
first one-twentieth of the data
# out2.txt
# the same comments
second one-twentieth of the data
# and so on, up to out20.txt
How can we do this in Python?
My initial attempt is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author : Bhishan Poudel
# Date : May 23, 2016
# Imports
from __future__ import print_function
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# read in comments from the file
infile = 'filecopy_multiple.txt'
outfile = 'comments.txt'
comments = []
with open(infile, 'r') as fi, open(outfile, 'a') as fo:
    for line in fi.readlines():
        if line.startswith('#'):
            comments.append(line)
            print(line)
            fo.write(line)
#==============================================================================
# read in a file
#
infile = infile
colnames = ['angle', 'wave','trans','refl']
print('{} {} {} {}'.format('\nreading file : ', infile, '','' ))
df = pd.read_csv(infile,sep=r'\s+', header = None,skiprows = 0,
                 comment='#',names=colnames,usecols=(0,1,2,3))
print('{} {} {} {}'.format('length of df : ', len(df),'',''))
# write 20 files
df = df
nfiles = 20
nrows = int(len(df)/nfiles)
# integer division so that each group key is a chunk number (0, 1, 2, ...)
groups = df.groupby( np.arange(len(df.index)) // nrows )
for (frameno, frame) in groups:
    frame.to_csv("output_%s.csv" % frameno, index=False, header=False, sep='\t')
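So far I have twenty split files. I just want to copy the comment lines into each of them, but how do I do that?
There should be an easier way than creating another 20 output files that contain only the comments and then appending the twenty split files to them. Some useful links: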
How to split a dataframe column into multiple columns
How to split a dataframe column in Python
Splitting a large pandas dataframe
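For reference, here is a minimal stdlib-only sketch of what is being asked for: read the file once, separate the leading comment lines from the data, split the data into 20 near-equal chunks, and write each chunk with the comments prepended. It assumes all comment lines sit at the top of the file; the function names `split_header`, `chunk` and `write_parts` and the `outNN.txt` naming are hypothetical, not taken from the question.

```python
import os
from typing import List, Tuple


def split_header(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Return (leading comment lines, remaining data lines)."""
    n = 0
    while n < len(lines) and lines[n].lstrip().startswith('#'):
        n += 1
    return lines[:n], lines[n:]


def chunk(data: List[str], nfiles: int) -> List[List[str]]:
    """Split data into nfiles consecutive parts whose sizes differ by at most one.

    The first len(data) % nfiles parts get one extra line, the same rule
    numpy.array_split uses.
    """
    base, extra = divmod(len(data), nfiles)
    parts, start = [], 0
    for i in range(nfiles):
        stop = start + base + (1 if i < extra else 0)
        parts.append(data[start:stop])
        start = stop
    return parts


def write_parts(infile: str, outdir: str, nfiles: int = 20) -> List[str]:
    """Write out01.txt, out02.txt, ... into outdir; each starts with the header comments."""
    with open(infile) as f:
        comments, data = split_header(f.readlines())
    paths = []
    for i, part in enumerate(chunk(data, nfiles), start=1):
        path = os.path.join(outdir, 'out{:02d}.txt'.format(i))
        with open(path, 'w') as fo:
            fo.writelines(comments + part)  # header first, then this chunk
        paths.append(path)
    return paths
```

Usage would be something like `write_parts('filecopy_multiple.txt', '.')`. With 45 data lines, the first five files get three data rows and the remaining fifteen get two, so exactly 20 files are always produced.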
UPDATE: optimized code:
fn = r'D:\download\input.txt'

with open(fn, 'r') as f:
    data = f.readlines()

# count the comment lines at the top of the file
comments_lines = 0
for line in data:
    if line.strip().startswith('#'):
        comments_lines += 1
    else:
        break

nfiles = 20
chunk_size = (len(data)-comments_lines)//nfiles

for i in range(nfiles):
    with open('d:/temp/output_{:02d}.txt'.format(i), 'w') as f:
        # comments first, then this file's slice of the data
        f.write(''.join(data[:comments_lines] + data[comments_lines+i*chunk_size:comments_lines+(i+1)*chunk_size]))
        # the last file also gets any leftover lines
        if i == nfiles - 1 and len(data) > comments_lines+(i+1)*chunk_size:
            f.write(''.join(data[comments_lines+(i+1)*chunk_size:]))
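To see where leftover lines end up in the optimized version, its slicing arithmetic can be reproduced as a pure function. This is only an illustration; `answer_slices` is a hypothetical name, and it returns the (start, stop) indices into the full line list that each output file receives.

```python
from typing import List, Tuple


def answer_slices(n_lines: int, n_comments: int, nfiles: int = 20) -> List[Tuple[int, int]]:
    """(start, stop) indices into the full line list for each output file.

    Follows the optimized answer: every file gets chunk_size data lines, and the
    last file also gets everything after nfiles * chunk_size data lines.
    """
    chunk_size = (n_lines - n_comments) // nfiles
    bounds = []
    for i in range(nfiles):
        start = n_comments + i * chunk_size
        stop = n_comments + (i + 1) * chunk_size
        if i == nfiles - 1:
            stop = n_lines  # the extra f.write() covers the remaining lines too
        bounds.append((start, stop))
    return bounds
```

With 4 comment lines and 45 data lines, `chunk_size` is 2, so the first 19 files get 2 rows each and `output_19.txt` gets 7. With fewer than 20 data lines, `chunk_size` is 0 and every file except the last contains only the comments.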
Original answer:
comments = []
data = []
with open('input.txt', 'r') as f:
    data = f.readlines()

# collect the comment lines at the top of the file
i = 0
for line in data:
    if line.strip().startswith('#'):
        comments.append(line)
        i += 1
    else:
        break

data[:] = data[i:]

# number of data lines per output file
chunk = len(data) // 20
i = 0
for x in range(0, len(data), chunk):
    with open('output_{:02d}.txt'.format(i), 'w') as f:
        # all comments, then the next `chunk` data lines
        f.write(''.join(comments + data[x:x+chunk]))
    i += 1
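One thing to be aware of in the original answer: the loop steps through the data in strides of `len(data) // 20`, so it produces exactly 20 files only when the number of data lines is a multiple of 20. The small helper below (the name `files_created` is hypothetical) just counts how many iterations that `range()` performs.

```python
def files_created(n_data: int, nfiles: int = 20) -> int:
    """Number of output files the original answer's loop opens for n_data data lines.

    The loop is `for x in range(0, n_data, n_data // nfiles)`, so the count is the
    length of that range. range() raises ValueError when the step is 0, i.e. when
    there are fewer data lines than files.
    """
    return len(range(0, n_data, n_data // nfiles))
```

For example, 45 data lines give a stride of 2 and therefore 23 files. One way to get exactly 20 files is to give the first `len(data) % 20` files one extra line each instead of using a fixed stride.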
This should do it:
# Store comments in this to use for all files
comments = []

# Create a new sub list for each of the 20 files
data = []
for _ in range(20):
    data.append([])

# Track line number
index = 0

# open input file
with open('input.txt', 'r') as fi:
    # fetch all lines at once so I can count them.
    lines = fi.readlines()

    # Loop to gather initial comments
    line = lines[index]
    while line.split()[0] == '#':
        comments.append(line)
        index += 1
        line = lines[index]

    # Calculate how many lines of data
    numdata = len(lines) - len(comments)

    for i in range(index, len(lines)):
        # Calculate which of the 20 files I'm working with
        # (integer division, so the result can be used as a list index)
        filenum = (i - index) * 20 // numdata
        # Append line to appropriately tracked sub list
        data[filenum].append(lines[i])

for i in range(1, len(data) + 1):
    # Open output file
    with open('output{}.txt'.format(i), 'w') as fo:
        # Write comments
        for c in comments:
            fo.write(c)
        # Write data
        for line in data[i-1]:
            fo.write(line)
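The last answer assigns data line `j` (counted from the first non-comment line) to file `j * 20 // numdata`. A quick way to check that this mapping always yields 20 files whose sizes differ by at most one is to count the assignments. The helper name `bucket_sizes` is hypothetical; it only reproduces the answer's formula.

```python
from collections import Counter
from typing import List


def bucket_sizes(numdata: int, nfiles: int = 20) -> List[int]:
    """How many data lines the answer's formula puts in each of the nfiles files.

    Line j (0-based, counted after the header) goes to file j * nfiles // numdata.
    Because j < numdata, the index is always in range(nfiles).
    """
    counts = Counter(j * nfiles // numdata for j in range(numdata))
    return [counts[f] for f in range(nfiles)]
```

With 45 data lines, five files get 3 rows and fifteen get 2, with the larger files spread through the sequence rather than bunched at the start. With fewer than 20 data lines, some files contain only the comments.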