I have a CSV file that looks like this:
S1, 22, MD , 0.022, , 523.324
S2, 22, MD , 4.32, , 342.54
S3, 22, MD , 3.54, , 0.32
S4, 22, MD , 4.32, , 0.54
S1, 33, MD , 5.32, , 0.43
S2, 33, MD , 11.54, , 0.65
S3, 33, MD , 22.5, , 0.324
S4, 33, MD , 45.89 , 0.32
S1, 44, MD , 3.53 , 3.32
S2, 44, MD , 4.5 , 0.322
S3, 44, MD , 43.65 , 45.78
S4, 44, MD, 43.54 , 0.321
The file has no header row, and I don't care about the MD column.
I need my output file to look like this:
Size , S1 , S2 , S3 , S4
22 , 0.022 , 4.32 , 45.89 , 4.32
33 , 5.32, 11.54 , 22.5, 45.89,
44 , 3.53, 4.5, 43.65, 43.54
3 values, 3 values, 3 values, 3 values
As you can see, the output file contains a header row. The last row also shows the total number of values in each column.
My code so far:
import pandas as pd
import numpy as np
import csv
df = pd.read_csv(r'C:\Users\testuser\Desktop\file.csv', usecols=[0, 1, 2, 3, 4])
df.columns = pd.MultiIndex.from_tuples(zip(['Names', 'FileSize', 'x', 'y', 'z'], df.columns))  # add column headers... (this is done wrong)
df_out = df.groupby('Names', 'FileSize').count().reset_index()  # supposed to print the distinct values
df_out.to_csv('processed_data_out.csv', columns=['Names', 'FileSize', 'x', 'y', 'z'], header=False, index=False)
I have not used the last column in the output, because that column should only be generated if the user asks to see that information.
Pandas is very good at this kind of task.
Read the data:
import pandas as pd
df = pd.read_csv('data_in.csv', names=['Label','Requirements'], skiprows=1) # This assumes and skips the header row ('TSD' in your question)
>>> df
Label Requirements
0 A 1
1 A 2
2 A 3
3 A 4
4 A 5
5 B 11
6 B 22
7 B 45
8 C NaN
9 C NaN
10 C NaN
Count the requirements:
df_out = df.groupby('Label').count().reset_index()
>>> df_out
Label Requirements
0 A 5
1 B 3
2 C 0
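Note that count() only counts non-missing values, which is why C comes out as 0 above. A minimal sketch contrasting it with size(), which counts rows whether or not they hold NaN; the small frame below is made-up illustration data, not the file from the question:

```python
import numpy as np
import pandas as pd

# Illustrative data only: label C has rows but no requirement values.
df = pd.DataFrame({
    "Label": ["A", "A", "B", "C", "C"],
    "Requirements": [1, 2, 11, np.nan, np.nan],
})

# count() ignores NaN, so C ends up with 0.
non_missing = df.groupby("Label")["Requirements"].count()

# size() counts rows per group, NaN included, so C ends up with 2.
row_counts = df.groupby("Label").size()

print(non_missing.to_dict())
print(row_counts.to_dict())
```

If a label with no requirements should still be reported, count() is probably the one you want here.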
Format as needed:
df_out['Output'] = df_out.apply(lambda row: "%s doesn't have any requirement" % row['Label'] if row['Requirements'] == 0 else '%s has %d requirements' % (row['Label'], row['Requirements']), axis=1)
>>> df_out
Label Requirements Output
0 A 5 A has 5 requirements
1 B 3 B has 3 requirements
2 C 0 C doesn't have any requirement
Export to CSV:
df_out.to_csv('processed_data_out.csv', columns=['Output'], header=False, index=False)
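For the Size / S1..S4 layout described in the question, the same read, reshape, export pattern could probably use pivot rather than groupby. This is only a rough sketch: the column names (Name, Size, Kind, Value) are assumptions, since the file has no header, and the sample rows are a cleaned-up subset with exactly four fields each, which the real file may not have.

```python
import io
import pandas as pd

# A few rows in the shape of the question's file (no header row).
raw = """S1, 22, MD, 0.022
S2, 22, MD, 4.32
S1, 33, MD, 5.32
S2, 33, MD, 11.54
"""

# Column names are assumptions; the file itself has none.
df = pd.read_csv(io.StringIO(raw), header=None,
                 names=["Name", "Size", "Kind", "Value"],
                 skipinitialspace=True)

# One row per Size, one column per Name; the MD column is simply not used.
wide = df.pivot(index="Size", columns="Name", values="Value")

# Number of non-missing values in each column (the "3 values" style row).
counts = wide.count()

print(wide)
print(counts.to_dict())
```

wide.to_csv(...) would then write the header row automatically; the counts row would have to be appended separately, for example only when the user asks for it.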
I would suggest using a dictionary:
my_dict = {}
with open(your_file, 'r') as infile:
    for line in infile:
        # strip the trailing newline so keys compare cleanly
        line_list = line.strip().split(' ')
        if len(line_list) == 2:
            key, requirement = line_list
            if key in my_dict:
                my_dict[key] += 1
            else:
                my_dict[key] = 1  # first requirement seen for this key
        elif len(line_list) == 1:
            key = line_list[0]
            if key not in my_dict:
                my_dict[key] = 0
Then write the dictionary my_dict to another CSV file...
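One way the write step might look, using the standard csv module. The output file name and the sample counts are placeholders standing in for whatever my_dict ends up holding:

```python
import csv

# Hypothetical counts, standing in for the my_dict built by the loop.
my_dict = {"A": 5, "B": 3, "C": 0}

# newline="" lets the csv module control line endings itself.
with open("counts_out.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    for key, count in my_dict.items():
        writer.writerow([key, count])
```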
Edit: this assumes you have a space-delimited file, but you can change the delimiter in line.split(' ') to whatever delimiter you need...
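To make the delimiter easy to swap, the counting loop could be wrapped in a small function. count_requirements is a hypothetical helper name, and the sample lines are invented for illustration:

```python
def count_requirements(lines, delimiter=" "):
    """Count non-empty requirement fields per key (hypothetical helper)."""
    counts = {}
    for line in lines:
        fields = line.strip().split(delimiter)
        key = fields[0].strip()
        if not key:
            continue  # skip blank lines
        counts.setdefault(key, 0)
        # A second, non-empty field counts as one requirement.
        if len(fields) > 1 and fields[1].strip():
            counts[key] += 1
    return counts


# Comma-delimited sample; "C," and "C" both carry no requirement.
sample = ["A,1", "A,2", "B,11", "C,", "C"]
result = count_requirements(sample, delimiter=",")
print(result)
```

An open file object can be passed as lines directly, since iterating over it yields one line at a time.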