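I am trying to convert the NULL values in a .csv to NaN and then save the file with these edits. In the code below, the NaN values in f are in the correct places in the data. However, I can't save it as a .csv; the error is printed below the code.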
#take .csv with NULL and replaces with NaN - write numerical and NaN values to .csv
import csv
import numpy as np
import pandas
f = pandas.read_csv(r'C:\Users\mmso2\Google Drive\MABL Wind\_Semester 2 2016\Wind Farm Info\DataB\DataB - Copy.csv')#convert file to variable so it can be edited
outfile = open(r'C:\Users\mmso2\Google Drive\MABL Wind\_Semester 2 2016\Wind Farm Info\DataB\DataB - NaN1.csv','wb')#create empty file to write to
writer = csv.writer(outfile)#writer will write when given the data to write below
result = f[f is 'NULL'] = np.nan
writer.writerows(f)
Error:
Traceback (most recent call last):
  File "C:/Users/mmso2/Google Drive/MABL Wind/_Semester 2 2016/_PGR Training/CENTA/MATLAB/In class ex/SAR_data/gg_nan.py", line 12, in <module>
    writer.writerows(f)
_csv.Error: sequence expected
csv.writer.writerows() expects a sequence of sequences (a sequence of row objects), but a pandas.DataFrame is not one: iterating over it yields its column labels, not its rows:
In [23]: df = pd.DataFrame({'A': range(10)})
In [24]: for x in df:
    ....:     print(x)
    ....:
A
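This can bite you silently: a string is itself a sequence, so a sequence of strings is a sequence of sequences, and you end up with a CSV file whose rows consist of the letters of the column names. In your case it failed because your attempt to replace the 'NULL' strings ended up adding a column labelled False (a boolean), and a boolean is not a sequence.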
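A small reproduction of both failure modes, as a sketch: it assumes Python 3 and writes to an in-memory io.StringIO buffer instead of a file, and the column label 'AB' is made up. The exact wording of the error message differs between Python versions.

```python
import csv
import io

import pandas as pd

# Iterating a DataFrame yields its column labels, here just 'AB'.
df = pd.DataFrame({'AB': [1, 2]})
print(list(df))

# writerows() treats the label 'AB' as one row made of the characters 'A' and 'B'.
buf = io.StringIO()
csv.writer(buf).writerows(df)
print(repr(buf.getvalue()))

# f[f is 'NULL'] = ... boils down to f[False] = ..., which adds a column labelled False.
df[False] = float('nan')
try:
    csv.writer(io.StringIO()).writerows(df)  # 'AB' is written, then False fails
except csv.Error as exc:
    print('csv.Error:', exc)
```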
To iterate over the rows as tuples, you can use DataFrame.itertuples():
In [27]: for x in df.itertuples(index=False):
    ....:     print(x)
    ....:
(0,)
(1,)
(2,)
...
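If you do want to stay with csv.writer, a minimal sketch is to write the header yourself and then pass the row tuples from itertuples() to writerows(). The small frame and the in-memory buffer are illustrative assumptions; note that csv.writer writes a NaN float as the text nan.

```python
import csv
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1], 'B': ['x', np.nan]})

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(df.columns)                   # header row from the column labels
writer.writerows(df.itertuples(index=False))  # one tuple per row, index omitted
print(buf.getvalue())
```

For a real file on Python 3, the csv module documentation recommends opening it in text mode with newline='', for example open(filename, 'w', newline=''), rather than the 'wb' mode used in the question.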
The simplest approach is to use DataFrame.to_csv():
filename = r'C:\Users\mmso2\Google Drive\MABL Wind\_Semester 2 2016\Wind Farm Info\DataB\DataB - NaN1.csv'
f.to_csv(filename, na_rep='NaN') # the default representation for NaN is ''
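A self-contained round trip through in-memory buffers, as a sketch with made-up column names (speed, direction). index=False is an addition here; it stops to_csv from writing the row index as an extra first column. Also note that read_csv already treats the string 'NULL' as missing by default, which is probably why the NaN values in f were already in the right places.

```python
import io

import pandas as pd

raw = io.StringIO("speed,direction\n5.1,NULL\nNULL,270\n")

# 'NULL' is in read_csv's default list of missing-value markers, so these cells become NaN.
df = pd.read_csv(raw)

out = io.StringIO()
# na_rep sets how NaN is written; index=False drops the row-index column.
df.to_csv(out, na_rep='NaN', index=False)
print(out.getvalue())
```

Because a column holding NaN is stored as float, 270 is written back as 270.0.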
Note that to replace the 'NULL' values you must use the equality operator == rather than the identity operator is:
f[f == 'NULL'] = np.nan
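A minimal sketch of the equality-based mask on a toy frame (the data is made up): df == 'NULL' compares element by element and returns a boolean DataFrame, and assigning through that mask changes only the matching cells.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['NULL', 1] * 3})

mask = df == 'NULL'  # element-wise comparison, one boolean per cell
df[mask] = np.nan    # only the cells where mask is True are set to NaN
print(df)
```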
Using the identity operator effectively adds a new column labelled False, with all values set to NaN:
In [42]: df = pd.DataFrame({'A': ['NULL', 1] * 10})
In [43]: df[df is 'NULL'] = float('nan')
In [44]: df
Out[44]:
      A  False
0  NULL    NaN
1     1    NaN
2  NULL    NaN
3     1    NaN
...
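This happens because f is 'NULL' evaluates to False rather than to a new DataFrame, so the assignment becomes f[False] = np.nan.

I'm only just learning Python, but you can replace the null values while reading the file, like this: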
file = pd.read_csv('filename.csv', na_values=['NULL'])
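'NULL' is already in pandas' default list of missing-value markers, so na_values=['NULL'] matters most when combined with keep_default_na=False, which makes your list the only one used. A sketch with made-up data, where the string NA would otherwise also become NaN:

```python
import io

import pandas as pd

data = "site,status\nA,NULL\nB,NA\n"

# Default behaviour: both 'NULL' and 'NA' are read as missing.
default = pd.read_csv(io.StringIO(data))

# Only the listed marker counts; 'NA' stays a literal string.
only_null = pd.read_csv(io.StringIO(data), na_values=['NULL'],
                        keep_default_na=False)

print(default['status'].tolist())
print(only_null['status'].tolist())
```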
You can even create a dictionary of sentinel values for each column, for example:
sentinels = {'column1':['Na', 'empty field'], 'somecolumn':['othervalues']}
file = pd.read_csv('filename.csv', na_values=sentinels)
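A runnable version of the per-column dictionary, reading from an in-memory string. column1 and somecolumn follow the placeholders above; the extra column other is hypothetical and shows that a sentinel applies only to the column it is listed under.

```python
import io

import pandas as pd

data = "column1,somecolumn,other\nNa,othervalues,Na\nempty field,7,x\n"

# Each column gets its own extra markers, added to pandas' defaults.
sentinels = {'column1': ['Na', 'empty field'], 'somecolumn': ['othervalues']}
df = pd.read_csv(io.StringIO(data), na_values=sentinels)
print(df)
```

The 'Na' in column other stays a string, because 'Na' is listed only for column1 and is not one of pandas' default markers.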