Python: write edited NaN values to a .csv file



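I am trying to convert the NULL values in a .csv file to NaN and then save the file with these edits. In the code below, f has the NaN values in the correct places in the data. However, I cannot save it as a .csv. The error is printed below the code.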

#take .csv with NULL and replaces with NaN - write numerical and NaN values to .csv
import csv
import numpy as np
import pandas
f = pandas.read_csv('C:Usersmmso2Google DriveMABL Wind_Semester 2 2016Wind Farm InfoDataBDataB - Copy.csv')#convert file to variable so it can be edited
outfile = open('C:Usersmmso2Google DriveMABL Wind_Semester 2 2016Wind Farm InfoDataBDataB - NaN1.csv','wb')#create empty file to write to
writer = csv.writer(outfile)#writer will write when given the data to write below
result = f[f is 'NULL'] = np.nan
writer.writerows(f)
Error:

 Traceback (most recent call last):
  File "C:/Users/mmso2/Google Drive/MABL Wind/_Semester 2 2016/_PGR Training/CENTA/MATLAB/In class ex/SAR_data/gg_nan.py", line 12, in <module>
    writer.writerows(f)
_csv.Error: sequence expected

csv.writer.writerows() expects a sequence of sequences (a sequence of row objects). A pandas.DataFrame is not one, because iterating over it yields its column names:

In [23]: df = pd.DataFrame({'A': range(10)})
In [24]: for x in df: 
    print(x)
   ....:     
A

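This can bite you silently: a sequence of strings is itself a sequence of sequences, so you end up with a CSV file whose rows consist of the letters of the column names. In your case it failed loudly instead, because your attempt to replace the 'NULL' strings ended up adding a column labeled False (a boolean), and a boolean is not a sequence, hence "sequence expected".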
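
To make the silent failure concrete, here is a minimal sketch. The list of names below is made up and simply stands in for what iterating over a DataFrame would hand to writerows(); an in-memory buffer replaces the real output file.

```python
import csv
import io

# Hypothetical column names, standing in for what iterating a DataFrame yields.
column_names = ["speed", "dir"]

buffer = io.StringIO()  # in-memory stand-in for an open output file
writer = csv.writer(buffer)

# Each string is itself iterable, so every name becomes a row of single characters.
writer.writerows(column_names)

csv_text = buffer.getvalue()
print(csv_text)
```

No exception is raised here, which is why this kind of mistake is easy to miss until you open the file.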

To iterate over the rows as tuples, you can use DataFrame.itertuples():

In [27]: for x in df.itertuples(index=False):
    print(x)
   ....:     
(0,)
(1,)
(2,)
...
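
If you do want to stay with csv.writer, the row tuples can be passed straight to writerows(). This is only a sketch under assumptions: the DataFrame contents are invented, and an in-memory buffer is used where you would normally open your output file in text mode with newline=''.

```python
import csv
import io

import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": ["x", "y", "z"]})

buffer = io.StringIO()  # stand-in for open(path, "w", newline="")
writer = csv.writer(buffer)

writer.writerow(list(df.columns))             # header row
writer.writerows(df.itertuples(index=False))  # one tuple per DataFrame row

csv_text = buffer.getvalue()
print(csv_text)
```

Note that csv.writer simply stringifies each value, so you get no control over how missing values are written, which is one reason to prefer to_csv().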

The simplest approach is to use DataFrame.to_csv():

filename = 'C:Usersmmso2Google DriveMABL Wind_Semester 2 2016Wind Farm InfoDataBDataB - NaN1.csv'
f.to_csv(filename, na_rep='NaN')  # default representation for nans is ''
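
A small self-contained sketch of the same call, assuming made-up data and writing to an in-memory buffer instead of a file on disk. The index=False argument is my addition; it just leaves out the row index column.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical measurements with one missing value.
df = pd.DataFrame({"speed": [5.2, np.nan, 7.1]})

buffer = io.StringIO()  # stand-in for a filename
# na_rep controls the text written for missing values (the default is an empty string).
df.to_csv(buffer, na_rep="NaN", index=False)

csv_text = buffer.getvalue()
print(csv_text)
```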

Note that to replace the 'NULL' values you have to use the equality operator == rather than the identity operator is:

f[f == 'NULL'] = np.nan
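
Here is a minimal sketch of that replacement on invented data, so you can check that only the 'NULL' cells are affected. The column names and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'NULL' strings mixed in with real values.
df = pd.DataFrame({
    "speed": ["NULL", 5.2, "NULL", 7.1],
    "site": ["a", "NULL", "c", "d"],
})

null_mask = df == "NULL"  # elementwise comparison, gives a boolean DataFrame
df[null_mask] = np.nan    # overwrite only the cells where the mask is True

print(df)
```

DataFrame.replace('NULL', np.nan) should be another way to get the same result, if you prefer a method call over mask assignment.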

Using the identity operator effectively adds a new column labeled False, with all of its values set to nan:

In [42]: df = pd.DataFrame({'A': ['NULL', 1] * 10})
In [43]: df[df is 'NULL'] = float('nan')
In [44]: df
Out[44]: 
       A  False
0   NULL    NaN
1      1    NaN
2   NULL    NaN
3      1    NaN
...
This happens because f is 'NULL' evaluates to the single value False rather than to a new boolean DataFrame.
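
The difference between the two operators can be checked directly. In this sketch the data is made up, and the string is bound to a name only because recent Python versions emit a SyntaxWarning when is is used with a literal.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"A": ["NULL", 1, "NULL"]})
marker = "NULL"

# Identity: "are these the very same object?" -> one plain bool.
identity_result = df is marker

# Equality: compared cell by cell -> a boolean DataFrame of the same shape.
equality_result = df == marker

print(identity_result)
print(equality_result)
```

So df[identity_result] is really df[False], which is treated as a column label, while df[equality_result] selects individual cells.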

I'm only just learning Python, but you can replace the null values while reading the file, like this:

file = pd.read_csv('filename.csv', na_values=['NULL'])

You can even build a dictionary of sentinel values for each column, for example:

sentinels = {'column1':['Na', 'empty field'], 'somecolumn':['othervalues']}
file = pd.read_csv('filename.csv', na_values=sentinels)
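
A runnable sketch of the per-column dictionary, reading from an in-memory string instead of a file. The column names and markers ('missing', 'unknown') are invented for illustration; as far as I know, pandas already treats 'NULL' as missing by default when reading, so the sketch uses markers that are not recognised automatically.

```python
import io

import pandas as pd

# Hypothetical file contents with a different "missing" marker in each column.
raw = io.StringIO(
    "speed,direction\n"
    "5.2,north\n"
    "missing,south\n"
    "7.1,unknown\n"
)

# Per-column sentinel values that should be read as NaN.
sentinels = {"speed": ["missing"], "direction": ["unknown"]}
df = pd.read_csv(raw, na_values=sentinels)

print(df)
```

Since the markers are converted on read, the speed column can be parsed as numbers straight away and no replacement step is needed afterwards.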
