I have a transposed DataFrame tr:
Using proper naming conventions, I would change your code as follows:
import numpy as np
import pandas as pd
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
s = StringIO("""idx 7128 8719 14051 14636
JDUTC_0 2451957.36 2452149.36 2457243.98 2452531.89
JDUTC_1 2451957.37 2452149.36 2457243.99 2452531.90
JDUTC_2 2451957.37 2452149.36 2457244.00 2452531.91
JDUTC_3 NaN 2452149.36 NaN NaN
JDUTC_4 NaN 2452149.36 NaN NaN
JDUTC_5 NaN 2452149.36 NaN NaN
JDUTC_6 1.23 2452149.37 NaN NaN
JDUTC_7 NaN NaN NaN NaN
JDUTC_8 NaN NaN NaN NaN
JDUTC_9 NaN NaN NaN NaN""")
# The sample data above is whitespace-separated, so split on any run of whitespace.
tr = pd.read_csv(s, sep=r"\s+", index_col=0)
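(People should provide a minimal working example, but they often forget to include things like the imports and the code that builds the DataFrame.)
Your code with the renamed variables: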
a = {}
b = []
for name, values in tr.items():
    b.clear()  # this is problematic as you know
    for ind, val in enumerate(values):
        if np.isnan(val):
            b.append(ind)
            continue
        else:
            pass
    a[name] = b
continue and pass are unnecessary here: they only say "carry on" with the loop, which happens anyway. In Python you are not forced to supply an else branch:
for name, values in tr.items():
    b.clear()  # This is still problematic at this stage.
    for ind, val in enumerate(values):
        if np.isnan(val):
            b.append(ind)
    a[name] = b
Collecting data in a for loop like this is better done with a list comprehension:
a = {}
for name, values in tr.items():
    b = [ind for ind, val in enumerate(values) if np.isnan(val)]
    a[name] = b
# now the result is already correct!
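Finally, once you are comfortable with list comprehensions, you can also use a dict comprehension, which turns the whole thing into a one-liner that is still easy to read: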
a = {name: [i for i, x in enumerate(vals) if np.isnan(x)] for name, vals in tr.items()}
You can inspect the result:
a
# which returns:
{'7128': [3, 4, 5, 7, 8, 9],
'8719': [7, 8, 9],
'14051': [3, 4, 5, 6, 7, 8, 9],
'14636': [3, 4, 5, 6, 7, 8, 9]}
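List comprehensions are a step toward functional programming (FP). FP avoids mutation (such as the b.append() or b.clear() methods), and your case demonstrates how easily mutation produces bugs. It also helps explain why FP, although it looks unfriendly to the brain at first glance, is actually the more brain-friendly way to program.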
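A list comprehension is Python's form of "map". If you use an "if" inside the comprehension, it is Python's equivalent of "filter". For FP people, these two are as natural as breathing.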
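To make the map/filter correspondence concrete, here is a minimal sketch. It uses a plain list of floats in place of a DataFrame column; the sample `values` list and the `nan_positions_*` names are made up for illustration and are not part of the original code.

```python
import math

# Hypothetical stand-in for one DataFrame column.
values = [2451957.36, float("nan"), 1.23, float("nan")]

# Comprehension form: the trailing `if` plays the role of filter,
# the leading expression (`i`) plays the role of map.
nan_positions_comp = [i for i, x in enumerate(values) if math.isnan(x)]

# The same result spelled out with the built-ins filter() and map().
nan_pairs = filter(lambda pair: math.isnan(pair[1]), enumerate(values))
nan_positions_fp = list(map(lambda pair: pair[0], nan_pairs))

print(nan_positions_comp)  # [1, 3]
print(nan_positions_fp)    # [1, 3]
```

Both versions build a new list without mutating an existing one, which is why the shared-list bug cannot occur in either.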
The problem is that you assign the same list to all keys.
a = {}
b = []  # <--- You create one array/list 'b'
for _, contents in tr.items():
    b.clear()
    for ind, val in enumerate(contents):
        if np.isnan(val):
            b.append(ind)
            continue
        else:
            pass
    print(_)
    print(b)
    a[_] = b  # <--- assign the same array to all keys.
print(a)
See my comments in the code above.
The line b.clear() only empties the same list; it does not create a new one.
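The effect can be reproduced without pandas. The following is a small sketch with made-up names (`result`, `buf`) showing that a dictionary stores a reference to the list, so clearing and refilling the list changes every entry that points to it:

```python
result = {}
buf = []

buf.append(1)
result["first"] = buf   # stores a reference to buf, not a copy

buf.clear()             # empties the one existing list in place
buf.append(2)
result["second"] = buf  # the very same list object again

print(result)                              # {'first': [2], 'second': [2]}
print(result["first"] is result["second"])  # True
```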
To make the code behave as expected, create a new list inside the loop.
a = {}
for _, contents in tr.items():
    b = []  # <--- new array/list is created
    for ind, val in enumerate(contents):
        if np.isnan(val):
            b.append(ind)
            continue
        else:
            pass
    print(_)
    print(b)
    a[_] = b  # <--- Now you assign the new array 'b' to a[_]
print(a)
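As a possible alternative (not part of the answer above), you could keep the single reusable list and store a copy of it under each key. The sketch below assumes a plain dict of lists, `columns`, as a hypothetical stand-in for the DataFrame columns:

```python
import math

# Hypothetical stand-in for the DataFrame columns.
columns = {"x": [1.0, float("nan")], "y": [float("nan"), float("nan")]}

result = {}
buf = []
for name, values in columns.items():
    buf.clear()
    for ind, val in enumerate(values):
        if math.isnan(val):
            buf.append(ind)
    result[name] = buf.copy()  # snapshot; later clear() calls do not affect it

print(result)  # {'x': [1], 'y': [0, 1]}
```

Creating a fresh list per iteration is usually the simpler choice; the copy is only worth considering if the buffer has to be reused for some other reason.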