Validating values across each row in pandas



I'm here looking for help. I'm working with the following data:

df1:
   name  name1  name2
A    13     13     13
B    13     27     57
C    12     12     12
D    26     23      2

I'm trying to use code like this:

def val(df):
    ret = []
    for idx, row in df.iterrows():
        if row.nunique()==1:
            ret.append(f'The values of {idx} in name, name1, name2 are corrects')
        else:
            ret(["".join(f'*The values in {idx} are:',
                ', '.join(f'{c} in {v}' for v,c in row.iteritems()),
                'Check your data before compare.']))
    return ret

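The problem is that it doesn't run properly. First, I need the result as a string rather than a list. I know "".join() makes that possible, but when I try the code I only get the last result, not all the answers I want. How can I get the complete output? I'd like to see several options, not just one.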

Example:
-The values of A in name, name1, name3 are corrects. 
- The values in B are:
13 in name, 27 in name3 and 57 in name2.
Check your data before compare.
-The values of C in name, name1, name3 are corrects.
- The values in D are:
26 in name, 23 in name3 and 2 in name2.
Check your data before compare.

def val(df):
    ret = []
    for idx, row in df.iterrows():
        if row.nunique() == 1:
            # All three columns hold the same value in this row.
            ret.append(f'- The values of {idx} in name, name1, name2 are corrects')
        else:
            # Values differ: list each one positionally next to its column.
            ret.append(
                f"- The values in {idx} are:\n"
                f"  {row.iloc[0]} in name, {row.iloc[1]} in name1, {row.iloc[2]} in name2.\n"
                "  Check your data before compare."
            )
    return ret

ans = val(df)

Output:

for i in ans:
    print(i)

- The values of A in name, name1, name2 are corrects
- The values in B are:
  13 in name, 27 in name1, 57 in name2.
  Check your data before compare.
- The values of C in name, name1, name2 are corrects
- The values in D are:
  26 in name, 23 in name1, 2 in name2.
  Check your data before compare.
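
The question asked for a single string rather than a list. A minimal sketch of one way to get that, assuming the same row-iteration approach as above: collect one message per row and join them once at the end. The name `build_report` is hypothetical, and the column names are taken from each row instead of being hard-coded.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame(
    {"name": [13, 13, 12, 26], "name1": [13, 27, 12, 23], "name2": [13, 57, 12, 2]},
    index=["A", "B", "C", "D"],
)

def build_report(df):
    """Return one string with a message per row (hypothetical helper)."""
    messages = []
    for idx, row in df.iterrows():
        if row.nunique() == 1:
            messages.append(f"- The values of {idx} in name, name1, name2 are correct")
        else:
            # Pair each value with its column label.
            pairs = ", ".join(f"{value} in {col}" for col, value in row.items())
            messages.append(
                f"- The values in {idx} are:\n  {pairs}.\n  Check your data before compare."
            )
    # Join once, after the loop, so every row's message is kept.
    return "\n".join(messages)

report = build_report(df)
print(report)
```

Joining after the loop is the important part: joining or reassigning inside the loop is a common reason for ending up with only the last row's message.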

import pandas as pd
df = pd.DataFrame({'name': {'A': 13, 'B': 13, 'C': 12, 'D': 26},
                   'name1': {'A': 13, 'B': 27, 'C': 12, 'D': 23},
                   'name2': {'A': 13, 'B': 57, 'C': 12, 'D': 2}})

It's hard to know how to correct your function because it has a lot of errors.

You can use a pandas Series like a dictionary with string formatting.

In [25]: s = '{name:} in name, {name1:} in name1, {name2:} in name2'
In [26]: row = df.loc['A',:]
In [27]: print(s.format(**row))
13 in name, 13 in name1, 13 in name2
In [28]: for idx,row in df.iterrows():
...:     print(idx, s.format(**row))
...:     
A 13 in name, 13 in name1, 13 in name2
B 13 in name, 27 in name1, 57 in name2
C 12 in name, 12 in name1, 12 in name2
D 26 in name, 23 in name1, 2 in name2

The same goes for formatted string literals (f-strings).

In [29]: for idx,row in df.iterrows():
...:     print(idx, f'''{row['name']} in name, {row['name1']} in name1, {row['name2']} in name2''')
...:     
A 13 in name, 13 in name1, 13 in name2
B 13 in name, 27 in name1, 57 in name2
C 12 in name, 12 in name1, 12 in name2
D 26 in name, 23 in name1, 2 in name2
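
Both versions above spell out the column names by hand. As a sketch only, the template could also be built from `df.columns`, assuming every column label is a string that is valid as a `str.format` field name (true for `name`, `name1`, `name2`). The variable names `template` and `lines` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': {'A': 13, 'B': 13, 'C': 12, 'D': 26},
                   'name1': {'A': 13, 'B': 27, 'C': 12, 'D': 23},
                   'name2': {'A': 13, 'B': 57, 'C': 12, 'D': 2}})

# Build "{name} in name, {name1} in name1, ..." from the column labels.
# Doubled braces in an f-string produce literal braces for str.format.
template = ", ".join(f"{{{col}}} in {col}" for col in df.columns)

# Each row is a Series, so it can be unpacked like a dict of column -> value.
lines = [f"{idx} {template.format(**row)}" for idx, row in df.iterrows()]
print("\n".join(lines))
```

This keeps working if columns are added or renamed, as long as the labels stay valid format field names.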

How about making the strings part of an np.where clause, like this?

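All the other answers replicate the original row-iteration approach, which is inefficient beyond toy datasets. np.where is a vectorized operation, so it will be faster than a custom function, and the logic is simpler too. The only caveat is that string interpolation doesn't work here, hence the slightly awkward multi-line syntax.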

import pandas as pd
import numpy as np
from io import StringIO

data = StringIO("""
index  name   name1   name2
A      13     13      13
B      13     27      57
C      12     12      12
D      26     23       2
""")
# sep=r"\s+" splits on runs of whitespace (delim_whitespace is deprecated in recent pandas).
df = pd.read_csv(data, sep=r"\s+", index_col="index")

# Pick one of two pre-built message arrays per row, depending on
# whether all values in that row are identical.
results = np.where(
    df.nunique(axis=1) == 1,
    'The values in ' + df.index + ' in name, name1, name2 are the same\n',
    'The values in ' + df.index + ' are:\n' +
    df["name"].astype(str)  + ' in name, ' +
    df["name1"].astype(str) + ' in name1, ' +
    df["name2"].astype(str) + ' in name2.\nCheck your data.\n'
)
print(*results, sep='\n')
