Applying a function to each row of a DataFrame fails with df.apply

I have this pandas DataFrame, where each row contains two samples, X and Y:

import pandas as pd
import numpy as np

df = pd.DataFrame({'X': [np.random.normal(0, 1, 10),
                         np.random.normal(0, 1, 10),
                         np.random.normal(0, 1, 10)],
                   'Y': [np.random.normal(0, 1, 10),
                         np.random.normal(0, 1, 10),
                         np.random.normal(0, 1, 10)]})

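I want to apply the function ttest_ind() (a statistical test that takes two samples as input) to each row and keep the first element of the result (the function returns two elements).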

If I do this for a single row (for example, the first row), it works:

from scipy import stats
stats.ttest_ind(df['X'][0], df['Y'][0], equal_var = False)[0]
# Returns a float

However, if I use apply to do this on every row, I get an error:

df.apply(lambda x: stats.ttest_ind(x['X'], x['Y'], equal_var = False)[0])
# Throws the following error:
Traceback (most recent call last):
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 759, in pandas._libs.hashtable.Int64HashTable.get_item
TypeError: an integer is required

During handling of the above exception, another exception occurred:

...
KeyError: ('X', 'occurred at index X')

What am I doing wrong?

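You just need to specify the axis along which the function is applied. See the documentation for apply(). In short, axis=1 means "apply the function to each row of the DataFrame". The default is axis=0, which applies the function to each column instead. With the default, x is a whole column (a Series indexed by the row labels 0, 1, 2), so x['X'] looks up a label that does not exist, and that is what raises the KeyError.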

df.apply(lambda x: stats.ttest_ind(x['X'], x['Y'], equal_var = False)[0], axis=1)
0    0.985997
1   -0.197396
2    0.034277
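
To see what the function actually receives under each axis setting, here is a minimal sketch. The helper name index_labels and the seeded generator are my own additions for illustration, not part of the question. It also includes a plain zip() loop as one possible way to get the same numbers without apply at all.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Seeded generator (an assumption, only so the sketch is repeatable).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'X': [rng.normal(0, 1, 10) for _ in range(3)],
    'Y': [rng.normal(0, 1, 10) for _ in range(3)],
})

def index_labels(s):
    """Hypothetical helper: report the index labels of the Series apply() passes in."""
    return ", ".join(str(label) for label in s.index)

# axis=0 (the default): the function gets one column at a time,
# so the Series is indexed by the row labels 0, 1, 2.
by_column = df.apply(index_labels, axis=0)

# axis=1: the function gets one row at a time,
# so the Series is indexed by the column names 'X' and 'Y'.
by_row = df.apply(index_labels, axis=1)

print(by_column)
print(by_row)

# Row-wise Welch t-statistic, as in the answer above.
t_stats = df.apply(
    lambda row: stats.ttest_ind(row['X'], row['Y'], equal_var=False)[0],
    axis=1,
)

# The same computation without apply, pairing the two columns directly.
t_stats_zip = [
    stats.ttest_ind(x, y, equal_var=False)[0]
    for x, y in zip(df['X'], df['Y'])
]
```

Under axis=0 the labels are the row numbers, which is why x['X'] cannot be found there; under axis=1 they are the column names, so x['X'] and x['Y'] resolve as intended.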
