我有一个数据帧。我正在尝试查找日期时间的百分位数。我正在使用该功能:
数据帧:
student, attempts, time
student 1,14, 9/3/2019 12:32:32 AM
student 2,2, 9/3/2019 9:37:14 PM
student 3, 5
student 4, 16, 9/5/2019 8:58:14 PM
studentInfo2 = [14, 4, Timestamp('2019-09-04 00:26:36')]
data['time'] = pd.to_datetime(data['time_0001'], errors='coerce')
perc1_first = stats.percentileofscore(data['time'].notnull(), student2Info[2], 'rank')
其中 student2Info[2] 保存特定学生的日期时间。当我尝试这样做时,出现错误:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
关于如何在列中缺少时间的情况下正确计算百分位数的任何想法?
您需要将时间戳转换为percentileofscore
可以理解的单位。 此外,pd.DataFrame.notnull()
返回一个布尔列表,您可以使用该列表来过滤DataFrame
,它不会返回过滤后的列表,因此我已为您更新了该列表。 这是一个工作示例:
import pandas as pd
import scipy.stats as stats
data = pd.DataFrame.from_dict({
"student": [1, 2, 3, 4],
"attempts": [14, 2, 5, 16],
"time_0001": [
"9/3/2019 12:32:32 AM",
"9/3/2019 9:37:14 PM",
"",
"9/5/2019 8:58:14 PM"
]
})
student2Info = [14, 4, pd.Timestamp('2019-09-04 00:26:36')]
data['time'] = pd.to_datetime(data['time_0001'], errors='coerce')
perc1_first = stats.percentileofscore(data[data['time'].notnull()].time.transform(pd.Timestamp.toordinal), student2Info[2].toordinal(), 'rank')
print(perc1_first) #-> 66.66666666666667