I have a DataFrame with two fields.
The index is a datetime and the second column is class_label.
I want to resample this DataFrame by grouping on class_label and counting the rows.
datetime class_label
01-01-2020 00:00 1
01-01-2020 00:00 2
01-01-2020 00:00 2
01-02-2020 00:00 2
01-02-2020 00:00 2
01-03-2020 00:00 1
01-04-2020 00:00 1
Is it possible to resample by day and show the count for each label?
datetime class_label count
01-01-2020 00:00 1 1
01-01-2020 00:00 2 2
01-02-2020 00:00 2 2
01-03-2020 00:00 1 1
01-04-2020 00:00 1 1
It looks like you want to group by datetime and class_label and count the number of observations in each group.
As of pandas 1.1.0, you can use either .size() or .value_counts():
from random import randrange
from datetime import timedelta, date
import numpy as np
import pandas as pd

def random_date(start, end):
    # Return a random date in [start, end); date + timedelta ignores the seconds part
    delta = end - start
    int_delta = (delta.days * 24 * 60 * 60) + delta.seconds
    random_second = randrange(int_delta)
    return start + timedelta(seconds=random_second)

# Build a sample frame: n random dates, each with a random label of 1 or 2
n = 100
start = date(2020, 1, 1)
end = date(2020, 1, 5)
df = pd.DataFrame({
    "datetime": [random_date(start, end) for _ in range(n)],
    "class_label": [np.random.randint(1, 3) for _ in range(n)],
})
# using .size()
df.groupby(['datetime', 'class_label'], as_index=False).size().rename(columns={"size": "count"}).sort_values(['datetime', 'class_label'])
# using .value_counts(), which counts unique rows across all columns
df.value_counts().to_frame("count").sort_values(['datetime', 'class_label']).reset_index()
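
Both variants above group on the exact datetime value, and the sample frame keeps datetime as a regular column. In the question, datetime is the index and the goal is one bucket per day, so if the timestamps carried a time of day they would need to be binned first. A minimal sketch of one way to do that, assuming the index is a DatetimeIndex, is to pass pd.Grouper(freq="D") together with class_label. The names df_q and daily_counts are made up for this illustration, and the data is copied from the question:

```python
import pandas as pd

# Sample data copied from the question; datetime is the index.
idx = pd.to_datetime(
    ["01-01-2020", "01-01-2020", "01-01-2020",
     "01-02-2020", "01-02-2020", "01-03-2020", "01-04-2020"],
    format="%m-%d-%Y",
)
df_q = pd.DataFrame({"class_label": [1, 2, 2, 2, 2, 1, 1]}, index=idx)
df_q.index.name = "datetime"

# Bucket the index by calendar day, then count rows per (day, class_label).
daily_counts = (
    df_q.groupby([pd.Grouper(freq="D"), "class_label"])
    .size()
    .reset_index(name="count")
)
print(daily_counts)
```

The result has the columns datetime, class_label and count, which is the layout asked for in the question. Changing freq (for example to "W" or "h") should change the bucket width without touching the rest of the chain.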