Calculating features over multiple training windows in Featuretools



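> I have a table with customers and transactions. Is there a way to get features that filter on the last 3/6/9/12 months? I would like to automatically generate features such as:

  • number of transactions in the last 3 months
  • ....
  • number of transactions in the last 12 months
  • average transaction in the last 3 months
  • average transaction in the last 12 months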

I tried using training_window=["1 month", "3 months"], but it doesn't seem to return a separate set of features for each window.

Example:

import featuretools as ft

es = ft.demo.load_mock_customer(return_entityset=True)
window_features = ft.dfs(entityset=es,
                         target_entity="customers",
                         training_window=["1 hour", "1 day"],
                         features_only=True)
window_features

Do I have to run each window separately and then merge the results?
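To pin down what the requested features mean, here is a plain-pandas sketch (not Featuretools) that computes, per customer, the count and mean of transaction amounts in the last 3/6/9/12 months before a cutoff time. The column names (`customer_id`, `transaction_time`, `amount`), the toy data, the single shared cutoff, and the window boundaries `(cutoff - N months, cutoff]` are all assumptions for illustration; Featuretools' own training-window boundaries may differ.

```python
import pandas as pd

# Hypothetical transactions table; column names and values are made up.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "transaction_time": pd.to_datetime([
        "2014-01-15", "2014-05-10", "2014-11-20",
        "2014-03-01", "2014-12-01", "2013-06-01",
    ]),
    "amount": [10.0, 20.0, 30.0, 5.0, 15.0, 50.0],
})
customer_ids = pd.Index([1, 2, 3], name="customer_id")
cutoff = pd.Timestamp("2014-12-31")  # single shared cutoff, for simplicity


def window_features(transactions, customer_ids, cutoff, months):
    """Count and mean of `amount` per customer in the window (cutoff - months, cutoff]."""
    start = cutoff - pd.DateOffset(months=months)
    times = transactions["transaction_time"]
    in_window = transactions[(times > start) & (times <= cutoff)]
    agg = (
        in_window.groupby("customer_id")["amount"]
        .agg(["count", "mean"])
        .reindex(customer_ids)
    )
    # Customers with no transactions in the window get count 0; their mean stays NaN.
    agg["count"] = agg["count"].fillna(0).astype(int)
    return agg.add_suffix(f"_{months}m")


# One pair of columns per window, aligned on customer_id.
features = pd.concat(
    [window_features(transactions, customer_ids, cutoff, m) for m in (3, 6, 9, 12)],
    axis=1,
)
print(features)
```

Each window produces its own suffixed columns, which is exactly the "one set of features per window" shape the question asks for.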

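As you mentioned, in Featuretools 0.2.1 you have to build the feature matrix separately for each training window and then merge the results. For your example, you would do it like this: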

import pandas as pd
import featuretools as ft

es = ft.demo.load_mock_customer(return_entityset=True)

# One cutoff time per customer; features are computed as of that time.
cutoff_times = pd.DataFrame({"customer_id": [1, 2, 3, 4, 5],
                             "time": pd.date_range('2014-01-01 01:41:50', periods=5, freq='25min')})

# Build the feature definitions once (only COUNT aggregations here).
features = ft.dfs(entityset=es,
                  target_entity="customers",
                  agg_primitives=['count'],
                  trans_primitives=[],
                  features_only=True)

# Compute the same features once per training window.
fm_1 = ft.calculate_feature_matrix(features,
                                   entityset=es,
                                   cutoff_time=cutoff_times,
                                   training_window='1h',
                                   verbose=True)
fm_2 = ft.calculate_feature_matrix(features,
                                   entityset=es,
                                   cutoff_time=cutoff_times,
                                   training_window='1d',
                                   verbose=True)

# Merge on customer_id; overlapping column names get the window suffix.
new_df = fm_1.reset_index()
new_df = new_df.merge(fm_2.reset_index(), on="customer_id", suffixes=("_1h", "_1d"))
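The same pattern extends to any number of windows. Since `merge` applies `suffixes` to only two frames at a time, one alternative is to suffix each matrix's columns up front and concatenate them on the shared index. The sketch below assumes a caller-supplied function, `compute_fm(window)` (hypothetical), that returns a feature matrix indexed by the target id; in practice it could wrap the `ft.calculate_feature_matrix(..., training_window=window)` call above. The `fake_fm` stub stands in for Featuretools so the sketch runs on its own; its values are invented.

```python
import pandas as pd


def combine_windows(compute_fm, windows):
    """Build one feature matrix per training window and join them column-wise.

    compute_fm is a hypothetical callable mapping a window string (e.g. "1h")
    to a feature matrix indexed by the target id (e.g. customer_id).
    """
    matrices = [compute_fm(w).add_suffix(f"_{w}") for w in windows]
    # Align on the shared index instead of chaining pairwise merges.
    return pd.concat(matrices, axis=1)


# Stand-in for Featuretools: a fake matrix whose values depend on the window.
def fake_fm(window):
    scale = {"1h": 1, "1d": 2}[window]
    return pd.DataFrame(
        {"COUNT(sessions)": [1 * scale, 3 * scale]},
        index=pd.Index([1, 2], name="customer_id"),
    )


combined = combine_windows(fake_fm, ["1h", "1d"])
print(combined)
```

With Featuretools installed, `compute_fm` could be `lambda w: ft.calculate_feature_matrix(features, entityset=es, cutoff_time=cutoff_times, training_window=w)`, and adding a window then only means adding a string to the list.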

The new DataFrame then looks like this:

customer_id  COUNT(sessions)_1h  COUNT(transactions)_1h  COUNT(sessions)_1d  COUNT(transactions)_1d
1            1                   17                      3                   43
2            3                   36                      3                   36
3            0                   0                       1                   25
4            0                   0                       0                   0
5            1                   15                      2                   29
