查找第一个非零值，并对其后的列进行计数

我有一个关于每个客户id的每日交易的大型数据集，示例如下。我试图计算每个客户的使用寿命，即客户第一次购买后的天数。在示例数据的上下文中，我想计算自上次购买以来2018-04-14的天数：对于id1052，长度为44。

trans_date  2018-03-01  2018-03-02  2018-03-03 ... 2018-04-12   2018-04-13  
id                                                                                   
1052         30.00       0.0         0.0     ...    14.00          0.0        
1053          0.0        0.0         10.64   ...    0.0            0.0        
1054          0.0        8.40        0.0     ...    0.0            9.10       
1055          0.0        0.0         0.0     ...    0.0            0.0        
1056          0.0        0.0         0.0     ...    42.21          0.0

我知道如何找到行中的第一个非零值，但使用column_indextrans_date计算最近性是我的难点。我该怎么做？

检查非零值后，这是idxmax。我们需要为所有0添加一个额外的检查用例，因为idxmax会为该行返回df.columns[0]，这没有意义。在这里，我选择让它使用where返回NaN。

import pandas as pd
#df.columns = pd.to_datetime(df.columns)
my_date = pd.to_datetime('2018-04-14')
u = df.ne(0)
(my_date - u.idxmax(1).where(u.any(1))).dt.days
#id
#1052    44.0
#1053    42.0
#1054    43.0
#1055     NaN
#1056     2.0
#dtype: float64

试试这个：

# Convert the column names to dates
dates = pd.to_datetime(df.columns)
def customer_lifetime(row):
# Indices of the days where the customer has made purchase
purchases = np.where(row > 0)[0]
if len(purchases) == 0:
# The customer never bought anything
return 0
else:
# The index of the first purchase date
i = purchases[0]
# The "current" date
j = len(dates) - 1
# The customer lifetime is the difference between the
# current date and first purchase date
return (dates[j] - dates[i]) / pd.Timedelta(days=1)
df['CustomerLifetime'] = df.apply(customer_lifetime, axis=1)

结果：

2018-03-01  2018-03-02  2018-03-03  2018-04-12  2018-04-13  CustomerLifetime
id                                                                                
1052        30.0         0.0        0.00       14.00         0.0              43.0
1053         0.0         0.0       10.64        0.00         0.0              41.0
1054         0.0         8.4        0.00        0.00         9.1              42.0
1055         0.0         0.0        0.00        0.00         0.0               0.0
1056         0.0         0.0        0.00       42.21         0.0               1.0

相关内容

最新更新

热门标签：