Convert a time string to a numeric value



I have a DataFrame like the one below. I need to convert the Used time column from strings to numeric time values so that I can sum the column.

Date        User                Used time       Used
31-07-19    Member (GTV1)       1Hour 1Minute   5083
31-07-19    Member (NHOKON)     4Minute         333
31-07-19    Member (GA)         42Minute        3500
31-07-19    Member (VIET125)    2Hour 32Minute  12667
31-07-19    Member (LUAN123)    40Minute        3333
31-07-19    Member (MMINH)      2Hour 47Minute  13917
31-07-19    Member (MMINH)      0Minute         0
31-07-19    Member (MMINH)      2Hour 11Minute  10917
31-07-19    Member (MMINH)      53Minute        4417
30-07-19    Member (KHANH00)    41Minute        3417
30-07-19    Member (CHIEN2)     1Hour 40Minute  8333
30-07-19    Member (CHIEN2)     9Minute         750
30-07-19    Guest               0Minute         2000
30-07-19    Member (DUYTQ111)   1Hour 3Minute   5250
29-07-19    Member (TRUNGTQ2)   42Minute        3500
29-07-19    Guest               1Hour 9Minute   7000
29-07-19    Member (MMINH)      2Hour 15Minute  11250

My code:

T = df['Used time'].str.split(expand=True)
H = pd.to_numeric(T[0].str.replace('Hour',''))
M = round(pd.to_numeric(T[1].str.replace('Minute',''))/60, 1)
df['Used time'] = H + M

But it does not work for rows that only contain minutes. Please help me.

Try using rpartition() instead of split.

T = df['Used time'].str.rpartition('Hour')
H = pd.to_numeric(T[0]).fillna(0)
M = (pd.to_numeric(T[2].str.strip('Minute'))/60).round(1)
df['Used time'] = H + M
print(df)
        Date                   User  Used time   Used
0   31-07-19          Member (GTV1)        1.0   5083
1   31-07-19        Member (NHOKON)        0.1    333
2   31-07-19            Member (GA)        0.7   3500
3   31-07-19       Member (VIET125)        2.5  12667
4   31-07-19       Member (LUAN123)        0.7   3333
5   31-07-19         Member (MMINH)        2.8  13917
6   31-07-19         Member (MMINH)        0.0      0
7   31-07-19         Member (MMINH)        2.2  10917
8   31-07-19         Member (MMINH)        0.9   4417
9   30-07-19       Member (KHANH00)        0.7   3417
10  30-07-19        Member (CHIEN2)        1.7   8333
11  30-07-19        Member (CHIEN2)        0.2    750
12  30-07-19                  Guest        0.0   2000
13  30-07-19      Member (DUYTQ111)        1.0   5250
14  29-07-19      Member (TRUNGTQ2)        0.7   3500
15  29-07-19                  Guest        1.2   7000
16  29-07-19         Member (MMINH)        2.2  11250
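
To see why rpartition() copes with rows that have no hour part, here is a minimal plain-Python sketch of the same split on single strings. The helper name hours_and_minutes is hypothetical and only for illustration; the pandas .str.rpartition accessor applies the same splitting rule to each element of the column.

```python
def hours_and_minutes(text):
    """Split a 'Used time' string around the last 'Hour' and return (hours, minutes)."""
    before, _sep, after = text.rpartition('Hour')
    # When 'Hour' is absent, rpartition puts the whole string in the last slot
    # and leaves `before` empty, so the hour count falls back to 0.
    hours = int(before) if before else 0
    minutes = int(after.replace('Minute', ''))
    return hours, minutes

print('1Hour 1Minute'.rpartition('Hour'))   # ('1', 'Hour', ' 1Minute')
print('4Minute'.rpartition('Hour'))         # ('', '', '4Minute')
print(hours_and_minutes('2Hour 32Minute'))  # (2, 32)
```

In the pandas version above, the empty hour slot should become NaN under pd.to_numeric, which is the case fillna(0) is there to handle.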

If the hour part is missing, you can prepend 0Hour to the string:

df['Used time'] = df['Used time'].map(lambda x : '0Hour ' + x if len(x.split()) == 1 else x)
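
Once every value has both parts, the two-token split from the question should work on each row. The sketch below shows the idea on single strings in plain Python; add_missing_hour and to_hours are hypothetical helper names, and the result is left unrounded.

```python
def add_missing_hour(text):
    # Prefix '0Hour ' when the string has a single token (minutes only)
    return '0Hour ' + text if len(text.split()) == 1 else text

def to_hours(text):
    # After normalisation there are always exactly two tokens
    hour_token, minute_token = add_missing_hour(text).split()
    hours = int(hour_token.replace('Hour', ''))
    minutes = int(minute_token.replace('Minute', ''))
    return hours + minutes / 60

print(add_missing_hour('4Minute'))  # 0Hour 4Minute
print(to_hours('1Hour 30Minute'))   # 1.5
print(to_hours('30Minute'))         # 0.5
```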

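How about using regular expressions to find the number of hours and the number of minutes, converting everything to minutes, and summing?

import re

def to_minutes(s):
    # Extract the hour and minute numbers; either part may be missing
    h = re.findall(r"(\d+)Hour", s)
    m = re.findall(r"(\d+)Minute", s)
    return (int(h[0]) * 60 if h else 0) + (int(m[0]) if m else 0)

df['Used time'] = df['Used time'].apply(to_minutes)  # in minutes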

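You can use a single regular expression pattern to extract both Hour and Minute:

pattern = r'(?:(?P<Hour>\d+)Hour\s*)?(?P<Minute>\d+)Minute'
# extract returns strings; fill missing hours with 0 and convert to integers
new_df = df['Used time'].str.extract(pattern).fillna(0).astype(int)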

Output:

   Hour Minute
0     1      1
1     0      4
2     0     42
3     2     32
4     0     40
5     2     47
6     0      0
7     2     11
8     0     53
9     0     41
10    1     40
11    0      9
12    0      0
13    1      3
14    0     42
15    1      9
16    2     15
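
To make the pieces of such a pattern easier to read, here is a hedged sketch that spells it out with re.VERBOSE and applies it to single strings using the standard re module. The names PATTERN and extract_parts are illustrative, not part of the answer. Note that the minute part is required, so a value with hours only (none appear in the sample data) would not match.

```python
import re

PATTERN = re.compile(r"""
    (?:                  # optional, non-capturing hour part
        (?P<Hour>\d+)    # one or more digits, captured as Hour
        Hour
        \s*              # optional whitespace between the two parts
    )?
    (?P<Minute>\d+)      # one or more digits, captured as Minute
    Minute
""", re.VERBOSE)

def extract_parts(text):
    """Return {'Hour': int, 'Minute': int}, or None if the text does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # An unmatched optional group is None, so treat it as 0
    return {name: int(value or 0) for name, value in match.groupdict().items()}

print(extract_parts('1Hour 1Minute'))  # {'Hour': 1, 'Minute': 1}
print(extract_parts('4Minute'))        # {'Hour': 0, 'Minute': 4}
print(extract_parts('2Hour'))          # None
```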

For an explanation of the pattern, see this link. The new Used time can now be computed as:

df['Used time'] = new_df['Hour']*60 + new_df['Minute']
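
As a quick sanity check of the formula, the sketch below totals the first four rows of the extracted output in plain Python and converts the sum back to hours and minutes. The variable names are illustrative only.

```python
# (Hour, Minute) pairs copied from rows 0-3 of the extracted output above
rows = [(1, 1), (0, 4), (0, 42), (2, 32)]

# Same formula as the pandas expression: hours * 60 + minutes
minutes_per_row = [hour * 60 + minute for hour, minute in rows]
total_minutes = sum(minutes_per_row)

# Convert the total back into whole hours and leftover minutes
total_hours, leftover_minutes = divmod(total_minutes, 60)
print(minutes_per_row)                # [61, 4, 42, 152]
print(total_hours, leftover_minutes)  # 4 19
```

With the column stored as integer minutes, the equivalent total in pandas would be df['Used time'].sum().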
