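I have a dataframe like the one below. I need to convert the 'Used time' column from strings to a time or numeric type so that I can sum the column.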
Date User Used time Used
31-07-19 Member (GTV1) 1Hour 1Minute 5083
31-07-19 Member (NHOKON) 4Minute 333
31-07-19 Member (GA) 42Minute 3500
31-07-19 Member (VIET125) 2Hour 32Minute 12667
31-07-19 Member (LUAN123) 40Minute 3333
31-07-19 Member (MMINH) 2Hour 47Minute 13917
31-07-19 Member (MMINH) 0Minute 0
31-07-19 Member (MMINH) 2Hour 11Minute 10917
31-07-19 Member (MMINH) 53Minute 4417
30-07-19 Member (KHANH00) 41Minute 3417
30-07-19 Member (CHIEN2) 1Hour 40Minute 8333
30-07-19 Member (CHIEN2) 9Minute 750
30-07-19 Guest 0Minute 2000
30-07-19 Member (DUYTQ111) 1Hour 3Minute 5250
29-07-19 Member (TRUNGTQ2) 42Minute 3500
29-07-19 Guest 1Hour 9Minute 7000
29-07-19 Member (MMINH) 2Hour 15Minute 11250
My code:
T = df['Used time'].str.split(expand=True)
H = pd.to_numeric(T[0].str.replace('Hour',''))
M = round(pd.to_numeric(T[1].str.replace('Minute',''))/60, 1)
df['Used time'] = H + M
But it doesn't work for rows that contain only minutes. Please help.
Try using rpartition() instead of split.
T = df['Used time'].str.rpartition('Hour')
H = pd.to_numeric(T[0]).fillna(0)
M = (pd.to_numeric(T[2].str.strip('Minute'))/60).round(1)
df['Used time'] = H + M
print(df)
Date User Used time Used
0 31-07-19 Member (GTV1) 1.0 5083
1 31-07-19 Member (NHOKON) 0.1 333
2 31-07-19 Member (GA) 0.7 3500
3 31-07-19 Member (VIET125) 2.5 12667
4 31-07-19 Member (LUAN123) 0.7 3333
5 31-07-19 Member (MMINH) 2.8 13917
6 31-07-19 Member (MMINH) 0.0 0
7 31-07-19 Member (MMINH) 2.2 10917
8 31-07-19 Member (MMINH) 0.9 4417
9 30-07-19 Member (KHANH00) 0.7 3417
10 30-07-19 Member (CHIEN2) 1.7 8333
11 30-07-19 Member (CHIEN2) 0.2 750
12 30-07-19 Guest 0.0 2000
13 30-07-19 Member (DUYTQ111) 1.0 5250
14 29-07-19 Member (TRUNGTQ2) 0.7 3500
15 29-07-19 Guest 1.2 7000
16 29-07-19 Member (MMINH) 2.2 11250
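To see why rpartition() copes with rows that have no Hour part, here is a minimal sketch on three sample values in the same format as the question. The variable names are only illustrative, and it keeps whole minutes instead of rounded hours (an assumption about what is wanted), because rounding each row to one decimal before summing loses precision: 1Hour 1Minute becomes 1.0 in the output above.

```python
import pandas as pd

# Three sample values in the same format as the 'Used time' column
s = pd.Series(['1Hour 1Minute', '4Minute', '2Hour 32Minute'])

# rpartition splits at the last 'Hour'; when 'Hour' is absent,
# columns 0 and 1 are empty strings and column 2 holds the whole value
parts = s.str.rpartition('Hour')

# Empty strings become NaN, which is then treated as 0 hours
hours = pd.to_numeric(parts[0], errors='coerce').fillna(0)

# Remove the literal word 'Minute' and any leftover whitespace
minutes = pd.to_numeric(parts[2].str.replace('Minute', '', regex=False).str.strip())

# Keep whole minutes so that no precision is lost before summing
total_minutes = (hours * 60 + minutes).astype(int)
print(total_minutes.tolist())
```

Dividing total_minutes by 60 after summing gives hours without the per-row rounding error.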
If the hour is missing, you can prepend 0Hour:
df['Used time'] = df['Used time'].map(lambda x : '0Hour ' + x if len(x.split()) == 1 else x)
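As a rough sketch of how this prefix lets the split-based code from the question work, the example below normalizes three sample values and then splits them. It assumes every value contains a Minute part, which holds for the data shown in the question; a value such as 2Hour on its own would be handled incorrectly.

```python
import pandas as pd

# Sample values; every one contains a Minute part (assumption)
s = pd.Series(['1Hour 1Minute', '4Minute', '0Minute'])

# Prepend '0Hour ' when the value has only one whitespace-separated token
normalized = s.map(lambda x: '0Hour ' + x if len(x.split()) == 1 else x)

# Every row now splits into exactly two columns: hours and minutes
T = normalized.str.split(expand=True)
H = pd.to_numeric(T[0].str.replace('Hour', '', regex=False))
M = pd.to_numeric(T[1].str.replace('Minute', '', regex=False))

minutes = H * 60 + M
print(minutes.tolist())
```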
How about using a regular expression to find the number before Hour and the number before Minute, converting everything to minutes, and summing?
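import re

def to_minutes(s):
    # re.findall works on a single string, so apply this to each value
    h = re.findall(r"(\d+)Hour", s)
    m = re.findall(r"(\d+)Minute", s)
    return (int(h[0])*60 if h else 0) + (int(m[0]) if m else 0)

df['Used time'] = df['Used time'].apply(to_minutes)  # in minutes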
You can use a single regex pattern to extract both Hour and Minute:
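pattern = r'(?:(?P<Hour>\d+)Hour\s*)?(?P<Minute>\d+)Minute'
# extract returns strings, so fill the gaps and convert to int before doing arithmetic
new_df = df['Used time'].str.extract(pattern).fillna('0').astype(int)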
Output:
Hour Minute
0 1 1
1 0 4
2 0 42
3 2 32
4 0 40
5 2 47
6 0 0
7 2 11
8 0 53
9 0 41
10 1 40
11 0 9
12 0 0
13 1 3
14 0 42
15 1 9
16 2 15
See this link for an explanation of the pattern. The new Used time (in minutes) can now be computed as:
df['Used time'] = new_df['Hour']*60 + new_df['Minute']
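Since the goal in the question is to sum the column, here is a minimal, self-contained sketch of that last step on a three-row sample. The 'Used minutes' column name and the per-date grouping are assumptions added for illustration; they are not part of the original answers.

```python
import pandas as pd

# Small sample with the same columns as the question's dataframe
df = pd.DataFrame({
    'Date': ['31-07-19', '31-07-19', '30-07-19'],
    'Used time': ['1Hour 1Minute', '4Minute', '1Hour 40Minute'],
})

# Optional Hour group followed by a mandatory Minute group
pattern = r'(?:(?P<Hour>\d+)Hour\s*)?(?P<Minute>\d+)Minute'
parts = df['Used time'].str.extract(pattern).fillna('0').astype(int)

# Hypothetical new column holding the duration in whole minutes
df['Used minutes'] = parts['Hour'] * 60 + parts['Minute']

total = int(df['Used minutes'].sum())                 # overall total in minutes
per_day = df.groupby('Date')['Used minutes'].sum()    # total per date
as_time = pd.to_timedelta(total, unit='min')          # same total as a Timedelta

print(total)
print(per_day)
print(as_time)
```

Keeping the value as integer minutes makes the sums exact; converting to a Timedelta at the end is only needed if a time-like display is preferred.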