Pandas: find the difference (duration) between two time columns



I have this DataFrame:

            Start_date             End_date     hour1     hour2
0  2018-01-31 12:00:00  2019-03-17 21:45:00  12:00:00  21:45:00
1  2018-02-28 12:00:00  2019-03-24 21:45:00  12:00:00  21:45:00

I am trying to create a new column holding the duration (I need numeric output, in seconds), based only on my hour2 and hour1 columns.

I created my hour columns with this code. Maybe the mistake is here.

date_df['hour1'] = date_df['Start_date'].dt.time
date_df['hour2'] = date_df['End_date'].dt.time
date_df

I tried this solution:

date_df['hour2'] = pd.to_datetime(date_df['hour2'])
date_df['hour1'] = pd.to_datetime(date_df['hour1'])
date_df['NewColumn2']=date_df['hour2']-date_df['hour1'] 

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-283-b75adc651706> in <module>
----> 1 date_df['hour2'] = pd.to_datetime(date_df['hour2'])
2 date_df['hour1'] = pd.to_datetime(date_df['hour1'])
3 date_df['NewColumn2']=date_df['hour2']-date_df['hour1']

~\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
801             result = arg.map(cache_array)
802         else:
--> 803             values = convert_listlike(arg._values, format)
804             result = arg._constructor(values, index=arg.index, name=arg.name)
805     elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):

~\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
457         assert format is None or infer_datetime_format
458         utc = tz == "utc"
--> 459         result, tz_parsed = objects_to_datetime64ns(
460             arg,
461             dayfirst=dayfirst,

~\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2042 
2043     try:
-> 2044         result, tz_parsed = tslib.array_to_datetime(
2045             data,
2046             errors=errors,

pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime_object()

pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()

TypeError: <class 'datetime.time'> is not convertible to datetime

I also tried this solution:

date_df['NewColumn2']=date_df['hour2']-date_df['hour1']

I got this error message:

TypeError                                 Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
142     try:
--> 143         result = expressions.evaluate(op, left, right)
144     except TypeError:
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in evaluate(op, a, b, use_numexpr)
232         if use_numexpr:
--> 233             return _evaluate(op, op_str, a, b)  # type: ignore
234     return _evaluate_standard(op, op_str, a, b)
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_numexpr(op, op_str, a, b)
118     if result is None:
--> 119         result = _evaluate_standard(op, op_str, a, b)
120 
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_standard(op, op_str, a, b)
67     with np.errstate(all="ignore"):
---> 68         return op(a, b)
69 
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'
During handling of the above exception, another exception occurred:
TypeError                                 Traceback (most recent call last)
<ipython-input-286-bf4c33189e88> in <module>
----> 1 date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
~\Anaconda3\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
63         other = item_from_zerodim(other)
64 
---> 65         return method(self, other)
66 
67     return new_method
~\Anaconda3\lib\site-packages\pandas\core\ops\__init__.py in wrapper(left, right)
341         lvalues = extract_array(left, extract_numpy=True)
342         rvalues = extract_array(right, extract_numpy=True)
--> 343         result = arithmetic_op(lvalues, rvalues, op)
344 
345         return left._construct_result(result, name=res_name)
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
188     else:
189         with np.errstate(all="ignore"):
--> 190             res_values = na_arithmetic_op(lvalues, rvalues, op)
191 
192     return res_values
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
148             #  will handle complex numbers incorrectly, see GH#32047
149             raise
--> 150         result = masked_arith_op(left, right, op)
151 
152     if is_cmp and (is_scalar(result) or result is NotImplemented):
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in masked_arith_op(x, y, op)
90         if mask.any():
91             with np.errstate(all="ignore"):
---> 92                 result[mask] = op(xrav[mask], yrav[mask])
93 
94     else:
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'

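When I build the DataFrame the way the answer below suggests, the error does go away. But the same error still affects my original DataFrame (the real exercise), so I need to understand what I am doing wrong, or what I should change to fix it.

How should I change my code?

Thanks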

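I ran your code on my computer and it worked without errors. The values in your DataFrame are not str.

Their type is already datetime.time, as your error message shows: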

TypeError: <class 'datetime.time'> is not convertible to datetime

Run date_df['NewColumn2']=date_df['hour2']-date_df['hour1'] on its own first,

and then check the type of the values.

Here is your code as I ran it on my computer:
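One way to do that type check is sketched below. It assumes the hour columns were built with .dt.time as in the question; the frame df and the variable element_types are names made up for this illustration. The column dtype only reports object, so the sketch looks at the type of the individual elements instead.

```python
import datetime

import pandas as pd

# Hypothetical frame mirroring the question: two datetime columns
df = pd.DataFrame(
    {
        "Start_date": pd.to_datetime(["2018-01-31 12:00:00", "2018-02-28 12:00:00"]),
        "End_date": pd.to_datetime(["2019-03-17 21:45:00", "2019-03-24 21:45:00"]),
    }
)
# Same construction as in the question
df["hour1"] = df["Start_date"].dt.time
df["hour2"] = df["End_date"].dt.time

# The dtype of the hour columns is the generic "object" dtype
print(df.dtypes)

# The element types show what is really stored in the column
element_types = df["hour1"].map(type).unique()
print(element_types)
```

If the element type turns out to be datetime.time, neither pd.to_datetime nor the - operator applies to it directly, which matches both tracebacks in the question.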

import pandas as pd

date_df = pd.DataFrame(
    {
        "Start_date": ["2018-01-31 12:00:00", "2018-02-28 12:00:00"],
        "End_date": ["2019-03-17 21:45:00", "2019-03-24 21:45:00"],
        "hour1": ["12:00:00", "12:00:00"],
        "hour2": ["21:45:00", "21:45:00"],
    }
)
# Here hour1/hour2 are plain strings, so pd.to_datetime can parse them
date_df['hour2'] = pd.to_datetime(date_df['hour2'])
date_df['hour1'] = pd.to_datetime(date_df['hour1'])
date_df['NewColumn2'] = date_df['hour2'] - date_df['hour1']

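OK, now I understand what you did. You need to check the types of your values first; that is very important.

I think your 'Start_date' and 'End_date' columns already hold datetime objects, so you can subtract them directly: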

your_date_df['NewColumn2'] = your_date_df['End_date'] - your_date_df['Start_date']

If you only want to show the time-of-day part of the difference, do this. First, import datetime:

import datetime

# Adding the Timedelta to datetime.min and taking .time() drops the whole days
your_date_df['NewColumn2_onlyTime'] = your_date_df['NewColumn2'].apply(
    lambda x: (datetime.datetime.min + x).time())
print(your_date_df)
            Start_date             End_date     hour1     hour2         NewColumn2 NewColumn2_onlyTime
0  2018-01-31 12:00:00  2019-03-17 21:45:00  12:00:00  21:45:00  410 days 09:45:00            09:45:00
1  2018-02-28 12:00:00  2019-03-24 21:45:00  12:00:00  21:45:00  389 days 09:45:00            09:45:00
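Since the question asks for a number of seconds rather than a time object, the .dt.seconds accessor on the timedelta column may be closer to what is needed. The sketch below assumes Start_date and End_date are datetime columns as above; the column name time_diff_seconds is made up for this illustration. Note that .dt.seconds returns only the sub-day part of each Timedelta, with the whole days left out.

```python
import pandas as pd

# Hypothetical frame with the same dates as in the question
df = pd.DataFrame(
    {
        "Start_date": pd.to_datetime(["2018-01-31 12:00:00", "2018-02-28 12:00:00"]),
        "End_date": pd.to_datetime(["2019-03-17 21:45:00", "2019-03-24 21:45:00"]),
    }
)

# Full difference, e.g. "410 days 09:45:00"
delta = df["End_date"] - df["Start_date"]

# Seconds component only (whole days dropped): 09:45:00 -> 35100
df["time_diff_seconds"] = delta.dt.seconds
print(df)
```

One caveat: if the end time of day is earlier than the start time of day, this value wraps around midnight instead of going negative, so it is only a drop-in replacement when that cannot happen in the data.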

If you want the difference from start to end, you don't need to extract the hours. You can do this:

import io

import pandas as pd

data = '''
Start_date         End_date           hour1     hour2
0   2018-01-31 12:00:00  2019-03-17 21:45:00  12:00:00   21:45:00
1   2018-02-28 12:00:00  2019-03-24 21:45:00  12:00:00   21:45:00'''
# Split on runs of two or more whitespace characters so "date time" stays in one field
df = pd.read_csv(io.StringIO(data), sep=r' \s+', engine='python')
df['Start_date'] = pd.to_datetime(df['Start_date'])
df['End_date'] = pd.to_datetime(df['End_date'])
# Full duration from start to end, as a float number of seconds
df['deltadays_seconds'] = (df.End_date - df.Start_date).dt.total_seconds()
df
Start_date            End_date     hour1     hour2  deltadays_seconds
0 2018-01-31 12:00:00 2019-03-17 21:45:00  12:00:00  21:45:00         35459100.0
1 2018-02-28 12:00:00 2019-03-24 21:45:00  12:00:00  21:45:00         33644700.0

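You could work from hour1 and hour2 instead, but they carry no extra information: hour1 and hour2 are just the time-of-day part of the full dates and times.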
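If the duration really has to come from hour1 and hour2 alone, one possible approach is to turn the datetime.time objects into timedeltas first, since timedeltas can be subtracted. The sketch below assumes the columns hold datetime.time objects, as .dt.time produces; the helper time_to_timedelta and the column name duration_seconds are made up for this illustration.

```python
import datetime

import pandas as pd

# Hypothetical frame with hour columns holding datetime.time objects
df = pd.DataFrame(
    {
        "hour1": [datetime.time(12, 0), datetime.time(12, 0)],
        "hour2": [datetime.time(21, 45), datetime.time(21, 45)],
    }
)


def time_to_timedelta(col):
    """Convert a column of datetime.time objects to timedeltas since midnight."""
    # str(datetime.time) gives "HH:MM:SS", a format pd.to_timedelta can parse
    return pd.to_timedelta(col.astype(str))


# Timedeltas support subtraction; total_seconds() gives a float
df["duration_seconds"] = (
    time_to_timedelta(df["hour2"]) - time_to_timedelta(df["hour1"])
).dt.total_seconds()
print(df)
```

Unlike the full start-to-end difference, this ignores the dates entirely, so the result is negative whenever hour2 is earlier in the day than hour1.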
