I have this DataFrame:
Start_date End_date hour1 hour2
0 2018-01-31 12:00:00 2019-03-17 21:45:00 12:00:00 21:45:00
1 2018-02-28 12:00:00 2019-03-24 21:45:00 12:00:00 21:45:00
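I am trying to create a new column holding the duration (I need numeric output in seconds), based only on my columns hour2 and hour1.
I created my hour columns with the code below. Maybe the error is here.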
date_df['hour1'] = date_df['Start_date'].dt.time
date_df['hour2'] = date_df['End_date'].dt.time
date_df
I tried this solution:
date_df['hour2'] = pd.to_datetime(date_df['hour2'])
date_df['hour1'] = pd.to_datetime(date_df['hour1'])
date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-283-b75adc651706> in <module>
----> 1 date_df['hour2'] = pd.to_datetime(date_df['hour2'])
2 date_df['hour1'] = pd.to_datetime(date_df['hour1'])
3 date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
~\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
801 result = arg.map(cache_array)
802 else:
--> 803 values = convert_listlike(arg._values, format)
804 result = arg._constructor(values, index=arg.index, name=arg.name)
805 elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):
~\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
457 assert format is None or infer_datetime_format
458 utc = tz == "utc"
--> 459 result, tz_parsed = objects_to_datetime64ns(
460 arg,
461 dayfirst=dayfirst,
~\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2042
2043 try:
-> 2044 result, tz_parsed = tslib.array_to_datetime(
2045 data,
2046 errors=errors,
pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime_object()
pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
TypeError: <class 'datetime.time'> is not convertible to datetime
I also tried this solution:
date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
I got this error message:
TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
142 try:
--> 143 result = expressions.evaluate(op, left, right)
144 except TypeError:
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in evaluate(op, a, b, use_numexpr)
232 if use_numexpr:
--> 233 return _evaluate(op, op_str, a, b) # type: ignore
234 return _evaluate_standard(op, op_str, a, b)
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_numexpr(op, op_str, a, b)
118 if result is None:
--> 119 result = _evaluate_standard(op, op_str, a, b)
120
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_standard(op, op_str, a, b)
67 with np.errstate(all="ignore"):
---> 68 return op(a, b)
69
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-286-bf4c33189e88> in <module>
----> 1 date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
~\Anaconda3\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
63 other = item_from_zerodim(other)
64
---> 65 return method(self, other)
66
67 return new_method
~\Anaconda3\lib\site-packages\pandas\core\ops\__init__.py in wrapper(left, right)
341 lvalues = extract_array(left, extract_numpy=True)
342 rvalues = extract_array(right, extract_numpy=True)
--> 343 result = arithmetic_op(lvalues, rvalues, op)
344
345 return left._construct_result(result, name=res_name)
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
188 else:
189 with np.errstate(all="ignore"):
--> 190 res_values = na_arithmetic_op(lvalues, rvalues, op)
191
192 return res_values
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
148 # will handle complex numbers incorrectly, see GH#32047
149 raise
--> 150 result = masked_arith_op(left, right, op)
151
152 if is_cmp and (is_scalar(result) or result is NotImplemented):
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in masked_arith_op(x, y, op)
90 if mask.any():
91 with np.errstate(all="ignore"):
---> 92 result[mask] = op(xrav[mask], yrav[mask])
93
94 else:
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'
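When I use the suggestion given below to build the DataFrame from scratch, the error does go away. But the same error still affects my original DataFrame (the real exercise), so I need to understand what I am doing wrong, or what I should change to fix it.
How should I change my code?
Thanks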
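I ran your code on my computer and it did not raise an error. The difference is that the values in your DataFrame are not str.
They are already datetime.time objects. Your error message shows that: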
TypeError: <class 'datetime.time'> is not convertible to datetime
Before running date_df['NewColumn2']=date_df['hour2']-date_df['hour1'], you should check the type of the values.
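One way to do that check, as a minimal sketch: the frame below is rebuilt from the sample rows in the question (an assumption about how the real data looks), and `hour_types` is just an illustrative name. The point is that `dtypes` only reports `object` for the hour columns, so you need to look at the Python type of the stored values.

```python
import pandas as pd

# Hypothetical frame rebuilt from the question's sample rows.
date_df = pd.DataFrame({
    "Start_date": pd.to_datetime(["2018-01-31 12:00:00", "2018-02-28 12:00:00"]),
    "End_date": pd.to_datetime(["2019-03-17 21:45:00", "2019-03-24 21:45:00"]),
})
date_df["hour1"] = date_df["Start_date"].dt.time
date_df["hour2"] = date_df["End_date"].dt.time

# Column dtypes: the hour columns show up as generic "object".
print(date_df.dtypes)

# Python type of each stored value; this reveals datetime.time.
hour_types = date_df["hour1"].map(type).unique()
print(hour_types)
```

If this prints `datetime.time` for the hour columns, neither `pd.to_datetime` nor `-` will accept them as they are.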
Below is your code as I ran it on my computer:
import pandas as pd

date_df = pd.DataFrame(
{
"Start_date": ["2018-01-31 12:00:00", "2018-02-28 12:00:00"],
"End_date": ["2019-03-17 21:45:00", "2019-03-24 21:45:00"],
"hour1": ["12:00:00", "12:00:00"],
"hour2": ["21:45:00", "21:45:00"],
}
)
date_df['hour2'] = pd.to_datetime(date_df['hour2'])
date_df['hour1'] = pd.to_datetime(date_df['hour1'])
date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
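That version works because its hour columns start out as strings. If your hour columns hold datetime.time objects instead, as .dt.time produces, one possible workaround is to convert them to strings and then to timedeltas. This is only a sketch under that assumption; `duration_s`, `start` and `end` are names I made up for illustration.

```python
import datetime
import pandas as pd

# Hypothetical frame whose hour columns hold datetime.time objects,
# which is what .dt.time produces in the question.
date_df = pd.DataFrame({
    "hour1": [datetime.time(12, 0), datetime.time(12, 0)],
    "hour2": [datetime.time(21, 45), datetime.time(21, 45)],
})

# datetime.time does not support subtraction, so turn each value into a
# Timedelta measured from midnight, e.g. "21:45:00" -> 0 days 21:45:00.
start = pd.to_timedelta(date_df["hour1"].astype(str))
end = pd.to_timedelta(date_df["hour2"].astype(str))

# Timedelta columns subtract cleanly and expose total_seconds().
date_df["duration_s"] = (end - start).dt.total_seconds()
print(date_df["duration_s"].tolist())  # [35100.0, 35100.0]
```

Note that the result is negative whenever hour2 is earlier in the day than hour1, because the dates are ignored entirely.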
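OK, now I understand what you did. You must first check the type of your values. This is very important.
I think your 'Start_date' and 'End_date' columns already hold datetime values (pandas stores them as datetime64), so you can subtract them directly: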
your_date_df['NewColumn2'] = your_date_df['End_date'] - your_date_df['Start_date']
If you only want to show the time part of the difference (dropping the whole days), do this. First, import datetime:
import datetime
your_date_df['NewColumn2_onlyTime'] = your_date_df['NewColumn2'].apply(
lambda x: (datetime.datetime.min + x).time())
print(your_date_df)
index Start_date End_date hour1 hour2 NewColumn2 NewColumn2_onlyTime
0 2018-01-31 12:00:00 2019-03-17 21:45:00 12:00:00 21:45:00 410 days 09:45:00 09:45:00
1 2018-02-28 12:00:00 2019-03-24 21:45:00 12:00:00 21:45:00 389 days 09:45:00 09:45:00
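Since the question asks for a number of seconds rather than a time object, the same split can also be read off the timedelta column through the `.dt` accessor. A small sketch, using the two sample rows and illustrative variable names:

```python
import pandas as pd

start = pd.Series(pd.to_datetime(["2018-01-31 12:00:00", "2018-02-28 12:00:00"]))
end = pd.Series(pd.to_datetime(["2019-03-17 21:45:00", "2019-03-24 21:45:00"]))
delta = end - start  # timedelta column, e.g. 410 days 09:45:00

days = delta.dt.days              # whole days: 410, 389
secs = delta.dt.seconds           # leftover seconds within the day: 35100, 35100
total = delta.dt.total_seconds()  # everything in seconds: 35459100.0, 33644700.0

print(days.tolist(), secs.tolist(), total.tolist())
```

Here `.dt.seconds` is always between 0 and 86399, so it matches the 09:45:00 shown above, while `.dt.total_seconds()` includes the days as well.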
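If you want the difference from start to end, you don't need to extract the hours. You can do this:
import io
import pandas as pd

# Columns are separated by two or more spaces so that the single space
# inside each timestamp is not treated as a delimiter.
data='''
Start_date  End_date  hour1  hour2
0  2018-01-31 12:00:00  2019-03-17 21:45:00  12:00:00  21:45:00
1  2018-02-28 12:00:00  2019-03-24 21:45:00  12:00:00  21:45:00'''
df = pd.read_csv(io.StringIO(data), sep=r'\s\s+', engine='python')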
df['Start_date'] = pd.to_datetime(df['Start_date'])
df['End_date'] = pd.to_datetime(df['End_date'])
df['deltadays_seconds'] = (df.End_date-df.Start_date).dt.total_seconds()
df
Start_date End_date hour1 hour2 deltadays_seconds
0 2018-01-31 12:00:00 2019-03-17 21:45:00 12:00:00 21:45:00 35459100.0
1 2018-02-28 12:00:00 2019-03-24 21:45:00 12:00:00 21:45:00 33644700.0
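You could also work from hour1 and hour2, but they are just the time-of-day part of the full date and time, so everything you need is already in Start_date and End_date.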