Merge separate month and year columns into a single column to subtract dates in days



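My DataFrame has two columns: one holds the month and the other the year in which a store got a competitor. I want to combine these columns into a date and then subtract it from another date to get the difference in days. When I run the code below, I get an error that I cannot make sense of. The full error is shown below the code.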

# competition since
df2['competition_since'] = df2.apply(lambda x: datetime( year=x['competition_open_since_year'], month=x['competition_open_since_month'], day=1, axis=1 ))
df2['competition_time_month'] = ( ( df2['date'] - df2['competition_since'])/30 ).apply(lambda x: x.days).astype(int)

Error

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
3081             except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'competition_open_since_year'
The above exception was the direct cause of the following exception:
KeyError                                  Traceback (most recent call last)
<ipython-input-30-ac751656b323> in <module>
1 # competition since
----> 2 df2['competition_since'] = df2.apply(lambda x: datetime( year=x['competition_open_since_year'], month=x['competition_open_since_month'], day=1, axis=1 ))
3 df2['competition_time_month'] = ( ( df2['date'] - df2['competition_since'])/30 ).apply(lambda x: x.days).astype(int)
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
7766             kwds=kwds,
7767         )
-> 7768         return op.get_result()
7769 
7770     def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/apply.py in get_result(self)
183             return self.apply_raw()
184 
--> 185         return self.apply_standard()
186 
187     def apply_empty_result(self):
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/apply.py in apply_standard(self)
274 
275     def apply_standard(self):
--> 276         results, res_index = self.apply_series_generator()
277 
278         # wrap results
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/apply.py in apply_series_generator(self)
288             for i, v in enumerate(series_gen):
289                 # ignore SettingWithCopy here in case the user mutates
--> 290                 results[i] = self.f(v)
291                 if isinstance(results[i], ABCSeries):
292                     # If we have a view on v, we need to make a copy because
<ipython-input-30-ac751656b323> in <lambda>(x)
1 # competition since
----> 2 df2['competition_since'] = df2.apply(lambda x: datetime( year=x['competition_open_since_year'], month=x['competition_open_since_month'], day=1, axis=1 ))
3 df2['competition_time_month'] = ( ( df2['date'] - df2['competition_since'])/30 ).apply(lambda x: x.days).astype(int)
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
851 
852         elif key_is_scalar:
--> 853             return self._get_value(key)
854 
855         if is_hashable(key):
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
959 
960         # Similar to Index.get_value, but we do not fall back to positional
--> 961         loc = self.index.get_loc(label)
962         return self.index._get_values_for_loc(self, loc, label)
963 
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080                 return self._engine.get_loc(casted_key)
3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
3083 
3084         if tolerance is not None:
KeyError: 'competition_open_since_year'

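When you call apply on a DataFrame without axis=1, each column is passed to the lambda function as a Series, so the lambda receives a column of values rather than a row. In your code, axis=1 ended up inside the datetime(...) call instead of the apply(...) call, so x['competition_open_since_year'] is looked up in a column's row index, which raises the KeyError. I think you are trying to do something like this: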

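import datetime
# Build "month/1/year" strings; the casts assume both columns are numeric with no missing values
month = df2['competition_open_since_month'].astype(int).astype(str)
year = df2['competition_open_since_year'].astype(int).astype(str)
df2['competition_since'] = month + "/1/" + year
df2['competition_since'] = df2['competition_since'].apply(lambda x: datetime.datetime.strptime(x, "%m/%d/%Y"))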
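As an alternative, here is a minimal sketch of the more direct fix: keep the original lambda but pass axis=1 to apply instead of to datetime. The sample frame below is made up purely for illustration, and the int() casts assume the year and month columns hold whole numbers with no missing values. It also shows a vectorized variant that uses pd.to_datetime on columns named year, month and day.

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample data standing in for df2 (values are invented for illustration)
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2015-07-31", "2015-07-31"]),
    "competition_open_since_year": [2008, 2015],
    "competition_open_since_month": [9, 6],
})

# axis=1 belongs to apply(), not to datetime(): with it, each row is passed to the lambda
df2["competition_since"] = df2.apply(
    lambda x: datetime(
        year=int(x["competition_open_since_year"]),
        month=int(x["competition_open_since_month"]),
        day=1,
    ),
    axis=1,
)

# Vectorized variant: pd.to_datetime can assemble dates from year/month/day columns
parts = (
    df2[["competition_open_since_year", "competition_open_since_month"]]
    .rename(columns={
        "competition_open_since_year": "year",
        "competition_open_since_month": "month",
    })
    .assign(day=1)
)
competition_since_vectorized = pd.to_datetime(parts)
```

Both approaches should give the same dates; the vectorized one avoids a Python-level call per row, which may matter on a large frame.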

After that, you can subtract it from the date column to get the difference in days, like this (I don't think you need to divide by 30):

(df2['date'] - df2['competition_since']).apply(lambda x: x.days)
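If an approximate month count is still wanted, as in the question, a sketch along these lines may help. It uses the .dt.days accessor on the timedelta column and integer division by 30, which treats every month as 30 days and is therefore only an approximation. The sample frame is again hypothetical.

```python
import pandas as pd

# Hypothetical sample data; both columns are assumed to already be datetime64
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2015-07-31", "2015-07-31"]),
    "competition_since": pd.to_datetime(["2008-09-01", "2015-06-01"]),
})

# Subtracting two datetime columns yields timedeltas; .dt.days extracts whole days
df2["competition_time_days"] = (df2["date"] - df2["competition_since"]).dt.days

# Approximate months by treating each month as 30 days
df2["competition_time_month"] = df2["competition_time_days"] // 30
```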
