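My DataFrame has two columns: one holds the month and the other the year in which a store got a competitor. I want to combine these columns into a date and then subtract it from the date column to get the difference in days. When I run the code below, I get an error that I can't make sense of. The full error is shown below the code.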
# competition since
df2['competition_since'] = df2.apply(lambda x: datetime( year=x['competition_open_since_year'], month=x['competition_open_since_month'], day=1, axis=1 ))
df2['competition_time_month'] = ( ( df2['date'] - df2['competition_since'])/30 ).apply(lambda x: x.days).astype(int)
Error
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'competition_open_since_year'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-30-ac751656b323> in <module>
1 # competition since
----> 2 df2['competition_since'] = df2.apply(lambda x: datetime( year=x['competition_open_since_year'], month=x['competition_open_since_month'], day=1, axis=1 ))
3 df2['competition_time_month'] = ( ( df2['date'] - df2['competition_since'])/30 ).apply(lambda x: x.days).astype(int)
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
7766 kwds=kwds,
7767 )
-> 7768 return op.get_result()
7769
7770 def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/apply.py in get_result(self)
183 return self.apply_raw()
184
--> 185 return self.apply_standard()
186
187 def apply_empty_result(self):
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/apply.py in apply_standard(self)
274
275 def apply_standard(self):
--> 276 results, res_index = self.apply_series_generator()
277
278 # wrap results
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/apply.py in apply_series_generator(self)
288 for i, v in enumerate(series_gen):
289 # ignore SettingWithCopy here in case the user mutates
--> 290 results[i] = self.f(v)
291 if isinstance(results[i], ABCSeries):
292 # If we have a view on v, we need to make a copy because
<ipython-input-30-ac751656b323> in <lambda>(x)
1 # competition since
----> 2 df2['competition_since'] = df2.apply(lambda x: datetime( year=x['competition_open_since_year'], month=x['competition_open_since_month'], day=1, axis=1 ))
3 df2['competition_time_month'] = ( ( df2['date'] - df2['competition_since'])/30 ).apply(lambda x: x.days).astype(int)
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
851
852 elif key_is_scalar:
--> 853 return self._get_value(key)
854
855 if is_hashable(key):
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
959
960 # Similar to Index.get_value, but we do not fall back to positional
--> 961 loc = self.index.get_loc(label)
962 return self.index._get_values_for_loc(self, loc, label)
963
~/opt/anaconda3/envs/sales_predict_rossmann/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 'competition_open_since_year'
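When you apply a lambda to a DataFrame without `axis=1`, each column is passed to it as a Series, so the lambda's input is a column's values rather than a row. In your code, `axis=1` sits inside the `datetime(...)` call instead of `apply(...)`, which is why `x['competition_open_since_year']` raises a KeyError: that label is not in the column's row index. I think you're trying to do something like this: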
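import datetime
# Build "month/1/year" strings; astype(str) assumes both columns hold whole numbers (ints), not floats or NaN
df2['competition_since'] = df2['competition_open_since_month'].astype(str) + "/1/" + df2['competition_open_since_year'].astype(str)
# Parse each string into a datetime set to the first day of that month
df2['competition_since'] = df2['competition_since'].apply(lambda x: datetime.datetime.strptime(x, "%m/%d/%Y"))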
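For comparison, the row-wise approach from the question should also work once `axis=1` is passed to `apply` rather than to `datetime`. This is a minimal sketch, assuming both columns hold whole numbers; the small sample frame is made up for illustration and stands in for the real `df2`.

```python
from datetime import datetime

import pandas as pd

# Made-up sample data standing in for the real df2 (assumption)
df2 = pd.DataFrame({
    "competition_open_since_year": [2008, 2010],
    "competition_open_since_month": [9, 11],
})

# axis=1 is an argument of apply(), so the lambda receives one row at a time
df2["competition_since"] = df2.apply(
    lambda row: datetime(
        year=int(row["competition_open_since_year"]),
        month=int(row["competition_open_since_month"]),
        day=1,
    ),
    axis=1,
)
```

With `axis=1`, `row` is a Series indexed by column names, so looking up `row["competition_open_since_year"]` succeeds.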
After this, you can subtract it from the date column to get the difference in days, like this (I don't think you need to divide by 30):
(df2['date'] - df2['competition_since']).apply(lambda x: x.days)
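The same two steps can also be done without any `apply` call. The sketch below is one possible vectorized variant, not the only way to do it: it assumes `date` is already a datetime column and that the year and month columns contain no missing values. The sample frame and the `competition_time_days` column name are made up for illustration.

```python
import pandas as pd

# Made-up sample data standing in for the real df2 (assumption)
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2015-07-31", "2015-07-31"]),
    "competition_open_since_year": [2008, 2010],
    "competition_open_since_month": [9, 11],
})

# pd.to_datetime can assemble dates from a frame with year, month, day columns
df2["competition_since"] = pd.to_datetime(pd.DataFrame({
    "year": df2["competition_open_since_year"],
    "month": df2["competition_open_since_month"],
    "day": 1,
}))

# Subtracting two datetime columns gives timedeltas; .dt.days extracts whole days
df2["competition_time_days"] = (df2["date"] - df2["competition_since"]).dt.days
```

If an approximate month count is wanted, as the column name `competition_time_month` in the question suggests, integer-dividing the day count by 30 is one rough option.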