熊猫:如何生成"year-month"格式的列(句点)?


In [20]: df.head()
Out[20]:
   year  month     capital       sales      income      profit         debt 
0  2000      6 -19250379.0  37924704.0  -4348337.0   2571738.0  192842551.0
1  2000     12 -68357153.0  27870187.0 -49074146.0 -20764204.0  190848380.0
2  2001      6 -65048960.0  30529435.0  -1172803.0   2000427.0  197383572.0
3  2001     12 -90129943.0  17135480.0 -24208501.0  -1012230.0  191464941.0
4  2002      6  14671980.0  31377347.0   2188125.0   3660938.0  101355088.0

我尝试的是:

df['date'] = pd.to_datetime(df['year']*10000 + df['month']*100, format="%Y%m")

但发生了一个错误:

In [21]: df['date'] = pd.to_datetime(df['year']*10000 + df['month']*100, format="%Y%m"
    ...: )
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-31bfca8c5941> in <module>()
----> 1 df['date'] = pd.to_datetime(df['year']*10000 + df['month']*100, format="%Y%m")
~/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, infer_datetime_format, origin)
    507     elif isinstance(arg, ABCSeries):
    508         from pandas import Series
--> 509         values = _convert_listlike(arg._values, False, format)
    510         result = Series(values, index=arg.index, name=arg.name)
    511     elif isinstance(arg, (ABCDataFrame, MutableMapping)):
~/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in _convert_listlike(arg, box, format, name, tz)
    412                     try:
    413                         result = tslib.array_strptime(arg, format, exact=exact,
--> 414                                                       errors=errors)
    415                     except tslib.OutOfBoundsDatetime:
    416                         if errors == 'raise':
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_strptime (pandas/_libs/tslib.c:63753)()
TypeError: 'int' object is unsliceable

我认为这是因为错过了"天"。

我该怎么做?

In [223]: df['date'] = pd.to_datetime(df[['year','month']].assign(day=1)).dt.to_period('M')
In [224]: df
Out[224]:
   year  month     capital       sales      income      profit         debt    date
0  2000      6 -19250379.0  37924704.0  -4348337.0   2571738.0  192842551.0 2000-06
1  2000     12 -68357153.0  27870187.0 -49074146.0 -20764204.0  190848380.0 2000-12
2  2001      6 -65048960.0  30529435.0  -1172803.0   2000427.0  197383572.0 2001-06
3  2001     12 -90129943.0  17135480.0 -24208501.0  -1012230.0  191464941.0 2001-12
4  2002      6  14671980.0  31377347.0   2188125.0   3660938.0  101355088.0 2002-06

In [208]: df['date'] = pd.PeriodIndex(pd.to_datetime(df[['year','month']].assign(day=1)),
                                      freq='M')
In [209]: df
Out[209]:
   year  month     capital       sales      income      profit         debt    date
0  2000      6 -19250379.0  37924704.0  -4348337.0   2571738.0  192842551.0 2000-06
1  2000     12 -68357153.0  27870187.0 -49074146.0 -20764204.0  190848380.0 2000-12
2  2001      6 -65048960.0  30529435.0  -1172803.0   2000427.0  197383572.0 2001-06
3  2001     12 -90129943.0  17135480.0 -24208501.0  -1012230.0  191464941.0 2001-12
4  2002      6  14671980.0  31377347.0   2188125.0   3660938.0  101355088.0 2002-06

您也可以将系列传递给PeriodIndex,因此可以使用:

df['date'] = pd.PeriodIndex(
    year=df['year'],
    month=df['month'],
    freq='M',
)

如果您特别想要Timestamp而不是Period,则可以将dict传递给to_datetime()并将一天设置为1

df['date'] = pd.to_datetime({
    'year': df['year'],
    'month': df['month'],
    'day': 1,
})

最新更新