Why does the dtype change to object when appending a row?



I create a DataFrame, print its info, append a row, and print the info again. The dtype of every column changes to object. Why?

import numpy as np
import pandas as pd

myData = np.array([134.29, 136.97, 250.31, 312.28])
mySeries = pd.Series(myData,index=['IBM','P&G','Microsoft','Home Depot'], name="Stock Price")
myData1 = np.array(['120.573B', '336.72B', '1.885T' , '335.974B'])
mySeries1 = pd.Series(myData1, index=['IBM','P&G','Microsoft','Home Depot'], name="Market Cap")
myData2 = np.array([120_573_000_000, 336_720_000_000, 1_885_000_000_000 , 335_974_000_000])
mySeries2 = pd.Series(myData2, index=['IBM','P&G','Microsoft','Home Depot'], name="Market Cap Raw")
myDataFrame = pd.concat([mySeries, mySeries1, mySeries2], axis=1)
#print(myDataFrame)
print(myDataFrame.info())
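# After adding the row below, the dtype of the numeric columns changes to object
myData = np.array([20.99, '100M', 100000000 ])
mySeries = pd.Series(myData, index = myDataFrame.columns, name = 'HML')
# Note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0,
# so this line needs an older pandas version to run as written.
myDataFrame = myDataFrame.append(mySeries, ignore_index=False)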
#print(myDataFrame)
print(myDataFrame.info())

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, IBM to Home Depot
Data columns (total 3 columns):
#   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
0   Stock Price     4 non-null      float64
1   Market Cap      4 non-null      object 
2   Market Cap Raw  4 non-null      int64  
dtypes: float64(1), int64(1), object(1)
memory usage: 128.0+ bytes
None
<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, IBM to HML
Data columns (total 3 columns):
#   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
0   Stock Price     5 non-null      object
1   Market Cap      5 non-null      object
2   Market Cap Raw  5 non-null      object
dtypes: object(3)
memory usage: 160.0+ bytes
None

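When you create a Series from objects of different, incompatible types, the dtype of that Series becomes object.

That is exactly what happens when you create myData and mySeries the second time: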

>>> myData = np.array([20.99, '100M', 100000000 ])
>>> mySeries = pd.Series(myData, index = myDataFrame.columns, name = 'HML')
>>> mySeries.dtype
dtype('O')

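Immediately afterward, that Series (whose dtype is object) is appended to the DataFrame. Because object is more general than the dtypes of the DataFrame's individual columns, those columns are converted to the more general object dtype.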
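One way to avoid the upcast, sketched here as an assumption rather than something from the original answer, is to build the new row as a one-row DataFrame from a dict, so that each column keeps its own dtype, and then combine the frames with pd.concat. The sketch also shows that np.array coerces the mixed values to strings before pandas ever sees them. The names frame, new_row, and combined are illustrative only, and the frame is a shortened stand-in for the one in the question.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype, so the mixed values are coerced to strings.
row_values = np.array([20.99, '100M', 100000000])
print(row_values.dtype)  # a fixed-width unicode string dtype

# Shortened stand-in for the DataFrame in the question.
frame = pd.DataFrame(
    {
        "Stock Price": [134.29, 136.97],
        "Market Cap": ["120.573B", "336.72B"],
        "Market Cap Raw": [120_573_000_000, 336_720_000_000],
    },
    index=["IBM", "P&G"],
)

# Building the row from a dict gives each column its own dtype
# instead of one shared object/string dtype for the whole row.
new_row = pd.DataFrame(
    {"Stock Price": [20.99], "Market Cap": ["100M"], "Market Cap Raw": [100_000_000]},
    index=["HML"],
)

combined = pd.concat([frame, new_row])
print(combined.dtypes)
```

With this approach the numeric columns should stay float64 and int64, because no column is ever asked to hold a string next to its numbers.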

I figured out how to fix it:

tmpSeries = pd.to_numeric(myDataFrame['Stock Price'])
myDataFrame['Stock Price'] = tmpSeries

This changes the column from object to float64. to_numeric can also be used to convert to other numeric types.
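The same idea can be applied to several columns at once. The following is a minimal sketch, assuming a small hypothetical frame that mimics the state after the append (numeric values stored in object columns, with the appended row held as strings). The names frame and numeric_columns are illustrative and not from the original post.

```python
import pandas as pd

# Hypothetical frame: numeric columns stored as object, last row holds strings.
frame = pd.DataFrame(
    {
        "Stock Price": pd.Series([134.29, 136.97, "20.99"], dtype=object),
        "Market Cap": pd.Series(["120.573B", "336.72B", "100M"], dtype=object),
        "Market Cap Raw": pd.Series(
            [120_573_000_000, 336_720_000_000, "100000000"], dtype=object
        ),
    }
)

# Convert only the columns that are expected to be fully numeric.
numeric_columns = ["Stock Price", "Market Cap Raw"]
frame[numeric_columns] = frame[numeric_columns].apply(pd.to_numeric)
print(frame.dtypes)

# Values such as "120.573B" cannot be parsed; errors="coerce" turns them into NaN
# instead of raising, which is why "Market Cap" is left out of the conversion above.
coerced = pd.to_numeric(frame["Market Cap"], errors="coerce")
print(coerced.isna().all())
```

Leaving "Market Cap" as object is deliberate here: its values carry unit suffixes, so converting it would require parsing the suffix first.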
