How to keep only integer values in a pandas DataFrame



I have the following ct_data DataFrame:

imjp_number,imct_id
182467224,'ed3baabac3ce4d86801d8490ea474963|pXJjGxufodMVq5FBSzHc2A'
307291224,'__gde66a472fe104ab381456ee059751d9d|Qujk8BKa0XkkpJMCstCYBw'
214278175,'mbKKbkKpiTsIAyCE8y07rw|e8133ceeca654d169532b4ad4de661d5'
tes123456,'tMyM0un_ptsHHC-lET6tkQ|87538a4436af47a7a9b8b9bc2b3ec5ba'
Not Found,'pXJjGxufodMVq5FBSzHc2A'

I am applying the logic below, but it does not work.

ct_data['imjp_number']  = ct_data.loc[ct_data['imjp_number'].apply(lambda x: isinstance(x,int)), 'imjp_number']

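Please suggest the best way to keep only the rows of the ct_data df whose imjp_number is an integer, removing the 'tes123456' and 'Not Found' values from the imjp_number column.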

>>> print(df.to_string()) 
imjp_number                                                     imct_id
0   182467224     ed3baabac3ce4d86801d8490ea474963|pXJjGxufodMVq5FBSzHc2A
1   307291224  __gde66a472fe104ab381456ee059751d9d|Qujk8BKa0XkkpJMCstCYBw
2   214278175     mbKKbkKpiTsIAyCE8y07rw|e8133ceeca654d169532b4ad4de661d5
3   tes123456     tMyM0un_ptsHHC-lET6tkQ|87538a4436af47a7a9b8b9bc2b3ec5ba
4   Not Found                                      pXJjGxufodMVq5FBSzHc2A
>>> print(df.imjp_number.str.isdigit().to_string())
0     True
1     True
2     True
3    False
4    False
>>> print(df[df.imjp_number.str.isdigit()].to_string())
imjp_number                                                     imct_id
0   182467224     ed3baabac3ce4d86801d8490ea474963|pXJjGxufodMVq5FBSzHc2A
1   307291224  __gde66a472fe104ab381456ee059751d9d|Qujk8BKa0XkkpJMCstCYBw
2   214278175     mbKKbkKpiTsIAyCE8y07rw|e8133ceeca654d169532b4ad4de661d5
>>>
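
The isinstance(x, int) check in the question most likely fails because a column that contains values such as 'tes123456' is loaded as strings, so no cell is an int. Below is a minimal, self-contained sketch of the str.isdigit() filter shown above. The imct_id values are shortened placeholders, and the astype(str) guard and the final integer conversion are my additions, not part of the original session.

```python
import pandas as pd

# Sample data mirroring the question; imct_id values are shortened placeholders.
ct_data = pd.DataFrame({
    "imjp_number": ["182467224", "307291224", "214278175", "tes123456", "Not Found"],
    "imct_id": ["a|b", "c|d", "e|f", "g|h", "i"],
})

# astype(str) guards against non-string cells, for which .str.isdigit() yields NaN.
mask = ct_data["imjp_number"].astype(str).str.isdigit()

# Keep only the all-digit rows, then convert the column to a real integer dtype.
clean = ct_data[mask].copy()
clean["imjp_number"] = clean["imjp_number"].astype("int64")

print(clean.to_string())
```

Note that str.isdigit() rejects signed or decimal strings such as '-5' or '1.0', which may or may not be what you want.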

From the second question I linked in the comments:

>>> print(df.to_string())
imjp_number                                                     imct_id
0   182467224     ed3baabac3ce4d86801d8490ea474963|pXJjGxufodMVq5FBSzHc2A
1   307291224  __gde66a472fe104ab381456ee059751d9d|Qujk8BKa0XkkpJMCstCYBw
2   214278175     mbKKbkKpiTsIAyCE8y07rw|e8133ceeca654d169532b4ad4de661d5
3   tes123456     tMyM0un_ptsHHC-lET6tkQ|87538a4436af47a7a9b8b9bc2b3ec5ba
4   Not Found                                      pXJjGxufodMVq5FBSzHc2A
>>>
>>> print(pd.to_numeric(df.imjp_number, errors='coerce').to_string())
0    182467224.0
1    307291224.0
2    214278175.0
3            NaN
4            NaN
>>>
>>> print(pd.to_numeric(df.imjp_number, errors='coerce').notnull().to_string())
0     True
1     True
2     True
3    False
4    False
>>>
>>> print(df[pd.to_numeric(df.imjp_number, errors='coerce').notnull()].to_string())
imjp_number                                                     imct_id
0   182467224     ed3baabac3ce4d86801d8490ea474963|pXJjGxufodMVq5FBSzHc2A
1   307291224  __gde66a472fe104ab381456ee059751d9d|Qujk8BKa0XkkpJMCstCYBw
2   214278175     mbKKbkKpiTsIAyCE8y07rw|e8133ceeca654d169532b4ad4de661d5
>>>
>>> df = df[pd.to_numeric(df.imjp_number, errors='coerce').notnull()]              
>>> print(df.to_string())                                                           
imjp_number                                                     imct_id
0   182467224     ed3baabac3ce4d86801d8490ea474963|pXJjGxufodMVq5FBSzHc2A
1   307291224  __gde66a472fe104ab381456ee059751d9d|Qujk8BKa0XkkpJMCstCYBw
2   214278175     mbKKbkKpiTsIAyCE8y07rw|e8133ceeca654d169532b4ad4de661d5
>>>
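
One caveat with the pd.to_numeric approach: it also accepts non-integer numbers, and the coerced column is float. The sketch below is one way to restrict the result to whole numbers and restore an integer dtype. The '1.5' row is a hypothetical value added only to illustrate the difference, and the imct_id values are shortened placeholders.

```python
import pandas as pd

# Sample data based on the question; "1.5" is a hypothetical extra value.
ct_data = pd.DataFrame({
    "imjp_number": ["182467224", "307291224", "1.5", "tes123456", "Not Found"],
    "imct_id": ["a|b", "c|d", "e|f", "g|h", "i"],
})

# Unparseable strings become NaN instead of raising an error.
numeric = pd.to_numeric(ct_data["imjp_number"], errors="coerce")

# notna() drops unparseable values; the modulo test drops non-whole numbers.
# NaN % 1 is NaN, and NaN == 0 is False, so NaN rows are excluded either way.
mask = numeric.notna() & (numeric % 1 == 0)

# Filter the rows and store the parsed values as integers.
clean = ct_data[mask].copy()
clean["imjp_number"] = numeric[mask].astype("int64")

print(clean.to_string())
```

If fractional values should be kept as well, the notnull() mask from the session above is sufficient on its own.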
