How do I split a dataset in Pandas?


firstpart = D2.loc[(D2['Age'] == "15") 
| (D2['Age'] == "16")
| (D2['City'] == "Paris")
| (D2['City'] == "London")
| (D2['City'] == "Istanbul")
| (D2['Health'] == "Ok")
]

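This is how I get the part I want from the dataset, but I would also like to take the rest of the dataset and save it as a new dataset. Does pandas have a function that makes this easy?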

Here is my sample code for you to look at.

import pandas as pd
df = pd.DataFrame()
df['Age'] = ["16","17","18"]
df['City'] = ['Paris']*2 + ["London"]
df['Health'] = ['OK']*2 + ['Not Ok']
mask = (df['Age'] == "16") | (df['City'] == "Paris") | (df['Health'] == 'OK')
print(df[mask])
print(df[~mask])

Result

  Age   City Health
0  16  Paris     OK
1  17  Paris     OK

  Age    City  Health
2  18  London  Not Ok

You can split your code into

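# Wrap the whole condition in parentheses so it can span several lines
mask = (
    (D2['Age'] == "15")
    | (D2['Age'] == "16")
    | (D2['City'] == "Paris")
    | (D2['City'] == "London")
    | (D2['City'] == "Istanbul")
    | (D2['Health'] == "Ok")
)
first_part = D2[mask]   # rows matching at least one condition
rest_data = D2[~mask]   # all remaining rows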

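Here I use ~ to invert the boolean mask: every True becomes False and every False becomes True, so D2[~mask] selects exactly the rows that D2[mask] leaves out.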
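As a rough illustration of the inversion, here is a minimal sketch on made-up data. The contents of D2 below are hypothetical stand-ins, since the original dataset is not shown, and the use of isin to shorten the repeated == comparisons is an optional variation, not part of the code above.

```python
import pandas as pd

# Hypothetical sample data standing in for the real D2
D2 = pd.DataFrame({
    "Age": ["15", "17", "18", "16"],
    "City": ["Paris", "Berlin", "Rome", "Madrid"],
    "Health": ["Ok", "Not Ok", "Ok", "Not Ok"],
})

# isin() tests membership in a list, replacing chains of == joined by |
mask = (
    D2["Age"].isin(["15", "16"])
    | D2["City"].isin(["Paris", "London", "Istanbul"])
    | (D2["Health"] == "Ok")
)

first_part = D2[mask]    # rows where the mask is True
rest_data = D2[~mask]    # rows where the mask is False

print(first_part)
print(rest_data)
```

Every row ends up in exactly one of the two frames, so their lengths add up to len(D2). If the remainder has to be written to disk, rest_data.to_csv with a file path of your choice is one option.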
