Split a DataFrame into two based on null values and a status value



I have a DataFrame that I want to split into two DataFrames based on several columns.

df should contain every row that has no null in any column and whose Status is Yes. All remaining rows should go into df_null.

df (loaded from vehicle.csv):
Status  Country City     Year 
Yes     USA     New York 2001
Yes     Canada           2001
Yes     France  Paris    
No              Rio      1843
No      Germany Berlin   2008
Yes                      2004
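If vehicle.csv is an ordinary comma-separated file, pandas.read_csv already turns empty fields into NaN by default, which is what lets notnull()/isnull() see the blanks. A minimal sketch, assuming the file looks like the hypothetical text below (the real file layout is not shown in the question):

```python
import io

import pandas as pd

# Hypothetical comma-separated contents mirroring the table above;
# the actual layout of vehicle.csv is an assumption.
csv_text = """Status,Country,City,Year
Yes,USA,New York,2001
Yes,Canada,,2001
Yes,France,Paris,
No,,Rio,1843
No,Germany,Berlin,2008
Yes,,,2004
"""

# Empty fields are parsed as NaN by default, so null checks work on them
df = pd.read_csv(io.StringIO(csv_text))
print(df.isnull().sum())
```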
import pandas as pd
# df_null has all the rows with a null in any of the three columns
df_null = df[~df[['Country', 'City', 'Year']].notnull().all(axis=1)]
# df keeps rows with no nulls in those columns and Status == 'Yes'
df = df[df[['Country', 'City', 'Year']].notnull().all(axis=1)]
df = df.loc[df['Status'] == 'Yes']
result = pd.concat([df, df_null])

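The Germany row is missing from the resulting DataFrame: the Status = Yes filter removes it from df, and because it has no nulls it never gets into df_null either.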
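One way to keep the question's two filters and still catch the Germany row is to define df_null as the complement of the rows that survive, instead of filtering for nulls separately. A sketch using the same sample data, with None standing in for the blank cells (df_ok is a hypothetical name chosen so the original df is not overwritten):

```python
import pandas as pd

# Same sample data as in the question, with None for the blank cells
df = pd.DataFrame(
    [
        ["Yes", "USA", "New York", 2001],
        ["Yes", "Canada", None, 2001],
        ["Yes", "France", "Paris", None],
        ["No", None, "Rio", 1843],
        ["No", "Germany", "Berlin", 2008],
        ["Yes", None, None, 2004],
    ],
    columns=["Status", "Country", "City", "Year"],
)
cols = ["Country", "City", "Year"]

# The question's two filters, written to a new name
df_ok = df[df[cols].notnull().all(axis=1)]
df_ok = df_ok.loc[df_ok["Status"] == "Yes"]

# df_null is every row NOT kept, so the Germany row (Status == "No") lands here
df_null = df[~df.index.isin(df_ok.index)]

result = pd.concat([df_ok, df_null])
print(result)
```

Because df_null is derived from df_ok's index, no row can fall through the gap between the two conditions.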

You can build a boolean mask with the code below:

import pandas as pd

# Build the sample data
df = pd.DataFrame(
[
["Yes", "USA", "New York", 2001],
["Yes", "Canada", None, 2001],
["Yes", "France", "Paris", None],
["No", None, "Rio", 1843],
["No", "Germany", "Berlin", 2008],
["Yes", None, None, 2004],
],
columns=["Status", "Country", "City", "Year"],
)
# Create Mask
valid_rows = (df[["Country", "City", "Year"]].notnull().all(1)) & (df["Status"] == "Yes")
df_null = df[~valid_rows]  # Filter by inverse of mask
df = df[valid_rows]  # Filter by mask

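This leaves df with only the USA / New York / 2001 row; every other row, including the Germany one, ends up in df_null.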
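The same mask can be wrapped in a small reusable function so the columns and the required status are parameters. This is a sketch; split_valid and its argument names are hypothetical, not part of pandas:

```python
import pandas as pd


def split_valid(df, cols, status_col="Status", status_value="Yes"):
    """Hypothetical helper: return (valid, rest) using a single boolean mask.

    A row is valid when none of `cols` is null and `status_col` equals `status_value`.
    """
    valid = df[cols].notnull().all(axis=1) & (df[status_col] == status_value)
    return df[valid], df[~valid]


df = pd.DataFrame(
    [
        ["Yes", "USA", "New York", 2001],
        ["Yes", "Canada", None, 2001],
        ["Yes", "France", "Paris", None],
        ["No", None, "Rio", 1843],
        ["No", "Germany", "Berlin", 2008],
        ["Yes", None, None, 2004],
    ],
    columns=["Status", "Country", "City", "Year"],
)

good, rest = split_valid(df, ["Country", "City", "Year"])
print(good)
print(rest)
```

Since rest is built from the negation of the same mask, every row lands in exactly one of the two frames.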

Is this what you're looking for?

import pandas as pd
import numpy as np
# Sample data; empty strings stand in for missing values
data = [['Yes', 'USA', 'New York', 2001],
        ['Yes', 'Canada', '', 2001],
        ['Yes', 'France', 'Paris', ''],
        ['No', '', 'Rio', 1843],
        ['No', 'Germany', 'Berlin', 2008],
        ['Yes', '', '', 2004]]
df = pd.DataFrame(data, columns=["Status", "Country", "City", "Year"])
# Turn empty strings into real NaN so dropna() can see them
df = df.replace('', np.nan)
# Keep Status == 'Yes' rows, then drop any row that still has a NaN
df_new = df[df.Status == 'Yes'].dropna()
# Everything that did not survive goes into df_null
df_null = df.drop(df_new.index)
# Print the two DataFrames
print(df_new)
print(df_null)
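The replace('', np.nan) step matters because pandas treats an empty string as an ordinary value, not as missing. A short illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series(["Paris", "", None])

# An empty string is a real value, so isnull() only flags the None
print(s.isnull().tolist())

# After mapping '' to NaN, both blanks count as missing
print(s.replace("", np.nan).isnull().tolist())
```

The first print gives [False, False, True] and the second gives [False, True, True].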

If your problem is simply splitting a DataFrame on null values in one column, the following is enough:

DF_null = processed_records_DF[processed_records_DF['ColumnName'].isnull()]

DF_not_null = processed_records_DF[processed_records_DF['ColumnName'].notnull()]
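The single-column split extends to "a null in any of several columns" with isnull().any(axis=1). A small runnable sketch on made-up data (the frame below is illustrative, not processed_records_DF):

```python
import pandas as pd

df = pd.DataFrame(
    {"Country": ["USA", None, "Germany"], "City": ["New York", "Rio", None]}
)

# Split on a single column
country_null = df[df["Country"].isnull()]
country_not_null = df[df["Country"].notnull()]

# Split on "null in any of these columns"
any_null = df[["Country", "City"]].isnull().any(axis=1)
rows_with_null = df[any_null]
complete_rows = df[~any_null]

print(rows_with_null)
print(complete_rows)
```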
