Fill missing values in selected columns with filtered values from another column



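I have a strange column named null in a dataframe that contains some values missing from other columns. One column, named location, holds lat-lon coordinates; the other, named level, is an integer representing the target variable. In some (but not all) cases where location or level is missing a value, the value that should be there is in this null column. Here is a sample df: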

import pandas as pd
from numpy import nan

df = pd.DataFrame(
    {'null': {0: '43.70477575,-72.28844073', 1: '2', 2: '43.70637091,-72.28704334', 3: '4', 4: '3'},
     'location': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
     'level': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}
    }
)

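I need to be able to filter the null column based on whether the value is an integer or a string, and then use that to fill the missing values in the appropriate column with the appropriate value. I have tried .apply() with a lambda function in a for loop, as well as .match(), .contains(), and in, but with no luck so far.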

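The easiest (if not the simplest) approach is to fill all missing values in df.location and df.level with the values in df.null, then use regex to build boolean filters that set the inappropriate or misassigned values in df.location and df.level back to np.nan.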

Series.fillna()

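import numpy as np
import pandas as pd
from numpy import nan

df = pd.DataFrame(
    {'null': {0: '43.70477575,-72.28844073', 1: '2', 2: '43.70637091,-72.28704334', 3: '4', 4: '3'},
     'location': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
     'level': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}
    }
)
# Fill every missing value in both columns from `null`;
# values that land in the wrong column are corrected afterwards
for col in ['location', 'level']:
    df[col] = df[col].fillna(df['null'])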

Now we'll use string expressions to correct the misassigned values.

str.contains()

import numpy as np

# Convert columns to type str so the string methods work
df = df.astype(str)
# Use regex to set values that don't belong in a column to NaN
regex = '[,]'
df.loc[df.level.str.contains(regex), 'level'] = np.nan

regex = r'^\d\.?0?$'
df.loc[df.location.str.contains(regex), 'location'] = np.nan

# Return `df.level` to float dtype (str is the correct
# dtype for `df.location`); astype returns a new Series,
# so the result has to be assigned back
df['level'] = df.level.astype(float)

Here is the output:

pd.DataFrame(
{'null': {0: '43.70477575,-72.28844073', 1: '2', 2: '43.70637091,-72.28704334', 3: '4', 4: '3'},
'location': {0: '43.70477575,-72.28844073', 1: nan, 2: '43.70637091,-72.28704334', 3: nan, 4: nan},
'level': {0: nan, 1: 2.0, 2: nan, 3: 4.0, 4: 3.0}
}
)
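As a quick check on the two patterns, the sketch below runs them against a few sample strings. It assumes the intended second pattern is r'^\d\.?0?$'; the Series name values and the extra entries '4.0' and '12' are made up for illustration and are not part of the original data. It suggests that the second pattern only recognises a single digit optionally followed by a dot and/or a zero, so a multi-digit level such as '12' would be left in location.

```python
import pandas as pd

# Hypothetical sample values: one coordinate pair and three level-like strings
values = pd.Series(['43.70477575,-72.28844073', '2', '4.0', '12'])

# A comma marks a lat-lon pair, so it does not belong in `level`
has_comma = values.str.contains('[,]')

# Assumed pattern: one digit, optional dot, optional zero
level_like = values.str.contains(r'^\d\.?0?$')

print(has_comma.tolist())
print(level_like.tolist())
```

If levels can have more than one digit, a broader pattern or a numeric conversion may be a better fit.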

Let's try to_numeric:

checker = pd.to_numeric(df.null, errors='coerce')
checker
Out[171]: 
0    NaN
1    2.0
2    NaN
3    4.0
4    3.0
Name: null, dtype: float64

Then apply isnull: a NaN result means the string is not a number.

isstring = checker.isnull()
isstring
Out[172]: 
0     True
1    False
2     True
3    False
4    False
Name: null, dtype: bool
isnumber = checker.notnull()

Fill in the values:

df.loc[isstring, 'location'] = df['null']
df.loc[isnumber, 'level'] = df['null']
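Putting the to_numeric steps together, a self-contained sketch might look like the following. It assumes the sample frame from the question. The names fill_location and fill_level are introduced here for illustration, and both the extra isnull() guard (so that values already present are not overwritten) and the cast of location to object are assumptions rather than part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'null': ['43.70477575,-72.28844073', '2',
             '43.70637091,-72.28704334', '4', '3'],
    'location': [np.nan] * 5,
    'level': [np.nan] * 5,
})

# Strings that cannot be parsed as numbers become NaN
checker = pd.to_numeric(df['null'], errors='coerce')
isnumber = checker.notnull()
isstring = checker.isnull()

# Only touch cells that are still missing in the target column
fill_location = isstring & df['location'].isnull()
fill_level = isnumber & df['level'].isnull()

# `location` holds text, so make the column object-typed before assigning
df['location'] = df['location'].astype(object)
df.loc[fill_location, 'location'] = df.loc[fill_location, 'null']

# Use the parsed numbers so `level` stays numeric
df.loc[fill_level, 'level'] = checker[fill_level]

print(df)
```

Writing the parsed values from checker into level keeps that column as float, which avoids a separate astype step afterwards.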

Another approach might use the pandas.Series.mask method:

>>> df
                       null  location  level
0  43.70477575,-72.28844073       NaN    NaN
1                         2       NaN    NaN
2  43.70637091,-72.28704334       NaN    NaN
3                         4       NaN    NaN
4                         3       NaN    NaN
>>> # mask replaces values where the condition is True; where replaces them where it is False
>>> df['level'] = df.level.mask(df.null.str.isnumeric(), other=df.null)
>>> df['location'] = df.location.where(df.null.str.isnumeric(), other=df.null)
>>>
>>> df
                       null                  location  level
0  43.70477575,-72.28844073  43.70477575,-72.28844073    NaN
1                         2                       NaN      2
2  43.70637091,-72.28704334  43.70637091,-72.28704334    NaN
3                         4                       NaN      4
4                         3                       NaN      3

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.mask.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.where.html
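One thing to keep in mind when choosing between str.isnumeric() and to_numeric as the test: str.isnumeric() only accepts strings made up entirely of numeric characters, so a decimal point or a minus sign makes it return False. The sketch below compares the two on a few hypothetical values (the Series samples is not from the original data).

```python
import pandas as pd

# Hypothetical values: plain integer, decimal, negative, coordinate pair
samples = pd.Series(['3', '2.0', '-3', '43.70477575,-72.28844073'])

# True only when every character is numeric
by_isnumeric = samples.str.isnumeric()

# True whenever the whole string parses as a number
by_to_numeric = pd.to_numeric(samples, errors='coerce').notnull()

print(by_isnumeric.tolist())
print(by_to_numeric.tolist())
```

For the sample data in the question, where every level is a plain non-negative integer string, both tests give the same split.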
