Handling Null values when applying a function to a DataFrame column



I'm trying to use the spaCy library to classify the values in a DataFrame column as cities (or not). My DataFrame looks like this:

    City Match eLocations Match Country Match Region Match CountryCity Match  Null Count  Null Percent
0  Los Angeles       Long Beach    Long Beach   Long Beach       Los Angeles           0           0.0
2       Santos           Santos        Santos       Santos            Santos           0           0.0
5          NaN          Stewart       Stewart      Stewart               NaN           2          40.0
7          NaN           Meling        Meling       Meling               NaN           2          40.0
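To reproduce the problem without the original data, the printed frame can be rebuilt by hand. This is a sketch that copies only the values shown above, including the 0, 2, 5, 7 index labels. The question does not say how the real frame was loaded, so the column dtypes here are an assumption.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the printed sample; missing cities are np.nan (a float)
matchCols = pd.DataFrame(
    {
        "City Match": ["Los Angeles", "Santos", np.nan, np.nan],
        "eLocations Match": ["Long Beach", "Santos", "Stewart", "Meling"],
        "Country Match": ["Long Beach", "Santos", "Stewart", "Meling"],
        "Region Match": ["Long Beach", "Santos", "Stewart", "Meling"],
        "CountryCity Match": ["Los Angeles", "Santos", np.nan, np.nan],
        "Null Count": [0, 0, 2, 2],
        "Null Percent": [0.0, 0.0, 40.0, 40.0],
    },
    index=[0, 2, 5, 7],  # same index labels as the printout
)
print(matchCols)
```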

I'm trying to create an additional column called "Spacey type" based on the entity type the library returns. My initial function looks like this:

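import spacy

nlp = spacy.load("en_core_web_sm")  # assumed pipeline; the question does not say which one was loaded

def setSpace(cellValue):
    doc1 = nlp(cellValue)
    # Print the first entity spaCy found and return its label;
    # with no entities the loop body never runs and the function returns None
    for ent in doc1.ents:
        print(ent.text, ent.start_char, ent.end_char, ent.label_)
        return ent.label_

matchCols['Spacey type'] = matchCols['City Match'].apply(setSpace)
#### OUTPUT: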
(Los Angeles,)
Los Angeles 0 11 GPE
()
Traceback (most recent call last):
...
TypeError: object of type 'float' has no len()

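where nlp is the spaCy pipeline that classifies a piece of text as a city, company, person, and so on. However, when I run it I keep getting TypeError: object of type 'float' has no len(), which makes sense, because two of the rows contain null values (pandas stores them as the float NaN). How do I handle these null values? I can't for the life of me get rid of this error. I've also tried a few other approaches: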

def setSpace(cellValue):
    doc1 = nlp(cellValue)
    print(doc1.ents)
    # Keep only non-empty entity spans (len() of a spaCy Span is its token count)
    gen = (ent for ent in doc1.ents if len(ent) > 0)
    for ent in gen:
        print(ent.text, ent.start_char, ent.end_char, ent.label_)
        return ent.label_

matchCols['Spacey type'] = matchCols['City Match'].apply(setSpace)
##### AND ....

def setSpace(cellValue):
    # Meant to return 0 for null cells before nlp() is called
    if cellValue is "nan":
        return 0
    doc1 = nlp(cellValue)
    print(doc1.ents)
    for ent in doc1.ents:
        print(ent.text, ent.start_char, ent.end_char, ent.label_)
        return ent.label_

matchCols['Spacey type'] = matchCols['City Match'].apply(setSpace)
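All three attempts run into the same fact: a missing cell in the 'City Match' column is not the string "nan" but a float NaN. So `cellValue is "nan"` is never true (and `is` tests identity, not equality), and the `len(ent) > 0` filter in the second attempt only runs after nlp() has already received the float. A minimal sketch, using only pandas and a two-row stand-in for the column, that shows what the missing cell actually is:

```python
import math

import pandas as pd

# Two-row stand-in for the 'City Match' column from the question
col = pd.Series(["Los Angeles", float("nan")], name="City Match")
cell = col.iloc[1]

print(type(cell))     # <class 'float'>
print(cell == "nan")  # False: the cell is not the string "nan"
print(math.isnan(cell), pd.isna(cell))  # True True

# len() on a float raises the same TypeError seen in the traceback
try:
    len(cell)
except TypeError as exc:
    print(exc)  # object of type 'float' has no len()
```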

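How can I apply my function so that it returns the type from spaCy, or returns 0 if the cell is null? It gets through Los Angeles fine, but trips up after that: Santos returns nothing from spaCy (as it should), and then the NaN value gets passed in.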

Thanks

You can use pd.isna() to check whether the value of a single cell is null. (Docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.isna.html)

>>> import pandas as pd
>>> pd.isna('dog')
False
>>> pd.isna(pd.NA)
True
>>> pd.isna(float('nan'))
True
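Putting that check at the top of the function means the NaN never reaches nlp(). The sketch below is one way to do it, not the only one: it returns 0 for null cells, as the question asks, and otherwise the label of the first entity found (None when there is none, as with Santos). The helper name set_spacy_type and the fake_nlp stand-in are hypothetical; fake_nlp exists only so the example runs without a spaCy model, and in practice you would pass the real nlp from spacy.load(...).

```python
from types import SimpleNamespace

import pandas as pd


def set_spacy_type(cell_value, nlp, default=0):
    """Return the label of the first entity nlp() finds in cell_value.

    Null cells (NaN, None, pd.NA) short-circuit to `default` before nlp()
    is called; cells with no entity give None, like the original setSpace.
    """
    if pd.isna(cell_value):
        return default
    doc = nlp(cell_value)
    # First entity label, or None when doc.ents is empty
    return next((ent.label_ for ent in doc.ents), None)


def fake_nlp(text):
    # Hypothetical stand-in for a spaCy pipeline: it only recognises
    # "Los Angeles" (as GPE) and returns an object with an .ents list.
    ents = [SimpleNamespace(label_="GPE")] if text == "Los Angeles" else []
    return SimpleNamespace(ents=ents)


# Small stand-in for the question's matchCols
matchCols = pd.DataFrame({"City Match": ["Los Angeles", "Santos", float("nan")]})
# Series.apply passes `args` to the function after the cell value
matchCols["Spacey type"] = matchCols["City Match"].apply(set_spacy_type, args=(fake_nlp,))
print(matchCols)
```

With a real pipeline the call becomes matchCols['City Match'].apply(set_spacy_type, args=(nlp,)). Keeping the null check inside the function, rather than filling the column afterwards, keeps the "no entity" result (None) distinct from the "null cell" result (0).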
