Filtering a CSV with pandas

I'm having trouble filtering this CSV file.

Here are some of the entries in the CSV:

Name      Info                                 Bio
Alice     Woman: 21y (USA)                     Actress
Breonna   Woman: (France)                      Singer
Carla     Woman: 30y (Trinidad and Tobago)     Actress
Diana     Woman: (USA)                         Singer

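I'm trying to filter the "Info" column to get a list of all the countries and how often each one appears. I'm also trying to do the same for age. As you can see, not all of the women have given their age.

I tried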

import pandas as pd
women = pd.read_csv('women.csv')
women_count = pd.Series(' '.join(women.Info).split()).value_counts()

However, this splits on every space and outputs:

Woman:     4
(USA)      2
21y        1
(Trinidad  1
and        1
Tobago)    1
30y        1

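I should add that I also tried women_filtered = women[women['Info'] == '(USA)'], but that doesn't work.

My questions are:

How can I split the string so that I can filter by country, especially since every country is in parentheses?
How can I filter for the entries that have no age?

Thanks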

import pandas as pd
df = pd.DataFrame(
{'Name':['Alice', 'Breonna', 'Carla', 'Diana'],
'Info':['Woman: 21y (USA)', 'Woman: (France)', 'Woman: 30y (Trinidad and Tobago)', 'Woman: (USA)'],
'Bio':['Actress', 'Singer', 'Actress', 'Singer']}
)
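# define new columns using regex
# country: the text between the parentheses
df['country'] = df['Info'].str.extract(r'\(([^)]+)\)')
# age: two digits followed by 'y', surrounded by whitespace (NaN when absent)
df['age'] = df['Info'].str.extract(r'\s+(\d{2})y\s+').astype(float)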
df['noage'] = df['age'].isnull().astype(int)
# frequency of countries
sizes = df.groupby('country').size()
sizes

This outputs the frequencies.

country
France                 1
Trinidad and Tobago    1
USA                    2
dtype: int64
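Once the country and age are in their own columns, both questions come down to ordinary boolean filtering. A minimal sketch, which rebuilds the sample data so it runs on its own; the names usa and no_age are only illustrative, and the looser age pattern (\d+)y is my assumption about how ages are written in the real file:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Breonna', 'Carla', 'Diana'],
    'Info': ['Woman: 21y (USA)', 'Woman: (France)',
             'Woman: 30y (Trinidad and Tobago)', 'Woman: (USA)'],
    'Bio': ['Actress', 'Singer', 'Actress', 'Singer'],
})

# expand=False returns a Series when the pattern has a single group
df['country'] = df['Info'].str.extract(r'\(([^)]+)\)', expand=False)
# assumed age format: one or more digits directly followed by 'y'
df['age'] = df['Info'].str.extract(r'(\d+)y', expand=False).astype(float)

# rows for one country
usa = df[df['country'] == 'USA']

# rows where no age was found
no_age = df[df['age'].isna()]

print(usa[['Name', 'country']])
print(no_age[['Name', 'age']])
```

The same value_counts or groupby approach used for the countries should also work on the age column if you want age frequencies.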

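I would look up how to write regex expressions, so that you can learn to extract information from strings yourself. Pythex.org is a good site for trying out regex expressions in Python, and it gives some useful hints.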
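If you would rather experiment locally, the standard re module can be used in much the same way. A rough sketch, assuming the Info strings look like the ones in the question; the pattern and variable names here are my own:

```python
import re

samples = [
    'Woman: 21y (USA)',
    'Woman: (France)',
    'Woman: 30y (Trinidad and Tobago)',
]

# \( and \) match literal parentheses; ([^)]+) captures what is between them
country_pattern = re.compile(r'\(([^)]+)\)')
# one or more digits directly followed by 'y' (assumed age format)
age_pattern = re.compile(r'(\d+)y')

results = []
for text in samples:
    country = country_pattern.search(text)
    age = age_pattern.search(text)
    results.append((
        country.group(1) if country else None,
        age.group(1) if age else None,
    ))

for text, (country, age) in zip(samples, results):
    print(f'{text!r}: country={country}, age={age}')
```

Note the raw strings (the r prefix): without them it is easy to lose the backslashes that make \( and \d mean what you intend.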

print(df)

Name                       Info      Bio
0    Alice           Woman: 21y (USA)  Actress
1    Carla  30y (Trinidad and Tobago)   Singer
2  Breonna            Woman: (France)  Actress
3    Diana               Woman: (USA)   Singer
#Solution

#Extract age and name of countries
df=df.assign(Age=df.Info.str.extract(r'(\d+(?=\D))'), Countries=df.Info.str.extract(r'\((.*?)\)'))
Name                       Info             Bio     Age                   Countries
0    Alice           Woman: 21y (USA)  Actress   21                  USA
1    Carla  30y (Trinidad and Tobago)   Singer   30  Trinidad and Tobago
2  Breonna            Woman: (France)  Actress  NaN               France
3    Diana               Woman: (USA)   Singer  NaN                  USA


#Filter without Age
df[df.Age.isna()]
Name             Info      Bio  Age  Countries
2  Breonna  Woman: (France)  Actress  NaN    France
3    Diana     Woman: (USA)   Singer  NaN       USA
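As for why women['Info'] == '(USA)' returned nothing: == compares the whole cell, and no cell is exactly '(USA)'. If you only need to filter by a country and do not need a separate column, a substring test may be enough. A small sketch using the data from the question; regex=False makes the parentheses be treated as literal characters:

```python
import pandas as pd

women = pd.DataFrame({
    'Name': ['Alice', 'Breonna', 'Carla', 'Diana'],
    'Info': ['Woman: 21y (USA)', 'Woman: (France)',
             'Woman: 30y (Trinidad and Tobago)', 'Woman: (USA)'],
})

# exact equality: the full string must be '(USA)', so nothing matches
exact_match = women[women['Info'] == '(USA)']

# literal substring match: '(USA)' anywhere in the cell
usa_rows = women[women['Info'].str.contains('(USA)', regex=False)]

print(len(exact_match))
print(usa_rows)
```

Keeping the parentheses in the search string avoids accidentally matching a country whose name merely contains the same letters.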
