pandas - question about working with Series (should be easy)



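Hi all. I'm currently trying to answer the question below. I have the correct answer (Nolan and 7), but it is not in Series format and I don't know how to get it into one. Can anyone help?

I've included the earlier questions as context.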

import pandas as pd
xls = pd.ExcelFile('imdb.xlsx')
df = xls.parse('imdb')
df_directors = xls.parse('directors')
df_countries = xls.parse('countries')
print("Data Loading Finished.")
""" Q1: 
Join three Dataframes: df, df_directors, and df_countries with an inner join.
Store the joined DataFrames in df.
Here are the steps:
1. Merge df with df_countries and assign it df
2. Merge df with df_directors and assign it to df again
There might be errors if the merge is not in this order, so please be careful.
"""
# your code here
df.head()
df = pd.merge(left=df, right=df_countries, how='inner', left_on='country_id', right_on='id')
df.head()
df = pd.merge(left=df, right=df_directors, how='inner', left_on='director_id', right_on='id')

# After the join, the resulting Dataframe should have 12 columns.
df.shape
""" Q4:
Who is the director with the most movies? First get the number of movies per "director_name", then save the director's name
and count as a series of length 1 called "director_with_most"
"""
# your code here
directors = df['director_name'].value_counts()
print(directors)
director_with_most = directors[]
directors.index[0]
directors[0]
print(director_with_most)

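directors.index[0] gives the result Nolan, and directors[0] gives the number of times he appears in the database: 7. When I check my answer (this is from a Coursera course), the error I get is: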

AssertionError: Series Expected type <class 'pandas.core.series.Series'>, found <class 'list'> instead

Please help, I've been stuck on this for a long time.

Cheers, Adam

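Posting a question like this to SO is bad form. Also, directors[0] is the highest count only because value_counts() sorts in descending order by default, and your code never states that assumption, so you are not at a solution yet.

That said, I really dislike how this task is worded. What is a Series of length 1 that holds two values even supposed to mean? Silly. Do this: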

director_with_most = df.director_name.value_counts().loc[lambda x: x == x.max()]

(If the maximum is not unique, this returns multiple rows.)

This might solve your problem:
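
To see the tie behaviour concretely, here is a minimal sketch on made-up data. The toy frame and its director names are invented for illustration and are not taken from imdb.xlsx:

```python
import pandas as pd

# Toy stand-in for the course's df; names and counts are made up.
toy = pd.DataFrame({"director_name": ["A", "A", "B", "B", "C"]})

# Number of movies per director.
counts = toy["director_name"].value_counts()

# Keep every entry whose count equals the maximum count.
top = counts.loc[lambda x: x == x.max()]

print(type(top).__name__)  # Series
print(len(top))            # 2, because A and B tie at two movies each
```

With a unique maximum the same expression yields a Series of length 1, with the director's name as the index label and the count as the value.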

director_with_most = df['director_name'].value_counts().head(1)
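
A rough sketch of why this gets past the type check, assuming the grader only tests for a pandas Series of length 1 (the data below is invented, not the course data). Because value_counts() sorts in descending order by default, head(1) keeps the top entry as a Series, whereas collecting the name and count by hand produces a plain list:

```python
import pandas as pd

# Toy stand-in for the course's df; names and counts are made up.
toy = pd.DataFrame(
    {"director_name": ["Nolan", "Nolan", "Nolan", "Scott", "Scott", "Lee"]}
)
counts = toy["director_name"].value_counts()

# Collecting the pieces by hand gives a plain Python list.
as_list = [counts.index[0], counts.iloc[0]]

# head(1) keeps the result as a Series: index label = name, value = count.
as_series = counts.head(1)

print(type(as_list).__name__)                 # list
print(type(as_series).__name__)               # Series
print(as_series.index[0], as_series.iloc[0])  # Nolan 3
```

Unlike the max-comparison version, head(1) always returns exactly one row, so a tie for first place is silently reduced to a single director.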
