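Hi everyone. I'm currently trying to answer the question below. I have the right answer (Nolan and 7), but it isn't in the form of a Series, and I don't know how to get it into that form. Can anyone help?
I've included the earlier questions for context.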
import pandas as pd
xls = pd.ExcelFile('imdb.xlsx')
df = xls.parse('imdb')
df_directors = xls.parse('directors')
df_countries = xls.parse('countries')
print("Data Loading Finished.")
""" Q1:
Join three Dataframes: df, df_directors, and df_countries with an inner join.
Store the joined DataFrames in df.
Here are the steps:
1. Merge df with df_countries and assign it df
2. Merge df with df_directors and assign it to df again
There might be errors if the merge is not in this order, so please be careful.
"""
# your code here
df.head()
df = pd.merge(left=df, right=df_countries, how='inner', left_on='country_id', right_on='id')
df.head()
df = pd.merge(left=df, right=df_directors, how='inner', left_on='director_id', right_on='id')
# After the join, the resulting Dataframe should have 12 columns.
df.shape
""" Q4:
Who is the director with the most movies? First get the number of movies per "director_name", then save the director's name
and count as a series of length 1 called "director_with_most"
"""
# your code here
directors = df['director_name'].value_counts()
print(directors)
director_with_most = [directors.index[0], directors[0]]
print(director_with_most)
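directors.index[0] gives Nolan, and directors[0] gives the number of times he appears in the database: 7. When I check my answer (this is from a Coursera course), I get this error: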
AssertionError: Series Expected type <class 'pandas.core.series.Series'>, found <class 'list'> instead
Please help, I've been stuck on this for a long time.
Cheers, Adam
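Posting a question like this on SO is bad form, and besides, there is no reason why directors[0] should be the highest count, so you are still a long way from a solution.
That said, I really dislike how this task is worded. What does it even mean for a Series of length 1 to contain two values? Silly. Presumably the name is meant to be the index label and the count the value. Do this: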
director_with_most = df.director_name.value_counts().loc[lambda x: x == x.max()]
(If the maximum is not unique, this returns multiple rows.)
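To make the tie behaviour concrete, here is a minimal sketch on invented toy data (the names "A", "B", "C" are placeholders, not values from imdb.xlsx). It applies the same boolean-mask selection and shows that two directors tied for the top count both come back.

```python
import pandas as pd

# Toy data, invented for illustration; not the real imdb.xlsx contents.
toy = pd.DataFrame({"director_name": ["A", "A", "B", "B", "C"]})

# Count movies per director, then keep every entry equal to the maximum.
counts = toy["director_name"].value_counts()
top = counts.loc[lambda x: x == x.max()]

print(type(top).__name__)  # Series
print(len(top))            # 2, because "A" and "B" are tied
```

If the grader insists on a length of exactly 1, a tie like this would fail the check, so it is worth confirming that the real data has a single top director.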
This may solve your problem:
director_with_most = df['director_name'].value_counts().head(1)
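A small sketch of why this passes the type check while a hand-built list does not, again on invented toy data (the names "X", "Y", "Z" are placeholders). As far as I know, value_counts() sorts by count in descending order by default, so head(1) keeps the most frequent entry, with the name as the index label and the count as the value.

```python
import pandas as pd

# Toy data, invented for illustration; not the real imdb.xlsx contents.
toy = pd.DataFrame({"director_name": ["X", "Y", "X", "Z", "X"]})
counts = toy["director_name"].value_counts()

# Pulling the pieces out by hand produces a plain Python list.
as_list = [counts.index[0], counts.iloc[0]]

# head(1) slices the Series, so the result is still a Series of length 1.
as_series = counts.head(1)

print(type(as_list).__name__)    # list
print(type(as_series).__name__)  # Series
print(as_series.index[0], as_series.iloc[0])  # X 3
```

The sketch uses .iloc[0] rather than counts[0] because recent pandas versions warn when an integer key is treated as a position on a Series with a string index.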