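I am currently working on improving my Python. I have a good grasp of the actual data analysis, but I am struggling to start writing functions that other people can run to get results, with code that also gives informative messages to the user. Below is a simple dataset that I am using to print the top 3 "Weather" descriptions for each city, but as you can see, Los Angeles has only one description.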
City Weather
0 New York Sunny
1 New York Rain
2 New York cloudy
3 New York Rain
4 New York Sunny
5 New York Sunny
6 New York partly cloudy
7 New York thunderstorm
8 New York Rain
9 New York cloudy
10 New York sunny
11 New York partly cloudy
12 New York partly cloudy
13 New York cloudy
14 New York sunny
15 New York sunny
16 New York rain
17 Austin rain
18 Austin rain
19 Austin cloudy
20 Austin sunny
21 Austin rain
22 Austin partly cloudy
23 Austin partly cloudy
24 Austin partly cloudy
25 Austin Sunny
26 Austin cloudy
27 Austin Sunny
28 Austin Sunny
29 Austin cloudy
30 Austin cloudy
31 Austin partly cloudy
32 Austin partly cloudy
33 Austin Sunny
34 Austin rain
35 Los Angeles Sunny
36 Los Angeles Sunny
37 Los Angeles Sunny
38 Los Angeles Sunny
39 Los Angeles Sunny
40 Los Angeles Sunny
41 Los Angeles Sunny
42 Los Angeles Sunny
43 Los Angeles Sunny
44 Los Angeles Sunny
45 Los Angeles Sunny
46 Los Angeles Sunny
47 Los Angeles Sunny
48 Los Angeles Sunny
49 Los Angeles Sunny
50 Los Angeles Sunny
51 Los Angeles Sunny
52 Los Angeles Sunny
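I created a function to output the values for each city. For my own work this is fine, because I can run some checks on the data myself, but other users need to be told that for Los Angeles the top three cannot be given because there is only one weather description. I have tried using an if statement with the value counts, but I keep getting error messages such as ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). In case my approach is not the right one, I am finding it hard to locate examples of this kind of problem. Any helpful guidance or links would be greatly appreciated!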
def weather_valuecount(df):
    weather_valcount = df.groupby(['City']).Weather.value_counts().groupby(level=0, group_keys=False).head(3)
    return weather_valcount
When I run the above, I get the following result:
City Weather
Austin partly cloudy 5
Sunny 4
cloudy 4
Los Angeles Sunny 18
New York Rain 3
Sunny 3
cloudy 3
Name: Weather, dtype: int64
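It shows the counts of the top three descriptions for each city, but Los Angeles shows only one. I would like the function to include a user message along the lines of "Cannot display the top three unique weather descriptions and counts for Los Angeles because there are not 3 unique values available".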
See this post, which explains why you get "Truth value of a Series is ambiguous".
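As a minimal illustration of that error: comparing a Series produces a boolean Series, and an `if` statement cannot reduce several booleans to one. The `counts` Series below is made up for the example and is not taken from the question's data.

```python
import pandas as pd

# Made-up counts, only for illustration.
counts = pd.Series([5, 4, 4], index=["partly cloudy", "sunny", "cloudy"])

# Comparing a Series yields one boolean per element, not a single True/False.
mask = counts >= 5

try:
    if mask:  # raises ValueError: the truth value of a Series is ambiguous
        pass
except ValueError as err:
    message = str(err)

# Reduce explicitly to a single boolean instead.
any_at_least_5 = mask.any()   # True if at least one element matches
all_at_least_5 = mask.all()   # True only if every element matches
has_three = len(counts) >= 3  # comparing a plain integer is unambiguous
```

In the question's case, the condition of interest is a single number per city (how many unique descriptions it has), so comparing that integer avoids the error altogether.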
Regarding your question, I'm not sure I understand the expected output.
See the code/result below (assuming df is the dataframe that holds the dataset):
listOfCites = set(df['City'])

def show_top3_weather(df):
    # First three rows of each city (in row order, not ranked by frequency)
    df1 = df.groupby('City').head(3).reset_index(drop=True).assign()
    # Number of distinct weather values among those three rows, per city
    df2 = df1.drop_duplicates().groupby('City', as_index=False).count().rename(columns={'Weather':'WeatherOccu'})
    df3 = df1.merge(df2, on='City', how='left').drop_duplicates()
    city_name = input("Choose a city: ")
    if city_name in listOfCites:
        # .any() reduces the boolean Series to a single True/False
        if (df3.loc[df3.City == city_name]['WeatherOccu'] == 3).any():
            print(f"Below, the top three weathers of {city_name}:")
            print(df3[df3.City == city_name][['City', 'Weather']])
        else:
            print(f"{city_name} has not three different weathers!")
    else:
        print(f"{city_name} doesn't exist!")
>>> show_top3_weather(df)
With New York as input
Choose a city: New York
Below, the top three weathers of New York:
City Weather
0 New York Sunny
1 New York Rain
2 New York cloudy
With Austin as input
Choose a city: Austin
Austin has not three different weathers!
With Los Angeles as input
Choose a city: Los Angeles
Los Angeles has not three different weathers!
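The function above looks at the first three rows of each city rather than the three most frequent descriptions, which is why Austin is also reported as not having three different weathers. If the goal is the top three by count plus a message for cities with too few unique values, a sketch along the following lines might work. The names `top_weather_with_messages` and `sample` are hypothetical, the small frame is made up, and lower-casing the descriptions is an assumption (it treats "Sunny" and "sunny" as the same value).

```python
import pandas as pd


def top_weather_with_messages(df, n=3):
    """Return the n most frequent weather descriptions per city, plus messages."""
    # Assumption: descriptions differing only by case are the same value.
    weather = df["Weather"].str.lower()
    counts = weather.groupby(df["City"]).value_counts()
    top_n = counts.groupby(level=0, group_keys=False).head(n)

    # nunique() gives one integer per city, so each comparison is on a scalar.
    unique_per_city = weather.groupby(df["City"]).nunique()
    messages = [
        f"Cannot display the top {n} unique weather descriptions and counts "
        f"for {city}: only {k} unique value(s) available."
        for city, k in unique_per_city.items()
        if k < n
    ]
    return top_n, messages


# Small made-up frame for illustration (not the dataset from the question).
sample = pd.DataFrame({
    "City": ["A"] * 6 + ["B"] * 3,
    "Weather": ["sunny", "Sunny", "sunny", "rain", "rain", "cloudy",
                "Sunny", "sunny", "sunny"],
})
top_n, messages = top_weather_with_messages(sample)
print(top_n)
for msg in messages:
    print(msg)
```

Returning the messages alongside the result, instead of printing inside the function, leaves it to the caller to decide how to show them.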
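We can print percentages to tell the user that Los Angeles always has sunny weather. Optionally, we can also add an "other" row to show the percentage of weather types that were left out.
Since some items may occur with the same frequency, I suggest trying the following code: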
import pandas as pd

def get_nlargest(df, n, keep):
    "n, keep: see help('pandas.Series.nlargest')"
    # Relative frequency of each weather type, limited to the n largest
    top_n = df.value_counts(normalize=True).nlargest(n, keep=keep)
    # Share of everything that did not make it into top_n
    other = pd.Series({'other': 1 - top_n.sum()})
    return pd.concat([top_n, other])

def weather_nlargest(df, n=3, keep='all'):
    return (
        df
        .groupby(['City'])['Weather']
        .apply(get_nlargest, n, keep)
    )

def print_percentage(df):
    print(df.to_string(float_format='{:.0%}'.format))

df['Weather'] = df['Weather'].str.lower()   # sunny == Sunny, rain == Rain
print_percentage(weather_nlargest(df))
Output:
City
Austin sunny 28%
partly cloudy 28%
cloudy 22%
rain 22%
other 0%
Los Angeles sunny 100%
other 0%
New York sunny 35%
rain 24%
cloudy 18%
partly cloudy 18%
other 6%
To show no more than 3 weather types:
print_percentage(weather_nlargest(df, 3, 'first'))
Output:
City
Austin sunny 28%
partly cloudy 28%
cloudy 22%
other 22%
Los Angeles sunny 100%
other 0%
New York sunny 35%
rain 24%
cloudy 18%
other 24%
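The difference between the two outputs comes from the `keep` argument of `Series.nlargest`. A small sketch on a made-up Series (the values below are illustrative, not from the dataset) shows how ties for the last place are handled:

```python
import pandas as pd

# Made-up relative frequencies with a three-way tie behind the top value.
freq = pd.Series({"sunny": 0.4, "rain": 0.2, "cloudy": 0.2, "fog": 0.2})

# keep="first": exactly 3 rows, ties resolved by order of appearance.
first = freq.nlargest(3, keep="first")

# keep="all": every value tied with the 3rd largest is kept, so more than 3 rows can come back.
all_ties = freq.nlargest(3, keep="all")

print(first)
print(all_ties)
```

With `keep="all"` nothing tied is dropped arbitrarily, at the cost of a result that can exceed `n` rows. With `keep="first"` the length is fixed, and the tied values that are cut end up in the "other" share.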