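Code outputs the top 3 most frequent weather description strings for each city, but how do I add a message for the user if a city only has one string? (Python)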



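I'm currently working on improving my Python. I have a good grasp of the actual data analysis, but I'm struggling to get started creating functions that other people can run to return results, with the code also giving the user informative messages. Below is a simple dataset I'm using to print the top 3 "weather" descriptions for each city, but as you can see with Los Angeles, it only has one description.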

           City        Weather
0      New York          Sunny
1      New York           Rain
2      New York         cloudy
3      New York           Rain
4      New York          Sunny
5      New York          Sunny
6      New York  partly cloudy
7      New York   thunderstorm
8      New York           Rain
9      New York         cloudy
10     New York          sunny
11     New York  partly cloudy
12     New York  partly cloudy
13     New York         cloudy
14     New York          sunny
15     New York          sunny
16     New York           rain
17       Austin           rain
18       Austin           rain
19       Austin         cloudy
20       Austin          sunny
21       Austin           rain
22       Austin  partly cloudy
23       Austin  partly cloudy
24       Austin  partly cloudy
25       Austin          Sunny
26       Austin         cloudy
27       Austin          Sunny
28       Austin          Sunny
29       Austin         cloudy
30       Austin         cloudy
31       Austin  partly cloudy
32       Austin  partly cloudy
33       Austin          Sunny
34       Austin           rain
35  Los Angeles          Sunny
36  Los Angeles          Sunny
37  Los Angeles          Sunny
38  Los Angeles          Sunny
39  Los Angeles          Sunny
40  Los Angeles          Sunny
41  Los Angeles          Sunny
42  Los Angeles          Sunny
43  Los Angeles          Sunny
44  Los Angeles          Sunny
45  Los Angeles          Sunny
46  Los Angeles          Sunny
47  Los Angeles          Sunny
48  Los Angeles          Sunny
49  Los Angeles          Sunny
50  Los Angeles          Sunny
51  Los Angeles          Sunny
52  Los Angeles          Sunny

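I created a function to output the values for each city. In my own work this is fine, because I can do some checks on the data, but other users need to be told that for Los Angeles the top three can't be given, because there is only one weather description. I tried using an IF statement with value counts, but I keep getting errors such as "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." It's hard to find examples of this kind of problem, in case my approach is incorrect. Any helpful guidance or links would be much appreciated!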

def weather_valuecount(df):
    weather_valcount = df.groupby(['City']).Weather.value_counts().groupby(level=0, group_keys=False).head(3)
    return weather_valcount

When I run the above, I get the following result:

City         Weather      
Austin       partly cloudy     5
             Sunny             4
             cloudy            4
Los Angeles  Sunny            18
New York     Rain              3
             Sunny             3
             cloudy            3
Name: Weather, dtype: int64

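It shows the top three description counts for each city, but only one for Los Angeles. I'd like to include a message to the user in the function along the lines of "Unable to show the top three unique weather descriptions and counts for Los Angeles because there are not 3 unique values available".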
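A minimal sketch of how such a message could be attached to the counting function, assuming the message should be printed (not returned) and that "Sunny" and "sunny" still count as different descriptions, as in the output above. The function name weather_top3_with_message and the small demo frame are hypothetical. The key point is that nunique() yields one number per city, so each comparison in the loop is a plain bool.

```python
import pandas as pd


def weather_top3_with_message(df, n=3):
    """Return the top-n weather counts per city and print a note for any
    city that has fewer than n distinct weather descriptions."""
    counts = (
        df.groupby('City')['Weather']
        .value_counts()
        .groupby(level=0, group_keys=False)
        .head(n)
    )
    # One scalar per city, so `n_unique < n` below is a plain bool and
    # does not trigger "truth value of a Series is ambiguous"
    unique_per_city = df.groupby('City')['Weather'].nunique()
    for city, n_unique in unique_per_city.items():
        if n_unique < n:
            print(f"Unable to show the top {n} unique weather descriptions "
                  f"and counts for {city} because there are not {n} unique "
                  f"values available.")
    return counts


# Hypothetical demo data in the same shape as the question's DataFrame
demo = pd.DataFrame({
    'City': ['Austin'] * 4 + ['Los Angeles'] * 3,
    'Weather': ['rain', 'rain', 'cloudy', 'sunny', 'Sunny', 'Sunny', 'Sunny'],
})
result = weather_top3_with_message(demo)
print(result)
```

Returning the counts and printing the note separately keeps the function usable in your own checks while still informing other users.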

Have a look at this post, which explains why you are getting "The truth value of a Series is ambiguous".
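A small illustration of where that error comes from, assuming the failing check looked something like "if counts < 3:" applied to value_counts output (the question does not show the exact code). Comparing a Series with a number gives a boolean Series, and Python's if cannot turn that into a single True/False; reducing it explicitly with len(), .any() or .all() does.

```python
import pandas as pd

# Value counts for a city with a single description (like Los Angeles)
counts = pd.Series({'Sunny': 18})

try:
    if counts < 3:  # elementwise comparison -> boolean Series, not a bool
        pass
except ValueError as err:
    print(f"ValueError: {err}")

# Reduce to a single bool explicitly instead
print(len(counts) < 3)     # fewer than 3 distinct descriptions?
print((counts < 3).any())  # does any description occur fewer than 3 times?
```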

As for your question, I'm not sure I understand the expected output.

See the code and results below (assuming df is the DataFrame holding the dataset):

listOfCites = set(df['City'])

def show_top3_weather(df):
    # First three rows of each city
    df1 = df.groupby('City').head(3).reset_index(drop=True)
    # Number of distinct weathers among those rows, per city
    df2 = (df1.drop_duplicates()
              .groupby('City', as_index=False)
              .count()
              .rename(columns={'Weather': 'WeatherOccu'}))
    df3 = df1.merge(df2, on='City', how='left').drop_duplicates()
    city_name = input("Choose a city: ")

    if city_name in listOfCites:
        # .any() reduces the boolean Series to a single True/False
        if (df3.loc[df3.City == city_name]['WeatherOccu'] == 3).any():
            print(f"Below, the top three weathers of {city_name}:")
            print(df3[df3.City == city_name][['City', 'Weather']])
        else:
            print(f"{city_name} has not three different weathers!")
    else:
        print(f"{city_name} doesn't exist!")

>>> show_top3_weather(df)

With New York as input:

Choose a city:  New York
Below, the top three weathers of New York:
       City  Weather
0  New York    Sunny
1  New York     Rain
2  New York   cloudy

With Austin as input:

Choose a city:  Austin
Austin has not three different weathers!

With Los Angeles as input:

Choose a city:  Los Angeles
Los Angeles has not three different weathers!
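Note that df.groupby('City').head(3) keeps the first three rows of each city, not its three most frequent descriptions. That is why Austin is reported as lacking three different weathers even though it has five distinct description strings. A minimal sketch of the difference, using Austin's rows copied from the dataset above (the variable names are illustrative):

```python
import pandas as pd

# Austin's rows from the question's dataset, in their original order
austin = pd.DataFrame({
    'City': ['Austin'] * 18,
    'Weather': ['rain', 'rain', 'cloudy', 'sunny', 'rain', 'partly cloudy',
                'partly cloudy', 'partly cloudy', 'Sunny', 'cloudy', 'Sunny',
                'Sunny', 'cloudy', 'cloudy', 'partly cloudy', 'partly cloudy',
                'Sunny', 'rain'],
})

# First three rows only: rain, rain, cloudy
first_rows = austin.groupby('City').head(3)['Weather']
print(first_rows.nunique())

# Three most frequent descriptions with their counts
top3 = austin.groupby('City')['Weather'].value_counts().groupby(level=0).head(3)
print(top3)

# Distinct strings overall ('Sunny' and 'sunny' are counted separately)
print(austin['Weather'].nunique())
```

Counting with value_counts before taking head(3) gives the "top three" the question asks for; the row-based check only looks at whatever happens to come first.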

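We can print percentages to tell the user that Los Angeles always has sunny weather. Optionally, we can also add an "other" row showing the percentage of weather types that were left out.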

Since some items may occur with the same frequency, I suggest trying the following code:

import pandas as pd

def get_nlargest(df, n, keep):
    "n, keep: see help('pandas.Series.nlargest')"
    # Share of each weather type within the group, keeping the n largest
    top_n = df.value_counts(normalize=True).nlargest(n, keep)
    # Whatever share is left over goes into a single 'other' row
    other = pd.Series({'other': 1 - top_n.sum()})
    return pd.concat([top_n, other])

def weather_nlargest(df, n=3, keep='all'):
    return (
        df
        .groupby(['City'])['Weather']
        .apply(get_nlargest, n, keep)
    )

def print_percentage(df):
    print(df.to_string(float_format='{:.0%}'.format))

df['Weather'] = df['Weather'].str.lower()   # sunny == Sunny, rain == Rain
print_percentage(weather_nlargest(df))

Output:

City                      
Austin       sunny            28%
             partly cloudy    28%
             cloudy           22%
             rain             22%
             other             0%
Los Angeles  sunny           100%
             other             0%
New York     sunny            35%
             rain             24%
             cloudy           18%
             partly cloudy    18%
             other             6%

To see no more than 3 weather types per city, pass keep='first':

print_percentage(weather_nlargest(df, 3, 'first'))

Output:

City                      
Austin       sunny            28%
             partly cloudy    28%
             cloudy           22%
             other            22%
Los Angeles  sunny           100%
             other             0%
New York     sunny            35%
             rain             24%
             cloudy           18%
             other            24%
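The effect of the keep argument can be seen on a tiny Series with a tie for third place (the values are made up for illustration). With keep='all', Series.nlargest returns every item tied at the cut-off, so more than n rows can come back. With keep='first', it stops at exactly n and keeps whichever tied item appears first.

```python
import pandas as pd

# Hypothetical normalized counts: 'cloudy' and 'rain' tie for third place
shares = pd.Series({'sunny': 0.4, 'partly cloudy': 0.3,
                    'cloudy': 0.15, 'rain': 0.15})

all_ties = shares.nlargest(3, keep='all')      # both tied items are kept
first_only = shares.nlargest(3, keep='first')  # first occurrence of the tie wins
print(all_ties)
print(first_only)
```

This is why the default keep='all' run above lists four weather types for Austin and New York, while the keep='first' run lists exactly three.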