Cleaning up horrible-looking, repetitive try/except blocks



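Is there a way to make this mess look nicer? The problem is that the lookup sometimes returns an empty DataFrame, which can't be converted to a float. Even when it returns zero, that's no good either, so I set the count to a small value, 0.0001, to avoid a zero denominator.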

Thanks

cities = list(set(nlp_inst.raw_data_frames['SKY_modified']['City'].unique()))
mapping_data = []
for city in cities:
    try:
        lat = float(pos_df[pos_df['City']==city]['Latitude'].unique())
    except:
        lat = np.nan
    try:
        long = float(pos_df[pos_df['City']==city]['Longitude'].unique())
    except:
        long = np.nan
    try:
        city_count_pos = pos_df[pos_df['City']==city].count()[0]
    except:
        city_count_pos = 0.0001
    try:
        city_count_neg = neg_df[neg_df['City']==city].count()[0]
    except:
        city_count_neg = 0.0001
    mapping_data.append([lat, long, city_count_pos/(city_count_pos+city_count_neg)])
    print(mapping_data[-1])

There's no single trick that makes all of those try/excepts disappear. However, I'd suggest the following approach:

import numpy as np

cities = list(set(nlp_inst.raw_data_frames['SKY_modified']['City'].unique()))
mapping_data = []
for city in cities:
    pos_city = pos_df[pos_df['City'] == city]
    neg_city = neg_df[neg_df['City'] == city]
    cords = [pos_city['Latitude'].unique(), pos_city['Longitude'].unique()]  # [lat, long]
    for i in range(len(cords)):
        # An empty array means no coordinate was found for this city
        if cords[i].size == 0:
            cords[i] = np.nan
        else:
            cords[i] = float(cords[i][0])
    city_count_int = [len(pos_city), len(neg_city)]  # [pos, neg] row counts
    for j in range(len(city_count_int)):
        # Replace a zero count with a small value so the denominator is never 0
        if city_count_int[j] == 0:
            city_count_int[j] = 0.0001
    lat, long = cords
    city_count_pos, city_count_neg = city_count_int
    mapping_data.append([lat, long, city_count_pos / (city_count_pos + city_count_neg)])
    print(mapping_data[-1])

You could also write a small helper function to cut down on the clutter.
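A minimal sketch of such helpers, assuming the lookups look like the ones in the question. The names `first_or_default` and `positive_share` are hypothetical, not part of pandas or NumPy. They replace the four try/excepts with explicit checks: an empty `unique()` result becomes NaN, and a zero count is swapped for the same 0.0001 used in the question.

```python
import math


def first_or_default(values, default=math.nan):
    """Return the single value in `values` as a float, or `default` if empty.

    `values` can be any sized sequence, e.g. the NumPy array returned by
    Series.unique(). More than one distinct value is treated as a data error.
    """
    if len(values) == 0:
        return default
    if len(values) > 1:
        raise ValueError(f"expected one unique value, got {len(values)}")
    return float(values[0])


def positive_share(n_pos, n_neg, eps=0.0001):
    """Fraction of positive rows; a zero count is replaced by `eps`
    so the denominator can never be zero (same trick as the question)."""
    n_pos = n_pos or eps
    n_neg = n_neg or eps
    return n_pos / (n_pos + n_neg)
```

Inside the loop, the body then shrinks to `first_or_default(pos_city['Latitude'].unique())`, `first_or_default(pos_city['Longitude'].unique())` and `positive_share(len(pos_city), len(neg_city))`, where `pos_city` and `neg_city` are the per-city selections. Raising on several distinct coordinates (instead of silently returning NaN, as the bare `except` would) is a design choice here; it surfaces inconsistent data rather than hiding it.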

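Note: the loop in this answer assumes that by "empty" you mean the lookup found nothing for the city (None or an empty selection).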
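If the per-city loop is itself the clutter, a different technique, pandas `groupby`, can compute every city at once with no try/except at all. This is a sketch, not the approach from the answer above: it assumes `pos_df` has `City`, `Latitude` and `Longitude` columns and `neg_df` has a `City` column, as in the question, and the function name `city_summary` is hypothetical. Unlike the loop, it takes the first non-null coordinate per city instead of requiring exactly one unique value.

```python
import pandas as pd


def city_summary(pos_df, neg_df, cities, eps=0.0001):
    """Return one [lat, long, positive share] row per city, in `cities` order."""
    index = pd.Index(list(cities), name='City')
    # First non-null latitude/longitude per city; cities missing from pos_df get NaN
    coords = (pos_df.groupby('City')[['Latitude', 'Longitude']]
              .first()
              .reindex(index))
    # Row counts per city; cities with no rows get 0
    n_pos = pos_df.groupby('City').size().reindex(index, fill_value=0)
    n_neg = neg_df.groupby('City').size().reindex(index, fill_value=0)
    # Same 0.0001 substitution as the question, so the denominator is never 0
    n_pos = n_pos.astype(float).replace(0.0, eps)
    n_neg = n_neg.astype(float).replace(0.0, eps)
    coords['share'] = n_pos / (n_pos + n_neg)
    return coords.to_numpy().tolist()
```

Usage would be `mapping_data = city_summary(pos_df, neg_df, cities)`, which yields the same `[lat, long, ratio]` rows that the loop appends one at a time.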
