Compare two columns and calculate the percentage difference using pandas (or numpy)



Disclaimer: I'm learning to develop in Python, and I know this way of coding is probably garbage, but I plan to keep improving it as I build the program.

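So I'm building a scraper that uses Selenium to check specific flight prices every day, and that part of the code is already done. Every day it saves the origin, the destination, the outbound flight date, the return flight date and the price. I save this data to a file and then compare it with the previous day's file to see whether any price has changed.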

My goal is to detect price changes larger than X% and print a message from the script for each flight whose price changed that much.

import pandas as pd
import os.path
import numpy as np
# These are just sample data before integrating the Selenium values
price = 230
departuredate = '20/02/2020'
returndate = '20/02/2020'
fromm = 'BOS'
to = 'JFK'
price2 = 630
departuredate2 = '20/02/2020'
returndate2 = '20/02/2020'
fromm2= 'CDG'
to2= 'JFK'
#End of sample data

flightdata = {'From': [fromm, fromm2], 'To': [to,to2], 'Departure date': [departuredate,departuredate2], 'Return date': [returndate,returndate2], 'Price': [price,price2]}
df = pd.DataFrame(flightdata, columns= ['From', 'To', 'Departure date', 'Return date', 'Price'])

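# Check whether the script has run before (today's file already exists)
if os.path.exists('flightstoday.xls'):
    # Remove the old "yesterday" file only if it exists (it doesn't on the second run)
    if os.path.exists('flightsyesterday.xls'):
        os.remove('flightsyesterday.xls')
    os.rename('flightstoday.xls', 'flightsyesterday.xls')  # Rename the flights scraped yesterday
# Save today's flights as tab-separated text (despite the .xls extension), without the index column
df.to_csv('flightstoday.xls', mode='w', header=True, sep='\t', index=False)
# Work with two dataframes (yesterday's file exists only from the second run onward)
flightstoday = pd.read_csv('flightstoday.xls', sep='\t')
if os.path.exists('flightsyesterday.xls'):
    flightsyesterday = pd.read_csv('flightsyesterday.xls', sep='\t')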
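The remove-then-rename rotation can also be done with `os.replace`, which overwrites the destination file if it already exists, so there is no `os.remove` call that can fail on the second run. The sketch below is only an illustration: `rotate_and_save` is a hypothetical helper, and the demo runs in a temporary directory with made-up prices for three simulated days.

```python
import os
import tempfile

import pandas as pd


def rotate_and_save(df, today_path, yesterday_path):
    """Hypothetical helper: move the previous 'today' file to 'yesterday', then save df.

    Returns (today_df, yesterday_df); yesterday_df is None on the very first run.
    """
    if os.path.exists(today_path):
        # os.replace() overwrites yesterday_path if it exists, so no os.remove() is needed
        os.replace(today_path, yesterday_path)
    # Tab-separated text without the index column, so reading it back gives the same columns
    df.to_csv(today_path, sep='\t', index=False)
    yesterday_df = pd.read_csv(yesterday_path, sep='\t') if os.path.exists(yesterday_path) else None
    return pd.read_csv(today_path, sep='\t'), yesterday_df


# Demo in a temporary directory with three simulated "days" of scraped prices (made-up data)
with tempfile.TemporaryDirectory() as folder:
    today_path = os.path.join(folder, 'flightstoday.xls')
    yesterday_path = os.path.join(folder, 'flightsyesterday.xls')
    day1 = pd.DataFrame({'From': ['BOS', 'CDG'], 'To': ['JFK', 'JFK'], 'Price': [250, 600]})
    day2 = pd.DataFrame({'From': ['BOS', 'CDG'], 'To': ['JFK', 'JFK'], 'Price': [230, 630]})
    first_today, first_yesterday = rotate_and_save(day1, today_path, yesterday_path)
    second_today, second_yesterday = rotate_and_save(day2, today_path, yesterday_path)
    # Third run: the existing "yesterday" file is overwritten in place
    third_today, third_yesterday = rotate_and_save(day1, today_path, yesterday_path)
```

Passing `index=False` to `to_csv` matters here: without it, `read_csv` would bring back the old index as an extra `Unnamed: 0` column.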

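What I'm missing is how to compare the "Price" columns and print a message saying that the flight in row X, with its "From", "To", "Departure date" and "Return date", has changed by X percent.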
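One way to do this comparison, sketched under the assumption that the four columns `From`, `To`, `Departure date` and `Return date` together identify a flight, is to merge the two days on those columns and compute the percentage change on the merged frame. The prices and the `threshold` value below are made up for illustration. An inner merge (the default) silently drops flights that appear on only one of the two days.

```python
import pandas as pd

# Columns assumed to identify one flight uniquely
keys = ['From', 'To', 'Departure date', 'Return date']

# Made-up sample data in the same shape as the question's DataFrame
yesterday = pd.DataFrame({'From': ['BOS', 'CDG'], 'To': ['JFK', 'JFK'],
                          'Departure date': ['20/02/2020', '20/02/2020'],
                          'Return date': ['20/02/2020', '20/02/2020'],
                          'Price': [250, 600]})
today = yesterday.copy()
today['Price'] = [230, 630]

threshold = 6.0  # X%, a hypothetical value

# Pair each of today's flights with the same flight from yesterday
merged = today.merge(yesterday, on=keys, suffixes=('_today', '_yesterday'))
merged['pct_change'] = (merged['Price_today'] - merged['Price_yesterday']) / merged['Price_yesterday'] * 100

# Keep only flights whose price moved by more than X% in either direction
changed = merged[merged['pct_change'].abs() > threshold]
messages = [
    f"Flight from {row['From']} to {row['To']} leaving on {row['Departure date']} "
    f"and returning on {row['Return date']} has changed by {row['pct_change']:.1f}%"
    for _, row in changed.iterrows()
]
for message in messages:
    print(message)
```

With these numbers, BOS to JFK changes by -8.0% and is reported, while CDG to JFK changes by 5.0% and stays under the 6% threshold.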

I tried this code, but it only adds a column to flightstoday without any percentage, and of course it doesn't print anything when a price changes.

flightstoday['PriceDiff'] = np.where(flightstoday['Price'] == flightsyesterday['Price'], 0, flightstoday['Price'] - flightsyesterday['Price'])
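The `np.where` line computes today's price minus yesterday's, an absolute amount in currency units, so no percentage ever appears. A percentage needs that difference divided by yesterday's price. The `np.where` is also unnecessary, because equal prices already give a difference of 0. The sketch below uses small made-up stand-ins for the two DataFrames and assumes both files list the flights in the same row order.

```python
import pandas as pd

# Made-up stand-ins for the DataFrames read from the two files (same row order assumed)
flightsyesterday = pd.DataFrame({'Price': [250, 600, 300]})
flightstoday = pd.DataFrame({'Price': [230, 630, 300]})

# Absolute change in currency units (what the np.where attempt computes)
flightstoday['PriceDiff'] = flightstoday['Price'] - flightsyesterday['Price']
# Relative change in percent: divide by yesterday's price; unchanged prices give 0.0
flightstoday['PricePctChange'] = flightstoday['PriceDiff'] / flightsyesterday['Price'] * 100
print(flightstoday)
```

The new columns exist only in memory; they reach the file only if `flightstoday` is written out again with `to_csv`.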

Any help for this newbie would be greatly appreciated. Thanks!

From what I can gather, I think this is what you're trying to do.

import pandas as pd
import os.path
import numpy as np
# These are just sample data before integrating the Selenium values
price = 230
departuredate = '20/02/2020'
returndate = '20/02/2020'
fromm = 'BOS'
to = 'JFK'
price2 = 630
departuredate2 = '20/02/2020'
returndate2 = '20/02/2020'
fromm2 = 'CDG'
to2 = 'JFK'
# Create a second set of prices (used as yesterday's prices)
price3 = 250
price4 = 600
# Generate data to construct DataFrames
today_flightdata = {'From': [fromm, fromm2], 'To': [to, to2],
                    'Departure date': [departuredate, departuredate2],
                    'Return date': [returndate, returndate2],
                    'Price': [price, price2]}
yesterday_flightdata = {'From': [fromm, fromm2], 'To': [to, to2],
                        'Departure date': [departuredate, departuredate2],
                        'Return date': [returndate, returndate2],
                        'Price': [price3, price4]}
# Create dataframes for yesterday and today
today = pd.DataFrame(today_flightdata, columns=[
    'From', 'To', 'Departure date', 'Return date', 'Price'])
yesterday = pd.DataFrame(yesterday_flightdata, columns=[
    'From', 'To', 'Departure date', 'Return date', 'Price'])
# Percentage change relative to yesterday's price (rows are matched by index, i.e. by position)
today['price_change'] = (
    today['Price'] - yesterday['Price']) / yesterday['Price'] * 100.
# Determine indices of all rows where |price_change| >= threshold
threshold = 1.0
today['exceeds_threshold'] = abs(today['price_change']) >= threshold
exceed_indices = today['exceeds_threshold'][today['exceeds_threshold']].index
# Print out those entries that exceed the threshold
for idx in exceed_indices:
    row = today.loc[idx]
    print('Flight from {} to {} leaving on {} and returning on {} has changed by {}%'.format(
        row['From'], row['To'], row['Departure date'], row['Return date'], row['price_change']))

Output:

Flight from BOS to JFK leaving on 20/02/2020 and returning on 20/02/2020 has changed by -8.0%
Flight from CDG to JFK leaving on 20/02/2020 and returning on 20/02/2020 has changed by 5.0%
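The subtraction `today['Price'] - yesterday['Price']` pairs rows by index (0, 1, ...), so it assumes the scraper lists the flights in the same order every day. A sketch that removes that assumption, again treating the four key columns as a unique flight identifier, moves those columns into the index with `set_index`; pandas then aligns the two Series by flight instead of by position. The data are made up, and today's rows are deliberately scraped in reverse order.

```python
import pandas as pd

keys = ['From', 'To', 'Departure date', 'Return date']
yesterday = pd.DataFrame({'From': ['BOS', 'CDG'], 'To': ['JFK', 'JFK'],
                          'Departure date': ['20/02/2020', '20/02/2020'],
                          'Return date': ['20/02/2020', '20/02/2020'],
                          'Price': [250, 600]})
# Same flights as yesterday, but scraped in the opposite order (CDG first, then BOS)
today = yesterday.iloc[::-1].reset_index(drop=True)
today['Price'] = [630, 230]

# With the key columns as a MultiIndex, arithmetic aligns on the flight, not the row number
old = yesterday.set_index(keys)['Price']
new = today.set_index(keys)['Price']
price_change = (new - old) / old * 100

threshold = 1.0
exceeding = price_change[price_change.abs() >= threshold]

# For comparison: positional subtraction pairs today's CDG row with yesterday's BOS row
positional = (today['Price'] - yesterday['Price']) / yesterday['Price'] * 100
```

A flight that exists on only one of the two days gets `NaN` here, and `NaN` never passes the threshold test.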

I learned the syntax for computing exceed_indices from this post.
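The `exceed_indices` expression filters the boolean column by itself and keeps the index labels of the `True` entries. Indexing `today.index` with the mask gives the same labels more directly, and the mask can also select the rows themselves, with no index list at all. The small DataFrame below is made-up sample data used only to show that the three forms agree.

```python
import pandas as pd

# Made-up stand-in for the answer's `today` DataFrame after price_change is computed
today = pd.DataFrame({'From': ['BOS', 'CDG', 'LHR'], 'price_change': [-8.0, 0.5, 5.0]})
threshold = 1.0
today['exceeds_threshold'] = today['price_change'].abs() >= threshold

# The answer's form: filter the boolean column by itself, then take its index labels
exceed_indices = today['exceeds_threshold'][today['exceeds_threshold']].index
# Equivalent: the index labels where the mask is True
same_indices = today.index[today['exceeds_threshold']]
# Or skip the indices and select the matching rows with the mask directly
exceeding_rows = today[today['exceeds_threshold']]
print(exceeding_rows)
```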
