防止 Python Pandas 中的行迭代



我已将以下文件转换为熊猫df:

https://www.fca.org.uk/publication/data/position-limits-contract-names-vpc.xlsx

我已经将相关行(为自己)转换为字典。字典的形式是{principal: [spot, aggregate, set(product codes)]}.我使用以下代码将其转换为此字典:

ifeu_dict = defaultdict(lambda: [0, 0, set()])

for (_, row) in df.iterrows():
if row.loc["Venue MIC"] == "IFEU":
ifeu_dict[row.loc["Principal Venue Product Code"]][2].add(row.loc["Venue Product Codes"])
if type(row.loc["Spot month single limit#"]) == int:
# no need for append as default is to create a dict
ifeu_dict[row.loc["Principal Venue Product Code"]][0] = row.loc["Spot month single limit#"]
ifeu_dict[row.loc["Principal Venue Product Code"]][1] = row.loc["Other month limit#"]
if type(row.loc["Spot month single limit#"]) == str:
try:
val = int(str(row.loc["Spot month single limit#"]).split()[0].replace(",", ""))
val_2 = int(str(row.loc["Other month limit#"]).split()[0].replace(",", ""))
ifeu_dict[row.loc["Principal Venue Product Code"]][0] = val
ifeu_dict[row.loc["Principal Venue Product Code"]][1] = val_2
except ValueError:
pass

然而,这真的很低效,所以我一直在尝试改变我创建这本词典的方式。

一次尝试如下:

ifeu_dict_2 = defaultdict(lambda: [0, 0, set()])
ifeu_mask = df["Venue MIC"] == "IFEU"
ifeu_df = df.loc[ifeu_mask]
spot_mask_int = ifeu_df["Spot month single limit#"].apply(type) == int

def spot_transform(x):
try:
return int(str(x).split()[0].replace(",", ""))
except ValueError:
return

ifeu_df["Spot month single limit#"] = ifeu_df.loc[~spot_mask_int, "Spot month single limit#"].apply(spot_transform)
ifeu_df["Other month limit#"] = ifeu_df.loc[~spot_mask_int, "Other month limit#"].apply(spot_transform)
spot_mask_int = ifeu_df["Spot month single limit#"].apply(type) == int

然后尝试:

temp_df = [~spot_mask_int, ["Principal Venue Product Code", "Spot month single limit#", "Other month limit#"]]
ifeu_dict_2[temp_df.loc["Principal Venue Product Code"]][0] = temp_df.loc["Spot month single limit#"]
# this gives me AttributeError: 'list' object has no attribute 'loc'

或:

ifeu_dict_2[ifeu_df.loc[spot_mask_int, "Principal Venue Product Code"]][2].add(ifeu_df.loc["Venue Product Codes"])
ifeu_dict_2[ifeu_df.loc[spot_mask_int, "Principal Venue Product Code"]][0] = ifeu_df.loc[spot_mask_int, "Spot month single limit#"]
ifeu_dict_2[ifeu_df.loc[spot_mask_int, "Principal Venue Product Code"]][1] = ifeu_df.loc[spot_mask_int, "Other month limit#"]
# this gives me TypeError: 'Series' objects are mutable, thus they cannot be hashed

被困了好一会儿,不知道该如何继续。任何帮助将不胜感激,无论是答案还是有用的链接!(对于链接,我是编码新手,所以示例可以帮助我最好地 atm)。

如果你想玩一个df:

Index(['Commodity Derivative Namen(including associated contracts)',
'Venue MIC', 'Name of Trading Venue', 'Venue Product Codes',
'Principal Venue Product Code', 'Spot month single limit#',
'Other month limit#', 'Conversion Factor', 'Unit of measurement',
'Definition of spot month'],
dtype='object')
API2 Rotterdam Coal Average Price Options (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RCA,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
Gasoil Diff - Gasoil 50ppm FOB Rotterdam Barges vs Low Sulphur Gasoil 1st Line Future,IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ULH,ULH,2500,2500,nan,Lots,Calendar Month
Marine Fuel 0.5% FOB Rotterdam Barges (Platts) Future,IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,MF3,MF3,2500,2500,nan,Lots,Calendar Month
API2 Rotterdam Coal (supporting Cal 1x Options),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATC,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal (supporting Qtr 1x Options),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATQ,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Cal 1x Options (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATD,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Early (122 days) Single Expiry Option (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RDE,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Early (214 days) Single Expiry Option (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RDF,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Early (305 days) Single Expiry Option (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RDG,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Futures,IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATW,ATW,5,550 (24.9%),38,800 (20.5%),nan,Lots,Calendar Month
API2 Rotterdam Coal Options (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RCO,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Qtr 1x Options (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATH,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month

完成的字典中的条目应如下所示:

ATW = [5550, 38800, {'ATH', 'ATC', 'RDF', 'ATQ', 'RCA', 'ATD', 'RCO', 'RDG', 'RDE', 'ATW'}]

看看我现在理解的数据。数据包括每个产品的多个代码,您需要最终得到一个dict,该每组代码都有一个条目。您的方法逐行进行,但更有效的方法是使用DataFrame.groupby方法并一次性处理每个组。

下面的代码应该比逐行执行更有效。

df_ifeu = df[df['Venue MIC ']=='IFEU']
ifeu_dict = {}
for principal,g in df_ifeu.groupby('Principal Venue Product Code'):
# find where the product code is the same as the principal code
pr = g['Venue Product Codes '] == principal
# get the values for the principal
spot_val = g.loc[pr, 'Spot month single limit#'].iloc[0]
other_val = g.loc[pr, 'Other month limit#'].iloc[0]
# get the codes
codes = set(g['Venue Product Codes '])
# add the product to the dict
ifeu_dict[principal] = [spot_val, other_val, codes]
# confirm we have one dict entry per principal product code
assert(len(ifeu_dict)==df_ifeu['Principal Venue Product Code'].nunique())

最新更新