I am running this code in Python 3.5 to compute the concordance of a logistic regression model.
conc = disc = ties = pairs_tested = 0
# .iloc is positional, so i and j must run 0..n-1 (a default RangeIndex)
for i in ones2.index:
    for j in zeros2.index:
        pairs_tested = pairs_tested + 1
        if ones2.iloc[i, 1] > zeros2.iloc[j, 1]:
            conc = conc + 1
        elif ones2.iloc[i, 1] == zeros2.iloc[j, 1]:
            ties = ties + 1
        else:
            disc = disc + 1
# Calculate concordance, discordance and ties
concordance = conc / pairs_tested
discordance = disc / pairs_tested
ties_perc = ties / pairs_tested
print("Concordance = %r" % concordance)
print("Discordance = %r" % discordance)
print("Tied = %r" % ties_perc)
print("Pairs = %r" % pairs_tested)
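The nested loops above amount to the following plain-Python function, which is handy as a slow but obviously correct reference when testing faster versions on small inputs. A sketch only: count_pairs_naive is a hypothetical name, and it takes two lists of predicted probabilities (the ones2 and zeros2 probability columns) rather than the DataFrames.

```python
def count_pairs_naive(ones_probs, zeros_probs):
    """Return (conc, disc, ties) by comparing every (one, zero) pair.

    O(len(ones_probs) * len(zeros_probs)): fine for tests, far too slow
    for 36k x 150k rows.
    """
    conc = disc = ties = 0
    for p1 in ones_probs:
        for p0 in zeros_probs:
            if p1 > p0:
                conc += 1  # responder scored higher: concordant pair
            elif p1 == p0:
                ties += 1  # same score: tied pair
            else:
                disc += 1  # responder scored lower: discordant pair
    return conc, disc, ties


print(count_pairs_naive([0.9, 0.4], [0.1, 0.4, 0.8]))  # (4, 1, 1)
```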
zeros2 (a pandas DataFrame) has 0.15 mln rows and ones2 (a pandas DataFrame) has 36k rows. Both tables have two variables:
[i] responder (responder0 = 0 in zeros2 and responder1 = 1 in ones2)
[ii] probability (prob0 in zeros2 and prob1 in ones2)
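My problem: the for loop has already been running for 12 hours and was still running when I posted this question. How can I make it faster? I am running it on a 64-bit Windows machine with 8 GB of RAM.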
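Your code performs 5.4 billion comparisons, one for every pair visited by the two nested loops (0.15 mil × 36k = 150,000 × 36,000), and each comparison also goes through two .iloc lookups.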
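A rough count of the work, ignoring constant factors, with n = 150,000 rows in zeros2 and m = 36,000 rows in ones2. The second and third lines estimate the sort-and-binary-search approach used in this answer:

```latex
\begin{aligned}
\text{nested loops: } & n \cdot m = 150\,000 \times 36\,000 = 5.4 \times 10^{9} \\
\text{sort zeros2 once: } & n \log_2 n \approx 150\,000 \times 17.2 \approx 2.6 \times 10^{6} \\
\text{two binary searches per ones2 row: } & 2\,m \log_2 n \approx 2 \times 36\,000 \times 17.2 \approx 1.2 \times 10^{6}
\end{aligned}
```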
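I would do something like this instead (thanks to @Leon for helping me improve this answer): sort the zeros2 probabilities once, then for each row of ones2 use binary search (bisect_left and bisect_right) to count how many zeros2 probabilities are below, equal to, and above it: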
from bisect import bisect_left, bisect_right

conc = disc = ties = 0
# Sort the zeros2 probabilities once
zeros2_list = sorted(zeros2.iloc[:, 1])
zeros2_length = len(zeros2_list)
for i in ones2.index:
    # bisect_left counts zeros2 values strictly below this ones2 value,
    # i.e. concordant pairs, so disc and conc are swapped in this version
    cur_disc = bisect_left(zeros2_list, ones2.iloc[i, 1])
    cur_ties = bisect_right(zeros2_list, ones2.iloc[i, 1]) - cur_disc
    disc += cur_disc
    ties += cur_ties
    conc += zeros2_length - cur_ties - cur_disc
pairs_tested = zeros2_length * len(ones2.index)
concordance = conc / pairs_tested
discordance = disc / pairs_tested
ties_perc = ties / pairs_tested
print("Concordance = %r" % concordance)
print("Discordance = %r" % discordance)
print("Tied = %r" % ties_perc)
print("Pairs = %r" % pairs_tested)
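Which count is which depends on what bisect returns: on a sorted list, bisect_left gives the number of elements strictly smaller than the probe, and bisect_right the number smaller than or equal to it. A small illustration with made-up probabilities (not data from the question):

```python
from bisect import bisect_left, bisect_right

# Hypothetical sorted zeros2 probabilities
zeros_sorted = [0.1, 0.3, 0.3, 0.6, 0.8]
p1 = 0.3  # probability of one responder from ones2

below = bisect_left(zeros_sorted, p1)  # zeros strictly below p1 -> concordant
equal = bisect_right(zeros_sorted, p1) - below  # zeros equal to p1 -> ties
above = len(zeros_sorted) - bisect_right(zeros_sorted, p1)  # zeros above p1 -> discordant

print(below, equal, above)  # 1 2 2
```

So, when searching the sorted zeros2 list with a ones2 probability, the bisect_left result counts concordant pairs.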
Or the other way around, sorting ones2 and binary-searching it for each row of zeros2:
from bisect import bisect_left, bisect_right

zeros2_list = sorted(zeros2.iloc[:, 1])
ones2_list = sorted(ones2.iloc[:, 1])
zeros2_length = len(zeros2_list)
ones2_length = len(ones2_list)
conc = disc = ties = 0
for i in zeros2.index:
    # bisect_left counts ones2 values strictly below this zeros2 value,
    # i.e. discordant pairs, so conc and disc are swapped in this version
    cur_conc = bisect_left(ones2_list, zeros2.iloc[i, 1])
    cur_ties = bisect_right(ones2_list, zeros2.iloc[i, 1]) - cur_conc
    conc += cur_conc
    ties += cur_ties
    disc += ones2_length - cur_ties - cur_conc
# We could also achieve the above by looping over the sorted values directly:
# for z in zeros2_list:
#     cur_conc = bisect_left(ones2_list, z)
#     cur_ties = bisect_right(ones2_list, z) - cur_conc
#     conc += cur_conc
#     ties += cur_ties
#     disc += ones2_length - cur_ties - cur_conc
pairs_tested = zeros2_length * ones2_length
concordance = conc / pairs_tested
discordance = disc / pairs_tested
ties_perc = ties / pairs_tested
print("Concordance = %r" % concordance)
print("Discordance = %r" % discordance)
print("Tied = %r" % ties_perc)
print("Pairs = %r" % pairs_tested)
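When looping over zeros2 instead, the roles flip: bisect_left on the sorted ones2 list counts responders scored strictly below a non-responder, i.e. discordant pairs, and the responders above it are the concordant ones. A minimal sketch on plain lists that also avoids per-row .iloc lookups; count_pairs_by_zeros is a hypothetical name:

```python
from bisect import bisect_left, bisect_right


def count_pairs_by_zeros(ones_probs, zeros_probs):
    """Return (conc, disc, ties) by binary-searching a sorted copy of ones_probs."""
    ones_sorted = sorted(ones_probs)
    n_ones = len(ones_sorted)
    conc = disc = ties = 0
    for p0 in zeros_probs:
        lo = bisect_left(ones_sorted, p0)  # responders scored below p0: discordant
        hi = bisect_right(ones_sorted, p0)  # responders scored <= p0
        disc += lo
        ties += hi - lo  # responders scored exactly p0: tied
        conc += n_ones - hi  # responders scored above p0: concordant
    return conc, disc, ties


print(count_pairs_by_zeros([0.9, 0.4], [0.1, 0.4, 0.8]))  # (4, 1, 1)
```

Both directions give the same three totals once the labels are assigned this way.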
import pandas as pd
from bisect import bisect_left, bisect_right

# model, data, predictors and outcome come from the fitted logistic regression;
# outcome is assumed to be 'Loan_Status', the column filtered on below
Probability = model.predict_proba(data[predictors])
Probability1 = pd.DataFrame(Probability)
Probability1.columns = ['Prob_LoanStatus_0', 'Prob_LoanStatus_1']
# Joining on the index assumes data has a default 0..n-1 index
TruthTable = pd.merge(data[[outcome]], Probability1[['Prob_LoanStatus_1']], how='inner', left_index=True, right_index=True)
zeros = TruthTable[TruthTable['Loan_Status'] == 0].reset_index(drop=True)
ones = TruthTable[TruthTable['Loan_Status'] == 1].reset_index(drop=True)

# Column 1 is Prob_LoanStatus_1; sort the zeros' probabilities once
zeros_list = sorted(zeros.iloc[:, 1])
zeros_length = len(zeros_list)
disc = 0
ties = 0
conc = 0
for i in ones.index:
    # zeros scored strictly below this responder -> concordant pairs
    cur_conc = bisect_left(zeros_list, ones.iloc[i, 1])
    # zeros scored exactly the same -> tied pairs
    cur_ties = bisect_right(zeros_list, ones.iloc[i, 1]) - cur_conc
    conc += cur_conc
    ties += cur_ties
pairs_tested = zeros_length * len(ones.index)
# Every remaining pair has the responder scored below the non-responder
disc = pairs_tested - conc - ties
print("Pairs = ", pairs_tested)
print("Conc = ", conc)
print("Disc = ", disc)
print("Tied = ", ties)
concordance = conc / pairs_tested
discordance = disc / pairs_tested
ties_perc = ties / pairs_tested
print("Concordance = %r" % concordance)
print("Discordance = %r" % discordance)
print("Tied = %r" % ties_perc)
I followed Sreyantha Chary's answer, which is elegant, but its first part mixes up the concordance and discordance percentages.
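The same counting can be packaged as one function that takes the 0/1 outcomes and the predicted probability of class 1 for each row (for example the Prob_LoanStatus_1 column above). A stdlib-only sketch under those assumptions; concordance_stats is a hypothetical name and its input format is an assumption, but the labelling follows the corrected answer (zeros scored strictly below a responder are concordant):

```python
from bisect import bisect_left, bisect_right


def concordance_stats(labels, probs):
    """Return concordance, discordance and tie fractions over all (1, 0) pairs.

    labels: sequence of 0/1 outcomes; probs: sequence of predicted P(y = 1).
    """
    zeros_sorted = sorted(p for y, p in zip(labels, probs) if y == 0)
    ones = [p for y, p in zip(labels, probs) if y == 1]
    pairs = len(zeros_sorted) * len(ones)
    if pairs == 0:
        raise ValueError("labels must contain at least one 0 and one 1")
    conc = ties = 0
    for p1 in ones:
        lo = bisect_left(zeros_sorted, p1)  # zeros scored below p1: concordant
        conc += lo
        ties += bisect_right(zeros_sorted, p1) - lo  # zeros scored equal: tied
    disc = pairs - conc - ties  # every other pair is discordant
    return {
        "concordance": conc / pairs,
        "discordance": disc / pairs,
        "tied": ties / pairs,
        "pairs": pairs,
    }


stats = concordance_stats([1, 0, 1, 0, 0], [0.9, 0.1, 0.4, 0.4, 0.8])
print(stats)  # 6 pairs: 4 concordant, 1 discordant, 1 tied
```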