Applying a custom function over a list of columns



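I'm trying to optimize some working code by applying a custom function, but I'm not sure how to do that for specific columns in a large DataFrame. In the example below, I select the open-ended questions from my DataFrame, which holds survey data. As you'll see, I typed out each open-ended column by hand, but I'd rather use a single custom function that loops over the list of open-ended columns.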

openend = ['Q28','Q56','Q63']
### Change ranges to match the above
open1 = df.iloc[:, 28:29] # isolates 'range'
open1 = open1.iloc[1:] # removes first row
open1 = pd.concat([ids, open1], axis=1) # adds ids
open2 = df.iloc[:, 56:57]
open2 = open2.iloc[1:]
open2 = pd.concat([ids, open2], axis=1)
open3 = df.iloc[:, 63:64]
open3 = open3.iloc[1:]
open3 = pd.concat([ids, open3], axis=1)
open1['question'] = df1['Q28'][0]
open1['answer'] = open1.iloc[:,1:2]
open1 = open1.drop(open1.iloc[:,1:2], axis=1)
open2['question'] = df1['Q56'][0]
open2['answer'] = open2.iloc[:,1:2]
open2 = open2.drop(open2.iloc[:,1:2], axis=1)
open3['question'] = df1['Q63'][0]
open3['answer'] = open3.iloc[:,1:2]
open3 = open3.drop(open3.iloc[:,1:2], axis=1)
open1_stack = open1
open2_stack = open2
open3_stack = open3
open1_stack["answer"] = open1_stack["answer"].str.upper().str.title()
open1_count = open1_stack.answer.str.split(expand=True).stack().value_counts()
open1_count = open1_count.to_frame().reset_index()
open1_count.columns = ['Word', 'Count']
open1_count['question'] = df1['Q28'][0]
open2_stack["answer"] = open2_stack["answer"].str.upper().str.title()
open2_count = open2_stack.answer.str.split(expand=True).stack().value_counts()
open2_count = open2_count.to_frame().reset_index()
open2_count.columns = ['Word', 'Count']
open2_count['question'] = df1['Q56'][0]
open3_stack["answer"] = open3_stack["answer"].str.upper().str.title()
open3_count = open3_stack.answer.str.split(expand=True).stack().value_counts()
open3_count = open3_count.to_frame().reset_index()
open3_count.columns = ['Word', 'Count']
open3_count['question'] = df1['Q63'][0]

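Could someone show me, for this example, how to loop over the open-ended list and apply these functions in the most efficient way?

Thanks in advance.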

You can wrap all of your code in a function that takes openend as a parameter, with the following signature:

def prepare_survey(openend:list):

Then loop over that list to pull out each 'QXX':

for q in openend:
    # process

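I see that you run the same steps for every entry in openend, with nothing changing except the first few steps that pick the column index. So keep the code as it is, but extract the question number, like this (this relies on the number in the column name matching the column's position, as it does in your iloc calls):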

import re

def prepare_survey(openend: list):
    for q in openend:
        # process
        idx = int(re.sub("[^0-9]", "", q))  # extract question number, e.g. 'Q28' -> 28
        # continue with the steps you have
        open1 = df.iloc[:, idx:idx+1]
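
As a rough sketch of where this could end up, the function below folds all of the per-question steps into one loop. It selects columns by name instead of by position, which is a swap from the iloc approach above. The df and ids arguments, the column names of the returned frames, and the toy data are assumptions for illustration: row 0 is taken to hold the question text, and ids is taken to be a Series with one entry per response.

```python
import pandas as pd


def prepare_survey(df, ids, openend):
    """Build long-format answers and word counts for the open-ended columns.

    Assumes row 0 of `df` holds the question text and the remaining rows
    hold the responses, and that `ids` is a Series with one id per response.
    """
    answers, counts = [], []
    for q in openend:
        question = df[q].iloc[0]             # question text from the first row
        answer = df[q].iloc[1:].str.title()  # responses, title-cased
        answers.append(pd.DataFrame({
            "id": ids.to_numpy(),
            "question": question,
            "answer": answer.to_numpy(),
        }))
        # One row per word, then count how often each word occurs
        words = answer.str.split(expand=True).stack().value_counts()
        count = words.rename_axis("Word").reset_index(name="Count")
        count["question"] = question
        counts.append(count)
    return (pd.concat(answers, ignore_index=True),
            pd.concat(counts, ignore_index=True))


# Toy data, made up for illustration only
df = pd.DataFrame({
    "Q28": ["Why?", "good service", "good price"],
    "Q56": ["What else?", "nothing", None],
})
ids = pd.Series([101, 102])
answers, counts = prepare_survey(df, ids, ["Q28", "Q56"])
```

Returning two combined frames avoids the numbered open1, open2, ... variables, so adding a question only means adding its name to openend.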
