PDF MCQ to pandas DataFrame


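  1. The theory of comparative cost advantage theory was Introduced by----- a) Alfred Marshall b) David Ricardo c) Taussig d) Heberler
  2. The Ricardo’s comparative cost theory is based on which of the following assumption a) Common Market b) Equal cost c) Monopoly d) Free Trade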


The theory of comparative cost advantage theory was Introduced by-----                  Alfred Marshall     David Ricardo     Taussig     Heberler
The Ricardo’s comparative cost theory is based on which of the following assumption     Common Market       Equal cost        Monopoly    Free Trade
  • Split into rows on the newline character
  • Split into columns with a regular expression
import pandas as pd
rawtxt = """The theory of comparative cost advantage theory was Introduced by----- a) Alfred Marshall b) David Ricardo c) Taussig d) Heberler
The Ricardo’s comparative cost theory is based on which of the following assumption a) Common Market b) Equal cost c) Monopoly d) Free Trade"""
# Row-wise split: one DataFrame row per line of text
df = pd.DataFrame({"rawtxt": rawtxt.split("\n")})
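The column-wise step from the second bullet could look roughly like the following. This is a minimal sketch, not necessarily what the answer intended: it assumes every option label is a single letter from a to d followed by `)` and preceded by whitespace, and the name `parts` and the column names are chosen here for illustration.

```python
import pandas as pd

rawtxt = """The theory of comparative cost advantage theory was Introduced by----- a) Alfred Marshall b) David Ricardo c) Taussig d) Heberler
The Ricardo’s comparative cost theory is based on which of the following assumption a) Common Market b) Equal cost c) Monopoly d) Free Trade"""

# Row-wise split: one DataFrame row per line of text
df = pd.DataFrame({"rawtxt": rawtxt.split("\n")})

# Column-wise split: cut each row at the option labels "a)", "b)", "c)", "d)".
# Assumption: a label is always preceded by whitespace, so letters inside
# ordinary words are never treated as labels.
parts = df["rawtxt"].str.split(r"\s+[a-d]\)\s*", expand=True)
parts.columns = ["Question", "A", "B", "C", "D"]

print(parts)
```

If a question or an option itself contains something like ` a)`, this pattern would split in the wrong place, so the result is worth checking against the source PDF.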


import re
import pandas as pd

# Group 1 is the question; groups 2-5 are the options, each kept with its "a)".."d)" label
regex = r"(.+)(a\).+).+(b\).+).+(c\).+).+(d\).+)"
pdf_txt = """The theory of comparative cost advantage theory was Introduced by----- a) Alfred Marshall b) David Ricardo c) Taussig d) Heberler
The Ricardo’s comparative cost theory is based on which of the following assumption a) Common Market b) Equal cost c) Monopoly d) Free Trade"""
matches = re.finditer(regex, pdf_txt, re.MULTILINE)
data = {1 : [], 2 : [], 3 : [], 4 : [], 5 : []}
# Each match is one question line; append its groups to the matching column
for match_num, match in enumerate(matches, start=1):
    for group_num in range(0, len(match.groups())):
        data[group_num + 1].append(match.group(group_num + 1))

df = pd.DataFrame(data)
df.columns = ['Question', 'A', "B", "C", "D"]
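For output closer to the expected table, where the cells do not carry the `a)` to `d)` labels, one possible variant uses named groups and leaves the labels outside the captures. This is a sketch under the same assumption that each question sits on one line with exactly four options labelled a) to d); the names `pattern` and `rows` are introduced here for illustration.

```python
import re
import pandas as pd

pdf_txt = """The theory of comparative cost advantage theory was Introduced by----- a) Alfred Marshall b) David Ricardo c) Taussig d) Heberler
The Ricardo’s comparative cost theory is based on which of the following assumption a) Common Market b) Equal cost c) Monopoly d) Free Trade"""

# Labels and the whitespace around them sit outside the named groups,
# so only the question text and the option text are captured.
pattern = re.compile(
    r"(?P<Question>.+?)\s+a\)\s*(?P<A>.+?)\s+b\)\s*(?P<B>.+?)"
    r"\s+c\)\s*(?P<C>.+?)\s+d\)\s*(?P<D>.+?)\s*$",
    re.MULTILINE,
)

# One dict per matched line, keyed by group name
rows = [m.groupdict() for m in pattern.finditer(pdf_txt)]
df = pd.DataFrame(rows, columns=["Question", "A", "B", "C", "D"])

print(df)
```

Lines that do not fit the pattern are silently skipped, so comparing `len(df)` with the number of questions in the PDF is a cheap sanity check.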
