How to concatenate crosstabs while reading files in a loop with pandas

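I am using the pandas module in Python 3.5 to read data files recursively from sub-directories and build a crosstab from each one. I want to combine the crosstabs inside the for loop, after each call to pd.crosstab(), and then write the result to an Excel file once the loop finishes. I tried copying table1 into table3 (see the code below) after calling pd.crosstab(), but if some values are not present in the later data files, table3 shows NaN for those entries. I looked at pd.concat but could not find an example of how to use it in a loop.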

The data files look like this (there are 100 files with many columns, but only the columns I am interested in are shown here):

First data file:

    StudentID    Grade
    3            A
    2            B
    1            A

Second data file:

    StudentID    Grade
    1            B
    2            A
    3            A

Third data file:

    StudentID    Grade
    2            C
    1            B
    3            A

and so on. At the end, the output should look like:

    Grade       A   B   C
    StudentID
    1           1   2   0
    2           1   1   1
    3           3   0   0

My Python program looks like this (imports removed from the top of the file):

.....

fields = ['StudentID', 'Grade']
path= 'C:/script_testing/'
i=0
for filename in glob.glob('C:/script_testing/**/*.txt', recursive=True):
    temp = pd.read_csv(filename, sep=',', usecols=fields)
    table1 = pd.crosstab(temp.StudentID, temp.Grade)
    # Note: the if condition is executed only once, to initialize table3
    if(i==0):
        table3 = table1
        i = i + 1
    table3 = table3 + table1
writer = pd.ExcelWriter('Report.xlsx', engine='xlsxwriter')
table3.to_excel(writer, sheet_name='StudentID_vs_Grade')
writer.save()

pd.concat([df1, df2, df3]).pipe(lambda d: pd.crosstab(d.StudentID, d.Grade))
Grade      A  B  C
StudentID         
1          1  2  0
2          1  1  1
3          3  0  0
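
To check the one-liner above without any files on disk, here is a minimal sketch that rebuilds the three sample files from the question as in-memory frames. The names `df1`, `df2`, and `df3` are stand-ins for the parsed files, and `combined` and `result` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Stand-ins for the three sample files shown in the question.
df1 = pd.DataFrame({'StudentID': [3, 2, 1], 'Grade': ['A', 'B', 'A']})
df2 = pd.DataFrame({'StudentID': [1, 2, 3], 'Grade': ['B', 'A', 'A']})
df3 = pd.DataFrame({'StudentID': [2, 1, 3], 'Grade': ['C', 'B', 'A']})

# Stack the raw rows first, then count once over the combined data.
combined = pd.concat([df1, df2, df3], ignore_index=True)
result = pd.crosstab(combined.StudentID, combined.Grade)
print(result)
```

Because the counting happens once over all rows, a grade that is missing from some files simply contributes nothing for those files, so no NaN can appear from table alignment.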

My attempt at translating your code:

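import glob
import pandas as pd

fields = ['StudentID', 'Grade']

# Read only the two columns of interest from one file.
parse = lambda f: pd.read_csv(f, usecols=fields)

# Stack the rows of every file, then build a single crosstab over all of them.
table3 = pd.concat(
    [parse(f) for f in glob.glob('C:/script_testing/**/*.txt', recursive=True)]
).pipe(lambda d: pd.crosstab(d.StudentID, d.Grade))

writer = pd.ExcelWriter('Report.xlsx', engine='xlsxwriter')
table3.to_excel(writer, sheet_name='StudentID_vs_Grade')
writer.close()  # writes the file; writer.save() was removed in pandas 2.0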
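
If you would rather keep a running total inside the loop, as in the question, one alternative (not part of the answer above) is `DataFrame.add` with `fill_value=0`. The `+` operator aligns both tables on index and columns, so any cell present in only one of them becomes NaN; `fill_value=0` treats such a cell as zero instead. Note also that the question's loop assigns `table3 = table1` on the first pass and then adds `table1` again, which counts the first file twice. The sketch below uses a hypothetical list `frames` in place of the files read from disk, and `total` is an illustrative name.

```python
import pandas as pd

# Hypothetical stand-ins for the per-file frames; only the third contains grade 'C'.
frames = [
    pd.DataFrame({'StudentID': [3, 2, 1], 'Grade': ['A', 'B', 'A']}),
    pd.DataFrame({'StudentID': [1, 2, 3], 'Grade': ['B', 'A', 'A']}),
    pd.DataFrame({'StudentID': [2, 1, 3], 'Grade': ['C', 'B', 'A']}),
]

total = None
for frame in frames:
    table = pd.crosstab(frame.StudentID, frame.Grade)
    if total is None:
        # First file: start the running total (do not add it again).
        total = table
    else:
        # Cells missing from one side are treated as 0 instead of NaN.
        total = total.add(table, fill_value=0)

# A cell missing from both sides is still NaN after add(), so fill it,
# then restore integer counts (add() with fill_value may produce floats).
total = total.fillna(0).astype(int)
print(total)
```

This keeps one small table in memory instead of all the raw rows, which may matter with many large files; the `concat` approach is shorter when the data fits in memory.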
