Merging data with different headers in Python or R



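I have an Excel file with multiple sheets that need to be merged. However, the column headers differ from sheet to sheet. The data currently looks like this: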

Sheet 1
+-------------+--------------+----------+--------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header | Header1 | Header2 |
+-------------+--------------+----------+--------+---------+---------+
|          17 | Data         | Data     |      0 |       0 |       0 |
|          17 | Data         | Data     |      0 |       0 |       0 |
+-------------+--------------+----------+--------+---------+---------+
Sheet 2
+-------------+--------------+----------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header3 | Header2 |
+-------------+--------------+----------+---------+---------+
|          15 | Data         | Data     |       0 |       0 |
|          15 | Data         | Data     |       0 |       0 |
+-------------+--------------+----------+---------+---------+
Sheet 3
+-------------+--------------+----------+---------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header4 | Header1 | Header3 |
+-------------+--------------+----------+---------+---------+---------+
|          16 | Data         | Data     |       0 |       0 |       0 |
|          16 | Data         | Data     |       0 |       0 |       0 |
+-------------+--------------+----------+---------+---------+---------+
OUTPUT
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header | Header1 | Header2 | Header3 | Header4 | SheetName |
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+
|          17 | Data         | Data     | 0      | 0       | 0       | null    | null    | Sheet1    |
|          17 | Data         | Data     | 0      | 0       | 0       | null    | null    | Sheet1    |
|          15 | Data         | Data     | null   | null    | 0       | 0       | null    | Sheet2    |
|          15 | Data         | Data     | null   | null    | 0       | 0       | null    | Sheet2    |
|          16 | Data         | Data     | null   | 0       | null    | 0       | 0       | Sheet3    |
|          16 | Data         | Data     | null   | 0       | null    | 0       | 0       | Sheet3    |
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+
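The target table is a row-wise union of the sheets. Columns are matched by name, a column missing from a sheet becomes null (NaN in pandas), and each row records the sheet it came from. Below is a minimal pandas sketch. It assumes the sheets are already loaded as DataFrames, and it rebuilds them from the placeholder values in the tables above; the helper `make_sheet` is hypothetical and exists only to create that sample data.

```python
import pandas as pd

def make_sheet(year, headers):
    """Hypothetical helper: build a two-row placeholder sheet with extra header columns."""
    data = {"FISCAL_YEAR": [year, year],
            "COMPANY_CODE": ["Data", "Data"],
            "ACCOUNTS": ["Data", "Data"]}
    data.update({h: [0, 0] for h in headers})
    return pd.DataFrame(data)

sheets = {
    "Sheet1": make_sheet(17, ["Header", "Header1", "Header2"]),
    "Sheet2": make_sheet(15, ["Header3", "Header2"]),
    "Sheet3": make_sheet(16, ["Header4", "Header1", "Header3"]),
}

# Tag each frame with its sheet name, then stack them.
# concat aligns columns by name and fills the gaps with NaN.
merged = pd.concat(
    [df.assign(SheetName=name) for name, df in sheets.items()],
    ignore_index=True,
)
print(merged)
```

The row order follows the dict order. The column order follows first appearance, so `SheetName` may not end up last.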

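I'm relatively new to Python; I've used pandas and numpy. I have up to 60 sheets to process. Could anyone help me understand how to achieve this? If Python isn't the right fit, is there another tool or approach I should use? A code example to get me started would really help.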

Any help is much appreciated. Thanks in advance.

This is easy to do with R:

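library(openxlsx) # to read xlsx files
library(purrr)    # for map_df() and set_names(); map_df() also needs dplyr installed
wb <- loadWorkbook("path/filename.xlsx")
all_sheets <- names(wb)
# map_df() row-binds the sheets and fills missing columns with NA;
# naming the vector lets .id store each row's sheet name
merged_data <- map_df(set_names(all_sheets), ~ read.xlsx(wb, sheet = .x),
                      .id = "SheetName")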
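For comparison, pandas offers a similar idiom. purrr's `map_df()` stacks the per-sheet frames with `dplyr::bind_rows()`, which fills columns a sheet lacks with NA. `pd.concat` behaves much the same when it is given a dict: the dict keys become an outer index level, and that level can then be moved into an ordinary column. The sketch below uses small made-up frames as stand-ins for the dict that `pd.read_excel(path, sheet_name=None)` returns.

```python
import pandas as pd

# Made-up stand-ins for the {sheet name: DataFrame} dict from read_excel
sheets = {
    "Sheet1": pd.DataFrame({"FISCAL_YEAR": [17], "Header": [0], "Header2": [0]}),
    "Sheet2": pd.DataFrame({"FISCAL_YEAR": [15], "Header3": [0], "Header2": [0]}),
}

# The dict keys become the outer index level, named "SheetName" here
merged = pd.concat(sheets, names=["SheetName", "row"])
# Move the sheet name out of the index into a column, then renumber the rows
merged = merged.reset_index(level="SheetName").reset_index(drop=True)
print(merged)
```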

Using a for loop and rbind in R:

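library(xlsx)  # provides read.xlsx(file, sheetIndex = ...)
data <- NULL   # file.list: character vector of workbook paths
for (i in file.list) {
    # rbind() needs the same column names in every data frame
    data <- rbind(data, read.xlsx(i, sheetIndex = 1))
}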

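rbind usage: to join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables (column names), but the columns do not have to be in the same order.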

total <- rbind(dataframeA, dataframeB)
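Because rbind() only accepts data frames that share the same variables, sheets with different headers must first be padded to a common set of columns. Below is a sketch of that alignment step. It is written in pandas to keep a single language for the examples, and the helper name `align_columns` is hypothetical.

```python
import pandas as pd

def align_columns(frames):
    """Hypothetical helper: reindex every frame to the ordered union of column names.

    Columns a frame lacks are added and filled with NaN, so every
    frame ends up with identical columns in identical order.
    """
    all_cols = []
    for df in frames:
        for col in df.columns:
            if col not in all_cols:
                all_cols.append(col)
    return [df.reindex(columns=all_cols) for df in frames]

a = pd.DataFrame({"FISCAL_YEAR": [17], "Header1": [0]})
b = pd.DataFrame({"FISCAL_YEAR": [16], "Header4": [0], "Header1": [0]})
aligned = align_columns([a, b])
print([list(df.columns) for df in aligned])
```

In R, this padding is what `dplyr::bind_rows()` and `data.table::rbindlist(fill = TRUE)` do for you.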
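Using pandas in Python:

import pandas as pd

filepath = r"filePath here"
# sheet_name=None reads every sheet into a dict of {sheet name: DataFrame}
sheets_dict = pd.read_excel(filepath, sheet_name=None)

frames = []
# loop through sheets, tagging each row with the sheet it came from
for name, sheet in sheets_dict.items():
    sheet['SheetName'] = name
    # optional: keep only the last line of multi-line headers
    #sheet = sheet.rename(columns=lambda x: x.split('\n')[-1])
    frames.append(sheet)

# concat aligns columns by name and fills missing ones with NaN
# (DataFrame.append was removed in pandas 2.0)
full_table = pd.concat(frames, ignore_index=True)

# Write to Excel; the context manager saves and closes the file
with pd.ExcelWriter('consolidated_TB1.xlsx', engine='xlsxwriter') as writer:
    full_table.to_excel(writer, sheet_name='Sheet1')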
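Stacking with pandas keeps columns in order of first appearance, so the column holding the sheet name can land in the middle of the table instead of last, as in the question's OUTPUT. The small sketch below moves a named column to the end. The helper `move_to_end` is hypothetical, and the sample frame is made-up data.

```python
import pandas as pd

def move_to_end(df, col):
    """Hypothetical helper: return a copy of df with column `col` moved to the last position."""
    others = [c for c in df.columns if c != col]
    return df[others + [col]]

full_table = pd.DataFrame({"FISCAL_YEAR": [17, 15],
                           "SheetName": ["Sheet1", "Sheet2"],
                           "Header3": [None, 0]})
full_table = move_to_end(full_table, "SheetName")
print(list(full_table.columns))  # ['FISCAL_YEAR', 'Header3', 'SheetName']
```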
