Iterate through a short spreadsheet and copy matching values into a larger spreadsheet

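I'm working with pandas, trying to use a device list, device_master.xlsx, to auto-populate a number of columns in my main spreadsheet, nevened_billing.xlsx.

I can read both sheets and can make other transformations to my main spreadsheet, but I'm new to pandas and can't work out how to look up and populate the required columns in my main sheet.

So far I have: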

import numpy as np
import pandas as pd
import csv
#import re
# import the list of devices with work order numbers and project codes:
devmaster_xls = pd.read_excel('device_master.xlsx', 'device master', header = [0], index_col = None)
print('Device Master sheet columns:', devmaster_xls.columns, '\n') #debug, check the columns are right

# import the billing information which will need transforming with work order/ project codes:
data_xls = pd.read_excel('DetailedBilling.xlsx', 'Sheet1', header = [0], index_col = None)
print('Billing sheet columns read in:', data_xls.columns, '\n') #debug, check the columns before
data_xls.insert(13, 'WO Ref', '')
data_xls.insert(14, 'WO Description', '')
data_xls.insert(15, 'Project Code', '')
print('Billing sheet columns with WO additions:', data_xls.columns, '\n') #debug, check the columns after
wait = input("Press enter to continue...")

# magic sauce to add work order and cost tracking goes in here
#


data_writer = pd.ExcelWriter('DetailedBilling_edit.xlsx', engine = 'xlsxwriter')
data_xls.to_excel(data_writer, index = False)
#defining the book/sheet to work with
workbook = data_writer.book
worksheet = data_writer.sheets['Sheet1']
# formatting changes
worksheet.set_zoom(85)
server_fmt = workbook.add_format({'font_color': '#800000', 'bold': True})
dollar_fmt = workbook.add_format({'num_format': """_($* #,##0.00_);_($* -#,##0.00;_($* "0.00"_);_(@_)""", 'bold': True})
bold_fmt = workbook.add_format({'bold': True})
worksheet.set_column('A:A', 34, server_fmt)
worksheet.set_column('B:B', 85)
worksheet.set_column('F:F', 28)
worksheet.set_column('G:G', 9)
worksheet.set_column('H:K', 11, dollar_fmt)
worksheet.set_column('L:P', 12.5)
worksheet.set_column('O:O', 85)
#
#what size is this sheet? 
count_row = len(data_xls.index)
#count_row = data_xls.shape[0]
print("Total rows: ", count_row, '\n')
data_writer.close() # writes the file; ExcelWriter.save() was removed in pandas 2.0

What I'd like to say is:

data_xls['WO Ref'].loc[(data_xls['Server'] = devmaster_xls['Device Name'])] = devmaster_xls['WO Ref']

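I've tried putting this in a loop, but haven't got far - any help would be much appreciated!

Edit: Thanks to @frankyjuang I'm now getting the right data, which is great, but for some reason I can't write it into the spreadsheet. I'm doing this: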

>>> for index, row in data_xls.iterrows():
...     rowdata = devmaster_xls.loc[devmaster_xls['Device Name'] == row['Server']]
...     print(index, rowdata['WO Ref'])

which looks good, returning something like:

555 19    REF###
Name: WO Ref, dtype: object
556 19    REF###
Name: WO Ref, dtype: object
557 19    REF###
Name: WO Ref, dtype: object
558 19    REF###
Name: WO Ref, dtype: object
559 19    REF###
Name: WO Ref, dtype: object
560 19    REF###
Name: WO Ref, dtype: object
561 19    REF###
Name: WO Ref, dtype: object
562 19    REF###
Name: WO Ref, dtype: object
563 19    REF###
Name: WO Ref, dtype: object

I tried to insert it with the following:

>>> for index, row in data_xls.iterrows():
...     rowdata = devmaster_xls.loc[devmaster_xls['Device Name'] == row['Server']]
...     row['WO Ref'] = rowdata['WO Ref']

but print(data_xls['WO Ref']) shows that the rows are NaN.

Iterate over the rows in data_xls

for index, row in data_xls.iterrows():
    row['WO Ref']    # Get the data in this way

Find the corresponding row with

devmaster_xls.loc[devmaster_xls['Device Name'] == some_value]

Combine them

for index, row in data_xls.iterrows():
    the_row_you_want = devmaster_xls.loc[devmaster_xls['Device Name'] == row['Server']]  # 'Server' holds the device name to match
    # do the operations you want

Note:

If you need to do this many times, it is more efficient to build an index first and then use .loc

devmaster_xls = devmaster_xls.set_index(['Device Name'])
devmaster_xls.loc[row['Server']]
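
A minimal sketch of that indexed lookup, using small made-up frames in place of the two workbooks. The device names and REF values are invented for illustration, and the lookup key is assumed to be the Server column, as in the question's edit:

```python
import pandas as pd

# Toy stand-ins for the two sheets; all values are invented for illustration.
devmaster_xls = pd.DataFrame({
    "Device Name": ["srv1.example.com", "srv2.example.com"],
    "WO Ref": ["REF001", "REF002"],
})
data_xls = pd.DataFrame({
    "Server": ["srv2.example.com", "srv1.example.com", "srv2.example.com"],
})

# Build the index once, so each lookup is by label instead of
# re-comparing the whole 'Device Name' column on every iteration.
indexed = devmaster_xls.set_index("Device Name")

found = []
for index, row in data_xls.iterrows():
    # With a unique index, .loc[label, column] returns a single value.
    found.append(indexed.loc[row["Server"], "WO Ref"])

print(found)
```

Note that .loc raises a KeyError when a server has no entry in the device list, so real data may need a membership check (row["Server"] in indexed.index) first.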

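Update:

Note that rowdata is still a small DataFrame with only one row. To get a value out of it you can't use rowdata['COLUMN'] directly, because that returns a one-element Series rather than the value itself. Instead, pull out the single row with iloc[0]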

row['WO Ref'] = rowdata.iloc[0]['WO Ref']

Or, append iloc[0] right after rowdata = ...

rowdata = devmaster_xls.loc[devmaster_xls['Device Name'] == row['Server']].iloc[0]
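
One caveat on writing the value back: the pandas documentation warns that iterrows may yield copies, so assigning to row is not guaranteed to change data_xls itself. A sketch that writes through the DataFrame with .at instead, on invented toy data (the server names and REF values are made up, and skipping unmatched servers is an assumption about the desired behaviour):

```python
import pandas as pd

# Toy stand-ins for the two sheets; all values are invented for illustration.
devmaster_xls = pd.DataFrame({
    "Device Name": ["srv1.example.com", "srv2.example.com"],
    "WO Ref": ["REF001", "REF002"],
})
data_xls = pd.DataFrame({
    "Server": ["srv2.example.com", "srv9.example.com", "srv1.example.com"],
    "WO Ref": ["", "", ""],
})

for index, row in data_xls.iterrows():
    rowdata = devmaster_xls.loc[devmaster_xls["Device Name"] == row["Server"]]
    if rowdata.empty:
        continue  # no matching device: leave the cell as it was
    # Assign via the DataFrame (label-based scalar setter), not via `row`.
    data_xls.at[index, "WO Ref"] = rowdata.iloc[0]["WO Ref"]

print(data_xls["WO Ref"].tolist())
```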

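Owing to my inexperience, I struggled with the answer provided by @frankyjuang - I couldn't reconcile the results I was getting with what I wanted to achieve. So, after more research, I came up with the following solution, which fixed the problem: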

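First, we need to index both spreadsheets by a shared key. In this case it's the server names, in the format servername1.com, servername2.com, and so on. However, I don't want to permanently alter my dataframes, so I create a new column that can be used as the index.

This copies the Server column, converts it to lowercase to allow for any case mismatches, and sets it as the index: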

data_xls['Serverindex'] = data_xls['Server'].str.lower() 
data_xls.set_index('Serverindex', inplace = True)

Grab my device master sheet:

devmaster_xls = pd.read_excel('device_master.xlsx', 'device master', header = [0], index_col = None)

As above, create an index from an existing column, converting it to lowercase:

devmaster_xls['Devindex'] = devmaster_xls['Device Name'].str.lower() 
devmaster_xls.set_index('Devindex', inplace = True)

It is then simple to copy the relevant data from the device master sheet into the main sheet:

data_xls.loc[:,'WO Ref'] = devmaster_xls.loc[:,'WO Ref'] 
data_xls.loc[:,'WO Description'] = devmaster_xls.loc[:,'WO Description'] 
data_xls.loc[:,'Project Code'] = devmaster_xls.loc[:,'Project code']

Finally, we don't want to write out that index, so:

data_xls = data_xls.reset_index(drop = True)
devmaster_xls = devmaster_xls.reset_index(drop = True)

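If this approach is actually bad practice, I'd be interested to find out why, and what I could do to improve it. But it did solve the problem and was quick to implement!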
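
On the question of improving it: one commonly used alternative, offered here as a sketch rather than a verdict on the approach above, is to build a lookup from the device master and apply it with Series.map, which leaves the index of both frames untouched. The frames below are invented stand-ins for the two workbooks, and drop_duplicates assumes the first entry should win if a device is listed twice:

```python
import pandas as pd

# Toy stand-ins for the two sheets; all values are invented for illustration.
devmaster_xls = pd.DataFrame({
    "Device Name": ["SRV1.example.com", "srv2.example.com"],
    "WO Ref": ["REF001", "REF002"],
    "WO Description": ["Web tier", "Database"],
    "Project code": ["P-10", "P-20"],
})
data_xls = pd.DataFrame({
    "Server": ["srv2.example.com", "Srv1.Example.com", "srv9.example.com"],
})

# Lower-cased device name as a unique lookup index (first entry wins).
lookup = (
    devmaster_xls
    .assign(key=devmaster_xls["Device Name"].str.lower())
    .drop_duplicates("key")
    .set_index("key")
)
key = data_xls["Server"].str.lower()

# Target column in the billing sheet -> source column in the device master.
columns = {
    "WO Ref": "WO Ref",
    "WO Description": "WO Description",
    "Project Code": "Project code",
}
for target, source in columns.items():
    # Servers with no entry in the device master come out as NaN.
    data_xls[target] = key.map(lookup[source])

print(data_xls)
```

Because nothing is set as the index on data_xls, there is no reset_index step afterwards, and unmatched servers are easy to find with data_xls['WO Ref'].isna().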
