Filtering one DataFrame by comparing it to another



I'm very new to Python and have been writing a script that downloads a CSV, cleans the data, and creates an output file that can be correctly imported into a Shopify store.

So far I've managed this successfully, producing a daily import file containing the relevant data to import into my store.

To enhance the code further, I'd like to filter the output file down to only the rows that have changed. In other words, if a product's cost or price has changed, or it has come off pre-order status, I want it included; but if a product's row is identical between the product list and the latest stock feed, it should be excluded from the import.

I've been working on this for the last few days and I think I'm getting closer, but I still can't make the final step.

I've got to the point where I have created two DataFrames to compare: df_merged and df_prod_filtered.

I generated the following output from the script for troubleshooting:

df_merged
Columns:  ['ID', 'Handle', 'Variant Compare At Price', 'Template Suffix', 'Variant Inventory Policy', 'Variant Cost']
Index column of df_merged:  None
Number of rows:  1802
Column types:
ID                            int64
Handle                       object
Variant Compare At Price    float64
Template Suffix              object
Variant Inventory Policy     object
Variant Cost                float64
dtype: object
df_prod_filtered
Columns:  ['ID', 'Handle', 'Variant Compare At Price', 'Template Suffix', 'Variant Inventory Policy', 'Variant Cost']
Index column of df_prod_filtered:  None
Number of rows:  1802
Column types:
ID                            int64
Handle                       object
Variant Compare At Price    float64
Template Suffix              object
Variant Inventory Policy     object
Variant Cost                float64
dtype: object 

Clearly the DataFrames are structurally identical and have the same number of rows, after a function removed the "Discontinued" items.

My most recent attempt was to compare the two DataFrames using a mask:

# create a mask by comparing the two dataframes based on their index (ID column)
mask = df_merged.eq(df_prod_filtered.loc[df_merged.index])
# create a new column in df_merged that shows if there is a difference or not
df_merged['Diff'] = ~mask.all(axis=1)

However, this resulted in every row being flagged as identical (no difference in any cell).

To confirm this wasn't actually the case, I manually edited a few cells in df_prod_filtered and repeated the process, and again got a list with no apparent differences.
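One thing I suspect may be going on (a guess, using made-up rows): `.loc[df_merged.index]` on a default RangeIndex selects rows by position, not by the ID column, so the `eq()` comparison is positional rather than ID-aligned. A small reproducible sketch of the difference:

```python
import pandas as pd

# Same data, different row order, default RangeIndex on both
a = pd.DataFrame({'ID': [2, 1], 'Cost': [5.0, 9.0]})
b = pd.DataFrame({'ID': [1, 2], 'Cost': [9.0, 5.0]})

# Positional comparison via the RangeIndex reports every row as changed
print(a.eq(b.loc[a.index]).all(axis=1).tolist())  # [False, False]

# Aligning on ID first compares the right rows
print(a.set_index('ID').eq(b.set_index('ID')).all(axis=1).tolist())  # [True, True]
```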

So I'm stuck again. I need to compare two DataFrames, currently called df_merged and df_prod_filtered, using the ID column as the index. I need to merge them, or generate a new DataFrame, containing only the rows where the data in df_merged differs from the data in df_prod_filtered, and it needs to take the new row data from df_merged.

For example, if df_merged contains the following:

ID,Handle,Variant Compare At Price,Template Suffix,Variant Inventory Policy,Variant Cost
6548971520023,128194,49.99,,deny,32.49
6548974206999,128356,9.99,,deny,6.49

and df_prod_filtered contains:

ID,Handle,Variant Compare At Price,Template Suffix,Variant Inventory Policy,Variant Cost
6548971520023,128194,49.99,,deny,32.49
6548974206999,128356,8.99,,deny,6.29

then the new DataFrame should contain only:

ID,Handle,Variant Compare At Price,Template Suffix,Variant Inventory Policy,Variant Cost
6548974206999,128356,9.99,,deny,6.49
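The example above can be reproduced as a minimal sketch (toy frames mirroring the rows shown; note that `ne()` treats NaN as unequal to NaN, which matters if any columns contain blanks):

```python
import pandas as pd

# Toy versions of the two frames, mirroring the example rows above
df_merged = pd.DataFrame({
    'ID': [6548971520023, 6548974206999],
    'Handle': ['128194', '128356'],
    'Variant Compare At Price': [49.99, 9.99],
    'Variant Cost': [32.49, 6.49],
})
df_prod_filtered = pd.DataFrame({
    'ID': [6548971520023, 6548974206999],
    'Handle': ['128194', '128356'],
    'Variant Compare At Price': [49.99, 8.99],
    'Variant Cost': [32.49, 6.29],
})

# Align both frames on ID, flag rows where any cell differs,
# and keep the df_merged version of those rows
a = df_merged.set_index('ID')
b = df_prod_filtered.set_index('ID')
changed = a[a.ne(b).any(axis=1)].reset_index()
print(changed)  # only ID 6548974206999
```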

Given how new I am to Python and pandas, my problem may well lie elsewhere in the script, for example in how df_prod_filtered and df_merged are generated. Here is the full script for reference:

import pandas as pd
import datetime
import requests
import re
import os
# Set filename as current date
now = datetime.datetime.now()
filename = now.strftime("%B%d") + '.csv'
# Define a function to remove the word 'Brick' from the title
def remove_brick(title):
    if 'Booster' in title and 'Brick' in title:
        return title.replace('Brick', '').strip()
    else:
        return title

# Clear existing file        
if os.path.exists('stockfeed.csv'):
    os.remove('stockfeed.csv')

# Download today's stockfeed.
url = '<CSV URL Goes HERE>'
response = requests.get(url)
if response.status_code == 200:
    with open('stockfeed.csv', 'wb') as f:
        f.write(response.content)
else:
    print('Failed to download CSV file')
# Load the CSV file into a pandas DataFrame
df_a = pd.read_csv('stockfeed.csv')
# Keep only the required columns
df_a = df_a[['item_number', 'name', 'image_path', 'rrp', 'description', 'barcode', 'manufacturer_sku', 'availability', 'publisher', 'price_ex_gst', 'item_group', 'board_game_genre', 'weight_kg', 'game_family']]
# Rename the columns
df_a = df_a.rename(columns={'item_number': 'Handle', 'name': 'Title','image_path': 'Image Src', 'description': 'Body HTML', 'manufacturer_sku': 'Variant SKU','barcode': 'Variant Barcode', 'weight_kg': 'Variant Weight', 'price_ex_gst': 'Variant Cost', 'publisher': 'Vendor'})
# Filter the rows based on criteria
df_a = df_a[(df_a['availability'] == 'Pre-Order')
& (~df_a['item_group'].isin(['Board Games', 'CCG', 'Puzzles']))
& (~df_a['game_family'].isin(['Traveller', 'Keyforge', 'SLA Industries', 'Successors', 'Dungeon Crawl', 'Judge Dredd', 'KULT RPG', 'War of the Ring', 'Rocketmen', 'Great Wyrms of Draka', 'Epic Card Game', '13th Age RPG', 'A Game of Thrones A Song of Ice and Fire', 'A Song of Ice and Fire', 'Achtung Cthulhu', 'Achtung Cthulhu 2d20', 'Achtung! Cthulhu Miniatures', 'Adventures & Academia', 'Alien RPG', 'Ashen Stars RPG', 'Avatar Legends', 'Battletech', 'BeyBlade', 'Bicycle', 'Black Void RPG', 'Blue Rose RPG', 'Boss Monster', 'Call to Adventure', 'Castles and Crusades RPG', 'Conan RPG', 'Corolis RPG', 'Cypher', 'Dark Souls', 'Digimon Card Game', 'Doctor Who', 'Divinity', 'Disney', 'Dungeon Crawl Classics', 'Dungeonology', 'Elder Scrolls Call to Arms', 'Fallout RPG', 'Fallout Wasteland Warfare', 'Fantasy AGE', 'Fear Itself RPG', 'Fire & Stone', 'Folklore', 'Forbidden Lands RPG', 'Forbidden Lands', 'Fragged Empire RPG', 'G.I. Joe', 'GameMastery', 'Galaxy Defenders', 'Gatekeeper Dice', 'Halfsies Dice', 'Hero Realms', 'Heroclix', 'Homeworld Revelations RPG', 'Hunter: The Reckoning', 'Infinity RPG', 'John Carter of Mars RPG', 'Jack Vance RPG', 'Invisible Sun', 'Kem Arrow', 'Kids on Bikes', 'Knights of the Round', 'Kobolds Ate My Baby', 'Lamentations RPG', 'Last Aurora', 'Liminal RPG', 'Lord of the Rings RPG', 'Masks', 'Metamorphosis Alpha', 'Modern Age RPG', 'Mutant City Blues RPG', 'Mork Borg RPG', 'Mutant Crawl Classics', 'Mutant Year Zero RPG', 'Mutants & Masterminds', 'My Little Pony', 'Mythos RPG', 'Nerf', "Night's Black Agents RPG", 'Numenera', 'Odyssey of the Dragonlords RPG', 'One Piece', 'Original Adventures Reincarnated', 'Overlight', 'Paladins of the Western Kingdoms', 'Paradox Initiative', 'Pasion de las Pasiones', 'Pirate Borg', 'Planegea RPG', 'PolyHero', 'Power Rangers', 'Robin Laws RPG', 'Rocketmen', 'Ruins of Symbaroum', 'Ruins of Symbaroum RPG', 'Shadowrun', 'Shadows of Brimstone', 'Sorcerer', "Sorcerer's Arena", 'Spirograph', 'Sprue Wave 2', 'Star Realms', 'Star Trek Adventures', 'Star Wars X Wing', 'Starfinder', 'Stargate SG-1 RPG', 'Symbaroum RPG', 'Symbaroum', 'Tales from the Loop RPG', 'Symbaroum RPG - Thistle Hold', 'Tally Ho', 'The Art of', 'The Borellus', 'The Esoterrorists RPG', 'The Excellents RPG', 'The Fantasy Trip', 'The Lost Citadel RPG', 'The Spy Game RPG', 'The One Ring RPG', 'The Strange', 'The Yellow King RPG', 'Things from the Flood RPG', 'Trail of Cthulhu RPG', 'TimeWatch RPG', 'Transformers', 'Tripods & Triplanes', 'Twilight 2000', 'U-Boot', 'Upzone', 'Vaesen Nordic Horror', 'Vornheim RPG The Complete City', 'Vurt RPG', 'Warhammer Fantasy Roleplay', 'World of Tanks', 'World War Cthulhu', 'Yggdrasil']))
& (~df_a['Image Src'].isin(['https://letsplaygames.com.au/media//catalog/product/placeholder/default/Placeholder_Image-_LPG_Transparent.png']))
& (~df_a['Title'].str.contains('Coriolis|Power Rangers|Starfinder|Counter|Dice Cups|Dice Bag|Marvel|Homeworld Revelations|LUGU|Card Game|Essence20 Roleplaying System|Infinity Collectible|Class Deck|Castle Falkenstein|Transformers|LPG|Stand with Paints|G.I. Joe|Haunted West|Playing Cards|Metal Coasters|Teenagers From Outer Space|Mekton Zeta|Core Fuzion|Display|Poster|Everyday Heroes RPG', case=False))
& (~df_a['Vendor'].isin(['Wizards of the Coast', 'Arcane Tinmen', 'Steve Jackson Games', 'Ultra Pro', 'Akora Cards', 'Word Forge Games', 'Ultimate Guard', 'The Op', 'Studio 9 Games', 'Steamforged Games', 'Rebellion Unplugged', 'Plaid Hat Games', 'Pinfinity', 'Pelgrane Press', 'Monte Cook Games', 'Lynnvander Studios', 'Mantic Games', 'Loke BattleMats', 'Lamentations of the Flame', 'Goodman Games', 'Green Ronin Publishing', 'Funko', 'Gamelyn Games', 'Fantasy Flight Games', 'Edge Studios', 'Darrington Press', 'CMON', 'Chaosium', 'Battle Systems', 'Black Site Studios', 'Bandai', 'Atomic Overmind Press', 'Atomic Mass Games', 'Atlas Games', 'Archon Studio', 'Archon Games', '9th Level Games', 'Modiphius Entertainment', 'Troll Lord Games', '','Two Little Mice', 'Dark Horse Books']))]
df_a = df_a[~((df_a['Vendor'].isna()) & (df_a['Variant SKU'].str.contains('BPG0')))]
# Generate New Columns
df_a['Variant Compare At Price'] = round(df_a['rrp'] / 1.1, 4)
df_a['Variant Price'] = df_a['Variant Compare At Price']
df_a['Template Suffix'] = df_a['availability'].apply(lambda x: 'pre-order' if x == 'Pre-Order' else '')
df_a['Variant Inventory Policy'] = df_a['availability'].apply(lambda x: 'continue' if x == 'Pre-Order' else 'deny')
df_a['Type'] = ''
df_a['Tags'] = ''
df_a['Variant Weight Unit'] = 'kg'
df_a['Variant Taxable'] = 'TRUE'
df_a['Variant Requires Shipping'] = 'TRUE'
df_a['Variant Inventory Tracker'] = 'shopify'
# Set Product Keywords
rpgkey = ['Sourcebook', 'Kobold Press', 'Adventure Path', 'Campaign Setting', 'Pathfinder First Edition', 'Pathfinder Second Edition', 'Cyberpunk 2020', 'Cyberpunk RED','Vampire: The Masquarade']
paintkey = ['AK Interactive Auxiliaries', 'Primers', 'Metallics', 'Tones', 'Primer', 'Paint Thinner', 'Acrylic', 'Acrylics', 'Paint Stripper', 'Paint Set', 'Speedpaint', '3Gen Sets', 'Dual Exo Set', 'Pigments', 'Aerosol', 'Colour Set', 'Color Set']
miniaturekey = ['D&D Classic Collection', 'Icons of the Realms', 'Action Figure', 'Miniatures','Idols of the Realms', 'Miniature', 'Pathfinder Battles', 'D&D Frameworks', 'TinkerTurf', 'Wizkids Dungeon Dressings', 'Deep Cuts']
colectablekey = ['Medallion', 'Life-Sized', 'Replicas of the Realms', 'Trophy', 'Plaque', 'Dicelings', 'D&D Cartoon Classics', 'Collectibles','Collectible', 'Collectables', 'Collectable', 'Statue', 'Plush']
accessorykey = ['Mat', 'Playmat', 'Paintbrush', 'Brush', 'Tool', 'Tray', 'Palette', 'Glue', 'Battlemat', 'Megamat', 'Brush Set', 'Insert', 'Drybrush', 'Folio', 'Pathfinder Accessories', 'Flip-Mat', 'Dry Brushes', 'Marker', 'Sheets', 'Brushes Set'] 
dicekey = ['7-Die', 'D20', 'D12', 'D10', 'D100', 'D4', 'd6', 'Dice Set', 'Dice'] 
pbkey = ['Brush', 'PaintBrush']
monkey = ['Monument Hobbies']
mlkey = ['ml']
# Set Product Types 
df_a.loc[df_a['Title'].str.contains('|'.join([rf'\b{re.escape(kw)}\b' for kw in paintkey]), case=False), 'Type'] = 'Paint'
df_a.loc[df_a['Title'].str.contains('|'.join([rf'\b{re.escape(kw)}\b' for kw in rpgkey]), case=False), 'Type'] = 'RPG Book'
df_a.loc[df_a['Title'].str.contains('|'.join([rf'\b{re.escape(kw)}\b' for kw in miniaturekey]), case=False), 'Type'] = 'Miniature'
df_a.loc[df_a['Title'].str.contains('|'.join([rf'\b{re.escape(kw)}\b' for kw in colectablekey]), case=False), 'Type'] = 'Collectable'
df_a.loc[df_a['Title'].str.contains('|'.join([rf'\b{re.escape(kw)}\b' for kw in accessorykey]), case=False), 'Type'] = 'Accessory'
df_a.loc[df_a['Title'].str.contains('|'.join([rf'\b{re.escape(kw)}\b' for kw in dicekey]), case=False), 'Type'] = 'Dice'
df_a.loc[(df_a['Title'].str.contains('Critical Role')) & (df_a['Title'].str.contains('Boxed Set')), 'Type'] = 'Miniature'
df_a.loc[df_a['Vendor'] == 'Monument Hobbies', 'Type'] = 'Accessory'
df_a.loc[df_a['Title'].str.contains('|'.join(mlkey), case=False), 'Type'] = 'Paint'
# Remove invalid product types. 
df_a = df_a.dropna(subset=['Type'])
#Define paint colors
paint_colors =  ['Black', 'Blue', 'Brown', 'Green', 'Grey', 'Orange', 'Purple', 'Red', 'Yellow', 'Wood', 'Marble', 'Ashes', 'Dirt', 'Earth', 'Soil', 'Dust', 'Flesh', 'Topaz', 'Rust', 'Acid', 'Magic', 'Arcane', 'Frost', 'Blood', 'Moss', 'Varnish', 'Wash', 'Thinner', 'Satin', 'Umber', 'Sepia', 'Violet', 'Ink', 'Fluorescent', 'Magenta', 'White', 'Skin', 'Brass', 'Gold', 'Bronze', 'Copper', 'Mystic', 'Gloomy', 'Deep', 'Martian', 'Nuclear', 'Imperial', 'Plague', 'Space', 'Lotus', 'Cardinal', 'Velvet', 'Plasma', 'Magma', 'Lava', 'Metal', 'Metallic', 'Stone', 'Neutral', 'Concrete', 'Glacier', 'Glue', 'Crystal', 'Antishine', 'Enamel']
# define functions to split tags and generate new tags column
def split_tags(tags):
    return [tag.strip() for tag in re.findall(r'"[^"]+"|\w+', tags)]
def generate_tags(row):
    tags = []
    if row['Type'] == 'Paint' and ('Set' in row['Title'] or 'Sets' in row['Title'] or 'Briefcase' in row['Title'] or 'Case' in row['Title']):
        tags.append('Paint Set')
    if row['Type'] == 'Miniature' and 'Booster' in row['Title']:
        tags.append('Booster')
    if row['Type'] == 'Miniature' and 'Icons of the Realm' in row['Title']:
        tags.append('Painted')
        tags.append('Icons of the Realm')
    if 'D&D' in row['Title'] or 'Dungeons & Dragons' in row['Title']:
        tags.append('D&D')
    if 'Aerosol' in row['Title'] or 'Spray' in row['Title']:
        tags.append('Aerosol')
    if 'Critical Role' in row['Title']:
        tags.append('Critical Role')
    if 'Cyberpunk RED' in row['Title']:
        tags.append('Cyberpunk RED')
    if 'Cyberpunk 2020' in row['Title']:
        tags.append('Cyberpunk 2020')
    if 'Cyberpunk' in row['Title']:
        tags.append('Cyberpunk')
    if 'Honor Among Thieves' in row['Title']:
        tags.append('Honor Among Thieves')
    if 'Painted' in row['Title']:
        tags.append('Painted')
    if 'Unpainted' in row['Title'] or 'Collectors Series' in row['Title'] or 'Nolzurs Marvelous Miniatures' in row['Title'] or 'Deep Cuts' in row['Title']:
        tags.append('Unpainted')
    if 'Vampire: The Masquerade' in row['Title']:
        tags.append('VTM')
    if 'Nolzurs Marvelous' in row['Title']:
        tags.append('Nolzurs Marvelous')
    if 'Pre-Order' in row['availability']:
        tags.append('Pre-Order')
    if 'd6' in row['Title'] and 'Block' in row['Title']:
        tags.append('D6 Set')
    if 'Dice Set 7' in row['Title'] or '7-Die Set' in row['Title']:
        tags.append('RPG Set')
    if 'battlemat' in row['Title'] or 'playmat' in row['Title'] or 'megamat' in row['Title'] or 'flip mat' in row['Title'] or ' mat' in row['Title'] or 'Flip-mat' in row['Title'] or 'Hobby Mat' in row['Title']:
        tags.append('Pre-Order')
    if 'Drybrush' in row['Title'] or 'Paintbrush' in row['Title']:
        tags.append('Paint Brush')
    if 'Brush Set' in row['Title'] or 'Brushes' in row['Title']:
        tags.append('Paint Brush')
        tags.append('Brush Set')
    if 'Pathfinder' in row['Title']:
        tags.append('Pathfinder')
    if 'Warhammer' in row['Title']:
        tags.append('Warhammer')
    if 'Idols of the Realm' in row['Title']:
        tags.append('2D')
        tags.append('Idols of the Realm')
    if 'Replicas of the Realm' in row['Title']:
        tags.append('Replicas of the Realm')
        tags.append('Collectable')
    if 'Exandria' in row['Title']:
        tags.append('Exandria')
    if 'Dungeon Dressings' in row['Title']:
        tags.append('Dungeon Dressings')
    if 'Portal 2' in row['Title']:
        tags.append('Portal 2')
    if row['Type'] == 'Paint':
        # Check for color in title
        for color in paint_colors:
            if color in row['Title']:
                tags.append(color)
        # Check for color in description
        for color in paint_colors:
            if color in row['Body HTML']:
                tags.append(color)
    return ",".join(tags)
df_a['Tags'] = df_a.apply(generate_tags, axis=1)
# Remove unwanted columns
df_a = df_a.drop(['rrp', 'availability'], axis=1)
# Drop rows with missing values in the "Type" column
df_a = df_a.dropna(subset=['Type'])
# Import the new CSV as df_b
df_b = pd.read_csv('Products.csv')
# Select only the 'Handle' column from df_b
df_b = df_b[['Handle']]
# Convert Handle column to string
df_a['Handle'] = df_a['Handle'].astype(str)
df_b['Handle'] = df_b['Handle'].astype(str)
# Add "-single" to the handle for rows with "Booster" or "Brick" in the title
mask = df_a['Title'].str.contains('Booster|Brick')
mask &= ~df_a['Handle'].str.endswith('-single')  # Exclude rows with '-single' already present in the Handle
df_a.loc[mask, 'Handle'] = df_a.loc[mask, 'Handle'].astype(str) + '-single'
# Divide Prices of Boosters to single amounts. 
df_a.loc[df_a['Title'].str.contains('Booster|Brick'), ['Variant Price', 'Variant Compare At Price', 'Variant Cost']] /= 8
# Filter out rows where 'Handle' value is already in df_b
df_a = df_a[~df_a['Handle'].isin(df_b['Handle'])]

# Reorder the columns
df_a = df_a[['Handle','Title','Body HTML','Vendor','Type','Tags','Template Suffix','Variant SKU','Variant Barcode','Variant Weight','Variant Weight Unit','Variant Price','Variant Compare At Price','Variant Taxable','Variant Inventory Policy','Variant Requires Shipping','Variant Cost','Image Src']]

#Export New Products List
df_a.to_csv('NewProducts' + filename, index=False)
# Creating the Update Product List
# Load the CSV files into a pandas DataFrame
df_a = pd.read_csv('stockfeed.csv')
df_b = pd.read_csv('Products.csv')[['ID', 'Handle']]
df_prod = pd.read_csv('Products.csv')[['ID', 'Handle', 'Variant Compare At Price', 'Template Suffix', 'Variant Inventory Policy', 'Variant Cost']]
# Keep only the required columns
df_a = df_a[['item_number', 'rrp', 'availability', 'price_ex_gst']]
# Rename the columns
df_a = df_a.rename(columns={'item_number': 'Handle', 'price_ex_gst': 'Variant Cost'})
# Create new columns
df_a['Variant Compare At Price'] = round(df_a['rrp'] / 1.1, 2)
df_a['Template Suffix'] = df_a['availability'].apply(lambda x: 'pre-order' if x == 'Pre-Order' else '')
df_a['Variant Inventory Policy'] = df_a['availability'].apply(lambda x: 'continue' if x == 'Pre-Order' else 'deny')
# Remove unwanted columns
df_a = df_a.drop(['rrp', 'availability'], axis=1)
# Reorder the columns
df_a = df_a[['Handle', 'Variant Compare At Price', 'Template Suffix', 'Variant Inventory Policy', 'Variant Cost']]
# Convert Handle column in CSV A to string
df_a['Handle'] = df_a['Handle'].astype(str)
# Merge CSV files on Handle column
df_merged = pd.merge(df_a, df_b, on='Handle')
# Reorder the columns
df_merged = df_merged[['ID', 'Handle', 'Variant Compare At Price', 'Template Suffix', 'Variant Inventory Policy', 'Variant Cost']]
# Round Variant Compare At Price to 2 decimal places
df_merged['Variant Compare At Price'] = df_merged['Variant Compare At Price'].round(2)
# Export merged CSV file
df_merged.to_csv('ChangedProducts' + filename, index=False)

# Read in the Handle columns from df_b and df_merged
df_c = df_b[['Handle']]
df_d = df_merged[['Handle']]
# Identify discontinued Handles
df_discon = df_c[~df_c['Handle'].isin(df_d['Handle'])]

# Export discontinued Handles to CSV
df_discon.to_csv('UnmatchedProducts' + filename, index=False)
# start filter test 
# create a list of the handles in df_discon
discon_handles = df_discon['Handle'].tolist()
# remove rows from df_prod that have handles in discon_handles
df_prod_filtered = df_prod[~df_prod['Handle'].isin(discon_handles)]
df_prod_filtered = df_prod_filtered.reset_index(drop=True)
# Print column headings, number of rows, and column types for df_merged
print("df_merged")
print("Columns: ", df_merged.columns.tolist())
print("Index column of df_merged: ", df_merged.index.name)
print("Number of rows: ", len(df_merged))
print("Column types: \n", df_merged.dtypes)
# Print column headings, number of rows, and column types for df_prod_filtered
print("\ndf_prod_filtered")
print("Columns: ", df_prod_filtered.columns.tolist())
print("Index column of df_prod_filtered: ", df_prod_filtered.index.name)
print("Number of rows: ", len(df_prod_filtered))
print("Column types: \n", df_prod_filtered.dtypes)

# create a mask by comparing the two dataframes based on their index (ID column)
mask = df_merged.eq(df_prod_filtered.loc[df_merged.index])
# create a new column in df_merged that shows if there is a difference or not
df_merged['Diff'] = ~mask.all(axis=1)
#print(df_merged)
# Print only rows where Diff is False and a string
#print(df_merged[df_merged["Diff"] == 'False'])
# Print only rows where Diff is False as a boolean
##df_merged = df_merged[df_merged["Diff"] != True]
#print(df_merged)
# end Filter test

OK! I thought about this some more, and it turns out that many of the approaches I had already tried would probably have solved the problem.

The issue I was running into, without realising it, is that pandas considers a NaN value to be different from another NaN value.

By replacing NaN values with a placeholder string, I was able to create a DataFrame with only the updated entries, and then use numpy to replace the placeholders with NaN values again.
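A minimal demonstration of the behaviour (toy values, not my real data):

```python
import numpy as np
import pandas as pd

# NaN never compares equal to NaN, so eq() flags NaN-vs-NaN cells as different
print(np.nan == np.nan)  # False
s = pd.Series([1.0, np.nan])
print(s.eq(s).tolist())  # [True, False]

# Filling NaN with a placeholder first makes those cells compare equal
print(s.fillna('X').eq(s.fillna('X')).tolist())  # [True, True]
```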

I haven't inserted the new code into the full script yet, but the specific goal/problem of this question has been solved. For reference, here is the script I created while testing and solving it:

import pandas as pd
import numpy as np
# Load the UpdateProducts and base product list dataframes
df_upd = pd.read_csv('ChangedProducts.csv')
df_prod = pd.read_csv('Products.csv')
df_prod = df_prod[['ID', 'Variant Compare At Price', 'Template Suffix', 'Variant Inventory Policy', 'Variant Cost']]
df_upd = df_upd[['ID', 'Variant Compare At Price', 'Template Suffix', 'Variant Inventory Policy', 'Variant Cost']]
# set index to ID column
df_prod.set_index('ID', inplace=True)
df_upd.set_index('ID', inplace=True)
# replace nan values with placeholder
df_upd = df_upd.fillna('EMPTYCELLPLACEHOLDER')
df_prod = df_prod.fillna('EMPTYCELLPLACEHOLDER')
# compare the two dataframes element-wise
comparison = df_upd.eq(df_prod)
# get the rows where all values are True
all_same_rows = comparison.all(axis=1)
# get only the rows that are different
different_rows = df_upd[~all_same_rows]
# replace the placeholder
different_rows = different_rows.replace('EMPTYCELLPLACEHOLDER', np.nan)
# save the different rows to a new CSV file
different_rows.to_csv('DifferentRows.csv', index=True)
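As an aside, if your pandas version is 1.1 or newer, `DataFrame.compare` treats matching NaNs as equal by default, so the placeholder round-trip can be skipped entirely (a sketch with made-up rows, assuming both frames share the same index and columns as `compare` requires):

```python
import numpy as np
import pandas as pd

df_prod = pd.DataFrame({'ID': [1, 2, 3],
                        'Variant Cost': [5.0, np.nan, 7.0]}).set_index('ID')
df_upd = pd.DataFrame({'ID': [1, 2, 3],
                       'Variant Cost': [6.0, np.nan, 7.0]}).set_index('ID')

# compare() keeps only the rows/columns that differ; NaN vs NaN counts as equal
diff = df_upd.compare(df_prod)
changed_rows = df_upd.loc[diff.index]  # the updated values for the changed IDs
print(changed_rows)
```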
