I'm working with this CSV file. It is a small dataset of laptop information.
import pandas as pd
laptops = pd.read_csv('laptops.csv', encoding="Latin-1")
laptops["Operating System"].value_counts()
Windows 1125
No OS 66
Linux 62
Chrome OS 27
macOS 13
Mac OS 8
Android 2
Name: Operating System, dtype: int64
I want to merge the variants macOS and Mac OS into a single value, "macOS".
I have tried this, and it works.
mapping_dict = {
    'Android': 'Android',
    'Chrome OS': 'Chrome OS',
    'Linux': 'Linux',
    'Mac OS': 'macOS',
    'No OS': 'No OS',
    'Windows': 'Windows',
    'macOS': 'macOS'
}
laptops["Operating System"] = laptops["Operating System"].map(mapping_dict)
laptops["Operating System"].value_counts()
Windows 1125
No OS 66
Linux 62
Chrome OS 27
macOS 21
Android 2
Name: Operating System, dtype: int64
Is this the only way, or the best way? Assume that the same need may come up for several values (not just macOS).
laptops['Operating System'] = laptops['Operating System'].str.replace(r'(?i)(mac|mc).*os', 'macOS', regex=True)
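This code does the job, but you have to know the possible variants in advance. If they cannot be known beforehand, that is a separate question, outside the scope of the python and pandas tags.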
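If several names need this treatment, the regex idea might be extended by passing a dict of patterns to Series.replace with regex=True. The following is only a sketch: the sample values (including "windows 10") and the patterns themselves are made up for illustration and would need adjusting to whatever variants actually occur in the data.

```python
import pandas as pd

# Hypothetical sample values, not taken from laptops.csv
os_names = pd.Series(["Windows", "Mac OS", "macOS", "windows 10", "Linux"])

# Assumed patterns: each key is a regex, each value the canonical name.
# The ^...$ anchors make each pattern replace the whole cell, not a substring.
patterns = {
    r"(?i)^mac\s*os$": "macOS",
    r"(?i)^win.*$": "Windows",
}

cleaned = os_names.replace(patterns, regex=True)
print(cleaned.value_counts())
```

Values that match none of the patterns (here "Linux") are left as they are.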
laptops.loc[laptops['Operating System'].str.lower().isin(['mac', 'osx', 'mac os', 'macos']), 'Operating System'] = 'macOS'
You can simply do
laptops['Operating System'] = laptops['Operating System'].replace('Mac OS', 'macOS')
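A short sketch of why replace needs less typing than the map call in the question: replace leaves values it was not told about unchanged, while map turns every value missing from the dict into NaN, which is why the question's dict had to list every operating system. The sample Series below is invented for the demonstration.

```python
import pandas as pd

# Hypothetical sample values, not taken from laptops.csv
os_names = pd.Series(["Windows", "Mac OS", "macOS", "Linux"])

# replace: values not named in the dict pass through unchanged
replaced = os_names.replace({"Mac OS": "macOS"})

# map: values missing from the dict become NaN
mapped = os_names.map({"Mac OS": "macOS"})

print(replaced.tolist())
print(mapped.isna().tolist())
```

Since replace also accepts a dict, several merges can go into one call by adding more key/value pairs.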
I would do it like this:
# Build a dict of lists, where each key is the name you want
# to assign and each list contains the variations of that name
aliases = {
    "macOS": ["mac", "osx", "Mac OS"],
    "Windows": ["win", "windows", "Windows"],
}
# Invert it into a flat {variation: main name} map for easy lookup
aliases_map = {alias: name for name, variants in aliases.items() for alias in variants}
# Replace every alias with its respective main name
laptops["Operating System"] = laptops["Operating System"].replace(aliases_map)
laptops["Operating System"].value_counts()
Output:
Windows 1125
No OS 66
Linux 62
Chrome OS 27
macOS 21
Android 2
Name: Operating System, dtype: int64
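Note that replace with a dict matches exact strings, so "MACOS" or " osx " would slip through. A possible variation, assuming case and surrounding whitespace should be ignored, is to look up a normalised copy of the column and fall back to the original value when there is no alias. The alias lists and sample values here are illustrative only.

```python
import pandas as pd

# Assumed alias lists, written in lower case for case-insensitive lookup
aliases = {
    "macOS": ["mac", "osx", "mac os", "macos"],
    "Windows": ["win", "windows"],
}
lookup = {alias: name for name, variants in aliases.items() for alias in variants}

# Hypothetical sample values, not taken from laptops.csv
os_names = pd.Series(["Windows", "Mac OS", "MACOS", " osx ", "Linux"])

# Normalise only the lookup key; the original values stay untouched
key = os_names.str.strip().str.lower()

# map gives NaN where no alias matched; fillna restores the original value
cleaned = key.map(lookup).fillna(os_names)
print(cleaned.tolist())
```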