This is my dataframe:
Name Job
A Back-end Engineer
B Front-end Engineer;Product Manager
C Product Manager;Business Development;System Analyst
I want to convert this dataframe to dummy variables (one-hot encoding):
Name  Back-end Engineer  Business Development  Front-end Engineer  Product Manager  System Analyst
A     1                  0                     0                   0                0
B     0                  0                     1                   1                0
C     0                  1                     0                   1                1
I tried pandas.get_dummies, but it failed because the Job column holds multiple values per row.
You can try something like this:
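import pandas as pd
from collections import defaultdict

df = pd.read_csv("path/to/your.csv")

# Collect every distinct job title that appears in any row.
job_list = set()
for job in df["Job"]:
    for job_name in job.split(";"):
        job_list.add(job_name.strip())
job_list = sorted(job_list)  # fixed, alphabetical column order

# Build one 0/1 column per job title.
new_df = defaultdict(list)
for index, row in df.iterrows():
    new_df["Name"].append(row["Name"])
    # Split first so the check is exact membership, not a substring match.
    row_jobs = {name.strip() for name in row["Job"].split(";")}
    for job in job_list:
        new_df[job].append(1 if job in row_jobs else 0)

new_df = pd.DataFrame.from_dict(new_df)
# index=False keeps the row index out of the output file.
new_df.to_csv("/path/to/new.csv", index=False)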
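If you would rather not loop by hand, pandas also has Series.str.get_dummies, which splits each string on a separator and returns one indicator column per distinct value. Here is a minimal sketch of that approach. It builds the sample rows from the question inline instead of reading a CSV, and the variable names dummies and result are just my own choices:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Job": [
        "Back-end Engineer",
        "Front-end Engineer;Product Manager",
        "Product Manager;Business Development;System Analyst",
    ],
})

# Split each Job string on ";" and build one 0/1 indicator column per title.
dummies = df["Job"].str.get_dummies(sep=";")

# Put the Name column back in front of the indicator columns.
result = pd.concat([df[["Name"]], dummies], axis=1)
print(result)
```

This assumes the titles are separated by ";" with no stray spaces around them. If your real data has spaces after the separator, you would probably want to clean the strings first.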