Creating a simple CSV with synthetic data - Python



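I'm learning Python and machine learning, and I'm trying to create a very simple CSV from synthetic data. Could someone help me adjust this so that it works in PyCharm? For each column, I'm trying to fill in a random value picked from that column's set of choices. Many thanks.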


import random
import pandas as pd

marriage_status = {'single', 'married', 'divorced', 'widowed', 'complicated'}
children = {'yes', 'no'}
employment = {'employed', 'self_employed', 'unemployed', 'student'}
income_abroad = {'yes', 'no'}
gender = {'M', 'F'}
response = {'refund', 'payment'}
columns = ['marriage_status', 'children', 'employment',
'income_abroad', 'age', 'gender', 'income', 'expenses', 'response']
df = pd.DataFrame(columns=columns)
for i in range(1000):
    marriage_status = random.choice(list(marriage_status))
    children = random.choice(list(children))
    employment = random.choice(list(employment))
    income_abroad = random.choice(list(income_abroad))
    gender = random.choice(list(gender))
    response = random.choice(list(response))
    age = random.randint(18, 70)
    income = random.randint(0, 100000)
    expenses = random.randint(0, 10000)
    df = [marriage_status, children, employment, income_abroad, age, gender, income, expenses, response]
df[6].to_csv('taxfix_data.csv')
index = False

If you're going to use pandas, the simplest way is to build the DataFrame directly from a dict of columns, like this:

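import random
import pandas as pd

n_rows = 1000  # number of synthetic rows, matching the loop in the question

# Every column passed to pd.DataFrame must have the same length,
# so draw n_rows random values (with replacement) for each column.
df = pd.DataFrame({
    "marriage_status": random.choices(['single', 'married', 'divorced', 'widowed', 'complicated'], k=n_rows),
    "children": random.choices(['yes', 'no'], k=n_rows),
    "employment": random.choices(['employed', 'self_employed', 'unemployed', 'student'], k=n_rows),
    "gender": random.choices(['M', 'F'], k=n_rows),
    "response": random.choices(['refund', 'payment'], k=n_rows),
    "income_abroad": random.choices(['yes', 'no'], k=n_rows),
})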
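
To get all the way to the CSV the question asks for, something along these lines might work. It is only a sketch: the helper name `make_synthetic_frame`, the `CHOICES` and `COLUMNS` constants, and the `seed` parameter are my own additions for illustration. The value ranges and the file name `taxfix_data.csv` are taken from the question.

```python
import random

import pandas as pd

# Allowed values per categorical column, copied from the question.
CHOICES = {
    "marriage_status": ["single", "married", "divorced", "widowed", "complicated"],
    "children": ["yes", "no"],
    "employment": ["employed", "self_employed", "unemployed", "student"],
    "income_abroad": ["yes", "no"],
    "gender": ["M", "F"],
    "response": ["refund", "payment"],
}

# Column order used in the question.
COLUMNS = ["marriage_status", "children", "employment", "income_abroad",
           "age", "gender", "income", "expenses", "response"]


def make_synthetic_frame(n_rows=1000, seed=None):
    """Hypothetical helper: build a DataFrame with one random value per cell."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    # Categorical columns: n_rows independent draws from each list of options.
    data = {name: rng.choices(options, k=n_rows) for name, options in CHOICES.items()}
    # Numeric columns: same ranges as in the question (randint is inclusive).
    data["age"] = [rng.randint(18, 70) for _ in range(n_rows)]
    data["income"] = [rng.randint(0, 100000) for _ in range(n_rows)]
    data["expenses"] = [rng.randint(0, 10000) for _ in range(n_rows)]
    # Reorder the columns to match the question.
    return pd.DataFrame(data)[COLUMNS]


if __name__ == "__main__":
    df = make_synthetic_frame(seed=42)
    # index=False is an argument of to_csv; it keeps the row index out of the file.
    df.to_csv("taxfix_data.csv", index=False)
```

Note that `index=False` has to be passed inside the `to_csv(...)` call. Written as a separate statement, it just creates an unrelated variable.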

There is also a very useful pandas cheat sheet here: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
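
As for why the loop in the question misbehaves, my reading is that the main problem is name reuse: each loop variable has the same name as the set it draws from, so after the first iteration the set has been replaced by a single string, and the next draw picks a character from that string. The sketch below reproduces the effect on one column and then shows one possible row-by-row fix. The names `children_options`, `gender_options`, and `rows` are made up for the example.

```python
import random

import pandas as pd

# Bug pattern: the result is assigned to the same name as the set of options.
children = {"yes", "no"}
children = random.choice(list(children))     # now the string 'yes' or 'no'
second_draw = random.choice(list(children))  # list('yes') is ['y', 'e', 's']
print(repr(second_draw))                     # a single character, not 'yes'/'no'

# One possible fix: keep the options under their own names, collect each row
# as a dict, and build the DataFrame once after the loop.
children_options = ["yes", "no"]
gender_options = ["M", "F"]

rows = []
for _ in range(5):
    rows.append({
        "children": random.choice(children_options),
        "gender": random.choice(gender_options),
        "age": random.randint(18, 70),
    })

df = pd.DataFrame(rows)
print(df)
```

Separately, `df = [...]` in the question rebinds `df` to a plain Python list, so `df[6]` is just the `income` integer and has no `to_csv` method.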
