Generating a random/dummy dataset in Excel from column names and data types using Python



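Can anyone suggest a way to automatically create a table in Teradata and load random records into it based on a schema?

The expected output is 50-100 random records based on the schema. For example, if the data type is varchar, the value can be any random string (such as crime, doctor, sdfsdc, or asdfadf in this example), and likewise for int and any other data type.

I tried some code that generates data based on data_type, but I am stuck on the header part. How can I create the output file's header from column_name?

The code I tried: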


```python
import pandas as pd
import numpy as np
import openpyxl
import random
from faker import Faker  # missing in the original snippet; Faker() is used below

wb = openpyxl.Workbook()
ws = wb.active
fake_data = Faker()
df = openpyxl.load_workbook('inputcsv.xlsx')
df_sheet = df['inputcsv']

# Map each header name in the input sheet to its 0-based column index
ColNames = {}
Current = 0
for COL in df_sheet.iter_cols(1, df_sheet.max_column):
    ColNames[COL[0].value] = Current
    Current += 1

# Each input row describes one output column; counter is the output column number
counter = 0
for row_cells in df_sheet.iter_rows():
    if row_cells[ColNames['type']].value == 'int':
        for i in range(2, 201):
            ws.cell(row=i, column=counter).value = random.randint(-10000, 10000)
    if row_cells[ColNames['type']].value == 'char':
        for i in range(2, 201):
            ws.cell(row=i, column=counter).value = chr(random.randint(65, 90))
    if row_cells[ColNames['type']].value == 'string':
        for i in range(2, 201):
            ws.cell(row=i, column=counter).value = fake_data.name()
    counter += 1
wb.save('testdata.xlsx')
```

My input file looks like:

| col_name| type   |
|:--------|:------:|
| name    | string |
| address | string | 
| city    | char   |
| phonenum| int    |

My output looks like:
| saurabh | adgsdf | C | 4534  |
| eeshan  | sgsdfsd| A | 356356| 
| ravinder| adgadf | G | 45343 |

Can someone please help me get the headers of the output file from the `col_name` column of the input file?

The output should look like this:

| name    | address | city| phonenum|
|---------|---------|-----|---------|
| saurabh | adgsdf  | C   | 4534    |
| eeshan  | sgsdfsd | A   | 356356  | 
| ravinder| adgadf  | G   | 45343   |

```python
import pandas as pd
import numpy as np
import openpyxl
import random
from faker import Faker  # missing in the original snippet; Faker() is used below

wb = openpyxl.Workbook()
ws = wb.active
fake_data = Faker()
df = openpyxl.load_workbook('inputcsv.xlsx')
df_sheet = df['inputcsv']

# Map each header name in the input sheet to its 0-based column index
ColNames = {}
Current = 0
for COL in df_sheet.iter_cols(1, df_sheet.max_column):
    ColNames[COL[0].value] = Current
    Current += 1

counter = 0
for row_cells in df_sheet.iter_rows():
    # add this: skip the input header row (counter == 0), then write each
    # col_name value into row 1 of the matching output column
    if counter != 0:
        ws.cell(row=1, column=counter).value = row_cells[0].value
    if row_cells[ColNames['type']].value == 'int':
        for i in range(2, 201):
            ws.cell(row=i, column=counter).value = random.randint(-10000, 10000)
    if row_cells[ColNames['type']].value == 'char':
        for i in range(2, 201):
            ws.cell(row=i, column=counter).value = chr(random.randint(65, 90))
    if row_cells[ColNames['type']].value == 'string':
        for i in range(2, 201):
            ws.cell(row=i, column=counter).value = fake_data.name()
    counter += 1
wb.save('testdata.xlsx')
```
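
The same idea can also be sketched row by row instead of cell by cell: build the header list from the `col_name` values first, then generate each record by looking up a generator for the column's type. This is only a minimal sketch using the standard library. The `GENERATORS` mapping and the `build_table` helper are hypothetical names, the schema is hard-coded rather than read from `inputcsv.xlsx`, and random lowercase letters stand in for `Faker().name()`.

```python
import random
import string

# Hypothetical generators, one per type name used in the input sheet.
# Each takes a random.Random instance so results can be seeded.
GENERATORS = {
    "int": lambda rng: rng.randint(-10000, 10000),
    "char": lambda rng: rng.choice(string.ascii_uppercase),
    "string": lambda rng: "".join(
        rng.choices(string.ascii_lowercase, k=rng.randint(4, 8))
    ),
}


def build_table(schema, n_rows, seed=None):
    """Return [header, row_1, ..., row_n] for a list of (col_name, type) pairs."""
    rng = random.Random(seed)
    # The header comes straight from the col_name values, in schema order.
    header = [name for name, _ in schema]
    # Each record holds one generated value per column, in the same order.
    rows = [
        [GENERATORS[col_type](rng) for _, col_type in schema]
        for _ in range(n_rows)
    ]
    return [header] + rows


# Schema matching the input file shown in the question (assumed, hard-coded here).
schema = [
    ("name", "string"),
    ("address", "string"),
    ("city", "char"),
    ("phonenum", "int"),
]
table = build_table(schema, n_rows=50, seed=0)
```

Each inner list can then be written with openpyxl's `ws.append(row)`, which places it in the next free row of the sheet, so the header ends up in row 1 without any column counter.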