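Can anyone suggest a way to automatically create a table in Teradata and load random records into it based on a schema?
The expected output is 50-100 random records based on the schema. For example, if the data type is varchar, the output can be any random string (such as crime, doctor, sdfsdc, asdfadf in this example), and similarly for int and any other data type.
I tried some code that generates data according to data_type, but I am stuck on the header part. How do I create the header of the output file based on column_name?
The code I tried: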
```python
import pandas as pd
import numpy as np
import openpyxl
import random
from faker import Faker  # needed for Faker() below; missing in the original paste

wb = openpyxl.Workbook()
ws = wb.active
fake_data = Faker()

df = openpyxl.load_workbook('inputcsv.xlsx')
df_sheet = df['inputcsv']

# Map each header name of the input sheet to its 0-based column index
ColNames = {}
Current = 0
for COL in df_sheet.iter_cols(1, df_sheet.max_column):
    ColNames[COL[0].value] = Current
    Current += 1

# Each input row describes one output column; counter is the output column number.
# The input header row matches no type, so nothing is written while counter is 0.
counter = 0
for row_cells in df_sheet.iter_rows():
    if row_cells[ColNames['type']].value == 'int':
        for i in range(2, 201):
            ws.cell(row=i, column=counter).value = random.randint(-10000, 10000)
    if row_cells[ColNames['type']].value == 'char':
        for i in range(2, 201):
            ws.cell(row=i, column=counter).value = chr(random.randint(65, 90))
    if row_cells[ColNames['type']].value == 'string':
        for i in range(2, 201):
            ws.cell(row=i, column=counter).value = fake_data.name()
    counter += 1

wb.save('testdata.xlsx')
```
My input file looks like:
| col_name| type |
|:--------|:------:|
| name | string |
| address | string |
| city | char |
| phonenum| int |
My output looks like:
| saurabh | adgsdf | C | 4534 |
| eeshan | sgsdfsd| A | 356356|
| ravinder| adgadf | G | 45343 |
Can someone please help me get the headers of the output file from the col_name column of the input file?
The output should look like this:
| name | address | city| phonenum|
|---------|---------|-----|---------|
| saurabh | adgsdf | C | 4534 |
| eeshan | sgsdfsd | A | 356356 |
| ravinder| adgadf | G | 45343 |
```python
import pandas as pd
import numpy as np
import openpyxl
import random
from faker import Faker  # needed for Faker() below

wb = openpyxl.Workbook()
ws = wb.active
fake_data = Faker()

df = openpyxl.load_workbook('inputcsv.xlsx')
df_sheet = df['inputcsv']

# Map each header name of the input sheet to its 0-based column index
ColNames = {}
Current = 0
for COL in df_sheet.iter_cols(1, df_sheet.max_column):
    ColNames[COL[0].value] = Current
    Current += 1

counter = 0
for row_cells in df_sheet.iter_rows():
    # add this: copy the column name (first input column) into row 1 of the output.
    # counter == 0 is the input's own header row, and openpyxl columns start at 1.
    if counter != 0:
        ws.cell(row=1, column=counter).value = row_cells[0].value
    if row_cells[ColNames['type']].value == 'int':
        for i in range(2, 201):
            ws.cell(row=i, column=counter).value = random.randint(-10000, 10000)
    if row_cells[ColNames['type']].value == 'char':
        for i in range(2, 201):
            ws.cell(row=i, column=counter).value = chr(random.randint(65, 90))
    if row_cells[ColNames['type']].value == 'string':
        for i in range(2, 201):
            ws.cell(row=i, column=counter).value = fake_data.name()
    counter += 1

wb.save('testdata.xlsx')
```
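As the number of data types grows, the chain of `if` blocks can be replaced by a lookup from type name to generator function, with the header built directly from the col_name values. The following is only a sketch using the standard library: the `GENERATORS` mapping, the `make_rows` helper and the hard-coded `schema` list are illustrative assumptions (in practice the schema would be read from the input sheet), and random lowercase strings stand in for Faker.

```python
import csv
import io
import random
import string

# Hypothetical mapping from the `type` values of the input sheet to generators.
GENERATORS = {
    "int": lambda: random.randint(-10000, 10000),
    "char": lambda: random.choice(string.ascii_uppercase),
    "string": lambda: "".join(
        random.choices(string.ascii_lowercase, k=random.randint(4, 8))
    ),
}


def make_rows(schema, n_rows):
    """Return a header row followed by n_rows rows of random values.

    schema is a list of (col_name, type) pairs, i.e. the input sheet
    without its own header line.
    """
    header = [col_name for col_name, _ in schema]
    rows = [
        [GENERATORS[col_type]() for _, col_type in schema]
        for _ in range(n_rows)
    ]
    return [header] + rows


# Assumed schema, copied by hand from the question's input table.
schema = [
    ("name", "string"),
    ("address", "string"),
    ("city", "char"),
    ("phonenum", "int"),
]
table = make_rows(schema, random.randint(50, 100))

# Written to an in-memory CSV here; use open("testdata.csv", "w", newline="")
# instead of io.StringIO() to get a file on disk.
buffer = io.StringIO()
csv.writer(buffer).writerows(table)
```

Supporting another type then only means adding one entry to `GENERATORS`; an unknown type raises a `KeyError` instead of silently leaving the column empty.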
My input file looks like:
| col_name | type |
|:---------|:------:|
| name | string |
| address | string |
| phonenum | int |
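The question also asks about creating the table in Teradata, which the code above does not cover. One possible starting point is to build the CREATE TABLE statement from the same col_name/type pairs. This is a sketch under assumptions: the `TERADATA_TYPES` mapping, the column lengths, the `create_table_sql` helper and the table name `test_data` are all illustrative and not taken from the question, and the snippet only builds the SQL text without connecting to a database.

```python
# Assumed mapping from the sheet's type names to Teradata column types;
# the lengths are placeholders, not taken from the question.
TERADATA_TYPES = {
    "int": "INTEGER",
    "char": "CHAR(1)",
    "string": "VARCHAR(100)",
}


def create_table_sql(table_name, schema):
    """Build a CREATE TABLE statement from (col_name, type) pairs."""
    columns = ",\n".join(
        f"    {col_name} {TERADATA_TYPES[col_type]}"
        for col_name, col_type in schema
    )
    return f"CREATE TABLE {table_name} (\n{columns}\n);"


# Schema copied by hand from the input table above; test_data is a made-up name.
schema = [("name", "string"), ("address", "string"), ("phonenum", "int")]
ddl = create_table_sql("test_data", schema)
print(ddl)
```

Column names that clash with reserved words would need quoting, and the generated rows still have to be loaded separately, for example by importing the output file with whatever load tool is available.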