Create a new CSV from selected columns of another CSV

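I am trying to write a script that selects some columns from a CSV file and saves them to another file (ideally with column headers that I specify). This is the code I started with, which copies all columns. How can I change it to copy only some of them?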

# importing openpyxl module
import openpyxl as xl

# opening the source excel file
# (raw string so the backslashes are not treated as escape sequences)
filename = r"C:\Users\...\input.clv"
wb1 = xl.load_workbook(filename)
ws1 = wb1.worksheets[0]

# opening the destination excel file
filename1 = r"C:\Users\...\output.clv"
wb2 = xl.load_workbook(filename1)
ws2 = wb2.active

# calculate total number of rows and
# columns in source excel file
mr = ws1.max_row
mc = ws1.max_column

# copying the cell values from source
# excel file to destination excel file
for i in range(1, mr + 1):
    for j in range(1, mc + 1):
        # reading cell value from source excel file
        c = ws1.cell(row=i, column=j)

        # writing the read value to destination excel file
        ws2.cell(row=i, column=j).value = c.value

# saving the destination excel file
wb2.save(str(filename1))

Thanks in advance!

Here is my approach using plain Python file reading and writing.

def readCsv(fileName):
    # Read the file into a 2D list: one list of column values per line
    data = []
    myFile = open(fileName, "r")
    for line in myFile:
        lineList = line.split(",")
        # Strip the trailing newline from the last value on the line
        lineList[-1] = lineList[-1].replace("\n", "")
        data.append(lineList)
    myFile.close()
    return data

def writeCsv(data):
    # Join each row with commas and write all rows to output.csv
    dataString = ""
    for line in data:
        dataString = dataString + ','.join(line) + "\n"
    myNewFile = open("output.csv", "w")
    myNewFile.write(dataString)
    myNewFile.close()

data = readCsv("yourCsv.csv")
# Remove the data you don't need
# (dataAfterRemovingColumns is a placeholder for the filtered 2D list)
writeCsv(dataAfterRemovingColumns)

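My readCsv function produces a 2D list in which each item is the list of values from one row of the CSV file. So, where I put the comment # Remove the data you don't need, you would loop over the 2D list and delete from each row the items that belong to the columns you want to drop. Hope that makes sense!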
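As a minimal sketch of that filtering step: the helper name keep_columns and the sample rows below are made up for illustration, and the indices are assumed to be 0-based positions in each row. Note that splitting lines on "," only works when no field contains a quoted comma.

```python
def keep_columns(data, keep_indices):
    """Return a new 2D list with only the columns at keep_indices (0-based)."""
    return [[row[i] for i in keep_indices] for row in data]


# Sample rows in the shape readCsv returns: one list of strings per line.
data = [
    ["id", "name", "email"],
    ["1", "Ann", "ann@example.com"],
    ["2", "Bob", "bob@example.com"],
]

# Keep the first and third columns, dropping "name".
dataAfterRemovingColumns = keep_columns(data, [0, 2])
print(dataAfterRemovingColumns)
```

The result can then be passed to writeCsv unchanged, since it is still a list of lists of strings.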

This can be done with the stdlib:

#!/usr/bin/env python
import csv

inputCsvFilePath  = 'input.csv'
outputCsvFilePath = 'output.csv'
# column indices are 0-based
inputCsvColumnNumbers  = [1,3,5]
outputCsvColumnHeaders = ['one', 'three', 'five']

# reading/writing row by row (high IO, low memory):
with open(inputCsvFilePath, newline='') as inputCsv:
    inputCsvReader = csv.reader(inputCsv)
    with open(outputCsvFilePath, 'w', newline='') as outputCsv:
        outputCsvWriter = csv.writer(outputCsv)
        # write custom csv header:
        outputCsvWriter.writerow(outputCsvColumnHeaders)
        # skip input file header:
        next(inputCsvReader)
        for inputRow in inputCsvReader:
            outputCsvWriter.writerow([inputRow[i] for i in inputCsvColumnNumbers])
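If counting column positions is error-prone, one possible variation is to look the indices up from the input header. This is only a sketch: the function name copy_columns_by_name and the sample data are hypothetical, and in-memory io.StringIO objects stand in for the real files.

```python
import csv
import io


def copy_columns_by_name(src, dst, wanted, new_headers):
    """Copy the columns named in `wanted` from src to dst under new_headers."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    # Translate header names into 0-based positions once, up front.
    indices = [header.index(name) for name in wanted]
    writer.writerow(new_headers)
    for row in reader:
        writer.writerow([row[i] for i in indices])


# In-memory stand-ins for the input and output files.
src = io.StringIO("a,b,c,d\n1,2,3,4\n5,6,7,8\n")
dst = io.StringIO()
copy_columns_by_name(src, dst, ["b", "d"], ["one", "three"])
print(dst.getvalue())
```

With real files, the two arguments would be the objects returned by open(..., newline=''), and header.index raises ValueError if a requested name is missing.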

Personally, I would use sqlite:

#!/bin/bash
sqlite3 <<EOF
-- input:
.separator ',' "\n"
.import 'input.csv' inputData

-- output:
.mode csv
.header on
.once 'output.csv'
select
user_id  as "one"
, login_id as "three"
, password as "five"
from inputData
;
EOF
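The same idea can be tried from Python with the standard sqlite3 module instead of the command-line shell. This is a rough sketch, not a translation of the script above: the function names are hypothetical, the input is assumed to be rectangular with a header row, and all values are stored as text in an in-memory table.

```python
import csv
import io
import sqlite3


def quote_identifier(name):
    """Quote a column name for SQLite, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def select_with_sqlite(src, dst, aliases):
    """Load CSV rows from src into a table, then write aliased columns to dst."""
    reader = csv.reader(src)
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(quote_identifier(h) for h in header)
        conn.execute(f"CREATE TABLE inputData ({columns})")
        marks = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO inputData VALUES ({marks})", reader)
        # Build "old" AS "new" pairs from the alias mapping.
        select_list = ", ".join(
            f"{quote_identifier(old)} AS {quote_identifier(new)}"
            for old, new in aliases.items()
        )
        cursor = conn.execute(f"SELECT {select_list} FROM inputData")
        writer = csv.writer(dst, lineterminator="\n")
        # cursor.description carries the aliased output column names.
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor)
    finally:
        conn.close()


# In-memory stand-ins for the input and output files.
src = io.StringIO("user_id,name,login_id,note,password\n1,Ann,ann1,x,secret\n")
dst = io.StringIO()
select_with_sqlite(src, dst, {"user_id": "one", "login_id": "three", "password": "five"})
print(dst.getvalue())
```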

Since your goal is to save some columns from one csv file to another csv file, you can use the pandas library as follows:

import pandas as pd

def save_csv(df, path, cols):
    # Write only the selected columns, without the index column
    df[cols].to_csv(path, index=False)

with open('path/to/csv', 'r') as f:
    df = pd.read_csv(f)

# Assuming you want to save columns colA and colB
save_csv(df, 'path/to/dest/csv', ['colA', 'colB'])
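For larger files it may help to avoid loading the unwanted columns at all. The sketch below assumes the usecols parameter of read_csv and the header parameter of to_csv fit your case; the sample data and the output names first and second are made up, and io.StringIO stands in for real file paths.

```python
import io

import pandas as pd

# In-memory stand-in for the input file; real code would pass a path instead.
source = io.StringIO("colA,colB,colC\n1,2,3\n4,5,6\n")

# usecols makes read_csv parse only the listed columns.
df = pd.read_csv(source, usecols=["colA", "colB"])

# header= supplies replacement column names for the output file.
output = io.StringIO()
df.to_csv(output, index=False, header=["first", "second"])
print(output.getvalue())
```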

You can also use csv DictReader and DictWriter as an alternative. The code is longer, but it runs faster (based on my timing):

import csv

def use_csv():
    def new_dict(d, cols):
        # Keep only the requested columns from one row
        new_dict = {}
        for col in cols:
            new_dict[col] = d[col]
        return new_dict

    with open('path/to/csv', 'r', newline='') as f:
        df = csv.DictReader(f)
        with open('path/to/dest/csv', 'w', newline='') as csvfile:
            fieldnames = ['colA', 'colB']
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            writer.writeheader()
            for row in df:
                data = new_dict(row, fieldnames)
                writer.writerow(data)
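A shorter variant, offered as a sketch rather than a benchmarked replacement: DictWriter accepts extrasaction='ignore', which drops keys that are not in fieldnames, so the per-row filtering helper may not be needed. The sample data is made up and io.StringIO stands in for the files.

```python
import csv
import io

# In-memory stand-ins for the input and output files.
src = io.StringIO("colA,colB,colC\n1,2,3\n4,5,6\n")
dst = io.StringIO()

reader = csv.DictReader(src)
writer = csv.DictWriter(
    dst,
    fieldnames=["colA", "colB"],
    extrasaction="ignore",  # silently skip columns not listed in fieldnames
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(reader)
print(dst.getvalue())
```

Without extrasaction='ignore', DictWriter raises ValueError when a row contains a key that is not in fieldnames.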

I found a way to do it using DictReader & DictWriter:

import csv

with open('original.csv', newline='') as originalfile:
    reader = csv.DictReader(originalfile)

    with open('new.csv', 'w', newline='') as newfile:
        fieldnames = ['asset_id', 'treatments']
        writer = csv.DictWriter(newfile, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            writer.writerow({'asset_id': row['Asset ID'], 'treatments': row['Treatments Identified']})
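When more than a couple of columns are renamed, the old-to-new pairs could be kept in one mapping so the selection and the renaming are defined in a single place. A sketch under that assumption: the name copy_renamed, the Owner column, and the sample row are hypothetical, and io.StringIO stands in for the files.

```python
import csv
import io

# Mapping of input header -> output header.
RENAMES = {"Asset ID": "asset_id", "Treatments Identified": "treatments"}


def copy_renamed(src, dst, renames):
    """Write only the columns named in `renames`, using the new header names."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst, fieldnames=list(renames.values()), lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow({new: row[old] for old, new in renames.items()})


# In-memory stand-ins for original.csv and new.csv.
src = io.StringIO("Asset ID,Owner,Treatments Identified\nA1,Ann,paint\n")
dst = io.StringIO()
copy_renamed(src, dst, RENAMES)
print(dst.getvalue())
```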
