Using Python, I want to create a loop that writes text into a CSV file whenever a row contains text.
The original CSV format is:
user_id, text
0,
1,
2,
3, sample text
4, sample text
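I want to add another column, "text_number", that holds the string "text_x", where x is the number of texts seen so far in the column. I want to iterate over the rows and increase the number by 1 for each new text. The final product would look like:
user_id, text, text_number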
0,
1,
2,
3, sample text, text_0
4, sample text, text_1
With my working code I can insert the header "text_number", but I am having trouble putting together the loop for text_x.
import csv

output = list()
with open("test.csv") as file:
    csv_reader = csv.reader(file)
    for i, row in enumerate(csv_reader):
        if i == 0:
            output = [row + ["text_number"]]
            continue
        # here's where I'm stuck

with open("output2.csv", "w", newline="") as file:
    csv_writer = csv.writer(file, delimiter=",")
    for row in output:
        csv_writer.writerow(row)
Any ideas?
See the comments in the code for the explanation.
# assuming the file
# user_id,text
# 0,
# 1,
# 2,
# 3,sample text
# 4,sample text
# 5,
# 6,sample text
# import the library
import pandas as pd
df = pd.read_csv('test.csv').fillna('')
# creating column text_number initializing with ''
df['text_number'] = ''
# getting the index where text is valid
index = df.loc[df['text'].str.strip().astype(bool)].index
# finally creating the column text_number with increment as 0, 1, 2 ...
df.loc[index, 'text_number'] = [f'text_{i}' for i in range(len(index))]
print(df)
# save it to disk (index=False leaves out the row index, matching the desired output)
df.to_csv('output2.csv', index=False)
#    user_id         text text_number
# 0        0
# 1        1
# 2        2
# 3        3  sample text      text_0
# 4        4  sample text      text_1
# 5        5
# 6        6  sample text      text_2
You could try the following modification to the first part:
output = list()
with open("test.csv") as file:
    csv_reader = csv.reader(file)
    # the first row is the header
    output.append(next(csv_reader) + ['text_number'])
    text_no = 0
    for row in csv_reader:
        # only rows whose text field is non-blank get a label
        if row[1].strip():
            row.append(f'text_{text_no}')
            text_no += 1
        output.append(row)
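To check the reading step above without touching any files, here is a minimal sketch that wraps the same logic in a function and feeds it in-memory data. The helper name add_text_numbers and the io.StringIO stand-ins for test.csv and output2.csv are assumptions made for illustration; they are not part of the original answer. The extra length check is also an addition, so that a row with no text field at all does not raise an IndexError.

```python
import csv
import io


def add_text_numbers(rows):
    """Return a new list of rows with a text_number column appended.

    `rows` is any iterable of lists with the header first
    (hypothetical helper, not from the original answer).
    """
    rows = iter(rows)
    output = [next(rows) + ["text_number"]]
    text_no = 0
    for row in rows:
        # Guard against short rows such as ["0"] that lack a text field.
        if len(row) > 1 and row[1].strip():
            row = row + [f"text_{text_no}"]
            text_no += 1
        output.append(row)
    return output


# In-memory stand-ins for test.csv and output2.csv.
source = io.StringIO(
    "user_id, text\n0,\n1,\n2,\n3, sample text\n4, sample text\n"
)
result = add_text_numbers(csv.reader(source))

target = io.StringIO(newline="")
csv.writer(target).writerows(result)
print(target.getvalue())
```

With real files, the same function can be called on csv.reader(file) inside the with block, and the result written out with the writing loop from the question.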
You can try:
import csv

output = list()
x = 0
with open("test.csv") as file:
    csv_reader = csv.reader(file)
    for i, row in enumerate(csv_reader):
        row[1] = row[1].strip()
        if i == 0:
            # header row
            row.append("text_number")
        else:
            if row[1] == "":
                row.append(" ")
            else:
                row.append(f"text_{x}")
                x += 1
        output.append(row)

with open("output2.csv", "w", newline="") as file:
    csv_writer = csv.writer(file, delimiter=",")
    for row in output:
        csv_writer.writerow(row)
I did not change anything in your code that did not need changing. I am just adding a new element to row on every iteration, and appending every row to output to build the new list of rows.
If you are comfortable with pandas, you can also try this:
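import pandas as pd

# empty fields are read as NaN; replace them so .strip() works on every row
df = pd.read_csv("test.csv").fillna("")
r = []
x = 0
for i in range(df.shape[0]):
    # the column is named " text" because the header has a space after the comma
    if df[" text"][i].strip() == "":
        r.append(" ")
    else:
        r.append(f"text_{x}")
        x += 1
df["text_number"] = r
print(df)
"""
   user_id          text text_number
0        0
1        1
2        2
3        3   sample text      text_0
4        4   sample text      text_1
"""
# to_csv is a DataFrame method; index=False leaves out the row index
df.to_csv("output2.csv", index=False)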
Here r is the list of values for the text_number column.
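The answers above all collect the rows first and write them afterwards. If the file is large, a streaming variant may be worth considering. The sketch below writes each row as soon as it is read. It also passes skipinitialspace=True to csv.reader, so the space after each comma in the sample data is dropped on input. The function name number_texts and the in-memory buffers are assumptions for illustration only.

```python
import csv
import io
from itertools import count


def number_texts(src, dst):
    """Copy CSV rows from src to dst, appending a text_number column.

    Hypothetical helper; src and dst are open text file objects.
    """
    reader = csv.reader(src, skipinitialspace=True)
    writer = csv.writer(dst)
    counter = count()  # yields 0, 1, 2, ...
    writer.writerow(next(reader) + ["text_number"])
    for row in reader:
        # Label only rows that have a non-blank text field.
        if len(row) > 1 and row[1].strip():
            row.append(f"text_{next(counter)}")
        writer.writerow(row)


# In-memory stand-ins for test.csv and output2.csv.
src = io.StringIO("user_id, text\n0,\n3, sample text\n4, sample text\n")
dst = io.StringIO(newline="")
number_texts(src, dst)
print(dst.getvalue())
```

With real files, src would be open("test.csv", newline="") and dst would be open("output2.csv", "w", newline=""), which match the file names in the question.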