Reading, computing, and grouping data from multiple files with pandas

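I'm trying to write a small script to automate my work. I have a large number of text files, and I need to combine them into one big dataframe for plotting.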

The files all have the same general structure:

5.013130280 4258.0
5.039390845 4198.0
...         ...
49.944957015 858.0
49.971217580 833.0
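A file in this format has no header row and separates its two columns with whitespace, so it can be read with `pd.read_csv` and a regular-expression separator. This minimal sketch parses a few rows copied from the sample above through `io.StringIO` instead of a real file; the column names `angle` and `int` follow the question's code. It uses `sep=r'\s+'` rather than `delim_whitespace=True`, which is deprecated in recent pandas versions.

```python
import io

import pandas as pd

# A few rows copied from the sample above, standing in for one .xy file
sample = """5.013130280 4258.0
5.039390845 4198.0
49.944957015 858.0
49.971217580 833.0
"""

# No header row; columns are separated by one or more whitespace characters
df = pd.read_csv(io.StringIO(sample), names=['angle', 'int'], sep=r'\s+')
print(df.shape)  # (4, 2)
```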

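What I want to do is:

1. Keep the first column as a column of the final dataframe (its values are the same in every file).
2. Fill the rest of the dataframe by extracting the second column of each file, normalizing it, and putting everything together.
3. Use the file name (everything before the dot) as the header of each extracted column, so it can be used when plotting the data.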
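The three goals listed above can be sketched in one function. This is a minimal illustration, not the asker's or answerers' code: the helper `build_table` and the demo file names `run_a` and `run_b` are hypothetical, and it assumes every file lists the same angle values in the same order. Collecting the normalized columns in a dict keyed by file name and building the DataFrame once avoids growing a DataFrame column by column.

```python
import glob
import os
import tempfile

import pandas as pd


def build_table(files):
    """Combine two-column .xy files into one DataFrame (hypothetical helper).

    Assumes every file lists the same angle values in the same order.
    """
    columns = {}
    angle = None
    for file in sorted(files):
        df = pd.read_csv(file, names=['angle', 'int'], sep=r'\s+')
        if angle is None:
            # Goal 1: keep the angle column from the first file only
            angle = df['angle']
        # Goal 3: column name is the file name up to the first dot
        name = os.path.basename(file).split('.')[0]
        # Goal 2: normalize this file's intensities by their own maximum
        columns[name] = df['int'] / df['int'].max()
    table = pd.DataFrame(columns)
    table.insert(0, 'angle', angle)
    return table


# Demo: write two tiny files with hypothetical names into a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    samples = {'run_a': (4258.0, 4198.0), 'run_b': (858.0, 833.0)}
    for name, (first, second) in samples.items():
        with open(os.path.join(tmp, name + '.xy'), 'w') as fh:
            fh.write(f"5.013130280 {first}\n5.039390845 {second}\n")
    table = build_table(glob.glob(os.path.join(tmp, '*.xy')))

print(table)
```

The resulting `table` has an `angle` column followed by one normalized column per file, which can be passed straight to a plotting call.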

Right now I can only do step 2. Here is the code:

import glob
import os

import pandas as pd

path = "mypath"
files = glob.glob(os.path.join(path, "*.xy"))  # all .xy files in the folder

li_norm = []
for file in files:
    # each file has two whitespace-separated columns: angle and intensity
    df = pd.read_csv(file, names=('angle', 'int'), sep=r'\s+')
    # normalize the intensity by this file's maximum
    df['int_n'] = df['int'] / df['int'].max()
    li_norm.append(df['int_n'])

# one column per file, side by side
norm_files = pd.concat(li_norm, axis=1)
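With the `pd.concat` approach, the column names (step 3) can be supplied through the `keys` argument: when Series are concatenated along `axis=1`, each key becomes the label of the corresponding column. In this sketch, two small hand-made Series with made-up values stand in for the per-file `int_n` columns, and `run_a` / `run_b` are hypothetical file names. Prepending the angle Series also covers step 1.

```python
import pandas as pd

# Stand-ins for data read from two files (made-up values based on the sample)
angle = pd.Series([5.013130280, 5.039390845])
li_norm = [
    pd.Series([4258.0, 4198.0]) / 4258.0,
    pd.Series([858.0, 833.0]) / 858.0,
]
names = ['run_a', 'run_b']  # hypothetical file names

# keys= gives each concatenated Series its column label
norm_files = pd.concat([angle] + li_norm, axis=1, keys=['angle'] + names)
print(norm_files.columns.tolist())  # ['angle', 'run_a', 'run_b']
```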

So, is there a simple way to do this?

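Assuming all files have exactly the same length (number of rows) and the same angle values, there is no need to create a bunch of dataframes and concatenate them.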
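If it is not guaranteed that every file shares the same angle values, a quick check before combining them can catch mismatches. This is a minimal sketch with a hypothetical helper `angles_match`, which compares each DataFrame's `angle` column against the first one using `numpy.allclose` so that tiny floating-point differences are tolerated; the small DataFrames are made-up examples.

```python
import numpy as np
import pandas as pd


def angles_match(frames):
    """Return True if every DataFrame has the same 'angle' values as the first."""
    reference = frames[0]['angle'].to_numpy()
    return all(
        # check the length first so allclose never has to broadcast mismatched arrays
        len(df) == len(reference) and np.allclose(df['angle'].to_numpy(), reference)
        for df in frames[1:]
    )


a = pd.DataFrame({'angle': [5.013130280, 5.039390845], 'int': [4258.0, 4198.0]})
b = pd.DataFrame({'angle': [5.013130280, 5.039390845], 'int': [858.0, 833.0]})
c = pd.DataFrame({'angle': [5.0, 5.1], 'int': [1.0, 2.0]})
print(angles_match([a, b]))     # True
print(angles_match([a, b, c]))  # False
```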

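If I understand correctly, you just want one final dataframe with a new column for each file (named after the file), containing that file's "int" data normalized using only the values from that particular file.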

For the first file you can create a dataframe to hold the final output, and then add a column to it for each subsequent file:

import os

import pandas as pd

for idx, file in enumerate(files):
    df = pd.read_csv(file, names=('angle', 'int'), sep=r'\s+')
    # file name without the directory and without the ".xy" extension
    filename = os.path.splitext(os.path.basename(file))[0]
    df[filename] = df['int'] / df['int'].max()  # use the file name itself as the new column name
    if idx == 0:  # create the norm_files output dataframe from the first file
        norm_files = df[['angle', filename]].copy()
    else:  # add a column to norm_files for every subsequent file
        norm_files[filename] = df[filename]
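Once the raw intensities sit side by side in one DataFrame, the per-file normalization does not need a loop: `DataFrame.max()` returns one maximum per column, and dividing the DataFrame by that Series aligns on column labels, so each column is scaled only by its own values. The column names and numbers in this sketch are hypothetical.

```python
import pandas as pd

# Raw intensities, one column per file (hypothetical names and values)
raw = pd.DataFrame({
    'run_a': [4258.0, 4198.0, 858.0],
    'run_b': [2000.0, 1000.0, 500.0],
})

# raw.max() holds each column's maximum; division aligns on column labels,
# so every column is scaled only by its own values
normalized = raw / raw.max()
print(normalized['run_b'].tolist())  # [1.0, 0.5, 0.25]
```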

Adding a calculated column is quite simple, although I'm not sure this is what you're asking for:

import os
import pandas as pd
li_norm = []
for file in files:
    df = pd.read_csv(file, names=('angle', 'int'), sep=r'\s+')
    name = os.path.basename(file).split('.')[0]  # file name up to the first dot
    df[name] = df['int'] / df['int'].max()
    li_norm.append(df[name])
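Splitting a full path on `'.'` keeps everything before the first dot anywhere in the path, so a relative path such as `./data/run_a.xy` yields an empty string; splitting on a backslash only works for Windows-style paths. `pathlib.Path(...).stem` removes the directory part and the final suffix using the current operating system's path conventions. The paths in this sketch are hypothetical.

```python
from pathlib import Path

path = './data/run_a.xy'  # hypothetical relative path

print(path.split('.')[0])  # '' (everything before the first dot)
print(Path(path).stem)     # 'run_a' (directory and final suffix removed)
print(Path('./data/run_b.2024.xy').stem)  # 'run_b.2024' (only the last suffix is removed)
```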
