预先感谢您的任何指导。我正在尝试使用scikit-learn通过逻辑回归进行分类,其中x是截然的,一个字段是一系列称为Heartrate的Heartrate数据。基于研究其他人也面临此错误的其他人,我确保漏斗阵列的形状/大小相同。
它在sklearn/utils/veration.py Line 382中获取值错误,在check_array中,在该行中的check_array中,通过array = np.Array(array,dtype = dtype,order ofers = order = order = order = dyper = dype = dype = dype = dype = dype ==复制)。我怀疑我的阵列在内存中不连续,这就是构成问题的原因,但不确定...
这是一些代码剪辑,可以帮助解决问题:
def get_training_set(self):
training_set = []
after_date = datetime.utcnow() - timedelta(weeks=8)
before_date = datetime.utcnow() - timedelta(hours=12)
activities = self.strava_client.get_activities(after=after_date, before=before_date)
for act in activities:
if act.has_heartrate:
streams = self.strava_client.get_activity_streams(activity_id=act.id, types=['heartrate'])
heartrate = np.array(list(filter(lambda x: x is not None, streams['heartrate'].data)))
fixed_heartrate = np.pad(heartrate, (0, 15000 - len(heartrate)), 'constant')
item = {'activity_type': self.classes.index(act.type),'heartrate': fixed_heartrate}
training_set.append(item)
return pd.DataFrame(training_set)
def train(self):
df = self.get_training_set()
df['Intercept'] = np.ones((len(df),))
y = df[['activity_type']]
X = df[['Intercept', 'heartrate']]
y = np.ravel(y)
#
model = LogisticRegression()
self.debug('y={}'.format(y))
model = model.fit(X,y)
例外发生在fit ...
中事先感谢您的任何指导。
尊重,
Mike
从评论中复制以改进格式:
/python3.5/site-packages/sklearn/linear_model/logistic.py", line 1173, in
fit order="C")
File "/python3.5/site-packages/sklearn/utils/validation.py", line 521, in
check_X_y ensure_min_features, warn_on_dtype, estimator)
File "/lib/python3.5/site-packages/sklearn/utils/validation.py", line 382, in
check_array array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence
和其他评论:
x和y看起来像这样:
X.shape=(29, 2)
y.shape=(29,)
X=[[1 array([74, 74, 77, ..., 0, 0, 0])]
[1 array([66, 67, 69, ..., 0, 0, 0])]
...
[1 array([92, 92, 91, ..., 0, 0, 0])]
[1 array([79, 79, 79, ..., 0, 0, 0])]]
y=[ 0 11 11 0 1 0 11 0 11 1 0 11 0 0 11 0 0 0 0 0 11 0 11 0 0 0 11 0 0]
如果您更改train(),那么事情会更好吗?
def train(self):
df = self.get_training_set()
df['Intercept'] = 1 # (a)
y = df['activity_type'].values # (b)
X = [np.concatenate(( np.array([col1]), col2 )) for col1, col2 in df[['Intercept', 'heartrate']].values.T]
model = LogisticRegression()
model.fit(X,y) # (c)
(a)将生成正确长度的序列
(b)使用值返回numpy数组而不是另一个数据框架
(c)拟合在Inplophe