MySQL 8.0.23在Select查询时出现Segfault而崩溃



我希望有人能对哪里出了问题以及我们如何解决它有更多的想法。我们的MySQL数据库一直崩溃,由于信号11 (Segfault)从操作系统。

服务器的快速概述:

Centos 7

MySQL Server Community Edition 8.0.23 via官方MySQL repository

4核

8 gb RAM

100GB HDD with 85GB free space

我们开始得到崩溃与以下日志:

15:09:23 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7f678c000d20
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f6800141c30 thread_stack 0x46000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x21e102d]
/usr/sbin/mysqld(handle_fatal_signal+0x313) [0x1009643]
/lib64/libpthread.so.0(+0xf5d0) [0x7f68181975d0]
/usr/sbin/mysqld() [0x24c7e0e]
/usr/sbin/mysqld(trx_undo_update_rec_get_update(unsigned char const*, dict_index_t const*, unsigned long, unsigned long, unsigned long, unsigned long, trx_t*, mem_block_info_t*, upd_t**, lob::undo_vers_t*, type_cmpl_t&)+0x7bb) [0x24c9d3b]
/usr/sbin/mysqld(trx_undo_prev_version_build(unsigned char const*, mtr_t*, unsigned char const*, dict_index_t const*, unsigned long*, mem_block_info_t*, unsigned char**, mem_block_info_t*, dtuple_t const**, unsigned long, lob::undo_vers_t*)+0x43d) [0x24cb7cd]
/usr/sbin/mysqld(row_vers_build_for_consistent_read(unsigned char const*, mtr_t*, dict_index_t*, unsigned long**, ReadView*, mem_block_info_t**, mem_block_info_t*, unsigned char**, dtuple_t const**, lob::undo_vers_t*)+0x279) [0x2476cb9]
/usr/sbin/mysqld(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)+0x2b34) [0x2462c14]
/usr/sbin/mysqld(ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function)+0x29a) [0x22c7e2a]
/usr/sbin/mysqld(handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)+0x347) [0x11184e7]
/usr/sbin/mysqld() [0xe8c415]
/usr/sbin/mysqld(join_read_const_table(JOIN_TAB*, POSITION*)+0x87) [0xe8c4d7]
/usr/sbin/mysqld(JOIN::extract_func_dependent_tables()+0x4c3) [0xeaff03]
/usr/sbin/mysqld(JOIN::make_join_plan()+0x10e7) [0xec0fb7]
/usr/sbin/mysqld(JOIN::optimize()+0xbcb) [0xec247b]
/usr/sbin/mysqld(SELECT_LEX::optimize(THD*)+0xb6) [0xf1ebe6]
/usr/sbin/mysqld(SELECT_LEX_UNIT::optimize(THD*, TABLE*, bool)+0x7b) [0xf92dab]
/usr/sbin/mysqld(Sql_cmd_dml::execute_inner(THD*)+0x3d) [0xf1d63d]
/usr/sbin/mysqld(Sql_cmd_dml::execute(THD*)+0x525) [0xf272e5]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0x9f0) [0xecb7a0]
/usr/sbin/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x42b) [0xecff7b]
/usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x1ed6) [0xed22f6]
/usr/sbin/mysqld(do_command(THD*)+0x19c) [0xed302c]
/usr/sbin/mysqld() [0xffa6b8]
/usr/sbin/mysqld() [0x2782d4e]
/lib64/libpthread.so.0(+0x7dd5) [0x7f681818fdd5]
/lib64/libc.so.6(clone+0x6d) [0x7f6816575ead]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f678c009a68): SELECT job_project_id AS projectid, user_id_fk AS userid, job_name AS jobname      FROM job WHERE job_id=2463
Connection ID (thread ID): 9
Status: NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

从日志中可以看出,错误的原因是:

SELECT job_project_id AS projectid, user_id_fk AS userid,
job_name AS jobname
FROM job
WHERE job_id=2463

job有以下列:

Columns:
job_id int AI PK 
job_name varchar(50) 
job_status json 
job_fileset_list json 
user_id_fk int 
job_project_id int 
job_start_time datetime 
job_end_time datetime 
job_pipeline json 
job_task_blobs json 
job_exe_blobs json

my.cnf为:

# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/8.0/en/server-configuration-defaults.html
[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
innodb_buffer_pool_size = 5500M
#
# Remove the leading "# " to disable binary logging
# Binary logging captures changes between backups and is enabled by
# default. It's default setting is log_bin=binlog
# disable_log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
#
# Remove leading # to revert to previous value for default_authentication_plugin,
# this will increase compatibility with older clients. For background, see:
# https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_default_authentication_plugin
# default-authentication-plugin=mysql_native_password
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

max_connections=500

我们只在上面的查询中看到过这个错误。这种段错误半定期地发生,我们可以运行查询几次,然后它就会出现段错误。我们尝试按顺序运行查询,没有区别。我们试着以1秒的休息时间运行它们,没有区别。

服务器肯定没有内存不足,当段故障出现时,内存足够。

我完全没有主意了,我也找不到有同样问题的人。我发现最接近的是一个bug报告

然而,这是一个老版本的mysql,并没有提供任何分辨率。

如果有人有任何想法,预感,甚至自己有这个问题之前,我将非常感谢您的输入!

所以这并不是一个真正的解决问题的方法。

但是感谢Don,我们已经将原因缩小到最有可能是MySQL本身。我们已经切换到MariaDB,这个问题不再发生了。在可预见的将来,我们将继续使用MariaDB。

最新更新