Ubuntu 16.04 freezes; the log repeatedly shows "NMI watchdog: BUG: soft lockup - CPU#6"

shanshiwu
Posts: 3
Joined: 2017-09-29 9:53
System: ubuntu16.04

#1

Posted by shanshiwu » 2017-09-29 10:55

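Hello everyone,

At my company we have a machine running Ubuntu 16.04 as a build server. It is a Lenovo Yangtian T4900c-00 with the following configuration:
Memory: 7.9 GB
Processor: Intel(R) Core(TM) i7-4900 CPU @ 3.60GHz × 8
Graphics: Gallium 0.4 on NV106
Operating system: Ubuntu 16.04 LTS, 32-bit
Disk: 976.0 GB

The system has an SSH server, Samba, git, svn, and other software installed. We log in to the machine with PuTTY to compile software, and Samba provides file sharing.
However, the server freezes frequently, once or twice a day, which is very frustrating. It used to dual-boot Windows 7 and Ubuntu; I have since removed Windows 7 and reinstalled Ubuntu on its own, and it still freezes.
After a freeze, I rebooted the machine and pulled the following log: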

Sep 28 18:01:06 buildserver NetworkManager[866]: <info> [1506592866.9899] manager: Networking is enabled by state file
Sep 28 18:01:06 buildserver NetworkManager[866]: <info> [1506592866.9900] Loaded device plugin: NMVxlanFactory (internal)
Sep 28 18:01:06 buildserver NetworkManager[866]: <info> [1506592866.9900] Loaded device plugin: NMVlanFactory (internal)
Sep 28 18:01:06 buildserver NetworkManager[866]: <info> [1506592866.9900] Loaded device plugin: NMVethFactory (internal)
Sep 28 18:01:06 buildserver NetworkManager[866]: <info> [1506592866.9900] Loaded device plugin: NMTunFactory (internal)
Sep 28 18:01:06 buildserver NetworkManager[866]: <info> [1506592866.9900] Loaded device plugin: NMMacvlanFactory (internal)
Sep 28 18:01:06 buildserver NetworkManager[866]: <info> [1506592866.9900] Loaded device plugin: NMIPTunnelFactory (internal)
Sep 28 18:01:06 buildserver NetworkManager[866]: <info> [1506592866.9901] Loaded device plugin: NMInfinibandFactory (internal)
Sep 28 18:01:06 buildserver NetworkManager[866]: <info> [1506592866.9901] Loaded device plugin: NMEthernetFactory (internal)
Sep 28 18:01:06 buildserver NetworkManager[866]: <info> [1506592866.9901] Loaded device plugin: NMBridgeFactory (internal)
Sep 28 18:01:06 buildserver NetworkManager[866]: <info> [1506592866.9901] Loaded device plugin: NMBondFactory (internal)
Sep 28 18:01:06 buildserver NetworkManager[866]: <info> [1506592866.9904] Loaded device plugin: NMWifiFactory (/usr/lib/i386-linux-gnu/NetworkManager/libnm-device-plugin-wifi.so)
Sep 28 18:01:06 buildserver NetworkManager[866]: <info> [1506592866.9906] Loaded device plugin: NMAtmManager (/usr/lib/i386-linux-gnu/NetworkManager/libnm-device-plugin-adsl.so)
Sep 28 18:01:07 buildserver NetworkManager[866]: <info> [1506592867.0258] Loaded device plugin: NMBluezManager (/usr/lib/i386-linux-gnu/NetworkManager/libnm-device-plugin-bluetooth.so)
Sep 28 18:01:07 buildserver NetworkManager[866]: <info> [1506592867.0261] Loaded device plugin: NMWwanFactory (/usr/lib/i386-linux-gnu/NetworkManager/libnm-device-plugin-wwan.so)
Sep 28 18:01:07 buildserver NetworkManager[866]: nm_device_get_device_type: assertion 'NM_IS_DEVICE (self)' failed
Sep 28 18:01:07 buildserver NetworkManager[866]: <info> [1506592867.0324] device (lo): link connected
Sep 28 18:01:07 buildserver NetworkManager[866]: <info> [1506592867.0329] manager: (lo): new Generic device (/org/freedesktop/NetworkManager/Devices/0)
Sep 28 18:01:07 buildserver NetworkManager[866]: <info> [1506592867.0348] device (enp5s0): link connected
Sep 28 18:01:07 buildserver NetworkManager[866]: <info> [1506592867.0354] manager: (enp5s0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/1)
Sep 28 18:01:07 buildserver NetworkManager[866]: <info> [1506592867.0358] manager: startup complete
Sep 28 18:01:07 buildserver NetworkManager[866]: <info> [1506592867.0358] manager: NetworkManager state is now CONNECTED_GLOBAL
Sep 28 18:01:07 buildserver NetworkManager[866]: <info> [1506592867.0372] urfkill disappeared from the bus
Sep 28 18:01:07 buildserver NetworkManager[866]: <info> [1506592867.0399] ofono is now available
Sep 28 18:01:07 buildserver NetworkManager[866]: <warn> [1506592867.0402] failed to enumerate oFono devices: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.ofono was not provided by any .service files
Sep 28 18:01:07 buildserver NetworkManager[866]: <info> [1506592867.0405] ModemManager available in the bus
Sep 28 18:01:16 buildserver NetworkManager[866]: <info> [1506592876.6081] manager: WiFi hardware radio set enabled
Sep 28 18:01:16 buildserver NetworkManager[866]: <info> [1506592876.6081] manager: WWAN hardware radio set enabled
Sep 28 18:35:19 buildserver kernel: [ 2065.183249] INFO: rcu_sched detected stalls on CPUs/tasks:
Sep 28 18:35:19 buildserver kernel: [ 2065.183265] 7-...: (11 GPs behind) idle=8d5/1/0 softirq=19851/19852 fqs=120
Sep 28 18:35:19 buildserver kernel: [ 2065.183266] (detected by 6, t=15002 jiffies, g=10726, c=10725, q=441)
Sep 28 18:35:19 buildserver kernel: [ 2065.183267] Task dump for CPU 7:
Sep 28 18:35:19 buildserver kernel: [ 2065.183268] swapper/7 R running task 0 0 1 0x00000008
Sep 28 18:35:19 buildserver kernel: [ 2065.183269] Call Trace:
Sep 28 18:35:19 buildserver kernel: [ 2065.183272] ? cpuidle_enter_state+0x156/0x350
Sep 28 18:35:19 buildserver kernel: [ 2065.183274] ? cpuidle_enter+0x14/0x20
Sep 28 18:35:19 buildserver kernel: [ 2065.183275] ? call_cpuidle+0x21/0x40
Sep 28 18:35:19 buildserver kernel: [ 2065.183276] ? do_idle+0x164/0x1d0
Sep 28 18:35:19 buildserver kernel: [ 2065.183278] ? cpu_startup_entry+0x6d/0x70
Sep 28 18:35:19 buildserver kernel: [ 2065.183279] ? start_secondary+0x15c/0x1b0
Sep 28 18:35:19 buildserver kernel: [ 2065.183280] ? startup_32_smp+0x16b/0x16d
Sep 28 18:35:19 buildserver kernel: [ 2065.183282] rcu_sched kthread starved for 14762 jiffies! g10726 c10725 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
Sep 28 18:35:19 buildserver kernel: [ 2065.183282] rcu_sched R running task 0 7 2 0x00000000
Sep 28 18:35:19 buildserver kernel: [ 2065.183283] Call Trace:
Sep 28 18:35:19 buildserver kernel: [ 2065.183285] __schedule+0x264/0x720
Sep 28 18:35:19 buildserver kernel: [ 2065.183287] ? lock_timer_base+0x67/0x80
Sep 28 18:35:19 buildserver kernel: [ 2065.183296] schedule+0x2e/0x80
Sep 28 18:35:19 buildserver kernel: [ 2065.183298] schedule_timeout+0x198/0x360
Sep 28 18:35:19 buildserver kernel: [ 2065.183299] ? del_timer_sync+0x50/0x50
Sep 28 18:35:19 buildserver kernel: [ 2065.183300] rcu_gp_kthread+0x4ea/0x880
Sep 28 18:35:19 buildserver kernel: [ 2065.183302] kthread+0xdb/0x110
Sep 28 18:35:19 buildserver kernel: [ 2065.183303] ? rcu_note_context_switch+0x100/0x100
Sep 28 18:35:19 buildserver kernel: [ 2065.183303] ? kthread_create_on_node+0x30/0x30
Sep 28 18:35:19 buildserver kernel: [ 2065.183304] ret_from_fork+0x21/0x2c
Sep 28 18:35:46 buildserver kernel: [ 2091.923860] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [kworker/6:1:132]
Sep 28 18:35:46 buildserver kernel: [ 2091.923861] Modules linked in: snd_hda_codec_hdmi joydev input_leds intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic kvm snd_hda_intel irqbypass snd_hda_codec crc32_pclmul pcbc snd_hda_core snd_hwdep snd_pcm aesni_intel snd_seq_midi aes_i586 snd_seq_midi_event crypto_simd cryptd snd_rawmidi intel_cstate intel_rapl_perf snd_seq snd_seq_device lpc_ich snd_timer snd mei_me mei soundcore shpchp mac_hid parport_pc ppdev lp parport autofs4 hid_generic usbhid hid nouveau mxm_wmi i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci drm r8169 libahci mii wmi fjes video
Sep 28 18:35:46 buildserver kernel: [ 2091.923889] CPU: 6 PID: 132 Comm: kworker/6:1 Not tainted 4.10.0-28-generic #32~16.04.2-Ubuntu
Sep 28 18:35:46 buildserver kernel: [ 2091.923890] Hardware name: LENOVO 90ETCTO1WW/ , BIOS FCKT77AUS 12/22/2015
Sep 28 18:35:46 buildserver kernel: [ 2091.923893] Workqueue: events netstamp_clear
Sep 28 18:35:46 buildserver kernel: [ 2091.923894] task: f55c0000 task.stack: f55ca000
Sep 28 18:35:46 buildserver kernel: [ 2091.923896] EIP: smp_call_function_many+0x1ca/0x220
Sep 28 18:35:46 buildserver kernel: [ 2091.923897] EFLAGS: 00000202 CPU: 6
Sep 28 18:35:46 buildserver kernel: [ 2091.923897] EAX: 00000007 EBX: f5df4450 ECX: 00000007 EDX: 00000001
Sep 28 18:35:46 buildserver kernel: [ 2091.923898] ESI: f5dde540 EDI: f5dde544 EBP: f55cbe68 ESP: f55cbe44
Sep 28 18:35:46 buildserver kernel: [ 2091.923898] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Sep 28 18:35:46 buildserver kernel: [ 2091.923899] CR0: 80050033 CR2: bfb28d10 CR3: 2bf31740 CR4: 001406f0
Sep 28 18:35:46 buildserver kernel: [ 2091.923899] Call Trace:
Sep 28 18:35:46 buildserver kernel: [ 2091.923902] ? arch_unregister_cpu+0x20/0x20
Sep 28 18:35:46 buildserver kernel: [ 2091.923904] ? netif_receive_skb_internal+0x19/0x90
Sep 28 18:35:46 buildserver kernel: [ 2091.923905] ? arch_unregister_cpu+0x20/0x20
Sep 28 18:35:46 buildserver kernel: [ 2091.923906] on_each_cpu+0x2a/0x50
Sep 28 18:35:46 buildserver kernel: [ 2091.923917] ? netif_receive_skb_internal+0x19/0x90
Sep 28 18:35:46 buildserver kernel: [ 2091.923918] ? netif_receive_skb_internal+0x1a/0x90
Sep 28 18:35:46 buildserver kernel: [ 2091.923919] text_poke_bp+0x5d/0xd0
Sep 28 18:35:46 buildserver kernel: [ 2091.923920] ? netif_receive_skb_internal+0x19/0x90
Sep 28 18:35:46 buildserver kernel: [ 2091.923922] arch_jump_label_transform+0x83/0x110
Sep 28 18:35:46 buildserver kernel: [ 2091.923923] ? netif_receive_skb_internal+0x1e/0x90
Sep 28 18:35:46 buildserver kernel: [ 2091.923924] __jump_label_update+0x6c/0x80
Sep 28 18:35:46 buildserver kernel: [ 2091.923934] jump_label_update+0x74/0x80
Sep 28 18:35:46 buildserver kernel: [ 2091.923935] static_key_slow_inc+0xad/0xc0
Sep 28 18:35:46 buildserver kernel: [ 2091.923936] static_key_enable+0x1c/0x50
Sep 28 18:35:46 buildserver kernel: [ 2091.923937] netstamp_clear+0x2a/0x40
Sep 28 18:35:46 buildserver kernel: [ 2091.923939] process_one_work+0x121/0x400
Sep 28 18:35:46 buildserver kernel: [ 2091.923940] worker_thread+0x37/0x4b0
Sep 28 18:35:46 buildserver kernel: [ 2091.923941] kthread+0xdb/0x110
Sep 28 18:35:46 buildserver kernel: [ 2091.923942] ? process_one_work+0x400/0x400
Sep 28 18:35:46 buildserver kernel: [ 2091.923943] ? kthread_create_on_node+0x30/0x30
Sep 28 18:35:46 buildserver kernel: [ 2091.923944] ret_from_fork+0x21/0x2c
Sep 28 18:35:46 buildserver kernel: [ 2091.923945] Code: 00 89 f8 e8 29 97 30 00 3b 05 54 ce c3 d9 0f 8d af fe ff ff 8b 1e 03 1c 85 e0 d1 b1 d9 8b 53 0c 83 e2 01 74 0e 8d 74 26 00 f3 90 <8b> 53 0c 83 e2 01 75 f6 0f ae e8 89 f6 eb bf 0f b6 45 e0 89 04
Sep 28 18:36:14 buildserver kernel: [ 2119.924489] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [kworker/6:1:132]
Sep 28 18:36:14 buildserver kernel: [ 2119.924491] Modules linked in: snd_hda_codec_hdmi joydev input_leds intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic kvm snd_hda_intel irqbypass snd_hda_codec crc32_pclmul pcbc snd_hda_core snd_hwdep snd_pcm aesni_intel snd_seq_midi aes_i586 snd_seq_midi_event crypto_simd cryptd snd_rawmidi intel_cstate intel_rapl_perf snd_seq snd_seq_device lpc_ich snd_timer snd mei_me mei soundcore shpchp mac_hid parport_pc ppdev lp parport autofs4 hid_generic usbhid hid nouveau mxm_wmi i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci drm r8169 libahci mii wmi fjes video
Sep 28 18:36:14 buildserver kernel: [ 2119.924521] CPU: 6 PID: 132 Comm: kworker/6:1 Tainted: G L 4.10.0-28-generic #32~16.04.2-Ubuntu
Sep 28 18:36:14 buildserver kernel: [ 2119.924522] Hardware name: LENOVO 90ETCTO1WW/ , BIOS FCKT77AUS 12/22/2015
Sep 28 18:36:14 buildserver kernel: [ 2119.924526] Workqueue: events netstamp_clear
Sep 28 18:36:14 buildserver kernel: [ 2119.924527] task: f55c0000 task.stack: f55ca000
Sep 28 18:36:14 buildserver kernel: [ 2119.924529] EIP: smp_call_function_many+0x1c8/0x220
Sep 28 18:36:14 buildserver kernel: [ 2119.924529] EFLAGS: 00000202 CPU: 6
Sep 28 18:36:14 buildserver kernel: [ 2119.924530] EAX: 00000007 EBX: f5df4450 ECX: 00000007 EDX: 00000001
Sep 28 18:36:14 buildserver kernel: [ 2119.924531] ESI: f5dde540 EDI: f5dde544 EBP: f55cbe68 ESP: f55cbe44
Sep 28 18:36:14 buildserver kernel: [ 2119.924531] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068

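Yesterday, September 28, I reinstalled the system. At around 18:36 it appears to have hung again, and for the next several hours the log kept repeating messages like "NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [kworker/6:1:132]".
Searching online has not turned up a fix, and this problem has been bothering me for a long time.
Any suggestions would be much appreciated. Thanks!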

The full log is in the attachment.
Attachment: kern.7z (123.59 KiB)
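
To see how often the watchdog fired and on which CPU and task, the soft-lockup lines in the extracted log can be tallied. The following is only a minimal shell sketch; it assumes kern.7z has been unpacked to a plain-text file, and both the file name `kern.log` and the function name `count_soft_lockups` are made up for illustration.

```shell
# count_soft_lockups FILE  (hypothetical helper)
# Print "<count> CPU#<n> [<task>]" for each distinct soft-lockup report in FILE,
# most frequent first.
count_soft_lockups() {
    # Keep only the CPU number and the bracketed task name of each report line.
    sed -n 's/.*soft lockup - \(CPU#[0-9]*\) stuck for [0-9]*s! \(\[[^]]*]\).*/\1 \2/p' "$1" |
        sort | uniq -c | sort -rn |
        sed 's/^ *//'   # drop the left padding that uniq -c adds
}
```

Called as `count_soft_lockups kern.log`, it would print `2 CPU#6 [kworker/6:1:132]` for the excerpt above. If a single CPU/task pair dominates the full log as well, that would suggest one stuck work item being reported repeatedly rather than many independent stalls.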
poloshiao
Forum moderator
Posts: 18279
Joined: 2009-08-04 16:33

#2

Posted by poloshiao » 2017-09-29 12:10

The log keeps showing "NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [kworker/6:1:132]"
1. Try installing a different kernel version.
1-1. Then reboot
and boot into the newly installed kernel.
1-2. See
https://github.com/lxc/lxc/issues/1088
Problem on latest kernel: kernel:NMI watchdog: BUG: soft lockup
1-2-1. https://github.com/lxc/lxc/issues/1088# ... -240370737
Our lockup happened on June 31 and right after that I upgraded the kernel to latest Ubuntu 16.04 kernel.
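
When trying different kernels, it may help to confirm which kernel release each lockup report was actually logged under. Each report in the log above has a `CPU: ... PID: ... Comm: ...` line that ends with the release string. A minimal shell sketch, assuming the log is available as a plain-text file; the function name `kernels_in_log` is hypothetical:

```shell
# kernels_in_log FILE  (hypothetical helper)
# List the distinct kernel releases named in the "CPU: n PID: n Comm: ..." lines
# that the kernel prints with each lockup report.
kernels_in_log() {
    # The release (e.g. 4.10.0-28-generic) is the token just before the " #build" field.
    sed -n 's/.* CPU: [0-9]* PID: [0-9]* Comm: .* \([0-9][0-9.]*-[0-9]*-[a-z]*\) #.*/\1/p' "$1" |
        sort -u
}
```

For the excerpt in post #1 this would list only `4.10.0-28-generic`. Running `uname -r` shows the release of the currently booted kernel for comparison.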
shanshiwu
Posts: 3
Joined: 2017-09-29 9:53
System: ubuntu16.04

#3

Posted by shanshiwu » 2017-09-29 12:19

poloshiao wrote:
The log keeps showing "NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [kworker/6:1:132]"
1. Try installing a different kernel version.
1-1. Then reboot
and boot into the newly installed kernel.
1-2. See
https://github.com/lxc/lxc/issues/1088
Problem on latest kernel: kernel:NMI watchdog: BUG: soft lockup
1-2-1. https://github.com/lxc/lxc/issues/1088# ... -240370737
Our lockup happened on June 31 and right after that I upgraded the kernel to latest Ubuntu 16.04 kernel.

Thank you very much for your reply. I have already installed several Linux versions; I tried both 14.04 and 16.04, in 32-bit and 64-bit, and they all report similar errors.
poloshiao
Forum moderator
Posts: 18279
Joined: 2009-08-04 16:33

#4

Posted by poloshiao » 2017-09-29 17:54

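See
https://bugs.launchpad.net/ubuntu/+sour ... ug/1530405
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kerneloops:814]
Search that page for "kworker".

Try this temporary workaround and see whether it helps:
https://bugs.launchpad.net/ubuntu/+sour ... omments/71
gksudo gedit /etc/sysctl.conf # requires the gksu package to be installed first
# add this line
kernel.watchdog_thresh=30
sudo systemctl reboot

There is no single agreed-upon fix yet.

If you are interested, you can also post there and compare notes with them.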
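
On a server that is reached over SSH, the same setting could be written without a graphical editor. A minimal shell sketch, not taken from the linked bug report: the helper name `set_sysctl_line` is made up for illustration, and it edits whichever file it is given, so it can be tried on a copy first.

```shell
# set_sysctl_line FILE KEY VALUE  (hypothetical helper, not a standard tool)
# Rewrite an existing "KEY = ..." line in FILE, or append "KEY=VALUE" if none
# exists, so that running it twice does not leave duplicate entries.
set_sysctl_line() {
    file=$1; key=$2; value=$3
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file" 2>/dev/null; then
        # Replace the existing assignment via a temporary copy.
        tmp=$(mktemp) || return 1
        sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key}=${value}|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # No assignment yet: append one.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

From a root shell this could be called as `set_sysctl_line /etc/sysctl.conf kernel.watchdog_thresh 30`, followed by `sysctl -p` to load the file without a reboot and `sysctl kernel.watchdog_thresh` to confirm the value. Note that the kernel reports a soft lockup after 2 × `watchdog_thresh` seconds. The default of 10 matches the "stuck for 22s" and "stuck for 23s" messages above, and a value of 30 would postpone the report to about 60 seconds. This changes when the warning appears; it is unlikely to remove the underlying stall.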
shanshiwu
Posts: 3
Joined: 2017-09-29 9:53
System: ubuntu16.04

#5

Posted by shanshiwu » 2017-09-30 9:50

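poloshiao wrote:
See
https://bugs.launchpad.net/ubuntu/+sour ... ug/1530405
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kerneloops:814]
Search that page for "kworker".

Try this temporary workaround and see whether it helps:
https://bugs.launchpad.net/ubuntu/+sour ... omments/71
gksudo gedit /etc/sysctl.conf # requires the gksu package to be installed first
# add this line
kernel.watchdog_thresh=30
sudo systemctl reboot

There is no single agreed-upon fix yet.

If you are interested, you can also post there and compare notes with them.

Hi poloshiao,
Thank you very much for your reply. I installed the gksu package and set kernel.watchdog_thresh to 30, but the machine still froze.
I will try installing other kernel versions next. Thanks!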