mt7925e crashes occasionally in AP mode #977

memfissoftware opened this issue May 1, 2025 · 3 comments
memfissoftware commented May 1, 2025

In AP mode, mt7925e crashes occasionally, and there is not much output in dmesg.

I tried disabling ASPM and fastboot, but the result is the same.
Trying older and newer firmware versions did not change the situation.

When the crash occurs, I get output like this in dmesg:
[ 14.010930] mt7925e 0000:01:00.0: enabling device (0000 -> 0002)
[ 14.011227] mt7925e 0000:01:00.0: disabling ASPM L0s L1
[ 14.020518] mt7925e 0000:01:00.0: ASIC revision: 79250000
[ 14.106534] mt7925e 0000:01:00.0: HW/SW Version: 0x8a108a10, Build Time: 20231227093012a
[ 14.445882] mt7925e 0000:01:00.0: WM Firmware Version: ____000000, Build Time: 20231227093232
[ 17.414227] mt7925e 0000:01:00.0 wlan0: entered allmulticast mode
[ 17.414415] mt7925e 0000:01:00.0 wlan0: entered promiscuous mode
[40042.476822] mt7925e 0000:01:00.0: HW/SW Version: 0x8a108a10, Build Time: 20231227093012a
[40042.818661] mt7925e 0000:01:00.0: WM Firmware Version: ____000000, Build Time: 20231227093232

lspci output for this card:
lspci.txt

From time to time, but not always, I also see a timeout error like this:
mt7925e 0000:04:00.0: Message 00020003 (seq 15) timeout

Interestingly, every time it crashed, 'iw dev' still showed the interface as normal, but in reality the card had crashed and the SSID was gone.
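
Because 'iw dev' keeps looking normal after the card has gone down, scanning the kernel log is one way to notice the failure. The following is a minimal sketch, not a tested monitoring tool. It assumes that a second "HW/SW Version" line after boot means the driver reloaded the firmware following a reset, which is how the log above reads but is not confirmed anywhere in this thread. The function name `scan` and the sample list are made up for illustration; the sample lines are copied from the output above.

```python
import re

# Kernel timestamp such as "[ 14.106534]" or "[40042.476822]"; it may be absent.
TS = re.compile(r"\[\s*(\d+\.\d+)\]")
# MCU message timeout as printed in the log above.
TIMEOUT = re.compile(r"mt7925e .*Message ([0-9a-fA-F]+) \(seq (\d+)\) timeout")
# Version banner printed whenever firmware is loaded.
FW_BANNER = re.compile(r"mt7925e .*HW/SW Version:")


def scan(lines):
    """Return (timeouts, reloads) found in an iterable of dmesg lines.

    timeouts: list of (timestamp or None, message id, sequence number)
    reloads:  timestamps of every version banner after the first one
              (assumption: a repeated banner means a reset happened)
    """
    timeouts, reloads = [], []
    banners_seen = 0
    for line in lines:
        m = TS.search(line)
        ts = float(m.group(1)) if m else None
        t = TIMEOUT.search(line)
        if t:
            timeouts.append((ts, t.group(1), int(t.group(2))))
        elif FW_BANNER.search(line):
            banners_seen += 1
            if banners_seen > 1:
                reloads.append(ts)
    return timeouts, reloads


sample = [
    "[ 14.106534] mt7925e 0000:01:00.0: HW/SW Version: 0x8a108a10, Build Time: 20231227093012a",
    "[40042.476822] mt7925e 0000:01:00.0: HW/SW Version: 0x8a108a10, Build Time: 20231227093012a",
    "mt7925e 0000:04:00.0: Message 00020003 (seq 15) timeout",
]
timeouts, reloads = scan(sample)
print(timeouts)
print(reloads)
```

In practice the lines would come from `dmesg` or a syslog file rather than a hard-coded list.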

@memfissoftware (Author)

It seems the issue is triggered by adding a new device. That is why we only saw deauth messages before: adding a new STA is what crashes afterwards.
I finally reproduced it with a kernel panic and got the crash log:

<4>[  222.120550] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
<4>[  222.120585] CPU: 1 UID: 0 PID: 1072 Comm: hostapd Tainted: G        W  O       6.11.0-24-generic #24~24.04.1-Ubuntu
<4>[  222.120643] Tainted: [W]=WARN, [O]=OOT_MODULE
<4>[  222.120667] Hardware name: Default string Default string/Default string, BIOS M6_MAX V0.06 02/19/2025
<4>[  222.120712] RIP: 0010:mt7925_sta_set_decap_offload+0xd3/0x180 [mt7925_common]
<4>[  222.120778] Code: 00 00 00 b8 01 00 00 00 f3 48 0f bc c0 41 89 c6 3c 0e 77 b5 49 8d 87 30 02 00 00 48 89 45 b8 49 8b 87 18 06 00 00 41 0f b6 ce <66> 83 78 98 00 74 6d 48 63 c1 80 f9 0e 77 7b 49 8b 84 c7 a0 05 00
<4>[  222.120858] RSP: 0018:ffff9cb5810f73a8 EFLAGS: 00010293
<4>[  222.120890] RAX: 0000000000000000 RBX: ffff8918508c2020 RCX: 0000000000000000
<4>[  222.120927] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
<4>[  222.120963] RBP: ffff9cb5810f7400 R08: 0000000000000000 R09: 0000000000000000
<4>[  222.120999] R10: 0000000000000000 R11: 0000000000000000 R12: ffff89185feb9d38
<4>[  222.121034] R13: 0000000000000001 R14: 0000000000000000 R15: ffff8918421b8a98
<4>[  222.121070] FS:  00007d8ff3952740(0000) GS:ffff891bafa80000(0000) knlGS:0000000000000000
Oops#1 Part4
<4>[  222.121116] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  222.121147] CR2: ffffffffffffff98 CR3: 00000001013aa000 CR4: 0000000000f50ef0
<4>[  222.121185] PKRU: 55555554
<4>[  222.121205] Call Trace:
<4>[  222.121224]  <TASK>
<4>[  222.121245]  ? show_regs+0x6c/0x80
<4>[  222.121278]  ? __die+0x24/0x80
<4>[  222.121304]  ? page_fault_oops+0x96/0x1b0
<4>[  222.121336]  ? kernelmode_fixup_or_oops.isra.0+0x69/0x90
<4>[  222.121373]  ? __bad_area_nosemaphore+0x1a1/0x2d0
<4>[  222.121404]  ? radix_tree_lookup+0xd/0x20
<4>[  222.121434]  ? start_flush_work+0x227/0x2e0
<4>[  222.121468]  ? bad_area_nosemaphore+0x16/0x30
<4>[  222.121496]  ? do_kern_addr_fault+0x78/0xa0
<4>[  222.121524]  ? exc_page_fault+0x1b0/0x1c0
<4>[  222.121557]  ? asm_exc_page_fault+0x27/0x30
<4>[  222.121590]  ? mt7925_sta_set_decap_offload+0xd3/0x180 [mt7925_common]
<4>[  222.121647]  ? mt7925_sta_set_decap_offload+0x50/0x180 [mt7925_common]
<4>[  222.121706]  drv_sta_set_decap_offload+0x98/0x1e0 [mac80211]
<4>[  222.122015]  ieee80211_check_fast_rx+0x315/0x420 [mac80211]
<4>[  222.122301]  _sta_info_move_state+0x38e/0x3f0 [mac80211]
<4>[  222.122551]  sta_info_move_state+0x13/0x20 [mac80211]
<4>[  222.122798]  sta_apply_auth_flags.isra.0+0x5a/0x1e0 [mac80211]
<4>[  222.123082]  sta_apply_parameters+0x26c/0x350 [mac80211]
<4>[  222.123362]  ieee80211_add_station+0xde/0x1a0 [mac80211]
<4>[  222.123615]  nl80211_new_station+0x4e3/0x780 [cfg80211]
<4>[  222.123839]  genl_family_rcv_msg_doit+0xf7/0x160
<4>[  222.123873]  genl_family_rcv_msg+0x182/0x250
<4>[  222.123901]  ? __pfx_nl80211_pre_doit+0x10/0x10 [cfg80211]
<4>[  222.124107]  ? __pfx_nl80211_new_station+0x10/0x10 [cfg80211]
Oops#1 Part3
<4>[  222.124314]  ? __pfx_nl80211_post_doit+0x10/0x10 [cfg80211]
<4>[  222.124517]  genl_rcv_msg+0x4c/0xb0
<4>[  222.124538]  ? __pfx_genl_rcv_msg+0x10/0x10
<4>[  222.124561]  netlink_rcv_skb+0x5a/0x110
<4>[  222.124588]  genl_rcv+0x28/0x50
<4>[  222.124606]  netlink_unicast+0x245/0x390
<4>[  222.124633]  netlink_sendmsg+0x213/0x470
<4>[  222.124661]  ____sys_sendmsg+0x3a8/0x410
<4>[  222.124688]  ___sys_sendmsg+0x9a/0xf0
<4>[  222.124718]  __sys_sendmsg+0x89/0xf0
<4>[  222.124742]  __x64_sys_sendmsg+0x1d/0x30
<4>[  222.124765]  x64_sys_call+0x912/0x25f0
<4>[  222.124791]  do_syscall_64+0x7e/0x170
<4>[  222.124816]  ? __sys_setsockopt+0x76/0xe0
<4>[  222.124842]  ? aa_sk_perm+0x46/0x240
<4>[  222.124866]  ? syscall_exit_to_user_mode+0x4e/0x250
<4>[  222.124895]  ? copy_from_sockptr_offset.constprop.0+0x24/0x30
<4>[  222.124924]  ? do_sock_setsockopt+0xbe/0x190
<4>[  222.124950]  ? __sys_setsockopt+0x76/0xe0
<4>[  222.124975]  ? syscall_exit_to_user_mode+0x4e/0x250
<4>[  222.125003]  ? do_syscall_64+0x8a/0x170
<4>[  222.125026]  ? syscall_exit_to_user_mode+0x18d/0x250
<4>[  222.125058]  ? do_syscall_64+0x8a/0x170
<4>[  222.125083]  ? __rseq_handle_notify_resume+0x36/0x70
<4>[  222.125112]  ? irqentry_exit_to_user_mode+0x43/0x250
<4>[  222.126008]  ? irqentry_exit+0x43/0x50
<4>[  222.126852]  ? sysvec_apic_timer_interrupt+0x57/0xc0
<4>[  222.127704]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
<4>[  222.128549] RIP: 0033:0x7d8ff312c004
<4>[  222.129383] Code: 15 19 6e 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00 f3 0f 1e fa 80 3d 45 f0 0d 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
Oops#1 Part2
<4>[  222.130251] RSP: 002b:00007ffcb5182188 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
<4>[  222.131128] RAX: ffffffffffffffda RBX: 00005632b6a8b4e0 RCX: 00007d8ff312c004
<4>[  222.132006] RDX: 0000000000000000 RSI: 00007ffcb51821c0 RDI: 0000000000000005
<4>[  222.132881] RBP: 00007ffcb51821b0 R08: 0000000000000004 R09: 00000000000000f0
<4>[  222.133745] R10: 00007ffcb51822cc R11: 0000000000000202 R12: 00005632b6ae94c0
<4>[  222.134602] R13: 00005632b6a8b3f0 R14: 00007ffcb51821c0 R15: 00007ffcb51822cc
<4>[  222.135452]  </TASK>
<4>[  222.136280] Modules linked in: cmac ccm snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence qrtr snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi snd_sof_utils snd_soc_hdac_hda aic8800_fdrv(O) snd_soc_acpi_intel_match snd_hda_codec_realtek soundwire_generic_allocation snd_hda_codec_generic snd_soc_acpi snd_hda_scodec_component soundwire_bus snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_soc_core coretemp snd_compress ac97_bus snd_pcm_dmaengine kvm_intel snd_hda_intel kvm snd_intel_dspcfg bridge snd_intel_sdw_acpi stp snd_hda_codec crct10dif_pclmul llc polyval_clmulni snd_hda_core polyval_generic snd_hwdep ghash_clmulni_intel sha256_ssse3 mt7925e sha1_ssse3 snd_pcm aesni_intel mt7925_common snd_seq_midi crypto_simd binfmt_misc snd_seq_midi_event mt792x_lib cryptd processor_thermal_device_pci mt76_connac_lib processor_thermal_device snd_rawmidi
Oops#1 Part1
<4>[  222.136317]  ip6table_nat mt76 cmdlinepart i915 ip6_tables processor_thermal_wt_hint spi_nor snd_seq mac80211 xt_conntrack rapl drm_buddy processor_thermal_rfim mtd snd_seq_device cfg80211(O) intel_rapl_msr mei_pxp mei_hdcp snd_timer processor_thermal_rapl nls_iso8859_1 spi_intel_pci ttm i2c_i801 nft_chain_nat snd intel_cstate intel_rapl_common xt_MASQUERADE libarc4 i2c_mux spi_intel drm_display_helper aic_load_fw(O) soundcore processor_thermal_wt_req mei_me i2c_smbus processor_thermal_power_floor mei cec nf_nat processor_thermal_mbox rc_core int340x_thermal_zone i2c_algo_bit igen6_edac nf_conntrack intel_pmc_core nf_defrag_ipv6 intel_vsec pmt_telemetry nf_defrag_ipv4 intel_hid acpi_pad nft_compat pmt_class sparse_keymap acpi_tad nf_tables libcrc32c joydev input_leds mac_hid serio_raw sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 hid_generic rndis_host usbhid uas cdc_ether hid usbnet usb_storage mii sdhci_pci cqhci r8169 ahci intel_ish_ipc xhci_pci crc32_pclmul
<4>[  222.140788]  sdhci realtek libahci intel_ishtp xhci_pci_renesas video wmi pinctrl_alderlake
<4>[  222.146678] CR2: ffffffffffffff98
<4>[  222.147703] ---[ end trace 0000000000000000 ]---

It seems there is a NULL pointer dereference somewhere, most likely in mt7925_sta_set_decap_offload().
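
The register dump is consistent with that reading. The faulting bytes marked in the Code line, `<66> 83 78 98 00`, appear to decode as `cmpw $0x0,-0x68(%rax)`; that decoding is my own and is not printed by the kernel. With RAX = 0, the access lands at -0x68, which wraps to the CR2 value in the log. A small sketch of the arithmetic (the helper name `effective_address` is made up for illustration):

```python
MASK64 = (1 << 64) - 1


def effective_address(base, disp8):
    """Address of a base-register + signed 8-bit displacement operand on x86-64."""
    # Sign-extend the displacement byte: 0x98 becomes -0x68.
    signed = disp8 - 0x100 if disp8 & 0x80 else disp8
    # Wrap to 64 bits the way the CPU does.
    return (base + signed) & MASK64


rax = 0x0    # RAX from the register dump above
disp = 0x98  # displacement byte in "<66> 83 78 98 00" (assumed decoding)
addr = effective_address(rax, disp)
print(hex(addr))  # 0xffffffffffffff98, same as CR2 in the oops
```

A negative offset from a NULL base is what one would expect if a pointer to a containing structure was derived from a NULL member pointer, but the log alone does not say which pointer that is.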

konsergg commented May 9, 2025

I have the same problem when running in AP mode:

Fri May 9 16:39:55 2025 daemon.notice hostapd: phy1-ap0: ACS-COMPLETED freq=5180 channel=36
Fri May 9 16:39:55 2025 daemon.notice hostapd: phy1-ap0: interface state ACS->HT_SCAN
Fri May 9 16:39:56 2025 daemon.notice netifd: Network device 'phy1-ap0' link is up
Fri May 9 16:39:56 2025 kern.info kernel: [ 19.691718] br-wifi: port 2(phy1-ap0) entered blocking state
Fri May 9 16:39:56 2025 kern.info kernel: [ 19.692173] br-wifi: port 2(phy1-ap0) entered forwarding state
Fri May 9 16:39:56 2025 daemon.notice hostapd: phy1-ap0: interface state HT_SCAN->ENABLED
Fri May 9 16:39:56 2025 daemon.notice hostapd: phy1-ap0: AP-ENABLED

It runs for anywhere from 1 minute to 6 hours, and then an error occurs:

Fri May 9 17:08:59 2025 kern.err kernel: [ 1763.455660] mt7925e 0000:01:00.0: Message 00020002 (seq 12) timeout
Fri May 9 17:09:00 2025 kern.info kernel: [ 1763.548292] mt7925e 0000:01:00.0: HW/SW Version: 0x8a108a10, Build Time: 20250425072955a
Fri May 9 17:09:00 2025 kern.info kernel: [ 1763.548292]
Fri May 9 17:09:00 2025 kern.info kernel: [ 1763.889211] mt7925e 0000:01:00.0: WM Firmware Version: ____000000, Build Time: 20250425073109
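
For this particular run, the time from AP-ENABLED to the first timeout can be worked out from the log lines above, either from the syslog wall-clock stamps or from the kernel timestamps. This is only a sketch of the arithmetic. The helper name `elapsed_seconds` is made up, and using the bridge "forwarding state" line as the kernel-side start is an approximation, since hostapd's AP-ENABLED message has no kernel timestamp.

```python
from datetime import datetime

# Format of the syslog prefix used in the log above, e.g. "Fri May 9 16:39:56 2025".
FMT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(start, end):
    """Seconds between two syslog wall-clock stamps in the format above."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# AP-ENABLED versus the "Message 00020002 (seq 12) timeout" line.
wall = elapsed_seconds("Fri May 9 16:39:56 2025", "Fri May 9 17:08:59 2025")

# Same interval from kernel timestamps: timeout minus "entered forwarding state".
kern = 1763.455660 - 19.692173

print(wall / 60)  # about 29 minutes
print(kern / 60)
```

Both estimates agree on roughly 29 minutes of uptime before the failure in this run.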

In OpenWrt it still looks like Wi-Fi is running, but no Wi-Fi client can see it. According to the logs, all clients are disconnected:

Fri May 9 17:13:55 2025 daemon.notice hostapd: phy1-ap0: AP-STA-DISCONNECTED xxxxxx
Fri May 9 17:13:55 2025 daemon.info hostapd: phy1-ap0: STA xxxxxxx IEEE 802.11: disassociated due to inactivity
Fri May 9 17:13:56 2025 daemon.info hostapd: phy1-ap0: STA xxxxxxx IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

@lukasz1992


3 participants