Cubox-i: Kernel-Oops: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Rainer Dorsch
2025-02-09 15:20:01 UTC
Hello,

during reboot of Cubox-i with stable kernel 6.1.0-29-armmp, I got a kernel
Oops (though the reboot did complete eventually):

[2406987.476525] 8<--- cut here ---
[2406987.479798] Unable to handle kernel NULL pointer dereference at virtual
address 00000000
[2406987.488157] [00000000] *pgd=00000000
[2406987.491976] Internal error: Oops: 5 [#1] SMP ARM
[2406987.496795] Modules linked in: ip6t_REJECT nf_reject_ipv6 xt_comment
ip6_tables xt_recent ipt_REJECT nf_reject_ipv4 xt_conntrack xt_hashlimit
xt_addrtype xt_mark nft_chain_nat xt_MASQUERADE xt_CT xt_tcpudp nft_compat
xt_NFLOG nfnetlink_log xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic
nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_nat_ftp
nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat nf_conntrack_sane
nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc
nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
rpcsec_gss_krb5 nfsv4 dns_resolver nfs nf_tables libcrc32c fscache netfs
nfnetlink zram(-) zsmalloc binfmt_misc caam_jr caamhash_desc caamalg_desc
crypto_engine authenc libdes dw_hdmi_ahb_audio dw_hdmi_cec brcmfmac evdev
brcmutil imx6_media_csi(C) v4l2_fwnode ftdi_sio ch341 cfg80211 usbserial
rfkill snd_soc_imx_spdif caam error video_mux coda_vpu
[2406987.497296] imx_thermal snd_soc_fsl_spdif snd_soc_fsl_utils
imx6_media(C) dw_hdmi_imx dw_hdmi imx_pcm_dma drm_display_helper
imx_media_common(C) v4l2_jpeg imx_vdoa snd_soc_core v4l2_mem2mem imx2_wdt
videobuf2_dma_contig snd_pcm_dmaengine v4l2_async videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2 videobuf2_common snd_pcm snd_timer videodev
imxdrm cec snd etnaviv drm_dma_helper mc gpu_sched soundcore drm_kms_helper
imx_ipu_v3 gpio_ir_recv rc_core leds_pwm imx6q_cpufreq 8021q garp mrp stp llc
nfsd auth_rpcgss nfs_acl lockd fuse loop drm grace dm_mod configfs sunrpc
ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic at803x
ci_hdrc_imx ci_hdrc ulpi fec selftests ahci_imx roles of_mdio libahci_platform
ehci_hcd fixed_phy libahci fwnode_mdio udc_core libphy phy_generic
nvmem_imx_ocotp usbcore sdhci_esdhc_imx i2c_imx sdhci_pltfm cqhci mux_mmio
mux_core libata sdhci usbmisc_imx scsi_mod scsi_common anatop_regulator
phy_mxs_usb pwm_imx27 gpio_mxc
[2406987.669806] CPU: 0 PID: 9106 Comm: rmmod Tainted: G C
6.1.0-29-armmp #1 Debian 6.1.123-1
[2406987.679578] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[2406987.686300] PC is at zcomp_cpu_dead+0x14/0x58 [zram]
[2406987.691486] LR is at cpuhp_invoke_callback+0xd4/0x6fc
[2406987.696745] pc : [<bf6c52b4>] lr : [<c034bac0>] psr: 60070013
[2406987.703204] sp : f0bbde10 ip : c142394c fp : 00000000
[2406987.708621] r10: 2da6a000 r9 : eed93350 r8 : bf6c52a0
[2406987.714034] r7 : 00000008 r6 : 00000044 r5 : c1329350 r4 : 00000000
[2406987.720753] r3 : c140b750 r2 : 00000001 r1 : 00000008 r0 : 00000000
[2406987.727472] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment
none
[2406987.734806] Control: 10c5387d Table: 16fec04a DAC: 00000051
[2406987.740742] Register r0 information: NULL pointer
[2406987.745645] Register r1 information: non-paged memory
[2406987.750892] Register r2 information: non-paged memory
[2406987.756137] Register r3 information: non-slab/vmalloc memory
[2406987.761993] Register r4 information: NULL pointer
[2406987.766890] Register r5 information: non-slab/vmalloc memory
[2406987.772746] Register r6 information: non-paged memory
[2406987.777993] Register r7 information: non-paged memory
[2406987.783237] Register r8 information: 7-page vmalloc region starting at
0xbf6c5000 allocated at load_module+0xa70/0x2148
[2406987.794250] Register r9 information: non-slab/vmalloc memory
[2406987.800107] Register r10 information: non-paged memory
[2406987.805440] Register r11 information: NULL pointer
[2406987.810423] Register r12 information: non-slab/vmalloc memory
[2406987.816364] Process rmmod (pid: 9106, stack limit = 0x589ba9ab)
[2406987.822481] Stack: (0xf0bbde10 to 0xf0bbe000)
[2406987.827035] de00: 00000000 c1329350
00000044 c034bac0
[2406987.835414] de20: 00000002 c0d29540 c142394c 00000000 f0bbdea4 f0bbdea4
c7102040 f0bbde3c
[2406987.843789] de40: f0bbde3c b877c437 00000000 00000000 00000000 c140b210
00000008 c1329350
[2406987.852166] de60: c14232e0 c140b0a8 00000004 c034c81c 00000000 c0d28b54
00000000 00000044
[2406987.860542] de80: c140b210 00000008 c1329350 c034cb00 00000000 c60e1e00
00000000 00000000
[2406987.868917] dea0: c140b750 c60e1e10 c2081540 bf6c5318 00000004 bf6c6d58
00000000 c60e1e00
[2406987.877294] dec0: 00000000 00000000 bf6c6fb4 bf6ca03c c7102040 00000081
0138f138 bf6c6ec0
[2406987.885671] dee0: c518c734 00000000 00000000 bf6c6fc8 c518c734 c0cf3f28
00000000 00000040
[2406987.894047] df00: 00000000 c518c720 c1975c4c b877c437 bf6ca480 bf6ca03c
00000000 c7102040
[2406987.902423] df20: c03002f0 bf6c8a94 bf6ca240 00000800 00000000 c03eecf4
00000006 00000000
[2406987.910799] df40: 00000000 00000000 00000000 00000000 6d61727a 00000000
00000000 00000000
[2406987.919172] df60: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
[2406987.927548] df80: 00000000 00000000 00000000 b877c437 0138f138 0138e190
0138f138 00000000
[2406987.935925] dfa0: 00000081 c03000c0 0138e190 0138f138 0138f174 00000800
00000000 00000000
[2406987.944300] dfc0: 0138e190 0138f138 00000000 00000081 beb75eeb 00000001
00000002 0138f138
[2406987.952675] dfe0: 0043fe10 beb75b8c 0041f0a5 b6c48168 00010030 0138f174
00000000 00000000
[2406987.961101] zcomp_cpu_dead [zram] from cpuhp_invoke_callback+0xd4/0x6fc
[2406987.968039] cpuhp_invoke_callback from cpuhp_issue_call+0x54/0x1b4
[2406987.974523] cpuhp_issue_call from
__cpuhp_state_remove_instance+0xf8/0x1b4
[2406987.981702] __cpuhp_state_remove_instance from zcomp_destroy+0x20/0x34
[zram]
[2406987.989153] zcomp_destroy [zram] from zram_reset_device+0x114/0x170
[zram]
[2406987.996345] zram_reset_device [zram] from zram_remove+0x10c/0x120 [zram]
[2406988.003358] zram_remove [zram] from zram_remove_cb+0x14/0x5c [zram]
[2406988.009941] zram_remove_cb [zram] from idr_for_each+0x5c/0x108
[2406988.016084] idr_for_each from destroy_devices+0x38/0x68 [zram]
[2406988.022240] destroy_devices [zram] from sys_delete_module+0x194/0x320
[2406988.028990] sys_delete_module from ret_fast_syscall+0x0/0x1c
[2406988.034943] Exception stack(0xf0bbdfa8 to 0xf0bbdff0)
[2406988.040190] dfa0: 0138e190 0138f138 0138f174 00000800
00000000 00000000
[2406988.048572] dfc0: 0138e190 0138f138 00000000 00000081 beb75eeb 00000001
00000002 0138f138
[2406988.056945] dfe0: 0043fe10 beb75b8c 0041f0a5 b6c48168
[2406988.062194] Code: e52de004 e28dd004 e30b3750 e34c3140 (e5114008)
[2406988.069040] ---[ end trace 0000000000000000 ]---

Any idea or hint about what could cause this is welcome.

Thanks
Rainer
--
Rainer Dorsch
http://bokomoko.de/
Arnd Bergmann
2025-02-11 07:30:01 UTC
Post by Rainer Dorsch
during reboot of Cubox-i with stable kernel 6.1.0-29-armmp, I got a kernel
Hi Rainer,
Post by Rainer Dorsch
[2406987.476525] 8<--- cut here ---
[2406987.479798] Unable to handle kernel NULL pointer dereference at virtual
address 00000000
A NULL pointer was dereferenced, which in this case is almost
certainly a logic bug in kernel code.
Post by Rainer Dorsch
[2406987.669806] CPU: 0 PID: 9106 Comm: rmmod Tainted: G C
6.1.0-29-armmp #1 Debian 6.1.123-1
[2406987.679578] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[2406987.686300] PC is at zcomp_cpu_dead+0x14/0x58 [zram]
[2406987.691486] LR is at cpuhp_invoke_callback+0xd4/0x6fc
You can get the exact code location by running the PC value from the
oops through 'addr2line', but the function is fairly short.
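
As a rough illustration of how the symbol+offset/size notation relates to the raw addresses in the dump, the arithmetic can be sketched in C. The helper names are made up for this example; only the numbers come from the oops.

```c
#include <stdint.h>

/*
 * "zcomp_cpu_dead+0x14/0x58" means: 0x14 bytes into a function that is
 * 0x58 bytes long. Subtracting the offset from the PC gives the address
 * at which the function was loaded.
 */
uint32_t func_start(uint32_t pc, uint32_t offset)
{
	return pc - offset;
}

/*
 * Distance of an address from the start of the module's vmalloc region
 * (the base printed in the "Register r8 information" line). This is not
 * necessarily the offset inside the .text section of the module file.
 */
uint32_t offset_in_region(uint32_t addr, uint32_t region_base)
{
	return addr - region_base;
}
```

With the values from this oops, func_start(0xbf6c52b4, 0x14) gives 0xbf6c52a0, which is the value shown in r8, and offset_in_region(0xbf6c52b4, 0xbf6c5000) gives 0x2b4. That r8 holds the callback address at this point is an inference from the numbers, not something the log states.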
Post by Rainer Dorsch
[2406987.816364] Process rmmod (pid: 9106, stack limit = 0x589ba9ab)
This happened while unloading a module (the process is rmmod).
Post by Rainer Dorsch
[2406987.961101] zcomp_cpu_dead [zram] from cpuhp_invoke_callback+0xd4/0x6fc
[2406987.968039] cpuhp_invoke_callback from cpuhp_issue_call+0x54/0x1b4
[2406987.974523] cpuhp_issue_call from
__cpuhp_state_remove_instance+0xf8/0x1b4
[2406987.981702] __cpuhp_state_remove_instance from zcomp_destroy+0x20/0x34
[zram]
[2406987.989153] zcomp_destroy [zram] from zram_reset_device+0x114/0x170
[zram]
[2406987.996345] zram_reset_device [zram] from zram_remove+0x10c/0x120 [zram]
[2406988.003358] zram_remove [zram] from zram_remove_cb+0x14/0x5c [zram]
[2406988.009941] zram_remove_cb [zram] from idr_for_each+0x5c/0x108
[2406988.016084] idr_for_each from destroy_devices+0x38/0x68 [zram]
[2406988.022240] destroy_devices [zram] from sys_delete_module+0x194/0x320
[2406988.028990] sys_delete_module from ret_fast_syscall+0x0/0x1c
This is the entire backtrace, showing that only the zram module
was involved.

Linux 6.1 is fairly old, and zcomp.c has changed a bit between
that and 6.13, though none of the changes listed here obviously
points to a fix for a NULL pointer dereference:

b8f03cb703a1 zram: move immutable comp params away from per-CPU context
6a81bdfeb350 zram: introduce zcomp_ctx structure
52c7b4e2ba50 zram: introduce zcomp_req structure
f2bac7ad187d zram: introduce zcomp_params structure
1a78390d8760 zram: check that backends array has at least one backend
1d3100cf148d zram: add 842 compression backend support
84112e314f69 zram: add zlib compression backend support
73e7d81abbc8 zram: add zstd compression backend support
c60a4ef54446 zram: add lz4hc compression backend support
22d651c3b339 zram: add lz4 compression backend support
2152247c55b6 zram: add lzo and lzorle compression backends support
917a59e81c34 zram: introduce custom comp backends API
45866e0e214f zram: do not allocate physically contiguous strm buffers
7ac07a26dea7 zram: preparation for multi-zcomp support

This is the code in question (from 6.13):

static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
{
        comp->ops->destroy_ctx(&zstrm->ctx);
        vfree(zstrm->buffer);
        zstrm->buffer = NULL;
}

int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node)
{
        struct zcomp *comp = hlist_entry(node, struct zcomp, node);
        struct zcomp_strm *zstrm;

        zstrm = per_cpu_ptr(comp->stream, cpu);
        zcomp_strm_free(comp, zstrm);
        return 0;
}
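
The hlist_entry() call above is a container_of() step: it subtracts the offset of the embedded node member from the node pointer to get back to the enclosing struct zcomp. A minimal userspace sketch of that arithmetic follows. It ASSUMES the 6.1 layout of struct zcomp is stream, name, node (not verified against the Debian source here), and it uses 32-bit integers in place of pointers so the offsets match a 32-bit ARM kernel even on a 64-bit host. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit stand-in for struct hlist_node (two pointers). */
struct hlist_node32 {
	uint32_t next;
	uint32_t pprev;
};

/* ASSUMED member order of struct zcomp in 6.1: stream, name, node. */
struct zcomp32 {
	uint32_t stream;		/* struct zcomp_strm __percpu * */
	uint32_t name;			/* const char * */
	struct hlist_node32 node;
};

/* What hlist_entry(node, struct zcomp, node) computes. */
uint32_t comp_from_node(uint32_t node_addr)
{
	return node_addr - (uint32_t)offsetof(struct zcomp32, node);
}

/* The inverse: the value of &comp->node for a given comp address. */
uint32_t node_from_comp(uint32_t comp_addr)
{
	return comp_addr + (uint32_t)offsetof(struct zcomp32, node);
}
```

Under that assumption node_from_comp(0) is 8. On ARM the second function argument is passed in r1, and the register dump shows r1 : 00000008, so the node pointer handed to zcomp_cpu_dead() looks like &comp->node taken from a NULL comp. This is a reading of the dump, not a confirmed diagnosis.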

If you disassemble the zram module with objdump (zram is built as a
module here, so the code is in zram.ko rather than in vmlinux), you can
probably figure out whether the bug is dereferencing zstrm or comp. The
other things I would try to narrow down the problem are:

- unload the module manually during runtime
- update the kernel to a more recent one, such as 6.12
- use a different compression backend for zram (zstd, deflate, lzo, ...)
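
On the question of zstrm versus comp, the Code: line of the oops already contains the faulting instruction word in parentheses (e5114008), so a first guess is possible without the module file. The sketch below decodes the A32 "load/store with immediate offset" encoding by hand. The struct and function names are invented for this example, and it only handles this one instruction form.

```c
#include <stdint.h>

/* Decoded fields of an A32 LDR/STR (immediate offset) instruction. */
struct ldr_imm {
	int valid;		/* 0 if the word is not of this form */
	int is_load;		/* L bit: 1 = LDR, 0 = STR */
	int pre_index;		/* P bit: 1 = offset applied before access */
	int add;		/* U bit: 1 = base + imm, 0 = base - imm */
	unsigned rn;		/* base register number */
	unsigned rt;		/* transferred register number */
	uint32_t imm12;		/* unsigned 12-bit offset */
};

struct ldr_imm decode_ldr_imm(uint32_t insn)
{
	struct ldr_imm d = {0};

	/* Condition 0xF is the unconditional space (PLD etc.), skip it. */
	if ((insn >> 28) == 0xf)
		return d;
	/* Bits 27..25 == 0b010 select the immediate-offset form. */
	if (((insn >> 25) & 0x7) != 0x2)
		return d;

	d.valid = 1;
	d.pre_index = (insn >> 24) & 1;
	d.add = (insn >> 23) & 1;
	d.is_load = (insn >> 20) & 1;
	d.rn = (insn >> 16) & 0xf;
	d.rt = (insn >> 12) & 0xf;
	d.imm12 = insn & 0xfff;
	return d;
}

/* Address accessed, given the current value of the base register. */
uint32_t effective_address(struct ldr_imm d, uint32_t base)
{
	if (!d.pre_index)
		return base;	/* post-indexed: access uses the base itself */
	return d.add ? base + d.imm12 : base - d.imm12;
}
```

decode_ldr_imm(0xe5114008) yields a load into r4 from base r1 with offset minus 8, i.e. ldr r4, [r1, #-8]. With r1 : 00000008 from the register dump, effective_address() returns 0, which matches the reported fault address 00000000. If node sits at offset 8 and stream at offset 0 in the 6.1 struct zcomp (an assumption not checked here), this load would be comp->stream, which would point at comp rather than zstrm being NULL. Disassembling the actual module remains the way to confirm that.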

Arnd
