Nicolas_vamous

The Basics

An already-fixed issue that I found while auditing the code of an old version of the Mali driver.

The Vulnerability

The ioctl handler of the Mali driver exposes a command KBASE_IOCTL_KCPU_QUEUE_ENQUEUE:

...
	case KBASE_IOCTL_KCPU_QUEUE_ENQUEUE:
		KBASE_HANDLE_IOCTL_IN(KBASE_IOCTL_KCPU_QUEUE_ENQUEUE,
				      kbasep_kcpu_queue_enqueue,
				      struct kbase_ioctl_kcpu_queue_enqueue,
				      kctx);
		break;
...

We can use KBASE_IOCTL_KCPU_QUEUE_ENQUEUE to enqueue commands onto a KCPU queue; in particular, we can enqueue a BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND queue command.
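As a rough illustration, a userspace caller might build the enqueue request as in the sketch below. This is only a sketch: it assumes the CSF UAPI headers from the audited driver tree are available, that a KCPU queue (kcpu_id) and a queue group (group_handle) were created beforehand via KBASE_IOCTL_KCPU_QUEUE_CREATE and KBASE_IOCTL_CS_QUEUE_GROUP_CREATE, and the exact field names of the command structures may differ between driver versions.

/* Hypothetical userspace sketch: enqueue one GROUP_SUSPEND command.
 * Field names follow the driver version audited here and may differ
 * in other releases.
 */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include "mali_kbase_ioctl.h"       /* UAPI headers from the Mali driver tree */
#include "mali_base_csf_kernel.h"

static int enqueue_group_suspend(int mali_fd, __u32 kcpu_id, __u8 group_handle,
				 void *dst, __u64 dst_size)
{
	struct base_kcpu_command cmd;
	struct kbase_ioctl_kcpu_queue_enqueue enq;

	memset(&cmd, 0, sizeof(cmd));
	cmd.type = BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND;
	cmd.info.suspend.buffer = (__u64)(unsigned long)dst; /* destination user buffer */
	cmd.info.suspend.size = dst_size;                    /* ends up in sus_buf->size */
	cmd.info.suspend.group_handle = group_handle;

	memset(&enq, 0, sizeof(enq));
	enq.addr = (__u64)(unsigned long)&cmd; /* pointer to the command array */
	enq.nr_commands = 1;
	enq.id = kcpu_id;                      /* KCPU queue to enqueue onto */

	return ioctl(mali_fd, KBASE_IOCTL_KCPU_QUEUE_ENQUEUE, &enq);
}

The buffer and size describe the userspace destination that the suspend buffer contents will later be copied into, and that size is exactly the value the vulnerable copy loop ends up trusting.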

An enqueued BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND command is handled by kbase_csf_kcpu_queue_enqueue: it is first run through kbase_csf_queue_group_suspend_prepare, and later processed by kbase_csf_queue_group_suspend_process.
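The prepare step turns the user-supplied destination buffer and size into a struct kbase_suspend_copy_buffer (the destination pages appear to be pinned up front, as the kbase_mem_copy_to_pinned_user_pages call later suggests), and this is what the copy path consumes. A rough sketch of that structure, reconstructed only from the fields the copy routine touches (types and layout are assumptions, not the literal header definition):

/* Rough sketch, reconstructed from the fields used by
 * kbase_csf_scheduler_group_copy_suspend_buf(); not the literal
 * driver definition.
 */
struct kbase_suspend_copy_buffer {
	size_t size;          /* copy size requested by userspace */
	struct page **pages;  /* pinned destination pages of the user buffer */
	int nr_pages;         /* number of pinned destination pages */
	off_t offset;         /* start offset within the first destination page */
};

The point that matters below is that size and nr_pages are derived directly from the userspace request.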

kbase_csf_queue_group_suspend_process in turn calls kbase_csf_queue_group_suspend:

int kbase_csf_queue_group_suspend(struct kbase_context *kctx,
				  struct kbase_suspend_copy_buffer *sus_buf,
				  u8 group_handle)
{
	struct kbase_device *const kbdev = kctx->kbdev;
	int err;
	struct kbase_queue_group *group;

	err = kbase_reset_gpu_prevent_and_wait(kbdev);
	if (err) {
		dev_warn(
			kbdev->dev,
			"Unsuccessful GPU reset detected when suspending group %d",
			group_handle);
		return err;
	}
	mutex_lock(&kctx->csf.lock);

	group = find_queue_group(kctx, group_handle);
	if (group)
		err = kbase_csf_scheduler_group_copy_suspend_buf(group,  //<-------- enter the routine !!!
								 sus_buf);
	else
		err = -EINVAL;

	mutex_unlock(&kctx->csf.lock);
	kbase_reset_gpu_allow(kbdev);

	return err;
}

As you can see, the function kbase_csf_scheduler_group_copy_suspend_buf gets called:

int kbase_csf_scheduler_group_copy_suspend_buf(struct kbase_queue_group *group,
		struct kbase_suspend_copy_buffer *sus_buf)
{
	struct kbase_context *const kctx = group->kctx;
	struct kbase_device *const kbdev = kctx->kbdev;
	struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler;
	int err = 0;

	kbase_reset_gpu_assert_prevented(kbdev);
	lockdep_assert_held(&kctx->csf.lock);
	mutex_lock(&scheduler->lock);

	if (kbasep_csf_scheduler_group_is_on_slot_locked(group)) {
		DECLARE_BITMAP(slot_mask, MAX_SUPPORTED_CSGS) = {0};

		set_bit(kbase_csf_scheduler_group_get_slot(group), slot_mask);

		if (!WARN_ON(scheduler->state == SCHED_SUSPENDED))
			suspend_queue_group(group);
		err = wait_csg_slots_suspend(kbdev, slot_mask,
					     kbdev->csf.fw_timeout_ms);
		if (err) {
			dev_warn(kbdev->dev, "[%llu] Timeout waiting for the group %d to suspend on slot %d",
				 kbase_backend_get_cycle_cnt(kbdev),
				 group->handle, group->csg_nr);
			goto exit;
		}
	}

	if (queue_group_suspended_locked(group)) {
		unsigned int target_page_nr = 0, i = 0;
		u64 offset = sus_buf->offset;
		size_t to_copy = sus_buf->size;

		if (scheduler->state != SCHED_SUSPENDED) {
			/* Similar to the case of HW counters, need to flush
			 * the GPU cache before reading from the suspend buffer
			 * pages as they are mapped and cached on GPU side.
			 */
			kbase_gpu_start_cache_clean(kbdev);
			kbase_gpu_wait_cache_clean(kbdev);
		} else {
			/* Make sure power down transitions have completed,
			 * i.e. L2 has been powered off as that would ensure
			 * its contents are flushed to memory.
			 * This is needed as Scheduler doesn't wait for the
			 * power down to finish.
			 */
			kbase_pm_wait_for_desired_state(kbdev);
		}

		for (i = 0; i < PFN_UP(sus_buf->size) &&
				target_page_nr < sus_buf->nr_pages; i++) {
			struct page *pg =
				as_page(group->normal_suspend_buf.phy[i]);  //<--------- OOB read can happen here !!!!
			void *sus_page = kmap(pg);

			if (sus_page) {
				kbase_sync_single_for_cpu(kbdev,
					kbase_dma_addr(pg),
					PAGE_SIZE, DMA_BIDIRECTIONAL);

				err = kbase_mem_copy_to_pinned_user_pages(
						sus_buf->pages, sus_page,
						&to_copy, sus_buf->nr_pages,
						&target_page_nr, offset);
				kunmap(pg);
				if (err)
					break;
			} else {
				err = -ENOMEM;
				break;
			}
		}
		schedule_in_cycle(group, false);
	} else {
		/* If addr-space fault, the group may have been evicted */
		err = -EIO;
	}

exit:
	mutex_unlock(&scheduler->lock);
	return err;
}

As you can see, when the suspend buffer is copied out, both sus_buf->size and sus_buf->nr_pages serve as upper bounds of the for loop. Neither value is sufficiently validated, so both can be made very large, and the index i that walks the array group->normal_suspend_buf.phy can run past the actual end of the array. The resulting out-of-bounds read yields an invalid struct page pointer, which can lead to an information leak.
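To make the loop-bound arithmetic concrete, here is a small standalone userspace model (all sizes made up, no driver code involved) that mirrors the loop's termination conditions: the index is bounded only by PFN_UP(sus_buf->size) and sus_buf->nr_pages, neither of which is tied to the real number of pages behind normal_suspend_buf.phy.

#include <stdio.h>

#define PAGE_SIZE 4096UL
#define PFN_UP(x) (((x) + PAGE_SIZE - 1) / PAGE_SIZE)

int main(void)
{
	/* Hypothetical numbers: the group's suspend buffer is backed by
	 * 16 physical pages, while the user-supplied sus_buf->size and
	 * sus_buf->nr_pages describe a much larger destination buffer.
	 */
	const unsigned long phy_pages = 16;          /* length of normal_suspend_buf.phy */
	const unsigned long sus_buf_size = 1 << 20;  /* attacker-chosen, 1 MiB */
	const unsigned long sus_buf_nr_pages = PFN_UP(sus_buf_size);

	unsigned long i, target_page_nr = 0, oob = 0;

	/* Same bounds as the kernel loop: neither condition involves
	 * the real length of the phy[] array.
	 */
	for (i = 0; i < PFN_UP(sus_buf_size) && target_page_nr < sus_buf_nr_pages; i++) {
		if (i >= phy_pages)
			oob++;        /* these iterations would read phy[] out of bounds */
		target_page_nr++;     /* stand-in for kbase_mem_copy_to_pinned_user_pages() advancing */
	}

	printf("loop iterations: %lu, out-of-bounds accesses of phy[]: %lu\n", i, oob);
	return 0;
}

With the 1 MiB destination used in the model, the loop runs 256 iterations while the suspend buffer is backed by only 16 pages, so the last 240 iterations index phy[] out of bounds.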