Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drm: Some refactoring and bug fixes to address VM panics #2215

Merged
merged 11 commits into from
Oct 9, 2024

Conversation

markjdb
Copy link
Contributor

@markjdb markjdb commented Sep 4, 2024

The last commit is the main change. I had tried a couple of times to split it up but it was hard to do correctly, so I gave up in the interest of making the patches available a bit more quickly.

I took some small liberties in modifying drmkpi and DRM to better match the design of FreeBSD's fault handler. Since this code (drm device file mmap, fault handling) is highly non-portable anyway, I think this should be acceptable.

This should fix the panics reported in #2097 . I might be too optimistic, but so far it also seems to address some issues I was seeing with tearing and clipped text.

Copy link
Collaborator

@bsdjhb bsdjhb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks ok to me.

sys/dev/drm/freebsd/drm_os_freebsd.c Show resolved Hide resolved
markjdb and others added 11 commits October 9, 2024 12:07
Make sure that a failure from pmap_gpu_enter() triggers an assertion failure.
This should only happen under memory pressure.
mmap_sem refers to a Linux synchronization mechanism.  Linux is different enough
here that the references are not very helpful.

No functional change intended.
vm_pfn_pcount is only set once, we might as well set it during
initialization.
vmap->vm_len is always 0 here.  I believe this code is effectively dead, but
let's fix it anyway.
See vm_fault_populate(): a return value of VM_PAGER_BAD tells the fault layer to
continue handling the fault "normally", i.e., it will allocate a page, insert it
into the VM object, and map it.  That's not what we want in general.
There are currently no DRM objects of type OBJT_DEVICE, all are of type
OBJT_PHYS or OBJT_MGTDEVICE.  drm_cdev_pager_fault() is thus dead code, so
remove it in advance of further refactoring.  In particular, drm_vmap_find()
will be removed.
As in dev_pager_dealloc(), drop the object lock around calls to the destructor.
This might be necessary if the destructor wants to sleep.
Each GEM object is backed by a VM object which stores a handle for each page
belonging to the object.  A GEM object can be mapped multiple times.  Panfrost's
buffer objects are backed by physical RAM from the host system.

The old code tried to maintain a global list of VMAs which map a GEM object, and
this used the object pointer as a unique key, which breaks when the same object
is mapped more than once.  The mapping list is also somewhat awkward to
maintain: FreeBSD's vm_fault layer is supposed to hide details relating to
mappings of a given object (for instance, it doesn't pass the fault address to
DRM), but Linux passes such details, so we end up faking them.

Refactor things to handle the possibility of multiply-mapped GEM objects:
- Minimize uses of VMAs.  In the pager populate handler, we don't know which
  mapping the fault came from, so it doesn't make sense to try and look up a
  VMA.  In practice, there is no need to know the origin of the fault, we just
  need a pindex to select the page to map.
- Introduce a struct drm_object_glue, which glues together a GEM object, a VM
  object, and a list of VMAs which map the object.  The list of VMAs is mostly
  useless but I've avoided removing it for now.  Most uses of the VMA are
  replaced by extending the thread-private vm_fault structure.
- Convert the global linked list lock to an sx lock so that it's possible to
  hold it while allocating memory.
- Change the DRM object fault interface to avoid referencing VMAs.  This
  deviates from upstream, but the modified code is all quite FreeBSD-specific
  anyway.

While here, fix a related problem: when mmap()ing a device file,
drm_fstub_do_mmap() tries to infer the VM object type to use.  In particular, it
creates a "managed device" object for panfrost buffer objects, which isn't
correct since the object is backed by host RAM.  The problem is further
illustrated by the need to clear VPO_UNMANAGED and set PG_FICTITIOUS: setting
PG_FICTITIOUS in host RAM pages is incorrect and can trigger assertion failures
in the contig reclaim code.

My view is that it's not possible to correctly map DRM objects without modifying
upstream sources a little bit.  That is, we cannot hide everything in
drmkpi/linuxkpi.  Keeping this in mind, the change fixes the problem described
above by adding a new VM object type to struct vm_operations_struct, used in
drm_fstub_do_mmap() to allocate the correct type of VM object.
@markjdb markjdb merged commit 81af918 into CTSRD-CHERI:dev Oct 9, 2024
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants