TTM for developers
Background and Naming
TTM is short for Translation Table Maps, which was originally a new DRM map type that could be flipped into and out of a general translation table, for example an AGP or PCI GART. Since then it has, somewhat misleadingly, become the name of the TTM unified memory manager for display memory. A white paper describing the TTM memory manager can be found here. Many considered the original implementation too complex and too inflexible. An update and reorganization has since been done to correct most, if not all, of the issues people were having.
Purpose
The purpose of the TTM implementation is to provide tools for buffer object placement, caching, mapping and synchronization. Placement could, for example, be system memory, pre-bound AGP memory or dynamic AGP memory. The implementation also handles cache coherency automatically by mapping memory in uncached or write-combined placements using the correct attributes. Buffer object data can be mapped to user-space with a single virtual address that doesn't change even if the placement of the buffer object changes. TTM can also optionally handle synchronization on a per-buffer-object level and provides an optional read-write-lock replacement for the global DRM lock. It's designed to minimize lock contention in the presence of multiple rendering clients and command FIFOs. TTM does not attempt to handle all kinds of GPU mapping. For simple hardware, the placement functionality is enough to give a unique GPU virtual address for a buffer in VRAM or AGP space. For a device with multiple memory contexts and GPU paging, TTM will handle placement and CPU mapping, whereas the driver itself will need to set up the GPU maps.
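The relationship between placement, caching attributes and the stable CPU mapping can be pictured with a small sketch. Everything in it is hypothetical: the `demo_` names, the three placements and the caching policy are assumptions made for illustration, not the TTM API or its actual policy. It only shows that a move changes placement and caching while the offset user-space maps stays the same.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placements and caching modes, for illustration only. */
enum demo_placement { DEMO_PL_SYSTEM, DEMO_PL_TT, DEMO_PL_VRAM };
enum demo_caching { DEMO_CACHED, DEMO_WC, DEMO_UNCACHED };

struct demo_bo {
    enum demo_placement placement; /* where the data currently lives */
    enum demo_caching caching;     /* attributes used for CPU maps */
    uint64_t map_offset;           /* fixed at creation; what user-space maps */
};

/* Assumed policy: system memory is cached, aperture and VRAM are
 * write-combined. A real driver may choose differently. */
enum demo_caching demo_caching_for(enum demo_placement pl)
{
    switch (pl) {
    case DEMO_PL_SYSTEM:
        return DEMO_CACHED;
    case DEMO_PL_TT:
    case DEMO_PL_VRAM:
        return DEMO_WC;
    }
    assert(!"unknown placement");
    return DEMO_UNCACHED;
}

/* Move the buffer: placement and caching change, map_offset does not. */
void demo_bo_move(struct demo_bo *bo, enum demo_placement pl)
{
    bo->placement = pl;
    bo->caching = demo_caching_for(pl);
}
```

In this sketch a client that mapped the buffer through `map_offset` never needs to remap after a move; only the attributes backing the mapping change.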
Relation to GEM
In the following text it's important to distinguish between the GEM user-space API, the GEM core implementation and the Intel device-specific implementation backing this API. TTM can be used as a backing implementation for the GEM user-space API, which, together with the GEM core implementation, is simple and straightforward. However, TTM may also be used with a separate user-space interface for buffer object handling. Which one to choose is really up to the driver writer, and the driver writer can easily set up a unique user-space API.
The advantages of the TTM implementation compared to the implementation backing GEM for the Intel drivers are that the TTM implementation:
- Is designed with concurrency in mind and has a fine-grained locking scheme with deadlock prevention. The Intel implementation, and to some extent even the GEM core implementation, relies on the somewhat outdated global hardware lock and device-global mutexes for concurrency.
- Has a well defined and efficient way to handle write-combined memory and VRAM, contrary to the Intel driver implementation.
- Provides optional tools for per-buffer synchronization and even sub-buffer synchronization between clients.
- Is designed to support user-space sub-allocators by supporting single CPU virtual addresses for buffer data and user-space fence objects.
- Natively supports mmap, (p)write and (p)read using syscalls, and does not need ioctl wrappers for this functionality.
- Avoids physical page aliasing with inconsistent caching attributes. The Intel driver uses such aliasing, which apparently violates processor specifications and causes data corruption on certain processors.
Drawbacks of the TTM implementation compared to the Intel GEM implementations are mainly:
- Buffer objects do not have a one-to-one inode mapping, which makes it non-trivial to share buffer objects between devices. On the other hand, this is what makes it possible to use the mmap, write and read syscalls directly on TTM buffers.
- The physical page aliasing used by the Intel driver GEM implementation makes CPU reading from GPU memory quite efficient.
- Creation of uncached / write-combined dynamic AGP / TT buffers is slow with TTM, but there are various workarounds for this.
Components
The TTM code is now structured as a library of functionality that a driver writer can use. The following components currently exist:
Buffer objects. Implements base buffer class placement and synchronization. By default buffer objects are not visible to user-space, but the buffer class can be used as a base class for more complex implementations if needed. The buffer class implementation requires a synchronization class, the objects of which are opaque to the buffer class driver. The device driver needs to provide methods for those. The buffer class implementation lives in the files ttm_bo_driver.h, ttm_bo_api.h, ttm_placement_common.h, ttm_bo.c (core implementation), ttm_bo_vm.c (buffer object VM and IO) and ttm_bo_util.c (buffer move and kernel map utilities).
Fence objects. The TTM fence class can optionally be used as the synchronization class for buffer objects, and there are utility functions that can be plugged into the buffer class driver structure to enable this. Fence objects are not visible to user-space by default, but the fence class can be used as a base class for any form of user-visible fence subclass. The fence class implementation lives in the files ttm_fence_driver.h, ttm_fence_api.h and ttm_fence.c.
User objects. The user base class is used as a base class for any user-visible data structure. The implementation provides two data structures, a base class and a ref class. The base class objects have a unique 32-bit identifier and a refcount member. A ref object is intended to be created for each user-space reference of a base object. The TTM user base class is in many ways similar to the GEM class, except it doesn't carry any buffer-class-specific members and can therefore be used as a base class for any user-visible structure, be it a fence class, a lock structure or a buffer class. The ref class is similar to a kernel struct idr, but it does not have a unique identifier member. Instead each ref class object is identified by the base class object it references. In this way, many references by a single application to a base class object will only create a single ref object with an internal refcount. Ref objects can have different types, for example usage references and synchronization references. When an application exits or dies, all its ref objects are deleted, removing any state that application has imposed on its referenced base class objects. User object classes live in the files ttm_objects.h and ttm_objects.c. Note that a driver that uses TTM buffer objects to back a GEM interface, and that does not need any other objects to be visible to user-space, does not need the user object implementation and will not include these files.
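The base/ref relationship described above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the code in ttm_objects.c: all `demo_` names are hypothetical, a plain linked list stands in for whatever lookup structure the real implementation uses, ref types are left out, and there is no locking.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical base object: unique 32-bit identifier plus a refcount. */
struct demo_base {
    uint32_t handle;
    unsigned refcount;
};

/* Hypothetical ref object: identified by the base object it references. */
struct demo_ref {
    struct demo_base *base;
    unsigned refcount;       /* references by this one client */
    struct demo_ref *next;
};

/* One per application (open file); owns that client's ref objects. */
struct demo_client {
    struct demo_ref *refs;
};

/* Reference 'base' from client 'c'. Repeated references reuse the
 * existing ref object and only bump its internal refcount. */
int demo_ref_add(struct demo_client *c, struct demo_base *base)
{
    struct demo_ref *r;

    for (r = c->refs; r; r = r->next) {
        if (r->base == base) {
            r->refcount++;
            return 0;
        }
    }
    r = malloc(sizeof(*r));
    if (!r)
        return -1;
    r->base = base;
    r->refcount = 1;
    r->next = c->refs;
    c->refs = r;
    base->refcount++;        /* one base reference per ref object */
    return 0;
}

/* Called when the client exits or dies: delete all its ref objects and
 * drop the base references they held. */
void demo_client_release(struct demo_client *c)
{
    while (c->refs) {
        struct demo_ref *r = c->refs;

        c->refs = r->next;
        assert(r->base->refcount > 0);
        r->base->refcount--;
        free(r);
    }
}
```

Two references from the same client to the same base object leave one ref object with a refcount of 2, and releasing the client undoes everything that client imposed on the base object.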
Placement user objects. A minimalistic implementation of a class derived from the TTM buffer class and the TTM user base class to create a buffer class visible from user-space. It comes with a minimal user-space interface. This implementation replaces the GEM user-space interface for those drivers that so desire. The implementation lives in ttm_placement_user.h, ttm_placement_common.h, ttm_userobj_api.h and ttm_placement_user.c. The files ttm_placement_user.h and ttm_placement_common.h define the user-space interface and are exported as include files.
Fence user objects. A fence class derived from the TTM fence class and the TTM user base class to create a fence object visible from user-space. It also comes with a minimal user-space interface. The implementation lives in ttm_fence_user.h, ttm_userobj_api.h and ttm_fence_user.c. User-space includes the file ttm_fence_user.h to be able to use the user-space API. While user-visible fence objects may appear unnecessary to some, there are some important use cases:
- Swap buffer throttling. Pointers to fence objects can be stored on renderbuffer structures to wait for outstanding swapbuffers or page-flips.
- User-space suballocators. Each user-space buffer will have a user-space fence object as a sync object. If user-space sub-buffers are to be re-exported to other clients, for example as part of DRI buffer sharing, user-space fence objects need also to be shareable.
- Error reporting. Asynchronous error reporting for, for example, accelerated video decoders.
- Backing various OpenGL fence extensions.
- May aid the implementation of various OpenGL queries.
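The swap buffer throttling case from the list above can be sketched as follows. This is an illustration only: the `demo_` names are hypothetical, and the assumption that a fence carries a 32-bit sequence number compared against the last sequence the hardware completed is ours, not something the TTM fence API is stated to do. The signed-difference comparison assumes the usual two's-complement conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence: signaled once the hardware has completed 'seq'. */
struct demo_fence {
    uint32_t seq;
};

/* Hypothetical renderbuffer that remembers the fence of its last swap. */
struct demo_renderbuffer {
    const struct demo_fence *swap_fence;
};

/* Wrap-safe test: has 'completed' reached or passed 'seq'? */
static bool demo_seq_passed(uint32_t completed, uint32_t seq)
{
    return (int32_t)(completed - seq) >= 0;
}

bool demo_fence_signaled(const struct demo_fence *f, uint32_t completed)
{
    assert(f != NULL);
    return demo_seq_passed(completed, f->seq);
}

/* True if a new swap may be queued without waiting for the previous one. */
bool demo_swap_may_proceed(const struct demo_renderbuffer *rb,
                           uint32_t completed)
{
    return rb->swap_fence == NULL ||
           demo_fence_signaled(rb->swap_fence, completed);
}
```

A client would check `demo_swap_may_proceed()` before queuing the next swap and wait on the stored fence when it returns false, which bounds the number of outstanding swaps per renderbuffer to one.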
TTM locks. The TTM lock structure derives from the TTM user base class and carries two atomic members. It's a read-write lock, and taking it in read mode is quick whereas taking it in write mode is slow. Processes are allowed to exit kernel space with the lock held in write mode, but not with the lock held in read mode. This also means that if an application dies with a TTM lock held in write mode, the kernel must release the lock. There are at least three distinct uses of the TTM lock:
- Block buffer object validation when VT-switched. Effectively suspends DRI clients.
- Block buffer object validation when destroying TTM memory types as part of a driver takedown process.
- Block other clients from validating buffers when an out-of-graphics-memory condition has occurred and we need to start evicting all buffers in a memory type to work around fragmentation.
Typically the TTM lock is taken in read mode before buffers are validated. For any of the above three conditions, it is instead taken in write (exclusive) mode. The TTM lock implementation lives in ttm_lock.h and ttm_lock.c. Like the other components, the TTM lock is a free-standing implementation and neither buffer objects nor fence objects depend on it. However, as noted above, it derives from the TTM user base object.
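To make the read/write semantics concrete, here is a minimal sketch of a read-write lock built on two atomic members. The text above only says that the TTM lock carries two atomic members; that they are a reader count and a writer flag is an assumption of this sketch, as are all `demo_` names. Only non-blocking try operations are shown, whereas a kernel implementation would sleep and handle signals.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock with two atomic members (assumed meaning). */
struct demo_lock {
    atomic_int readers;  /* number of read-mode holders */
    atomic_int writer;   /* 1 while held in write (exclusive) mode */
};

void demo_lock_init(struct demo_lock *l)
{
    atomic_init(&l->readers, 0);
    atomic_init(&l->writer, 0);
}

/* Read mode: announce ourselves, then back off if a writer is active. */
bool demo_read_trylock(struct demo_lock *l)
{
    atomic_fetch_add(&l->readers, 1);
    if (atomic_load(&l->writer)) {
        atomic_fetch_sub(&l->readers, 1);
        return false;
    }
    return true;
}

void demo_read_unlock(struct demo_lock *l)
{
    int prev = atomic_fetch_sub(&l->readers, 1);

    assert(prev > 0);
    (void)prev;
}

/* Write mode: claim the writer flag, then require that no readers remain. */
bool demo_write_trylock(struct demo_lock *l)
{
    int expected = 0;

    if (!atomic_compare_exchange_strong(&l->writer, &expected, 1))
        return false;                 /* another writer holds the lock */
    if (atomic_load(&l->readers) != 0) {
        atomic_store(&l->writer, 0);  /* readers active: back off */
        return false;
    }
    return true;
}

void demo_write_unlock(struct demo_lock *l)
{
    atomic_store(&l->writer, 0);
}
```

In this sketch a validating client would take the lock in read mode around buffer validation, while a VT switch, memory type takedown or eviction pass would take it in write mode and thereby keep new validations out.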
Execbuf utilities
The execbuf IOCTL is usually quite driver-specific. However, there are some functions to aid execbuf driver writing, mainly to reserve buffers for validation in a deadlock-safe and starvation-safe manner, and to put sync objects on those buffers or back off validation in case of errors or interruption. These utilities are located in ttm_execbuf_util.h and ttm_execbuf_util.c. The driver writer may choose whether to use these utilities.
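The reserve, fence and back-off pattern can be sketched roughly as below. This is not the code in ttm_execbuf_util.c: the `demo_` names are hypothetical, the list is assumed to contain no duplicate buffers, and the sketch is single-threaded. It shows only the all-or-nothing reservation that avoids holding some buffers while waiting for others; real starvation safety needs more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer object with a reservation flag and a sync object. */
struct demo_bo {
    int reserved;    /* nonzero while reserved for validation */
    void *sync_obj;  /* opaque sync object, e.g. a fence */
};

/* Back off: release the first n reservations taken from 'list'. */
void demo_backoff(struct demo_bo **list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(list[i]->reserved);
        list[i]->reserved = 0;
    }
}

/* Reserve every buffer in 'list' or none of them. Returns 0 on success,
 * or -1 if some buffer was busy, in which case all reservations taken so
 * far are released and the caller may retry. */
int demo_reserve_all(struct demo_bo **list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i]->reserved) {
            demo_backoff(list, i);
            return -1;
        }
        list[i]->reserved = 1;
    }
    return 0;
}

/* After command submission: put the sync object on every buffer and
 * release the reservations. */
void demo_fence_all(struct demo_bo **list, size_t n, void *sync_obj)
{
    for (size_t i = 0; i < n; i++) {
        list[i]->sync_obj = sync_obj;
        list[i]->reserved = 0;
    }
}
```

A driver's execbuf path would call the reserve step, validate the buffers, submit commands and then call the fence step, or call the back-off step if validation fails or is interrupted.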
AGP TTM backend
For drivers using traditional AGP memory there's an AGP TTM backend implemented in ttm_agp_backend.c. This backend supports multiple AGP bridges, but the driver needs to know which AGP bridge it sits on. The backend calls into the Linux agpgart kernel API. Drivers with other types of GPU MMU will have to write their own backends for that particular MMU. However, note that hardware with advanced and perhaps per-context MMUs that only create GPU views into the data will probably not want to use the TTM backend mechanism, but a customized bind / unbind procedure. For those drivers TTM should be used to place data in a suitable memory type and with suitable caching attributes prior to setting up the GPU maps. If we take the Intel i965 MMU system as an example, TTM with the AGP backend would be used to place buffers either in VRAM (stolen memory) or, write-combined, in the global GTT aperture for mapping and perhaps for shared buffers. The i965 also has per-memory-context GTTs, the mapping of which is visible to the GPU only. A driver could allocate a private GTT page table per client and let the client manage its private GTT in user-space. The actual binding to a private GTT would then be done by the driver as part of command submission, and the driver would call into TTM to make sure that the buffer objects are in a bindable state: residing in system memory and write-combined.
User-space components
The TTM user-space API is quite small, and a client can use the IOCTLs directly. However, there is a general user-space buffer manager implementation, libwsbm, with backends that interface to the placement user object and fence user object implementations.


