Computerworld
Memory management for graphics processors
The CPU and the GPU share access to some pages of memory. New Linux code helps the kernel keep track of memory holding data for the GPU.
Jonathan Corbet (LinuxWorld)  15 November, 2007 11:23

The management of video hardware has long been an area of weakness in the Linux system (and free operating systems in general). The X Window System tends to get a lot of the blame for problems in this area, but the truth of the matter is that the problems are more widespread and the kernel has never made it easy for X to do this job properly. Graphics processors (GPUs) have gotten steadily more powerful, to the point that, by some measures, they are the fastest processor on most systems, but kernel support for the programming of GPUs has lagged behind. A lot of work is being done to remedy this situation, though, and an important component of that work has just been put forward for inclusion into the mainline kernel.

Once upon a time, video memory comprised a simple frame buffer from which pixels were sent to the display; it was up to the system's CPU to put useful data into that frame buffer. With contemporary GPUs, the memory situation has gotten more complex; a typical GPU can work with a few different types of memory:

  • Video RAM (VRAM) is high-speed memory installed directly on the video card. It is usually visible on the system's PCI bus, but that need not be the case. There is likely to be a frame buffer in this memory, but many other kinds of data live there as well.
  • In many lower-end systems, the "video RAM" is actually a dedicated section of general-purpose system memory. That RAM is set aside for the use of the GPU and is not available for other purposes. Even adapters with their own VRAM may have a dedicated RAM region as well.
  • Video adapters contain a simple memory management unit (the graphics address remapping table or GART) which can be used to map various pages of system memory into the GPU's address space. The result is that, at any time, an arbitrary (scattered) subset of the system's RAM pages are accessible to the GPU.

Each type of video memory has different characteristics and constraints. Some are faster to work with (for the CPU or the GPU) than others. Some types of VRAM might not be directly addressable by the CPU. Memory may or may not be cache coherent - a distinction which requires careful programming to avoid data corruption and performance problems. And graphical applications may want to work with much larger amounts of video memory than can be made visible to the GPU at any given time.

All of this presents a memory management problem which, while being similar to the management needs of system RAM, has its own special constraints. So the graphics developers have been saying for years that Linux needs a proper manager for GPU-accessible memory. But, for years, we have done without that memory manager, with the result that this management task has been performed by an ugly combination of code in the X server, the kernel, and, often, in proprietary drivers. Happily, it would appear that those days are coming to an end, thanks to the creation of the translation-table maps (TTM) module written primarily by Thomas Hellstrom, Eric Anholt, and Dave Airlie. The TTM code provides a general-purpose memory manager aimed at the needs of GPUs and graphical clients.

The core object managed by TTM, from the point of view of user space, is the "buffer object." A buffer object is a chunk of memory allocated by an application, and possibly shared among a number of different applications. It contains a region of memory which, at some point, may be operated on by the GPU. A buffer object is guaranteed not to vanish as long as some application maintains a reference to it, but the location of that buffer is subject to change.

Once an application creates a buffer object, it can map that object into its address space. Depending on where the buffer is currently located, this mapping may require relocating the buffer into a type of memory which is addressable by the CPU (more accurately, a page fault when the application tries to access the mapped buffer would force that move). Cache coherency issues must be handled as well, of course.

There will come a time when this buffer must be made available to the GPU for some sort of operation. The TTM layer provides a special "validate" ioctl() to prepare buffers for processing; validating a buffer could, again, involve moving it or setting up a GART mapping for it. The address by which the GPU will access the buffer will not be known until it is validated; after validation, the buffer will not be moved out of the GPU's address space until it is no longer being operated on

That means that the kernel has to know when processing on a given buffer has completed; applications, too, need to know that. To this end, the TTM layer provides "fence" objects. A fence is a special operation which is placed into the GPU's command FIFO. When the fence is executed, it raises a signal to indicate that all instructions enqueued before the fence have now been executed, and that the GPU will no longer be accessing any associated buffers. How the signaling works is very much dependent on the GPU; it could raise an interrupt or simply write a value to a special memory location. When a fence signals, any associated buffers are marked as no longer being referenced by the GPU, and any interested user-space processes are notified.

Computerworld Buyer's Guide - Vendors Matched to this Article

Comments

Post new comment

Login or register to link comments to your user profile, or you may also post a comment without being logged in.
The content of this field is kept private and will not be shown publicly.
Enter the fully qualified URL, eg. http://www.example.com/
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.

More information about formatting options

Add to Google
Computerworld Buyer's Guide - Vendors Matched to this Article
Zones
Zone logoZones provide focussed content from Computerworld and leading technology partners.
Newsletter Subscription
Newsletter Subscription
Sign up for our Computerworld newsletters!
Syndicate content
 

Computerworld Webinar

Thursday, June 11th, 2009
10:30am EST (Sydney, Australia)
Screening at your PC

Computerworld is hosting a 30 minute live webinar to help you to learn how unified communications can save you money, foster innovation and business agility by making it easier for people to find, reach and collaborate with one another.

Register Now

Computerworld Community Comments
Whitepaper

Data Center Eco-Nomics

Discover the pathway towards greener, more efficient operations. Learn how real customers are leveraging their green efforts to drive toward the dynamic data centre of the future. Click through to watch this webinar now.

Enterprise IT Buyer's Guide
Find Technology Vendors Fast
 
Find vendors by name | Find by category
Sponsored Links
 
Send Us E-mail | Privacy Policy
Features List | Media Kit | Advertising | Contact Us

Copyright 2009 IDG Communications. ABN 14 001 592 650. All rights reserved.
Reproduction in whole or in part in any form or medium without express written permission of IDG Communications is prohibited.