Kernel space: Toward better direct I/O scalability
- 11 April, 2008 12:09
- Comments
Linux enthusiasts like to point out just how scalable the system is; Linux runs on everything from pocket-size devices to supercomputers with several thousand processors. What they talk about a little bit less is that, at the high end, the true scalability of the system is limited by the sort of workload which is run. CPU-intensive scientific computing tasks can make good use of very large systems, but database-heavy workloads do not scale nearly as well. There is a lot of interest in making big database systems work better, but it has been a challenging task. Nick Piggin appears to have come up with a logical next step in that direction, though, with a relatively straightforward set of core memory management changes.
For some time, Linux has supported direct I/O from user space. This, too, is a scalability technology: the idea is to save processor time and memory by avoiding the need to copy data through the kernel as it moves between the application and the disks. With sufficient programming effort, the application should be able to make use of its superior knowledge of its own data access patterns to cache data more effectively than the kernel can; direct I/O allows that caching to happen without additional overhead. Large database management systems have had just that kind of programming effort applied to them, with the result that they use direct I/O heavily. To a significant extent, these systems use direct I/O to replace the kernel's paging algorithms with their own, specialized code.
When the kernel is asked to carry out a direct I/O operation, one of the first things it must do is to pin all of the relevant user-space pages into memory and locate their physical addresses. The function which performs this task is get_user_pages(): int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, int len, int write, int force, struct page **pages, struct vm_area_struct **vmas);
A successful call to get_user_pages() will pin len pages into memory, those pages starting at the user-space address start as seen in the given mm. The addresses of the relevant struct page pointers will be stored in pages, and the associated VMA pointers in vmas if it is not NULL.
This function works, but it has a problem (beyond the fact that it is a long, twisted, complex mess to read): it requires that the caller hold mm->mmap_sem. If two processes are performing direct I/O on within the same address space - a common scenario for large database management systems - they will contend for that semaphore. This kind of lock contention quickly kills scalability; as soon as processors have to wait for each other, there is little to be gained by adding more of them.
- Bookmark this page
- Share this article
- Got more on this story? Email Computerworld
- Follow Computerworld on twitter
-
Wednesday Grok: Microsoft’s browser lockout is to be pitied more than despised
-
Change My Password logs 10 millionth account
-
Brain drain: Where Cobol systems go from here
-
The ABCs of camera phone technology
-
Change My Password logs 10 millionth account
-
Excel 2007 All-In-One Desk Reference for Dummies
-
Teach Yourself Visually Windows 7
-
MYOB Software for Dummies 6E Australian Edition
-
Office 2007 All-In-One Desk Reference for Dummies
-
Windows 7 for Dummies® Dvd+book Bundle
-
Microsoft Office
-
Office 2007 for Dummies
-
Windows 7 for Dummies®
-
Windows 7 for Seniors for Dummies®









Comments
Post new comment