Hi,
Some notes - all AFAIK (I'm no expert on Linux video or X, and could be wrong)...
AFAIK X is portable and has many different back-ends. On an OS with a standard video API (e.g. Windows, OS X) the back-end talks directly to that API.
Originally most *nix clones (including Linux) had a standard "text-only" video API, which was completely useless for something like X. To get around that, the X back-end (and anything else that wanted to use the video hardware for more than text) had to include its own video drivers, and these *nix clones were modified to allow this - I'd guess that at the time it was probably as simple as an "IOPL=3" hack plus some support for memory-mapped I/O regions in user space (a sketch of what that looked like is below). This caused plenty of problems though, partly because when a process that uses its own video driver crashes, nothing can restore the video state again. It gets worse when you look at modern hardware (e.g. video cards with multiple outputs, where different applications on different monitors need to share the same card, and trying to get several applications that use GPGPU to cooperate with anything using the GPU for video).
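To make that concrete, here's a minimal sketch (x86 Linux; my guess at the general shape of it, not any specific X server's actual code) of that style of user-space hardware access: iopl(3) grants the process direct port I/O, and mmap()ing /dev/mem exposes the legacy VGA window. It needs root, modern kernels often restrict /dev/mem, and there's no error recovery worth copying - treat it as illustration only:

/* Sketch of old-style user-space video access: raise the I/O privilege
 * level so the process can touch VGA ports directly, then mmap() the
 * legacy VGA window out of /dev/mem. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/io.h>          /* iopl() - x86 Linux only */

#define VGA_BASE 0xA0000     /* legacy VGA graphics window */
#define VGA_SIZE 0x10000     /* 64 KiB */

int main(void)
{
    /* The "IOPL=3" hack: let this user-space process execute in/out
     * instructions directly. */
    if (iopl(3) != 0) {
        perror("iopl");
        return 1;
    }

    /* Map the VGA framebuffer into our address space via /dev/mem. */
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) {
        perror("open /dev/mem");
        return 1;
    }
    volatile uint8_t *vga = mmap(NULL, VGA_SIZE, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, VGA_BASE);
    if (vga == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* From here the process owns the hardware: it can bang mode registers
     * with outb() and scribble on vga[] - and if it crashes, nothing else
     * knows how to put the video state back. */
    munmap((void *)vga, VGA_SIZE);
    close(fd);
    return 0;
}

Note that once a process has done this there's no arbitration at all - the kernel has no idea the video state changed, which is exactly the "crashed process leaves the screen wedged" problem above.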
Since then, Linux developers added a bunch of half-assed crud to get around some of the problems that not having a standard video API caused. This includes a framebuffer driver (sketched below), DRI, DRM, Mesa, etc. AFAIK the current goal is to have video mode switching (kernel mode setting), video card memory management, and GPGPU support all built into the kernel, with everything else (2D and 3D acceleration?) in user space. I don't think they've got this working for most video cards yet though.
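For comparison, the framebuffer driver at least gives user space a standard kernel interface, even if it's unaccelerated. A minimal sketch (assuming a /dev/fb0 device in a simple 32 bits-per-pixel mode - real code should check the reported pixel format instead of guessing):

/* Sketch of the Linux fbdev interface: query the current mode with
 * ioctl(), mmap() the framebuffer, and plot pixels directly. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/fb.h>

int main(void)
{
    int fd = open("/dev/fb0", O_RDWR);
    if (fd < 0) {
        perror("open /dev/fb0");
        return 1;
    }

    /* Ask the driver for the current mode and memory layout. */
    struct fb_var_screeninfo var;
    struct fb_fix_screeninfo fix;
    if (ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0 ||
        ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0) {
        perror("ioctl");
        return 1;
    }

    uint8_t *fb = mmap(NULL, fix.smem_len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    if (fb == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Fill the top-left 100x100 square with white (assuming 32bpp). */
    for (uint32_t y = 0; y < 100 && y < var.yres; y++)
        for (uint32_t x = 0; x < 100 && x < var.xres; x++)
            *(uint32_t *)(fb + y * fix.line_length + x * 4) = 0xFFFFFFFF;

    munmap(fb, fix.smem_len);
    close(fd);
    return 0;
}

Even this only covers dumb pixel pushing - mode switching policy, acceleration and sharing the card between processes all live elsewhere, which is where DRI/DRM/kernel mode setting come in.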
Basically the way video is handled in Linux has always been a poorly designed mess (it's like watching the Three Stooges build a nuclear reactor in slow motion). To be fair, part of the problem with Linux video support comes from changing expectations - the "text only" stuff was probably designed half a century ago (before Linux was even started), the framebuffer stuff may have been introduced before hardware acceleration became popular, and GPGPU and video cards with multiple outputs are fairly recent.
I guess what I'm trying to say is that it'd probably be better to learn how it should be done or how it could be done (with the benefit of hindsight), rather than spending time trying to figure out how it has been done by others (without the benefit of hindsight, and with burdens like backward compatibility and a complete lack of leadership)...
Cheers,
Brendan