Sorry for the rambling style; I'm in the manic phase of my development again.
I'd like to insert my service's IPC client stubs (an ELF dynamic object) into the service binary (an ELF executable), but to do this I'd need a way to
keep them separate even during the linking phase, so that they can't interact by any means other than IPC.
My dynamic linker would then serve the application from the stub built into the service;
this way I can avoid having to match the client library and service versions.
Ultimately my goal is a behaviour like this:
Game application wants to use DisplayDriver::drawsomething()
we normally would map this to DisplayDriver::IPCClient::drawsomething()
But if we want the maximum FPS for "The n:th Attack of the Generic Bad Guys from Outer Space 2140" (the OS might actually be ready by that year)
we could request the OS to do the following:
1. load game
2. load another copy of the DisplayDriver
3. instruct the original service to release the graphics system (save the display state to memory, then while (1) sleep();)
4. use the entire new copy of the service as a dynamic library (relocate it, etc.)
5. use DisplayDriver::Core::drawsomething() in place of the call to DisplayDriver::drawsomething() (the client lib and the core shall be ABI compatible)
6. the game process is given direct access to the display card
The game now runs with direct function call access to the display driver, without the additional IPC overhead.
Any thoughts?
(Opinion request) Thoughts about IPC and services
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
Actually, you don't need to shut down the original driver. To the driver, it just looks like it gained another thread. All you really need to do is copy the permissions over to the task.
As for my personal opinion on the matter: it's a good way to let the end user choose between speed and security. (Actually, I have this sort of thing on my feature list.)
Re: (Opinion request) Thoughts about IPC and services
Hi,
For a direct function call, the work done by the client/caller and the work done by the service/callee must happen on the same CPU. With asynchronous IPC on a multi-CPU computer, the work done by the client can occur in parallel with work done by the service.
Imagine an application that spends 20 ms figuring out the 1000 actions needed to draw a frame, a video driver that spends 20 ms drawing those 1000 things, an IPC cost of 1 ms per message, and 2 CPUs. In this case using direct function calls takes 40 ms per frame, so you get 25 FPS with one CPU doing nothing. IPC with one message per action takes 1020 ms per frame, which is much worse, but...
If the application builds a list of actions and then sends the entire list in one message, a frame takes 21 ms and you'd get around 48 FPS, because the work is done by both CPUs in parallel.
The alternative here is direct function calls with multi-threading. In this case each application thread could spend 10 ms figuring out half of the actions needed and 10 ms drawing half of them. The problem is that you need to synchronise everything with some form of re-entrancy locking, and you'd end up with lock contention slowing things down. You'd get 50 FPS with no lock contention, but the more lock contention there is the worse it gets (for example, with a single lock protecting access to display memory you'd get high lock contention and be lucky to get 35 FPS).
How well it scales is another problem - for example, with 8 CPUs the lock contention for multi-threaded direct calls becomes much worse, and the "single-threaded with IPC" approach leaves 6 CPUs unused.
In this case one approach would be to have 8 application threads each build their own list, then combine all lists into one list, and send the list to the service. The service would have a "controlling" thread which splits the list into 8 separate lists that can be processed in parallel (e.g. divide the screen into 8 separate sections), and then have 8 threads processing a mini-list each. All CPUs would almost always be in use (either drawing part of a frame or figuring out part of the next frame), there is no need for re-entrancy locks (and no lock contention), and you've still only got the overhead of one message. The main bottleneck here would be the hardware (e.g. the PCI/AGP bus and/or GPU).
A better idea would be to do something very similar but use 4 or 5 threads for the application and 4 threads for the video driver to minimise this hardware bottleneck (i.e. do video access and the application's work in parallel, rather than doing all of the video work in parallel then all of the application's work in parallel).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Crazed123 wrote:Could you explain what exactly you're trying to do in design terms? I understand that you want games and drivers to communicate by function call, but how does this relate to IPC?
I mean that the ABI of the service would be exactly the same whether the called function is the function proper itself or a stub that routes the call through IPC to the function proper.
@Brendan:
Thanks a lot. I hadn't thought about the SMP performance hit, but I see it now. :/
The IPC system is message-based and the channels are thread-to-thread. Delivery is guaranteed, i.e. the call blocks until the message can be queued. After that it is the client library wrapper's responsibility to sleep until the response arrives, if blocking operation is desired.