"I can even see exactly what the customer does"
Combuster wrote: Including his credit/debit card number?
Of course not. Those are not even accessible to our application.
Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best OS!
Re: Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best
"I can even see exactly what the customer does"
Combuster wrote: Including his credit/debit card number?
I doubt that's even possible with today's card handling, with chip & PIN, 3-D Secure and everything else put in place to protect from fraud. Well, except for the US, where ancient stuff like cheques/checks is still in use (it's been like 40 years since they were widely used here).
Re: Best processor for 32-bit OS
rdos wrote: That's what you have log files for.
Logging is the domain of the application, not the OS. The OS provides the logging framework, the application decides what to log where. And don't give me the "it's different in embedded programming" stuff. It's perhaps different in RDOS, but it's bad design any way you put it.
rdos wrote: Presenting errors to end-users of embedded systems is just plain stupid.
If your syscall failed fatally, you do present an error to the end user, because you don't have a choice. It might be a screen message, it might be a reboot, or a flashing LED, or a beeping sound, but it's an error message.
"Besides, you would not log filesystem error codes in a log, as that would not make any sense to the typical support guy. [...] The trick is to provide useful information, and not to log things that only programmers understand."
Correct. That's why the logging shouldn't be done by the filesystem, or the GUI, but meaningful errors should be reported to the application, so that the application (which is the part of the system knowing what it was actually doing at the time) can generate a meaningful log message.
Whether the solution is to retry, or reboot, or whatever, doesn't really matter. In order to fix the problem at the cause, you need all the information you can get. The OS knows how something failed, but only the application knows what it was that failed. That is why a simple boolean success / fail return code of syscalls is suboptimal, which is what Brendan pointed out, and which is where you lapsed (for the umpteenth time) into your standard defensive pattern of "I am not wrong, because that is the way RDOS does it".
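To make the "OS knows how, application knows what" point concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: `SysError`, `sys_open` and `load_price_list` are made-up names, not part of RDOS or any real API. The idea is simply that the call returns a specific error code, and the application combines that code with its own context to form the log message.

```python
from enum import Enum


class SysError(Enum):
    """Hypothetical error codes returned by a syscall-like function."""
    OK = 0
    NOT_FOUND = 1
    ACCESS_DENIED = 2
    DEVICE_FAILURE = 3


def sys_open(path, existing_files):
    """Stand-in for a syscall: reports HOW the operation failed."""
    if path not in existing_files:
        return SysError.NOT_FOUND, None
    return SysError.OK, 1  # dummy handle


def load_price_list(path, existing_files):
    """Application code: knows WHAT it was doing, so it builds the log text."""
    status, _handle = sys_open(path, existing_files)
    if status is not SysError.OK:
        return f"could not load price list '{path}': {status.name}"
    return None  # nothing to log


print(load_price_list("prices.dat", set()))
```

With a plain success/fail flag, the application could only log "could not load price list"; the reason would be lost at the syscall boundary.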
Every good solution is obvious once you've found it.
Re: Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best
Solar, I have 15+ years of experience in professional embedded system development for petrol stations, and I know what works and what doesn't, and there are no design limitations in RDOS in this regard. The API was mostly designed during the last 15 years, and adapted to what I regard as best practice for such applications. So, the API is the consequence of my experience in the area, not baggage to overcome. Therefore, I don't need to defend anything.
- Combuster
Re: Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best
You just failed your logic exam.
Again.
I don't need to defend that statement; you are the only one who disagrees and does not understand.
Re: Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best
rdos wrote: Solar, I have 15+ years of experience in professional embedded system development for petrol stations...
And I have 12+ years of experience mopping up the truly mediocre stuff others have left behind, be it out of ignorance, attempts at "job security", or being locked up in an ivory tower.
Can we pull up our pants again?
Re: Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best
Updated results on my 2-core AMD E-300 portable (at 1.2GHz):
near: 20.0 million calls per second.
gate: 2.7 million calls per second.
syscall: 3.8 million calls per second.
IOW, there is a 40% performance improvement when SYSENTER/SYSEXIT is used instead of a call gate. OTOH, this processor is still slower than AMD Geode at 500MHz!
The alternative sysenter entry code in the second listing below provides an even larger boost: that version does 6.5 million calls per second, which is 2.4 times the call gate performance.
This is the code tested:
Code:
; patched code
nop ; lead-byte (1 byte)
call gate_entry ; a near call to a dynamically created user-level gate entry (5 bytes)
nop
nop
; The dynamic entry. These are placed in application space with read only access.
gate_nr DD ?
gate_entry Proc near
push eax
push ecx
push edx
mov ecx,esp
mov edx,OFFSET gate_leave
sysenter
gate_leave:
pop edx
pop ecx
pop eax
ret
gate_entry Endp
; in kernel
gate_nr = -9
app_eax = 8
app_ecx = 4
app_edx = 0
; Each core will set up its own sysenter handler. This can be used to define the processor block linear address
sysenter_entry:
mov eax,OFFSET proc_linear ; patched at initialization time to contain linear address of processor block
mov ss,cs:[eax].ps_syscall_ss0 ; get ss0 of current thread
mov esp,stack0_size ; load top of stack
sti
push edx ; push return-point (application EIP)
push ecx ; put application ESP on kernel stack
mov eax,ds:[edx].gate_nr ; get gate # from just before the current procedure
push dword ptr cs:[eax].ret ; push sysleave offset
push dword ptr cs:[eax].sel ; push handler selector
push dword ptr cs:[eax].offset ; push handler offset
mov eax,ds:[ecx].app_eax
mov edx,ds:[ecx].app_edx
mov ecx,ds:[ecx].app_ecx
retf32 ; jump to syscall handler
; in a device-driver module
dummy_gate Proc near
ret
dummy_gate Endp
; exit procedure:
sysleave_entry16:
push ecx
mov ecx,ss:[esp+6] ; get application ESP
mov ds:[ecx].app_edx,edx ; return registers to caller
mov ds:[ecx].app_eax,eax
pop ds:[ecx].app_ecx
pop dx ; pop unused high part of entry-point EIP
pop ecx ; pop application ESP
pop edx ; pop application EIP
sysexit
Code:
sysenter_entry:
push edx ; push return-point (application EIP)
push ecx ; put application ESP on kernel stack
mov eax,ds:[edx].gate_nr ; get gate # from just before the current procedure
push dword ptr cs:[eax].ret ; push sysleave offset
push dword ptr cs:[eax].sel ; push handler selector
push dword ptr cs:[eax].offset ; push handler offset
mov eax,ds:[ecx].app_eax
mov edx,ds:[ecx].app_edx
mov ecx,ds:[ecx].app_ecx
retf32 ; jump to syscall handler
There are several issues that must be solved before these results are usable. One issue is that since it is the application that sets up both EIP and ESP, it is possible for an application to forge addresses within kernel space. A second issue is that RDOS cannot handle the stack being loaded with a 32-bit flat stack pointer, and will panic when the code faults or is debugged. A third issue is that the switch from user to kernel with sysenter when debugging now will go through lots of irrelevant code, which makes it harder to debug.
Re: Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best
This alternative provides the required safety from applications inserting random sysenter instructions. Instead of loading edx with the return address, edx is loaded with the gate #. The gate number can then be evaluated, and the correct handler return is then hardcoded for sysexit.
EDIT: Removed EAX from saved registers as it is possible to do the sysenter handler without modifying EAX.
Code:
; patched code
nop ; lead-byte (1 byte)
call gate_entry ; a near call to a dynamically created user-level gate entry (5 bytes)
nop
nop
; The dynamic entry. These are placed in application space with read only access.
gate_entry Proc near
push ecx
push edx
mov ecx,esp
mov edx,gate_nr ; code is patched with gate # at creation time
sysenter
gate_leave:
pop edx
pop ecx
ret
gate_entry Endp
; in kernel
app_ecx = 4
app_edx = 0
; Each core will set up its own sysenter handler. This can be used to define the processor block linear address
sysenter_entry:
mov ss,cs:ps_syscall_ss0 ; get ss0 of current thread. Patched at creation time
mov esp,stack0_size ; load top of stack
sti
push ecx ; put application ESP on kernel stack
cmp edx,usergate_entries
jae sysenter_fail ; check that gate # is within limits
mov ecx,cs:[4*edx].gate_linear ; get handler-address of this entry in kernel space. Gate linear is patched at creation time
push ecx ; push application return-point
shl edx,GATE_SHIFT ; get to correct entry
add edx,OFFSET gate_table ; add linear address to entry table (patched)
push dword ptr cs:[edx].ret ; push sysleave offset
push dword ptr cs:[edx].sel ; push handler selector
push dword ptr cs:[edx].offset ; push handler offset
mov ecx,ss:[esp+16] ; reload application ESP (ECX holds the return-point at this stage)
mov edx,ds:[ecx].app_edx ; get user EDX
mov ecx,ds:[ecx].app_ecx ; get user ECX
retf32 ; jump to syscall handler
; exit procedure for 32-bit code:
sysleave_entry32:
xchg ecx,ss:[esp+4] ; get application ESP, and save return ECX
mov ds:[ecx].app_edx,edx ; write-back application EDX
mov edx,ss:[esp+4] ; get return ECX
mov ds:[ecx].app_ecx,edx ; write-back application ECX
mov edx,ss:[esp] ; get application EIP
sysexit
; exit procedure for 16-bit code:
sysleave_entry16:
xchg ecx,ss:[esp+6] ; get application ESP, and save return ECX
mov ds:[ecx].app_edx,edx ; write-back application EDX
mov edx,ss:[esp+6] ; get return ECX
mov ds:[ecx].app_ecx,edx ; write-back application ECX
mov edx,ss:[esp+2] ; get application EIP
sysexit
Last edited by rdos on Wed Apr 18, 2012 9:12 am, edited 2 times in total.
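The key idea of the tamper-safe entry is that the application passes only a gate number, and the kernel bounds-checks it before using it as a table index. The following Python sketch models just that check. It is an illustration under assumptions, not RDOS code: `GATE_TABLE`, the handler functions and `sysenter_dispatch` are hypothetical names, and the masking to 32 bits mimics the unsigned `cmp`/`jae` comparison in the listing above.

```python
def handler_open():
    return "open"


def handler_close():
    return "close"


def handler_read():
    return "read"


# Hypothetical gate table; its length plays the role of usergate_entries.
GATE_TABLE = [handler_open, handler_close, handler_read]


def sysenter_dispatch(edx):
    """Validate the gate number the way an unsigned compare would, then dispatch."""
    gate_nr = edx & 0xFFFFFFFF  # registers are unsigned, so -1 becomes a huge value
    if gate_nr >= len(GATE_TABLE):
        return None  # corresponds to taking the sysenter_fail path
    return GATE_TABLE[gate_nr]()


print(sysenter_dispatch(1))
print(sysenter_dispatch(-1))
```

Because the handler and the return point are both looked up from kernel-owned tables, a forged register value can at worst select a different valid gate; it cannot point the kernel at an arbitrary address.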
Re: Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best
The tamper safe versions end up with these timings:
2-core AMD E-300 portable (at 1.2GHz):
near: 20.0 million calls per second.
gate: 2.7 million calls per second.
syscall (loading ss:esp): 3.6 million calls per second.
syscall (not loading ss:esp): 6.0 million calls per second.
IOW, creating support for using a flat kernel stack in the production release would increase syscall performance considerably on this processor.
Perhaps the most interesting thing is that loading ss:esp takes much longer than just loading a general segment register like GS. When an additional load of GS is added to the version that doesn't load ss:esp, performance changes to 5.3 million calls per second, not to the value measured when loading ss:esp, indicating that the implementation of loading ss is horribly slow on this processor.
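As a rough cross-check of the figures above, the per-call cost difference can be worked out from the call rates and the 1.2 GHz clock. This is only a back-of-the-envelope sketch: it assumes the clock really runs at 1.2 GHz during the test and that the benchmark overhead is the same in all three variants, so that it cancels out in the differences.

```python
CPU_HZ = 1_200_000_000  # 1.2 GHz, as stated for the AMD E-300


def cycles_per_call(calls_per_second):
    """Average cycles spent per iteration of the benchmark loop."""
    return CPU_HZ / calls_per_second


with_ss_esp = cycles_per_call(3_600_000)     # syscall, loading ss:esp
without_ss_esp = cycles_per_call(6_000_000)  # syscall, not loading ss:esp
with_gs_load = cycles_per_call(5_300_000)    # as above plus one GS load

ss_esp_cost = with_ss_esp - without_ss_esp   # extra cycles for loading ss:esp
gs_cost = with_gs_load - without_ss_esp      # extra cycles for loading GS

print(round(ss_esp_cost), round(gs_cost))
```

Under these assumptions the ss:esp load accounts for roughly 133 cycles per call, against roughly 26 cycles for the GS load, which matches the observation that loading SS is far more expensive than loading a general segment register on this CPU.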
Re: Best processor for 32-bit OS
Solar wrote: Logging is the domain of the application, not the OS.
No way! None of your systems would pass any kind of audit!
If it were the application's duty to log that it's trying to do something nasty, of course it wouldn't log it! Design failure!
Re: Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best
Ignoring all off-topic posts above ^^
Re: Best processor for 32-bit OS
Solar wrote: Logging is the domain of the application, not the OS.
turdus wrote: No way! None of your systems would pass any kind of audit! If it were the application's duty to log that it's trying to do something nasty, of course it wouldn't log it! Design failure!
Erm... what? I think we're seriously misunderstanding each other, here.
Example: sshd, the SSH server. If I attempt to break in via that service, it's sshd that will write the log entry about it. Not the kernel, not the PAM module handling the actual auth request, but the application that knows what's going on overall.
To be precise, even the logging isn't done by the OS itself, but by an application (syslog-ng, in my case).
Of course my hacker script won't write a log about what it's doing...
Re: Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best
I think you all got away from the issue. If I understand it correctly, Brendan commented on using the carry flag as an error signal instead of returning proper error codes. Returning different error codes from each function would be a good idea, no matter whose responsibility it is to log it, print it or discard it.
Re: Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best
Hi,
rdos wrote: IOW, there is a 40% performance improvement when SYSENTER/SYSEXIT is used instead of a call gate. OTOH, this processor is still slower than AMD Geode at 500MHz!
rdos wrote: This alternative sysentry code provides an even larger boost:
rdos wrote: This version does 6.5 million calls per second, which is 2.4 times the call gate performance.
20.0 million "near calls" per second on a 1.2 GHz CPU works out to about 60 cycles per "near call". I'd expect that a near call actually costs about 4 cycles, so this first test indicates that the loop overhead is probably about 56 cycles per iteration (probably because the compiler is crap - a decent compiler would have inlined the "do nothing" function, then decided that "sync_val" never changes because it's not volatile and generated a "jmp $" infinite loop).
2.7 million "call gates" per second on a 1.2 GHz CPU works out to about 444 cycles per "call gate". By subtracting the "about 56 cycles per iteration" loop overhead from above this gives us an actual figure closer to 388 cycles for the call gate alone.
3.8 million "sysenters" per second on a 1.2 GHz CPU works out to about 316 cycles per "sysenter". By subtracting the "about 56 cycles per iteration" loop overhead again this gives us an actual figure closer to 260 cycles for the sysenter alone. This is about 50% faster than the call gate method.
6.5 million "alternative sysenters" per second on a 1.2 GHz CPU works out to about 185 cycles per "alternative sysenter". By subtracting the "about 56 cycles per iteration" loop overhead again this gives us an actual figure closer to 129 cycles for the alternative sysenter alone. This is about 300% faster than the call gate method, and about 200% faster than the original "sysenter" method.
The only difference between the "sysenter" and "alternative sysenter" method is that the former loads a different SS:ESP while the latter doesn't. Because the alternative method is about 200% faster, this means that loading a different SS:ESP must halve the performance. Loading a different value into ESP is just a normal "mov" and should only cost about 2 cycles. Therefore loading a different value into the SS register must be costing about 126 cycles all by itself. Loading a different value into CS would cost about the same. Therefore, without these (CS and SS) segment loads (at 126 cycles each) the cost of the sysenter and sysexit instructions alone would be about 5 cycles.
This is undeniable proof that if RDOS didn't use segmentation system calls would be about as fast as a near call.
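The cycle arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the calculation as described in the post: the 56-cycle loop overhead is Brendan's own estimate (60 measured cycles minus about 4 for the near call itself), and the names used here are made up for the illustration.

```python
CPU_HZ = 1_200_000_000  # 1.2 GHz
LOOP_OVERHEAD = 56      # estimated cycles per iteration, taken from the post


def net_cycles(calls_per_second, overhead=LOOP_OVERHEAD):
    """Cycles per call after subtracting the assumed loop overhead."""
    return CPU_HZ / calls_per_second - overhead


call_gate = net_cycles(2_700_000)
sysenter = net_cycles(3_800_000)
alt_sysenter = net_cycles(6_500_000)

print(round(call_gate), round(sysenter), round(alt_sysenter))
print(round(call_gate / sysenter, 1))      # speed ratio, call gate vs sysenter
print(round(call_gate / alt_sysenter, 1))  # call gate vs alternative sysenter
print(round(sysenter / alt_sysenter, 1))   # sysenter vs alternative sysenter
```

The ratios come out at about 1.5, 3.0 and 2.0. Note that the post words the last two as "300% faster" and "200% faster"; read as speed ratios they mean three times and twice as fast.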
rdos wrote: There are several issues that must be solved before these results are usable. One issue is that since it is the application that sets up both EIP and ESP, it is possible for an application to forge addresses within kernel space.
If a flat application does attempt to forge dodgy values for EIP or ESP it'd only cause a page fault due to the correct use of the supervisor/user flag in page table entries, and would be no worse than the same application doing "jmp somewhere_in_kernel" or "mov esp,somewhere_in_kernel". My main concern (if I understand RDOS enough) would be segmented applications using the SYSENTER interface to break their segments. For example, if you have several segmented applications in the same virtual address space, then one of them could use SYSENTER to modify its SS and then use its SS:ESP to read a different application's data.
This is undeniable proof that if RDOS didn't use segmentation it would be more secure.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best
Brendan wrote: 20.0 million "near calls" per second on a 1.2 GHz CPU works out to about 60 cycles per "near call". I'd expect that a near call actually costs about 4 cycles, so this first test indicates that the loop overhead is probably about 56 cycles per iteration (probably because the compiler is crap - a decent compiler would have inlined the "do nothing" function, then decided that "sync_val" never changes because it's not volatile and generated a "jmp $" infinite loop).
That's probably not correct. The overhead is not in the loop, but in the C procedure that saves registers, checks the stack and so on. I think it is reasonable to set loop overhead to 10 cycles and procedure overhead to 46 instead.
Brendan wrote: 2.7 million "call gates" per second on a 1.2 GHz CPU works out to about 444 cycles per "call gate". By subtracting the "about 56 cycles per iteration" loop overhead from above this gives us an actual figure closer to 388 cycles for the call gate alone.
That would be 434 using the corrected figure.
Brendan wrote: 3.8 million "sysenters" per second on a 1.2 GHz CPU works out to about 316 cycles per "sysenter". By subtracting the "about 56 cycles per iteration" loop overhead again this gives us an actual figure closer to 260 cycles for the sysenter alone. This is about 50% faster than the call gate method.
And that would be 306. Just above 40% faster.
Brendan wrote: 6.5 million "alternative sysenters" per second on a 1.2 GHz CPU works out to about 185 cycles per "alternative sysenter". By subtracting the "about 56 cycles per iteration" loop overhead again this gives us an actual figure closer to 129 cycles for the alternative sysenter alone. This is about 300% faster than the call gate method, and about 200% faster than the original "sysenter" method.
Overhead would be 175 cycles, and that is 150% faster.
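The corrected figures in these replies follow from the same measurements if only 10 cycles of loop overhead are subtracted instead of 56. The short Python sketch below redoes that arithmetic; the 10-cycle value is rdos's assumption from this post, and the variable names are chosen just for this illustration.

```python
CPU_HZ = 1_200_000_000  # 1.2 GHz
LOOP_OVERHEAD = 10      # cycles per iteration, as assumed in this reply


def net_cycles(calls_per_second):
    """Cycles per call after subtracting the assumed 10-cycle loop overhead."""
    return CPU_HZ / calls_per_second - LOOP_OVERHEAD


call_gate = net_cycles(2_700_000)
sysenter = net_cycles(3_800_000)
alt_sysenter = net_cycles(6_500_000)


def percent_faster(slow, fast):
    """How much faster `fast` is than `slow`, as a percentage."""
    return (slow / fast - 1) * 100


print(round(call_gate), round(sysenter), round(alt_sysenter))
print(round(percent_faster(call_gate, sysenter)))
print(round(percent_faster(call_gate, alt_sysenter)))
```

With this overhead the results are about 434, 306 and 175 cycles, i.e. roughly 42% and 149% faster than the call gate, in line with "just above 40%" and "150%".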
Brendan wrote: The only difference between the "sysenter" and "alternative sysenter" method is that the former loads a different SS:ESP while the latter doesn't. Because the alternative method is about 200% faster, this means that loading a different SS:ESP must halve the performance. Loading a different value into ESP is just a normal "mov" and should only cost about 2 cycles. Therefore loading a different value into the SS register must be costing about 126 cycles all by itself. Loading a different value into CS would cost about the same. Therefore, without these (CS and SS) segment loads (at 126 cycles each) the cost of the sysenter and sysexit instructions alone would be about 5 cycles.
I think loading CS is also a lot faster than loading SS (probably similar to loading a general segment register), and SYSENTER/SYSEXIT doesn't use 5 cycles, but a lot more.
Brendan wrote: This is undeniable proof that if RDOS didn't use segmentation system calls would be about as fast as a near call.
Yeah, and unreliable.
Brendan wrote: If a flat application does attempt to forge dodgy values for EIP or ESP it'd only cause a page fault due to the correct use of the supervisor/user flag in page table entries, and would be no worse than the same application doing "jmp somewhere_in_kernel" or "mov esp,somewhere_in_kernel".
Not so, since these are used to load/save stack state in application space in kernel. User/supervisor flags are useless when the operations take place in kernel.
Brendan wrote: My main concern (if I understand RDOS enough) would be segmented applications using the SYSENTER interface to break their segments. For example, if you have several segmented applications in the same virtual address space, then one of them could use SYSENTER to modify its SS and then use its SS:ESP to read a different application's data.
The last version provides full protection. Besides, for segmented applications, the SYSENTER interface could not be used (application CS/SS are not flat with a zero base), and thus would default to call gates only.
There is one issue though: the CS and SS that are set up by SYSEXIT have an incorrect limit, which means that CS and SS could be used to address the kernel. However, this is not a big issue as RDOS has supervisor-only access to kernel pages. If you read the code carefully, you can see that I deliberately use DS (which is loaded with a limit that excludes the kernel) when I address the user-supplied stack, so if the user forges ECX, the stack operations will fault in kernel. For the same reason I use a CS override for data that is located in kernel.