Code: Select all
Allegedly, this is bsd's syscall open()
open:
push dword mode
push dword flags
push dword path
mov eax, 5
push eax
int 80h
ITchimp wrote: I was always under the impression that in x86, syscall parameters are passed using eax, ebx, ecx, edx, edi, esi,
That's one way to do it, but it's not enforced by the CPU architecture. Linux uses those six registers plus EBP.
ITchimp wrote: does bsd use esp0 to retrieve syscall parameters?
I think you mean ESP3, and yes, it does use the ring 3 stack pointer to retrieve the parameters.
ITchimp wrote: isn't the user stack not accessible to kernel?
Why wouldn't the kernel be able to access the user stack?
nullplan wrote: The problem with this approach is that it is memory based. And the kernel must be careful in accessing user memory, always be prepared that the pointer given might point to Nirvana. So in this case, the kernel must already call get_user() (or something like it) six times just to read the arguments. Another issue is that failure to read those arguments is not necessarily a mistake. The call could be well formed, and the stack is just very empty. You would actually have to figure out how many arguments a call has before reading them. So that's why it is typically a better idea to have the arguments in registers.
If the syscall arguments are packaged into a structure, you only have to validate and get_user for the structure itself, if at all.
Code: Select all
union syscall_args
{
/* Mapped to x86 registers ebx, ecx, edx, edi, esi, ebp */
uintptr_t regs[6];
struct {
int fd;
void * buffer;
size_t count;
} read_args;
/* Add other syscalls here */
...
};
Code: Select all
SYSCALL(read) ssize_t file_read(int fd, void * buf, size_t count)
{
...
}
Code: Select all
struct read_params_t {
int fd;
void *buf;
size_t count;
};
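To make the "validate once, copy once" idea concrete, here is a minimal userspace sketch. It is not taken from any real kernel: `user_mem`, `copy_from_user_sim` and `fetch_read_params` are hypothetical names, and the byte array only stands in for the user address range that a real kernel would check against its own address-space limits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for user memory; a real kernel would check the
 * range against the process address space instead of a fixed array. */
static unsigned char user_mem[256];

/* Hypothetical helper: copy len bytes starting at a "user" offset,
 * returning -1 instead of faulting when the range is out of bounds. */
static int copy_from_user_sim(void *dst, size_t user_off, size_t len)
{
    if (user_off > sizeof user_mem || len > sizeof user_mem - user_off)
        return -1;
    memcpy(dst, user_mem + user_off, len);
    return 0;
}

struct read_params_t {
    int fd;
    void *buf;
    size_t count;
};

/* One range check and one copy cover all three arguments. */
static int fetch_read_params(size_t user_off, struct read_params_t *out)
{
    return copy_from_user_sim(out, user_off, sizeof *out);
}
```

Note that this only fetches the arguments. The `buf` pointer inside the structure still points into user memory and needs its own check before the kernel touches it.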
thewrongchristian wrote: If the syscall arguments are packaged into a structure, you only have to validate and get_user for the structure itself, if at all.
Yes, but then you have a different length for each syscall. So now you need to know the length of the structure before calling the syscall, or else get the structure in each syscall. You could conceivably have the length be an element of the structure, but then you have to determine if the length fits the requested call.
thewrongchristian wrote: The structure can be copied atomically from user to kernel space.
I doubt the claim of atomicity, but yes, you can copy the whole thing.
thewrongchristian wrote: /* Add other syscalls here */
Not gonna lie, I am not a fan of the union with 1000 substructures. You know Linux is up to almost 500 syscalls by now, right? And sure, many of them are legacy, but that is still hundreds of things in there.
thewrongchristian wrote: By passing the arguments in registers, the code generator would have to be much more complex, and be able to, for example, marshal 64-bit parameters across multiple 32-bit registers. Not impossible, but more effort than I can be bothered with at this time.
Meanwhile, how does Linux solve this? Userspace puts arguments into registers. All arguments are of type "long" (except on x32, where it is "long long", and you mustn't sign-extend pointers in the conversion). There are up to six arguments on all architectures, some support seven (I think it was only MIPS). Therefore, all syscalls are designed to require at most six arguments. If more are needed, some must be passed through memory.
Code: Select all
ssize_t read(int fd, void *buf, size_t len) { return syscall(SYS_read, fd, buf, len); }
Code: Select all
ssize_t read(int fd, void *buf, size_t len) {
long ret;
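/* The syscall instruction overwrites rcx and r11, so both must be listed as clobbers. */
__asm__ volatile("syscall" : "=a"(ret) : "a"(SYS_read), "D"(fd), "S"(buf), "d"(len) : "rcx", "r11", "memory", "cc");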
return __syscall_ret(ret);
}
long __syscall_ret(unsigned long x) {
if (x > -4096UL) {
errno = -x;
x = -1UL;
}
return x;
}
Code: Select all
typedef long syscall_t();
static syscall_t *const syscall_tbl[__NR_syscalls] = {
[0 ... __NR_syscalls-1] = sys_ni_syscall,
...
[SYS_read] = sys_read,
...
};
Code: Select all
void handle_syscall(struct regs *regs) {
if (regs->rax < __NR_syscalls)
regs->rax = syscall_tbl[regs->rax](regs->rdi, regs->rsi, regs->rdx, regs->r10, regs->r8, regs->r9);
else
regs->rax = -ENOSYS;
}
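For anyone who wants to poke at the table dispatch outside a kernel, here is a small userspace model. Everything prefixed `demo_` or `DEMO_` is made up for illustration. Unlike the table above, this variant leaves unused slots as NULL and checks for that at dispatch time, so it does not rely on the GNU range designator.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_NR_SYSCALLS 4          /* hypothetical table size */
enum { DEMO_SYS_ADD = 2 };          /* hypothetical syscall number */

/* Every handler takes six longs, like the register convention above. */
typedef long demo_syscall_t(long, long, long, long, long, long);

/* Hypothetical handler that only looks at its first two arguments. */
static long demo_add(long a, long b, long c, long d, long e, long f)
{
    (void)c; (void)d; (void)e; (void)f;
    return a + b;
}

/* Slots without an entry stay NULL and are rejected in demo_dispatch(). */
static demo_syscall_t *const demo_tbl[DEMO_NR_SYSCALLS] = {
    [DEMO_SYS_ADD] = demo_add,
};

/* args[] plays the role of the saved argument registers. */
static long demo_dispatch(unsigned long nr, const long args[6])
{
    if (nr >= DEMO_NR_SYSCALLS || demo_tbl[nr] == NULL)
        return -ENOSYS;
    return demo_tbl[nr](args[0], args[1], args[2], args[3], args[4], args[5]);
}
```

The handler never has to know how many arguments were really passed; it just ignores the registers it does not use.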
Code: Select all
off_t lseek(int fd, off_t off, int whence) {
#ifdef SYS__llseek
int ret = syscall(SYS__llseek, fd, SC_LL_E(off), &off, whence);
if (ret) off = ret;
return off;
#else
return syscall(SYS_lseek, fd, off, whence);
#endif
}
Code: Select all
#define SC_LL_E(x) (x) >> 32, (x)
#define SC_LL_O(x) 0, SC_LL_E(x)
Code: Select all
long sys_ppc_llseek(int fd, unsigned long off_hi, unsigned long off_lo, off_t *off_ret, int whence)
{
return sys_llseek(fd, (0ULL+off_hi) << 32 | off_lo, off_ret, whence);
}
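The split and rejoin of a 64-bit offset can be checked in isolation. A minimal sketch, with `hi32`, `lo32` and `join64` as hypothetical helper names mirroring what `SC_LL_E` and `sys_ppc_llseek` do:

```c
#include <stdint.h>

/* Upper and lower 32-bit halves, as they would travel in two registers. */
static uint32_t hi32(uint64_t x) { return (uint32_t)(x >> 32); }
static uint32_t lo32(uint64_t x) { return (uint32_t)x; }

/* Kernel side: widen the high half before shifting, then OR in the low half. */
static uint64_t join64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}
```

The widening cast before the shift is what `(0ULL+off_hi)` achieves in the kernel-side function: without it, shifting a 32-bit value by 32 would be undefined.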
nullplan wrote: The problem of passing a 64-bit number directly to a syscall occurs surprisingly rarely. Most of the time, you pass such numbers through memory again.
True, it's rare, but how much of that is because it's a PITA?
Code: Select all
status syscall(void * params, size_t sizeparams, void * result, size_t sizeresult);
thewrongchristian wrote: If the syscall arguments are packaged into a structure, you only have to validate and get_user for the structure itself, if at all.
nullplan wrote: Yes, but then you have a different length for each syscall. So now you need to know the length of the structure before calling the syscall, or else get the structure in each syscall. You could conceivably have the length be an element of the structure, but then you have to determine if the length fits the requested call.
Presumably, the author of an operating system knows the number of arguments that each syscall takes, which he can put into a table. You're making this out to be much harder than it needs to be.
Gigasoft wrote: Presumably, the author of an operating system knows the number of arguments that each syscall takes, which he can put into a table. You're making this out to be much harder than it needs to be.
Sure, it isn't a big problem. Just one more thing you have to do. It's not that it is hard, it's that it adds one more step to the whole context switch rigamarole. And the register based mechanism can make do without that.
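For reference, the table-based approach for stack-passed arguments could be sketched roughly like this. The syscall numbers, `demo_nargs`, `user_stack` and `fetch_args` are all hypothetical, and the array merely simulates the user stack so that the bounds check can be exercised in userspace.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical syscall numbers, for illustration only. */
enum { DEMO_EXIT = 1, DEMO_READ = 3, DEMO_OPEN = 5, DEMO_NR = 6 };

/* Word-sized argument count per syscall; unused slots are 0. */
static const unsigned char demo_nargs[DEMO_NR] = {
    [DEMO_EXIT] = 1,
    [DEMO_READ] = 3,
    [DEMO_OPEN] = 3,
};

/* Simulated user stack; index 8 would be one past its top. */
static uintptr_t user_stack[8];

/* Look up the count, then do one bounds check and one copy.
 * Returns the number of arguments fetched, or -1 on failure. */
static int fetch_args(unsigned nr, size_t sp_index, uintptr_t args[6])
{
    size_t total = sizeof user_stack / sizeof user_stack[0];
    size_t n;

    if (nr >= DEMO_NR)
        return -1;
    n = demo_nargs[nr];
    if (sp_index > total || n > total - sp_index)
        return -1;              /* call needs more words than the stack holds */
    memcpy(args, &user_stack[sp_index], n * sizeof args[0]);
    return (int)n;
}
```

The lookup is the extra step being discussed: a one-argument call near the top of the stack succeeds, while a three-argument call at the same position is rejected.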