I'm trying to figure out what type of data structure to use in order to keep track of nodes in /dev, /proc, and /sys. So far, I have considered hash tables, tries, and radix tries. Radix tries definitely seem the most memory efficient. However, for both tries and radix tries, I'm not sure how to keep track of each node's child nodes. It seems I can either used a linked list or an 63-element array (A-Z, a-z, _, and 0-9). Which is more memory efficient and lookup efficient? Are there any other data structures I should consider? How have other people implemented this?
Noah Singer
Data Structure for Equivalent of ramfs
Re: Data Structure for Equivalent of ramfs
I wouldn't worry too much performance in the first version, just make something that works. I can personally tell you that even the simplest implementation will be a couple of magnitudes faster than a filesystem that has to access the harddisk. My kernel RAM filesystem directories contain a simple array of pairs (pointer-to-sub-file-or-directory, name) and the ramfs_directory_open function simply scans through it and does its thing. A path like /foo/bar/qux/spam.txt is simply resolved recursively and the pointer to the file itself is remembered, so the following accesses to the open file is very fast and looking it up is a one-time cost.
My OS is able to build itself from within a large kernel RAM filesystem - and its speed rivals that of a single-threaded Linux build because it doesn't have to access the harddisk. If I put the root filesystem on a harddisk, it gets an order of magnitude slower. In other words, don't worry too much about this - worry about structuring your code in a maintainable way that you can optimize later.
My OS is able to build itself from within a large kernel RAM filesystem - and its speed rivals that of a single-threaded Linux build because it doesn't have to access the harddisk. If I put the root filesystem on a harddisk, it gets an order of magnitude slower. In other words, don't worry too much about this - worry about structuring your code in a maintainable way that you can optimize later.