Page 1 of 1

Data Structure for Module Storage

Posted: Sun Aug 25, 2013 3:15 pm
by singerng
I'm identifying modules like this:

<type> <class> <subclass> <bus> <vendorID> <deviceID>

The reason for this format is that when a bus driver, such as PCI or USB, finds a new device, it needs to look up this device driver in my OS's "module registry" based on it's vendor ID and device ID.

I need to find an efficient way to store the loaded modules in memory. When a new volume is connected to the VFS, the VFS has to search through the list of FS drivers that have been loaded. The same thing is done when a file is executed, the kernel has to figure out what executable format to use. I need to be able to find lists of modules that have a specific type, class, subclass, bus, vendorID, or deviceID. What do you think is the fastest data structure I can use for this purpose? What is the most memory efficient? It seems very slow to have a list of modules and search through them. It also seems memory inefficient to have dynamic lists or linked lists preallocated for every level of the hierarchy (as in having arrays of lists of pointers to all modules of each type, class, subclass, etc.)

Re: Data Structure for Module Storage

Posted: Sun Aug 25, 2013 3:19 pm
by dozniak
I heard variations of trees produce close to optimal O(log n) search times.
Hash map, radix tree, Patricia trie, rbtree... you name it.

Re: Data Structure for Module Storage

Posted: Sun Aug 25, 2013 10:47 pm
by Nessphoro
Skip list.

Re: Data Structure for Module Storage

Posted: Sun Aug 25, 2013 10:52 pm
by bluemoon
For some specific usage modules that require frequent scanning, I would suggest extract/rebuild a sub-tree or sub-list, which only include interested modules, from the master tree upon changes.

Since changes is not often, it's a way to optimize by usage pattern.

Re: Data Structure for Module Storage

Posted: Mon Aug 26, 2013 9:24 am
by h0bby1
the theory with that kind of things is to reduce the number of elements to parse for each search, and to store with a tree structure, to organize the data structure in which you store the infos in 'nodes' with a system that can allow to find quickly in which node you need to search to find the particular structure

the quickest and least memory savy way would be to make up a table of 65556 vendors id and product_id, and just get the modules stored at the index of the table for its vendor id and product id, same for class/sub class id but it leave lot of unused memory space

otherwise you can make up a tree structure like vendor ids->product _id-> drivers, like this you can alread limit to search into a list of vendor id, and then into a list of product id and you can eliminate lot of the search, and each vendor id is only stored once, same for class id and sub class id, they also imply a hierachy and you can easily limit the number of search not to have to parse all module all the time

in 3D engine they often use an octree to limit the number of polygon to search and process , it is a general purpose concept that can apply to many things, even color reduction, the idea is that you initiallize a box into which all the dimensions of each elements will fit, and then create sub boxes that are half the size as child of the box until you reach a maximum number of element / box, it can allow a flexible strucure to store some elements of a large array in a way that some elements can still be found quickly given the values that are used to create the tree

like a general algorythm could be to create a 2D quad tree with minimum and maximum value for each dimension set as the maximum value you can have for vendor id/product id, and then creating leafing node until each node contain a maximum of X elements, and then you can find easily the node in which the drivers are stored withing a list of maximum X elements, given the vendor id and productid, same can be done for class id in 3D dimension with class/sub class/revision, or even if more infos are to be stored for the driver, but as the stucture vendor id/prouct id, class id/sub class id already imply a hierarchy it's not the most efficient