File System Design.
Posted: Fri Oct 16, 2015 7:16 pm
I've been offline and not updating GIT for the past week or so, instead I have been designing and testing a new File System.
My understanding of a File System may be a little flawed, as the only one I implemented thus far has been FAT, so bare with me just a little bit.
The design I have come up with take some points from FAT, EXT, and ELF - along with some personal assumptions.
The goal of this file system is quickly being able to open a file without searching through a directory tree (although you still could) - meaning that after initializing cache you only have one cluster to read before the actual file data. The way this works is through a hash table which hashes the full file path and associates said hash with the file block.
Before I go any further here are the data structures:And the Hash function:Currently I have found no conflicts with my hash method (I hashed the full path to all 1.5 million files on my internal HDDs with 0 collisions. [which took ~700 ms - took longer to search for collisions...])
And am currently about 20-30% done with the utility for accessing the FS in *nix. (Able to create an image and read existing files - though cannot add new files...)
The hash table, string table, and root directory are stored as files on the disk - so they can be dynamically allocated (couldn't decide on a fixed size and dynamic seems better IMO)
The current design allows for a disk anywhere from 4KB to 32TB with 1 to 64 Sectors per cluster respectfully - though it would be easy to increase this to 16PiB but it wastes a-lot of space (128MB at 32TB). (note at 4KB you have no free space on disk as it is all allocated for the FS)
So to access a file you should:
1) Hash the full path
2) Search the hash table
3) Read file block
4) read file clusters...
To add a file you should:
1) Hash the full path of the sub directory
2) Search hash table
(if not found ERROR or search DIR tree)
3) Create new file block
4) Write file to disk (while setting Cluster Number in file block / slave block(s))
5) Add string (file name) to string table
6) Read sub dir file block
7) Read sub dir clusters (usually only need the last cluster)
Find blank entry <- Smiley not supposed to be there...
9) Insert new entry.
A) Hash the full path of the new file
B) Insert Hash Table Entry
The root directory is stored in the Hash Table as "/", all other directories are stored as "/dir" (no ending '/')
So, to the main point of this post...
Since my knowledge of File Systems is scarce, I would really appreciate feedback on this design.
Am I missing something important?
Do I have something I shouldn't?
etc...?
NOTE: I got side-tracked a few times writing this... So it may be missing some [important] info.
My understanding of a File System may be a little flawed, as the only one I implemented thus far has been FAT, so bare with me just a little bit.
The design I have come up with take some points from FAT, EXT, and ELF - along with some personal assumptions.
The goal of this file system is quickly being able to open a file without searching through a directory tree (although you still could) - meaning that after initializing cache you only have one cluster to read before the actual file data. The way this works is through a hash table which hashes the full file path and associates said hash with the file block.
Before I go any further here are the data structures:
Code: Select all
typedef struct BFS_InfoBlock {
uint8_t JumpCode[3];
uint8_t SectorsPerCluster;
char BFS_Signiture[4]; // "BFS1"
uint16_t BytesPerSector;
uint32_t TotalClusters;
uint32_t HashFileBlock;
uint32_t RootFileBlock;
uint32_t StringTableFileBlock;
uint32_t BATCluster;
uint16_t ClustersPerBAT;
char VolumeLabel[32];
uint8_t BootCode[446];
uint16_t BootSigniture;
} __attribute__((packed)) BFSib_t, *BFSib_p;
typedef uint32_t* BAT_p; // Binary Allocation Table
typedef struct BFS_FileBlock {
uint32_t Parrent;
uint32_t ClusterChain[116];
uint32_t Flags;
uint16_t UID;
uint16_t GID;
uint32_t TimeStampCreated;
uint32_t TimeStampModified;
uint32_t HashValue[3];
uint64_t SizeInBytes;
uint32_t StringBlockOffset;
uint32_t SlaveChainList; // FFFFFFFF = appended in this cluster, 0 = none
} __attribute__((packed)) BFSfb_t, *BFSfb_p;
typedef struct BFS_SlaveBlock {
uint32_t ClusterChain[127];
uint32_t SlaveChainList; // FFFFFFFF = appended in this cluster, 0 = none
} __attribute__((packed)) BFSsb_t, *BFSsb_p;
typedef struct BFS_DirectoryEntry {
uint32_t FileBlockCluster;
uint32_t StringBlockOffset;
uint32_t Flags;
uint16_t UID;
uint16_t GID;
} __attribute__((packed)) BFSde_t, *BFSde_p;
typedef struct BFS_DirectoryTable {
BFSde_t DirectoryEntry[32];
} __attribute__((packed)) BFSdt_t, *BFSdt_p;
typedef struct BFS_HashEntry {
uint32_t Hash[3];
uint32_t FileBlockCluster;
} __attribute__((packed)) BFShe_t, *BFShe_p;
typedef struct BFS_HashTable {
BFShe_t HashEntry[32];
} __attribute__((packed)) BFSht_t, *BFSht_p;
typedef char* BFSst_p; // String Table
Code: Select all
uint8_t temp;
hash[0] = 2147481863;
hash[1] = 1147440961;
hash[2] = 4180001983;
while((temp = (uint8_t)*input++)) {
hash[0] += temp;
hash[1] -= temp;
hash[2] /= temp;
hash[0] ^= hash[1];
hash[1] ^= hash[2];
hash[2] *= hash[0];
}
And am currently about 20-30% done with the utility for accessing the FS in *nix. (Able to create an image and read existing files - though cannot add new files...)
The hash table, string table, and root directory are stored as files on the disk - so they can be dynamically allocated (couldn't decide on a fixed size and dynamic seems better IMO)
The current design allows for a disk anywhere from 4KB to 32TB with 1 to 64 Sectors per cluster respectfully - though it would be easy to increase this to 16PiB but it wastes a-lot of space (128MB at 32TB). (note at 4KB you have no free space on disk as it is all allocated for the FS)
So to access a file you should:
1) Hash the full path
2) Search the hash table
3) Read file block
4) read file clusters...
To add a file you should:
1) Hash the full path of the sub directory
2) Search hash table
(if not found ERROR or search DIR tree)
3) Create new file block
4) Write file to disk (while setting Cluster Number in file block / slave block(s))
5) Add string (file name) to string table
6) Read sub dir file block
7) Read sub dir clusters (usually only need the last cluster)
Find blank entry <- Smiley not supposed to be there...
9) Insert new entry.
A) Hash the full path of the new file
B) Insert Hash Table Entry
The root directory is stored in the Hash Table as "/", all other directories are stored as "/dir" (no ending '/')
So, to the main point of this post...
Since my knowledge of File Systems is scarce, I would really appreciate feedback on this design.
Am I missing something important?
Do I have something I shouldn't?
etc...?
NOTE: I got side-tracked a few times writing this... So it may be missing some [important] info.