Hi,
tsdnz wrote:Righty, have decided to use 1 x 256 bucket holding a pointer to a linked-list.
256 * sizeof(QWORD) = 524,288 Bytes
Each IPv4Connection is 42 bytes.
Each pointer (first line) points to a linked-list, each linked-list contains 1024 entries.
So 1024 * 42 = 43,008 bytes per linked-list.
43,008 * 256 = 11,010,048 bytes for the initial setup.
As I am working on IPv4, the TCP Source address bytes will be added together.
What are your thoughts?
Singly linked lists means an extra 8-byte pointer for each of the 12 million entries, and (for 12 million entries) that will cost an extra 96 MB of RAM just for the "next" pointers. It also means you're no longer doing sequential reads to iterate through the list and you're doing "pointer chasing" instead. This adds up to a pre-fetching nightmare - an average of 50 thousand entries per list means an average of 25 thousand ("un-pre-fetchable") potential cache misses per search.
I'd be tempted to use the "least significant n bits" as a hash; with an array of IP addresses and an array of structures for each hash. I'd want to ensure that the arrays don't have many entries (you'll see why soon); which means that for 12 million entries I'd want the number of hashes to be around 2 million. This works out to using the least significant 21 bits of an IP address as the hash (possibly XORed with the remaining/highest 11 bits for fun).
To keep the size of the main table small I'd only have one pointer per entry; which would point to a structure containing details for that hash (pointer to array of structures, number of entries in both arrays) and would be immediately followed by the array of IP addresses.
The main table would cost "sizeof(void *) * 2 million entries" or 16 MiB. On average, for 12 million IP addresses, you'd have 6 entries per array. That'd be a 12 byte header (8 byte pointer and 4 byte "number of entries" count) and then 4 bytes per IP address; or 36 bytes per list on average (not including the array of unknown structures). This also means that all lists that have 13 or fewer IP addresses fit in a single cache line with the header; and because you're expecting 6 entries per list on average it's unlikely you'll have many lists that use 2 or more cache lines.
Note: I'm assuming that the header is aligned on a (64-byte) cache line boundary here - if you're using a generic "malloc()" thing then it'd probably be at least 50% more cache lines touched.
2 million lists at 36 bytes per list is 72 MiB, plus the main table's 16 MiB works out to 88 MiB for everything used for searching (for 12 million entries); not including the array of unknown structures. You're also only touching 2 cache lines for almost all searches.
To make this a bit easier to understand:
Code: Select all
%define HASH_BITS 21
typedef struct {
/* ??? */
} UNKNOWN_DATA;
typedef struct {
uint32_t listSize;
UNKNOWN_DATA *unknownDataArray;
} ENTRY_HEADER;
ENTRY_HEADER *mainTable[1 << HASH_BITS)];
UNKNOWN_DATA *findDataForIPaddress(uint32_t IPaddress) {
uint32_t hash;
ENTRY_HEADER *header;
uint32_t *IParray;
unsigned int i;
hash = (IPaddress ^ (IPaddress >> HASH_BITS)) & ((1 << HASH_BITS) - 1);
header = mainTable[hash]; // First cache line touched
IParray = (void *)header + sizeof(ENTRY_HEADER);
// All memory accesses past here likely to be
// contained in same/second cache line
for(i = 0; i < header->listSize; i++) {
if(IParray[i] == IPaddress) {
return &(header->unknownDataArray[i]);
}
}
return NULL;
}
Cheers,
Brendan