Logging Database Engine

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
User avatar
01000101
Member
Member
Posts: 1599
Joined: Fri Jun 22, 2007 12:47 pm
Contact:

Re: Logging Database Engine

Post by 01000101 »

I'm very unfamiliar with all of this, so I may need some clarification on how Combuster's idea could be implemented. As JamesM said, I am looking for the quickest and smallest method of achieving basic logging as I want to create as little overhead as possible when displaying logs back to the administrator.

Is the suggestion pro index->data, or just lay all the data out and search through it all searching for a timestamp?

I just took a look at my log structure, and it looks like I will be using 32-byte logs instead of 64, so it fits 16 logs/sector on the HDD. Each log includes a timestamp in the first 4 bytes of the log, so they will always be on a 32-byte boundary and the log-size will never fluctuate.
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Re: Logging Database Engine

Post by JamesM »

01000101 wrote:I'm very unfamiliar with all of this, so I may need some clarification on how Combuster's idea could be implemented. As JamesM said, I am looking for the quickest and smallest method of achieving basic logging as I want to create as little overhead as possible when displaying logs back to the administrator.

Is the suggestion pro index->data, or just lay all the data out and search through it all searching for a timestamp?

I just took a look at my log structure, and it looks like I will be using 32-byte logs instead of 64, so it fits 16 logs/sector on the HDD. Each log includes a timestamp in the first 4 bytes of the log, so they will always be on a 32-byte boundary and the log-size will never fluctuate.
Yes, my (and Combuster's, if I interpreted correctly) solution is to lay the data out on disk and perform a binary search over it (wiki it. It has O(log2(n)) time complexity which is the theoretical minimum for searching). The problem occurs when the disk is full. What do you do?

You have to decide how to handle that :)
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: Logging Database Engine

Post by Combuster »

Rotate the logs.

Talking about rotation: ring buffer, anyone? Since it's a linear translation you can still binary search on it.

Edit: that and you get the record on having the largest ring buffer ever :D
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
01000101
Member
Member
Posts: 1599
Joined: Fri Jun 22, 2007 12:47 pm
Contact:

Re: Logging Database Engine

Post by 01000101 »

Combuster wrote:Rotate the logs.

Talking about rotation: ring buffer, anyone? Since it's a linear translation you can still binary search on it.
After programming a few network cards, I'm very familiar with ring buffers. I really like that idea to. It would also allow for the fewest amount of logs to be destroyed on a full buffer (as it will only overwrite the oldest set).

Couldn't I just search from the beginning of the allocated HDD log space and keep going until I find the log (by date) or until I circle around and return -1?
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Re: Logging Database Engine

Post by JamesM »

Couldn't I just search from the beginning of the allocated HDD log space and keep going until I find the log (by date) or until I circle around and return -1?
That would require linear time, which is slow. "Linear time" means that the time taken to find an entry is proportional to the number of entries. With a binary search, the time taken to find an entry is proportional to the log to base 2 of the number of entries, which is much better. You should definately use that over linear search!

James
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: Logging Database Engine

Post by Combuster »

01000101 wrote:Couldn't I just search from the beginning of the allocated HDD log space and keep going until I find the log (by date) or until I circle around and return -1?
If you want to read one million sectors rather than only 20, go ahead :wink:

Edit: JamesM beat me to the point.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
01000101
Member
Member
Posts: 1599
Joined: Fri Jun 22, 2007 12:47 pm
Contact:

Re: Logging Database Engine

Post by 01000101 »

I seriously doubt I will only be loading 20 sectors for a normal search.

Think about it this way:

Pseudo(C)-code:

Code: Select all

int32 start_sector = 0;
int32 hi = num_logs_per_sector; // probably 32
int32 lo = 0;  // start log per sector
while(1)
{
    load_sector(start_sector++);
    int32 mid = (lo + hi) / 2;
    if(log[mid].date > somedate){hi = mid;}
    else if(log[mid].date < somedate){lo = mid+1;}
    else{return mid;}  // we found it
}
unless I load a bunch of sectors at once (still defeating your point), I don't see how I can go throughout sectors effectively without just scaling 1by1 and binary searching each sector.
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: Logging Database Engine

Post by Combuster »

Oh well.
Binary search takes at most half the amount of seeking distance since each step eliminates half the remainder.
Total settling time on a cylinder is reduced to insignificance
Total rotational delays are reduced to insignificance (for each track read -> log(n) tracks read)
Total transfer time reduced to insignificance
Lower store-and-forward delays due to shorter data reads.

It may however be faster to read out a whole track once you know which one.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
samoz
Member
Member
Posts: 59
Joined: Sun Jun 01, 2008 1:16 pm

Re: Logging Database Engine

Post by samoz »

I'm currently in a data structures and feel I might be able to give some hints.

One way you could go is simply keep them sorted and then using binary searches. This may or may not be fast enough though.

Possibly look into using hashes to store and search for them?

Or if you want to fool with them, B-trees are a good choice as well. There are ways you can optimize them so they maximize efficiency whenever a disk read must occur.

Not sure if that's what you had in mine, but those are three approaches you might want to look into further.
Hexciting: An open source hex editor for the command line.
https://sourceforge.net/projects/hexciting/
DeletedAccount
Member
Member
Posts: 566
Joined: Tue Jun 20, 2006 9:17 am

Re: Logging Database Engine

Post by DeletedAccount »

Just refreshing my memory :) , correct me if I am wrong .

One of the properties of binary search tree is that the left node value < parent node , right node value > parent node . So if the following input sequence is given

4 9 10 15 34

the tree will become , (4)-> (9) -> ( 10) -> (15 ) -> ( 34)->nill , i have shown only the right nodes , not the left nodes as they are nill . This is similar to a linked list where when there are n elements , the search time is rougly factor of n , worst case time for the search of a binary search tree . The search function is usually written recursively as follows

Code: Select all

int search(TREE *t ,int elment)
{
  // not found  
  if( *t == NULL )
   {
     return  INT_MIN; 
   }
  // found!
   if( (*t) -> value == elment)
  {
     return 1;
  }
  else
  {
     //if less proceed recusively towards left node 
     if( (*t)-> value  < elment)
       {
          return  search((*t)->lchild,elment);
       }
     else
      {
          // towards the right node
          return  search((*t)->rchild,elment);
       }  
   }
}

So balacing the tree is required so that the search time remains a factor of log n . If you see the function above , it develops like a geometric progression and complexity can be easily derived by using the series summation. There are two kinds of balanced trees which i am aware of they are AVL trees and Red Black trees .

The AVL tree node has an extra field in the node called the balance factor , it is recursively defined as the height of the right subtree - the height of the left subtree. In an AVL tree the balance factor of each node should be either 1 , 0 or -1 . To perform the balancing act we need to perform rotations. There are basically two kinds of rotations . i left rotation and right rotation . The easiest way to understand rotations is as follows.
left rotation - swap right child and parent node , make sure that properties of binary search tree is maintained
right rotation - swap left chid and parent node , make sure the properties of binary search tree is maintained .While insertion is performed balance factor is checked and if its 1,0,-1 then do nothing . If more than 1 ie 2 , then right subtree is more than the left subtree , then left rotation needs to be performed and balance factor needs to be calculated again and recusilvely peform the balancing and so on ... its symmetric

Red Black trees are similar to AVL trees except that they they maintain additions value called color which can be either RED or BLACK . The Red black trees show the following properties :

1) Root node is black
2) Leaves are black
3) every path from a node to leaves have equal number of black nodes
4) if a node is red , the its children are black .

These properties make sure that the tree remains balanced . I forgot the insertion and deletion algorithms . need to refer them again :oops: .... My memory failed .

B-Trees are just fine for creating tiny datbase sort of utilites , they are an extension to the the 2-3 tree concept.They were specifically created with disk access in mind . ie A B- tree of order M has ,atmost M pointers (children) and at least M /2 pointers , m-1 data elements if non leaf node and the values are arranged in a similar way as in binary search tree . Additon , Deletion etc are similar excepet that we need to be careful in maintaining the tree structure .

This link might be useful :http://www.codeproject.com/KB/database/btreedbclass.aspx

Regards
Shrek
Post Reply