Logging Database Engine

01000101 · Post by **01000101** » Wed Nov 26, 2008 9:02 pm

I've been giving a lot of thought towards a good logging routine, but they all seem to require an unreasonable amount of resources. I want to design a system where I can write logs to the hard drive with a time-stamp and be able to find that same info (searching by the time-stamp) at a later date. I'm quite stumped over this and have never worked with database development before. How do I go about this? Any algorithms that are somewhat simple (as I don't need an incredibly complex design) that I should be looking into?

I already have the HDD write/read support and the logging structures and fillers in place, I just don't know how to organize everything.

tantrikwizard · Post by **tantrikwizard** » Wed Nov 26, 2008 9:50 pm

You probably don't need a database for logging; a simple text file should work just fine. If you already have methods to write files, then just write out the data as text.

01000101 · Post by **01000101** » Wed Nov 26, 2008 10:55 pm

Yes, but how do I sort through previously written logs without having to load a ton of logs into memory and just brute forcing through them until I find the one I'm looking for?

quok · Post by **quok** » Thu Nov 27, 2008 12:08 am

You could always use sqlite. It's public domain code. Approx 25K lines of C, ANSI compliant and all transactions are ACID. Each database ends up being a single file in your filesystem. Stack and heap requirements are pretty low, and it's damn fast. You can minimize all of that by leaving out optional features as well. It should pretty well suit your needs.

01000101 · Post by **01000101** » Thu Nov 27, 2008 12:21 am

25k?!? I'm not sure about the amount of pure code, but the C file has ~100k lines and an associated header file. Also, I'm really just looking for an algorithm or explanation, not someone else's code (even if it is public domain).

quok · Post by **quok** » Thu Nov 27, 2008 12:36 am

01000101 wrote: 25k?!? I'm not sure about the amount of pure code, but the C file has ~100k lines and an associated header file. Also, I'm really just looking for an algorithm or explanation, not someone else's code (even if it is public domain).

Meh, I pulled that number out of my @$$. Er, memory, actually. It used to be the case that sqlite was about 25K lines of code, and that was advertised on the sqlite website. But it's not there anymore, and apparently no longer holds true. Sorry about that, I should've verified it myself first.

kmtdk · Post by **kmtdk** » Thu Nov 27, 2008 1:27 am

Well, one solution is to save each db in its own file and name the file after the date (this will take up many file entries).
Another solution would be to make one file with all the db's inside. Here is an example:

Code:

pointer to next data node, data

It depends on the size of the data.

KMT dk

01000101 · Post by **01000101** » Thu Nov 27, 2008 1:34 am

An individual log will only be ~64 bytes of encoded data. I won't really be using files unless I have to, I planned on just allocating a large span of HDD space, and writing the raw data in sequence.

eg:

Code:

--- 64byte log --- 64byte log --- etc...

and was hoping there would be a memory/size efficient indexing algorithm for me to find one of those packed logs without having to search them all.
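A minimal sketch of what that packed layout implies: because every log has the same size, the location of log number `index` is pure arithmetic. The field split (8-byte timestamp plus 56 payload bytes), the 512-byte sector size, and the names `log_record`, `log_location` and `log_locate` are assumptions for illustration, not something from the post.

```c
#include <assert.h>
#include <stdint.h>

#define LOG_RECORD_SIZE 64u   /* from the post: each log is ~64 bytes */
#define SECTOR_SIZE     512u  /* assumption: 512-byte sectors */

/* Hypothetical layout: 8-byte timestamp followed by 56 bytes of payload. */
struct log_record {
    uint64_t timestamp;
    uint8_t  payload[LOG_RECORD_SIZE - sizeof(uint64_t)];
};

/* Where a record lives on disk. */
struct log_location {
    uint64_t sector;       /* absolute sector number */
    uint32_t byte_offset;  /* offset of the record inside that sector */
};

/* Map a record index to its sector and offset inside the reserved region. */
static struct log_location log_locate(uint64_t region_start_sector,
                                      uint64_t index)
{
    const uint64_t per_sector = SECTOR_SIZE / LOG_RECORD_SIZE; /* 8 records */
    struct log_location loc;

    /* The arithmetic below only holds if the struct is exactly 64 bytes. */
    assert(sizeof(struct log_record) == LOG_RECORD_SIZE);

    loc.sector = region_start_sector + index / per_sector;
    loc.byte_offset = (uint32_t)(index % per_sector) * LOG_RECORD_SIZE;
    return loc;
}
```

With these assumptions, log 9 of a region starting at sector 100 sits in sector 101 at byte offset 64. Any index structure then only needs to produce a record number, never a byte address.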

tantrikwizard · Post by **tantrikwizard** » Thu Nov 27, 2008 2:39 am

If I understand your requirements correctly, I would just write the data to a comma-separated (CSV) or XML text file, then use some existing desktop application to inspect it. Excel and Access import these formats natively, so you can use those tools to sort, filter and search. There is also a CSV ODBC driver for running SQL against the logs. Another tool I've used in the past (I can't remember the name off the top of my head, but I can look it up if you need it) comes from Microsoft's IIS SDK. It lets you run SQL queries directly against log files in any number of different formats, so you can keep them in their original format and run conversions, comparisons, selects, sorts, etc. on individual fields regardless of the log file format. Come to think of it, that was a handy tool, I need to find it for something I've been working on....

Combuster · Post by **Combuster** » Thu Nov 27, 2008 2:44 am

Databases work faster by generating indexes (AVL trees and that sort of thing), and when they add an entry they do a tree insert for each of those indexes as well. That means that looking up all entries with a certain value takes log(n)+m time: log(n) to find the first, and then a traversal over the next m values.

If all you want to do is look things up by timestamp, then you can binary search on the hard disk itself, since the data is already stored in order (and hence the timestamps are ascending).
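A rough sketch of that binary search, assuming fixed-size records with an 8-byte timestamp at the front. The `read_record_fn` callback is a hypothetical stand-in for whatever disk read routine the kernel already has, and `mem_read_record` only simulates it on an in-memory array so the logic can be tried out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed record layout: 8-byte timestamp plus 56 payload bytes. */
struct log_record {
    uint64_t timestamp;
    uint8_t  payload[56];
};

/* Hypothetical callback: read record `index` into *out, return 0 on success. */
typedef int (*read_record_fn)(void *ctx, uint64_t index, struct log_record *out);

/*
 * Lower bound: index of the first record whose timestamp is >= target.
 * Returns `count` if every record is older, UINT64_MAX on a read error.
 * Needs about log2(count) record reads.
 */
static uint64_t log_lower_bound(read_record_fn read_record, void *ctx,
                                uint64_t count, uint64_t target)
{
    uint64_t lo = 0, hi = count;
    struct log_record rec;

    assert(read_record != NULL);
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo) / 2;  /* avoids overflow of lo + hi */
        if (read_record(ctx, mid, &rec) != 0)
            return UINT64_MAX;
        if (rec.timestamp < target)
            lo = mid + 1;   /* answer lies strictly after mid */
        else
            hi = mid;       /* mid itself may be the answer */
    }
    return lo;
}

/* Stand-in for a disk driver: ctx points at an in-memory record array. */
static int mem_read_record(void *ctx, uint64_t index, struct log_record *out)
{
    *out = ((const struct log_record *)ctx)[index];
    return 0;
}
```

To fetch every log in a time window, find the lower bound of the start time and read forward until a timestamp passes the end time. That gives the same log(n)+m cost as an index lookup, without storing any index.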

DeletedAccount · Post by **DeletedAccount** » Thu Nov 27, 2008 2:52 am

Hi,
You should check out the famous book Advanced Unix Programming. In the last chapter, they implement a tiny database-like utility. It's based on the btree. Implementing a btree should not be such a big issue, as you have proceeded so far with your OS development.

Regards
Shrek

pcmattman · Post by **pcmattman** » Thu Nov 27, 2008 4:33 am

Or you could put a filesystem on the device, and have a file for each timestamp. Probably not worth the effort though, and might end up being too slow or using up too many resources anyway.

jal · Post by **jal** » Thu Nov 27, 2008 6:01 am

Combuster wrote: If all you want to do is look things up by timestamp, then you can binary search on the hard disk itself, since the data is already stored in order (and hence the timestamps are ascending).

That's what I would have proposed, if you hadn't beaten me to it :). It could be slightly optimized depending on need, e.g. with a meta-b-tree for dates, or the like.

JAL
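One way to picture that kind of optimization, as a sketch only: this is not a b-tree but a simpler sparse index, swapped in for brevity. It keeps the timestamp of every 1024th record in memory, so the on-disk binary search only has to cover one 1024-record block. The stride and the names `sparse_index`, `record_range` and `sparse_index_narrow` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INDEX_STRIDE 1024u  /* assumption: one index entry per 1024 records */

/* In-memory sparse index: first_ts[i] is the timestamp of record i * INDEX_STRIDE. */
struct sparse_index {
    const uint64_t *first_ts;
    size_t entries;
};

/* Half-open range of record numbers [begin, end) left for the disk search. */
struct record_range {
    uint64_t begin;
    uint64_t end;
};

/*
 * Narrow the search for the first record with timestamp >= target.
 * The on-disk lower-bound search then only runs over [begin, end);
 * a result equal to `end` means the answer is the record at `end`
 * (or no record at all when `end` equals record_count).
 */
static struct record_range sparse_index_narrow(const struct sparse_index *idx,
                                               uint64_t record_count,
                                               uint64_t target)
{
    struct record_range r = {0, 0};
    size_t lo = 0, hi;

    assert(idx != NULL);
    hi = idx->entries;
    /* Count the index entries whose timestamp is strictly below target. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx->first_ts[mid] < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return r;  /* even record 0 is not older than target */

    r.begin = (uint64_t)(lo - 1) * INDEX_STRIDE;
    r.end = (uint64_t)lo * INDEX_STRIDE;
    if (r.end > record_count)
        r.end = record_count;
    return r;
}
```

At 8 bytes per 1024 records, a million logs need roughly 8 KB of index, and the index can be rebuilt at boot by reading one record per block.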

thepowersgang · Post by **thepowersgang** » Thu Nov 27, 2008 6:45 am

Either use a text-based log file, with the timestamp and then the text, ending in a newline, or use a linked list with an offset pointer that doubles as the string length.
E.g.

Code:

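#include <time.h>
struct fileLL {
   time_t timestamp;  /* when the entry was logged */
   int strlen;        /* length of string[]; doubles as the offset to the next entry */
   char string[];     /* log text (flexible array member) */
};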

JamesM · Post by **JamesM** » Thu Nov 27, 2008 11:43 am

thepowersgang wrote:Either use a text based log file, with the timestamp and then the text, ending in a newline or use a linked list with an offset pointer that doubles as the string length.
E.g.
Code:
struct fileLL {
   time_t timestamp;
   int strlen;
   char string[];
};

A linked list is a horrible idea! It has O(n) AND theta(n) time complexity for searching, and fast searching is the OP's primary aim. I'd agree with Combuster that using a raw-ish filesystem and a binary search may be the best idea. The thing to think about would be "what happens when the disk/partition gets full?" The contents are binary-searchable because of the implicit ordering imposed on them by virtue of being written in the order they should be searched in (time occurred).

When the disk is full however, what happens then? Erase the contents and start over? These things need to be considered too.

James
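One possible answer to the "disk full" question, sketched under assumptions: treat the reserved region as a ring and overwrite the oldest log once it fills up. This is only one option (erasing and starting over is another), and the `log_ring` structure and function names are hypothetical. The ring state would also have to be stored somewhere persistent, for example in a header sector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bookkeeping for a log region used as a ring of fixed-size slots. */
struct log_ring {
    uint64_t capacity;  /* number of record slots in the region */
    uint64_t head;      /* physical slot holding the oldest record */
    uint64_t count;     /* number of valid records, at most capacity */
};

/* Pick the physical slot for the next record and update the ring state. */
static uint64_t log_ring_push(struct log_ring *ring)
{
    uint64_t slot;

    assert(ring != NULL && ring->capacity > 0);
    slot = (ring->head + ring->count) % ring->capacity;
    if (ring->count < ring->capacity)
        ring->count++;                                   /* still filling up */
    else
        ring->head = (ring->head + 1) % ring->capacity;  /* oldest overwritten */
    return slot;
}

/* Translate a logical index (0 = oldest record) into a physical slot. */
static uint64_t log_ring_slot(const struct log_ring *ring, uint64_t logical)
{
    assert(ring != NULL && logical < ring->count);
    return (ring->head + logical) % ring->capacity;
}
```

Logical order is still time order, so a binary search keeps working as long as it runs over logical indices 0..count-1 and translates each probe through `log_ring_slot`.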

OSDev.org
