I sectioned my project into two configurable variables: small data and huge data.
small data - for testing the algorithm's correctness.
huge data - for stress-testing with large inputs to see whether any anomalies appear.
Either way, I fed the values 0-100 (small data) and data sets of varying size (1000, 2000, ..., 14000, 15000) into the tree.
With the huge data, feeding thousands of values in incrementing order is the worst case: the tree ends up horribly, completely skewed to one side and performs just like a list. Approaching the 15000-element set, it took quite a few seconds to insert all 15000 values.
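For reference, here is a minimal sketch of the kind of plain BST insert I mean; the names (node, insert) are mine, not necessarily the real program's:

Code:
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    int key;
    struct node *left, *right;
} node;

/* Plain BST insert: walk down until an empty slot is found. */
node *insert(node *root, int key) {
    if (root == NULL) {
        node *n = malloc(sizeof *n);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = insert(root->left, key);
    else
        root->right = insert(root->right, key);
    return root;
}

int main(void) {
    node *root = NULL;
    /* Ascending feed: every insert goes right, so the "tree" is really
     * a 15000-node list and the last insert walks past all the others. */
    for (int key = 1; key <= 15000; key++)
        root = insert(root, key);
    printf("root key: %d\n", root->key);  /* 1: everything hangs right */
    return 0;
}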
Now, to make it most efficient, I insert recursively by bisection: find the median of the range, insert it, then recurse on each half. Doing so maximized the insertion efficiency (in other words, minimized the insert time), as sketched below.
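The bisection fill itself can be as small as this, reusing the node type and insert() from the sketch above (bisectInsert is my name for it, not necessarily the original's):

Code:
/* Insert the median of [lo, hi] first, then recurse on each half.
 * The halves' medians become the root's children, and so on, so a
 * sorted range builds a balanced tree of depth about log2(n). */
node *bisectInsert(node *root, int lo, int hi) {
    if (lo > hi)
        return root;
    int mid = lo + (hi - lo) / 2;
    root = insert(root, mid);                /* median first    */
    root = bisectInsert(root, lo, mid - 1);  /* then lower half */
    root = bisectInsert(root, mid + 1, hi);  /* then upper half */
    return root;
}

/* Usage: root = bisectInsert(NULL, 1, 15000); the root key is 7500. */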
But man, I was surprised by how big a difference there is between these two insertion methods. I dedicated one variable, insertIterationCounter, which increments each time the function recursively walks through a tree node while inserting a single value.
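The instrumentation is just a counter reset before each insertion and bumped once per existing node the walk passes through. This is my guess at the shape, assuming the insert() above; with this convention a value placed at depth d logs d, which is what the powers of two in the log further down suggest (counting the final attachment step too would shift everything by one):

Code:
int insertIterationCounter;    /* reset to 0 before every insertion */

/* Same walk as insert() above, counting each existing node visited. */
node *insertCounted(node *root, int key) {
    if (root == NULL) {        /* empty slot found: attach, no count */
        node *n = malloc(sizeof *n);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    insertIterationCounter++;  /* walked through one more node */
    if (key < root->key)
        root->left = insertCounted(root->left, key);
    else
        root->right = insertCounted(root->right, key);
    return root;
}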
When a data set of, say, 15000 is fed incrementally one by one, the last and largest value, 15000, takes 15000 iterations to insert. Picking the median and then simultaneously incrementing and decrementing outward improves things only by a factor of 2: the tree just becomes two huge branches of 7500 nodes each.
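For comparison, that two-ended feed would look roughly like this (again my sketch, reusing insert() from above). Each half still arrives in sorted order, so each subtree degenerates into a spine:

Code:
/* Hypothetical two-ended fill: median first, then alternately one
 * step up and one step down. The right subtree gets 7501, 7502, ...
 * in ascending order and the left subtree 7499, 7498, ... in
 * descending order, so both are ~7500-node spines: only a
 * factor-of-2 win over the plain sorted feed. */
node *twoEndedFill(int n) {               /* n = 15000 in the test */
    int mid = (1 + n) / 2;
    node *root = insert(NULL, mid);
    for (int d = 1; mid + d <= n; d++) {
        root = insert(root, mid + d);     /* ascending side  */
        if (mid - d >= 1)
            root = insert(root, mid - d); /* descending side */
    }
    return root;
}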
But with the median approach, the deepest insertion walk took only 13 iterations once all 15000 nodes were filled. With a set this large and a maximum of only 13 iterations, I thought something was wrong with my algorithm, so I checked with my treePrint function and the log file output, and indeed the tree had all 15000 nodes inserted.
So I counted the number of values that went through 13 iterations before being inserted: 2048 (as the log below shows)! Which means the bottom level of the binary tree is about 2048 nodes wide!! Then the 13 iterations made sense: the node count doubles at each level, so 13 levels below the root are enough for well over 15000 nodes.
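Two quick sanity checks along the lines of what treePrint confirmed for me; countNodes and maxDepth are hypothetical helpers of mine, assuming the node type above:

Code:
/* Count every node, to confirm all 15000 made it into the tree. */
int countNodes(node *t) {
    return t ? 1 + countNodes(t->left) + countNodes(t->right) : 0;
}

/* Depth of the deepest node, with the root at depth 0. A perfect
 * tree of depth 13 holds 2^14 - 1 = 16383 nodes, so 15000 nodes
 * can indeed all fit within depth 13. */
int maxDepth(node *t) {
    if (t == NULL)
        return -1;
    int l = maxDepth(t->left);
    int r = maxDepth(t->right);
    return 1 + (l > r ? l : r);
}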
Performed this way, the insertion took less time than it takes to blink an eye.
Efficiency compared:
Incrementing data feed: max iterations for one insert: through 15000 nodes.
Recursive insertion by median: max iterations for one insert: through 13 nodes.
Grouping the inserted values by number of iterations, these are the statistics I got. The doubling is clean up through depth 10 (2, 4, 8, ..., 1024), but depth 11 jumps to 4096, so there might still be some problem or skew toward the end; overall, though, it seems to be working fairly well. (One caveat on reading the counts: grep ": 1" also substring-matches ": 10" through ": 13", so the 11266 figure is 1024 + 4096 + 4096 + 2048 = 11264 of those plus only 2 true one-iteration entries.)
Code:
[root@localhost cs.dev]# cat bTree-big-oh-artificial-bisect-verbose.log | grep "00014000" -A 15000 | grep ": 13" | wc -l
2048
[root@localhost cs.dev]# cat bTree-big-oh-artificial-bisect-verbose.log | grep "00014000" -A 15000 | grep ": 12" | wc -l
4096
[root@localhost cs.dev]# cat bTree-big-oh-artificial-bisect-verbose.log | grep "00014000" -A 15000 | grep ": 11" | wc -l
4096
[root@localhost cs.dev]# cat bTree-big-oh-artificial-bisect-verbose.log | grep "00014000" -A 15000 | grep ": 10" | wc -l
1024
[root@localhost cs.dev]# cat bTree-big-oh-artificial-bisect-verbose.log | grep "00014000" -A 15000 | grep ": 9" | wc -l
512
[root@localhost cs.dev]# cat bTree-big-oh-artificial-bisect-verbose.log | grep "00014000" -A 15000 | grep ": 8" | wc -l
256
[root@localhost cs.dev]# cat bTree-big-oh-artificial-bisect-verbose.log | grep "00014000" -A 15000 | grep ": 7" | wc -l
128
[root@localhost cs.dev]# cat bTree-big-oh-artificial-bisect-verbose.log | grep "00014000" -A 15000 | grep ": 6" | wc -l
64
[root@localhost cs.dev]# cat bTree-big-oh-artificial-bisect-verbose.log | grep "00014000" -A 15000 | grep ": 5" | wc -l
32
[root@localhost cs.dev]# cat bTree-big-oh-artificial-bisect-verbose.log | grep "00014000" -A 15000 | grep ": 4" | wc -l
16
[root@localhost cs.dev]# cat bTree-big-oh-artificial-bisect-verbose.log | grep "00014000" -A 15000 | grep ": 3" | wc -l
8
[root@localhost cs.dev]# cat bTree-big-oh-artificial-bisect-verbose.log | grep "00014000" -A 15000 | grep ": 2" | wc -l
4
[root@localhost cs.dev]# cat bTree-big-oh-artificial-bisect-verbose.log | grep "00014000" -A 15000 | grep ": 1" | wc -l
11266
[root@localhost cs.dev]#