OSDev.org

Posted: **Mon Jun 01, 2015 8:54 am**

bigbob wrote:I think that rules out the hash-table (I don't think a dictionary together with a hash-table would be a good idea).

Hmmn, yes, but if you implement the dictionary as a hash table... I'm think along these lines, myself: The hash entries would map a hash value to a word, as per usual, and collision resolution would use the method of walking through the hash values until it finds the match. The words, in turn, would hold a list of the actual implementations of the word - in most cases, there would only be one entry, but the entry at the list's head would be the most recent one. The FORGET word would remove that entry from the list, and set the previous one as the current entry.

The only real downside to this is that it changes the behavior of FORGET to only removing the specific word, not the whole chain of words following it. If that behavior is really important as a Forth idiom, this approach won't be usable.

Posted: **Mon Jun 01, 2015 9:33 am**

Schol-R-LEA wrote: Hmmn, yes, but if you implement the dictionary as a hash table...
The only real downside to this is that it changes the behavior of FORGET to only removing the specific word, not the whole chain of words following it. If that behavior is really important as a Forth idiom, this approach won't be usable.

According to ANSI Forth all the words till the given one have to be removed from the dictionary.
You can have your own version of Forth, remove just the given word and all the words after the given word that use the given word (directly or indirectly). It's possible but I am not sure whether it is worth it or not.
I didn't implement 100% of the standard in my OS. For example, I have no "Double cells" (i.e. double-length numbers). "Double cells" was a good idea in the 16-bit era (16-bit registers), in order to have 32-bit numbers.

EDIT: the original FORGET should also be kept (i.e. have FORGET2 or something), because removing all the words after a given one is very useful. In case of a "3D rotating cube" that consist of 20 words, I load CUBE from the disk, use it, then I remove it from the dictionary by issuing "FORGET 3DVECTOR" where 3DVECTOR is the first word of the "rotating" words. It would just slow down the search, if we left words in the dictionary we don't need.

Posted: **Mon Jun 01, 2015 12:50 pm**

OK, that explains that, then. If it's an important idiom of the language, changing it would not be desirable, so my suggestion is not a workable solution. It may be possible to make it work - say, by having timestamp metadata on each word - but it probably wouldn't be worth the trouble.

Posted: **Mon Jun 01, 2015 10:58 pm**

Hi,

bigbob wrote:
Brendan wrote: If you have a word GREET that prints "Hi" and another word GREETNAME that calls GREET; and then you add a new GREET that prints "Hello"; which GREET would GREETNAME use? Would it automatically start using the new GREET, or continue using the old GREET?
The old one will be used because its address was compiled into GREETNAME.
No to mention that that was the original intention of the user who added GREETNAME.

That's good - it means you can JIT compile properly, and can use a hash table to find a linked list for GREET where the first entry in the list is the newest entry (and the old GREET is the second entry that's already been compiled into GREETNAME).

bigbob wrote:According to ANSI Forth all the words till the given one have to be removed from the dictionary.

For each word you have some sort of structure describing the word. The hash table is just a fast way to find the right structure for a world from its name. Nothing prevents you from also having a linked list (in chronological order) that can be used for fast FORGET.

When someone does FORGET GREET you'd use the hash table to find the structure for the newest GREET. Then you'd use the "linked list in chonological order" to find (and delete) all words that are newer than GREET.

Basically, for each word you'd have a structure that might contain:

The word's full name (e.g. "GREET"), use for handling collisions in hash table lookups
A link to the next entry in the hash table's linked list
A link to the previous entry in the hash table's linked list (used for deleting entries in the hash table's list faster)
A link to the next entry in the "chronological order" linked list
The high level code for the word (which may be "none" to indicate its a built-in word that only has native code)
Some statistical stuff (e.g. how often the word is used so you can determine if it's worth JIT compiling, how many things it removes from the stack so you can detect if something will crash before executing it, whatever)
The native code for the word (either for built-in words or for JIT compiled code, where NULL indicates that it isnt built-in and hasn't been JIT compiled)

Cheers,

Brendan

Posted: **Mon Jun 01, 2015 11:05 pm**

Ninja'ed by Brendan. My post contains a worked example, so I'll still post it.

Schol-R-LEA wrote:OK, that explains that, then. If it's an important idiom of the language, changing it would not be desirable, so my suggestion is not a workable solution. It may be possible to make it work - say, by having timestamp metadata on each word - but it probably wouldn't be worth the trouble.

Well, what you could do is keep the linked list but just have the hash table serve as a shortcut into it.

Something like this:

We start by defining a word GREET:

Code: Select all

: GREET (Do something) ;

To enter it into the dictionary, we first hash it. For purposes of illustration, let's use a 26-place hash table where the hash of a word is the position of it's first letter in the alphabet (it wouldn't be a good choice of hash algorithm in an actual implementation, but I hope it will make it easier to understand what I'm doing). So GREET hashes to 7. A hash table entry consists of a pointer to a list of all words whose hash value points to that position in the table. So our table looks like this:

Code: Select all

...
6) NULL
7) WORDLIST7
8) NULL
...

At address "WORDLIST7", we have a structure of the following form:

Code: Select all

(Word name, Pointer to most recent word with this name, pointer to next entry in wordlist)

So memory looks something like this :

Code: Select all

DICTIONARY:
NULL (A pointer to the most recent word in the dictionary will go here)

HASHTABLE:
...
6) NULL
7) WORDLIST7
8) NULL
...

WORDLIST7:
("GREET", FIRSTGREET, NULL)

At the address "FIRSTGREET", we establish a dictionary entry structure of the following form.

Code: Select all

(Word name, pointer to previous word in dictionary, pointer to previous word with same name, other data needed by the forth implementation)

We then point DICTIONARY at FIRSTGREET, so memory now looks like this:

Code: Select all

DICTIONARY:
FIRSTGREET

HASHTABLE:
...
6) NULL
7) WORDLIST_G
8) NULL
...

WORDLIST_G:
("GREET", FIRSTGREET, NULL)

FIRSTGREET:
("GREET", NULL, NULL, Data)

We then define a word "FOO":

Code: Select all

: FOO (Do something) ;

We add a hashtable entry and wordlist for "F" (hash table entry 6), add "FOO" too the new wordlist, and add foo to the dictionary. Memory now looks like this:

Code: Select all

DICTIONARY:
FIRSTFOO

HASHTABLE:
...
6) WORDLIST_F
7) WORDLIST_G
8) NULL
...

WORDLIST_G:
("GREET", FIRSTGREET, NULL)

FIRSTGREET:
("GREET", NULL, NULL, Data)

WORDLIST_F:
("FOO", FIRSTFOO, NULL)

FIRSTFOO:
("FOO", FIRSTGREET, NULL, Data)

We now redefine GREET:

Code: Select all

: GREET (Do something else) ;

Memory now looks like this:

Code: Select all

DICTIONARY:
SECONDGREET

HASHTABLE:
...
6) WORDLIST_F
7) WORDLIST_G
8) NULL
...

WORDLIST_G:
("GREET", SECONDGREET, NULL)

FIRSTGREET:
("GREET", NULL, NULL, Data)

WORDLIST_F:
("FOO", FIRSTFOO, NULL)

FIRSTFOO:
("FOO", FIRSTGREET, NULL, Data)

SECONDGREET:
("GREET", FIRSTFOO, FIRSTGREET, Data)

To see what happens with a hash collision, let's define a word "FLOB", memory then looks like this:

Code: Select all

DICTIONARY:
FIRSTFLOB

HASHTABLE:
...
6) WORDLIST_F
7) WORDLIST_G
8) NULL
...

WORDLIST_G:
("GREET", SECONDGREET, NULL)

FIRSTGREET:
("GREET", NULL, NULL, Data)

WORDLIST_F:
("FOO", FIRSTFOO, WORDLIST_F_NEXT)

FIRSTFOO:
("FOO", FIRSTGREET, NULL, Data)

SECONDGREET:
("GREET", FIRSTFOO, FIRSTGREET, Data)

WORDLIST_F_NEXT
("FLOB", FIRSTFLOB, NULL)

FIRSTFLOB:
("FLOB", SECONDGREET, NULL, Data)

When looking up a word, we hash it and use the hash as an index into HASHTABLE. We follow the pointer in the hashtable entry and see if the word name there matches the word we're looking for. If so, we follow the first pointer in the wordlist entry to find the most recent definition of that word. If not, we follow the second pointer in the wordlist entry to find the next word with that hash. If the second pointer is NULL, then the word we're looking for does not exist.

When forgetting a word, we follow DICTIONARY to the most recent word in the dictionary and forget it. If the words name matches that of the word we are trying to forget, we stop after we've forgotten this word. If not, we follow the first pointer in the dictionary entry to the previous word in the dictionary, and continue on, forgetting words as we go, until we get one whose name matches the word we're trying to forget. This is just like regular Forth.

The difference is that, as part of the procedure for forgetting a word, we need to maintain the wordlists referenced by the hashtable. We do this by using the second pointer in the dictionary entry. If it is NULL, there are no previous definitions of the word, we just hash the word to get to the wordlist it's in, find the word in the wordlist, and remove it from the wordlist. If it isn't NULL, then there's a previous definition of the word. We leave it in the wordlist and use the second pointer in the dictionary entry to find the previous definition. We then fill in the pointer to the word in the wordlist entry with the pointer to the previous definition.

Posted: **Tue Jun 02, 2015 12:09 am**

You are thinking too difficult. You could use kind of a "relocation" at load time.

Each new word that appears in the loaded program is assigned unique identifier and all references to the word are replaced by the identifier. Each identifier has a stack containing the word "implementations".

Adding a word pushes to the stack, using a word just accesses the first element (O(1)), and forget just pops the stack.

So you have N stacks in an array, where N is the total number of individual words in your program. Smooth and fast.

Edit: if accessing happens more often than adding/forgetting, you could even keep the currently "active" word implementations in a simple array, so the access is just one memory lookup using the word identifier. Pushing and popping would update the "active word array" to match the top-of-stack implementation.

Posted: **Tue Jun 02, 2015 12:26 am**

Thanks guys (Brendan and linguofreak)!
What you say, sounds good

Hellbender: I will think it over

Posted: **Tue Jun 02, 2015 3:06 am**

Hellbender wrote:You are thinking too difficult. You could use kind of a "relocation" at load time.

Each new word that appears in the loaded program is assigned unique identifier and all references to the word are replaced by the identifier. Each identifier has a stack containing the word "implementations".

Adding a word pushes to the stack, using a word just accesses the first element (O(1)), and forget just pops the stack.

Recall, though, that forget in Forth doesn't just forget the most recent implement implementation of the word forgotten, but also forgets every implementation that has been defined more recently than the most recent implementation of the forgotten word, whatever word that implementation belongs to.

So if we have defined the following implementation in the following order:

Foo
bar
baz
bar
flub
foo
foo
baz

If we then decide to forget bar, we forget the most recent implementation of baz, two implementations of foo, and the most recent implementation of flub in addition to bar.

So forget is a bit more involved than a single pop on a single stack. If we're doing things the slow way and not using a hash table or unique identifier to do O(1) lookup, it involves multiple pops on one stack. If we are doing O(1) lookup it involves multiple pops to multiple stacks (and at least one of the stacks is probably implemented as a linked list).

Posted: **Tue Jun 02, 2015 3:28 am**

linguofreak wrote: forgets every implementation that has been defined more recently than the most recent implementation of the forgotten word, whatever word that implementation belongs to.

Ah, ok. So then you need a "master stack" that has all word ids in it. When you pop "foo", you keep popping the master stack, and the individual word stacks that you find in it (updating the "current words array"), until you hit "foo".
So Forget is O(N), where N is the number of words that need to be forgotten.

OSDev.org

Cooperative Multitasking

Re: Cooperative Multitasking

Re: Cooperative Multitasking

Re: Cooperative Multitasking

Re: Cooperative Multitasking

Re: Cooperative Multitasking

Re: Cooperative Multitasking

Re: Cooperative Multitasking

Re: Cooperative Multitasking

Re: Cooperative Multitasking