The Most Common Subject Words In This Forum

Questions, comments, and suggestions about this site should go here.
Kevin McGuire
Member
Posts: 843
Joined: Tue Nov 09, 2004 12:00 am
Location: United States

Post by Kevin McGuire »

The Most Common Subject Words In This Forum

Over 8900 thread titles were extracted from the entire development forum. Each title is split into words on spaces. Only alphanumeric characters are kept in each word, and all uppercase letters are converted to lowercase. Along the way, each word's tally starts at zero on its first occurrence rather than one, so each count to the right of a word is really one higher (+1).

1. to * 718
2. in * 649
3. os * 616
4. and * 614
5. a * 562
6. kernel * 491
7. the * 449
8. with * 442
9. problem * 398
10. help * 391
11. c * 367
12. how * 364
13. memory * 318
14. mode * 301
15. i * 281
16. for * 267
17. question * 236
18. of * 235
19. on * 213
20. bochs * 195
21. my * 191
22. floppy * 188
23. paging * 184
24. driver * 181
25. pmode * 177
26. about * 176
27. is * 175
28. code * 168
29. system * 166
30. grub * 164
31. from * 160
32. what * 159
33. problems * 152
34. an * 141
35. file * 141
36. keyboard * 133
37. not * 130
38. need * 127
39. gcc * 124
40. new * 120
41. do * 118
42. interrupt * 116
43. can * 116
44. questions * 115
45. idt * 113
46. boot * 113
47. error * 112
48. stack * 107
49. multitasking * 107
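
The counting procedure described above can be sketched roughly as follows. This is only an illustration in Python, not the program that produced the list: the function name `count_title_words` and the sample titles are made up, and this version counts a word's first occurrence as one, so it does not reproduce the +1 offset mentioned above.

```python
import re
from collections import Counter

def count_title_words(titles):
    """Count words across thread titles: split on spaces, keep only
    alphanumeric characters, and lowercase everything."""
    counts = Counter()
    for title in titles:
        for raw in title.split(" "):
            word = re.sub(r"[^A-Za-z0-9]", "", raw).lower()
            if word:  # skip tokens that were pure punctuation or empty
                counts[word] += 1
    return counts

# Hypothetical sample titles, not taken from the forum data.
sample = ["Help with my kernel!", "Kernel paging problem", "GRUB and the kernel"]
for rank, (word, n) in enumerate(count_title_words(sample).most_common(3), start=1):
    print(f"{rank}. {word} * {n}")
```

Stripping punctuation before lowercasing means "kernel!" and "Kernel" land in the same bucket, which is why one term can outrank ordinary English words.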
AndrewAPrice
Member
Posts: 2300
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Post by AndrewAPrice »

Cool!
My OS is Perception.
nick8325
Member
Posts: 200
Joined: Wed Oct 18, 2006 5:49 am

Post by nick8325 »

I like that "kernel" is more common than "the" :)
Alboin
Member
Posts: 1466
Joined: Thu Jan 04, 2007 3:29 pm
Location: Noricum and Pannonia

Post by Alboin »

nick8325 wrote:I like that "kernel" is more common than "the" :)
Well, at least we know we have our linguistic priorities straight.
C8H10N4O2 | #446691 | Trust the nodes.
Kevin McGuire
Member
Posts: 843
Joined: Tue Nov 09, 2004 12:00 am
Location: United States

Post by Kevin McGuire »

Do you guys have any ideas about what we could do with data extracted from the forums? I got bored and did it, but I figure there could be a useful idea in it somewhere.
chase
Site Admin
Posts: 710
Joined: Wed Oct 20, 2004 10:46 pm
Libera.chat IRC: chase_osdev
Location: Texas
Discord: chase/matt.heimer

Post by chase »

Filter with a list of the most common English words to get a list of the most frequent OS development subjects. That could be used to figure out where wiki articles should be expanded or created.
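
A minimal sketch of that kind of filter, assuming Python and a tiny hand-picked stop-word list. The names `STOP_WORDS` and `filter_subject_words` are hypothetical, and a real run would need a far larger list of common English words.

```python
from collections import Counter

# A tiny, hand-picked stop-word list for illustration only.
STOP_WORDS = {"to", "in", "and", "a", "the", "with", "how", "i", "for", "of", "on"}

def filter_subject_words(counts, stop_words=STOP_WORDS):
    """Drop common English words, leaving likely OS development subjects."""
    return Counter({w: n for w, n in counts.items() if w not in stop_words})

# A few counts copied from the list in the first post of the thread.
counts = Counter({"to": 718, "in": 649, "os": 616, "kernel": 491,
                  "the": 449, "memory": 318})
print(filter_subject_words(counts).most_common())
```

With these inputs only "os", "kernel", and "memory" survive, which is the kind of shortlist that could point at wiki topics.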
Kevin McGuire
Member
Posts: 843
Joined: Tue Nov 09, 2004 12:00 am
Location: United States

forumdown

Post by Kevin McGuire »

I will give it a try. It actually seems a little more complicated than you would think at first, but I am confident that it is possible.

I got an initial tool written: a program, forumdown, which will download an entire subforum and store the linked-list structures of threads and posts in a local data file that can be loaded later.
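
To give a rough idea of the store-and-reload step, here is a sketch in Python. It is not forumdown's actual format: the `Thread` and `Post` classes, their fields, and the JSON file layout are all assumptions, and plain Python lists stand in for the linked lists.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Post:
    author: str  # hypothetical field
    body: str    # hypothetical field

@dataclass
class Thread:
    title: str
    posts: list = field(default_factory=list)

def save_threads(threads, path):
    """Write all threads, with their posts, to one local JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([asdict(t) for t in threads], f)

def load_threads(path):
    """Rebuild Thread and Post objects from the file written by save_threads."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return [Thread(t["title"], [Post(**p) for p in t["posts"]]) for t in raw]
```

Keeping a local copy like this means later analysis passes can run without downloading the subforum again.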

I did a little thinking and concluded that I can use a website that provides a dictionary, thesaurus, and encyclopedia to allow some degree of spell checking and to map similar terms, such as IDT and Interrupt Descriptor Table. It could also allow some sort of primitive comprehension of sentences, to get an idea of exactly what people are talking about in the posts.
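
The term-mapping part could look something like the sketch below, assuming Python and a hand-written alias table in place of any dictionary or thesaurus lookup. The names `ALIASES` and `normalize_terms` are hypothetical and the entries are only examples.

```python
# Hypothetical alias table; the entries are illustrative and do not come
# from any dictionary or thesaurus service.
ALIASES = {
    "interrupt descriptor table": "idt",
    "global descriptor table": "gdt",
    "protected mode": "pmode",
}

def normalize_terms(text, aliases=ALIASES):
    """Lowercase the text and replace each known long form with its short form."""
    result = text.lower()
    # Replace longer phrases first so overlapping phrases do not clash.
    for phrase in sorted(aliases, key=len, reverse=True):
        result = result.replace(phrase, aliases[phrase])
    return result

print(normalize_terms("Loading the Interrupt Descriptor Table in Protected Mode"))
```

Running this before counting would fold both spellings of a term into one bucket.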

I will try to use this site to provide the English word database, and add a cache to keep it from taking an excessive amount of time.
http://www.reference.com/browse/

http://kmcguire.jouleos.galekus.com/dok ... orum_tools

Let's see if I can get the other part working.
Kevin McGuire
Member
Posts: 843
Joined: Tue Nov 09, 2004 12:00 am
Location: United States

Post by Kevin McGuire »

A sprocket, two gears, and some strange gooey gel came out of my head. I think I was thinking too hard. This might be more than I asked for. I've got to get this kernel finished. :P