OSDev.org

Posted: **Thu Mar 22, 2007 10:20 pm**

Hey, it should be noted that from now on spam is going to get more polite, and sometimes harsher, to dramatically increase its success, and the spambots are surely learning from their most successful posts.

Just look at these automatically generated topics:

Subject: Hi I am writer, please help
Subject: Forum protection from spam
Subject: hello, guys
Subject: I would like to ask a question
Subject: Always getting a 404 error
Subject: Problemm with http

Note: Look at the profiles of those users and you'll find a clear spamish style. And look at how posts are turning more natural and convincing.

Posted: **Fri Mar 23, 2007 2:40 am**

IMO there's one way to stop this: require users to have an account to view the forum. This is an absolute pain in the neck and not worth the effort, and probably wouldn't work.

The other thing that needs to be done is to stop the accounts with dud profiles from being created, which is also difficult because you'd have to define a spam account and the spambot would get smart anyway.

I think the only real way to get around this is to have a two-phase registration system:

Do the normal thing, verification email gets sent.
After verification, you must answer a multiple choice question with > 25 options?

These ideas have been presented before, but the multiple choice one is one of the few that will really catch spambots.
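To put a rough number on the multiple choice idea, here is a minimal sketch of the odds for a bot that simply guesses at random. The function name is made up for illustration, and it assumes every guess is independent and uniform over the options, which a smarter bot would not be limited to.

```c
#include <assert.h>

/* Probability that a bot guessing uniformly at random gets an
 * `options`-way multiple choice question right at least once in
 * `attempts` independent tries. Hypothetical helper, not phpBB code. */
double guess_pass_probability(unsigned options, unsigned attempts)
{
	double fail = 1.0;	/* chance that every try so far was wrong */
	unsigned i;

	assert(options > 0);
	for (i = 0; i < attempts; i++)
		fail *= 1.0 - 1.0 / options;
	return 1.0 - fail;
}
```

With 26 options a single blind guess passes about 3.8% of the time, so under these assumptions the question only helps if the number of tries per registration is limited as well.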

Posted: **Fri Mar 23, 2007 4:05 am**

One way to prevent automated spam is already in use: generate some letters and add noise.

However, another way could be to generate a small three-dimensional image with some simple shapes and walls running through it. Then place textures on these objects, but beforehand superimpose a letter or word onto each texture so that it is visible on the object in the scene.

Then have the user type the three words in the picture that are located on a vertical surface, not horizontal.

Posted: **Fri Mar 23, 2007 4:34 am**

The better bots break captchas with stunning ease (against phpBB's built-in one they have a success rate of ~50%), and that was a year ago and things are not improving... The only thing that keeps them out is to do something they do not expect (and the list of such things is also diminishing over time).

And WHY can't we just post this in one thread so that we do not get the same arguments over and over?

Posted: **Fri Mar 23, 2007 7:52 pm**

"Ninjas Invade Zimbabwe" - another example of SPAM!! Trouble is, it seems completely legit apart from 2 things.. First, no one here reports news. Second, the signature was a link to a website.

They're getting smarter, people...

Posted: **Fri Mar 23, 2007 9:07 pm**

We could always spam our own boards, so by the time they got here, there would be nothing for them to do, and therefore, we win.

Hmmm..Maybe there is something wrong with that...

Posted: **Fri Mar 23, 2007 9:42 pm**

We could attempt to use some type of intelligent detection of our own. What we do is set a flag on new users so that we perform a calculation on their first or second post. Although a little involved, what follows is more of an idea than an optimized algorithm.

We build the database by counting the number of times a certain word is used adjacent to another to distinguish between jabber and actual talk. Then we build the word relevance to other words used in the entire database. You could break down the granularity of the algorithm to cover: phpbb->board->thread->(posts). That granularity might help.

Code:

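struct tmessage{
	unsigned char *message;
	struct tmessage *next;
};

struct twordcomplex;
/// link from one word to another, with the number of times the pair was seen
struct twordlink{
	unsigned int		count;
	struct twordcomplex	*word;
	struct twordlink	*next;
};

/// database entry for one distinct word
struct twordcomplex{
	unsigned char 		*word;
	struct twordlink 	*adjacent;	/// words seen directly next to this one
	struct twordlink	*whole;		/// words seen anywhere in the same message
	struct twordcomplex	*next;
};

/// one word of a tokenized message
struct tword{
	unsigned char		*word;
	struct tword		*next;
};

struct twordcomplex	*word_db = 0;

/// helpers, declared only; their implementation is left open here
struct tword* wordStringTo(unsigned char *message);
struct twordcomplex* wordComplexGet(unsigned char *word);
struct twordcomplex* wordComplexGetOrCreate(unsigned char *word);
void wordSetAdjacent(struct twordcomplex *wlex, struct tword *word);
void wordSetWhole(struct twordcomplex *wlex, struct tword *word);
unsigned int wordGetAdjacentCount(struct twordcomplex *wlex, struct tword *word);
unsigned int wordGetWholeCount(struct twordcomplex *wlex, struct tword *word);

/// build a database to enable the ability to score new members' first posts
int dbBuild(struct tmessage *message){
	struct tword *wordlist, *cw, *pw;
	struct twordcomplex *wlex;
	for(; message != 0; message = message->next){
		wordlist = wordStringTo(message->message);
		/// count each word's previous and next neighbour
		for(cw = wordlist, pw = 0; cw != 0; pw = cw, cw = cw->next){
			wlex = wordComplexGetOrCreate(cw->word);
			if(pw != 0){
				wordSetAdjacent(wlex, pw);
			}
			if(cw->next != 0){
				wordSetAdjacent(wlex, cw->next);
			}
		}
		/// count every other word that shares the message
		for(cw = wordlist; cw != 0; cw = cw->next){
			wlex = wordComplexGet(cw->word);
			for(pw = wordlist; pw != 0; pw = pw->next){
				if(pw != cw){
					wordSetWhole(wlex, pw);
				}
			}
		}
	}
	return 0;
}

/// score a new member's first post or two.
int dbScore(unsigned char *message){
	unsigned int adjscore, relscore;
	struct tword *wordlist = wordStringTo(message), *cw, *pw;
	struct twordcomplex *wlex;
	/// score words based on adjacent words
	for(cw = wordlist, adjscore = 0, pw = 0; cw != 0; pw = cw, cw = cw->next){
		wlex = wordComplexGet(cw->word);
		/// assumes wordComplexGet returns 0 for a word not in the database
		if(wlex == 0){
			continue;
		}
		if(pw != 0){
			adjscore += wordGetAdjacentCount(wlex, pw);
		}
		if(cw->next != 0){
			adjscore += wordGetAdjacentCount(wlex, cw->next);
		}
	}
	/// score words based on relevance.
	for(cw = wordlist, relscore = 0; cw != 0; cw = cw->next){
		wlex = wordComplexGet(cw->word);
		if(wlex == 0){
			continue;
		}
		for(pw = wordlist; pw != 0; pw = pw->next){
			if(pw != cw){
				relscore += wordGetWholeCount(wlex, pw);
			}
		}
	}
	/// the two scores are simply summed as a placeholder
	return (int)(adjscore + relscore);
}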

adjscore and relscore would be two separate scores, but they could be combined or such. I have no idea. =)

So the lower the score, the more likely it is that a member's first or second post is spam.
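For anyone who wants to play with the adjacency half of this idea, here is a much smaller self-contained sketch. It is not the code from the post above: it assumes a fixed-size pair table instead of linked lists, lower-cases everything, counts only "word followed by word" pairs, and all names and limits (db_learn, db_score, MAX_PAIRS, MAX_WORD) are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_PAIRS 1024	/* arbitrary limit for the sketch */
#define MAX_WORD  32	/* longer words are truncated */

/* One observed "first followed by second" pair and how often it was seen. */
struct pair {
	char first[MAX_WORD];
	char second[MAX_WORD];
	unsigned count;
};

static struct pair pairs[MAX_PAIRS];
static unsigned npairs;

/* Copy the next lower-cased word of *s into out; return 0 when none is left. */
static int next_word(const char **s, char out[MAX_WORD])
{
	size_t n = 0;

	while (**s && !isalnum((unsigned char)**s))
		(*s)++;
	while (**s && isalnum((unsigned char)**s)) {
		if (n < MAX_WORD - 1)
			out[n++] = (char)tolower((unsigned char)**s);
		(*s)++;
	}
	out[n] = '\0';
	return n > 0;
}

/* Find the entry for (a, b), creating it if asked. Returns 0 if absent or full. */
static struct pair *pair_find(const char *a, const char *b, int create)
{
	unsigned i;

	for (i = 0; i < npairs; i++)
		if (!strcmp(pairs[i].first, a) && !strcmp(pairs[i].second, b))
			return &pairs[i];
	if (!create || npairs == MAX_PAIRS)
		return 0;
	strcpy(pairs[npairs].first, a);
	strcpy(pairs[npairs].second, b);
	pairs[npairs].count = 0;
	return &pairs[npairs++];
}

/* Walk the adjacent word pairs of a message. The return value is the sum
 * of the counts found for those pairs; with learn != 0 each count is
 * incremented after it has been read. */
static unsigned walk_pairs(const char *message, int learn)
{
	char prev[MAX_WORD], cur[MAX_WORD];
	unsigned score = 0;
	struct pair *p;

	assert(message != 0);
	if (!next_word(&message, prev))
		return 0;
	while (next_word(&message, cur)) {
		p = pair_find(prev, cur, learn);
		if (p) {
			score += p->count;
			if (learn)
				p->count++;
		}
		strcpy(prev, cur);
	}
	return score;
}

/* Add a known-good post to the database. */
void db_learn(const char *message) { walk_pairs(message, 1); }

/* Adjacency score of a new post; low means unfamiliar wording. */
unsigned db_score(const char *message) { return walk_pairs(message, 0); }
```

You would call db_learn on existing posts and db_score on a newcomer's first post. A raw sum like this grows with post length, so a real filter would probably need to normalize it somehow, and where to put the threshold is an open question.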

Posted: **Fri Mar 23, 2007 11:49 pm**

What about letting only members that have been here for a while (several useful posts) post in "General Ramblings" AND post off-topic content? It just makes no sense that some person comes here, to a specialized development forum, just to post "spam" right away.

As a plus, a user whose very first post is spam could be led to believe that it was posted: that user would be the only one able to see the post, displayed as if everybody could see it.

Getting more dramatic, there could also be a low priority web crawler process (to avoid eating too many server resources) that finds out whether a post from that type of user (spam at first post) is found massively on the Internet and, if so, deletes it automatically without human intervention.

For example, look at the following Google search for the text of the first line from the spam topic "Always getting a 404 error":

Friend of mine tell me about some statistic information available at this page

You'll see that even the exact same user names are being used, which opens up many more ways to test whether it's spam. Based on that, one weakness of such spambots is easy to spot: they always repeat the same posts massively across many forums. Certainly, captchas have stopped being a 100% effective protection. Now, the monotony of spambots would be 99% of a solution (the other 1% would be if the message is contained completely in an image of text, plus some random actual text to avoid the message being banned for having only 1 image and no text).

Some other measures should be added in case the malicious programmers start taking these workarounds into account, and they should be kept secret so that the spambots don't know what kind of measures other than captchas are being applied.
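The "same post repeated everywhere" check could be sketched roughly like this. It is only one possible approach: it assumes a crawler (not shown) has already collected fingerprints of posts seen on other sites, and it uses a plain 64-bit FNV-1a hash over the letters and digits of the text. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* 64-bit FNV-1a hash over the letters and digits of a post, lower-cased,
 * so that changes in case, spacing or punctuation do not alter the result. */
uint64_t post_fingerprint(const char *text)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */

	assert(text != 0);
	for (; *text; text++) {
		unsigned char c = (unsigned char)*text;
		if (!isalnum(c))
			continue;
		h ^= (uint64_t)tolower(c);
		h *= 0x100000001b3ULL;		/* FNV prime */
	}
	return h;
}

/* Return 1 if the post's fingerprint is in a list of fingerprints of
 * posts already seen elsewhere (hypothetical list, e.g. from a crawler). */
int seen_elsewhere(const char *text, const uint64_t *known, unsigned nknown)
{
	uint64_t h = post_fingerprint(text);
	unsigned i;

	for (i = 0; i < nknown; i++)
		if (known[i] == h)
			return 1;
	return 0;
}
```

An exact fingerprint like this only catches near-verbatim repeats. A bot that mixes random text into each post, as mentioned above, would slip past it, so it could only ever be one signal among several.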

Posted: **Sat Mar 31, 2007 6:14 pm**

For a forum like this, putting a pic of some (simple) x86 assembly and asking for the value in EAX afterwards should work fairly well.

On one hand, it's fairly simple for somebody who belongs on this forum to do (in most cases... a PA-RISC OS developer might have issues with it). On the other hand, it would require some work to solve by a computer for fairly dubious advantage...

Posted: **Sun Apr 01, 2007 4:01 am**

TheQuux wrote: For a forum like this, putting a pic of some (simple) x86 assembly and asking for the value in EAX afterwards should work fairly well.

Nice idea, but... Intel or AT&T syntax?

*duck*

Posted: **Sun Apr 01, 2007 1:12 pm**

TheQuux wrote: For a forum like this, putting a pic of some (simple) x86 assembly and asking for the value in EAX afterwards should work fairly well.

You could just use something along the lines of:

Code:

MOV EAX, 8935
ADD EAX, 10014
EAX = ....

I doubt ATT weenies (or anybody with a sense for programming) would be unable to solve that.

I like the concept as it has the tendency to also keep out the 'I wanna build the next Windows, please teach me how to program' types

*runs*
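On the server side, the MOV/ADD question above could be produced by something like the following sketch. The function names are invented for illustration, the question is emitted as text rather than rendered into a picture, and picking the two operands at random is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write a two-instruction challenge into buf and return the value EAX
 * holds afterwards. The addition wraps modulo 2^32, as ADD does on a
 * 32-bit register. Hypothetical helper, not part of any forum software. */
uint32_t make_challenge(uint32_t a, uint32_t b, char *buf, size_t buflen)
{
	assert(buf != 0 && buflen > 0);
	snprintf(buf, buflen, "MOV EAX, %lu\nADD EAX, %lu\nEAX = ....",
	         (unsigned long)a, (unsigned long)b);
	return (uint32_t)(a + b);
}

/* Compare the answer typed by the user with the expected value. */
int check_answer(uint32_t expected, uint32_t submitted)
{
	return expected == submitted;
}
```

For the operands in the post, 8935 and 10014, the expected answer is 18949. Sent as plain text, the question would be trivial for a bot to parse, which is presumably why the original suggestion was to show it as a picture.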
