Spam is Increasing

~
Member
Posts: 1226
Joined: Tue Mar 06, 2007 11:17 am
Libera.chat IRC: ArcheFire


Post by ~ »

Hey, it should be noted that from now on spam is going to get more polite, and sometimes harsher, to dramatically increase its success rate, and the spambots are surely learning from their most successful posts.

Just look at these automatically generated topics:

Subject: Hi I am writer, please help
Subject: Forum protection from spam
Subject: hello, guys
Subject: I would like to ask a question
Subject: Always getting a 404 error :(
Subject: Problemm with http


Note: Look at the profiles of those users and you'll find a clearly spammy style. Also look at how the posts are becoming more natural and convincing.
pcmattman
Member
Posts: 2566
Joined: Sun Jan 14, 2007 9:15 pm
Libera.chat IRC: miselin
Location: Sydney, Australia (I come from a land down under!)

Post by pcmattman »

IMO there's one way to stop this: require users to have an account to view the forum. This is an absolute pain in the neck and not worth the effort, and probably wouldn't work.

The other thing that needs to be done is to stop accounts with dud profiles from being created, which is also difficult because you'd have to define what counts as a spam account, and the spambots would get smarter anyway.

I think the only real way to get around this is to have a two-phase registration system:
  1. Do the normal thing, verification email gets sent.
  2. After verification, you must answer a multiple choice question with, say, more than 25 options.
These ideas have been presented before, but the multiple choice one is one of the few that will really catch spambots.
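
A minimal sketch of the second phase described above, assuming the forum keeps one question per registration attempt. The struct, its field names and both function names are hypothetical and are not part of phpBB. It only shows the check itself and why a large number of options matters: a bot guessing blindly succeeds with probability 1/N.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical registration question: names and layout are assumptions. */
struct reg_question {
	const char *prompt;
	const char *const *options;   /* candidate answers shown to the user */
	size_t option_count;          /* the post suggests more than 25 */
	size_t correct_index;         /* index of the right option */
};

/* Phase 2 passes only if the e-mail was verified AND the answer is right. */
int registration_passes(const struct reg_question *q, int email_verified,
			size_t chosen_index)
{
	assert(q != NULL && q->option_count > 0);
	if (!email_verified)
		return 0;
	if (chosen_index >= q->option_count)
		return 0;
	return chosen_index == q->correct_index;
}

/* Chance that a bot picking uniformly at random gets through. */
double blind_guess_chance(const struct reg_question *q)
{
	assert(q != NULL && q->option_count > 0);
	return 1.0 / (double)q->option_count;
}
```

With 26 options a blind guess gets through roughly 4% of the time, so the question would still need to be combined with a limit on retries.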
Kevin McGuire
Member
Posts: 843
Joined: Tue Nov 09, 2004 12:00 am
Location: United States

spam account creation prevention

Post by Kevin McGuire »

One way to prevent automated spam is already in use: generate some letters and add noise.

However, another way could be to generate a small three-dimensional image with some simple shapes and walls running through it. Then place textures on these objects, but beforehand superimpose a letter or word onto each texture so that it is visible on the object in the scene.

Then have the user type the three words in the picture that are located on a vertical surface, not horizontal.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Post by Combuster »

The better bots break captchas with stunning ease (phpBB's built-in one has a success rate of ~50%), and that was a year ago and things are not improving... The only thing that keeps them out is doing something they do not expect (and the list of such things is also diminishing over time).

And WHY can't we just post this in one thread so that we do not get the same arguments over and over?
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]

Post by pcmattman »

"Ninjas Invade Zimbabwe" - another example of SPAM!! Trouble is, it seems completely legit apart from two things. First, no one here reports news. Second, the signature was a link to a website.

They're getting smarter, people...
Alboin
Member
Posts: 1466
Joined: Thu Jan 04, 2007 3:29 pm
Location: Noricum and Pannonia

Post by Alboin »

We could always spam our own boards, so by the time they got here, there would be nothing for them to do, and therefore, we win.

Hmmm... Maybe there is something wrong with that...
C8H10N4O2 | #446691 | Trust the nodes.

Post by Kevin McGuire »

We could attempt to use some type of intelligent detection of our own. What we do is set a flag on new users so that we perform a calculation on their first or second post. Although a little involved, the code below presents more of an idea than an optimized algorithm.

We build the database by counting the number of times a certain word is used adjacent to another, to distinguish between jabber and actual talk. Then we record each word's relevance to the other words it shares a message with, across the entire database. You could break down the granularity of the algorithm to cover: phpbb->board->thread->(posts). That granularity might help.

Code:

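struct tmessage{
	unsigned char *message;
	struct tmessage *next;
};

struct twordcomplex;

/// one link from a word to a related word, with a usage count
struct twordlink{
	unsigned int		count;
	struct twordcomplex	*word;
	struct twordlink	*next;
};

/// a word in the database plus the words seen next to it and with it
struct twordcomplex{
	unsigned char 		*word;
	struct twordlink 	*adjacent;	/// words seen directly before or after
	struct twordlink	*whole;		/// words seen anywhere in the same message
	struct twordcomplex	*next;
};

/// a node in the word list of a single message
struct tword{
	unsigned char		*word;
	struct tword		*next;
};

struct twordcomplex	*word_db = 0;

/// helpers, declared only; their implementations are not part of this sketch
struct tword* wordStringTo(unsigned char *message);
struct twordcomplex* wordComplexGet(unsigned char *word);
struct twordcomplex* wordComplexGetOrCreate(unsigned char *word);
void wordSetAdjacent(struct twordcomplex *wlex, struct tword *word);
void wordSetWhole(struct twordcomplex *wlex, struct tword *word);
unsigned int wordGetAdjacentCount(struct twordcomplex *wlex, struct tword *word);
unsigned int wordGetWholeCount(struct twordcomplex *wlex, struct tword *word);

/// build a database to enable the ability to score new members' first posts
int dbBuild(struct tmessage *message){
	struct tword *wordlist, *cw, *pw;
	struct twordcomplex *wlex;
	for(; message != 0; message = message->next){
		wordlist = wordStringTo(message->message);
		/// count the previous and next word as adjacent to each word
		for(cw = wordlist, pw = 0; cw != 0; pw = cw, cw = cw->next){
			wlex = wordComplexGetOrCreate(cw->word);
			if(pw != 0){
				wordSetAdjacent(wlex, pw);
			}
			if(cw->next != 0){
				wordSetAdjacent(wlex, cw->next);
			}
		}
		/// relate each word to every other word in the same message
		for(cw = wordlist; cw != 0; cw = cw->next){
			wlex = wordComplexGet(cw->word);
			for(pw = wordlist; pw != 0; pw = pw->next){
				if(pw != cw){
					wordSetWhole(wlex, pw);
				}
			}
		}
	}
	return 0;
}

/// score a new member's first post or two.
int dbScore(unsigned char *message){
	unsigned int adjscore, relscore;
	struct tword *wordlist = wordStringTo(message), *cw, *pw;
	struct twordcomplex *wlex;
	/// score words based on adjacent words
	for(cw = wordlist, adjscore = 0, pw = 0; cw != 0; pw = cw, cw = cw->next){
		wlex = wordComplexGet(cw->word);
		if(wlex == 0){
			continue;	/// word never seen before: contributes nothing
		}
		if(pw != 0){
			adjscore += wordGetAdjacentCount(wlex, pw);
		}
		if(cw->next != 0){
			adjscore += wordGetAdjacentCount(wlex, cw->next);
		}
	}
	/// score words based on relevance.
	for(cw = wordlist, relscore = 0; cw != 0; cw = cw->next){
		wlex = wordComplexGet(cw->word);
		if(wlex == 0){
			continue;
		}
		for(pw = wordlist; pw != 0; pw = pw->next){
			if(pw != cw){
				relscore += wordGetWholeCount(wlex, pw);
			}
		}
	}
	/// how to combine the two scores is left open; the sum is only a placeholder
	return (int)(adjscore + relscore);
}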
adjscore and relscore would be two separate scores, but they could be combined in some way. I have no idea. =)

So the lower the score of a member's first or second post, the more likely that post is spam.
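
For anyone who wants something that compiles on its own, here is a much smaller sketch of just the adjacency half of the idea above. It is not the linked-list design from the post: it assumes a fixed-size table of ordered word pairs, and every name and limit in it (`adjacency_train`, `adjacency_score`, `MAX_PAIRS`, `MAX_WORD`) is made up for illustration. The raw sum is not normalized by post length, which a real scorer would probably need.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_PAIRS 1024   /* toy limits, chosen arbitrarily */
#define MAX_WORD  32

struct pair_count {
	char first[MAX_WORD];
	char second[MAX_WORD];
	unsigned int count;
};

static struct pair_count pair_db[MAX_PAIRS];
static size_t pair_db_len = 0;

/* Copy the next lowercased word from *cursor into out; return 0 at end of text. */
static int next_word(const char **cursor, char out[MAX_WORD])
{
	const char *p = *cursor;
	size_t n = 0;

	while (*p != '\0' && !isalnum((unsigned char)*p))
		p++;
	while (*p != '\0' && isalnum((unsigned char)*p)) {
		if (n < MAX_WORD - 1)
			out[n++] = (char)tolower((unsigned char)*p);
		p++;
	}
	out[n] = '\0';
	*cursor = p;
	return n > 0;
}

/* Find the entry for an ordered word pair, or NULL if it was never seen. */
static struct pair_count *find_pair(const char *a, const char *b)
{
	for (size_t i = 0; i < pair_db_len; i++)
		if (strcmp(pair_db[i].first, a) == 0 &&
		    strcmp(pair_db[i].second, b) == 0)
			return &pair_db[i];
	return NULL;
}

/* Learn from a known-good post: count every pair of neighbouring words. */
void adjacency_train(const char *message)
{
	char prev[MAX_WORD], cur[MAX_WORD];
	const char *cursor = message;

	assert(message != NULL);
	if (!next_word(&cursor, prev))
		return;
	while (next_word(&cursor, cur)) {
		struct pair_count *pc = find_pair(prev, cur);
		if (pc == NULL && pair_db_len < MAX_PAIRS) {
			pc = &pair_db[pair_db_len++];
			strcpy(pc->first, prev);
			strcpy(pc->second, cur);
			pc->count = 0;
		}
		if (pc != NULL)
			pc->count++;
		strcpy(prev, cur);
	}
}

/* Score a new post: sum how often each of its neighbouring pairs was seen. */
unsigned int adjacency_score(const char *message)
{
	char prev[MAX_WORD], cur[MAX_WORD];
	const char *cursor = message;
	unsigned int score = 0;

	assert(message != NULL);
	if (!next_word(&cursor, prev))
		return 0;
	while (next_word(&cursor, cur)) {
		const struct pair_count *pc = find_pair(prev, cur);
		if (pc != NULL)
			score += pc->count;
		strcpy(prev, cur);
	}
	return score;
}
```

After training on "the kernel loads the boot sector" and "the kernel maps the page table", the text "the kernel loads" scores 3, while "cheap pills online" scores 0.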

Post by ~ »

What about letting only members who have been here for a while (with several useful posts) post in "General Ramblings" and post off-topic content? It just makes no sense that someone comes here, to a specialized development forum, just to post "spam" right away.

As a plus, a user whose very first post is spam could be led to believe that the post went through, when in fact that user would be the only one able to see it, displayed as if it were visible to everybody.
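
The two rules above can be sketched as a pair of checks. This assumes the forum already has some way of flagging a post as spam. The structs, field names and the threshold of 5 posts are all invented for the example and do not correspond to anything in phpBB.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical records; field names and the threshold are assumptions. */
struct member {
	unsigned int id;
	unsigned int useful_posts;    /* posts that moderators accepted */
};

struct post {
	unsigned int author_id;
	int flagged_as_spam;          /* set by whatever detector is in use */
};

#define MIN_POSTS_FOR_OFFTOPIC 5  /* "several useful posts", value made up */

/* Only established members may post in the off-topic board. */
int may_post_offtopic(const struct member *m)
{
	assert(m != NULL);
	return m->useful_posts >= MIN_POSTS_FOR_OFFTOPIC;
}

/* A flagged post stays visible to its author and to nobody else. */
int post_visible_to(const struct post *p, unsigned int viewer_id)
{
	assert(p != NULL);
	if (!p->flagged_as_spam)
		return 1;
	return p->author_id == viewer_id;
}
```

The second function is what keeps the spammer convinced the post went through: the author's own view is unchanged, while every other viewer is filtered out.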

Getting more dramatic, there could also be a low-priority web crawler process (to avoid eating too many server resources) that checks whether a post from such a user (spam as first post) appears massively across the Internet and, if so, deletes it automatically without human intervention.

For example, look at the following Google search looking for the text of the first line from the spam topic "Always getting a 404 error :(":

Friend of mine tell me about some statistic information available at this page

You'll see that even the exact same user names are being used, which opens up many more ways to test whether a post is spam. Based on that, it is possible to pin down one of the weaknesses of such spambots: they always repeat the same posts massively across many forums. Certainly, captchas are no longer a 100% effective protection. Exploiting the monotony of spambots would be 99% of a solution (the other 1% covers the case where the whole message is contained in an image of text, accompanied by some random actual text so that the post is not banned for having only one image and no text).
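
A rough sketch of the comparison step only, assuming the crawler (not shown) has already collected fingerprints of posts seen on other sites. It hashes just the letters and digits of the text with 32-bit FNV-1a, so changes in spacing, punctuation or capitalisation do not hide a repeat. The function names are hypothetical, and a real system would want a fuzzier match than an exact hash.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over the letters and digits of the text, lowercased, so
 * that spacing, punctuation and capitalisation do not change the result. */
uint32_t post_fingerprint(const char *text)
{
	uint32_t hash = 2166136261u;           /* FNV offset basis */

	assert(text != NULL);
	for (; *text != '\0'; text++) {
		unsigned char c = (unsigned char)*text;
		if (!isalnum(c))
			continue;
		hash ^= (uint32_t)tolower(c);
		hash *= 16777619u;                 /* FNV prime */
	}
	return hash;
}

/* Return 1 if the post matches any fingerprint already seen elsewhere. */
int seen_elsewhere(const char *text, const uint32_t *known, size_t known_count)
{
	uint32_t fp = post_fingerprint(text);

	for (size_t i = 0; i < known_count; i++)
		if (known[i] == fp)
			return 1;
	return 0;
}
```

Because whitespace is dropped entirely, "A b" and "ab" produce the same fingerprint. That is deliberate here, since it is cheap for a spambot to vary spacing, but it does mean distinct texts can collide slightly more often.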

Some other measures should be added in case the malicious programmers start accounting for these workarounds, and they should be kept secret so that the spambots don't know what measures other than captchas are being applied.
TheQuux
Member
Posts: 73
Joined: Sun Oct 22, 2006 6:49 pm

Post by TheQuux »

For a forum like this, putting a pic of some (simple) x86 assembly and asking for the value in EAX afterwards should work fairly well.

On one hand, it's fairly simple for somebody who belongs on this forum to do (in most cases... a PA-RISC OS developer might have issues with it :-). On the other hand, a computer would need some work to solve it, for fairly dubious advantage...
My project: Xenon
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Post by Solar »

TheQuux wrote: For a forum like this, putting a pic of some (simple) x86 assembly and asking for the value in EAX afterwards should work fairly well.
Nice idea, but... Intel or AT&T syntax?

:lol:

*duck*
Every good solution is obvious once you've found it.

Post by Combuster »

TheQuux wrote: For a forum like this, putting a pic of some (simple) x86 assembly and asking for the value in EAX afterwards should work fairly well.
You could just use something along the lines of:

Code:

MOV EAX, 8935
ADD EAX, 10014
EAX = ....
I doubt AT&T weenies (or anybody with a sense for programming) would be unable to solve that.
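
On the server side, a challenge like the one above could be generated and checked along these lines. This is only a sketch: the struct and function names are hypothetical, picking the two operands at random is left out, and rendering the text into a picture (as TheQuux suggested) is not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical challenge: two operands for MOV/ADD, as in the post above. */
struct asm_challenge {
	uint32_t mov_operand;
	uint32_t add_operand;
};

/* Write the snippet the user would be shown; returns snprintf's result. */
int challenge_render(const struct asm_challenge *c, char *out, size_t out_size)
{
	assert(c != NULL && out != NULL);
	return snprintf(out, out_size, "MOV EAX, %lu\nADD EAX, %lu\nEAX = ....",
			(unsigned long)c->mov_operand,
			(unsigned long)c->add_operand);
}

/* EAX is 32 bits wide, so the sum wraps modulo 2^32 like the real ADD. */
uint32_t challenge_expected_eax(const struct asm_challenge *c)
{
	assert(c != NULL);
	return (uint32_t)(c->mov_operand + c->add_operand);
}

/* Return 1 if the submitted answer matches the expected register value. */
int challenge_check(const struct asm_challenge *c, uint32_t answer)
{
	return challenge_expected_eax(c) == answer;
}
```

For the operands used in the post (8935 and 10014) the accepted answer is 18949.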

I like the concept as it has the tendency to also keep out the 'I wanna build the next Windows, please teach me how to program' types :D
*runs*