Idea to remove spam and eliminate the need for registration.
Posted: Sun Feb 04, 2007 7:47 pm
by Android Mouse
All that would be needed is to change the HTML names of the form inputs. For example, the default textarea name on the edit page is 'wpTextbox1'; change it to anything else and bots wouldn't be able to edit.
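A minimal sketch of the idea (Python; the renamed field and the accept_edit handler are my own illustration, not actual MediaWiki code): a bot hardcoded to the default name produces a request the server can reject outright.
Code: Select all
# Hypothetical sketch: a naive bot hardcodes MediaWiki's default field
# name, so a renamed edit box never receives its payload.
DEFAULT_FIELD = "wpTextbox1"    # what a generic MediaWiki bot targets
RENAMED_FIELD = "osdevEditBox"  # made-up site-specific replacement

def accept_edit(form: dict) -> bool:
    """Reject any submission that fills the old default field or
    misses the renamed one."""
    if DEFAULT_FIELD in form:       # only a bot would still use this
        return False
    return RENAMED_FIELD in form

bot_post = {DEFAULT_FIELD: "Buy cheap pills!"}
human_post = {RENAMED_FIELD: "Fixed a typo in the GDT article."}
assert not accept_edit(bot_post)
assert accept_edit(human_post)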
I personally don't think requiring registration is in the spirit of a wiki. Some people may spot a small error but, not wanting to go through the hassle of registering, will just move on and leave the error in place.
Adding a captcha for those not logged in would also work. But I think just changing the input names would be easier for everyone.
An idea to consider perhaps.
Posted: Sun Feb 04, 2007 11:25 pm
by Alboin
I agree. I think something like Wikipedia has, where all I have to do is create a user name and password, and BAM! I'm in. Having to register for the forum is tiresome when you just want to edit the wiki. Although, I have never taken care of anything like this before, so I may not be considering certain factors.
Posted: Mon Feb 05, 2007 4:12 am
by Combuster
Chase once said (I can't find the thread, otherwise I'd post a link) that he already changed the form names for phpBB and still got spam. I doubt MediaWiki bots would lack that kind of intelligence.
To fix a misunderstanding here: captchas in their current form are usually broken.
In total, each defensive measure will filter only a percentage of bots, and at the moment the current setup seems to be 100% accurate. In short: "Never change a winning team". (From my own webmaster experience, even the smallest amount of spam is annoying.)
Besides, registering at phpBB isn't much different from registering at MediaWiki.
But since I'm not the admin, I'll have to wait for Justice Chase to hand down the final verdict
Posted: Mon Feb 05, 2007 7:25 pm
by Android Mouse
Combuster wrote:To fix a misunderstanding here: captchas in their current form are usually broken.
True, but my point was that requiring anything non-default for editing will totally throw off the bots.
For example, on the editing page it would be easy to add another required input field and require users to enter the URL of the page they are currently on, e.g.:
"Copy + paste the URL in your address bar here: [input box]".
Bots wouldn't be able to do this unless they were custom-written for this specific site, which is unlikely anyway. The added input field would only be needed for those not logged in, of course.
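A rough sketch of that check on the server side (Python; url_field_ok and the forgiving comparison are just one way to do it):
Code: Select all
# Hypothetical sketch: compare the URL the visitor pasted into the
# extra field against the URL the edit form was actually served from.
from urllib.parse import urlsplit

def url_field_ok(submitted_url: str, request_url: str) -> bool:
    """Accept only if the pasted URL matches the page URL, ignoring
    the scheme and fragment so minor differences are forgiven."""
    a = urlsplit(submitted_url.strip())
    b = urlsplit(request_url)
    return (a.netloc, a.path, a.query) == (b.netloc, b.path, b.query)

page = "http://www.osdev.org/wiki/index.php?title=Main_Page"
assert url_field_ok(page, page)       # human pasted the right URL
assert not url_field_ok("", page)     # bot left the field empty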
Posted: Wed Feb 07, 2007 8:40 am
by spix
Android Mouse wrote:Bots wouldn't be able to do this unless they were custom-written for this specific site, which is unlikely anyway. The added input field would only be needed for those not logged in, of course.
What makes you so sure? Have you studied these bots, or are you just making assumptions?
I wouldn't automatically assume people who write spam bots are stupid.
Posted: Wed Feb 07, 2007 1:12 pm
by SpooK
... or you could integrate the Wiki into the forum
Posted: Wed Feb 07, 2007 8:47 pm
by Android Mouse
spix wrote:What makes you so sure? Have you studied these bots, or are you just making assumptions?
I wouldn't automatically assume people who write spam bots are stupid.
No, but I doubt today's bots will be passing the Turing test anytime soon.
So it is unlikely they will be capable of comprehending what the added text input field is for, much less be able to fill it out correctly.
Posted: Wed Feb 07, 2007 8:50 pm
by Brynet-Inc
Android Mouse wrote:No, but I doubt today's bots will be passing the Turing test anytime soon.
So it is unlikely they will be capable of comprehending what the added text input field is for, much less be able to fill it out correctly.
We are Borg!! We will adapt!!
Hehe, sorry..
Anyway, self-modifying code is possible.. but for these bots to adapt quickly to random modifications without human interaction does not seem possible yet..
I myself don't really think these people are writing bots for OSDev alone...
But who knows.. they might be attempting to conquer the world with Viagra advertisements
Posted: Wed Feb 07, 2007 9:26 pm
by spix
Android Mouse wrote:So it is unlikely they will be capable of comprehending what the added text input field is for, much less be able to fill it out correctly.
You are assuming that spam bots use the 'wpTextbox1' name to identify which textarea to put their spam into.
A bot could deduce which textarea to put spam into the same way a human deduces which textarea to put content into.
More likely, it puts spam into every textarea it finds; some work, some don't. That way the bot works on any website with dynamic content, not just MediaWiki.
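That kind of generic bot takes only a few lines; a purely illustrative sketch using the Python standard library:
Code: Select all
# Illustrative sketch of a generic spam bot: it never looks at field
# names, it simply fills every textarea and text input it can find.
from html.parser import HTMLParser

class FormFiller(HTMLParser):
    def __init__(self, payload):
        super().__init__()
        self.payload = payload
        self.fields = {}            # field name -> spam payload

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_text = tag == "textarea" or (
            tag == "input" and attrs.get("type", "text") == "text")
        if is_text and "name" in attrs:
            self.fields[attrs["name"]] = self.payload

page = ('<form><input type="text" name="summary">'
        '<textarea name="anythingAtAll"></textarea></form>')
bot = FormFiller("Buy cheap pills!")
bot.feed(page)
print(bot.fields)  # both fields filled, whatever their names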
Posted: Wed Feb 07, 2007 9:42 pm
by Alboin
Apparently you have to be an expert in AI to create a good spambot. Hmm... I can picture it now: spam bots written in Lisp.
(Although, I have read of people using Lisp as a server-side language...)
Posted: Thu Feb 08, 2007 5:35 am
by ucosty
spix wrote:
Android Mouse wrote:So it is unlikely they will be capable of comprehending what the added text input field is for, much less be able to fill it out correctly.
You are assuming that spam bots use the 'wpTextbox1' name to identify which textarea to put their spam into.
A bot could deduce which textarea to put spam into the same way a human deduces which textarea to put content into.
More likely, it puts spam into every textarea it finds; some work, some don't. That way the bot works on any website with dynamic content, not just MediaWiki.
What about a text box where, if it contains text, you know it's a bot? Labeled something like "Don't put text in me, or else".
edit: Scratch that, I have no idea what I was thinking. I had a long day at work.
Posted: Thu Feb 08, 2007 8:09 am
by bubach
I have seen many different types, like a pre-checked checkbox with the text "I'm a dirty spambot, please ignore this post", and also one where all the form names are random each time, with a hidden field that explains how to map them back to the real names.
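Checking a trap like that on the server is trivial; a sketch (Python; both decoy field names are made up):
Code: Select all
# Hypothetical honeypot check: the form ships with a pre-checked decoy
# checkbox and a CSS-hidden decoy text field.  Humans untick the box
# and never see the hidden field; dumb bots trip over both.
DECOY_CHECKBOX = "i_am_a_spambot"  # rendered pre-checked; the label
                                   # asks humans to untick it
DECOY_TEXT = "website_url_2"       # hidden via CSS, must stay empty

def looks_like_bot(form: dict) -> bool:
    if form.get(DECOY_CHECKBOX) == "on":     # box was never unticked
        return True
    if form.get(DECOY_TEXT, "").strip():     # hidden field got filled
        return True
    return False

assert looks_like_bot({"i_am_a_spambot": "on", "message": "spam"})
assert not looks_like_bot({"message": "fixed a typo"})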
spam filtering idea
Posted: Fri Mar 30, 2007 7:18 pm
by Kevin McGuire
What about twenty pictures, each of a major scene, and have the user type what the scene contains? Like a scene with planets orbiting in the solar system, with certain words associated with the picture, such as:
Code: Select all
earth
planet
planets
world
universe
Better, allow each word a little incorrect spelling, say a certain percentage of errors in relation to the word's size. At the same time, have someone check over the log of words that failed, to hopefully find ones that make sense but that we forgot to add.
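The "percentage of errors in relation to the word's size" part amounts to a bounded edit distance; a sketch of that matching (Python; the 25% error rate is a guess):
Code: Select all
# Sketch of fuzzy matching against a picture's word list: accept an
# answer if it is within a size-proportional edit distance of any word.
def edit_distance(a, b):
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def answer_ok(answer, words, error_rate=0.25):
    answer = answer.strip().lower()
    return any(edit_distance(answer, w) <= max(1, int(len(w) * error_rate))
               for w in words)

WORDS = ["earth", "planet", "planets", "world", "universe"]
assert answer_ok("plannet", WORDS)    # one typo, still accepted
assert not answer_ok("viagra", WORDS)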
If the user cannot get it right, let them send a special registration message that gets posted in the forum; users who have made a certain number of posts could then reply with the word "accepted" to let the registration pass, as a fallback for people who might have a hard time with riddles. We should have a user base that is almost constantly online to make a quick reply, instead of relying on just the administrator and moderators.
-Sub Forum: Registration
-The Threads Are Registrations
Once a registration is accepted, the thread is automatically removed so we do not end up with a flooded subforum.
Even better, have the failed words appear in the registration subforum thread for that registration attempt, keyed to the e-mail address, so that users here with over a certain number of posts can help add valid words for each picture.
Posted: Fri Mar 30, 2007 9:48 pm
by ~
Does that mean they would be able to figure out what the fields are for even if the names change randomly every time, to things like kiarkt and tialit, and there are some 100 invalid fields?
To make it harder, the fields could be positioned using CSS in such a way that only the valid fields are seen by the user. The interface could also be generated using JavaScript, so those programs would need to interpret both JavaScript and CSS correctly, with rather complex and "self-modifying" algorithms, to work out the layout and naming of the fields (keeping track of the valid ones in a session cookie, so the bot would also need to interpret and keep cookies). If invalid fields are filled in, the submitted content would be rejected.
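A sketch of the server-side half of that (Python; the session dict stands in for whatever session store the site already has):
Code: Select all
# Hypothetical sketch: generate one real field name plus N decoys per
# form, remember which is real in the server-side session, and reject
# the post if any decoy comes back non-empty.
import secrets

def make_form_fields(session, n_decoys=100):
    names = ["f_" + secrets.token_hex(4) for _ in range(n_decoys + 1)]
    real = secrets.choice(names)
    session["real_field"] = real     # valid for this session only
    session["decoys"] = [n for n in names if n != real]
    return names                     # decoys get hidden with CSS

def validate_post(session, form):
    """Return the message body, or None if the post looks automated."""
    if any(form.get(d, "") for d in session["decoys"]):
        return None                  # a decoy was filled in: a bot
    return form.get(session["real_field"]) or None

session = {}
fields = make_form_fields(session, n_decoys=3)
post = {session["real_field"]: "hello wiki"}
assert validate_post(session, post) == "hello wiki"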
----------------------------------------
Another thing I have suggested is to look for anomalous web addresses in profiles, and anomalous or unrelated words, and to check the posts of such users. If they are found to be fake, they could be deleted automatically (user and posts) without human intervention, but that would require a web crawler using Google and Yahoo search to find such messages, which might work around 70% of the time. They could also be prevented from posting in General Ramblings until they have, say, 15 useful, well-crafted and coherent posts, or otherwise be fooled into thinking they were successful at sending garbage.
Of course all that would require a considerable amount of programming hours...
Posted: Sat Mar 31, 2007 5:43 am
by Candy
Most of the "check if human" tests proposed are things computers can do, if only because they rely on a kind of judgement humans aren't that good at either. Which of these three is a planet? The one with a single object on a black background, of course. Which is Earth? Check the colors. Statistics could solve that.
You need something humans are good at, something that's somewhat subjective. The best theoretical test I've seen for the purpose is the "kitty" test, in which you get nine images, three of which are kittens and the other six are something else that looks a bit like one but is clearly not a kitten (dogs, birds, mangled images of something cat-like). Computers have no workable definition of "kitten", especially since it isn't a clear-cut decision.
A moment of thought led me to other candidates, depending on the audience. This audience isn't that good a match, but for a car audience, for instance, you could show cars with their badges removed and ask for the brand.
Most important of all, though: none of these measures may become widely adopted (!). If something is widely adopted, it becomes humanly feasible to write a bot for it in order to make a profit. On the other hand, you can make the attack computationally infeasible by picking something a human is really, really fast at and a computer is really, really bad at. That isn't OCR or the like.