Spam Black List Mod

Started by Kakao, August 28, 2006, 07:17:03 PM


Kakao

The purpose of the Spam Black List Mod is to prevent users from posting topics or personal messages that contain known spam domains.

It works by checking the text of a posted topic or personal message against a list of regular expressions describing known spam domains.
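The check described above can be sketched roughly like this (the function name and the sample patterns are illustrative only, not the mod's actual code):

```php
<?php
// Sketch of the core idea: reject a post or PM if any blacklist pattern
// matches its text. Each entry in $patterns is a regular expression
// fragment for one known spam domain.
function contains_spam_domain(string $text, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match('~' . $pattern . '~i', $text) === 1) {
            return true; // found a blacklisted domain; refuse the post
        }
    }
    return false;
}

var_dump(contains_spam_domain(
    'Buy at http://cheap-casino.example now!',
    ['cheap-casino\.example']
)); // bool(true)
```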

Try to post a spam domain in this test board. No need to register.

Quote
version 0.0.1 2006-08-28

version 0.0.2 2006-08-29
- removed the black list online download.
- added the prerequisite of a cron job setup.

version 0.0.3 2006-08-30
- reverted to the online download, this time using curl.

version 0.1.0 2006-08-01
- added an option in Admin -> Features and Options -> Basic Features to enter the HTTP proxy address.

version 0.1.1 2006-08-04
- fixed the "regular expression too large" error.
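A plausible sketch of how a "regular expression too large" error can be avoided (the function name and chunk size below are my own, not taken from the mod): PCRE imposes a limit on the size of a compiled pattern, so rather than joining every domain into one giant alternation, the list can be compiled and matched in fixed-size chunks.

```php
<?php
// Match $text against the pattern list in chunks so no single compiled
// regex exceeds PCRE's size limit. The chunk size of 100 is arbitrary.
function matches_any(string $text, array $patterns, int $chunkSize = 100): bool
{
    foreach (array_chunk($patterns, $chunkSize) as $chunk) {
        $regex = '~(' . implode('|', $chunk) . ')~i';
        if (preg_match($regex, $text) === 1) {
            return true;
        }
    }
    return false;
}
```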

Download link


karlbenson

Nice idea.

I haven't used it myself. Do you have an idea of how much running time this script adds to posting a reply/topic/PM?

Quote
It uses a black list from Mediawiki.com which is copied to a local file each 30 minutes.

It might be a bit overkill. Once a day would probably be more than enough, if not once a week.
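A refresh policy like the one being discussed (re-download only when the local copy is older than some interval) could be sketched as follows. The function name and the callable-based design are my own; in the mod the download step would wrap copy() or curl:

```php
<?php
// Refresh the local blacklist copy only when it is older than $maxAge
// seconds. $download receives the local path and returns true on success;
// on failure the stale copy is kept rather than lost.
function refresh_blacklist(string $local, int $maxAge, callable $download): bool
{
    if (file_exists($local) && time() - filemtime($local) <= $maxAge) {
        return true; // cached copy is fresh enough; skip the download
    }
    return $download($local);
}
```

With `$maxAge = 24 * 60 * 60` this gives the once-a-day schedule suggested above instead of every 30 minutes.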

karlbenson

OK, I tried it on RC3 in topics/posts (using an admin account)
and it didn't work because there is no black list in my area?

Do you have to turn it on?
How do I get it to retrieve the black list?
It has downloaded it to the attachments/ folder as per the .xml install file.

Kakao

Quote from: karlbenson on August 28, 2006, 10:12:21 PM
OK, I tried it on RC3 in topics/posts (using an admin account)
and it didn't work because there is no black list in my area?

What did not work? Did you try to post a blacklisted URL? When you do, it will show an error message and will not post your message.

Quote from: karlbenson on August 28, 2006, 10:12:21 PM
Do you have to turn it on?
How do I get it to retrieve the black list?
It has downloaded it to the attachments/ folder as per the .xml install file.

You don't have to turn it on. It will retrieve the black list the first time you try to post anything.

Kakao

Quote from: karlbenson on August 28, 2006, 10:02:48 PM
Nice idea.

I haven't used it myself. Do you have an idea of how much running time this script adds to posting a reply/topic/PM?

Quote
It uses a black list from Mediawiki.com which is copied to a local file each 30 minutes.

It might be a bit overkill. Once a day would probably be more than enough, if not once a week.

I didn't notice any difference in running time with the mod installed.

A future version will expose a user setting for the time interval to refresh the list.

karlbenson

I posted a complete page of blacklisted urls.

Under both an admin and normal user account.

Like I said, I believe it's not working because there is no black list in any folder on my server.
It's not downloading it.

Kakao

Quote from: karlbenson on August 28, 2006, 10:12:21 PM
It has downloaded it to the attachments/ folder as per the .xml install file.
Quote from: karlbenson on August 28, 2006, 10:36:07 PM
Like I said, I believe it's not working because there is no black list in any folder on my server.
It's not downloading it.

From the first post I understood there was a list in the attachments folder, where it should be.

Does the Apache user have write permissions in the attachments folder? Are there user-uploaded files in that folder? What is the PHP version? Does the error log show anything?

Look for the allow_url_fopen option in php.ini. Is it on or off?
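The allow_url_fopen setting can also be checked from a script rather than by hunting through php.ini. This helper is my own sketch, not part of the mod:

```php
<?php
// ini_get('allow_url_fopen') returns "1" when the option is on and
// "" or "0" when it is off, so a boolean cast is enough.
function url_fopen_enabled(): bool
{
    return (bool) ini_get('allow_url_fopen');
}

if (!url_fopen_enabled()) {
    echo "allow_url_fopen is Off: copy() and file_get_contents() cannot fetch URLs.\n";
}
```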

karlbenson

Sorry, it was missing the word "not". That's what I get for writing at 2am.

Loads of errors:



testing   Today at 07:33:44 PM 
86.132.211.62     1d77431ba645ab2c0b0c31bfefa413d9 
http://www.youposted.com/rc3/index.php?action=post2;start=0;board=1 
2: array_filter(): The first argument should be an array
File: /home/content/y/o/u/youposted/html/rc3/Sources/Subs.php
Line: 3727

   testing   Today at 07:33:44 PM 
86.132.211.62     1d77431ba645ab2c0b0c31bfefa413d9 
http://www.youposted.com/rc3/index.php?action=post2;start=0;board=1 
2: array_map(): Argument #2 should be an array
File: /home/content/y/o/u/youposted/html/rc3/Sources/Subs.php
Line: 3727

   testing   Today at 07:33:44 PM 
86.132.211.62     1d77431ba645ab2c0b0c31bfefa413d9 
http://www.youposted.com/rc3/index.php?action=post2;start=0;board=1 
2: file(/home/content/y/o/u/youposted/html/rc3/attachments/wikipedia_black_list.txt): failed to open stream: No such file or directory
File: /home/content/y/o/u/youposted/html/rc3/Sources/Subs.php
Line: 3721

   testing   Today at 07:33:44 PM 
86.132.211.62     1d77431ba645ab2c0b0c31bfefa413d9 
http://www.youposted.com/rc3/index.php?action=post2;start=0;board=1 
2: copy(http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1): failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden

File: /home/content/y/o/u/youposted/html/rc3/Sources/Subs.php
Line: 3715



Kakao

2: copy(http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1): failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden

By your home directory path I suppose you are on a shared host and probably can't see the php.ini file. Save the following script in the SMF root with the name phpinfo.php:

Code (php) Select
<?php
// phpinfo() prints its report directly; no echo needed
phpinfo();
?>



Call it at http://www.youposted.com/rc3/phpinfo.php and look for the "allow_url_fopen" option. On or Off? Save the page locally for later reference. After using it, delete phpinfo.php.

If allow_url_fopen is off then, depending on your host's PHP configuration, you may be able to change it by saving a php.ini file in the SMF root:

Code (php) Select

allow_url_fopen = On


All the other options will still be in effect. You will just override allow_url_fopen.

After that does the above error still show?


karlbenson

OK, I've checked: allow_url_fopen is set to On by default.

I (like most average SMF users) am on shared hosting (GoDaddy.com).

Kakao

I'm out of ideas why it is not working in your environment.

I have built a new version (0.0.2) without the online download. The user will have to set up a cron job to download the list. While less convenient, I think it is more robust.
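The cron-based approach could look roughly like this (the script name, crontab schedule, and paths are illustrative, not part of the mod). A crontab entry such as `0 3 * * * php /path/to/forum/fetch_blacklist.php` would run a small script that downloads the list and swaps it into place atomically, so the forum never reads a half-written file:

```php
<?php
// Write the downloaded list to a temporary file first, then rename it
// over the real one; rename() on the same filesystem is atomic, so
// readers always see either the old complete list or the new one.
function store_blacklist(string $data, string $target): bool
{
    $tmp = $target . '.tmp';
    if (file_put_contents($tmp, $data) === false) {
        return false;
    }
    return rename($tmp, $target);
}
```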

karlbenson

If it were me:

- I'd use file_get_contents (probably 90% of hosts supporting PHP will allow it)
- I'd save each banned domain in a new table in the database
- I'd show each banned domain to admins in the control panel

Kakao

Quote from: karlbenson on August 29, 2006, 08:35:37 PM
- I'd use file_get_contents (probably 90% of hosts supporting PHP will allow it)

file_get_contents() uses the same http wrapper as copy(). Why do you think it would work? I'm just starting to analyze curl.
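A curl-based download could look roughly like this (the function name, timeout, and proxy handling are my own sketch, not the mod's code). Unlike the http stream wrapper, curl does not depend on allow_url_fopen and makes a proxy easy to configure:

```php
<?php
// Fetch a URL with the curl extension. Returns the response body as a
// string on success, or false on failure.
function fetch_via_curl(string $url, ?string $proxy = null)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body, don't print it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // don't hang the post forever
    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy);    // e.g. 'host:3128' on a proxied host
    }
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```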

Quote from: karlbenson on August 29, 2006, 08:35:37 PM
- I'd save each banned domain in a new table in the database
- show each banned domain to admins in the control panel

I have thought about storing the list in a table, but what do I gain from it besides added complexity?

The feature I think is most important, once the basic functionality is worked out, is the ability to set up a custom black list. For simplicity it would be edited in a normal topic, editable only by the admin and moderators.

karlbenson

Quote from: Kakao on August 30, 2006, 07:50:44 AM
file_get_contents() uses the same http wrapper as copy(). Why do you think it would work? I'm just starting to analyze curl.
Because I already use file_get_contents on that same host and various other shared hosts.

Kakao

Quote from: Kakao on August 29, 2006, 02:31:18 PM
I'm out of ideas why it is not working in your environment.

By this GoDaddy support page I suspect you are behind a proxy at 64.202.165.130:3128.

If that is the case then I will have to create an editable option to set the proxy address and port if I want the online download to work in all environments. That is almost as annoying as setting up a cron job, since the user will have to ask support for the proxy settings.

Quote from: karlbenson on August 30, 2006, 11:52:42 AM
Quote from: Kakao on August 30, 2006, 07:50:44 AM
file_get_contents() uses the same http wrapper as copy(). Why do you think it would work? I'm just starting to analyze curl.
Because I already use file_get_contents() on that same host and various other shared hosts.

Do you use file_get_contents with an HTTP URL, as in file_get_contents('http://domain.tld/file.txt')?

karlbenson

Yes,

a standard website URL,
e.g. $sFile = file_get_contents("http://www.php.net");
as per http://uk2.php.net/file_get_contents

file() is identical to file_get_contents() except that, while file_get_contents() returns the whole page as a string, file() returns each line in an array.
Using file() you may be able to write only the domains, without the comments, to the file for faster searching.
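That suggestion could be sketched like this (the function name is my own, and the sample lines only mimic the Wikimedia blacklist format of one regex fragment per line with '#' starting a comment):

```php
<?php
// Given the lines of the downloaded blacklist (as file() returns them),
// strip comments and blank lines so only the domain patterns remain.
function parse_blacklist(array $lines): array
{
    $patterns = [];
    foreach ($lines as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // drop '#' comments
        if ($line !== '') {
            $patterns[] = $line;
        }
    }
    return $patterns;
}

$sample = ["# spam domains\n", "cheap-pills\\.example\n", "\n", "casino.*\\.example # added 2006\n"];
print_r(parse_blacklist($sample));
```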

Kakao

Quote from: karlbenson on August 30, 2006, 05:41:10 PM
a standard website url.
eg $sFile = file_get_contents("http://www.php.net")

OK. Sorry to be picky, but just to be sure: did you use it with domains other than the localhost domain? I mean, from the youposted.com site did you file_get_contents() from http://someotherdomain.com?

I just tested a new version 0.0.3 using curl.

version 0.0.3

If you care to test it could you post the error log if it does not work?

karlbenson

Yes, my domain is youposted.com
and I used file_get_contents to fetch external pages, e.g. google.com.

I'll test that new mod right now. I just need to set up a CLEAN RC3 install to try it on.

karlbenson

I've also used fopen to get and save an external file, such as an image or a txt file.

// Get and save the icon
if ($icon == 1) {
    $contents = '';
    $handle = @fopen($filename, 'rb');
    if (!$handle) {
        $icon = 0;
    }
    $file = @fopen('icons/' . $addy . '.' . $ext, 'wb');
    if (!$file) {
        $icon = 0;
    }
    if ($icon == 1) {
        // Read the remote file in 4 KB chunks, then write it out locally
        while (!feof($handle)) {
            $contents .= fread($handle, 4096);
        }
        fwrite($file, $contents);
    }
    // Close whichever handles were opened, even if the copy was skipped
    if ($file) {
        fclose($file);
    }
    if ($handle) {
        fclose($handle);
    }
}
unset($icon, $out);

karlbenson

OK, I've just installed a clean RC3 copy,

as before on
GoDaddy shared hosting.

No errors whatsoever
However, while it created wikipedia_black_list.txt and a second file with the added 'last_check' suffix, they were 0-byte txt files with nothing in them.

When I pasted a URL into the file myself, it did work (it prevented the link from being posted).
