I've seen a dramatic change in the number of guest visitors over the past 4-5 weeks. Guest numbers are consistently four to five times higher than what has historically been normal, at all hours of the day and night. I've made no changes to robots.txt, so I'm not certain whether it's bots crawling my site or something else. How can I tell whether a guest is a real user or some sort of bot? And how can I block them if they are not real? Thanks!
Unless they cause you issues (like increased bandwidth costs or performance problems), it is generally best to try to ignore things like this. The internet is full of bots and crawlers, most of which will not cause you any problems, and trying to identify them all is a never-ending task.
So far so good on cost, but HostGator has shut me down in the past when I was flooded with guests, which I believe turned out to be spider bots. I did see that I could make changes to .htaccess, but I'm really uncertain what I should change.
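Before changing .htaccess, it can help to look at who is actually generating the traffic. Here is a minimal Python sketch that summarizes an access log by User-Agent and IP address. It assumes the host writes logs in the Apache "combined" format; the sample lines, the `ExampleBot` agent, and the keyword list are made up for illustration. Keep in mind that the User-Agent check is only a heuristic, since a bot can claim to be a normal browser.

```python
import re
from collections import Counter

# Apache "combined" log format (an assumption about the host's logs).
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Substrings that self-identifying crawlers commonly put in their User-Agent.
# Illustrative list only, not exhaustive.
BOT_HINTS = ("bot", "crawl", "spider", "slurp")


def looks_like_bot(agent: str) -> bool:
    """Heuristic only: an empty or self-declared crawler User-Agent."""
    lowered = agent.lower()
    return agent in ("", "-") or any(hint in lowered for hint in BOT_HINTS)


def summarize(lines):
    """Count requests per User-Agent and per IP, and split bot vs. other."""
    agents, ips, kinds = Counter(), Counter(), Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent")
        agents[agent] += 1
        ips[match.group("ip")] += 1
        kinds["bot" if looks_like_bot(agent) else "other"] += 1
    return agents, ips, kinds


# Made-up sample lines; in practice, read them from the host's access log.
SAMPLE = [
    '203.0.113.5 - - [16/Sep/2023:09:12:26 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ExampleBot/1.0)"',
    '203.0.113.5 - - [16/Sep/2023:09:12:27 +0000] "GET /index.php?topic=1.0 HTTP/1.1" 200 7340 "-" "Mozilla/5.0 (compatible; ExampleBot/1.0)"',
    '198.51.100.7 - - [16/Sep/2023:09:13:02 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

if __name__ == "__main__":
    agents, ips, kinds = summarize(SAMPLE)
    print(kinds.most_common())      # bot vs. other request counts
    print(ips.most_common(5))       # busiest IP addresses
    print(agents.most_common(5))    # busiest User-Agents
```

If a handful of User-Agents or IP addresses account for most of the requests, those are the candidates worth blocking; a long tail of ordinary browsers from many addresses looks more like real visitors.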
I would move to a better host if they did that to me.
Look at https://clients.hostit.host/index.php
He understands SMF lol, and I've been with him for years without issue.
Quote from: mickjav on September 16, 2023, 09:12:26 AM
I would move to a better host if they did that to me.
Look at https://clients.hostit.host/index.php
He understands SMF lol, and I've been with him for years without issue.
I'll check that out.
Create a robots.txt file and a no_agents.php file.
Inside the robots.txt file put this (without a leading #, since robots.txt treats lines starting with # as comments and ignores them):
User-agent: *
Disallow: /
Disallow: /no_agents.php
Then inside the no_agents.php file put this:
<?php
//nothing
?>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
</head>
<body>
<h2>Private Directory</h2>
<div>
<p>You have reached a private directory for customers only. If you are a spider or web directory agent please remove and block this url from your list.</p>
</div>
</body>
</html>
This will block crawling of your entire site, if that is what you want. It applies to every crawler, not just the unwanted ones.
Robots.txt is a good tool, but sadly only the good actors follow it. Robots.txt is only a request you present to crawlers; there's no obligation for them to actually comply, and the problematic ones seldom do.
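To illustrate why robots.txt is only a request: the check happens inside the crawler, not on your server. Here is a small Python sketch of what a well-behaved crawler does before fetching a page, using the standard library's `urllib.robotparser`. The `ExampleBot` name and the `forum.example` URL are placeholders, and the helper function is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Rules that ask every crawler to stay away from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# The same rules with a leading '#': robots.txt treats these as comments.
COMMENTED_OUT = """\
#User-agent: *
#Disallow: /
"""


def polite_crawler_may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return what a compliant crawler would conclude from robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://forum.example/index.php"  # placeholder URL
    # A compliant crawler runs this check and skips the page if it is False.
    print(polite_crawler_may_fetch(BLOCK_ALL, "ExampleBot", url))
    # Commented-out rules are ignored, so nothing is disallowed.
    print(polite_crawler_may_fetch(COMMENTED_OUT, "ExampleBot", url))
```

A crawler that never runs a check like this simply requests the page anyway, and the server will answer it. Stopping those requires something enforced on the server side, such as rules in .htaccess or at the host's firewall.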
Quote from: Aleksi "Lex" Kilpinen on December 21, 2023, 04:44:45 AM
Robots.txt is a good tool, but sadly only the good actors follow robots.txt. Robots.txt is only a request you present to the crawlers, there's no obligation for them to actually comply, and the problematic ones seldom do.
Quite correct, it only works if they comply. I should have been clear on that part :)