Download hotlinked images and host locally

Started by blork, June 15, 2025, 09:38:09 AM


blork

I'm facing an issue where images are hotlinked on my forum and then the external source is taken down, which results in a blank post. Is there a way to have SMF automatically download any images hotlinked with the img BBCode and host them from my own server?

In particular, I find people hotlinking images from Facebook. These URLs are dynamic and expire after a period of time. The post looks fine at first, but if you view it a week later the URL will have expired and the images will no longer display. I'd like SMF to download the remote image and then display it from my own server. Does SMF have this ability, or are there any mods/workarounds?

I know this problem could be resolved by having users save the image themselves and upload it as an attachment, but I was hoping for something more seamless.

Sesquipedalian

You will need to request a mod to do this for you.

That said, SMF actually already includes the underlying functionality that you are asking for. The built-in image proxy script does exactly what you have described, except that it only downloads images under very specific circumstances rather than always doing so.
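For reference, when the proxy does decide to handle an image, the URL in the post gets rewritten to point at proxy.php. From memory it looks roughly like this, though you should check get_proxied_url in Subs.php for the exact form:

	$proxied = $boardurl . '/proxy.php?request=' . urlencode($url) . '&hash=' . hash_hmac('sha1', $url, $image_proxy_secret);

The hash is there so that proxy.php will only fetch URLs that the forum itself generated, rather than acting as an open proxy for anyone.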

In light of that, all you really need this mod to do is to tweak the conditions for when the proxy decides to download the external image. It should be relatively easy to create such a mod.
I promise you nothing.

Sesqu... Sesqui... what?
Sesquipedalian, the best word in the English language.

pimps

Thanks for the explanation, Sesquipedalian, that helps clarify things.

Just to confirm: are you saying that SMF's existing image proxy script can already download and serve images locally, but only triggers under certain conditions? If so, would modifying those conditions be as simple as editing the relevant PHP file, or would a proper mod still be necessary to handle things cleanly (like checking image type, size, etc.)?

I'm also curious if anyone has already done something similar or could share a code snippet as a starting point.

Sesquipedalian

You will need to make changes in a couple of places.

First, in ./Sources/Subs.php you will find a function named get_proxied_url. The logic inside that function would need to be changed. Right now, the logic assumes that it should not proxy an image that is available via HTTPS. You would need to change that so that it makes an exception for Facebook URLs and always generates proxied URLs for those even when the original URL is using HTTPS.
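To illustrate the idea, here is a sketch. None of this is SMF's actual code; the helper name and the host list are mine:

	// Hypothetical helper: decide whether a URL should always be proxied,
	// even though it is already served over HTTPS.
	function is_expiring_host($url)
	{
		$host = (string) parse_url($url, PHP_URL_HOST);

		// Facebook CDN links carry signed, time-limited query parameters,
		// so they expire even though they use HTTPS.
		$expiring_hosts = array('fbcdn.net', 'facebook.com');

		foreach ($expiring_hosts as $suffix)
		{
			if ($host === $suffix || substr($host, -strlen('.' . $suffix)) === '.' . $suffix)
				return true;
		}

		return false;
	}

Then, inside get_proxied_url, the check that currently returns the original URL unchanged for HTTPS images would gain an extra condition so that it no longer does so when is_expiring_host($url) returns true.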

Second, in ./proxy.php you will find a function called housekeeping. The logic in that function deletes proxied image files after a certain number of days so that the cache of proxied images doesn't grow to a massive size over time. You, however, want some of these images (specifically, the ones from Facebook) to never be deleted. Therefore, you will need to change the logic of this function in order to make that differentiation.

However, changing the housekeeping logic will be a bit tricky, because the cached files are stored in a way that doesn't directly record the original URL. Instead, their filenames are derived from cryptographic hashes.
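To see why, the cache path is derived along these lines (illustrative; the exact expression in proxy.php may differ):

	$cache_file = $cachedir . '/' . sha1($url . $image_proxy_secret);

Since sha1 is a one-way hash, the housekeeping function cannot work backwards from a filename to the URL the file was fetched from.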

One way to do that would be to add the URL to the information that the cacheImage function records for each image, and then have the housekeeping function read that data before deciding what to do with the file. This would be a very reliable approach, since the original URL would be stored inside the same cache file as the image content itself and could not be separated from it. The downside is performance: instead of simply checking the last modification time of the file, the housekeeping function would need to open each cache file, extract the recorded URL, and analyze it to see whether it is from Facebook before deciding whether to keep the cached file.
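Here is a rough sketch of that approach. Everything below is illustrative: it assumes each cache entry is a small JSON wrapper around the image body, the function names are made up, and it reuses the hypothetical is_expiring_host helper from my earlier snippet.

	// Writing side (the cacheImage idea): record the source URL in the entry.
	function cache_image_with_url($cache_file, $url, $content_type, $body)
	{
		$entry = array(
			'content_type' => $content_type,
			'time' => time(),
			'url' => $url, // new: remember where the image came from
			'body' => base64_encode($body),
		);
		file_put_contents($cache_file, json_encode($entry));
	}

	// Reading side (the housekeeping idea): open each stale entry and keep
	// it if its recorded URL is from a host whose links expire.
	function housekeeping_with_exceptions($cache_dir, $max_age_seconds)
	{
		foreach ((array) glob($cache_dir . '/*') as $cache_file)
		{
			// Cheap check first: is the file even old enough to expire?
			if (filemtime($cache_file) > time() - $max_age_seconds)
				continue;

			// Expensive check: read the entry and inspect the recorded URL.
			$entry = json_decode(file_get_contents($cache_file), true);

			if (!empty($entry['url']) && is_expiring_host($entry['url']))
				continue; // keep Facebook-sourced images forever

			unlink($cache_file);
		}
	}

Note that the expensive json_decode only happens for files that are already past the expiry cutoff, which softens the performance cost somewhat.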

Of course, there are other possibilities for how to track the necessary information, many of which would almost certainly be much more efficient than the idea I sketched above. But one way or another, you will need a way for the housekeeping function to distinguish between files it should keep forever and files it should not.

I hope that helps.
I promise you nothing.

Sesqu... Sesqui... what?
Sesquipedalian, the best word in the English language.

shawnb61

Of course, there's the critical question of whether you have the rights to do so. This is a bigger problem without a link to the original site & context.

https://www.hostinger.com/tutorials/hotlinking

This is even worse than "hotlinking"... You're presenting it as your own original content.

The SMF proxy only temporarily caches HTTP content, and only to deal with what are presumably temporary technical limitations (the other site has no cert yet).

To do this correctly, I think - at the very least - you'd need to caption the images with source attribution. Probably a legal fig leaf, but hopefully sufficient.

If someone copied my photos from my site without proper permission or attribution, I'd not be happy...
A question worth asking is born in experience & driven by necessity. - Fripp
