Cache size increasing in public_html since invoking https

Started by nax, September 12, 2019, 05:16:57 AM


nax

Since installing SSL and switching to https, I have seen an increase in disk usage for a site I "try" to manage. There is only minor traffic on the SMF board and nothing has been added to the server in terms of web pages, but disk usage has gone up from 563MB to 723MB since May, about 7% of the disk space. The increase before that was about 1% per year!

I'm concerned that the disk is going to fill up unless I can find the cause. I think it's something to do with public_html/cache, as it's the only folder that has a modification date of today. Can I clear this (and how)? Is this something to do with https and the image proxy service that started after SSL was enabled? Is there an easy script that I can run periodically using cron to keep on top of this, assuming it's the cause?

I tried clearing the cache in SMF Admin/Maintenance but that had no effect.

TIA

Arantor

It's the image proxy. It takes copies of images that people link to that aren't HTTPS so that when people visit your site, 1) everything is secure and 2) your site isn't drowning in extra work it doesn't have to do (it serves a local copy rather than fetching the image again on every view).

If you clear it daily you will actually make your site slower for everyone. You would be better served by looking at the posts people make and changing the links to https directly in the posts to avoid this.

nax

Thanks for that advice (at least I know what it is now). If I deleted content in the directory that was over a month old, would that be a better approach? Can I just delete it? I can ask the users of the forum concerned to use https for image/video links, including their avatars. Most image hosting sites do support https.
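
For reference, the age-based cleanup asked about here could look roughly like the Python sketch below. It is only an illustration, not something shipped with SMF: the function name `prune_older_than`, the 30-day threshold in the example and the `KEEP` list of placeholder files are all assumptions, and it defaults to a dry run so nothing is deleted until you have checked what it would remove.

```python
import time
from pathlib import Path

# Assumption: placeholder files that should stay in the cache folder.
KEEP = {"index.php", ".htaccess"}


def prune_older_than(cache_dir: str, max_age_days: float, dry_run: bool = True) -> list[str]:
    """Remove regular files in cache_dir last modified more than max_age_days ago.

    Returns the sorted names of the files that were removed
    (or, with dry_run=True, that would have been removed).
    """
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for entry in Path(cache_dir).iterdir():
        # Leave subdirectories and the placeholder files alone.
        if not entry.is_file() or entry.name in KEEP:
            continue
        if entry.stat().st_mtime < cutoff:
            if not dry_run:
                entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

Calling `prune_older_than("/path/to/public_html/cache", 30)` (the path is a placeholder) only lists candidates. Passing `dry_run=False` actually deletes them, and the call could then be scheduled from cron.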

a10

I'd think this should control it? Image proxy in Server Settings, see attachment.
2.0.19, php 8.0.23, MariaDB 10.5.15. Mods: Contact Page, Like Posts, Responsive Curve, Search Focus Dropdown, Add Join Date to Post.

nax

I already have this set and the Maximum File size is set to 5190KB - which I think was the default.

Looking in the cache, there are files of 3MB and more (see the attached)! Is there any way to see what these files are? Then maybe I can see where they're being used in the forum and address the issue.




vbgamer45

5190KB is 5.19 megabytes.
Not easily, no. The files need to be decoded via the script to be displayed.

a10

Quote from: nax on September 12, 2019, 07:49:28 AM
I already have this set and the Maximum File size is set to 5190KB - which I think was the default.


Interesting. I can't really tell if it's working on my forum, as I have not reached my set max size.
Maybe the forum experts can tell if / how the max size is supposed to work.

But is 723MB (or even many GBs) really a problem? Hosting space is getting really cheap these days, and there is good benefit in caching the non-https stuff.

About finding the http culprits: it can be done by editing the messages table (I have done things with find/replace in Notepad++, mainly replacing all outdated internal forum links in posts after a domain name change).
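
As a rough illustration of hunting for the culprits without editing anything, the Python sketch below counts which hosts are referenced by plain-http `[img]` tags. It assumes the post bodies have already been exported from the messages table as plain strings (how you export them is up to you), and the function name `http_image_hosts` and the regular expression are my own, not part of SMF.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches [img]http://...[/img], including [img width=...] variants.
IMG_HTTP = re.compile(r"\[img[^\]]*\]\s*(http://[^\[\]\s]+)\s*\[/img\]", re.IGNORECASE)


def http_image_hosts(bodies):
    """Count plain-http image links per host across an iterable of post bodies."""
    counts = Counter()
    for body in bodies:
        for url in IMG_HTTP.findall(body):
            host = urlsplit(url).hostname  # lower-cased host name, or None
            if host:
                counts[host] += 1
    return counts
```

`http_image_hosts(bodies).most_common(10)` would then show the ten hosts that feed the proxy the most, which is a reasonable place to start asking users to switch to https links.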

nax

Quote from: nax on September 12, 2019, 07:49:28 AM
Quote from: vbgamer45 on September 12, 2019, 08:08:20 AM
5190KB is 5.19 megabytes.
Not easily, no. The files need to be decoded via the script to be displayed.

So no easy way to find the culprits. I think I will set the max size to 1024KB; most embedded images shouldn't really be more than that, don't you think?

nax

Quote from: a10 on September 12, 2019, 08:15:22 AM
Quote from: nax on September 12, 2019, 07:49:28 AM
I already have this set and the Maximum File size is set to 5190KB - which I think was the default.


Interesting. I can't really tell if it's working on my forum, as I have not reached my set max size.
Maybe the forum experts can tell if / how the max size is supposed to work.

But is 723MB (or even many GBs) really a problem? Hosting space is getting really cheap these days, and there is good benefit in caching the non-https stuff.

About finding the http culprits: it can be done by editing the messages table (I have done things with find/replace in Notepad++, mainly replacing all outdated internal forum links in posts after a domain name change).

My message table is about 2GB ;)

shawnb61

You can safely disable the image proxy and clear the cache/images folder.  The only downside is you lose the "padlock" when viewing threads that have links to http:// images in them.

Some of us use cronjobs to keep the cache/images folder pruned to within a certain size.  An example:
   https://github.com/sbulen/sjrbTools/blob/master/proxy-maint-cron.php
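
The general idea of pruning to a size limit can be sketched in a few lines of Python. To be clear, this is not the linked script, just a minimal illustration of the technique: the function name `prune_to_size`, the `KEEP` list of placeholder files and the dry-run default are assumptions of mine. It removes the oldest files first until the folder fits within the budget.

```python
from pathlib import Path

# Assumption: placeholder files that should stay in the cache folder.
KEEP = {"index.php", ".htaccess"}


def prune_to_size(cache_dir: str, max_bytes: int, dry_run: bool = True) -> list[str]:
    """Remove the oldest files in cache_dir until the rest total at most max_bytes.

    Returns the names of removed files, oldest first
    (with dry_run=True nothing is actually deleted).
    """
    entries = []
    for path in Path(cache_dir).iterdir():
        if path.is_file() and path.name not in KEEP:
            info = path.stat()
            entries.append((info.st_mtime, info.st_size, path))
    entries.sort(key=lambda item: item[0])  # oldest modification time first

    total = sum(size for _, size, _ in entries)
    removed = []
    for _, size, path in entries:
        if total <= max_bytes:
            break
        if not dry_run:
            path.unlink()
        total -= size
        removed.append(path.name)
    return removed
```

For example, `prune_to_size("/path/to/cache/images", 200 * 1024 * 1024)` (placeholder path, 200MB budget) lists what would go. Adding `dry_run=False` performs the deletion.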

I do not recommend editing your messages by hand. 
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

nax

Thanks Shawn, that script is just a little too complicated for me!

Arantor

Quote from: shawnb61 on September 12, 2019, 10:07:09 AM
You can safely disable the image proxy and clear the cache/images folder.  The only downside is you lose the "padlock" when viewing threads that have links to http:// images in them.

Some of us use cronjobs to keep the cache/images folder pruned to within a certain size.  An example:
   https://github.com/sbulen/sjrbTools/blob/master/proxy-maint-cron.php

I do not recommend editing your messages by hand. 

Depends on your browser; some outright won't show the image if it's on http when the site is https.

nax


Kindred

Well, I went into the database and did a global replace of http to https in cases where the source is known to have changed (e.g. there is a known pattern for the changes to images hosted on imgbb and other sites like that). That cut down on the use of the proxy.
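
The replace logic itself can be sketched in Python as below. This is not the query that was run, only an illustration of limiting the rewrite to hosts you know serve https: the function name `upgrade_known_hosts` is made up, the host set you pass in is your own list, and the text is assumed to be a post body already read from the database. Back up the table before applying anything like this for real.

```python
import re
from urllib.parse import urlsplit

# A plain-http URL, stopping at whitespace, BBCode brackets, quotes or angle brackets.
HTTP_URL = re.compile(r'http://[^\s\[\]"<>]+')


def upgrade_known_hosts(body: str, https_hosts: set[str]) -> str:
    """Rewrite http:// to https:// only for URLs whose host is in https_hosts."""

    def swap(match: re.Match) -> str:
        url = match.group(0)
        if urlsplit(url).hostname in https_hosts:
            return "https://" + url[len("http://"):]
        return url  # unknown host: leave untouched

    return HTTP_URL.sub(swap, body)
```

Links to hosts that are not in the set stay exactly as they were, so the proxy keeps handling those.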

shawnb61

Then if the padlock isn't important I would simply suggest disabling the image proxy.  (That's what I do now!)

Arantor

Quote from: shawnb61 on September 12, 2019, 03:05:02 PM
Then if the padlock isn't important I would simply suggest disabling the image proxy.  (That's what I do now!)

That isn't what I said.

There are browsers that won't show any HTTP resource in an HTTPS context at all. The current mainstream browsers will allow (some) mixed content, but I've seen policies set with Group Policy (e.g. on workplace computers) to never allow HTTP inside HTTPS pages.

(The trend is toward stricter handling, not looser. Right now the mainstream browsers will allow mixed content for images but not fonts and some JavaScript. Mozilla has certainly indicated that for more powerful features such as WebRTC, mixed content will not be tolerated.)

shawnb61

(Arantor - I wasn't directing that at you, but to the OP.) 

My input to nax still stands - with concern about cache growth, and an inability to get a pruning cron working, the best advice is to disable the image proxy & clear the cache. 

As Arantor points out, the mainstream browsers allow mixed content for images, so you're good. 

I have ~30K users.  I received many complaints from users with the proxy ON (dropped images, another topic entirely...) - but I have never received a single report at all from a user with the image proxy OFF.  My suggestion remains to turn it OFF. 

