News:

Bored?  Looking to kill some time?  Want to chat with other SMF users?  Join us in IRC chat or Discord

Main Menu

[2.0.11] Split-horizon caching when $boardurl is modified

Started by Porksifal, December 27, 2015, 04:33:41 AM

Previous topic - Next topic

Porksifal

Hi there,

We've run into a curious bug with caching when HTTPS is enabled. I'll quote from the lengthy explanation from our commit that fixes it on GitHub (which I can't seem to link here):

QuotePreviously, $boardurl was treated as static configuration by some parts of the code, and as mutable data by others. In particular, the caching implementation uses $boardurl to generate cache keys, which results in two or more different caches being accessed at different times during request processing if $boardurl gets modified.

There are three situations in which $boardurl may be modified, all of them in loadTheme in Load.php:

 1. $modSettings["forum_alias_urls"] is set, and the URL being accessed matches one of its entries.
 2. The URL being accessed differs only by scheme, using https instead of http.
 3. The site is being accessed by IP address rather than DNS name.

Any of these cases will trigger this bug.

There are numerous potential symptoms of reading from a stale cache, but the most obvious one we noticed was that boards containing recent posts made over HTTPS continue to be marked as unread, even once those posts are read. This is due to the way unread board status is tracked.

SMF tracks the most recent post ID on the entire site in $modSettings["maxMsgID"], which is read from cache very early in request processing. When a user marks a board as read by viewing its index, the log_boards table in the database is updated to indicate that their most recent read post is $modSettings["maxMsgID"] -- in other words, any posts made after now will be considered unread.

However, when a user makes a post over HTTPS, $modSettings["maxMsgID"] gets updated and written to both the database and the *wrong* cache. Next time a page is loaded (over HTTP or HTTPS; it doesn't matter because loading $modSettings from cache happens before modifying $boardurl), it gets set back to its old value, and marking a board as read then sets one's most recent read post to one less than it should be. That board is thus considered to still contain an unread post.

The root of the problem, as previously mentioned, is that some parts of the code treat $boardurl as static configuration, and others as mutable data. The fix here is simply to separate these use cases; store the original value as $canonical_boardurl, and use that wherever a consistent identifier for the board is required.

And our fix:

diff --git a/Sources/Load.php b/Sources/Load.php
index 7aeb4bd..9be7497 100644
--- a/Sources/Load.php
+++ b/Sources/Load.php
@@ -2588,7 +2588,7 @@ function cache_quick_get($key, $file, $function, $params, $level = 1)
 
 function cache_put_data($key, $value, $ttl = 120)
 {
-    global $boardurl, $sourcedir, $modSettings, $memcached;
+    global $canonical_boardurl, $sourcedir, $modSettings, $memcached;
     global $cache_hits, $cache_count, $db_show_debug, $cachedir;
 
     if (empty($modSettings['cache_enable']) && !empty($modSettings))
@@ -2601,7 +2601,7 @@ function cache_put_data($key, $value, $ttl = 120)
         $st = microtime();
     }
 
-    $key = md5($boardurl . filemtime($sourcedir . '/Load.php')) . '-SMF-' . strtr($key, ':', '-');
+    $key = md5($canonical_boardurl . filemtime($sourcedir . '/Load.php')) . '-SMF-' . strtr($key, ':', '-');
     $value = $value === null ? null : serialize($value);
 
     // The simple yet efficient memcached.
@@ -2715,7 +2715,7 @@ function cache_put_data($key, $value, $ttl = 120)
 
 function cache_get_data($key, $ttl = 120)
 {
-    global $boardurl, $sourcedir, $modSettings, $memcached;
+    global $canonical_boardurl, $sourcedir, $modSettings, $memcached;
     global $cache_hits, $cache_count, $db_show_debug, $cachedir;
 
     if (empty($modSettings['cache_enable']) && !empty($modSettings))
@@ -2728,7 +2728,7 @@ function cache_get_data($key, $ttl = 120)
         $st = microtime();
     }
 
-    $key = md5($boardurl . filemtime($sourcedir . '/Load.php')) . '-SMF-' . strtr($key, ':', '-');
+    $key = md5($canonical_boardurl . filemtime($sourcedir . '/Load.php')) . '-SMF-' . strtr($key, ':', '-');
 
     // Okay, let's go for it memcached!
     if (function_exists('memcache_get') && isset($modSettings['cache_memcached']) && trim($modSettings['cache_memcached']) != '')
diff --git a/index.php b/index.php
index 744982e..fa27679 100644
--- a/index.php
+++ b/index.php
@@ -43,6 +43,11 @@ foreach (array('db_character_set', 'cachedir') as $variable)
 // Load the settings...
 require_once(dirname(__FILE__) . '/Settings.php');
 
+// We may modify $boardurl later in Load.php for various reasons. Keep
+// a copy around for subroutines (such as caching) which rely on a
+// consistent $boardurl.
+$canonical_boardurl = $boardurl;
+
 // Make absolutely sure the cache directory is defined.
 if ((empty($cachedir) || !file_exists($cachedir)) && file_exists($boarddir . '/cache'))
     $cachedir = $boarddir . '/cache';

Please let me know if you need any further information. :)

Edit: Based on reading the code, it looks like this might affect 2.1 as well. I don't have a 2.1 installation to test on, but it might be worthwhile forward-porting this (looks like the patch should apply fairly cleanly, that code hasn't changed much).

Porksifal

The original patch I provided was incorrect because $canonical_boardurl didn't get set in SSI.php, so SSI would have used different cache keys. Here's a corrected patch.


diff --git a/SSI.php b/SSI.php
index 7b8d6b1..8e582ea 100644
--- a/SSI.php
+++ b/SSI.php
@@ -38,6 +38,11 @@ foreach (array('db_character_set', 'cachedir') as $variable)
// Get the forum's settings for database and file paths.
require_once(dirname(__FILE__) . '/Settings.php');

+// We may modify $boardurl later in Load.php for various reasons. Keep
+// a copy around for subroutines (such as caching) which rely on a
+// consistent $boardurl.
+$canonical_boardurl = $boardurl;
+
// Make absolutely sure the cache directory is defined.
if ((empty($cachedir) || !file_exists($cachedir)) && file_exists($boarddir . '/cache'))
        $cachedir = $boarddir . '/cache';
diff --git a/Sources/Load.php b/Sources/Load.php
index 7aeb4bd..9be7497 100644
--- a/Sources/Load.php
+++ b/Sources/Load.php
@@ -2588,7 +2588,7 @@ function cache_quick_get($key, $file, $function, $params, $level = 1)

function cache_put_data($key, $value, $ttl = 120)
{
-       global $boardurl, $sourcedir, $modSettings, $memcached;
+       global $canonical_boardurl, $sourcedir, $modSettings, $memcached;
        global $cache_hits, $cache_count, $db_show_debug, $cachedir;

        if (empty($modSettings['cache_enable']) && !empty($modSettings))
@@ -2601,7 +2601,7 @@ function cache_put_data($key, $value, $ttl = 120)
                $st = microtime();
        }

-       $key = md5($boardurl . filemtime($sourcedir . '/Load.php')) . '-SMF-' . strtr($key, ':', '-');
+       $key = md5($canonical_boardurl . filemtime($sourcedir . '/Load.php')) . '-SMF-' . strtr($key, ':', '-');
        $value = $value === null ? null : serialize($value);

        // The simple yet efficient memcached.
@@ -2715,7 +2715,7 @@ function cache_put_data($key, $value, $ttl = 120)

function cache_get_data($key, $ttl = 120)
{
-       global $boardurl, $sourcedir, $modSettings, $memcached;
+       global $canonical_boardurl, $sourcedir, $modSettings, $memcached;
        global $cache_hits, $cache_count, $db_show_debug, $cachedir;

        if (empty($modSettings['cache_enable']) && !empty($modSettings))
@@ -2728,7 +2728,7 @@ function cache_get_data($key, $ttl = 120)
                $st = microtime();
        }

-       $key = md5($boardurl . filemtime($sourcedir . '/Load.php')) . '-SMF-' . strtr($key, ':', '-');
+       $key = md5($canonical_boardurl . filemtime($sourcedir . '/Load.php')) . '-SMF-' . strtr($key, ':', '-');

        // Okay, let's go for it memcached!
        if (function_exists('memcache_get') && isset($modSettings['cache_memcached']) && trim($modSettings['cache_memcached']) != '')
diff --git a/index.php b/index.php
index 744982e..fa27679 100644
--- a/index.php
+++ b/index.php
@@ -43,6 +43,11 @@ foreach (array('db_character_set', 'cachedir') as $variable)
// Load the settings...
require_once(dirname(__FILE__) . '/Settings.php');

+// We may modify $boardurl later in Load.php for various reasons. Keep
+// a copy around for subroutines (such as caching) which rely on a
+// consistent $boardurl.
+$canonical_boardurl = $boardurl;
+
// Make absolutely sure the cache directory is defined.
if ((empty($cachedir) || !file_exists($cachedir)) && file_exists($boarddir . '/cache'))
        $cachedir = $boarddir . '/cache';

nend

Why not just take $boardurl out all together? There is already filetime on Load.php.

I have this problem also, not too stale data but noticible from time to time.

Problem I have is I am on a multi server setup. Each server has their own caches and depending on which server your handed off to you may see some stale data from a few seconds ago after viewing fresh data on a different node.

The joy. :D

Porksifal

Quote from: nend on January 26, 2016, 09:03:29 AM
Why not just take $boardurl out all together? There is already filetime on Load.php.

It's possible to use the same codebase to serve multiple forums; for example, by having multiple directories with their own Settings.php but symlinks to the same copy of the source. I don't know if anyone uses SMF that way, but minimising risk of breakage within the 2.0 release series is a good idea.

Of course, it's reasonable to do it this way in SMF 2.1 if that isn't a feature the developers want to support, but I suspect that's why it was added in the first place. The lack of public revision control for the 2.0 codebase makes it difficult to check.

Quote from: nend on January 26, 2016, 09:03:29 AM
I have this problem also, not too stale data but noticible from time to time.

Problem I have is I am on a multi server setup. Each server has their own caches and depending on which server your handed off to you may see some stale data from a few seconds ago after viewing fresh data on a different node.

The joy. :D

For a multi-server setup, you'd be best served by using a shared cache such as memcached.

Advertisement: