Re-thinking read/unread features

kaiden11 · January 20, 2012, 01:37:22 AM

Hello! New here... and not entirely sure where my topic fits in. I'm not sure my thoughts pertain to mods or themes, as the SMF Coding Discussion section advertises, but I am looking to discuss with coders/designers familiar with SMF, particularly with the read/unread tracking system. If this is the wrong place, and the reader is somebody with such a power, feel free to move this to a more appropriate venue.

Here goes: about a year ago now, I figured out and implemented an "alternate" method for tracking read/unread information for forum posts. It is a form of run-length encoding/compression (or so I'm told) that efficiently stores per-post seen/unseen information while storing, per user, no more seen/unseen records than half the total number of posts in the database (and usually far fewer). It works by maintaining "seen boundaries," where a record lists the lower and upper boundary of a run of posts that have been seen. Upon page load, a subroutine maintains this list of boundaries for the user, either creating a new boundary, growing an existing one, or merging two boundaries. It works for data sets that have monotonically increasing integers as their primary keys (like SMF, phpBB, etc.).
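For readers who want something concrete, here is a minimal sketch of the boundary bookkeeping described above. It is written in Python for brevity (the forums discussed here are PHP), and the names `mark_seen` and `is_seen`, along with the in-memory list of `(low, high)` tuples, are assumptions for illustration rather than the code from the StackOverflow answer.

```python
from bisect import bisect_right


def mark_seen(boundaries, post_id):
    """Return a new boundary list that also covers post_id.

    boundaries is a sorted list of inclusive (low, high) ranges that
    neither overlap nor touch each other (hypothetical representation).
    """
    lows = [low for low, _ in boundaries]
    i = bisect_right(lows, post_id)  # boundaries[:i] start at or before post_id
    prev = boundaries[i - 1] if i > 0 else None
    nxt = boundaries[i] if i < len(boundaries) else None

    if prev is not None and prev[1] >= post_id:
        return list(boundaries)  # already inside a seen range

    touches_prev = prev is not None and prev[1] + 1 == post_id
    touches_next = nxt is not None and nxt[0] - 1 == post_id

    if touches_prev and touches_next:  # merge two boundaries
        return boundaries[:i - 1] + [(prev[0], nxt[1])] + boundaries[i + 1:]
    if touches_prev:  # grow the earlier boundary upward
        return boundaries[:i - 1] + [(prev[0], post_id)] + boundaries[i:]
    if touches_next:  # grow the later boundary downward
        return boundaries[:i] + [(post_id, nxt[1])] + boundaries[i + 1:]
    # otherwise create a new single-post boundary
    return boundaries[:i] + [(post_id, post_id)] + boundaries[i:]


def is_seen(boundaries, post_id):
    """True if post_id falls inside any seen range."""
    return any(low <= post_id <= high for low, high in boundaries)
```

Reading posts 5, 6, 8 and then 7 would go through "create", "grow", "create" and "merge" in turn, leaving the single record `(5, 8)`.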

It would appear my account here doesn't allow me to post external links, otherwise I would direct you to the StackOverflow answer I wrote up that describes the algorithm in more detail. I suppose all I can ask is that you search Google for "stackoverflow php forums how to cope with unread discussions kaiden". It should be the first result, second answer, beginning with "There is another...".

In the StackOverflow answer, I say that SMF "cheats" a little because it only stores the latest timestamps (or IDs, I forget) in its tables rather than per-post information. However, as I understand it, SMF is a decidedly flat forum, and people very rarely read out of order; they are usually reading through pages of posts, not single posts. So more than per-forum or per-topic timestamps isn't likely to be needed.

In the forum that I designed this for, posts were threaded, and given some of the topics, reading out of order was entirely possible, if not encouraged. Therefore, per-post historical accuracy was important. In the year since I came up with my initial design, I've improved upon it, adding hooks to provide nice display features outside of the seen/unseen maintenance routine ("Which posts in this thread had I not read before accessing this page?"), and also to efficiently support the maintenance of multiple messages viewed at once.

I've tried posting this to the phpBB developer forums, but either I'm impatient, or they're not particularly interested. I've put a good deal of work into getting this right, and other than more complicated data structures and algorithms, I'm finding the pros tend to outweigh the cons with my implementation (and many of my more useful features would be impossible without them). I'm curious: has anyone seen this type of implementation before? If so, were there problems? Or, do you have particular issues with, or questions about my current implementation? And are you tired of this topic coming up?

Thanks for reading.

Arantor · January 20, 2012, 03:09:24 AM

Hi, and thanks for posting! (It's only your first post where you can't post links.)

I've seen it done before, not in a forum though.

The main question that comes up re unread implementation isn't so much related to per post read status but simply that 'unread replies' is always based on 'threads I've replied to' rather than 'threads I'm interested in' and if you ever so much as breathe on some topics, they'll haunt you ever-more if you use unread-replies as a tool for organising your forum time.

It's true that in the flat view used by SMF it wouldn't necessarily be important to have per-post read status, so I'm all the more intrigued by where it might be used as the normal order of things (and it would be important to have something along those lines if ever threaded view were implemented).

The one thing I'd be concerned about is performance; the lists of unread posts and unread-posts-in-topics-I've-replied-to are not particularly cheap to generate and this could - depending on implementation - make them a lot slower.

kaiden11 · January 20, 2012, 03:37:56 AM

Let's just see... http://stackoverflow.com/questions/2288814/php-forums-how-to-cope-with-unread-discussions-topics-posts/5045827#5045827

You say you've seen it done: you've seen my particular "boundary" solution before? Or per-post recording?

And I didn't realize it was "unread replies," rather than just "this is how far you got in reading topic X, something was posted after that." That seems a little more... reasonable. Being flat, though, I can see what you mean: whether the continuing conversation is relevant after your reply isn't exactly, er, computationally determinable

As I mentioned, the forum I designed this for is threaded, meaning that it's possible to determine if a post was in direct reply to you, as well as whether you've read it. If somebody posts elsewhere in the thread (not in direct reply to your post), your list of "unread replies" wouldn't include the new post.

As for why I designed the system this way:

- Never wanted to deal with "Why does it say I read that post? I never read that post."
- I have multiple viewing modes, one of which is collapsing a thread and only displaying one "selected" post in a thread at a time. For this mode, let's say you're reading a story progression, or something else that would benefit from having your place saved, as well as from being accurate about what you've read and not read.
- In the same vein, I have a "full" viewing mode, which allows me to view a full thread at once. I wanted users to be able to visually distinguish between read/unread at a glance, as well as maintain state as if I had "viewed" the unread posts one at a time. Assuming users are actually reading, it's a way to get through a backlog quickly, as well as see the direction a thread has gone since you left.
- I don't really "paginate" within a thread, and the tree structure sort of makes the question of "what post came after?" a fuzzy one. If I stored seen/unseen information, I would have to store it for "branches" of the thread, which is the same as per-post, if you think about it.

As for performance (as I discuss a little bit in the StackOverflow article), you have to be a little more careful. You can end up storing a large number of records, depending on somebody's viewing habits. However, if people view posts in order (common), or view posts that were temporally similar, the boundary growing/merging part of the algorithm can improve upon itself. The nice part being: the more people read, the faster it gets. And unlike storing a record per-post, your worst-case scenario is if people only read every other post.
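A quick way to sanity-check the worst case mentioned above is to count how many boundaries a given reading pattern would need. This is only a sketch in Python; `count_boundaries` is a hypothetical helper that counts runs of consecutive post IDs and does not touch any real tables.

```python
def count_boundaries(seen_ids):
    """Number of (low, high) records needed to cover the given seen post IDs."""
    ids = sorted(set(seen_ids))
    # A new boundary starts wherever an ID does not directly follow the previous one.
    return sum(1 for k, pid in enumerate(ids) if k == 0 or pid != ids[k - 1] + 1)


in_order = range(1, 101)            # read posts 1..100 straight through
every_other = range(1, 101, 2)      # read only the odd-numbered posts

print(count_boundaries(in_order))     # one boundary covers everything
print(count_boundaries(every_other))  # one boundary per seen post: half of 100
```

A per-post table would hold 100 and 50 rows for these two patterns; the boundary list holds 1 and 50, so alternating reads are the point where it stops saving anything.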

Granted, my forum is sort of a quiet place, so it's unlikely that I'll see it at scale. But at least for my usage, it's worked out.

Let me know if you have questions on the algorithm.

Arantor · January 20, 2012, 04:03:13 AM

I've seen something very similar done with boundaries, but it wasn't for recording read status in a forum; it was for another data application I used to do occasional maintenance on (a financial system).

Quote: "this is how far you got in reading topic X, something was posted after that."

No, that's just general unread status (and how the 'new' button normally works), unread replies is something slightly different but both are options provided.

Quote: As I mentioned, the forum I designed this for is threaded, meaning that it's possible to determine if a post was in direct reply to you, as well as whether you've read it. If somebody posts elsewhere in the thread (not in direct reply to your post), your list of "unread replies" wouldn't include the new post.

Yeah, that's the big thing and one of the banes of dealing with threaded posts (the other is pagination).

Quote: Never wanted to deal with "Why does it say I read that post? I never read that post."

Oddly enough that actually doesn't happen that often, because SMF records the last post you read in a thread, even at just the page level so if you only read 2 out of 3 pages, the 'new' starts from the start of the third page.

Quote: I have multiple viewing modes, one of which is collapsing a thread and only displaying one "selected" post in a thread at a time. For this mode, let's say you're reading a story progression, or something else that would benefit from having your place saved, as well as from being accurate about what you've read and not read.

That makes a lot of sense to be able to record the boundaries. Sounds a lot more like how the PM system works, really, though that is pretty much true per-post read status.

Quote: In the same vein, I have a "full" viewing mode, which allows me to view a full thread at once. I wanted users to be able to visually distinguish between read/unread at a glance, as well as maintain state as if I had "viewed" the unread posts one at a time. Assuming users are actually reading, it's a way to get through a backlog quickly, as well as see the direction a thread has gone since you left.

Nice.

Quote: I don't really "paginate" within a thread, and the tree structure sort of makes the question of "what post came after?" a fuzzy one. If I stored seen/unseen information, I would have to store it for "branches" of the thread, which is the same as per-post, if you think about it.

That makes some sense, though do you find yourself loading the entire node tree at once or slices of it? If so, how do you deal with the performance aspects attached to traversing the tree of posts?

Quote: As for performance (as I discuss a little bit in the StackOverflow article), you have to be a little more careful. You can end up storing a large number of records, depending on somebody's viewing habits.

Yeah, I can imagine that. SMF does some mitigation on this aspect; it stores per-topic habits, but if you mark a board read, it should be purging that board's per-topic states and just retaining the overall last id for the board. (It's a little more complex than that, but that's the theory behind it.)

Quote: However, if people view posts in order (common), or view posts that were temporally similar, the boundary growing/merging part of the algorithm can improve upon itself. The nice part being: the more people read, the faster it gets. And unlike storing a record per-post, your worst-case scenario is if people only read every other post.

*nods*

That's incidentally something the PM system doesn't have to cope with, even though it does have per-post tracking, because it keeps a PM once and then has a side table to indicate who that message was sent to, and read status is within that record. Unfortunately such a thing can't practically be extended to the main post table.

Do you still have the facility to mark an entire board as read? How does the algorithm deal with that scenario?

Quote: Granted, my forum is sort of a quiet place, so it's unlikely that I'll see it at scale. But at least for my usage, it's worked out.

That's the thing, I'd love to see it at scale but I'm finding it hard to conceptualise the structure for larger uses.

Quote: Let me know if you have questions on the algorithm.

I had a few questions above that aren't strictly related to the algorithm itself but sort of side consequences out of the overall thing you're doing.

Also, thanks for sharing. It's been an interesting start to the day to read this.

kaiden11 · January 20, 2012, 04:48:08 PM

Quote: Oddly enough that actually doesn't happen that often, because SMF records the last post you read in a thread, even at just the page level so if you only read 2 out of 3 pages, the 'new' starts from the start of the third page

I have a bad habit of clicking on the latest post in a thread (say, one I've never seen, particularly if I'm a new user) to see latest activity, and then reading the backlog if I'm more interested. Doing this gives me any new replies after, yes, but means I can no longer accurately answer the question "Which posts/pages have I not read prior to this?" Granted, it's not a common case, but annoying enough for me to write an algorithm to address it

Quote: That makes some sense, though do you find yourself loading the entire node tree at once or slices of it? If so, how do you deal with the performance aspects attached to traversing the tree of posts?

Yeah, I've never found fault with loading all of the nodes into a tree. My thread-building mechanism maintains the sort order in the tree structure, and everything after that is just in-order linear traversal. Since I'll have to ask things like thread size, latest post, current post, seen/unseen anyway, I've never found a case, performance or otherwise, to limit the retrieval or traversal, other than to "truncate" branch display when it gets too deep and screws up styling (in that case, I stop iterating and provide a link to view the thread in full).

Quote: Yeah, I can imagine that. SMF does some mitigation on this aspect; it stores per-topic habits, but if you mark a board read, it should be purging that board's per-topic states and just retaining the overall last id for the board. (It's a little more complex than that, but that's the theory behind it)

I've seen that. It seems... messy. I mean, if that is the user's intention, then fine. I just think about the viewing history compression scripts people have to write to keep their viewing history manageable. In a different implementation of this (with much, much higher message count), I implemented a boundary limit. I said something like "A user will not be able to store more than 1000 boundaries. If it becomes necessary to add a boundary after 1000, you take the two least-numbered boundaries (assuming them to be the oldest), and you merge them, *then* add the new boundary." It introduces inaccuracy into the system, but if you can assume that the oldest posts users haven't read are the ones they care least about, it at least manages itself.
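The boundary limit described above can be sketched roughly as follows. This is Python rather than the PHP used on the forums, `add_with_cap` is a hypothetical name, and it assumes the new range does not overlap or touch an existing one (that case belongs to the normal grow/merge routine).

```python
def add_with_cap(boundaries, new_range, max_boundaries=1000):
    """Add a brand-new (low, high) range, pruning first if the cap is reached.

    When the user already holds max_boundaries records, the two
    lowest-numbered ranges are merged into one, which marks the gap between
    them as seen. That is the deliberate inaccuracy described above.
    """
    ranges = sorted(boundaries)
    if len(ranges) >= max_boundaries and len(ranges) >= 2:
        (low_a, _), (_, high_b) = ranges[0], ranges[1]
        ranges[:2] = [(low_a, high_b)]  # swallow the gap between the two oldest
    ranges.append(new_range)
    ranges.sort()
    return ranges


# With a cap of 3, adding a fourth boundary merges (1, 2) and (5, 6) into (1, 6),
# so posts 3 and 4 are now reported as seen even though they never were.
print(add_with_cap([(1, 2), (5, 6), (9, 9)], (20, 20), max_boundaries=3))
```

The record count never exceeds the cap, and the error is confined to the oldest posts, which matches the assumption that those are the ones users care least about.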

Quote: Do you still have the facility to mark an entire board as read? How does the algorithm deal with that scenario?

Yes, essentially. I added a "speed read" function after the fact. It ties into my search capabilities by first performing a search (based on criteria like "is unread, not myself, not replies to my posts"), and then asking the user if they want to mark those posts as read. If so, it takes the user's current boundaries, runs the viewing history maintenance function as if it had seen each of the search results, and rewrites the user's boundaries to reflect their new state. Given that, it's entirely possible to mark a thread, subforum, posts older than a week, etc., as read.

(I just wish I wasn't the only user that actually uses that feature.)
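The "speed read" step could look something like the sketch below, which folds a whole batch of post IDs (for example, search results) into the existing boundaries in a single pass. It is Python for illustration; `mark_many_seen` is a hypothetical name, and sorting everything and coalescing is one possible approach, not necessarily how the forum described here does it.

```python
def mark_many_seen(boundaries, post_ids):
    """Return new (low, high) boundaries after marking every ID in post_ids as seen."""
    # Treat each newly seen post as a one-post range, then coalesce everything.
    ranges = sorted(list(boundaries) + [(pid, pid) for pid in set(post_ids)])
    merged = []
    for low, high in ranges:
        if merged and low <= merged[-1][1] + 1:
            # Overlaps or directly follows the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


# Posts 4, 5 and 9 close gaps next to existing ranges; post 20 starts a new one.
print(mark_many_seen([(1, 3), (10, 12)], [4, 5, 9, 20]))
```

Because the result is computed in memory, the user's stored boundaries only need to be rewritten once per batch rather than once per post.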

Quote: That's the thing, I'd love to see it at scale but I'm finding it hard to conceptualise the structure for larger uses.

I'd like to see this too. My other implementation worked OK, but it forced me to invent the type of "pruning" I described. Once I did that, my algorithm behaved very predictably, if slightly less accurately. I plan on getting debugging hooks into this soon, to get an idea of what percentage of my rendering is spent on this process.

Quote: I had a few questions above that aren't strictly related to the algorithm itself but sort of side consequences out of the overall thing you're doing.

Sure!

Arantor · January 20, 2012, 05:25:17 PM

Quote: I have a bad habit of clicking on the latest post in a thread (say, one I've never seen, particularly if I'm a new user) to see latest activity, and then reading the backlog if I'm more interested. Doing this gives me any new replies after, yes, but means I can no longer accurately answer the question "Which posts/pages have I not read prior to this?" Granted, it's not a common case, but annoying enough for me to write an algorithm to address it

I know exactly what you mean about things that are annoying! (Good developers primarily scratch their own itches) Incidentally, I find that I actually don't click on the 'last post' link in SMF much, because I'm so used to using 'new' to track my progress.

Quote: Yeah, I've never found fault with loading all of the nodes into a tree. My thread-building mechanism maintains the sort order in the tree structure, and everything after that is just in-order linear traversal. Since I'll have to ask things like thread size, latest post, current post, seen/unseen anyway, I've never found a case, performance or otherwise, to limit the retrieval or traversal, other than to "truncate" branch display when it gets too deep and screws up styling (in that case, I stop iterating and provide a link to view the thread in full).

From my perspective, I looked at adding threading to SMF (and I know a colleague of mine implemented a decent amount of it) but the problem I always ran into was a performance-at-scale issue with the node tree itself.

The notion of storing a post's immediate parent node in its record is straightforward enough but I couldn't come up with an efficient way to gather enough of the node tree to identify where you are within it, save for the immediate parent and perhaps immediate descendants, without gathering every post in the tree in its entirety and rebuilding the tree from knowing the relations.

It might not be an issue for a thread with a few branches and only a couple of levels but it's not exactly uncommon that I get into discussions that are hundreds of posts long, and the notion of running yet another query at thread display time, to gather the entire tree (or, worse, recursively query to get parents/children) seems woefully inefficient.

Quote: I've seen that. It seems... messy. I mean, if that is the user's intention, then fine. I just think about the viewing history compression scripts people have to write to keep their viewing history manageable.

Actually, it's basically opaque for the user, most do not even realise that's how it works (but then again, nor should they have to fuss about it). Thing is, if you mark an entire board read, you are effectively saying 'I don't care what's new here' for everything in the board, and then everything after that point is fodder for 'unread' again.

There is one script for maintenance actually, which pretty much does this.

Quote: It introduces inaccuracy into the system, but if you can assume that the oldest posts users haven't read are the ones they care least about, it at least manages itself.

*nods* It's acceptable, unlike what vBulletin seems to do with read/unread stuff in my experience.

Quote: Yes, essentially. I added a "speed read" function after the fact. It ties into my search capabilities by first performing a search (based on criteria like "is unread, not myself, not replies to my posts"), and then asking the user if they want to mark those posts as read. If so, it takes the user's current boundaries, runs the viewing history maintenance function as if it had seen each of the search results, and rewrites the user's boundaries to reflect their new state. Given that, it's entirely possible to mark a thread, subforum, posts older than a week, etc., as read.

(I just wish I wasn't the only user that actually uses that feature.)

Oh, that's slick.

Quote: I'd like to see this too. My other implementation worked OK, but it forced me to invent the type of "pruning" I described. Once I did that, my algorithm behaved very predictably, if slightly less accurately. I plan on getting debugging hooks into this soon, to get an idea of what percentage of my rendering is spent on this process.

I'm guessing dealing with the node tree is probably the most expensive part of that process.

kaiden11 · January 20, 2012, 08:16:21 PM

Now with numbers...

Quote: The notion of storing a post's immediate parent node in its record is straightforward enough but I couldn't come up with an efficient way to gather enough of the node tree to identify where you are within it, save for the immediate parent and perhaps immediate descendants, without gathering every post in the tree in its entirety and rebuilding the tree from knowing the relations.

It might not be an issue for a thread with a few branches and only a couple of levels but it's not exactly uncommon that I get into discussions that are hundreds of posts long, and the notion of running yet another query at thread display time, to gather the entire tree (or, worse, recursively query to get parents/children) seems woefully inefficient.

There are things you can do to help. For instance, I store my message tuples similar to the following: [ msg_id, msg_root_id, msg_parent_id, msg_msg, ... ]. The msg_root_id and msg_parent_id are maintained at posting time. When it comes time to render a thread, I retrieve all thread nodes by querying on msg_root_id. Then I have a class that represents a tree node. I manually initialize the thread root, and then repeatedly push the children using a method on the root ( $root->add_reply( ... ) ). Each insertion should be O(log(n)) (the build as a whole can't be better than O(n), since every post is added once), and it's O(n) to traverse the tree afterwards.
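As a rough illustration of the approach just described, the sketch below rebuilds a thread from rows fetched by `msg_root_id`. It is Python rather than PHP; the `ThreadNode` class, `build_thread` and `walk` are hypothetical names, and it assumes the root row has `msg_id == msg_root_id` and that a reply always has a higher ID than its parent. With a dictionary from `msg_id` to node, each reply is attached in constant time on average, so this particular build is linear in the number of posts; that is a property of the sketch, not a measurement of the implementation discussed in the thread.

```python
class ThreadNode:
    """One post in a thread, holding its direct replies in posting order."""

    def __init__(self, msg_id, parent_id, body):
        self.msg_id = msg_id
        self.parent_id = parent_id
        self.body = body
        self.replies = []

    def add_reply(self, node):
        self.replies.append(node)


def build_thread(rows):
    """rows: (msg_id, msg_root_id, msg_parent_id, msg_msg) tuples for ONE thread."""
    nodes = {}
    root = None
    # Ascending msg_id guarantees every parent is created before its replies.
    for msg_id, root_id, parent_id, body in sorted(rows, key=lambda r: r[0]):
        node = ThreadNode(msg_id, parent_id, body)
        nodes[msg_id] = node
        if msg_id == root_id:
            root = node
        else:
            nodes[parent_id].add_reply(node)
    return root


def walk(node, depth=0):
    """Depth-first traversal yielding (depth, msg_id) in display order."""
    yield depth, node.msg_id
    for reply in node.replies:
        yield from walk(reply, depth + 1)


rows = [(1, 1, None, "root"), (2, 1, 1, "a"), (3, 1, 1, "b"), (4, 1, 2, "c")]
for depth, msg_id in walk(build_thread(rows)):
    print("  " * depth + str(msg_id))
```

The traversal visits each post exactly once, which is the O(n) walk mentioned above.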

In my random sampling of 133 fully rendered, queried, uncached posts (among different threads), it took me 32.9ms. Of that, 20.1ms was spent building the tree. For a 296ms total render time, that's pretty small. I'm finding I'm spending more than half of my time doing "Posts per Subforum" queries as part of some of the navigation elements (something I plan to fix).

As for the subject at hand (the boundary calculation piece), for another random sample (123 posts, 1804ms, not having caching is killing me) the viewing history maintenance routine takes 41.3 milliseconds, of which 38.9ms is spent talking to the database.

My user account has 365 boundaries associated with it, so any slowdown when it comes to iterating over boundaries should at least be a little evident.

I'm still collecting stats, but so far, I'm finding that it's the other stuff I've ignored that is the most damning, and less so the boundary calculations and the tree building.

Arantor · January 20, 2012, 08:24:11 PM

Quote: There are things you can do to help.

In SMF terms, that would be analogous to querying for the topic id and rebuilding the tree once the rows come in from the DB, which is the scenario I was thinking of. For 100 posts that might not be too bad, but I have threads of a couple of thousand posts (and I know SMF sites who have threads that cap at 25-30k posts), which means there's a point where it stops being close to efficient to rebuild the entire node tree, especially if you're instancing classes and so on (as opposed to just representing it in a form of array).

Quote: As for the subject at hand (the boundary calculation piece), for another random sample (123 posts, 1804ms, not having caching is killing me) the viewing history maintenance routine takes 41.3 milliseconds, of which 38.9ms is spent talking to the database.

*nods* With caching, I imagine that'd be pretty efficient thereafter since I don't imagine that stale data is as much of a problem as it might be in other places.

Quote: My user account has 365 boundaries associated with it, so any slowdown when it comes to iterating over boundaries should at least be a little evident.

*nods* It sort of depends on how you're querying to get the boundaries, though you would surely have noticed something by now.

Quote: I'm still collecting stats, but so far, I'm finding that it's the other stuff I've ignored that is the most damning, and less so the boundary calculations and the tree building.

*nods* It's all relative, of course, but those were the aspects that seemed to me as having potential for causing performance issues rather than anything else.

kaiden11 · January 22, 2012, 04:22:35 PM

So here's my best attempt at an at-scale run for my rendering. I literally rendered all of my threads (611 of them), resulting in around 3398 posts total. Format is like this: functionCall(number_times_called) = time_spent (percentage_of_total_page_load_time%).

  _gen_thread(611) = 1016.257ms (48.08%)
  add_reply_loop(3398) = 200.832ms (9.50%)
  decache(612) = 26.306ms (1.24%)
  encache(611) = 131.795ms (6.24%)
  gen_header(1) = 14.092ms (0.67%)
  gen_nav(1) = 0.294ms (0.01%)
  get_thread_object(611) = 983.397ms (46.53%)
  get_thread_object.retrieve_thread_query(611) = 558.652ms (26.43%)
  linkURL(3896) = 432.058ms (20.44%)
  query(1226) = 699.599ms (33.10%)
  total(1) = 2113.513ms (100.00%)

Still not 30,000 posts but still indicative of overall performance (I feel).

Some points of interest to explain:

- Percentages don't add up because entry/exit timing points aren't mutually exclusive (example: query makes use of encache and decache, and almost everything makes use of query).
- _gen_thread calls get_thread_object, which calls the database to retrieve posts for a thread (get_thread_object.retrieve_thread_query) and builds the tree with add_reply (represented by the add_reply_loop hook). The object returned from get_thread_object then is "rendered" to HTML, of which its "spendiest" method is linkURL (having to perform repeated regexes to identify and replace plaintext URLs, linking them, etc.).
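To show why nested timing hooks produce percentages that sum to more than 100%, here is a small Python sketch of entry/exit timers. The `timed` context manager, the `report` function and the hook names in the demo are assumptions for illustration; the actual hooks behind the numbers above are not shown in this thread.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)  # seconds spent inside each named hook
calls = defaultdict(int)     # how many times each hook was entered


@contextmanager
def timed(name):
    """Accumulate wall-clock time between entering and leaving the block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[name] += time.perf_counter() - start
        calls[name] += 1


def report(total_name):
    """Format lines as name(calls) = time (share of the total hook)."""
    total = totals[total_name]
    return [
        f"{name}({calls[name]}) = {seconds * 1000:.3f}ms ({seconds / total:.2%})"
        for name, seconds in sorted(totals.items())
    ]


# The inner hook's time is also counted by every hook that encloses it.
with timed("total"):
    with timed("gen_thread"):
        with timed("query"):
            time.sleep(0.01)  # stand-in for a database call
        time.sleep(0.01)      # stand-in for rendering

for line in report("total"):
    print(line)
```

Here "query" time is counted again inside "gen_thread" and "total", so the three shares add up to well over 100%, just as query overlaps with get_thread_object in the listing above.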

So, in attempting to render my entire database, I found that 33.10% of my total time is spent querying the database (cached or uncached). Most of that time (26% of total time) is spent just grabbing the threads' posts. 9.50% is spent building the trees, and at least 20.44% is spent rendering them.

Getting off-topic here, I believe, but according to this, tree building isn't the performance hit one might think. It won't ever be as efficient as just linear processing of flat posts, but at least on a more "normal" scale it's no more than a 10% hit. As you say, it's relative, but at the point where you're having 30K+ posts in a single thread, you're having to do lots of other crazy things to keep things humming.

Anyhow, thanks for listening. Hope this is useful to someone, somewhere.

Arantor · January 22, 2012, 04:49:29 PM

It's been a pleasure discussing it with you.

The one thing I would note in terms of handling pagination of a flat thread vs a threaded one (because I've been looking at splicing threading into SMF) is that with a flat thread you don't have to get all 30k posts at once, you can let the DB server deal with it (which is what SMF does) but you'd have to implicitly get everything to actually rebuild the tree, so it's not just a straight 10% overhead of tree building itself...

kaiden11 · January 22, 2012, 05:05:15 PM

Quote from: Arantor on January 22, 2012, 04:49:29 PM
The one thing I would note in terms of handling pagination of a flat thread vs a threaded one (because I've been looking at splicing threading into SMF) is that with a flat thread you don't have to get all 30k posts at once, you can let the DB server deal with it (which is what SMF does) but you'd have to implicitly get everything to actually rebuild the tree, so it's not just a straight 10% overhead of tree building itself...

You've got me there. In that case, I'd suggest building an intermediate caching mechanism based on the thread's "skeleton" (only post IDs and their relationships in a multi-dimensional array). Say, if a thread's post count exceeds 1000, check for the cached data structure, and based on your pagination method within the tree (depth limit, limit on number of posts displayed, etc.) build a database query from your skeleton structure ("WHERE post.id IN (..., ...)"), and flesh out the in-memory skeleton with post content, etc. as needed. Then just invalidate and rebuild the skeleton upon thread modification.
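A rough sketch of the skeleton idea, in Python for illustration. Everything here is hypothetical: the function names, the `post` table with `id` and `body` columns, and the `%s` placeholder style (which depends on the database driver). Caching and invalidation of the skeleton are left out.

```python
def build_skeleton(rows):
    """rows: (msg_id, msg_parent_id) pairs. Returns {msg_id: [reply ids]}."""
    children = {}
    for msg_id, parent_id in sorted(rows, key=lambda r: r[0]):
        children.setdefault(msg_id, [])
        if parent_id is not None:
            children.setdefault(parent_id, []).append(msg_id)
    return children


def page_ids(skeleton, root_id, max_depth, max_posts):
    """Depth-first walk of the skeleton, limited by depth and post count."""
    picked, stack = [], [(root_id, 0)]
    while stack and len(picked) < max_posts:
        msg_id, depth = stack.pop()
        picked.append(msg_id)
        if depth < max_depth:
            # Push replies in reverse so the oldest reply is visited first.
            stack.extend((child, depth + 1) for child in reversed(skeleton[msg_id]))
    return picked


def content_query(ids):
    """Build a parameterised query that fetches only the posts on this page."""
    if not ids:
        raise ValueError("no post ids selected")
    placeholders = ", ".join(["%s"] * len(ids))
    return f"SELECT id, body FROM post WHERE id IN ({placeholders})", list(ids)


skeleton = build_skeleton([(1, None), (2, 1), (3, 1), (4, 2), (5, 4)])
print(page_ids(skeleton, 1, max_depth=1, max_posts=10))  # root plus direct replies
print(content_query(page_ids(skeleton, 1, max_depth=5, max_posts=3)))
```

Only the IDs and parent links have to be held for the whole thread; post bodies are fetched just for the slice being displayed.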

Kindred · January 22, 2012, 06:56:50 PM

Hmmmmmmm... I still hate that threaded/tree view... But how would this work on a system that has 200 users online at any point in time, and has well over 2 million posts?

kaiden11 · January 22, 2012, 07:18:22 PM

Quote from: Kindred on January 22, 2012, 06:56:50 PM
Hmmmmmmm... I still hate that threaded/tree view... But how would this work on a system that has 200 users online at any point in time, and has well over 2 million posts?

I still hate having to copy-paste what I'm replying to into my own replies

Yes, increased complexity causes problems at-scale. I'll gladly concede that. And you would address the processing of such complexity in the same way you would do so for any other problem: parallelism, caching, and load-balancing. Say in the case of having to scale out my thread skeleton idea: perhaps implement the same type of consistent hashing that Craigslist does with its services: http://blog.zawodny.com/2011/02/26/redis-sharding-at-craigslist/
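For anyone unfamiliar with consistent hashing, here is a generic textbook-style sketch in Python of how thread skeletons might be spread across cache nodes. It is not Craigslist's actual setup; the `HashRing` class, the node names and the `thread:<id>` key format are all assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Maps keys to nodes so that removing a node only moves that node's keys."""

    def __init__(self, nodes, replicas=100):
        # Each node gets several points ("virtual nodes") on the ring.
        self._ring = sorted(
            (self._hash(f"{node}#{r}"), node)
            for node in nodes
            for r in range(replicas)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Return the node owning the first ring point after the key's hash."""
        i = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[i][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
smaller = HashRing(["cache-a", "cache-b"])  # the same ring after losing cache-c

keys = [f"thread:{n}" for n in range(1000)]
moved = sum(1 for k in keys if ring.node_for(k) != smaller.node_for(k))
print(f"{moved} of {len(keys)} skeletons would need to be rebuilt elsewhere")
```

Only the skeletons that lived on the removed node change owner; a plain `hash(key) % number_of_nodes` scheme would reshuffle most of them.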

I understand it comes down to preference and/or mindset when dealing with the threaded vs. flat discussion. But I don't think discarding the threaded side of things is warranted purely on a performance basis. If users demand such a thing, then it's up to us as the developers to come up with better solutions.

And per the original inspiration for the topic, it's also up to us to figure out different and efficient ways to determine read/unread status for our users, should they demand that as well.

Kindred · January 22, 2012, 08:11:21 PM

Hmmmm... The thing is, the current SMF runs fine on a single dedicated server to supply a site with the stats that I mentioned. No caching beyond basic... No load balancing.

kaiden11 · January 22, 2012, 08:43:30 PM

Quote from: Kindred on January 22, 2012, 08:11:21 PM
Hmmmm... The thing is, the current SMF runs fine on a single dedicated server to supply a site with the stats that I mentioned. No caching beyond basic... No load balancing.

I will take your word for it. I understand SMF to be a mature and performant project (otherwise I wouldn't have brought my discussion here). Cost is a practical concern, and sadly, the concern that often ends up being the most important. It's also the concern that yields the least interesting results.

The burden of proof is on me to provide viable statistics at the scale you describe. And given my ever-dwindling community and lack of personal resources, I will probably never be able to provide those. My lack of data, however, doesn't invalidate my argument that these problems *could* be addressed at the scale you wish (and could probably even be addressed with a single, dedicated machine). Were I developing in a business context, such a feature could even be marketed as a premium addition to "offset" the additional computational costs.

But I'm not in a business context. I'm interested in solving interesting technical problems, not shoveling walls of text. And technically: a flat forum is a subset of the functionality of a threaded forum, and could be stored using the exact same data elements (and, as such, could trivially be turned on and off at administrative and user will). And, if it were proven at scale, my boundary solution might be a reasonable and more functional alternative to SMF's read/unread feature.

Anyhow. I have to shovel my driveway again. Thanks for your time.

Kindred · January 22, 2012, 08:59:02 PM

Oh... I am not putting you down or saying it's a bad idea... I am just concerned about the scalability of the design, since we have to design SMF for people that can run on some pretty minimal servers... Of course, even that design does require a dedicated server when you get to the size of the forum I mentioned.

That being said, I would be willing to work with you to test some stuff on 40konline...

青山素子 · January 22, 2012, 09:07:00 PM

I've been quiet, but wanted to say that this discussion is very interesting. While I don't think such a method of topic tracking is really warranted with the current "flat" topic design, it might be very interesting if threading ever gets considered for a feature.

Having said that, I feel that the technical community around SMF is one of the better ones among all the forum software. There is often a lot of initial dismissal of ideas mostly because of the different scales SMF has worked at - everything from crappy oversold $3/mo shared hosting to several thousand a month dedicated solutions. If you can pass the technical barrier or have an interesting enough idea, it'll very seriously be considered.

kaiden11, I want to encourage you to develop your ideas more and see if you can get some testing done with the folks in this community. Don't let some of the concerns here get you down. If you really feel you have a technically better idea, make your case and get some support behind it.

kaiden11 · January 22, 2012, 09:47:55 PM

Quote from: Kindred on January 22, 2012, 08:59:02 PM
Oh... I am not putting you down or saying it's a bad idea... I am just concerned about the scalability of the design, since we have to design SMF for people that can run on some pretty minimal servers... Of course, even that design does require a dedicated server when you get to the size of the forum I mentioned.

That being said, I would be willing to work with you to test some stuff on 40konline...

@Kindred, and you're right to ask about performance in those instances. I myself operate on a server shared with a couple thousand other domains, and have to develop with that looming threat in mind. And I mean this as a compliment: it puts one on the defensive when one gets such questions from somebody with "Marketing Coordinator" on their profile.

Quote from: 青山素子 on January 22, 2012, 09:07:00 PM
kaiden11, I want to encourage you to develop your ideas more and see if you can get some testing done with the folks in this community. Don't let some of the concerns here get you down. If you really feel you have a technically better idea, make your case and get some support behind it.

@青山素子, I appreciate your encouragement. And I thought I *was* making my case.

As for my boundary solution: I would hope that it piques somebody's interest. I mentioned in earlier posts that I found the unread features of some of the more popular forums out there (SMF and phpBB in particular) to be a little limiting. However, if that's all you've had, anything different is probably less of a concern, and the users wouldn't have cause to demand it.

If I were to make my case more succinct, I would state it as such: I think there are more interesting questions to be asked in terms of which posts a user has and hasn't read. Having a finer granularity in solving this problem is technically interesting, provides a wider range of functionality, and improves the product's robustness overall if you ever decide to implement a different display paradigm.

As for testing... I wouldn't even know where to begin, aside from my own codebase (which, admittedly, is a bit of a mess). Any suggestions?

青山素子 · January 23, 2012, 12:15:48 AM

Quote from: kaiden11 on January 22, 2012, 09:47:55 PM
As for testing... I wouldn't even know where to begin, aside from my own codebase (which, admittedly, is a bit of a mess). Any suggestions?

Well, at least one person here has offered to help with some experiments on their own site. That's a good place to get started.

In general, among technical folks, the best path to getting started on feature changes or additions is packaging a modification for SMF. This not only provides reference code for examination, but also offers an easy way to get that code tested by others. It also tests the popularity and demand for a feature. Granted, this works best when it's a visible change, and not all popular modifications will be included (things relying on third-party services have traditionally been shunned), but it's a good place to begin. I'm sure there are plenty of people who could help you with getting things packaged if you need assistance.

kaiden11 · January 23, 2012, 01:39:36 AM

Quote from: 青山素子 on January 23, 2012, 12:15:48 AM
Well, at least one person here has offered to help with some experiments on their own site. That's a good place to get started.

In general, among technical folks, the best path to getting started on feature changes or additions is packaging a modification for SMF. This not only provides reference code for examination, but also offers an easy way to get that code tested by others. It also tests the popularity and demand for a feature. Granted, this works best when it's a visible change, and not all popular modifications will be included (things relying on third-party services have traditionally been shunned), but it's a good place to begin. I'm sure there are plenty of people who could help you with getting things packaged if you need assistance.

Much as I appreciate the offer, I'm not sure Kindred's 200 concurrent users would appreciate somebody tweaking their production system. It'd probably be better for me to review SMF's API for dealing with read/unread business logic and experiment on my own. That way I can, at the very least, be more cognizant of how far one would have to take things in order to implement such a system.

There's also the question of whether SMF's architecture even has access to the things I need at rendering time. As of my last iteration, I have my function declaration as such:

function _seen_bounds(
    $usr_id,    // User's primary key
    $msg_id = null,    // Numeric or array of numerics of the messages being viewed
    $noop_ids = array()    // Array of message IDs for which no database action is to
                           // be performed. Dependent on the calling function to determine
                           // business logic on whether user's viewing history will be
                           // updated based on combined IDs and No-op IDs.
)
// Returns an array of message IDs that the user has
// never "seen" (an array of "unseen"). Passed to later
// rendering functions to visually indicate read/unread for
// that particular post.
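For readers who want something concrete, the following is a minimal in-memory sketch of the boundary idea, not the actual body of `_seen_bounds`. The helper names `mark_seen` and `unseen_ids` are hypothetical, the boundary list stands in for a per-user database table, and the merge step assumes that consecutive integer IDs count as adjacent (so gaps left by deleted posts would not be bridged). It assumes PHP 7 or later.

```php
<?php
// Each boundary is an inclusive [lower, upper] pair of message IDs the user has seen.

// Hypothetical helper: fold newly viewed IDs into the boundary list,
// creating, growing, or merging boundaries as needed.
function mark_seen(array $bounds, array $msgIds): array
{
    // Treat every viewed ID as a one-element boundary, then re-merge the whole list.
    foreach ($msgIds as $id) {
        $bounds[] = [$id, $id];
    }
    usort($bounds, function ($a, $b) {
        return $a[0] <=> $b[0];
    });

    $merged = [];
    foreach ($bounds as $b) {
        $last = count($merged) - 1;
        // Overlapping or directly adjacent boundaries collapse into one.
        if ($last >= 0 && $b[0] <= $merged[$last][1] + 1) {
            $merged[$last][1] = max($merged[$last][1], $b[1]);
        } else {
            $merged[] = $b;
        }
    }
    return $merged;
}

// Hypothetical helper: return the IDs from $msgIds that fall outside every boundary.
function unseen_ids(array $bounds, array $msgIds): array
{
    $unseen = [];
    foreach ($msgIds as $id) {
        $seen = false;
        foreach ($bounds as $b) {
            if ($id >= $b[0] && $id <= $b[1]) {
                $seen = true;
                break;
            }
        }
        if (!$seen) {
            $unseen[] = $id;
        }
    }
    return $unseen;
}

// Usage: the user has seen 1-5 and 9-10, and now views messages 5 through 8.
$bounds  = [[1, 5], [9, 10]];
$viewing = [5, 6, 7, 8];
$unseen  = unseen_ids($bounds, $viewing); // [6, 7, 8], to be highlighted when rendering
$bounds  = mark_seen($bounds, $viewing);  // [[1, 10]], two boundaries merged into one
```

A caller mirroring the declaration above would compute the unseen list first, hand it to the rendering code, and only then update the stored boundaries, skipping any IDs passed as no-ops.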

I guess I'd need somebody with intimate knowledge of SMF's architecture to determine if there's even a good place to insert such a thing. I do it immediately before thread rendering, but SMF likely has a more sophisticated rendering mechanism than I do. Much as it's nice to have a database of functions, http://support.simplemachines.org/function_db/index.php?action=view_function;id=5 doesn't really tell me a whole lot. Also, if my experience developing modules for other projects tells me anything, there's a lot of boilerplate involved. And there's also the question of whether this can be accomplished with a module at all. Not that either particularly dissuades me, it's just time. A developer would be able to answer these outright, I would imagine.
