Pretty URLs

Started by SMFHacks.com Team, January 31, 2007, 10:56:43 AM

Nao 尚

Quote from: eldʌkaː on October 10, 2007, 06:48:28 AM
Quote: We should make a list of what we learned in it, and what we should focus on
Lol.
Well I have a tendency to forget everything after a conversation, hence the remark :P
I will not make any deals with you. I've resigned. I will not be pushed, filed, stamped, indexed, briefed, debriefed or numbered.

Aeva Media rocks your life.

Nao 尚

Okay... I've made lots of tests using these three caching methods:
- crc32 (default one)
- md5 (much better one, but slower)
- plain URLs

The results have surprised me (but not that much): they pretty much all behave the same. I get between 0.059s and 0.2s each time I refresh the page. With an average time of about 0.07s. md5() doesn't seem to be any slower than crc32() on small-scale pages (most of SMF's pages I'd say). Plain URLs would be the right choice, then, considering there just can't be any name collision this way.

I've also tested with and without array_unique(). Same results: both versions get the same average time. I would recommend including array_unique(), because it's probably better to put the load on the PHP server rather than the SQL server (even if the PHP server gets less busy, the database will *always* grow whatever you do about it, so it'll get slower to crawl).
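
For reference, here is a minimal sketch of the three key strategies compared above, with the array_unique() step done in PHP before anything reaches the database. The function name pretty_cache_key() and the $method values are assumptions made for illustration; this is not the mod's actual code.

```php
<?php
// Hypothetical helper: build a cache key for a URL using one of the three
// strategies discussed in the post. Names are illustrative only.
function pretty_cache_key($url, $method = 'plain')
{
    switch ($method) {
        case 'crc32':
            // 32-bit checksum: compact, but two URLs can share a key.
            return (string) crc32($url);
        case 'md5':
            // 128-bit hash: collisions are very unlikely.
            return md5($url);
        default:
            // Plain URL: no collisions by construction.
            return $url;
    }
}

// Made-up list of links found on a page, with one duplicate.
$urls = array(
    'index.php?topic=5.0',
    'index.php?board=1.0',
    'index.php?topic=5.0',
);

// Deduplicate in PHP so the database only sees each URL once.
$unique_urls = array_values(array_unique($urls));
$keys = array_map('pretty_cache_key', $unique_urls);
```

With plain keys the cache rows stay searchable by URL, at the cost of longer keys than either hash.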

Now, all I have left to do is figure out why there's a minimum of 2 queries in the PrettyURLs stats. I tried to add a static variable to ob_sessrewrite so that I can limit it to being called only once. This helped in removing the extra cache call that was written down at the end of the viewquery page. Now, even though there's only one "pretty" query in the viewpage log, it STILL says "2q" on topic pages. What is the other query and why isn't it included in the viewpage log?!
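
The static-variable guard described above could look roughly like this. The function name rewrite_buffer_once() and the str_replace() body are placeholders, not the real ob_sessrewrite code; only the guard itself is the point.

```php
<?php
// Hypothetical stand-in for the output-buffer callback.
function rewrite_buffer_once($buffer)
{
    // A static variable keeps its value between calls to this function.
    static $already_run = false;

    // On any call after the first, hand the buffer back untouched.
    if ($already_run) {
        return $buffer;
    }
    $already_run = true;

    // Placeholder for the real rewriting work (assumption for illustration).
    return str_replace('index.php?topic=', 'topic/', $buffer);
}
```

The first call does the rewriting; later calls in the same request return their input unchanged.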

Dannii

I'm thinking too that plain text would be better. Eventually we'll be able to change the URLs of topics (mod action), so to fix the old URLs the cache will need to have them removed. With plain text we can just search for those rows, otherwise we'd have to empty the whole table.

Quote: Now, all I have left to do is figure out why there's a minimum of 2 queries in the PrettyURLs stats. I tried to add a static variable to ob_sessrewrite so that I can limit it to being called only once. This helped in removing the extra cache call that was written down at the end of the viewquery page. Now, even though there's only one "pretty" query in the viewpage log, it STILL says "2q" on topic pages. What is the other query and why isn't it included in the viewpage log?!
It needs one query to find which topic it is, and another to get the cached URLs. It probably is in the list, just at the top. On new topics you can get 4 or more queries, which is why the cache is better. Not that it's caching properly... so many things to fix hehe.
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

Nao 尚

Quote from: eldʌkaː on October 11, 2007, 09:47:33 AM
It needs one query to find which topic it is, and another to get the cached URLs. It probably is in the list, just at the top.
Nope, it really isn't...
But I don't get it. I looked at your code and... Why do you query for the topic ID, when there are so many globals that offer the same thing?
Oh, let me check... You do this before the topic ID is checked by SMF, okay.

Of course, I don't do that on my website--since the topic ID is hardcoded into the URL.

So, I guess the time indicated by PrettyURLs only takes into account the whole "ob_sessrewrite" section of the mod?
I thought the thing was called twice, so I could cut the times in half. But since it's only called once, the .07s average represents over 30% of the total loading time for my page. It's a bit... Slow.

Dannii

Quote: But I don't get it. I looked at your code and... Why do you query for the topic ID, when there are so many globals that offer the same thing?
Oh, let me check... You do this before the topic ID is checked by SMF, okay.

Of course, I don't do that on my website--since the topic ID is hardcoded into the URL.
Yeah it's done very early. The query is a simple one though, so I don't think it has any real effect on performance.

Quote: So, I guess the time indicated by PrettyURLs only takes into account the whole "ob_sessrewrite" section of the mod?
I thought the thing was called twice, so I could cut the times in half. But since it's only called once, the .07s average represents over 30% of the total loading time for my page. It's a bit... Slow.
It's taken from the time that SMF thinks it's finished, but before all the templates are actually run and sent to the buffer (I think). So it's not entirely the mod. 30% seems slower than I thought... I had measured about 15% before. There are some PHP profilers that I really should test it with sometime.
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

Nao 尚

Quote from: eldʌkaː on October 11, 2007, 09:21:50 PM
Yeah it's done very early. The query is a simple one though, so I don't think it has any real effect on performance.
No, it doesn't. Actually, the topic cache query doesn't have any real effect either. I believe about 80%-90% of the performance hit is due to the buffer rewriting process in PHP. Possibly because of the use of regexp.
What do you think?

Quote: 30% seems slower than I thought... I had measured about 15% before.
This morning, my average is:

Page created in 0.169 seconds with 23 queries. (Pretty URLs adds 0.047s, 2q)

My average with the 1.1.3 version of my website (same contents, but uses my own URL-rewriting system):

Page created in 0.162 seconds with 32 queries.

So it has 30% more queries (I have a custom sidebar), but it's still faster... Definitely not SQL causing issues here.

Dannii

Regexes are another thing I'm not much of an expert on. I'm sure they could be optimised more too.

How are you calculating your averages?
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

Nao 尚

Quote from: eldʌkaː on October 12, 2007, 05:11:29 AM
Regexes are another thing I'm not much of an expert on. I'm sure they could be optimised more too.
Let's see... You're using preg_replace which is faster than ereg_replace... Check...
You seem to be using str_replace whenever possible... Check...
Are your expressions case-insensitive? Just in case, case-sensitive searches are fine here, and faster obviously.

Quote: How are you calculating your averages?
With the good old method of "I'm refreshing 5 times or so, dropping the fastest and slowest results, and making a mental average."
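
That "mental average" can be written down as a small helper, sketched here under the assumption that exactly one fastest and one slowest sample are dropped. The name trimmed_average() and the sample timings are made up for illustration.

```php
<?php
// Average a list of timings after dropping the single fastest and the
// single slowest sample. Returns null when fewer than three samples exist.
function trimmed_average(array $timings)
{
    if (count($timings) < 3) {
        return null;
    }
    sort($timings);                         // ascending: fastest first
    $kept = array_slice($timings, 1, -1);   // drop first and last
    return array_sum($kept) / count($kept);
}

// Made-up page generation times in seconds from five refreshes.
$samples = array(0.059, 0.07, 0.2, 0.065, 0.075);
$average = trimmed_average($samples);
```

Five refreshes leave only three samples after trimming, so a figure obtained this way is a rough indication rather than a measurement.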

Dannii

Quote: Let's see... You're using preg_replace which is faster than ereg_replace... Check...
You seem to be using str_replace whenever possible... Check...
Are your expressions case-insensitive? Just in case, case-sensitive searches are fine here, and faster obviously.
I'm using case-sensitive mostly, however there are bigger improvements to be made I'm sure.

$match = preg_replace(array('~^[\"\']|PHPSESSID=[^;]+|sesc=[^;]+~', '~\"~', '~;+|=;~', '~\?;~', '~\?$|;$|=$~'), array('', '%22', ';', '?', ''), $match);

This line could be optimised more, as several of the patterns don't actually need regexes. But would having multiple statements be slower overall? I'm not sure.
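
To make the question concrete, here is a sketch of the same five steps with the two literal patterns moved to str_replace(). The function names are hypothetical wrappers added only so the two versions can be compared; whether the split version is actually faster would need measuring.

```php
<?php
// The original single call, wrapped in a function for comparison.
function clean_match_original($match)
{
    return preg_replace(
        array('~^[\"\']|PHPSESSID=[^;]+|sesc=[^;]+~', '~\"~', '~;+|=;~', '~\?;~', '~\?$|;$|=$~'),
        array('', '%22', ';', '?', ''),
        $match
    );
}

// Same steps in the same order. The second and fourth patterns are plain
// literals, so str_replace() can handle them without the regex engine.
function clean_match_split($match)
{
    $match = preg_replace('~^[\"\']|PHPSESSID=[^;]+|sesc=[^;]+~', '', $match);
    $match = str_replace('"', '%22', $match);
    $match = preg_replace('~;+|=;~', ';', $match);
    $match = str_replace('?;', '?', $match);
    return preg_replace('~\?$|;$|=$~', '', $match);
}
```

preg_replace() with arrays applies each pattern in turn to the result of the previous one, so keeping the order is what keeps the two versions equivalent.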

Quote: With the good old method of "I'm refreshing 5 times or so, dropping the fastest and slowest results, and making a mental average."
Thought as much ;)

Btw, I started a topic at the dev forum.
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

Nao 尚

Quote from: eldʌkaː on October 12, 2007, 06:44:41 AM
I'm using case-sensitive mostly, however there are bigger improvements to be made I'm sure.
Let's say,

preg_match('~#.*~', $matches[2], $fragment);

It could be replaced with a "strpos" test for #, followed by a substr to put the anchor into $fragment, couldn't it?
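
A minimal sketch of that strpos/substr idea follows. The function name url_fragment() is an assumption for illustration. Unlike preg_match(), it returns a plain string rather than a matches array, and it does not stop at a newline the way `.` does.

```php
<?php
// Return the fragment of a URL including the leading '#', or an empty
// string when the URL has no fragment.
function url_fragment($url)
{
    $pos = strpos($url, '#');
    // strpos() returns false when '#' is absent, and 0 is a valid
    // position, so the comparison must be strict.
    return $pos === false ? '' : substr($url, $pos);
}
```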
As for PHPSESSID, I'm one of these guys who think the variable should be removed from URLs once and for all. The ob_googlebot mod does it at the beginning of index.php when the agent is a bot. I see no reason why it should be there for regular users, since guests don't have more rights than bots, and logged-in users won't get the variable anyway...
The only thing that really bothers me is the sesc variable.

Nao 尚

Another quick way to get $fragment:

http://fr.php.net/manual/en/function.parse-url.php

It says it doesn't work with relative URLs, and SMF doesn't use it much itself, but it might be worth checking out if you end up profiling your code.
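
For comparison, a short sketch of the parse_url() route, using an absolute example URL to stay clear of the relative-URL caveat. The example.com address is made up.

```php
<?php
// With PHP_URL_FRAGMENT, parse_url() returns only the fragment, without
// the leading '#', or null when the URL has no fragment.
$fragment = parse_url('http://example.com/index.php?topic=5.0#msg12', PHP_URL_FRAGMENT);
$none = parse_url('http://example.com/index.php?topic=5.0', PHP_URL_FRAGMENT);
```

Note that the result lacks the '#', so code expecting the preg_match() output would need to add it back.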

Nao 尚

Okay I'm giving up...
I should have tried this from the beginning: remove all preg_replace and preg_match calls from the callback function. As a result, the whole thing is no more than 0.01s faster. It's not worth the hassle...

Dannii

Quote: It could be replaced with a "strpos" test for #, followed by a substr to put the anchor into $fragment, couldn't it?
Hmm interesting, that should work. But as you said, if it doesn't actually help it's probably not worth it.

Quote: As for PHPSESSID, I'm one of these guys who think the variable should be removed from URLs once and for all. The ob_googlebot mod does it at the beginning of index.php when the agent is a bot. I see no reason why it should be there for regular users, since guests don't have more rights than bots, and logged-in users won't get the variable anyway...
The only thing that really bothers me is the sesc variable.
PHPSESSID usually won't be shown, only if cookies aren't available, in which case it's essential. sesc is different, and only needed for a few things, none of which bots should be doing.

Quote: I should have tried this from the beginning: remove all preg_replace and preg_match calls from the callback function. As a result, the whole thing is no more than 0.01s faster. It's not worth the hassle...
Well there must be something that can be done to make it faster. I get as low as 0.007 on mine sometimes. I'll look into installing a profiler next week.
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

Nao 尚

Quote from: eldʌkaː on October 12, 2007, 10:32:47 AM
Hmm interesting, that should work.
Yup, it does... (But it doesn't save a single cycle!)

Quote: PHPSESSID usually won't be shown, only if cookies aren't available,
Actually, it does show in all situations (even when you have cookies enabled), as long as you're not logged in, and this is the first page you're visiting. As soon as you click on a link, cookies are "registered" and PHPSESSID disappears.

Quote: Well there must be something that can be done to make it faster. I get as low as 0.007 on mine sometimes. I'll look into installing a profiler next week.
But you're on a dedicated server, right? Your server's default website (from the bare IP) is an SMF board, so I guess you're sharing the server with it. My test website runs on shared hosting, so it's bound to be slower. (Although it's one of the fastest shared hosts I've ever seen. I have another board on a semi-dedicated server and sometimes it's slower!)

I should also add that my tests are done on large topics with lots of links.

As for profiling, maybe a simple PHP function would be enough. I'll give it a try...

Nao 尚

Made my quick benchmarking tool...

Here's how the time taken is split:

- 25% is taken by the first preg_match_all to find all links on the page, and most of it by the subsequent series of preg_replace
- 25% is taken by the sql query to retrieve the cached URLs
- and finally, 50% by the callback function.
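
A split like the one above can be produced with a few lines of PHP. This is only a sketch of one possible timing helper, not the tool used in the post; bench() and bench_shares() are hypothetical names.

```php
<?php
// Run $callback, add its wall-clock time to $totals[$label], and return
// whatever the callback returned.
function bench($label, $callback, array &$totals)
{
    $start = microtime(true);
    $result = call_user_func($callback);
    $elapsed = microtime(true) - $start;

    if (!isset($totals[$label])) {
        $totals[$label] = 0.0;
    }
    $totals[$label] += $elapsed;
    return $result;
}

// Convert accumulated seconds per label into percentages of the total.
function bench_shares(array $totals)
{
    $sum = array_sum($totals);
    $shares = array();
    foreach ($totals as $label => $seconds) {
        $shares[$label] = $sum > 0 ? round(100 * $seconds / $sum, 1) : 0.0;
    }
    return $shares;
}
```

Each section of the rewriting code would be wrapped in a bench() call with its own label, and bench_shares() called once at the end of the request.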

Where do you think we could gain some speed?

Oh yes, and a question: is it normal that you're calling a preg_replace on PHPSESSID, sesc etc. in the first part *and* in the last part? I'm a bit lost here ;)

Edit -- another suggestion: include an index on the "log_time" field, because the main query always uses it in the WHERE clause. I think it would benefit from the index.
Or, maybe, just remove the log_time mention in the query, and remove older entries from time to time. Why the field, by the way? If this is to minimize risks of URL collision, then it's no longer needed, right?

Dannii

http://dev.eldacar.com/smf/general-discussion/optimisation-etc/
Continuing here so as to not clutter up this support topic so much.
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."


ian1

I get this error when trying to install.  Any suggestions?

#    Type                  Action                                                Description
1.   Execute Modification  ./index.php                                           Test successful
2.   Execute Modification  ./SSI.php                                             Test successful
3.   Execute Modification  ./Sources/ManageErrors.php                            Test successful
4.   Execute Modification  ./Sources/ModSettings.php                             Test successful
5.   Execute Modification  ./Sources/News.php                                    Test successful
6.   Execute Modification  ./Sources/PackageGet.php                              Test successful
7.   Execute Modification  ./Sources/QueryString.php                             Test failed
8.   Execute Modification  ./Sources/Subs.php                                    Test successful
9.   Execute Modification  ./Sources/Display.php                                 Test successful
10.  Execute Modification  ./Sources/MessageIndex.php                            Test successful
11.  Execute Modification  ./Sources/QueryString.php                             Test successful
12.  Execute Modification  ./Sources/Subs-Boards.php                             Test successful
13.  Execute Modification  ./Sources/Subs-Post.php                               Test successful
14.  Execute Modification  ./Themes/default/languages/Modifications.english.php  Test successful
15.  Extract File          ./Sources/PrettyUrls-Filters.php
16.  Extract File          ./Sources/Subs-PrettyUrls.php
17.  Execute Code          install.php

Nao 尚

If you have other mods installed, uninstall them first.
If you don't, just replace your copy of Sources/QueryString.php with a fresh one from the SMF install package.

Dannii

Indeed, there's probably a conflict with another mod. Which others do you have installed?
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."
