Best way to parse bbcode in database?
DaKrampus:
Hello,
I am trying to optimize a mod I am writing,
and I ran into the following problem:
A page displays a list of words and their definitions (a sort of glossary);
the definitions are bbcoded.
If pagination is not set (user option), the list can get quite long.
I activated page generation time AND added memory_get_peak_usage() to the output.
Now I ran some tests with parse_bbc(), and when I have around 80 entries,
I get the following output:
Page created in 0.109 seconds with 7 queries. Memory Peak: 13.11 MB
When not parsing bbcode:
Page created in 0.044 seconds with 7 queries. Memory Peak: 7.77 MB
So I decided to add an extra field to the db and save an already parsed version there.
When I then call the page, it looks exactly the same as the parsed one, but I do not have to do the parsing,
and I get the following speed and memory:
Page created in 0.044 seconds with 7 queries. Memory Peak: 7.78 MB
So I will definitely opt for that, as only an admin can save definitions.
(The definition will be saved twice, once normally and once parsed.)
The only question I have: is there an issue when cleaning the definition for UTF-8 versus non-UTF-8 charsets?
For the moment I do it the following way for both.
Note: only [ b], [ u] and [ i] tags are used. And of course I want the line feeds and carriage returns as <br />
--- Code: --- preparsecode($definition);
$definition = strtr($definition, array('<br />' => "\n", '</div>' => "\n", '</li>' => "\n", "\r\n" => "\n", "\r" => "\n", "\cr" => "\n", '&#91;' => '[', '&#93;' => ']'));
$definition = strip_tags($definition);
$definition_parsed = trim(un_htmlspecialchars(parse_bbc(htmlspecialchars($definition), false, '', array('b', 'u', 'i'))));
$definition_parsed = strtr($definition_parsed, array("\n" => '<br />'));
--- End code ---
$definition and $definition_parsed are saved in 2 different text fields.
It works fine in UTF-8 (I even tested Chinese characters), but I wondered (as I am not really familiar with all the SMF issues) whether there is something else to do.
(Note: I do NOT want to use addslashes().)
Help would be much appreciated.
DaKrampus
PS: I won't be able to answer before tomorrow, as I'm just off for a 14-hour drive.
Ah, and I almost forgot: parse_bbc parses [ b] as <strong>. Is there a way of making it produce <b>? In my case you often set only one letter of a word inside [ b] tags (for example when explaining LOL => Laughing Out Loud). Here the <strong> tag should be <b> for semantic reasons; screen readers would otherwise give unexpected results.
MrPhil:
parse_bbc() is known to be a very slow and costly function to run. It could probably use a complete rewrite. I recall that the subject of caching the parsed posts in the database has been discussed before, so you may want to search for that discussion.
It is trivial to change [b] from <strong> to <b>. It's hard to believe that you're competent enough to write a complex mod, but can't find the section of code
--- Code: --- array(
'tag' => 'b',
'before' => '<strong>',
'after' => '</strong>',
),
--- End code ---
and change strong to b!
DaKrampus:
Thanks for your thoughts...
As I am coming from the vBulletin and XenForo side, I have no idea of the basic code structure of SMF.
OK, I agree it doesn't look too complicated ;) but it takes time to get used to it (and one thing I don't have is time).
LoL, no, changing strong to b is trivial, I agree,
but that's not what I wanted.
I want to keep strong, because it gives more importance to a word in search engines and screen readers. Actually I had thought, before having to go through tons of code, that someone would point me to an existing mod or code snippet.
It seems, though, that I will have to write it myself:
If a whole word, sentence, etc. (with spaces before and after) is marked, then do <strong>.
If a single letter or part of a word (no space before or after) is marked, then do <b>.
Actually I have to do it that way (and not "in threads do strong and in my mod do b"), because both will be used on the same page.
Thanks for the hint, I will search for those discussions.
DaKrampus
emanuele:
--- Quote from: DaKrampus on July 14, 2012, 07:12:26 AM ---The only question I have: is there an issue when cleaning the definition for UTF-8 versus non-UTF-8 charsets?
For the moment I do it the following way for both.
Note: only [ b], [ u] and [ i] tags are used. And of course I want the line feeds and carriage returns as <br />
--- Code: --- preparsecode($definition);
$definition = strtr($definition, array('<br />' => "\n", '</div>' => "\n", '</li>' => "\n", "\r\n" => "\n", "\r" => "\n", "\cr" => "\n", '&#91;' => '[', '&#93;' => ']'));
$definition = strip_tags($definition);
$definition_parsed = trim(un_htmlspecialchars(parse_bbc(htmlspecialchars($definition), false, '', array('b', 'u', 'i'))));
$definition_parsed = strtr($definition_parsed, array("\n" => '<br />'));
--- End code ---
$definition and $definition_parsed are saved in 2 different text fields.
--- End quote ---
I'm not the best person to discuss charsets (I'm pretty ignorant in that field); I just want to point out that SMF has a "custom" version of htmlspecialchars ($smcFunc['htmlspecialchars']) that should automatically take the encoding into consideration.
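As a rough illustration of why the encoding matters when escaping, here is a plain-PHP sketch. It does not call SMF's $smcFunc['htmlspecialchars'] and makes no claim about how that function works internally; escape_for_charset() is a hypothetical helper that simply passes a charset to PHP's built-in htmlspecialchars().

```php
<?php
// Hypothetical helper, not part of SMF: escape text for a known charset.
function escape_for_charset(string $text, string $charset): string
{
    // ENT_QUOTES escapes both quote styles; the third argument tells PHP
    // how to interpret the bytes of $text.
    return htmlspecialchars($text, ENT_QUOTES, $charset);
}

// "café <b>" encoded as ISO-8859-1 (the é is the single byte 0xE9).
$latin1 = "caf\xE9 <b>";

// Declaring the matching charset keeps the text and escapes the markup.
$ok = escape_for_charset($latin1, 'ISO-8859-1');

// Declaring UTF-8 for the same bytes fails: 0xE9 followed by a space is not
// a valid UTF-8 sequence, so htmlspecialchars() returns an empty string.
$bad = escape_for_charset($latin1, 'UTF-8');
```

The point of the sketch is only that the charset passed to the escaping function has to match the charset of the stored text; on a non-UTF-8 forum a hard-coded assumption of UTF-8 can silently drop a definition.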
--- Quote from: DaKrampus on July 14, 2012, 07:12:26 AM ---Ah, and I almost forgot: parse_bbc parses [ b] as <strong>. Is there a way of making it produce <b>? In my case you often set only one letter of a word inside [ b] tags (for example when explaining LOL => Laughing Out Loud). Here the <strong> tag should be <b> for semantic reasons; screen readers would otherwise give unexpected results.
--- End quote ---
In that case you may want to add a new bbcode tag alongside [b]. For example, you can modify [b] so that it renders <b> and add a new bbcode [strong] that renders <strong>.
That would simply be:
--- Code: --- array(
'tag' => 'b',
'before' => '<b>',
'after' => '</b>',
),
array(
'tag' => 'strong',
'before' => '<strong>',
'after' => '</strong>',
),
--- End code ---
Arantor:
If not, there's no way directly in the tag parser to do what is being requested; when processing a tag, even one that is set up to do content-specific processing, it only receives its own contents (unless it's a nestable tag, in which case it may not even receive that!)
If you did want to make it context-sensitive, you'd actually be looking at modifying the parser elsewhere: either in the preparser, to search the post on saving and rewrite the b bbcode to strong as appropriate (like above), so that it is saved like that, or after the main parsing stage has occurred, applying a regular expression or similar when the post is displayed. Both of these have their advantages and disadvantages...
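The second option (a regular expression applied after the main parsing stage) could look roughly like the sketch below. It is a standalone illustration, not SMF code: demote_partial_strong() is a hypothetical helper meant to run on the HTML that parse_bbc() returns. It follows the rule DaKrampus described earlier in the thread, with one added assumption: "part of a word" is taken to mean that a letter or digit sits directly before the opening tag or directly after the closing tag, so punctuation counts as a word boundary.

```php
<?php
// Hypothetical helper, not part of SMF: run it on the output of parse_bbc().
function demote_partial_strong(string $html): string
{
    $word  = '[\p{L}\p{N}]';            // one letter or digit, Unicode-aware
    $inner = '((?:(?!</?strong>).)*)';  // content that never crosses another strong tag

    // Branch 1: a word character directly before the opening tag.
    // Branch 2: a word character directly after the closing tag.
    $pattern = '~(?<=' . $word . ')<strong>' . $inner . '</strong>'
             . '|<strong>' . $inner . '</strong>(?=' . $word . ')~us';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // Group 2 is only filled when branch 2 matched; otherwise use group 1.
        $content = (isset($m[2]) && $m[2] !== '') ? $m[2] : $m[1];
        return '<b>' . $content . '</b>';
    }, $html);

    // preg_replace_callback() returns null on failure (e.g. invalid UTF-8),
    // in which case the input is handed back unchanged.
    return $result ?? $html;
}

// Single letters inside a word become <b>.
$acronym = demote_partial_strong('<strong>L</strong>aughing <strong>O</strong>ut <strong>L</strong>oud');

// A whole word surrounded by spaces keeps <strong>.
$phrase = demote_partial_strong('a <strong>very</strong> big word');
```

If the parsed definition is stored in the database anyway, a step like this would only need to run once, when the definition is saved, rather than on every page view.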