News:

Bored?  Looking to kill some time?  Want to chat with other SMF users?  Join us in IRC chat or Discord

Main Menu

MySQL, selecting from 2 tables ..

Started by sangwe11, November 23, 2009, 10:40:43 AM

Previous topic - Next topic

sangwe11

I have to ways to do what I want, but I'm curious to know which would be more efficient, and cause less load.

The first way is to do 2 seperate querys, one to the main page table, and one to the revisions table.

<?PHP

$result = $smcFunc['db_query']('', '
SELECT *
FROM {db_prefix}wiki_pages
WHERE name = {text:name}
LIMIT 1',
array(
'name' => $pagename,
)
);

// Grab the page array from the database query result.
$page = $smcFunc['db_fetch_assoc']($result);

$result = $smcFunc['db_query']('', '
SELECT *
FROM  {db_prefix}wiki_revisions
WHERE name = {text:name}
AND date = {text:rev}
LIMIT 1',
array(
'name' => $page['name'],
'rev' => $page['current_revision'],
)
);

// Grab the revision array from the database query result.
$revision = $smcFunc['db_fetch_assoc']($result);

// Merge the two arrays
$page = array_merge($page, $revision);

?>


The second way is to do one query, that joins theses two tables, when selecting the row.

<?PHP

$result = $smcFunc['db_query']('', '
SELECT *
FROM {db_prefix}wiki_pages, {db_prefix}wiki_revisions
WHERE {db_prefix}wiki_pages.name = {text:name}
AND {db_prefix}wiki_revisions.name = {text:name}
AND ({db_prefix}wiki_revisions.date = {db_prefix}wiki_pages.current_revision)
LIMIT 1',
array(
'name' => $pagename,
)
);

?>


Which way would be more efficient, and more code friendly (for the latter, I think the second method is)

Any comments / suggestions / feedback please :D

Arantor

First up, you REALLY should also use a JOIN clause in the latter, because this forces it to do multiple comparisons when it needn't.

$result = $smcFunc['db_query']('', '
SELECT *
FROM {db_prefix}wiki_pages AS p
INNER JOIN {db_prefix}wiki_revisions AS r ON (p.name = r.name AND p.current_revision = r.date)
WHERE p.name = {text:name}
LIMIT 1',
array(
'name' => $pagename,
)
);


Tell you what though, this is a bad, bad idea. As a general rule, NEVER join tables on textual ids. The performance cost of doing this is incredible.

Suggestion: create a single id, numeric auto_increment. When you insert a new row you will automatically increment that - kind of like id_msg. Then update that in any other tables. Notionally you'd have the new column in wiki_pages and wiki_revisions would hold the page's title and simply refer to the current revision.

Let me explain how I'd see this happening, with also the index structures in play:

wiki_revisions
id_master, page_id, page_content
id_master = primary key, page_id = non-unique indexed

wiki_pages
id_page, revision_id, page_title
id_page = primary key (hmm, not sure about a secondary index, would benchmark that with EXPLAIN)

id_master and id_page are both auto increment.

So, the first page:
id_masterpage_idpage_content
11some stuff

id_pagerevision_idpage_title
11Some Page

Now we edit Some Page's content.
id_masterpage_idpage_content
11some stuff
21some more stuff

id_pagerevision_idpage_title
12Some Page

Now we edit it some more.
id_masterpage_idpage_content
11some stuff
21some more stuff
31even some more stuff

id_pagerevision_idpage_title
13Some Page

Now we add a new page.
id_masterpage_idpage_content
11some stuff
21some more stuff
31even some more stuff
42another page

id_pagerevision_idpage_title
13Some Page
24Some Page

So as you see, you query the pages (second in the above examples) table to get the title, which will give you not only the immediate id of the latest revision, but also gives you a numeric id to group the results upon (again faster than using not) as well as selecting them later.

sangwe11

Ahh I see, I did think there would be a performance cost, however I didn't know that it was because I was using textual ids to join.

I'll rewrite the query, based on your post.

Anyway, if its done the way you said, which out of the two ways would be more efficient ? (2 queries or the proper join method)

Arantor

If it's against a textual id, difference is kind of moot; because the hurt of text ids is enough of a killer it almost doesn't matter.

Joining on properly indexed numeric ids... JOIN pretty much every time. In reality you'd get a decent enough data set and benchmark the two queries, but I'd be using that to figure out if we needed another index in there rather than changing the methodology.

sangwe11

Right, so this is what I have ..

Table smf_wiki_pages:

id            mediumint( 8 )
name         varchar( 80 )
current_revision   mediumint( 8 )
is_locked         tinyint( 1 )
topic_id         mediumint( 8 )

Table smf_wiki_revisions:

id         mediumint( 8 )
page       mediumint( 8 )
date         timestamp
member      varchar( 80 ) # this isn't a member id, because we only need the username - saves a db query ;D
content      text
approved      tinyint( 1 )




And then the query code.

$result = $smcFunc['db_query']('', '
SELECT *
FROM {db_prefix}wiki_pages AS p
INNER JOIN {db_prefix}wiki_revisions AS r
ON (p.current_revision = r.id)
WHERE p.name = {text:name}
LIMIT 1',
array(
'name' => $pagename,
)
);


Arantor

Well, the structure looks about right. Note that all your numeric columns should be unsigned, and you should never be selecting * when you don't actually need all the columns. I won't even get into the issues of reusing 'id' when combining tables.

The point of username vs user id is interesting. Your method is probably slower than an INNER JOIN to the members table (which enforces there being a match on the other side)

Best compromise I can suggest: have both. You add 3 bytes per row to include the id_member (it's a mediumint(8) unsigned), LEFT JOIN the members table, and IFNULL check:

$result = $smcFunc['db_query']('', '
SELECT p.id AS id_page, otherfield, otherfield, field, IFNULL(mem.real_name, r.member) AS member_name
FROM {db_prefix}wiki_pages AS p
INNER JOIN {db_prefix}wiki_revisions AS r
ON (p.current_revision = r.id)
LEFT JOIN {db_prefix}members AS mem ON (r.id_member = mem.id_member)
WHERE p.name = {text:name}
LIMIT 1',
array(
'name' => $pagename,
)
);

sangwe11

Quote from: Arantor on November 23, 2009, 02:03:59 PM
Well, the structure looks about right. Note that all your numeric columns should be unsigned, and you should never be selecting * when you don't actually need all the columns. I won't even get into the issues of reusing 'id' when combining tables.

The point of username vs user id is interesting. Your method is probably slower than an INNER JOIN to the members table (which enforces there being a match on the other side)

Best compromise I can suggest: have both. You add 3 bytes per row to include the id_member (it's a mediumint(8) unsigned), LEFT JOIN the members table, and IFNULL check:

$result = $smcFunc['db_query']('', '
SELECT p.id AS id_page, otherfield, otherfield, field, IFNULL(mem.real_name, r.member) AS member_name
FROM {db_prefix}wiki_pages AS p
INNER JOIN {db_prefix}wiki_revisions AS r
ON (p.current_revision = r.id)
LEFT JOIN {db_prefix}members AS mem ON (r.id_member = mem.id_member)
WHERE p.name = {text:name}
LIMIT 1',
array(
'name' => $pagename,
)
);


What happens when a guest edits it though ?

As when this happens, I have it entering the IP address of the guest.

Arantor

That's why you have the LEFT JOIN with IFNULL, and why I specifically said both. Note this is EXACTLY what SMF itself does, so if you don't have an id_member join, the LEFT JOIN returns NULL for that column from the join itself.

sangwe11

Quote from: Arantor on November 23, 2009, 02:11:19 PM
That's why you have the LEFT JOIN with IFNULL, and why I specifically said both. Note this is EXACTLY what SMF itself does, so if you don't have an id_member join, the LEFT JOIN returns NULL for that column from the join itself.

Ahh, so if I store all users as a id, and all guests as a ip, and then do the left join with if null, it would work as wanted ?

If so, what would the IF NULL part look like in the query.

Arantor


sangwe11

Quote from: Arantor on November 23, 2009, 02:24:29 PM
Like the query I already gave you.

I saw no if null in there though, or is that included in left join ?

I really should research mysql joins a bit more. I know the basics, but not alot else. (first time i've needed them)

Arantor

OK, let's repeat the query:
$result = $smcFunc['db_query']('', '
SELECT p.id AS id_page, otherfield, otherfield, field, IFNULL(mem.real_name, r.member) AS member_name
FROM {db_prefix}wiki_pages AS p
INNER JOIN {db_prefix}wiki_revisions AS r
ON (p.current_revision = r.id)
LEFT JOIN {db_prefix}members AS mem ON (r.id_member = mem.id_member)
WHERE p.name = {text:name}
LIMIT 1',
array(
'name' => $pagename,
)
);


The IFNULL is in the select since you're selecting it. And you're selecting the field, or if that's null an alternative field.

sangwe11

#12
Quote from: Arantor on November 23, 2009, 02:54:49 PM
OK, let's repeat the query:
$result = $smcFunc['db_query']('', '
SELECT p.id AS id_page, otherfield, otherfield, field, IFNULL(mem.real_name, r.member) AS member_name
FROM {db_prefix}wiki_pages AS p
INNER JOIN {db_prefix}wiki_revisions AS r
ON (p.current_revision = r.id)
LEFT JOIN {db_prefix}members AS mem ON (r.id_member = mem.id_member)
WHERE p.name = {text:name}
LIMIT 1',
array(
'name' => $pagename,
)
);


The IFNULL is in the select since you're selecting it. And you're selecting the field, or if that's null an alternative field.

* sangwe11 face palm

Sorry, didn't notice you'd changed the select part.

Arantor

Ah, it's easy to miss...

* Arantor hands out his favourite smiley on his dev site (frequently using this myself)

sangwe11

#14
Quote from: Arantor on November 23, 2009, 02:57:57 PM
Ah, it's easy to miss...

* Arantor hands out his favourite smiley on his dev site (frequently using this myself)

Added :D

Woop, working great Arantor!

Just a check, but do we need to add an index to the member column (on the revisions table) ?

Arantor

I'm not sure. I'd wait until you have some big data sets, then run the select with EXPLAIN in front of it, that way you can see how MySQL evaluates the query, and using that I can help you figure out how to add smarter indexes.

Book recommendation: High Performance MySQL (2nd ed) by Baron Schwartz (and others)

sangwe11

Quote from: Arantor on November 23, 2009, 03:41:52 PM
I'm not sure. I'd wait until you have some big data sets, then run the select with EXPLAIN in front of it, that way you can see how MySQL evaluates the query, and using that I can help you figure out how to add smarter indexes.

Book recommendation: High Performance MySQL (2nd ed) by Baron Schwartz (and others)

For big data sets, are we talking 1000's or 10,000's ?

Arantor

Hundreds of rows should be enough, though when I say about 'big' I tend to refer to 100,000s or more (when supporting Sphinx, I used to deal with sets into the millions, on multiple occasions, the billions)

sangwe11

Quote from: Arantor on November 23, 2009, 04:27:07 PM
Hundreds of rows should be enough, though when I say about 'big' I tend to refer to 100,000s or more (when supporting Sphinx, I used to deal with sets into the millions, on multiple occasions, the billions)

:o
* sangwe11 is speechless!

Anywho, I've noticed a problem with doing the db tables the way decided.

When inserting a new page, I have to enter the name, and the current revision id.

I don't know this, until I enter the revision into the revisions table.

Ok you say, do the revisions table part first.

So, I go to insert into the revisions table, but oh no! I need the page id, and I can't get this until I insert the page into the pages table.

Its a vicious loop, and I'm trying to see how (if it does it anything like this) MediaWiki does it, but I figured I ask incase someone new a way also.

Arantor

Insert one, insert the other than update the first one with the newly insert id.

Advertisement: