News:

SMF 2.1.4 has been released! Take it for a spin! Read more.

Main Menu

MySQL, selecting from 2 tables ..

Started by sangwe11, November 23, 2009, 10:40:43 AM

Previous topic - Next topic

sangwe11

#20
Quote from: Arantor on November 23, 2009, 04:37:25 PM
Insert one, insert the other than update the first one with the newly insert id.

Looks like its gunna take a few queries..

Insert into pages table -> Select page id from pages table -> Insert into revisions table (using newly got page id) ->  Get revision id -> Update pages table ..

Arantor

Or...
Insert into pages table with $smcFunc['db_insert']
Get the id with $smcFunc['db_insert_id']
Insert into revisions table using the above-collected id
Update pages table with revisions id

3 queries, 4 if you're using PostgreSQL but you never see it.

sangwe11

Quote from: Arantor on November 23, 2009, 05:38:36 PM
Or...
Insert into pages table with $smcFunc['db_insert']
Get the id with $smcFunc['db_insert_id']
Insert into revisions table using the above-collected id
Update pages table with revisions id

3 queries, 4 if you're using PostgreSQL but you never see it.

Hmm, never new there was a insert_id function!

Thanks

Arantor

It's on the list on http://www.simplemachines.org/community/index.php?topic=224166.0

Ugly how it has to be done for PostgreSQL for earlier versions (below 8.3 if I remember rightly) but still...

sangwe11

I did a little optimization on my category tag parsing.

Before, I was selecting the category from the categories table, for each tag and seeing if it existed, and if so, hid the tag, and if it didn't, provide a link to create it.

I noticed this would be VERY bad on pages that have multiple categories (10+), so I moved into using a class, with the preg_replace_callback.

I changed my page selection code, to include the categories that are in the categories table (only the ones the page is in)

The code is as follows.

$result = $smcFunc['db_query']('', '
SELECT p.id , p.name, p.is_locked, p.topic_id, IFNULL(mem.real_name, r.member) AS member_name, r.content, r.date, GROUP_CONCAT(pc.category) AS categories, GROUP_CONCAT(c.name) AS created_categories
FROM {db_prefix}wiki_pages AS p
INNER JOIN {db_prefix}wiki_revisions AS r
ON (p.current_revision = r.id)
LEFT JOIN {db_prefix}members AS mem ON (r.member = mem.id_member)
LEFT JOIN {db_prefix}wiki_page_categories AS pc ON (pc.page = p.name)
LEFT JOIN {db_prefix}wiki_categories AS c ON (c.name = pc.category)
WHERE p.name = {text:name}
LIMIT 1',
array(
'name' => $pagename,
)
);


I then explode that column into an array (to make it easier later on)

// Explode the two category columns (as we grouped them into a list)
$page['categories'] = explode(',', $page['categories']);
$page['created_categories'] = explode(',', $page['created_categories']);


Now I have that included, to save me queries later, I needed to find a way to give the callback function the array of created cats.

Using global variables wouldn't work, as for every page, 2 pages are parsed (menu + page)

So I took a wild guess, and used a class.

// This looks a little complex, but its only like this to save us alot of queries (one per page category!)
$make_cat = new parse_categories();
$make_cat->categories = $page['created_categories'];
$content = preg_replace_callback("~\[category\](.+)\[/category\]~iU", array($make_cat, 'make_category'), $content);


And then the class

class parse_categories{

    public $categories;

    public function make_category($matches){
   
    global $scripturl;
    return (in_array($matches[1], $this->categories) ? '' : '<a href="'.$scripturl.'?action=wiki;sa=createcat;p='.$matches[1].'" style="color: #FF0000;">Category: '.$matches[1].'</a>');
   
    }
   
}


I was wondering if people thought it was a better solution, and more efficient, seeing as I don't have a way of testing it (haven't got a large dataset to test it with)

Arantor

Interesting solution. Nothing is shouting from the rooftops as being a performance concern.

sangwe11

Quote from: Arantor on November 24, 2009, 05:15:11 PM
Interesting solution. Nothing is shouting from the rooftops as being a performance concern.

Good good.

It was, as far as I could tell, the most practical (and shortest) solution. I might stumble upon, or someone might suggest a better way, but it works great, and as you said, nothing is standing out as a concern.

Just one last question. I know storing more than one bit of data in a column, isn't really a good idea. In my page select query, I put all the categories, and created categories, into one column (one for categories, one for created_categories), before splitting them into an array.

I see no problem with this, as its not actually being stored with more than one per column, and saves me 2 other queries.

However, I thought it best to ask ;D

Arantor

Breaks on non MySQL (meaning we won't approve it, since PostgreSQL doesn't have GROUP_CONCAT built in) and has a performance cost but other than that all good.

sangwe11

Quote from: Arantor on November 24, 2009, 05:32:17 PM
Breaks on non MySQL (meaning we won't approve it, since PostgreSQL doesn't have GROUP_CONCAT built in) and has a performance cost but other than that all good.

Hmm, well I need some sleep now, so I'll reconsider how to do it tommorrow.

Surely there has to be a way though, as doing another 2 queries seems pointless.

Arantor

JOINing to a separate table (since page -> category is by nature a many-many join rather than a one-many) would be cleanest.

sangwe11

Quote from: Arantor on November 24, 2009, 05:36:22 PM
JOINing to a separate table (since page -> category is by nature a many-many join rather than a one-many) would be cleanest.

Ahh yes, of course!

sangwe11

Hmm actually, now I have had a little think, how would this work ?

I'm trying to join the category table into the pages table, getting ONE row from the pages table and joining MANY rows from the category table to it. These then need to be returned in an array, or a list which we can split into an array later.

I've probably overlooked something, but I'm not sure.

Arantor

One to many joins is the natural heart of relational databases.

Think of boards->topics->messages: one board holds many topics/one topic lives in one board; one topic holds many messages/one message lives in one topic. That's a classic 1:m relationship. Pushing it into an array is simply a case of while($row = $smcFunc['db_fetch_assoc']($query)) { $rowset[] = $row; }

Depends what you want/need to do with it.

As for many-many, that's where you get into the realms of join tables. I can't give you a good example in SMF directly since SMF's code actually folds the relationship away (in what is IMO one of the only design flaws in the schema)

In your case, actually, let's think about how that'd work.

We have pages p, categories c. One category can hold many pages, one page can belong to many categories.

So what we do is an intermediate table, page_category. All it contains is a series of rows in two columns: page and category ids.

Then when we want the list of categories for a page, SELECT c.cat_id, c.cat_name FROM pages AS p INNER JOIN page_category AS pc ON (p.page_id = pc.page_id) INNER JOIN category AS c ON (pc.cat_id = c.cat_id) WHERE p.page_id = xyz.

Presto, you have a list of category ids and category names that page id xyz belongs to. Getting a list of pages in a category is just the reverse: SELECT p.page_id, p.page_name FROM category AS c INNER JOIN page_category AS pc ON (c.cat_id = pc.cat_id) INNER JOIN pages AS p ON (pc.page_id = p.page_id)


As you can imagine this can quickly get immensely complex; even the most complex queries in SMF never join more than about 7 tables (in some cases joining the same table to the query twice, under different joins, e.g. topic->message, joining against both first and last message ids)

sangwe11

Quote from: Arantor on November 26, 2009, 04:53:21 PM
One to many joins is the natural heart of relational databases.

Think of boards->topics->messages: one board holds many topics/one topic lives in one board; one topic holds many messages/one message lives in one topic. That's a classic 1:m relationship. Pushing it into an array is simply a case of while($row = $smcFunc['db_fetch_assoc']($query)) { $rowset[] = $row; }

Depends what you want/need to do with it.

As for many-many, that's where you get into the realms of join tables. I can't give you a good example in SMF directly since SMF's code actually folds the relationship away (in what is IMO one of the only design flaws in the schema)

In your case, actually, let's think about how that'd work.

We have pages p, categories c. One category can hold many pages, one page can belong to many categories.

So what we do is an intermediate table, page_category. All it contains is a series of rows in two columns: page and category ids.

Then when we want the list of categories for a page, SELECT c.cat_id, c.cat_name FROM pages AS p INNER JOIN page_category AS pc ON (p.page_id = pc.page_id) INNER JOIN category AS c ON (pc.cat_id = c.cat_id) WHERE p.page_id = xyz.

Presto, you have a list of category ids and category names that page id xyz belongs to. Getting a list of pages in a category is just the reverse: SELECT p.page_id, p.page_name FROM category AS c INNER JOIN page_category AS pc ON (c.cat_id = pc.cat_id) INNER JOIN pages AS p ON (pc.page_id = p.page_id)


As you can imagine this can quickly get immensely complex; even the most complex queries in SMF never join more than about 7 tables (in some cases joining the same table to the query twice, under different joins, e.g. topic->message, joining against both first and last message ids)

Isn't this what I was doing before, except I was grouping it ?

SELECT p.id , p.name, p.is_locked, p.topic_id, IFNULL(mem.real_name, r.member) AS member_name, r.content, r.date, GROUP_CONCAT(c.name) AS created_categories, GROUP_CONCAT(pc.category) AS categories
FROM {db_prefix}wiki_pages AS p
INNER JOIN {db_prefix}wiki_revisions AS r
ON (p.current_revision = r.id)
LEFT JOIN {db_prefix}members AS mem ON (r.member = mem.id_member)
LEFT JOIN {db_prefix}wiki_page_categories AS pc ON (pc.page = p.name)
LEFT JOIN {db_prefix}wiki_categories AS c ON (c.name = pc.category)
WHERE p.name = {text:name}
LIMIT 1


Arantor

No, you were pulling it into a single column, which apart from being inefficient (since you only then have to split it again after), is not cross DB compliant, because GROUP_CONCAT does not exist except in a UDF setup on PostgreSQL, which means we can't approve it anyway.

I also wouldn't bundle it into the same query, I'd actually be more explicit and have a query for categories separately. Daft as it might sound, in virtually every case it'll perform better.

sangwe11

Quote from: Arantor on November 26, 2009, 05:02:23 PM
No, you were pulling it into a single column, which apart from being inefficient (since you only then have to split it again after), is not cross DB compliant, because GROUP_CONCAT does not exist except in a UDF setup on PostgreSQL, which means we can't approve it anyway.

I also wouldn't bundle it into the same query, I'd actually be more explicit and have a query for categories separately. Daft as it might sound, in virtually every case it'll perform better.

Well I think if I do split it into multiple queries, I can keep it down to 2.

Arantor

You don't need to go beyond that - one to get the page, one to get the list of categories that page is in.

sangwe11

Quote from: Arantor on November 26, 2009, 05:17:37 PM
You don't need to go beyond that - one to get the page, one to get the list of categories that page is in.

Well, basically, I need to select the page, which I have got already, the pages categories, and the categories out of them which are created.

I have the page query here.

SELECT p.id , p.name, p.is_locked, p.topic_id, IFNULL(mem.real_name, r.member) AS member_name, r.content, r.date
FROM {db_prefix}wiki_pages AS p
INNER JOIN {db_prefix}wiki_revisions AS r
ON (p.current_revision = r.id)
LEFT JOIN {db_prefix}members AS mem ON (r.member = mem.id_member)
WHERE p.name = {text:name}
LIMIT 1


I now need a query to select the pages categories from wiki_page_categories, that then also checks the wiki_categories to see if that category is created, and if so, it gives it a 1 (true), if not a false.

So, the output should be ..

Page categories => ("name" => "Main", "created" => "1 or 0"),

I was thinking something along the lines of this.

SELECT pc.category, c.name
FROM {db_prefix}wiki_page_categories AS pc
INNER JOIN {db_prefix}wiki_categories AS c ON (c.name = pc.category)
WHERE pc.page = {text:name}


This doesn't do what I quite do, I need to get it to do a sort of if empty(c.name) return 0 else 1 sort of thing.

Arantor

So you populate page_categories on page save?

Then it's easy, LEFT JOIN instead of INNER JOIN, and turn c.name into IFNULL(c.name, 0) AS name

Returns 0 if not created or the name if it is. Or even, thinking about it, just c.name as-is and compare whether it's null when you get it back.

sangwe11

Quote from: Arantor on November 26, 2009, 05:37:35 PM
So you populate page_categories on page save?

Then it's easy, LEFT JOIN instead of INNER JOIN, and turn c.name into IFNULL(c.name, 0) AS name

Returns 0 if not created or the name if it is. Or even, thinking about it, just c.name as-is and compare whether it's null when you get it back.

So far I have this.

SELECT pc.category, IFNULL(c.name, "0") AS created
FROM {db_prefix}wiki_page_categories AS pc
LEFT JOIN {db_prefix}wiki_categories AS c ON (c.name = pc.category)
WHERE pc.page = {text:name}


Suddenly thought of the IFNULL before I read your post, but thanks for the help anyway.

Now just to get it to show as 1 if it is not null.

Is there a !IFNULL or anything ?

Advertisement: