News:

SMF 2.1.4 has been released! Take it for a spin! Read more.

Main Menu

Import of XML data into a custom table

Started by DaeronAlagos, November 18, 2014, 01:46:11 PM

Previous topic - Next topic

DaeronAlagos

Hi,

Can someone please explain what I've done wrong in the following function. I'm trying to insert data into a new table I've created (using phpmyadmin), that is taken from an XML feed. I've looked at several of the other core files for an example, and done many Google / Forum searches, and I cannot get past the following error:

The database value you're trying to insert does not exist: tu_card_id

I'm not sure if it's something I've done incorrectly when creating the table, or if it's just my code. I started off with the thread 'How to create Scheduled Tasks in SMF 2.0' and I've got the Scheduled tasks page set up fine (thanks for the tutorial).

I haven't even created that much, so I'm hoping it something simple, or even if I'm told to start again following instructions. Either way I'd be extremely greatful for some help.

// Update the database from the official cards.xml file
function scheduled_update_cardsxml()
{
    global $txt, $sourcedir, $scripturl, $smcFunc, $modSettings, $language;

    $cardsxml=simplexml_load_file("http://mobile.tyrantonline.com/assets/cards.xml");
    foreach($cardsxml->unit as $cards) {
        $smcFunc['db_insert']('',
            '{db_prefix}tu_cards',
            array(
                'tu_card_id' => 'int', 'tu_card_name' => 'string',
            ),
            array(
                $cards['id'], $cards['name'],
            )
        );
    }
}


Kind Regards,

Daeron.

Arantor

I see two things wrong here.

Firstly, you need to supply an index parameter as the fifth parameter to your db_insert call, in this case likely you mean tu_card_id. This needs to really be the indexes on the table, usually the primary key but a unique key would also work.

Secondly, and more importantly, db_insert supports multi-row insertion rather than one row at a time without hammering the database, and if you're doing this regularly, this is important - though I am curious to know how you plan on checking existing data to avoid duplication.

I would suggest something like this is what you're looking for:
// Update the database from the official cards.xml file
function scheduled_update_cardsxml()
{
global $smcFunc;

$cardsxml = simplexml_load_file("http://mobile.tyrantonline.com/assets/cards.xml");
$rows = array();

foreach($cardsxml->unit as $cards)
{
$rows[] = array($cards['id'], $cards['name']);
}

$smcFunc['db_insert']('',
'{db_prefix}tu_cards',
array(
'tu_card_id' => 'int', 'tu_card_name' => 'string',
),
$rows,
'tu_card_id'
);
}


There is also an implicit trusting of that XML file to not have vulnerable data in it, either by way of an XXE vulnerability or otherwise unencoded data but that's not really something I can advise you on too much at this stage without a thorough examination of the XML file.

DaeronAlagos

Thanks for the reply Arantor,

Second part first ;) I was going to use 'ignore' to control if a record already exists for a 'tu_card_id'. Replace, is an option as the xml file only gets added to, the existing data in it doesn't change.

First part second, since making the original post here I've re-created the table with a third column to use as the primary key. I realised that using the tu_card_id might not be suitable (even though each is unique).

Export of table:

CREATE TABLE IF NOT EXISTS `smf_tu_cards` (
  `card_id` int(11) NOT NULL AUTO_INCREMENT,
  `tu_card_id` int(11) NOT NULL,
  `tu_card_name` varchar(255) NOT NULL,
  PRIMARY KEY (`card_id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;

--
-- Dumping data for table `smf_tu_cards`
--

INSERT INTO `smf_tu_cards` (`card_id`, `tu_card_id`, `tu_card_name`) VALUES
(1, 1, 'Infantry'),
(2, 4, 'Bazooka Marine');


Regarding vulnerability, the xml source is part of a game, and as such is used by the game clients as well as other third part tools, so it is extremely unlikely that any malicious code would ever be contained in there.

And sadly I'm still receiving the The database value you're trying to insert does not exist: tu_card_id error, which now leads me to believe that there is something wrong with the way I have created the table.

Thanks again for your help with this, once this issue is resolved I can then move on to the next part of the project which is creating an additional page on the user profile to allow our members to select validated cards from this list to record their own personal decks.

Arantor

OK, now I was able to look at the XML, the answer is you're doing it wrong. Took me a while to remember how SimpleXML works.

Something like this should work better:

// Update the database from the official cards.xml file
function scheduled_update_cardsxml()
{
global $smcFunc;

$cardsxml = simplexml_load_file("http://mobile.tyrantonline.com/assets/cards.xml");
$rows = array();

foreach ($cardsxml->unit as $cards)
{
$rows[] = array((int) $cards->id, (string) $cards->name);
}

$smcFunc['db_insert']('',
'{db_prefix}tu_cards',
array(
'tu_card_id' => 'int', 'tu_card_name' => 'string',
),
$rows,
'tu_card_id'
);

return true;
}


You see, $cards['id'] implied an id attribute of the cards element (in this case, each instance of <unit>) but there's no attribute so it's calling you out on the lack of attribute being passed on.

So instead it needs to be $cards->id to match the <id> element inside the <unit> element and I seem to recall one can simply typecast such elements' contents like this. Not tested, though.

I forgot to spell it out in the tutorial but you need to return true from the scheduled task to notify the handler it completed successfully.

DaeronAlagos

Thank you for your help Arantor, it worked perfectly.

Sorry for getting the simplexml part wrong, I hoped at least I had that right so you only had to help with the SMF part  :(

I also learnt that I need to keep not SMF actions separate from any internal ones (I tried to combine the simplexml foreach with the db_insert).

Thanks again for your help  :D

Arantor

No worries :) It had been a while since I'd used it so it didn't occur to me to double check it until after I'd looked at the XML to be sure.

You don't explicitly need to keep them separate normally. The separation I've suggested is one that has a very direct performance consideration.

If you operate the code as originally outlined, you're doing one database insertion per entry in the XML. I didn't look at how big that XML file was but it easily looked like it ran into the hundreds if not the thousands of entries.

Which means hundreds if not thousands of database queries - which then translates into a lot of processing over and above what was actually necessary.

All I did was rearrange things to build one single query rather than hundreds or more. This means one query to the database, one bulk set of DB processing (which includes things like index key rebuilds) rather than having a much larger incremental workload.

Outside of that consideration, you could quite happily have plugged the values in directly into db_insert and it would have worked just fine.

Advertisement: