Converter shiz: WTF does this mean?

Started by Antechinus, April 17, 2022, 03:32:49 AM


Tyrsson

Quote from: Antechinus on April 24, 2022, 09:13:49 PM
SNAFU :D

Lol, yeah. I was joking, 'cause I really do not see how that would simplify anything there, other than that is just "how" they do it, so it's expected to be that way. I'm sure there is a reason, I just don't know it, roflmao.
PM at your own risk, some I answer, if they are interesting, some I ignore.

Sesquipedalian

Once again you will need preg_replace_callback(). This time, however, you will need to perform a database query inside the callback function in order to find the correct attachment ID number.

Since the string itself doesn't include the post ID, you'll need to pass that to your callback function via the use keyword. If you look at https://www.php.net/manual/en/function.preg-replace-callback.php, the first user comment shows you exactly how to do this.
Slava Ukraini!
Heroiam slava!

I promise you nothing.

Sesqu... Sesqui... what?
Sesquipedalian, the best word in the English language.

Antechinus

Ooooooo kkkkkkkkk. So, in essence...

1/ Get the post ID
2/ Use that to query the attachments table
3/ Get the array of attachments that match that post ID.
4/ Reverse sort them by attachment ID number
5/ That should, if plugged back into the posts table, give you the right attachments in the right order.
6/ BBC tags can then have numbers assigned by attachment ID number, a la SMF 2.1.

I get the general idea, I think.
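
For illustration, a minimal sketch of those six steps. It is not the converter's actual code: the database query is replaced by a hypothetical in-memory array ($attachments_table), the function names are made up for the example, the tag is assumed to arrive with plain double quotes, and the [attach id=N] output form is an assumption about the SMF 2.1 syntax.

```php
<?php
// Hypothetical stand-in for the already-converted attachments table.
$attachments_table = array(
    array('id_attach' => 101, 'id_msg' => 7),
    array('id_attach' => 102, 'id_msg' => 7),
    array('id_attach' => 103, 'id_msg' => 8),
);

// Steps 2-4: collect the attachment IDs for one post, then reverse sort them.
// In a real converter this would be a database query, not an array loop.
function get_attachment_ids(array $table, $id_msg)
{
    $ids = array();
    foreach ($table as $attachment)
    {
        if ($attachment['id_msg'] == $id_msg)
            $ids[] = $attachment['id_attach'];
    }
    rsort($ids);
    return $ids;
}

// Steps 5-6: swap each phpBB index for the matching attachment ID.
function convert_inline_attachments($body, $id_msg, array $table)
{
    $ids = get_attachment_ids($table, $id_msg);

    return preg_replace_callback(
        '~<ATTACHMENT filename="(.+?)" index="(\d+)"><s>\[attachment=\d+]</s>.*?<e>\[/attachment]</e></ATTACHMENT>~s',
        // The post's attachment IDs reach the callback via "use".
        function ($matches) use ($ids)
        {
            $index = (int) $matches[2];
            // No matching attachment: fall back to the bare file name.
            if (!isset($ids[$index]))
                return $matches[1];
            return '[attach id=' . $ids[$index] . ']' . $matches[1] . '[/attach]';
        },
        $body
    );
}

$body = 'See <ATTACHMENT filename="banner_6.jpg" index="0"><s>[attachment=0]</s>banner_6.jpg<e>[/attachment]</e></ATTACHMENT> here.';
echo convert_inline_attachments($body, 7, $attachments_table);
// See [attach id=102]banner_6.jpg[/attach] here.
```

Post 7 has attachments 101 and 102, so after the reverse sort index 0 maps to 102.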

Antechinus

Ok, simple stuff (because I can just tell the attachments are going to be "fun")...

I want to call in the existing forum directory url as a variable. I assume it can be done like this:

/* NOTE: Escape any full stops, etc in the url (as \. ). */
$old_forum = 'https://existing\.com/forums';
/* NOTE: Do NOT escape them in this one! */
$new_forum = 'https://existing.com/forums';
/* This does the major work first. */
$row['body'] = preg_replace_callback(
    array(
        /*--- Internal links: to be converted to SMF c/board/topic/msg format. ---*/
        /*--- Internal links to categories (Need to know ID's -phpBB quirk!). ---*/
        '~<URL (.+?)<s>\[url=' . $old_forum . '/(.+?)f=(\d+?)]</s>~s',
        /*--- Internal links to boards (Need to know ID's -phpBB quirk!). ---*/
        '~<URL (.+?)<s>\[url=' . $old_forum . '/(.+?)f=(\d+?)]</s>~s',
        /*--- Internal links to topics. ---*/
        '~<URL (.+?)<s>\[url=' . $old_forum . '/(.+?)t=(\d+?)]</s>~s',
        /*--- Internal links to posts. ---*/
        '~<URL (.+?)<s>\[url=' . $old_forum . '/(.+?)#p(\d+?)]</s>~s',
        /*--- Internal links to members. ---*/
        '~<URL (.+?)<s>\[url=' . $old_forum . '/(.+?);u=(\d+?)]</s>~s',
        /*--- End of internal links. ---*/
    ),
    array(
        /*--- Internal links: to be converted to SMF c/board/topic/msg format. ---*/
        /*--- Internal links to categories (Need to know ID's -phpBB quirk!). ---*/
        '[url=' . $new_forum . '/index.php?c=$3]',
        /*--- Internal links to boards (Need to know ID's -phpBB quirk!). ---*/
        '[url=' . $new_forum . '/index.php?board=$3]',
        /*--- Internal links to topics. ---*/
        '[url=' . $new_forum . '/index.php?topic=$3]',
        /*--- Internal links to posts. ---*/
        '[url=' . $new_forum . '/index.php?msg=$3]',
        /*--- Internal links to members. ---*/
        '[url=' . $new_forum . '/index.php?action=profile;u=$3]',
        /*--- End of internal links. ---*/
    ), $row['body']);

Sesquipedalian

Er, no. When using preg_replace_callback(), the second parameter needs to be a function, not just an array of strings. Look again at the example I gave you here, and at the manual page for preg_replace_callback().

Antechinus

I did. They don't make a damned bit of sense to me (see cartoon about dogs). :D

Ok, the only reason I went to preg_replace_callback() was because of the font-size conversion. TBH, if it's going to mean writing functions everywhere instead of being able to drop in a simple variable, I'd be inclined to just strip the font-size tags, or default them all to 1em. IMO they're not critical content anyway. 99% of people won't care if old posts are all standard font-size.

I'm just wanting this thing usable, not perfect.
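
If stripping is the route taken, it might look something like the sketch below. It assumes phpBB stores a sized run of text as <SIZE size="150"><s>[size=150]</s>text<e>[/size]</e></SIZE>; the sample string is invented for the example.

```php
<?php
// Assumed phpBB storage format for a sized run of text.
$body = 'Some <SIZE size="150"><s>[size=150]</s>big<e>[/size]</e></SIZE> text.';

// Plain preg_replace() is enough here: both patterns are swapped for an
// empty string, leaving the wrapped text at the default font size.
$body = preg_replace(
    array(
        '~<SIZE size="[^"]*"><s>\[size=[^\]]*]</s>~s',
        '~<e>\[/size]</e></SIZE>~s',
    ),
    '',
    $body
);

echo $body; // Some big text.
```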

live627

Quote from: Antechinus on April 24, 2022, 09:51:08 PM
Ooooooo kkkkkkkkk. So, in essence...

1/ Get the post ID
2/ Use that to query the attachments table
3/ Get the array of attachments that match that post ID.
4/ Reverse sort them by attachment ID number
5/ That should, if plugged back into the posts table, give you the right attachments in the right order.
6/ BBC tags can then have numbers assigned by attachment ID number, a la SMF 2.1.

I get the general idea, I think.

This is the same idea employed by the old ILA mod by @Spuds (and I think it is also in his forum platform).

Antechinus

I'm sure it's very cool, but frankly I have no idea how to write such code, and at the moment I don't really want to have to spend a lot of time learning how to write such code. I'm after a quick and effective compromise, mainly based on code I can realistically write myself (with a few simple pointers from others).

At the moment I'm thinking that as long as attachments per se convert without issues, it may be best to deal with the inline attachment tags by just converting them to display the actual file name. That way editing a post to get the right attachments back in the right places will be easy. Editing would only be done on posts where people think it matters, with the rest being ignored, but if you have the relevant file name where the inline attachment used to be you can still follow what was intended by referring to the standard (not inline) attachments beneath a post.

So offhand, convert this:
<ATTACHMENT filename=\"banner_6.jpg\" index=\"0\"><s>[attachment=0]</s>banner_6.jpg<e>[/attachment]</e></ATTACHMENT>
To this:
[attach]banner_6.jpg[/attach]
Which is fairly tidy, and IMO is probably good enough.
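
A rough sketch of that conversion with plain preg_replace(). It assumes the tag reaches the script with ordinary double quotes (the backslashes in the sample above look like dump escaping), and the pattern is only one guess at how to match it.

```php
<?php
$body = '<ATTACHMENT filename="banner_6.jpg" index="0"><s>[attachment=0]</s>banner_6.jpg<e>[/attachment]</e></ATTACHMENT>';

// $1 is the file name captured from the filename attribute.
$body = preg_replace(
    '~<ATTACHMENT filename="(.+?)" index="\d+"><s>\[attachment=\d+]</s>.*?<e>\[/attachment]</e></ATTACHMENT>~s',
    '[attach]$1[/attach]',
    $body
);

echo $body; // [attach]banner_6.jpg[/attach]
```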

Sesquipedalian

Quote from: Antechinus on April 25, 2022, 12:53:05 AM
I did. They don't make a damned bit of sense to me (see cartoon about dogs). :D

:) Okay then. So, for the URL replacements you posted here, just stick with plain old preg_replace() rather than preg_replace_callback(). You can do that because you are simply replacing one static string that you already know with another that you already know.
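
As a small illustration of that, here is a cut-down version using only the topic pattern. The sample post body and the closing-tag pattern are assumptions added for the example; they are not part of the code posted above.

```php
<?php
// Full stops are escaped in the pattern version only.
$old_forum = 'https://existing\.com/forums';
$new_forum = 'https://existing.com/forums';

// Hypothetical stored post body containing one internal topic link.
$body = '<URL url="https://existing.com/forums/viewtopic.php?t=42"><s>[url=https://existing.com/forums/viewtopic.php?t=42]</s>a topic<e>[/url]</e></URL>';

$body = preg_replace(
    array(
        // Opening tag of an internal topic link.
        '~<URL (.+?)<s>\[url=' . $old_forum . '/(.+?)t=(\d+?)]</s>~s',
        // Closing tag (assumed cleanup step, not from the posted code).
        '~<e>\[/url]</e></URL>~s',
    ),
    array(
        // $3 is the topic ID captured by the third group.
        '[url=' . $new_forum . '/index.php?topic=$3]',
        '[/url]',
    ),
    $body
);

echo $body; // [url=https://existing.com/forums/index.php?topic=42]a topic[/url]
```

PHP's preg_quote() could probably generate the escaped form from the unescaped one, which would avoid keeping two copies of the address.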

Quote from: Antechinus on April 25, 2022, 06:47:34 PM
At the moment I'm thinking that as long as attachments per se convert without issues, it may be best to deal with the inline attachment tags by just converting them to display the actual file name. [snip...]

Well, you could do that, but you'll just end up with literal "[attach]banner_6.jpg[/attach]" strings being displayed in the posts, because SMF will skip any [attach] BBCode that doesn't have an id attribute. So if that's the plan, you might as well remove the BBCode tags entirely.

But don't give up on a proper conversion yet. ;)

Have the attachments already been converted before the post text is converted?

Antechinus

Quote from: Sesquipedalian on April 25, 2022, 07:59:09 PM
Well, you could do that, but you'll just end up with literal "[attach]banner_6.jpg[/attach]" strings being displayed in the posts...

I know. :) My thinking was...

Quote
That way editing a post to get the right attachments back in the right places will be easy. Editing would only be done on posts where people think it matters, with the rest being ignored, but if you have the relevant file name where the inline attachment used to be you can still follow what was intended by referring to the standard (not inline) attachments beneath a post.

IOW, I think it is better than deleting the tags entirely (which was my first thought).


Quote
But don't give up on a proper conversion yet. ;)

Have the attachments already been converted before the post text is converted?

Not in the existing script. It does attachments last. However, this may just be the order someone threw things together. I don't know if attachments have to be done last, or if they can be done earlier. Just looking at what the script contains, my gut is telling me it could be done earlier.
/******************************************************************************/
--- Converting attachments...
/******************************************************************************/

---* {$to_prefix}attachments
---{
$no_add = true;

if (!isset($oldAttachmentDir))
{
    $result = convert_query("
        SELECT config_value
        FROM {$from_prefix}config
        WHERE config_name = 'upload_path'
        LIMIT 1");
    list ($oldAttachmentDir) = convert_fetch_row($result);
    convert_free_result($result);

    if (empty($oldAttachmentDir) || !file_exists($_POST['path_from'] . '/' . $oldAttachmentDir))
        $oldAttachmentDir = $_POST['path_from'] . '/file';
    else
        $oldAttachmentDir = $_POST['path_from'] . '/' . $oldAttachmentDir;
}

/* Get $id_attach. */
if (empty($id_attach))
{
    $result = convert_query("
        SELECT MAX(id_attach) + 1
        FROM {$to_prefix}attachments");
    list ($id_attach) = convert_fetch_row($result);
    convert_free_result($result);

    $id_attach = empty($id_attach) ? 1 : $id_attach;
}

/* Set the default empty values. */
$width = 0;
$height = 0;

/* Is it an image? */
$attachmentExtension = strtolower(substr(strrchr($row['filename'], '.'), 1));
if (!in_array($attachmentExtension, array('jpg', 'jpeg', 'gif', 'png', 'bmp')))
    $attachmentExtension = '';

$file_hash = getAttachmentFilename($row['filename'], $id_attach, null, true);
$physical_filename = $id_attach . '_' . $file_hash;

if (strlen($physical_filename) > 255)
    return;
if (copy($oldAttachmentDir . '/' . $row['physical_filename'], $attachmentUploadDir . '/' . $physical_filename))
{
    /* Is it an image? */
    if (!empty($attachmentExtension))
    {
        list ($width, $height) = getimagesize($attachmentUploadDir . '/' . $physical_filename);
        /* This shouldn't happen but apparently it might. */
        if (empty($width))
            $width = 0;
        if (empty($height))
            $height = 0;
    }
    $rows[] = array(
        'id_attach' => $id_attach,
        'size' => filesize($attachmentUploadDir . '/' . $physical_filename),
        'filename' => $row['filename'],
        'file_hash' => $file_hash,
        'id_msg' => $row['id_msg'],
        'downloads' => $row['downloads'],
        'width' => $width,
        'height' => $height,
    );
    $id_attach++;
}
---}
SELECT
post_msg_id AS id_msg, download_count AS downloads,
real_filename AS filename, physical_filename, filesize AS size
FROM {$from_prefix}attachments;
---*

AFAICT any variables called in that lot should keep the same values on conversion, so no drama.

Although it would be necessary to create the required db columns for inline attachments if you want a full conversion. This script currently only converts to 2.0.x, so no inline attachments stuff in the db by default. It'd need a custom query to add a couple of 2.1 bits (easy enough).

Tyrsson

@Antechinus To simplify it: the use keyword just makes the variables you pass to it usable in the function's scope (inside the function, or closure, anonymous function, yada yada).

In the reference on php.net this is the callback in the example:
function ($matches) {
            return strtolower($matches[0]);
        }
It's been a while since I looked, but I think a closure can only accept two arguments: $matches would be arg1 and you could pass another ($matches, $arg2); after that you need to use the "use":
function($matches, $arg2) use($other, $variables, $here): string {
   // if you include the return type ): string
   // you must ensure that the function returns a string
   // or php is gonna throw a fit ;)
}

Arantor

preg_replace_callback only gets one argument supplied to it, the $matches array.

Closures in general are normal functions and can accept any number of arguments, including variadics, while the use clause is to import things from current scope into the closure.
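
A short sketch of both points, with made-up variable names and sample strings: the callback only ever receives $matches, anything else comes in through use, and a closure called directly takes whatever arguments it declares (variadics need PHP 5.6 or later).

```php
<?php
$prefix = 'ID-';

// preg_replace_callback() passes exactly one argument to the callback: $matches.
// Anything else it needs is imported from the current scope with "use".
$result = preg_replace_callback(
    '~\d+~',
    function ($matches) use ($prefix)
    {
        return $prefix . $matches[0];
    },
    'post 12 and post 34'
);
echo $result, "\n"; // post ID-12 and post ID-34

// Called directly, a closure is an ordinary function and can declare
// any number of parameters, including a variadic one.
$join = function ($separator, ...$parts)
{
    return implode($separator, $parts);
};
echo $join('-', 'a', 'b', 'c'), "\n"; // a-b-c
```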

Tyrsson

Quote from: Arantor on April 26, 2022, 03:29:00 AM
preg_replace_callback only gets one argument supplied to it, the $matches array.

Closures in general are normal functions and can accept any number of arguments, including variadics, while the use clause is to import things from current scope into the closure.

Thanks for the clarification @Arantor, not sure why I had it stuck in my head that they could only accept two arguments...

The variadics have been added since I last did any coding, which tells ya how long that has been lmao... Way back round PHP 5.2 - 5.3.

Antechinus

Did some more thinking about font sizes. Really, trying to convert them to a format that SMF understands by default is a bit stupid, particularly given that any members using the converted forum will be used to using phpBB syntax for font sizes.

So, on that basis, the sanest option is to install Doug's mod: phpBB-style Font Size BBCode. Then the conversion just becomes:
$row['body'] = preg_replace(
    array(
        /*--- Font size: install Doug's mod - https://custom.simplemachines.org/index.php?mod=3714 ;) ---*/
        '~<SIZE size=\\"(.+?)\\"><s>\[size=(.+?)]</s>~s',
        '~<e>\[/size]</e></SIZE>~s',
    ),
    array(
        /*--- Font size: install Doug's mod - https://custom.simplemachines.org/index.php?mod=3714 ;) ---*/
        '[size=$1]',
        '[/size]',
    ), $row['body']);

Which means anyone can use the normal range of font size definitions that SMF allows by default, or use the phpBB version (e.g. size=150), or use standard CSS (e.g. size=150%). I think this is an instance where requiring the use of a small mod after conversion really makes the most sense. I'd be inclined to include this advice in any instructions for the script.

Also, regarding conversion of category links vs board links: it really needs something like this:
$row['body'] = preg_replace(
    array(
        /*--- Internal links: to be converted to SMF c/board/topic/msg format. ---*/
        /*--- Internal links to categories. Need to know ID's (phpBB quirk!): ID's are f=(1|2|3|4|5|6). ---*/
        '~<URL (.+?)<s>\[url=https://www\.old_forum\.com/(.+?)f=(1|2|3|4|5|6)]</s>~s',
        /*--- Internal links to boards. ---*/
        '~<URL (.+?)<s>\[url=https://www\.old_forum\.com/(.+?)f=(\d+?)]</s>~s',
    ),
    array(
        /*--- Internal links: to be converted to SMF c/board/topic/msg format. ---*/
        /*--- Internal links to categories. ---*/
        '[url=https://www.new_forum.com/index.php?c=$3]',
        /*--- Internal links to boards. ---*/
        '[url=https://www.new_forum.com/index.php?board=$3]',
    ), $row['body']);

That will deal with any BBC links directly to categories, before any BBC links directly to boards are dealt with. This is necessary due to phpBB using the same syntax (f=****) for category links and board links (they are distinguished in the back end, by forum type).
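
To show why the order matters, here is a quick sketch with an invented post body in which f=2 is a category and f=15 is a board. It runs the same two patterns in both orders.

```php
<?php
// Hypothetical sample: on this imaginary forum f=2 is a category, f=15 a board.
$body = '<URL url="a"><s>[url=https://www.old_forum.com/viewforum.php?f=2]</s>General '
    . '<URL url="b"><s>[url=https://www.old_forum.com/viewforum.php?f=15]</s>Chat';

$patterns = array(
    // Categories first: only the known category IDs.
    '~<URL (.+?)<s>\[url=https://www\.old_forum\.com/(.+?)f=(1|2|3|4|5|6)]</s>~s',
    // Boards second: any remaining f= link.
    '~<URL (.+?)<s>\[url=https://www\.old_forum\.com/(.+?)f=(\d+?)]</s>~s',
);
$replacements = array(
    '[url=https://www.new_forum.com/index.php?c=$3]',
    '[url=https://www.new_forum.com/index.php?board=$3]',
);

// preg_replace() works through the pattern array in order, so the category
// pattern consumes its links before the board pattern runs.
$right = preg_replace($patterns, $replacements, $body);

// Reversing both arrays lets the board pattern claim the category link too.
$wrong = preg_replace(array_reverse($patterns), array_reverse($replacements), $body);

echo $right, "\n";
// [url=https://www.new_forum.com/index.php?c=2]General [url=https://www.new_forum.com/index.php?board=15]Chat
echo $wrong, "\n";
// [url=https://www.new_forum.com/index.php?board=2]General [url=https://www.new_forum.com/index.php?board=15]Chat
```

One caveat: with the s modifier, (.+?) can run across several tags, so a board link followed later by a category link in the same post may get swallowed by the category pattern. A tighter class such as [^\]]+? in place of the second (.+?) may be safer.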
