1.x --> 2.0.12: Old attachments can't be found if name contains umlaut (e.g. ä)

Started by lf123123, January 14, 2017, 01:12:49 PM

lf123123

Hello,

I'm experiencing the following problem with attachments. Let's assume a user displays a topic with an attachment and the attachment has the following properties:


  • it was uploaded before the upgrade from 1.x to 2.0.12
  • it had a filename containing an umlaut (ä, ö, ü, Ä, Ö, Ü or ß) when it was uploaded

Then the attachment cannot be found, will therefore not be displayed in the topic and will raise an error message in the administrator center like this:

http://[...]/index.php?topic=1223.msg6663
2: fopen(/home/[...]/public_html/forum/attachments/Piperinsure_1.jpg): failed to open stream: No such file or directory
File: /home/[...]/public_html/forum/Sources/Subs-Graphics.php
Line: 346


In this example, the original filename of the missing file was "Piperinsäure_1.jpg". When I look in my attachments folder, I can see the name the file actually has on disk: "2484_PiperinsAure_1_jpg1894682558ff4b998236e04f010e9542". If you want to have a look at the post which contains this image, you can use this link: https://tinyurl.com/hyovlbe

The problem did not occur before the upgrade to SMF version "2.0.12".

Can anyone please help me get all of our attachments working again? I'm rather clueless about what I could try. I'd even be willing to change the filenames by hand, even though there are hundreds of files, but I doubt that this would be a reasonable approach, and therefore wanted to ask what you think.

Thanks in advance!

lf123123

Since almost 1000 people read this topic and none posted a reply, I assume that there is no easy solution for this problem.

In the meantime, I tried to find a way to solve the problem by renaming the attachments by hand and found the following, which looks like an inconsistency to me. Please tell me if you agree:
The file I mentioned above, which fopen tries to locate at "/public_html/forum/attachments/Piperinsure_1.jpg" and which has a filename "2484_PiperinsAure_1_jpg1894682558ff4b998236e04f010e9542" has, according to the attachments table in our database, the following filename: "Piperinsäure 1.jpg". So the filename in the database differs from both the actual filename and from the path which fopen tried to locate the file under.

My questions:

  • Can someone see the actual source of the problem and suggest a solution? Could it be as simple as changing the character encoding of the attachments table? I already tried to change it from utf8_general_ci to latin1_swedish_ci, without success. If you think you've got an idea but are uncertain, please answer anyway.

  • Since I assume that nobody can, I'm interested in what needs to be done to solve the problem by hand. Should I rather rename the actual files or should I change the filenames in the database? In the example given, which value would be correct?
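
To narrow down the encoding question, it can help to look at the raw bytes of a filename rather than at how a terminal or a database tool renders it. The following is only a minimal sketch: hex_bytes is a hypothetical helper name, and the two sample values are the standard UTF-8 and ISO-8859-1 (Latin-1) encodings of "ä".

```shell
#!/bin/sh
# hex_bytes is a hypothetical helper: print the bytes of its argument as hex.
hex_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -d '[:space:]'
  echo
}

utf8_a=$(printf '\303\244')    # "ä" encoded as UTF-8 (two bytes)
latin1_a=$(printf '\344')      # "ä" encoded as Latin-1 (one byte)

hex_bytes "$utf8_a"     # prints c3a4
hex_bytes "$latin1_a"   # prints e4
```

Running hex_bytes on a name copied from the attachments folder and on the same name exported from the database would show whether both sides store the umlaut with the same bytes.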

tinoest

I would say your encoding has gone wrong somewhere along the line, as you seem to have already ascertained.

You can rename all the files in the attachments directory whose names contain characters other than A-Z, a-z, 0-9, ".", "_" and "-" with the following:

#!/usr/bin/env bash
find "$1" -depth -print0 | while IFS= read -r -d '' file; do
  d="$( dirname "$file" )"
  f="$( basename "$file" )"
  new="${f//[^a-zA-Z0-9\/\._\-]/}"
  if [ "$f" != "$new" ]      # if equal, name is already clean, so leave alone
  then
    if [ -e "$d/$new" ]
    then
      echo "Notice: \"$new\" and \"$f\" both exist in \"$d\":"
      ls -ld "$d/$new" "$d/$f"
    else
      echo mv "$file" "$d/$new"      # remove "echo" to actually rename things
    fi
  fi
done


Go into the attachments directory on the command line and run the above, passing the directory to process (for example ".") as the first argument. It won't actually rename anything until you remove the echo before the mv, but it will show you what it would do.

The other option is writing a script which pulls in all the filenames and updates the database entries. Depends on how you want to attack it really.

tinoest

Another option is to update the database with something like the following:

UPDATE `smf_attachments` SET
filename = replace(filename,'Š','S'),
filename = replace(filename,'š','s'),
filename = replace(filename,'Ð','Dj'),
filename = replace(filename,'Ž','Z'),
filename = replace(filename,'ž','z'),
filename = replace(filename,'À','A'),
filename = replace(filename,'Á','A'),
filename = replace(filename,'Â','A'),
filename = replace(filename,'Ã','A'),
filename = replace(filename,'Ä','A'),
filename = replace(filename,'Å','A'),
filename = replace(filename,'Æ','A'),
filename = replace(filename,'Ç','C'),
filename = replace(filename,'È','E'),
filename = replace(filename,'É','E'),
filename = replace(filename,'Ê','E'),
filename = replace(filename,'Ë','E'),
filename = replace(filename,'Ì','I'),
filename = replace(filename,'Í','I'),
filename = replace(filename,'Î','I'),
filename = replace(filename,'Ï','I'),
filename = replace(filename,'Ñ','N'),
filename = replace(filename,'Ò','O'),
filename = replace(filename,'Ó','O'),
filename = replace(filename,'Ô','O'),
filename = replace(filename,'Õ','O'),
filename = replace(filename,'Ö','O'),
filename = replace(filename,'Ø','O'),
filename = replace(filename,'Ù','U'),
filename = replace(filename,'Ú','U'),
filename = replace(filename,'Û','U'),
filename = replace(filename,'Ü','U'),
filename = replace(filename,'Ý','Y'),
filename = replace(filename,'Þ','B'),
filename = replace(filename,'ß','Ss'),
filename = replace(filename,'à','a'),
filename = replace(filename,'á','a'),
filename = replace(filename,'â','a'),
filename = replace(filename,'ã','a'),
filename = replace(filename,'ä','a'),
filename = replace(filename,'å','a'),
filename = replace(filename,'æ','a'),
filename = replace(filename,'ç','c'),
filename = replace(filename,'è','e'),
filename = replace(filename,'é','e'),
filename = replace(filename,'ê','e'),
filename = replace(filename,'ë','e'),
filename = replace(filename,'ì','i'),
filename = replace(filename,'í','i'),
filename = replace(filename,'î','i'),
filename = replace(filename,'ï','i'),
filename = replace(filename,'ð','o'),
filename = replace(filename,'ñ','n'),
filename = replace(filename,'ò','o'),
filename = replace(filename,'ó','o'),
filename = replace(filename,'ô','o'),
filename = replace(filename,'õ','o'),
filename = replace(filename,'ö','o'),
filename = replace(filename,'ø','o'),
filename = replace(filename,'ù','u'),
filename = replace(filename,'ú','u'),
filename = replace(filename,'û','u'),
filename = replace(filename,'ü','u'),
filename = replace(filename,'ý','y'),
filename = replace(filename,'þ','b'),
filename = replace(filename,'ÿ','y'),
filename = replace(filename,'ƒ','f'),
filename = replace(filename, 'œ', 'oe'),
filename = trim(filename);


This replaces each accented character with its closest ASCII equivalent. Obviously, run this on a test forum or take a backup first.

Kindred

Something seems to have gone wrong with your update, since filenames were updated to include a hash as a security measure...

When you say 1.x, what version did you actually start from?
Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

Arantor

Quote from: Kindred on January 15, 2017, 11:53:33 AM
Something seems to have gone wrong with your update, since filenames were updated to include a hash as a security measure...

When you say 1.x, what version did you actually start from?

Well, the hashed filenames became mandatory as of 1.1.9, but the upgrade didn't go back and rename existing attachments from before that.
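
For reference, the on-disk name quoted in the first post can be taken apart to see the pieces being discussed. This is just a sketch under an assumption drawn from that single example: that such names have the form "<id>_<cleaned name>" followed by 32 hexadecimal characters (the length of an MD5 hash). split_legacy_name is a hypothetical helper, not something SMF ships.

```shell
#!/bin/sh
# split_legacy_name is a hypothetical helper. It assumes the layout seen in
# the example from this thread: <id>_<cleaned name><32 hex characters>.
# It prints nothing if the name does not match that layout.
split_legacy_name() {
  printf '%s\n' "$1" |
    sed -n -E 's/^([0-9]+)_(.*)([0-9a-f]{32})$/id=\1 name=\2 hash=\3/p'
}

split_legacy_name "2484_PiperinsAure_1_jpg1894682558ff4b998236e04f010e9542"
# prints id=2484 name=PiperinsAure_1_jpg hash=1894682558ff4b998236e04f010e9542
```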

lf123123

First of all, a big thank you to everyone who made the effort to write a reply. It is much appreciated.

Quote from: Kindred on January 15, 2017, 11:53:33 AM
When you say 1.x, what version did you actually start from?

Sorry for not telling you in the first place. The old version I was talking about was 1.1.4. Now that I read your replies, I think that this might be part of the problem (unexpected filenames for the upgrade scripts).

Anyway, I'm sure that I can solve the problem with tinoest's answers. I will try this in the next few days and tell you if it worked. Thank you, tinoest!

Kindred

Yeah, we've seen some issues when going directly from a really old 1.1.x to 2.0.x
(but nothing consistent or reproducible).

Your best bet would have been to upgrade to 1.1.21 and THEN do the 2.0.13 upgrade.

lf123123

Problem solved! Almost, at least. The following SQL query did most of the job:
UPDATE `smf_attachments` SET
filename = replace(filename,'ä','A'),
filename = replace(filename,'ö','A'),
filename = replace(filename,'ü','A'),
filename = replace(filename,'Ä','A'),
filename = replace(filename,'Ö','A'),
filename = replace(filename,'Ü','A'),
filename = replace(filename,'ß','A'),
filename = trim(filename);


Notice that all umlauts need to be replaced with "A". A small number of files still cannot be found, but I'll rename those by hand once I find out which ones they are.

Thanks again to everyone for their support!

lf123123

Correcting myself:
filename = replace(filename,'ß','A')
should be
filename = replace(filename,'ß','AY')
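
To track down the few attachments that are still missing, the substitutions reported in this thread can be applied to a database filename to guess the start of the name on disk. This is only a sketch: legacy_clean_name and find_on_disk are hypothetical helpers, and they encode nothing beyond what was observed here (umlauts become "A", "ß" becomes "AY", spaces and dots become "_", and the attachment ID is prepended). It is not SMF's actual routine.

```shell
#!/bin/sh
# legacy_clean_name is a hypothetical helper that applies only the
# substitutions observed in this thread to a filename from the database.
legacy_clean_name() {
  printf '%s\n' "$1" | sed \
    -e 's/ß/AY/g' \
    -e 's/ä/A/g' -e 's/ö/A/g' -e 's/ü/A/g' \
    -e 's/Ä/A/g' -e 's/Ö/A/g' -e 's/Ü/A/g' \
    -e 's/[ .]/_/g'
}

# find_on_disk DIR ID DB_FILENAME: print every file in DIR whose name starts
# with "<ID>_<cleaned name>"; print nothing if there is no such file.
find_on_disk() {
  prefix="$2_$(legacy_clean_name "$3")"
  for f in "$1/$prefix"*; do
    [ -e "$f" ] && printf '%s\n' "$f"
  done
  return 0
}

legacy_clean_name 'Piperinsäure 1.jpg'   # prints PiperinsAure_1_jpg
```

Calling find_on_disk with the attachments directory, an attachment ID and the filename from the database prints the matching file if there is one. An empty result points to an entry that still needs to be checked by hand.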
