Google preferential indexing to posts with English titles

Started by spiros, December 20, 2004, 05:46:31 PM

spiros

I have checked this thoroughly on the site. What happens is that posts with an English subject are indexed by Google, whereas posts with a Greek subject are not!

For example check the English posts in this board:
http://www.translatum.gr/forum/index.php/board,19.0.html

All of them are indexed, even the latest post, "Useful Newsletter". None of the Greek ones are, though, despite the fact that they are all older! Any ideas as to why this happens?

[Unknown]

Interesting.  What happens if you use a different character set?

-[Unknown]

spiros

Currently I am using Windows-1253 which is pretty standard really. I could try using 8859-7 and let you know if there is any change but I would never risk trying UTF-8 again!
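
For anyone following along, here is a minimal Python sketch of how the three encodings mentioned here differ at the byte level. The sample word is only an illustration and is not taken from the forum; `cp1253` and `iso8859_7` are Python's codec names for Windows-1253 and ISO-8859-7.

```python
# Compare how the same Greek text is encoded in the three charsets discussed.
text = "Καλημέρα"  # sample word (assumption, not from the forum)

cp1253 = text.encode("cp1253")        # Windows-1253: one byte per letter
iso88597 = text.encode("iso8859_7")   # ISO-8859-7: one byte per letter
utf8 = text.encode("utf-8")           # UTF-8: two bytes per Greek letter

# For this word the two single-byte encodings produce identical bytes.
same_single_byte = cp1253 == iso88597

# They are not identical everywhere: capital alpha with tonos is one
# of the characters that sits at a different byte position.
accented_differs = "Ά".encode("cp1253") != "Ά".encode("iso8859_7")

print(len(cp1253), len(iso88597), len(utf8))
print(same_single_byte, accented_differs)
```

Since the two Greek single-byte encodings agree on most letters, switching between them would not be expected to change much for a crawler; UTF-8 is the one that produces a genuinely different byte stream.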

[Unknown]

Quote from: spiros on December 20, 2004, 05:57:29 PM
Currently I am using Windows-1253 which is pretty standard really. I could try using 8859-7 and let you know if there is any change but I would never risk trying UTF-8 again!

Heh.  I'm sorry... does Google have any support ^_^?  If SMF's doing something wrong, I'd definitely like to know and fix it...

-[Unknown]

spiros

I have written to Google through AdSense. I will let you know as soon as I get a reply from them!

spiros

Here is the reply... I am afraid it is a pretty generic one. However, checking with other Greek sites running SMF I see that some of them get indexed for their Greek posts, some not!

Hi Spiros,

Thank you for your note. We noticed that your message board uses .php
pages. Please be aware that the Google index does include dynamically
generated webpages, including .asp pages, .php pages, and pages with
question marks in their URLs. However, these pages comprise a very small
portion of our index. Dynamically generated pages can cause problems for
our crawler and therefore may be ignored. If you suspect that your
dynamically generated pages are being ignored, you may want to consider
creating static copies of these pages for our crawler. If you do this,
please be sure to include a robots.txt file that disallows the dynamic
pages in order to ensure that these pages are not seen as having duplicate
content.

We hope the information we have provided above is helpful to you. Due to
the tremendous volume of information and help requests we receive, we are
not always able to provide personal attention to questions pertaining to
individual websites. For additional information, please visit
http://www.google.com/webmasters/. Also, you may want to comb
http://groups.google.com/groups?q=google.public.support.general for
suggestions from our users and webmasters or to post a question of your
own.

Regards,
The Google Team
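
The robots.txt suggestion in the reply above can be sketched roughly as follows. This is only an illustration using Python's standard `urllib.robotparser`; the `/forum/static/` path for the static copies is a hypothetical location, and the rule shown is an assumption about what such a file might contain, not something Google or SMF supplied.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the dynamic index.php pages so that only
# the static copies (assumed to live under /forum/static/) get crawled.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/index.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

dynamic_url = "http://www.translatum.gr/forum/index.php/topic,110.0.html"
static_url = "http://www.translatum.gr/forum/static/topic-110.html"  # hypothetical

dynamic_allowed = parser.can_fetch("Googlebot", dynamic_url)
static_allowed = parser.can_fetch("Googlebot", static_url)

print("dynamic page crawlable:", dynamic_allowed)
print("static copy crawlable:", static_allowed)
```

Checking the rules this way before uploading the file helps avoid accidentally blocking the static copies as well.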

spiros

The fog gets even thicker!

1) Posts with English titles and English/Greek text are indexed but the GREEK TEXT in them is not indexed!
http://www.translatum.gr/forum/index.php/topic,110.0.html

2) I found ONE Greek post which was indexed; the only thing that sets it apart is that it is the only post in its board
http://www.translatum.gr/forum/index.php/topic,117.0.html

3) Changing encoding from Windows-1253 to ISO-8859-7 did not have any effect.

4) When viewing the page source, the Greek text appears perfectly normal.
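
A rough way to double-check points 3 and 4 is to compare the charset a page declares with the bytes it actually contains. The sketch below is an assumption-laden illustration: the helper names and the sample HTML are made up, and the regular expression is a simplification, not a real HTML parser.

```python
import re

# Simplified pattern for a charset declaration in a meta tag (assumption:
# good enough for a quick check, not a full HTML parser).
META_CHARSET = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)

def declared_charset(html_bytes):
    """Return the charset named in the page, lowercased, or None."""
    match = META_CHARSET.search(html_bytes)
    return match.group(1).decode("ascii").lower() if match else None

def decodes_cleanly(html_bytes, charset):
    """True if the bytes decode without errors in the given charset."""
    try:
        html_bytes.decode(charset)
        return True
    except (UnicodeDecodeError, LookupError):
        return False

# Made-up sample page, encoded the way the forum is currently served.
sample = (
    '<meta http-equiv="Content-Type" content="text/html; charset=windows-1253">'
    "<title>Καλημέρα</title>"
).encode("cp1253")

charset = declared_charset(sample)
print(charset, decodes_cleanly(sample, charset), decodes_cleanly(sample, "utf-8"))
```

Keep in mind that a single-byte charset will decode almost any byte sequence without errors, so a clean decode only shows that the declaration is plausible, not that it is right.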

spiros

Now, I ran some checks with other search engines, and MSN search (http://beta.search.msn.com) seems to index all pages and Greek content without any problems. Yahoo does not even index the single Greek post indexed by Google!

spiros

More Google correspondence. This time they recommend changing to UTF-8 (something I've done in the past which did not go very smoothly).

Quote

Hi!

I have tried using UTF-8 in the past but this created all sorts of problems so I had to revert back to standard Greek encodings. Still, I do not understand why Google cannot cope with Windows-1253 in this case, when it copes without any problems in other dynamic pages. Moreover, how can MSN search (http://beta.search.msn.com) index it without problems whereas Google can't?

Thank you again for your kind feedback.

Spiros


> Hi Spiros,
>
> Thank you for your reply. We apologize for any misunderstanding. We
> noticed that your site doesn't currently use UTF-8 encoding. You might try
> adjusting your encoding to UTF-8 to see if this resolves the problem.
>
> Please note that any changes you make will not be immediate, but should
> take effect during Google's next crawl. Google's robots crawl the web on a
> regular cycle every four to six weeks, indexing more than eight billion
> webpages. Because of the vast amount of data refreshed with each crawl,
> the update process is completely automated; as a result, we're unable to
> make manual changes for individual sites. New sites, changes to existing
> sites, and dead links will all be noted in the course of the next crawl,
> which will be completed soon.
>
> For further assistance, answers to some of Google's most frequently asked
> questions can be found at http://www.google.com/help/index.html
>
> Regards,
> The Google Team
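
For reference, the byte-level part of the UTF-8 switch Google suggests is small; the hard part is everything around it (database contents, page headers, templates). A minimal sketch in Python, assuming the stored text really is Windows-1253; the function name and the sample phrase are made up for illustration.

```python
def to_utf8(raw, source_charset="cp1253"):
    """Decode bytes from the legacy charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")

# Sample phrase (assumption), encoded as the forum currently stores text.
legacy = "Χρήσιμο ενημερωτικό δελτίο".encode("cp1253")
converted = to_utf8(legacy)

print(len(legacy), len(converted))
print(converted.decode("utf-8"))
```

If any stored text is not actually in the assumed source charset, a conversion like this will either fail or silently produce wrong characters, which may be the kind of trouble seen in the earlier UTF-8 attempt.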
