RSS Feeder

Started by SlammedDime, January 11, 2009, 06:06:42 AM

RaidersRNA

Open Subs-Rss.php in the Sources folder and find the following part (it's quite close to the top):


$replace = array( "",
"",
"\n[br][img]\\1[/img][br][br]\n",
"\n",
"\\1",
"\"",
"and",
"(",
")",
" ",


You will see the & symbol in there instead of the word "and". Simply replace the & with the word "and", as I have, and it will work perfectly. It was driving me crazy for a while before I found this part of the file and fixed it. It's a bit of a bug in this mod: the & symbol is part of XML (RSS) syntax, so the mod doesn't see it as text and cuts off everything after an ampersand when it tries to import an RSS feed. Luckily the & symbol can easily be changed to the word "and" and still have the same meaning.

Oh, and there is a second part to the fix, which I just remembered. You need to use your forum's censored words filter to clean up the fixed "and"s. Go into your censored words list and add the following to it:

andamp;
&
&
&

Set all of them to be replaced with just the word "and". Now any time your forum encounters an ampersand, it will be converted to the word "and". With the fix to the Subs-Rss.php file, ampersands will normally be output as andamp;, so you use the censor filter to clean that up so it just comes out as "and". It's a bit of a band-aid workaround, but it works flawlessly for me and I haven't had the problem since I did it.

imrich

Thanks for this idea.

However, it does seem strange to me to:

1) convert '&' character to the string 'andamp' in the incoming RSS
2) then reconvert 'andamp' to the '&' again using the censored word filter

Why not just delete the & from the search string to begin with, so that it is not converted at all? Do you think this would be cleaner?

To see if this will work, perhaps just comment out (add a // to the beginning of) this line in the $search array:

"'&(amp|#38|#038|#x26);'i", // added hexadecimal values

and the same to this line in the $replace array:

"&",


Then this will leave the original & alone and not convert it to &
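
For reference, here is a minimal sketch of what that search/replace pair does when it is run through preg_replace(). Only the pattern comes from the file; the sample string is made up, and the plain "&" replacement is an assumption about what the matching $replace entry contains.

```php
<?php
// The pattern quoted above; the apostrophes are the regex delimiters.
$search = "'&(amp|#38|#038|#x26);'i";

// Assumption: the matching $replace entry is a plain ampersand.
$replace = '&';

// Made-up sample text containing each spelling of the ampersand entity.
$sample = 'Tom &amp; Jerry &#38; Co &#038; Sons &#x26; friends';

// Every entity spelling is collapsed to a single "&" character.
$result = preg_replace($search, $replace, $sample);
echo $result, "\n";
```

Commenting out both lines, as suggested above, would skip this conversion and leave the entities exactly as they arrive in the feed.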

I'll try to get some time to test this and let you know if I find anything.

Or maybe someone smarter than me can come up with a better fix? ;)


RaidersRNA

You're welcome, hope it helps.

1) When the RSS feed is read by the mod, it gets stuck when it finds an &amp;, which is the HTML/XML code for an ampersand. The mod doesn't read the XML code as & but instead gets stuck on just the & bit and cuts off everything after it, i.e. the amp; part is not the problem. Converting the & to the word 'and' allows the mod to continue reading the RSS feed without getting stuck, because the & no longer exists after the conversion.

2) You don't want to convert the andamp; back to an & with the censorship filter, instead you want to convert andamp; to just and. The idea is to completely remove all ampersands and replace them with the word and instead.

I tried removing the & from the search, but it was still breaking for me, because the & remains in the RSS feed unless it gets converted. & is a special character in XML (RSS), so when the reader sees one I don't think it knows what to do with it and simply cuts off everything after it.
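
To illustrate why a bare & stops an XML reader, here is a small sketch. It uses PHP's SimpleXML instead of the SimplePie code inside the mod (an assumption made to keep the example short), and the helper name escape_bare_ampersands() is hypothetical. Rather than turning & into the word "and", it escapes only the ampersands that do not already start an entity.

```php
<?php
// Hypothetical helper: escape every "&" that does not already start an entity
// such as &amp;, &#38; or &#x26;.
function escape_bare_ampersands($xml)
{
    return preg_replace(
        '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $xml
    );
}

// Collect parser errors quietly instead of printing warnings.
libxml_use_internal_errors(true);

// Made-up feed fragment with a bare ampersand in the title.
$raw = '<item><title>Fish & chips</title></item>';

// The raw fragment is not well-formed XML, so parsing fails.
$broken = simplexml_load_string($raw);
libxml_clear_errors();

// After escaping, the same fragment parses and the title keeps its ampersand.
$fixed = simplexml_load_string(escape_bare_ampersands($raw));
$title = (string) $fixed->title;

var_dump($broken === false);
echo $title, "\n";
```

Whether the mod's own parser fails at exactly the same point is not something this sketch proves; it only shows the general XML rule.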

I use a lot of RSS feeds from Twitter in my forum, so I was getting this problem all the time, since people use ampersands in tweets a lot (& uses two fewer characters than the word "and", which is useful when you only have 140 characters available in a tweet).

As a side note, I've also discovered that the reader has the same problem with the degrees ° symbol, which is something I haven't been able to fix yet. I think it comes back to our old friend the ampersand, since the HTML code for the ° symbol is &deg;, which contains the dreaded &.
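
A possible angle on the degree symbol: XML itself only predefines five named entities (amp, lt, gt, quot, apos), so a named HTML entity such as &deg; is unknown to a strict XML parser. The sketch below replaces such entities with their UTF-8 characters before parsing. The function name html_entities_to_chars() is hypothetical, and this is an untested idea for the mod, not a confirmed fix.

```php
<?php
// Hypothetical helper: turn named HTML entities into UTF-8 characters,
// but leave the five entities that XML itself defines untouched.
function html_entities_to_chars($xml)
{
    $xmlEntities = array('amp', 'lt', 'gt', 'quot', 'apos');

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function ($match) use ($xmlEntities) {
            if (in_array($match[1], $xmlEntities, true)) {
                return $match[0];
            }
            // Unknown names come back unchanged from html_entity_decode().
            return html_entity_decode($match[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $xml
    );
}

// Made-up feed fragment using both an HTML-only and an XML entity.
$raw = '<title>25&deg;C &amp; sunny</title>';
$clean = html_entities_to_chars($raw);

echo $clean, "\n";
```

The &amp; is deliberately left alone here, because decoding it would put a bare & back into the feed.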

Kindred

I have not tried it myself, but what about forcing an escaped character, like \\&?
Glory to Ukraine

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

imrich

Kindred,

I didn't even think of that, it sounds even simpler.

In the $replace array change:


"&",


to this:


"\\&",


I'll give it a try soon!

sonficyus

I set up this modification and added 10 RSS feeds, but I am still waiting for anything to be posted...

Why didn't it work?

Is there a link that triggers posting of the RSS feeds? Then I could use that link to post the feeds whenever I want...

Thanks in advance...

My site: http://cevherhazirlama.com/forum/

imrich

Quote from: imrich on May 22, 2013, 10:22:07 AM

Actually, this almost worked, but not quite!

I did this:

"\\&",


and ended up with "\&" in the output, LOL, which seemed to break things again.
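
A short sketch of why the backslash shows up, assuming the $replace entries are passed to preg_replace() as replacement strings (the sample text and the '/&amp;/' pattern are made up for illustration). In a double-quoted PHP string "\\&" is only two characters, and preg_replace() copies a backslash through unchanged unless it is followed by a digit or another backslash.

```php
<?php
// In a double-quoted PHP string, "\\&" is a backslash followed by an ampersand.
$replacement = "\\&";
$length = strlen($replacement);

// The backslash is not part of a backreference here, so it lands in the output.
$result = preg_replace('/&amp;/', $replacement, 'Fish &amp; chips');

echo $length, "\n";
echo $result, "\n";
```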

So I just did this and will now test it:


"&",


So the output should hopefully just display the ampersand.

sonficyus

Quote from: sonficyus on June 20, 2013, 04:18:11 AM

I want to be able to post these whenever I want...

Alpay

Very Very good mod.. ^^

Thank you very much.

luuuciano

My books are burned... help!

I am testing two feeds...
The first one is the one I want to really use:
http://pipes.yahoo.com/pipes/pipe.run?_id=33af87832810ef8e84c2e2b0556dbc29&_render=rss
As it shows a weather forecast in one sentence...
BUT... RSS Feeder does not import anything, and I have no idea what is wrong. The feed looks well formed, etc...

And here is the second feed, used just for testing...
http://pipes.yahoo.com/pipes/pipe.run?_id=a11a9e4d9b94a92e3ece850c67055f3f&_render=rss
This worked OK... it imported 5 items...

Any ideas? What could be wrong?
BTW, I only started using it today... will it import the new forecast tomorrow? Does RSS Feeder not work with this kind of feed?



On the other hand...
Has anyone tried to update the SimplePie class used in this mod?
It looks like the latest version of SimplePie is modular (a lot of separate files)?

I was thinking about doing it to get rid of all the deprecated errors, because I tried using
error_reporting = E_ALL & ~E_NOTICE | ~E_DEPRECATED | ~E_USER_DEPRECATED
in php.ini... and it continues to show a TON of deprecated errors in the SMF log...
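
One thing worth checking is the bitmask itself. Read as a PHP expression, the line above ORs in ~E_DEPRECATED and ~E_USER_DEPRECATED, which switches every bit back on instead of masking those two levels out. The sketch below compares it with the usual "& ~FLAG" form; whether SMF also overrides the php.ini value at runtime is a separate question that this does not answer.

```php
<?php
// The value from the php.ini line above, evaluated as a PHP expression.
$attempted = E_ALL & ~E_NOTICE | ~E_DEPRECATED | ~E_USER_DEPRECATED;

// The usual form: each level to silence is masked out with "& ~LEVEL".
$intended = E_ALL & ~E_NOTICE & ~E_DEPRECATED & ~E_USER_DEPRECATED;

// Helper for this example only: is a given level still reported by a mask?
function reports_level($mask, $level)
{
    return ($mask & $level) !== 0;
}

var_dump(reports_level($attempted, E_DEPRECATED)); // still reported
var_dump(reports_level($intended, E_DEPRECATED));  // silenced
var_dump(reports_level($intended, E_WARNING));     // other levels unaffected
```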
I don't like forums that won't let you delete YOUR OWN account, such as smfsimple.com.
They even send you unsolicited emails, which you may not be able to escape because they give you NO way to disable them (unless they do NOT have you on their blacklist).

luuuciano

Quote from: luuuciano on October 18, 2013, 01:37:51 PM

Well... I found the single-file version here: http://simplepie.org/downloads/simplepie_1.3.1.compiled.php
I used it in the Subs-Rss.php file, replacing the call to

// Custom class for sorting...
class RSS_Feeder extends SimplePie
{
	// Compare two feed items by their Unix timestamps.
	public static function sort_items($a, $b)
	{
		return $a->get_date('U') >= $b->get_date('U');
	}
}


There are no more "deprecated" errors, and it seems to work OK...
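
As an aside on the sorting override: it appears intended to put older items first. A comparison callback normally returns a negative, zero, or positive integer rather than true/false. Here is a minimal sketch of an oldest-first sort with usort(), using plain arrays as hypothetical stand-ins for feed items (the titles and timestamps are made up).

```php
<?php
// Hypothetical stand-ins for feed items: just a title and a Unix timestamp.
$items = array(
    array('title' => 'newest', 'date' => 1382100000),
    array('title' => 'oldest', 'date' => 1381900000),
    array('title' => 'middle', 'date' => 1382000000),
);

// Comparison callback returning -1, 0 or 1, as usort() expects.
function oldest_first($a, $b)
{
    if ($a['date'] == $b['date']) {
        return 0;
    }
    return ($a['date'] < $b['date']) ? -1 : 1;
}

usort($items, 'oldest_first');

foreach ($items as $item) {
    echo $item['title'], "\n";
}
```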

But it logs 3 errors:

http://domain.com/index.php?action=admin;area=scheduledtasks
2: preg_match(): Compilation failed: nothing to repeat at offset 497
File: /home/username/public_html/Sources/Subs-Rss.php
Line: 15700

http://domain.com/index.php?action=admin;area=scheduledtasks
2: preg_match(): Compilation failed: nothing to repeat at offset 497
File: /home/username/public_html/Sources/Subs-Rss.php
Line: 15634

http://domain.com/index.php?action=admin;area=scheduledtasks
2: preg_match(): Compilation failed: nothing to repeat at offset 551
File: /home/username/public_html/Sources/Subs-Rss.php
Line: 15546


Anyway... I can not import the weather feed... :(

Suki

I managed to do it, but it wasn't that simple. Since I was already using Composer, I added SimplePie as a library using Composer and its autoloading feature.

Once I had the library, I completely deleted the old class inside Subs-Rss.php and made some changes to the scheduled task:


$rss_data = new SimplePie();
$rss_data->set_feed_url($feed['url']);
$rss_data->set_cache_duration(60*60*2);
$rss_data->init();
$rss_data->handle_content_type();

// If we don't get a valid chunk of data back, disable the feed
if ($rss_data->error())
{
	$smcFunc['db_query']('', '
		UPDATE {db_prefix}rssfeeds
		SET enabled = 0
		WHERE id_feed = {int:feed}',
		array(
			'feed' => $id,
		)
	);

	// Log an error about the issue, just so the user can see why their feed was disabled...
	log_error($txt['rss_feeder'] . ': ' . $feed['url'] . ' (' . $rss_data->error() . ')');
	continue;
}

// Set the right order, oldest first...
$get_items = array_reverse($rss_data->get_items());


But I must say, I still get some encoding issues, mostly due to bad RSS feeds or some weird and strange symbols.
Disclaimer: unless otherwise stated, all my posts are personal and do not represent any views or opinions held by Simple Machines.

Chaoticone

Is anyone using this with SMF 2.0.6? I'm really hoping to avoid errors and other issues. My forum seems to be very stable and solid right now and I would hate to break it. I have been trying to use the RSS Feed Poster and I do have it working as well as it can, but if it can't get the latest news it's of no use to me.

Any advice would be much appreciated.

SMF 2.0.6
RSS Feed Poster
Ad Seller Pro
Adk Rules Posts
Like Posts
Welcome Topic Mod
Yandex.Speller for SMF
Tagging System
Youtube Integration Mod (0.1)
SMF Links
Event Registration Mod for SMF2

luuuciano

Does anyone know how many times RSS Feeder tries to fetch a feed before it disables it?

I have a source (in fact, I have just one!) that I need to check every week, because it gets disabled too often...

My idea is to try to increase that value, or something like that, and add a pause between tries (or does it try just ONE time?)

I have set up the RSS Feeder task to run every 8 hours, so just 3 times a day... it is really bad luck that the source so often does not respond
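
A rough sketch of the retry-with-pause idea, not taken from the mod: the scheduled task snippet posted earlier in this thread disables a feed as soon as a fetch reports an error, so something like the hypothetical fetch_with_retries() below would have to wrap that fetch. The function name, its parameters, and the fake "flaky" feed are all assumptions for illustration.

```php
<?php
// Hypothetical helper: call $fetch up to $maxAttempts times, pausing between
// failures. Returns the first non-false result, or false if every try failed.
function fetch_with_retries($fetch, $maxAttempts = 3, $pauseSeconds = 5)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = call_user_func($fetch);
        if ($result !== false) {
            return $result;
        }
        // Only pause if another attempt is still coming.
        if ($attempt < $maxAttempts) {
            sleep($pauseSeconds);
        }
    }
    return false;
}

// Stand-in for a flaky source: fails twice, then answers.
$calls = 0;
$flaky = function () use (&$calls) {
    $calls++;
    return ($calls < 3) ? false : '<rss>...</rss>';
};

// Pause of 0 seconds so the example runs instantly.
$data = fetch_with_retries($flaky, 3, 0);
echo ($data === false) ? "would disable feed\n" : "fetched after $calls attempts\n";
```

Only when every attempt fails would the feed be disabled, which should make a single bad response from the source much less costly.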

mj.

Well, kafooey, I like this mod, but when I try to use a google news search feed my links don't resolve at all.  My posting user is an admin and I have html enabled.  It's this feed: https://www.google.com/alerts/feeds/07378569770343342231/4632544986743109160

When I inspect the page I see this:
<a href="http://" class="bbc_link" target="_blank"><b>E-cig</b> uptake linked to official drop in quitters</a>

When I look at the raw XML from the feed I see this should be the link:

https://www.google.com/url?rct=j&sa=t&url=http://www.nursinginpractice.com/article/e-cig-uptake-linked-official-drop-quitters&ct=ga&cd=CAIyGjZhNzZjNTdjMzZjNzU5YWI6Y29tOmVuOlVT&usg=AFQjCNH76Y-c6Kcp3wENZr2GetfoBcyeZA
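
For what it's worth, the real article address sits in the url= parameter of that redirect link. The sketch below pulls it out with parse_url() and parse_str(). The helper name unwrap_redirect() is hypothetical, and this is only a possible workaround; it does not explain why the mod ends up with an empty href.

```php
<?php
// Hypothetical helper: return the "url" query parameter of a redirect link,
// or the link itself if there is no such parameter.
function unwrap_redirect($link)
{
    $query = parse_url($link, PHP_URL_QUERY);
    if (!is_string($query)) {
        return $link;
    }

    parse_str($query, $params);

    return (isset($params['url']) && $params['url'] !== '') ? $params['url'] : $link;
}

// The redirect link quoted above from the raw feed.
$link = 'https://www.google.com/url?rct=j&sa=t&url=http://www.nursinginpractice.com/article/e-cig-uptake-linked-official-drop-quitters&ct=ga&cd=CAIyGjZhNzZjNTdjMzZjNzU5YWI6Y29tOmVuOlVT&usg=AFQjCNH76Y-c6Kcp3wENZr2GetfoBcyeZA';

$target = unwrap_redirect($link);
echo $target, "\n";
```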

Any help please?

lomaalta

Ok, at the risk of being labeled lazy - can you tell me the purpose (not the how) of using regular expressions?

Do I need to look at the individual feeds to determine the regexes?

...  and btw - had installed another rss feed mod - but images weren't appearing, weird characters etc.  This looks SO much better in so many ways - THANKS!

Arantor

Regular expressions are used because there's no single definitive way to get the data you want: the page layout is not the same everywhere on the web. Regular expressions are a way of describing what information you want to match inside a page.

And you look at the page the feed points to, rather than the feed itself, since that's what the regular expression will be applied to.

420Connect.co.uk

I'm having trouble with pulling the parts of the article I'd like from the source at:
www.thedailychronic.net/2014/37930/marijuana-business-class-returns-to-new-york-city

I believe it's the <div class="post-column pull-left"> part I'm trying to pull.

I've tried using:

~<div class="post-column pull-left">(.*)<\/div>~siU
but it doesn't grab the text part.

Whilst ~<div class="post-column pull-left">(.*)<\/div>~s
grabs parts after the article I'm not looking to include :(
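
One possible explanation for both results, assuming the article div contains other divs nested inside it (the page below is a made-up stand-in, not the real site): with the U modifier the match is ungreedy and stops at the first closing div, while without it the match is greedy and runs to the last closing div on the page.

```php
<?php
// Made-up page: the article div has a nested div, and a sidebar follows it.
$page = '<div class="post-column pull-left"><div class="photo">IMG</div>'
      . '<p>Article text</p></div><div class="sidebar">Related links</div>';

// Ungreedy (U): stops at the FIRST </div>, before the article text.
preg_match('~<div class="post-column pull-left">(.*)<\/div>~siU', $page, $lazy);

// Greedy: runs on to the LAST </div>, swallowing the sidebar as well.
preg_match('~<div class="post-column pull-left">(.*)<\/div>~s', $page, $greedy);

echo $lazy[1], "\n";
echo $greedy[1], "\n";
```

Neither variant can count nested tags, which is why a regex that looks right can still cut the article short or overshoot it.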

Could someone help me get it to grab just the picture, title and article itself?

Much appreciated
Thank you
www.420Connect.co.uk ~ A Social Network For The #CannabisCommunity ~ Come say "High" ;)

Suki

The first regex should work just fine. If the page has a responsive design, make sure that "post-column pull-left" actually exists on the page that is being sent to the mod; sometimes the page you see in the browser isn't the same as the one the mod will grab.

Anyway, I updated this mod for 2.1 and updated the SimplePie class too. It might be worth replacing the regex feature with the DOMDocument class. Too bad the license of this mod doesn't allow redistribution.
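
A minimal sketch of what the DOMDocument approach could look like, using a made-up page and the class name from the question above. This is not the code from the 2.1 update, just an illustration of selecting an element by class with DOMXPath so that nested divs are handled by the parser.

```php
<?php
// Made-up page standing in for the fetched article.
$html = '<html><body>'
      . '<div class="post-column pull-left"><div class="photo">IMG</div><p>Article text.</p></div>'
      . '<div class="sidebar">Related links</div>'
      . '</body></html>';

// Real-world HTML is rarely valid, so collect parser warnings quietly.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select the div whose class list contains "post-column".
$xpath = new DOMXPath($doc);
$nodes = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " post-column ")]'
);

// Text of the whole article div, nested elements included, sidebar excluded.
$article = ($nodes->length > 0) ? trim($nodes->item(0)->textContent) : '';
echo $article, "\n";
```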

420Connect.co.uk

Quote from: Suki on February 08, 2015, 09:10:16 AM

:( unfortunately not.

I've checked the feed page of the site and found that "post-column pull-left" doesn't exist there, although I can't see what I would use from the feed page in the regex instead.

http://www.thedailychronic.net/feed/

examples of how it's grabbing:
http://www.420connect.info/forum/index.php?topic=441