
RSS Feeder

Started by SlammedDime, January 11, 2009, 06:06:42 AM


GazOutEast

OK thanks - have done it

The topper now looks like this -



2127: /* RSS Feeder Settings */
function ModifyRSSFeedSettings()
{
    global $context, $txt, $sourcedir, $scripturl, $smcFunc, $board;

    loadTemplate('RSSFeeds');
    $context['page_title'] = $txt['rss_feeder'];



Is that the location you meant?

Gaz
I have 20:20 vision - I can see anything bigger than 20" x 20"

SlammedDime

SlammedDime
Former Lead Customizer
BitBucket Projects
GeekStorage.com Hosting
                      My Mods
SimpleSEF
Ajax Quick Reply
Sitemap
more...
                     

GohighVoltage

No, Definitely yours.   It seems like it pulls some new feeds but not all of them.   Trying to figure it out.

Satfreak

I have SMF 2.0 RC2 and RSS Feeder 1.1.4 installed, and everything works: the feed posts arrive with the image and the whole text. But the posts also include live links from the feed, and even a Source: link from the feed. I run the same feed on another forum, which is vBulletin, and there it does not show the source (where the link came from) or live links, although I know that on that forum outside links cannot be set to go live.
I'd like to know whether there is a script or a setting, or whether I need to edit the PHP somewhere, to code all the links on the forum, meaning all incoming links would be scrambled.

SlammedDime

You'd have to edit ScheduledTasks.php, near the bottom of the function for the rss feeds.

Satfreak

Quote from: SlammedDime on January 11, 2010, 04:58:49 PM
You'd have to edit ScheduledTasks.php, near the bottom of the function for the rss feeds.

Thanks mate, but where is ScheduledTasks.php? I can't find it. And what do I need to edit inside it?

Satfreak

OK, I found ScheduledTasks.php via FTP, but which lines must I replace so that all the links that come in via the feed are coded? I can't find the lines for that. Please, I need help with this. Thanks.

SlammedDime

Nothing has been ignored, I just haven't had time to update the mod.  Thanks for the suggestions, I'll look into implementing them.

GazOutEast

SD - no pressures on this

When you do get some spare time, I'd love to see Spuds' fixes in the mod to make life easier for grabbing full feeds, although I'll admit openly that, on my install, without setting any regexes or anything, I get full articles around half the time ... always from the same feeds.

I've also noticed that, previewing the feeds at the source sites, those feeds that only appear in SMF as an excerpt, in fact only contain an excerpt within the feed file .... so could you (or anyone) please explain to this confused old dinosaur if all this regex stuff is supposed to drill past the excerpt in the feed and grab the full article from the site, or what it's all about.

Side note on the Source tag - SD, could I possibly request that at next upgrade, the source link display be switchable in admin?  Ideally, I'd like three switches -
1 - whether to display it at all, or not (as discussed above) instead of hacking the files.
2 - an admin field to change the link field-name displayed (e.g. instead of "Source", change it to "Supplied by" and so on) - saves hacking the language file and causing uninstall / upgrade headaches later on.
3 - an admin switch with a couple of position choices - e.g. right under the title link (see 2 above) as an alternative to bottom of post, or in front of date but on same line as date, and so on.

They're not key to the functioning of the mod, just prettifying and ... errr ... flexiblifying ( ;) ) so it's up to you, just a few suggestions for you.

Gaz

SlammedDime

The way the regex (full article) option works is that if the option is enabled for that feed, the mod navigates to the link supplied in the item from the feed, loads up the web page, then uses the regex to find what you're looking for.  The information spuds supplied will greatly help in reducing the need to use regexes and make it more 'dummy proof', so to speak.
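
As a rough illustration of that flow (not the mod's actual code), the steps could be sketched like this. The function name `fetch_full_article`, the `$fetch` callback, the sample pattern, and the fallback to the feed's own text are all assumptions made up for the example.

```php
<?php
// Hypothetical sketch of the "full article" option; not the mod's real code.
// $fetch is any callable that returns the HTML of a URL (for example a thin
// wrapper around file_get_contents), passed in so the logic can be tried offline.
function fetch_full_article(string $itemLink, string $regex, callable $fetch, string $feedBody): string
{
    // 1. Go to the link supplied in the feed item and load the web page.
    $html = $fetch($itemLink);
    if (!is_string($html) || $html === '') {
        return $feedBody; // assumption: keep the feed's own text if the page is unavailable
    }

    // 2. Use the per-feed regex to find the wanted part of the page.
    if (preg_match($regex, $html, $matches) === 1 && isset($matches[1])) {
        return trim($matches[1]);
    }

    // 3. No match: fall back to whatever the feed itself contained.
    return $feedBody;
}

// Example with a canned page instead of a real HTTP request.
$page = '<html><body><div id="story"><p>Full text here.</p></div></body></html>';
$fake = function (string $url) use ($page) {
    return $page;
};
echo fetch_full_article('http://example.com/item', '~<div id="story">(.*)</div>~siU', $fake, 'Excerpt...');
// prints: <p>Full text here.</p>
```

If the regex does not match the page, this sketch simply returns the excerpt, which would look the same as a feed that was never set up for full articles.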

Some feeds do contain the full article (not normal), so in that case, you'd get the whole article in SMF because that is what the feed itself contains.

As for the 'Source' bit... I'll be introducing a 'template' so to speak that can be customized for each feed on how it is displayed, which allows the admin full control over what is displayed for each feed, and how it is displayed.

Satfreak

Everything is now OK for feeds that don't have a source with external links, such as feeds with only images. But if I use a feed for movies, for example, which comes with images and links, I get the feed with all the info and images, but the links that come with the feed are not coded; they come through live. It also shows me a live Source link to the board the feed comes from.
I want the movie links to come through coded, and I don't want it to show which source the feed comes from. Check my pictures.

Thanks.

SlammedDime

What you're asking is quite custom and I do not have the time to modify the code for your needs.  Sorry.

GazOutEast

Quote from: SlammedDime on January 14, 2010, 03:34:31 AM
The way the regex (full article) option works is that if the option is enabled for that feed, the mod navigates to the link supplied in the item from the feed, loads up the web page, then uses the regex to find what you're looking for.  The information spuds supplied will greatly help in reducing the need to use regexes and make it more 'dummy proof', so to speak.

Some feeds do contain the full article (not normal), so in that case, you'd get the whole article in SMF because that is what the feed itself contains.

As for the 'Source' bit... I'll be introducing a 'template' so to speak that can be customized for each feed on how it is displayed, which allows the admin full control over what is displayed for each feed, and how it is displayed.

Ah-ha - thanks for explaining that - so does that mean ... going back to your tutorial on page 2 of this thread ... that when you talk about getting the <div id="blahblah">, it's the div on the source page from the link, rather than the div id in the actual feed source-code?

Here's just a thought by the way - I think it's Amazon and the Kindle service that has the terms of supply, that if you submit your blog feed (or other feed) for syndication through them, then the feed must only supply full articles, not truncated in any way, and must be advert-free.  If I'm right on who it is, then that could be a good hunting ground for full article feed sources for those who can't figure out the regex's etc.

Gaz

SAFAD

Hey Guys
I Can't Grab Full Article From FIFA
Best Regards
Sadaoui "SAFAD" Abderrahim - Lead Developer @ Electron Inc.

GazOutEast

Quote from: Spuds on January 14, 2010, 11:48:53 AM
of course there are times when you have to use both, like when a web page has horribly broken HTML code.

Such as when the site splits the story and puts, say, an advert div in the middle - an example being Forbes.com?

Quote from: Spuds on January 14, 2010, 11:48:53 AM
Now once you have targeted and extracted the text of interest from the full feed there is a good chance that any links in are relative, ie intended to work from the site where you got the feed and not from another site.  So you need to try and fix links, oh and then there is all those character set encoding issues to make things fun as well ....

Yeah, I can understand the issue with relative feeds - took me a while to get to grips with those at the beginning - osCommerce was a good training ground for them though - but the language encoding could be cured with UTF-8 couldn't it?  Apart from that, unless deliberately hunting feeds in multiple languages, surely most web masters would be either pulling feeds in English only, or maybe in their own language plus in English?  I can't imagine anyone trying to pull the same feed in all available languages, even though I could see a few reasons someone might want to try it.
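
For the relative-link part of the quote above, a minimal sketch of the kind of fix-up involved might look like this. The helper name `absolutize_links` is invented for the example, and it is deliberately simplified: it ignores ports, `../` segments and unquoted or single-quoted attributes, and does nothing about character encoding.

```php
<?php
// Hypothetical helper: turn relative href/src values into absolute URLs.
// $pageUrl is the address the article was fetched from.
function absolutize_links(string $html, string $pageUrl): string
{
    $parts  = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    // Directory of the page, used for links such as "pics/a.jpg".
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return preg_replace_callback(
        '~\b(href|src)="([^"]+)"~i',
        function (array $m) use ($parts, $origin, $dir): string {
            $url = $m[2];
            if (preg_match('~^([a-z][a-z0-9+.-]*:|#)~i', $url)) {
                return $m[0];                          // already absolute, or an in-page anchor
            }
            if (strpos($url, '//') === 0) {
                $url = $parts['scheme'] . ':' . $url;  // protocol-relative link
            } elseif (strpos($url, '/') === 0) {
                $url = $origin . $url;                 // relative to the site root
            } else {
                $url = $origin . $dir . $url;          // relative to the page's directory
            }
            return $m[1] . '="' . $url . '"';
        },
        $html
    );
}

$snippet = '<a href="/news/2.html">next</a> <img src="pics/a.jpg"> <a href="http://other.example/">x</a>';
echo absolutize_links($snippet, 'http://example.com/news/1.html');
```

With the sample input, the first two links gain the `http://example.com` prefix (the image resolving under `/news/`), while the link that was already absolute is left alone.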

Does the solution you're suggesting mean that SD's regex tutorial would become redundant, and we'd all need to learn a new system, or would yours do it all for us?

(This is getting interesting - I like this discussion and where it could take us with the RSS Feeder mod)

Gaz

GazOutEast

Quote from: SAFAD on January 15, 2010, 07:20:35 AM
Hey Guys
I Can't Grab Full Article From FIFA

Beware - FIFA aggressively protects all its copyrighted material - they WILL get your site shut down if you publish anything direct from their feeds other than official press releases marked for recirculation.  (They got one of my eBay IDs shut down for selling photos of UK Premier League stadiums that had been photographed by me - they claimed the buildings / structures were copyright property of their member clubs and only the member clubs and FIFA had the right to make money from photographs of them).

Just a friendly warning

Gaz

edit to fix typo

SAFAD

Thx Gaz
But As They Say At Their Website I May Copy The Content Only If I Link Back To Em

GazOutEast

Quote from: Spuds on January 15, 2010, 03:07:53 PM
Quote: Does the solution you're suggesting mean that SD's regex tutorial would become redundant, and we'd all need to learn a new system, or would yours do it all for us?
The syntax is pretty much the same as is the concept, just that instead of passing the regex string to a preg_match it passes it to a dom parser.  In the example it was grabbing the text between the <div id="articlestory"> and </div> tags.
The regex for that was ~<div id="articlestory">(.*)<\/div>~siU

For the dom parser it would be div[id="articlestory"]    which is simpler for most folks to understand.  To take it one step further if you put in div[id="articlestory"] p  ... it would return all of the paragraphs within that div and nothing else, or div[id="articlestory"] h3 would return all of the h3 tags from within that div ....

Let's also suppose that there was another div /div combo inside of the <div id="articlestory">. The dom parser would still return the entire <div id="articlestory"> section, whereas the basic regex above would return from the start of the <div id="articlestory"> to the first </div> it found, which is not the closing div of the story.  I hope that makes sense ....
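
The nested-div difference described in the quote can be tried out with a small script. This is only a sketch: the sample HTML is invented, and it uses PHP's bundled DOM extension with XPath queries as a stand-in for whichever DOM parser Spuds has in mind, so the selector syntax differs from the div[id="articlestory"] form in the quote. It assumes the DOM extension is enabled.

```php
<?php
// Sample page: the story div contains a nested div (say, an advert).
$html = '<html><body><div id="articlestory"><p>Part one.</p>'
      . '<div class="ad">Advert</div><p>Part two.</p></div></body></html>';

// The ungreedy regex from the post stops at the first </div> it meets.
preg_match('~<div id="articlestory">(.*)<\/div>~siU', $html, $m);
$regexResult = $m[1];

// DOM route: an XPath query playing the role of div[id="articlestory"].
// It returns the whole element, nested div included.
libxml_use_internal_errors(true);   // tolerate sloppy real-world HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$node = $xpath->query('//div[@id="articlestory"]')->item(0);
$domResult = $doc->saveHTML($node);

// Playing the role of div[id="articlestory"] p: only the paragraphs in the div.
$paragraphs = [];
foreach ($xpath->query('//div[@id="articlestory"]//p') as $p) {
    $paragraphs[] = $p->textContent;
}

echo $regexResult, "\n";   // <p>Part one.</p><div class="ad">Advert
echo $domResult, "\n";     // the complete articlestory div, both paragraphs included
echo implode(' | ', $paragraphs), "\n";   // Part one. | Part two.
```

The regex result is cut off inside the advert and never reaches "Part two.", while the DOM query returns the complete story div.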

That sounds excellent and will remove a major headache for most people, as well as hopefully reducing heavily the requests for regex writing support in this thread.

I've come across a scenario though which I think may be the sources trying to prevent full article retrieval.  I've seen this on several multi-site domains now, but as an example let's use eBay.  On their sites, each site has an announcement board (use top right menu on home page - community - announcements) which details the localised news, promotions etc for that country.  On the announcement board they don't use divs, they use a plain old html table nesting system with ul and ol tags mixed up with paragraphs, td's and other stuff.  It's a mess and I can't for the life of me see how it'd be possible to extract the full announcement with either regex or dom parsing as there's no unique formatting id for any tr or td anywhere.  I'll play around with a few options to try it, but don't think it's going to work.

Re the image cache or hotlink - the WordPress plugin WP-o-matic has that - the plugin is old and appears abandoned by the author (it's GPL) and it might give you some clues - the caching works well, as does the feed collecting, but it has no keyword filtering, nor system to get full article as per this topic.

Gaz

GazOutEast

SD

Sorry if this is a painful question and asked before ...

I remember that in vbg's mod, the admin could set the collection frequency for each feed individually - so say that feed A was a high priority "must know about asap" feed, it could be set to 10 - 15 minute polling, but feed B only contained something once a week, then it could be set to a number of days between polling.  This helped reduce the load on the server (good for trigger-happy hosts).

Any chance something like that could be introduced?

Or maybe a way to batch-group the feeds, with each batch having different polling frequencies?  Even having say three groups would be useful (but would likely cause a lot of requests for more - LOL).
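
To make the request concrete, per-feed polling could be sketched along these lines. It is purely illustrative: the function name `feeds_due` and the array keys `name`, `interval` and `last_polled` are invented here and are not taken from either mod.

```php
<?php
// Hypothetical sketch: each feed carries its own polling interval in minutes.
// The array keys (name, interval, last_polled) are invented for this example.
function feeds_due(array $feeds, int $now): array
{
    $due = [];
    foreach ($feeds as $feed) {
        // A feed is due once its interval has elapsed since the last poll.
        if ($now - $feed['last_polled'] >= $feed['interval'] * 60) {
            $due[] = $feed['name'];
        }
    }
    return $due;
}

$now = 1000000; // fixed timestamp so the example is repeatable
$feeds = [
    // High-priority feed: poll every 15 minutes, last polled 20 minutes ago.
    ['name' => 'Feed A', 'interval' => 15,       'last_polled' => $now - 20 * 60],
    // Weekly feed: poll every 7 days, last polled 2 days ago.
    ['name' => 'Feed B', 'interval' => 7 * 1440, 'last_polled' => $now - 2 * 86400],
];
print_r(feeds_due($feeds, $now)); // only Feed A is due
```

The batch-group idea would work the same way, with the interval stored per group instead of per feed.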

Gaz

SlammedDime

It's not possible at this point, but the feature is definitely doable.
