RSS Feeder

Started by SlammedDime, January 11, 2009, 06:06:42 AM

Suki

Well, it appears that particular feed doesn't add any HTML so you don't actually need to do any regex work

How exactly do you want to show that info?  The example on your site is full of "You are not allowed to view links. Register or Login".
Disclaimer: unless otherwise stated, all my posts are personal and does not represent any views or opinions held by Simple Machines.

Making tough decisions, so you don't have to.

420Connect.co.uk

Sorry, I forget what it looks like to guests as I'm usually logged in.

a: smftest
pw: smftest

if you'd like to log in.

& I'm looking for it to grab the title, picture and article text.

something like..
[picture]

[title]

[article]

no related articles / comments etc.
www.420Connect.co.uk ~ A Social Network For The #CannabisCommunity ~ Come say "High" ;)

Suki

Looks like the full article contains comments and links to other articles. You might need to change the code a little bit, but that means you won't be able to set multiple feeds.

420Connect.co.uk

Multiple feeds as in from multiple sources?
I'm fine with that if I can get this one functioning as I'd like.

& Do you mean change the code of the Feeder mod?


I'd be very grateful if you can tell me which edits to make!

Suki

Yes, multiple feeds from multiple sites.

The first thing to do is set up a test environment to try the changes, so create a test.php file in the root of your forum, where SSI.php is, with the following:


<?php

// Load SMF so that $sourcedir, $cachedir and $context are available.
include 'SSI.php';

// Pull in the file that provides the RSS_Feeder class.
require $sourcedir . '/Subs-Rss.php';

$rss_data = new RSS_Feeder();
$rss_data->enable_cache(true);
$rss_data->enable_order_by_date(true);
$rss_data->set_cache_location($cachedir);
$rss_data->set_cache_duration(60*60*2); // 2 hours
$rss_data->set_output_encoding($context['character_set']);
$rss_data->strip_htmltags(false); // Gonna do my own stripping ;-)
$rss_data->set_useragent('SMF RSSFeeder v1.1 (Feed Parser; http://simplepie.org; Allow like Gecko)');
$rss_data->set_autodiscovery_level(SIMPLEPIE_LOCATOR_NONE);

foreach ($rss_data->get_items() as $item)
{
	print_r($item->get_title()); echo '<br>';
	print_r($rss_data->get_image_url()); echo '<br>';
	print_r($item->get_description()); echo '<br>';
	die; // Stop after the first item.
}



Now use your browser to go to yourforum.com/forum/test.php, replacing yourforum.com/forum with the URL where your forum resides. In theory you should see one title, image link and description.

420Connect.co.uk

Hmm, I followed your instructions but my test.php page appears as a blank white page
I'm not sure if adding that file should have done anything to the RSS posts or to that file, but they still seem formatted as they were before..  :o


Suki

Oh, I forgot to set the feed url

After this part:

$rss_data->set_autodiscovery_level(SIMPLEPIE_LOCATOR_NONE);

Add this:


$rss_data->set_feed_url('http://www.thedailychronic.net/feed/');
$rss_data->init();


Your existing posts won't be altered in any way; this is just to check what the feed actually returns. Any change that you make to the RSS feed will only apply to newly created posts.

420Connect.co.uk

Sorry for being a pain  :-[

I've added in the extra bit and tested the test.php page,
that page now shows:

Quote from: http://420connect.info/forum/test.php
Marijuana Business Class Returns to New York City

NEW YORK, NY — On Saturday November 1, 2014 at the Crown Plaza JFK Air Port 138-10 135th Ave Jamaica, New York 11436 Cannabis Career Institute will be offering a class focusing on how to open a medical marijuana dispensary, grow operation, edibles company or marijuana delivery service for the State of New York. The [...]

As you can see, it adds "[...]", cutting off the full text, and there's no sign of an image?

also, the most recent "feed grabs" haven't changed:
http://www.420connect.info/forum/index.php?topic=441.msg3651;topicseen#new


:P I always seem to want to do tricky things >.<
Thank you for your patience and I'm very appreciative of the help!

Suki

Then I'm afraid there is no real way to only get the full content without comments or links to other articles.

The get_description() method will only show a summary, and the get_content() method will show the full content, but since there is no HTML or any other tags you can't use regex to extract the bits you want.

Quote
also, the most recent "feed grabs" haven't changed

And they won't change; what was already posted will remain as it is. Also, this is only a test environment, you haven't actually changed anything in the mod.

OK, one last attempt: let's see what get_content() actually returns. If it returns some HTML code, then it would be possible to use regex on it. Change this:

print_r($item->get_description());echo '<br>';

to this:

var_dump($item->get_content());echo '<br>';

420Connect.co.uk

Using that latest edit, test.php returns:

Quote
Marijuana Business Class Returns to New York City

string(326) "NEW YORK, NY — On Saturday November 1, 2014 at the Crown Plaza JFK Air Port 138-10 135th Ave Jamaica, New York 11436 Cannabis Career Institute will be offering a class focusing on how to open a medical marijuana dispensary, grow operation, edibles company or marijuana delivery service for the State of New York. The [...]"


I'm on the same page now, thanks for explaining that lol :)

This link is an example of when it grabs "too much"
http://www.420connect.info/forum/index.php?topic=441.msg3572#msg3572

Would it be possible to remove any of the extra parts? or even format it to be ..presentable?


Suki

OK, I forgot var_dump() likes to cut off stuff if it's too long. Just one last attempt: replace var_dump with print_r.

If the comments have some HTML, most likely a ul tag, then it will be possible to remove those via regex.
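
As a rough sketch of what "remove those via regex" could look like: the markup in $html below is made up for illustration (it is not what the feed actually returns), and the snippet simply drops every ul block from a string. It won't cope with nested lists, so treat it as a starting point only.

```php
<?php
// Hypothetical sample of fetched article HTML; not taken from the real site.
$html = '<p>Article text.</p>'
      . '<ul class="related"><li><a href="#">Other article</a></li></ul>'
      . '<p>More text.</p>';

// Remove each <ul>...</ul> block together with its contents.
// s: dot also matches newlines, i: case-insensitive, U: quantifiers are lazy,
// so each match stops at the first closing </ul> it finds.
$clean = preg_replace('~<ul\b[^>]*>.*</ul>~siU', '', $html);

echo $clean;
```

The same modifiers (s, i, U) are the ones used on patterns elsewhere in this topic, so a pattern like this should behave consistently with them.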

420Connect.co.uk

Sadly, nope - still the [...] version.


& Luckily, there are loads of <ul tag occurrences in the source for their site!

*crosses fingers*

;D

Suki

I've been reading the mod's code. Do you have the "retrieve full article" option enabled for this feed? The code indicates there is a way to get the full raw content via another class.

Yet one last attempt :P

Change this:


print_r($item->get_title());echo '<br>';
print_r($rss_data->get_image_url());echo '<br>';
print_r($item->get_description());echo '<br>';
die;


Or whatever code you might have now with this:

$full_article = new SimplePie_File($item->get_permalink(), 10, 5, $rss_data->useragent);
echo '<pre>';print_r($full_article);die;

It seems that var should contain raw HTML, or at least something to work with.

420Connect.co.uk

February 08, 2015, 12:01:06 PM #1333 Last Edit: February 08, 2015, 12:13:06 PM by 420connect.info
(Yep have full article checked)

Oooh, nice one Suki!
:-*

It appears we're a step closer!
check out test.php now >> http://420connect.info/forum/test.php

edit:
(I just scrolled down, - never realised how long this page is now too!)

Suki

Actually, your regex was just fine and did grab exactly what you told it to :P  It's just that the "post-column pull-left" div also contains a lot of stuff besides the main article.  Let me install that mod locally and do some tests.

420Connect.co.uk

Ahh, sure I'm with you - that makes sense..  ;)

I look forward to your update!

& Thank you again for your patience and cooperation!
You're a legend   ;D

also, would it be possible to "escape" the new code to allow me to add additional normal feeds if I were to find another feed that had an easier to use feed code..

420Connect.co.uk

Hey,

one more thing I thought of, I'm hoping is possible..

If you manage to get that site's feed working,
I plan to use the MOD to create separate topics for each feed item rather than each item in the same post, but currently it was naming the 'subject' as "The Daily Chronic" every time.

Would it be possible to use the name of the article as the 'subject' ?

Many thanks again!  :-*

Suki

OK, try this regex, no need to make any changes in the code:


~<div class=\"post_meta\">(?:.*)<\/div>(.*)<div class=\"post-author\">~siU


This is a pretty simple approach: it grabs anything between certain divs, which hopefully exist on every article. The downside is that there is no image, as the image is displayed before the div used by the regex. Getting it may be complicated and will require code edits, since the mod only expects a single match.
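
To see what the pattern above captures, here is a small standalone check. The $page markup is invented to mimic the structure the regex relies on (an image, then the post_meta div, then the article, then the post-author div); the real page may well differ.

```php
<?php
// Made-up markup for illustration only; the real article page may differ.
$page = '<img src="photo.jpg">'
      . '<div class="post_meta">By Staff</div>'
      . '<p>NEW YORK, NY: article body.</p>'
      . '<div class="post-author">About the author</div>';

// Same pattern as above. Inside a single-quoted PHP string, \" is kept as
// backslash + quote, which the regex engine reads as a literal double quote.
$pattern = '~<div class=\"post_meta\">(?:.*)<\/div>(.*)<div class=\"post-author\">~siU';

// Group 1 is whatever sits between the end of the post_meta div and the
// start of the post-author div.
$article = preg_match($pattern, $page, $matches) ? trim($matches[1]) : null;

echo $article;
```

Because the img tag sits before the post_meta div in this sample, it falls outside the captured group, which is the same reason the image goes missing on the real feed.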

Quote from: 420connect.info on February 08, 2015, 05:16:23 PM
Hey,

one more thing I thought of, I'm hoping is possible..

If you manage to get that site's feed working,
I plan to use the MOD to create separate topics for each feed item rather than each item in the same post, but currently it was naming the 'subject' as "The Daily Chronic" every time.

Would it be possible to use the name of the article as the 'subject' ?

Many thanks again!  :-*

There's an option for it already: if there is a separate topic for each feed, the topic title gets renamed to the feed title.

420Connect.co.uk

Ooooh! thank you so much!  ;D  O:)

This grabs as you said, the article with no image.

There are a few characters that come out strange:



but apart from them it's awesome!  :-*

I would prefer to be able to grab the image too but can't decide if I want to lose the ability to add additional feeds.. hmm!   :-\
I don't suppose it would be possible to set it up to work for both?


& Thanks for the pointer on the 'subject' ! - I hadn't tested the separate board feature yet!

420Connect.co.uk

February 11, 2015, 07:20:00 AM #1339 Last Edit: February 11, 2015, 07:50:11 AM by 420connect.info
I've had a dig around for what I can find online and think this might be what I'm looking for, although I'm probably failing to add it into the statement correctly..

if anyone who can wrap their head round regex can help add these together, that would be awesomeee!  8)  O:)

"(ness)?$|(n't)?$|('re)?$|('s)?$|(s)?$|(ty)?$|('ve)?$"
(found here)

getting the above to work with the below:

~<div class=\"post_meta\">(?:.*)<\/div>(.*)<div class=\"post-author\">~siU

Thanks in advance!
;D






I saw a previous post in this thread about using the 'censored words' feature to help clean up feeds,
so at the moment I've now got occurrences of things like "’" set to be replaced with a space or apostrophe - this seems to be an okay workaround for now anyway :)

I'd still prefer to be able to grab the image too & have it formatted nicely but this seems to be the closest thing so far! :)
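
For anyone who would rather do the same clean-up in code than through censored words, a minimal sketch might look like the following. It assumes the odd characters are typographic punctuation such as curly quotes; the replacement map is my own guess at a useful list, not something taken from the mod or the feed.

```php
<?php
// Assumption: the stray characters are typographic quotes. This map is
// illustrative only; extend it with whatever actually shows up in the posts.
$map = [
    "\u{2018}" => "'",  // left single quotation mark
    "\u{2019}" => "'",  // right single quotation mark (curly apostrophe)
    "\u{201C}" => '"',  // left double quotation mark
    "\u{201D}" => '"',  // right double quotation mark
];

// Sample text containing a curly apostrophe and curly double quotes.
$text  = "It\u{2019}s a \u{201C}test\u{201D}";

// strtr() with an array swaps every key for its value in a single pass.
$clean = strtr($text, $map);

echo $clean;
```

The "\u{...}" escapes need PHP 7 or newer; on older versions the characters would have to be written out directly in a UTF-8 encoded file.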
