RSS Feeder

Started by SlammedDime, January 11, 2009, 06:06:42 AM

hartiberlin

#840
If I go to, for instance,
http://www.travelhotelscheap.com/regex.php

and put
~<div id="articlestory">(.*)<\/div>~siU
into the "regex:" field, what should I put into the "body:" field?

The whole body of the HTML page?

And when I then press submit, what should it display?

Only the extracted article text of the webpage, or still some HTML code with the array [1] entry around it?

What about the array [0] entry, which is also there and contains parts of the article I wish to extract?

SlammedDime

You can't use the regex from my tutorial... that's why it's a tutorial, to show you how to create your own regex based on the HTML from your page.

Yes, the 'body' box gets the HTML, the regex box gets the regular expression.
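
To illustrate roughly what happens with the two boxes, here is a minimal sketch in Python rather than PHP. The sample HTML is made up for the example, and it is not the tester's actual code. Python's re module has no equivalent of the PCRE U (ungreedy) modifier, so the lazy quantifier .*? stands in for it. Element [0] is the whole match with the div tags included; element [1] is only the text captured by the parentheses.

```python
import re

# Hypothetical page source; this is the kind of text that goes into the "body" box.
body = """<html><body>
<div id="header">Site header</div>
<div id="articlestory">First paragraph.
Second paragraph.</div>
<div id="footer">Footer</div>
</body></html>"""

# Rough equivalent of ~<div id="articlestory">(.*)<\/div>~siU
#   s -> re.DOTALL, i -> re.IGNORECASE, U (ungreedy) -> lazy quantifier .*?
pattern = re.compile(r'<div id="articlestory">(.*?)</div>', re.DOTALL | re.IGNORECASE)

match = pattern.search(body)
if match is None:
    whole_match, article = None, None
else:
    whole_match = match.group(0)  # like array [0]: full match, tags included
    article = match.group(1)      # like array [1]: only the captured text

print(whole_match)
print(article)
```

With a pattern built this way, the part you normally want is [1]; [0] only looks similar because it contains the same text plus the surrounding tags.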
SlammedDime
Former Lead Customizer
BitBucket Projects
GeekStorage.com Hosting
                      My Mods
SimpleSEF
Ajax Quick Reply
Sitemap
more...
                     

UndergroundChic

Quote from: SlammedDime on November 27, 2009, 06:43:31 PM


Undergroundchic - You could modify the add_settings.php script and remove {db_prefix} anywhere in the file, then modify package-info.xml and change the 'install for' line to 2.0 RC1.2.  I would recommend upgrading SMF though.

Thank You SD  :P

hartiberlin

Quote from: SlammedDime on December 01, 2009, 09:14:12 PM
the regex box gets the regular expression.

What exactly does this mean?

What is a "regular expression" in this case?

Could you please show exactly how it works, using an example from any RSS feed page?

Many thanks in advance and thanks again for your hard work !

juliegreen

Post Feed As=
Enter who you would like this topic posted as.


What do I need to write in the field Post Feed As? I tried many things but nothing works. Could you give an example of what to write?

SlammedDime

You have to use an existing user on your board, such as your own username, or create a new user named 'Bot' and type its name in there.

juliegreen

Thank you SlammedDime,

I had just mixed up my forums and tried a username that doesn't exist on the forum where I'm using your mod.

Thanks again.

juliegreen

There are two links in the posted feed items. I understand the source link, but how can I remove the title link? And could a nofollow attribute be added to the source link?

Thank you

hartiberlin

Are there no more comments on my last questions?

How can I get the regex part working properly?
I need a good example so I can understand this better.

Many thanks in advance.

Regards, Stefan.

GreenMotion

Quick Question here,

How does this tool determine whether an RSS feed item has already been posted?  What does it check to make this determination?

Thank you,

   GM

SlammedDime

Stefan - can you point out exactly what from my tutorial you don't understand, then I might be able to better help you out.

GreenMotion - For every feed item posted, it calculates an MD5 sum of the item's title and stores it in the database for 30 days.  The log is pruned on a regular basis, and entries older than 30 days are discarded (to keep the database size down).
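
A minimal sketch of that idea, in Python rather than the mod's PHP, and not the mod's actual code. The in-memory dict stands in for the log table, and all function names here are hypothetical.

```python
import hashlib
import time

RETENTION_DAYS = 30      # retention period described above
SECONDS_PER_DAY = 86400

# Hypothetical stand-in for the log table: {md5 of title: time it was posted}
posted_log = {}

def title_hash(title):
    """MD5 hex digest of the item title."""
    return hashlib.md5(title.encode("utf-8")).hexdigest()

def already_posted(title):
    """True if an item with this exact title is still in the log."""
    return title_hash(title) in posted_log

def record_post(title, now=None):
    """Remember that an item with this title was posted."""
    posted_log[title_hash(title)] = time.time() if now is None else now

def prune_log(now=None, retention_days=RETENTION_DAYS):
    """Drop log entries older than the retention period."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    for digest in [d for d, ts in posted_log.items() if ts < cutoff]:
        del posted_log[digest]
```

Two consequences follow from keying on the title alone: an item whose title is edited looks like a new item, and an item that is still in the feed after its entry has been pruned looks new again.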

GreenMotion

Thanks SlammedDime.

I got confused for a bit because it wouldn't update the RSS posts, but after some googling and looking through the code I noticed that it caches the RSS feeds for 2 hours, which explains a lot.

It would be nice if these cache settings were configurable through the configuration screen.

    GM

hartiberlin

Hi SlammedDime,
how could I extract the FULL articles from, for instance, this RSS feed:

http://www.asiatraveltips.com/travelnews.xml

What must I set for the regex there, and how would I find it?

Many thanks in advance.

Regards,Stefan.


hartiberlin

#853
Hi,
as there was no div tag that started the articles in:

http://www.asiatraveltips.com/travelnews.xml

I used the following regex code:

~<\/script><\/div>(.*)<br>~siU

Now it extracts the articles, but as soon as there is an apostrophe inside the article, for example
ship´s cabin

the RSS text stops after the word "ship" and nothing further is extracted...

How could that be avoided?

Many thanks.

hartiberlin

Another question.

How can I disable the link to the original article at the top of the imported post?

Many thanks.

frymaster

Quote from: SlammedDime on December 04, 2009, 07:35:33 PM
Stefan - can you point out exactly what from my tutorial you don't understand, then I might be able to better help you out.

GreenMotion - For every feed item posted, it calculates an MD5 sum of the item's title and stores it in the database for 30 days.  The log is pruned on a regular basis, and entries older than 30 days are discarded (to keep the database size down).

Does this mean that with the feed I have (which contains every article posted since the blog was created), it will post old articles again after 30 days?

hartiberlin

Also, the character

"
stops the import of the full text.

How could that be avoided?

Many thanks.

SlammedDime

Stefan, I'm heading out the door now, but I'll take a look at your feed when I get back later this afternoon.  To answer your other questions: you'd have to open ScheduledTasks.php, scroll down to the bottom, find the bit that puts the whole post together, and edit that.  In a future version of the mod, I'll create user-editable templates for posting feeds.  As for the quote mark, I'll have to test it locally to see why that happens.

frymaster - yes.  Typically, RSS feeds contain only the latest items from a blog or site; they rarely contain all items since inception.  You can change the pruning period in the Pruning Options admin panel: just set the number of days to something high and the log will effectively never be pruned.  Keep in mind, though, that if you have multiple feeds, none of them will ever be pruned, and the log table can grow very large depending on how many items are posted.

hartiberlin

Hi.

I am trying to suppress the quote mark and apostrophe characters with:

~<\/script><\/div>(.*['"])<br>~siU

but that did not work...
What is the regex code to suppress these characters?

Many thanks.
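
For reference, a minimal sketch of what a pattern of that shape does, written in Python with a made-up HTML fragment (Python has no U modifier, so the lazy .*? is used instead). It does not explain why the import stops at a quote, since that cause is not established here. It only shows that a character class inside the capture group requires a quote at that position rather than removing quotes, and that removal would be a separate step after the match.

```python
import re

# Made-up fragment shaped like the markup the pattern above targets.
html = "</script></div>The ship's \"deluxe\" cabin<br>more text"

flags = re.DOTALL | re.IGNORECASE

# With ['"] inside the group, the capture must END with a quote directly
# before <br>. No quote sits there in this fragment, so nothing matches.
with_class = re.search(r"</script></div>(.*?['\"])<br>", html, flags)

# Capture everything first, then strip the characters as a separate step.
plain = re.search(r"</script></div>(.*?)<br>", html, flags)
captured = plain.group(1)
without_quotes = re.sub(r"['\"]", "", captured)

print(with_class)
print(captured)
print(without_quotes)
```

Whether stripping the characters is the right fix depends on where the text is actually being cut off, which this sketch does not address.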

hartiberlin

Hmm,
it seems some RSS feeds are just not very compatible or are incorrectly built, so
RSS Feeder cannot retrieve them either.

I have now found another RSS feed that works with the regex code I used.


2. How can I change the header of the post so that the domain name
the feed is pulled from is not displayed?

I would also like to have an option to disable the
"Source:"
line at the bottom of the posts.

How could that be done?

Must I hack the RSS Feeder PHP files for this, or will you soon
include an option to disable it?

Many thanks.
