
Scraping statistical data from sports site to appear on your own site

Started by njtweb, August 19, 2019, 06:25:38 PM


njtweb

A lot of independent sports sites (about the size of mine) use stats that are hosted on professional sports sites.

Is anybody familiar with how this is done?

I'd like the standings on this site, https://www.shshl.org/standings/show/5154591?subseason=623871, to show up on my site here, https://www.youthhockeyinfo.com/index.php?page=43#shshlcont, whenever the stats are updated after the games are played.

If I'm using the correct terminology, is this called scraping another site? This is the school hockey league's stats site.

Thanks in advance for any responses.

vbgamer45

It would be scraping. You can do it with cURL, but their terms of use say you can't do it.
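For anyone curious what the scraping itself looks like, here is a minimal Python sketch, with the standard library's `urllib` standing in for cURL. It assumes the standings are published as a plain HTML `<table>`; the function names `fetch_page` and `parse_table_rows` and the user-agent string are hypothetical placeholders, not anything taken from the league's site.

```python
import urllib.request
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None  # text fragments of the cell currently open

    def _finish_cell(self):
        # Close the open cell (also handles HTML that omits </td>).
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _finish_row(self):
        # Close the open row (also handles HTML that omits </tr>).
        self._finish_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._finish_row()
            self._row = []
        elif tag in ("td", "th"):
            self._finish_cell()
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._finish_row()


def parse_table_rows(html):
    """Return every table row in `html` as a list of whitespace-normalized cells."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_page(url, user_agent="ExampleStatsBot/0.1 (hypothetical)"):
    """Download a page, roughly what `curl -A "<agent>" <url>` does."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

`fetch_page` is only defined here, not called; passing its result to `parse_table_rows` would give one list of cell strings per table row (team name, games played, points, and so on, depending on how the page lays out its columns).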
Community Suite for SMF - Take your forum to the next level built for SMF, Gallery,Store,Classifieds,Downloads,more!

SMFHacks.com -  Paid Modifications for SMF

Mods:
EzPortal - Portal System for SMF
SMF Gallery Pro
SMF Store SMF Classifieds Ad Seller Pro

njtweb

Quote from: vbgamer45 on August 19, 2019, 07:20:55 PM
It would be scraping. You can do it with cURL, but their terms of use say you can't do it.

OK, thank you. Where are the terms of use posted? Is it the school league's site that doesn't allow it?

vbgamer45

The website lists it.
The rule with scraping: first check whether you're allowed to. Then make sure your program is not overly aggressive when crawling the data. You could use a different user agent with cURL and rotate IPs when doing the scraping.
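The "check if you can" and "not overly aggressive" parts can be sketched in Python as two small helpers: one that asks a site's robots.txt whether a URL may be fetched, and one that enforces a minimum pause between requests. The names `allowed_to_fetch` and `Throttle` and the example interval are hypothetical; robots.txt is only one signal and does not replace reading the site's terms of use, which is where the restriction in this thread turned out to be.

```python
import time
import urllib.robotparser


def allowed_to_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class Throttle:
    """Enforces a minimum delay (in seconds) between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the behavior can be tested
        self._sleep = sleep
        self._last = None    # time of the previous request, if any

    def wait(self):
        """Call before each request; sleeps only if the last one was too recent."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A crawler would call `Throttle(2.0).wait()` (or whatever interval seems reasonable for the site) before every download, so a burst of page fetches is spread out instead of hitting the server all at once.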

njtweb

Quote from: vbgamer45 on August 19, 2019, 07:29:05 PM
The website lists it.
The rule with scraping: first check whether you're allowed to. Then make sure your program is not overly aggressive when crawling the data. You could use a different user agent with cURL and rotate IPs when doing the scraping.

Oh, OK. I found it in their terms of use. No big deal; I'll just update my site manually after they post the most recent results. It would be much better if I could automate it, but I don't want to get in trouble.
