Yahoo Groups going away

Dave Wade dave.g4ugm at gmail.com
Thu Oct 24 02:51:36 CDT 2019


I have been reasonably successful, after making a few mods, in backing up Yahoo groups using a clone of "Yahoo Group Archiver", which broadly works (but see below) and doesn't need any scraping tools.
I made a few tweaks, concentrating on speed rather than documenting the code, and the current mess is here:

https://1drv.ms/u/s!Ag4BJfE5B3onleMG29vMs5czmPcoTw?e=TrqawF

The script yahoo.py is supposed to back things up to files. I couldn't get the user/password login part to work, but noted the scripts also support passing the cookies on the command line.
So I downloaded a cookie manager for Firefox, logged into Yahoo and added code to set the values at the top of the script. The result is "yahoo1.py". It's pretty obvious which cookies are needed.
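For anyone repeating the cookie step, here is a minimal sketch of how values copied from a browser can be handed to the "requests" package. It is not the code in yahoo1.py: the cookie names T and Y, the placeholder values and the helper make_session are assumptions for illustration only.

```python
import requests

# Placeholder values: paste the real ones copied from the browser's cookie manager.
# The cookie names "T" and "Y" are an assumption made for this example.
COOKIES = {
    "T": "PASTE_T_VALUE_HERE",
    "Y": "PASTE_Y_VALUE_HERE",
}


def make_session(cookies, domain=".yahoo.com"):
    """Return a requests.Session that sends the given cookies to the domain."""
    session = requests.Session()
    for name, value in cookies.items():
        session.cookies.set(name, value, domain=domain)
    return session


session = make_session(COOKIES)
print(sorted(session.cookies.get_dict()))
```

Every request made through that session then carries the cookies, so no user/password login is needed.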
I found this fails on unnamed photo albums. I also found file downloads flaky. So Yahoo2 fixes photo albums with duff names and skips downloading any files that already exist.
This leaves one bug. If a download fails, the script may leave an empty file. If you want to retry that download, you need to remove the empty file before restarting the script.
Sometimes Yahoo barfs at a file because updated AV/malware scanners mark it as bad, e.g. archives which contain "netcat". In this case leave the partial download in place and allow the script to skip it.
We also don't get file descriptions.
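The empty-file cleanup described above can be sketched as follows. This is a hypothetical helper, not part of the scripts: find_empty_files and remove_files are names made up here, and the keep list is there so that files Yahoo refuses to serve can be left in place for the script to skip.

```python
from pathlib import Path


def find_empty_files(root):
    """Return all zero-byte files under root, sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )


def remove_files(paths, keep=()):
    """Delete each path unless its file name is in keep; return what was deleted."""
    keep = set(keep)
    removed = []
    for p in paths:
        if p.name in keep:
            continue  # leave this one so the download script skips it
        p.unlink()
        removed.append(p)
    return removed
```

Listing first with find_empty_files("backup_folder") and only then passing the result to remove_files, with the names of any scanner-flagged files in keep, avoids re-fetching files that will only fail again. The folder name is just an example.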

I am running the scripts on Python 3.7.5 on Windows 10 and use "py" to run them.
When installing the required "requests" package (see the readme.md) I found I had to enter the full path to pip (it's in the Scripts folder).
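If pip isn't on the PATH, a short sketch like the one below shows where a given Python keeps it, and builds a command that runs pip through the interpreter itself so the full path isn't needed. The helper names scripts_folder and pip_command are made up for this example.

```python
import sys
import sysconfig


def scripts_folder():
    """Folder where this interpreter installs pip and other launchers."""
    return sysconfig.get_path("scripts")


def pip_command(package):
    """Command list that runs pip through the current interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


print("pip lives in:", scripts_folder())
print("or run:", " ".join(pip_command("requests")))
```

From a command prompt the equivalent with the launcher is "py -m pip install requests".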

I am happy to answer any questions but note I am in the UK (that is East Pondia not the University of Kentucky) so please allow for my time zone.

Dave
G4UGM
P.S. I now hate python....
PPS I also now hate Yahoo.


> -----Original Message-----
> From: cctalk <cctalk-bounces at classiccmp.org> On Behalf Of Steve Malikoff via
> cctalk
> Sent: 24 October 2019 02:51
> To: General Discussion: On-Topic and Off-Topic Posts <cctalk at classiccmp.org>
> Subject: Re: Yahoo Groups going away
> 
> Jim said
> > On 10/17/2019 6:52 PM, Cameron Kaiser via cctalk wrote:
> >>>> Yeah, it sucks. The Tomy Tutor users group has been there for
> >>>> years, and I guess we'll jump over to groups.io. I managed to
> >>>> archive everything last night.
> >>> What's your strategy for archiving material off YahooGroups? Their
> >>> Files and Photo (photostreams) sections are so heavily
> >>> Javascript-encrusted that it's not at all easy to bulk archive from
> >>> them. I tried a few tools (httrack, wget,
> >>> curl) with no valid results, but I only used some basic settings.
> >> For the messages, I used
> >>
> >> 	https://github.com/andrewferguson/YahooGroups-Archiver
> >>
> >> Unfortunately, the (rather inadequate) Y!G API for files makes it
> >> difficult to iterate over files in a directory tree. I ended up
> >> manually downloading them, since it was only about 30 files and not
> >> worth ginning up something to scrape them. Some people have used
> >>
> >> 	https://github.com/csaftoiu/yahoo-groups-backup
> > I didn't get that to work.  Has anyone here got suggestions? Contact
> > off list.  It is getting errors, and I spent about an hour trying to
> > figure it out.
> >
> > every issue was a bug in either Python that was unresolved, or the
> > tools they were using, not errors in the tool, so I'm not really
> > interested in a lot more debugging.
> >
> > I suspect it ran at some point, maybe I've got the wrong versions of
> > some sort.
> >
> > thanks
> > Jim
> >> to get everything but it needs a MongoDB instance which seemed kind
> >> of overkill for a one-time dump.
> 
> 
> I set it up with python 3.7.3, pip installed the required modules such as
> Selenium, installed geckodriver for Firefox (but I don't run Firefox on this
> machine, I use a popular fork) and it emitted an error that refers to Selenium
> not being the correct match to Firefox.
> I have other things to do so that's where I left it for now, will try it out again
> sometime soon with an earlier actual Firefox.
> 
> Steve.



