I have been reasonably successful, after making a few mods, in backing up Yahoo groups
using a clone of "Yahoo Group Archiver", which broadly works (but see below) and
doesn't need any scraping tools.
I made a few tweaks concentrating on speed rather than documenting the code and the
current mess is here:-
https://1drv.ms/u/s!Ag4BJfE5B3onleMG29vMs5czmPcoTw?e=TrqawF
The script yahoo.py is supposed to back things up to files. I couldn't get the
user/password login part to work, but noted that the scripts also support passing the
cookies on the command line.
So I downloaded a cookie manager for Firefox, logged into Yahoo and added code to set the
values at the top of the script. The result is "yahoo1.py". It's pretty obvious
which cookies are needed.
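For anyone wanting to try the same trick, here is a minimal sketch of the idea. It is not the code from yahoo1.py: the cookie names and values are placeholders, the helper names are made up for illustration, and it uses only the standard library rather than "requests" so that it runs without installing anything.

```python
import urllib.request

# Placeholder cookie names and values (assumptions, not the real ones):
# copy the actual pairs from the browser's cookie manager after logging in.
COOKIES = {
    "COOKIE_NAME_1": "value-copied-from-browser",
    "COOKIE_NAME_2": "value-copied-from-browser",
}


def cookie_header(cookies):
    """Join name/value pairs into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def build_request(url, cookies=COOKIES):
    """Return a Request object that sends the saved cookies with it."""
    return urllib.request.Request(url, headers={"Cookie": cookie_header(cookies)})
```

Passing the result to urllib.request.urlopen() then fetches the page as the logged-in user, for as long as the cookies remain valid.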
I found this fails on unnamed photo albums. I also found file downloads flaky. So Yahoo2
fixes photo albums with duff names and skips downloading any files that already exist.
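Roughly, the two fixes amount to something like the sketch below. The function names are hypothetical and this is not lifted from the script; the character list is the set Windows rejects in file names.

```python
import os
import re


def safe_name(name, fallback="unnamed"):
    """Replace characters Windows rejects in file names.

    Falls back to a default when the name is missing or nothing usable is left.
    """
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", name or "").strip(" .")
    return cleaned or fallback


def needs_download(path):
    """True only if nothing has been written at this path yet."""
    return not os.path.exists(path)
```

If a group has several unnamed albums, the fallback should be something unique per album so they don't land in the same folder. Note that needs_download() treats any existing file as done, even an empty one.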
This leaves one bug. If a download fails the script may leave an empty file. If you want
to restart that download you need to remove it before restarting the download.
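One way this could be avoided, which I have not put in the script, is to write each download to a temporary name and only rename it once it has completed. A sketch, where save_atomically and the fetch callable are hypothetical names:

```python
import os


def save_atomically(path, fetch):
    """Write the bytes returned by fetch() to path.

    The data goes to a temporary ".part" file first, so a failed
    download never leaves an empty file under the real name.
    """
    tmp = path + ".part"
    try:
        with open(tmp, "wb") as handle:
            handle.write(fetch())
        os.replace(tmp, path)  # only reached if the download succeeded
    finally:
        # Clean up the temporary file if the rename never happened.
        if os.path.exists(tmp):
            os.remove(tmp)
```

The trade-off is that a file Yahoo refuses outright would then be retried on every run, rather than being skipped.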
Sometimes Yahoo barfs at a file because updated AV/malware scanners mark it as bad, e.g.
archives which contain "netcat". In this case leave the partial download in place
and allow the script to skip it.
We also don't get file descriptions.
I am running the scripts on Python 3.7.5 on Windows/10 and use
"py" to run the scripts.
When installing the required "requests" package (see the readme.md) I found I
had to enter the full path to pip (it's in the Scripts folder).
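An alternative that avoids the path problem is to run pip as a module through the interpreter, which from the command prompt is "py -m pip install requests". The same thing from inside Python looks something like this sketch (the function names are just for illustration):

```python
import subprocess
import sys


def pip_command(package):
    """Build a pip command that runs under the current interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Run pip; raises CalledProcessError if the install fails."""
    subprocess.run(pip_command(package), check=True)
```

Calling install("requests") installs into whichever Python is running the script, so there is no need to know where pip lives.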
I am happy to answer any questions but note I am in the UK (that is East Pondia not the
University of Kentucky) so please allow for my time zone.
Dave
G4UGM
P.S. I now hate python....
PPS I also now hate Yahoo.
-----Original Message-----
From: cctalk <cctalk-bounces at classiccmp.org> On Behalf Of Steve Malikoff via
cctalk
Sent: 24 October 2019 02:51
To: General Discussion: On-Topic and Off-Topic Posts <cctalk at classiccmp.org>
Subject: Re: Yahoo Groups going away
Jim said
On 10/17/2019 6:52 PM, Cameron Kaiser via cctalk wrote:
> Yeah, it sucks. The Tomy Tutor users group has been there for
> years, and I guess we'll jump over to groups.io. I managed to
> archive everything last night.
What's your strategy for archiving material off YahooGroups? Their
Files and Photo (photostreams) sections are so heavily
Javascript-encrusted that it's not at all easy to bulk archive from
them. I tried a few tools (httrack, wget, curl) with no valid results,
but I only used some basic settings.
For the messages, I used
https://github.com/andrewferguson/YahooGroups-Archiver
Unfortunately, the (rather inadequate) Y!G API for files makes it
difficult to iterate over files in a directory tree. I ended up
manually downloading them, since it was only about 30 files and not
worth ginning up something to scrape them. Some people have used
https://github.com/csaftoiu/yahoo-groups-backup I didn't get that to work.
Has anyone here got suggestions? Contact me off list. It is getting
errors, and I spent about an hour trying to figure it out. Every issue
was an unresolved bug either in Python or in the tools they were using,
not an error in the tool itself, so I'm not really interested in a lot
more debugging.
I suspect it ran at some point; maybe I've got the wrong versions of
something.
thanks
Jim
> to get everything but it needs a MongoDB instance which seemed kind
> of overkill for a one-time dump.
I set it up with Python 3.7.3, pip installed the required modules such as
Selenium, and installed geckodriver for Firefox (but I don't run Firefox on this
machine, I use a popular fork), and it emitted an error that refers to Selenium
not being the correct match for Firefox.
I have other things to do so that's where I left it for now, will try it out again
sometime soon with an earlier actual Firefox.
Steve.