68kMLA Classic Interface

This is a version of the 68kMLA forums for viewing on your favorite old Mac. Visitors on modern platforms may prefer the main site.

Restoring 68kmla from Google and archive.org web caches
Posted by: sunder on 2007-05-04 10:44:56
For those that don't know, when 68kmla went down I thought to grab it off Google's cache and wrote up a couple of scripts to do so. After talking to some folks on PPCMLA, I got a better search string, and ran a second pass that way.

Next, I was able to grab all the pages from archive.org based on the search criteria found here:

http://www.ppcmla.com/forums/viewtopic.php?p=991#991

I've not yet looked much at the HTML, but you can download it here:

http://lisaem.sunder.net/68kmla-archive.tar.bz2

The Google cache pages can be found here:

http://lisaem.sunder.net/68kmla-gcache.tar.bz2

Note that the file names contain strange characters such as &, ?, and @.
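One way to make those file names easier to handle is sketched below. It is only a sketch: the function name `safe_name` and the example file names are made up for illustration, and the real names in the tarballs may look different.

```python
import re
from urllib.parse import unquote


def safe_name(raw_name: str) -> str:
    """Map a cached page's file name to a plain, shell-friendly one.

    Hypothetical helper: the naming scheme is an assumption, not
    something taken from the archives themselves.
    """
    # Turn percent-escapes such as %3F or %26 back into characters first.
    decoded = unquote(raw_name)
    # Replace every run of characters other than letters, digits,
    # dot, dash and underscore with a single underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", decoded)
    return cleaned.strip("_")


if __name__ == "__main__":
    print(safe_name("viewtopic.php?t=1234&start=15"))
```

Two different original names can collapse to the same cleaned name, so a real run would want to check for collisions before renaming anything.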

So the final step is to write something to extract the posts from the HTML, etc.
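A rough sketch of what that extraction step could look like, using only Python's standard library. It assumes each post body sits inside a `<span class="postbody">` element; that class name is an assumption about the old board's templates and needs checking against the actual cached pages. `PostExtractor` and `extract_posts` are illustrative names.

```python
from html.parser import HTMLParser

# Assumed class of the element wrapping each post; verify against real pages.
POST_CLASS = "postbody"


class PostExtractor(HTMLParser):
    """Collect the text found inside every <span class="postbody">."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.posts = []    # finished post texts, in page order
        self._depth = 0    # span nesting depth inside the current post
        self._chunks = []  # text pieces of the post being read

    def handle_starttag(self, tag, attrs):
        if self._depth:  # already inside a post body
            if tag == "span":
                self._depth += 1
            elif tag == "br":
                self._chunks.append("\n")
        elif tag == "span" and dict(attrs).get("class") == POST_CLASS:
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if self._depth and tag == "span":
            self._depth -= 1
            if self._depth == 0:  # the post body just closed
                self.posts.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_posts(html_text):
    """Return the list of post texts found in one cached page."""
    parser = PostExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.posts
```

This keeps only the plain text of each post; author names and dates would need similar handling once the surrounding markup is known.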

I suspect that between Google and archive.org we should have most of the old posts.

I don't quite know what format the messages should be in so that they can be fed back into phpBB or whatever the current board software is. What would make sense to you guys?

The trouble is that it's not very easy to convert the rendered HTML back into the special tags such as [ code ] and [ quote ].
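For the simple cases, the reverse mapping could be sketched as a handful of substitutions. The HTML tags below (`<blockquote>`, `<pre>`, `<b>`, `<i>`) are placeholders: the board most likely rendered quotes and code with its own markup, so the patterns would have to be adapted after looking at the cached pages. `html_to_bbcode` is an illustrative name.

```python
import html
import re

# Hypothetical HTML-to-BBCode pairs; the real cached pages may use
# different markup for quotes and code.
TAG_MAP = [
    (r"<blockquote[^>]*>", "[quote]"),
    (r"</blockquote>", "[/quote]"),
    (r"<pre[^>]*>", "[code]"),
    (r"</pre>", "[/code]"),
    (r"<b>", "[b]"),
    (r"</b>", "[/b]"),
    (r"<i>", "[i]"),
    (r"</i>", "[/i]"),
    (r"<br\s*/?>", "\n"),
]


def html_to_bbcode(fragment):
    """Convert one post's HTML fragment into BBCode-style text."""
    for pattern, replacement in TAG_MAP:
        fragment = re.sub(pattern, replacement, fragment, flags=re.IGNORECASE)
    # Drop any tags that were not mapped above.
    fragment = re.sub(r"<[^>]+>", "", fragment)
    # Decode entities last, so an escaped "<" is not mistaken for a tag.
    return html.unescape(fragment)
```

Nested quotes with attribution and smilies rendered as images are not covered here and would need their own rules.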

Or should I just leave it as HTML pages? The trouble with that is that you won't be able to do anything but read the old messages.


Posted by: ~tl on 2007-05-04 11:08:43
I'll take a look, see what the pages are like and see what I can do with it.

Posted by: iMac600 on 2007-05-10 09:43:34
I've been sorting through them and while they are complete, it looks like we'll need a team to go through all those.

I think I have a way to sort through all of them, but at the moment it's quite slow.

Posted by: Dan 7.1 on 2007-05-10 10:50:56
...this might be the minority viewpoint here, but why exactly do we need to restore all the old data?

a fresh start is a healthy thing for any community. unless you guys just really love to read old threads.

Posted by: iMac600 on 2007-05-10 10:53:52
The same reason we have http://68kmla.org/forums/archive online. We covered thousands of Mac problems and how to repair them on the old forums. They were a great resource to everyone. If we archive them, we can then pull out a few choice ones later for an FAQ, etc.

That and there's a few that need restoring for the heck of it, most notably the "You know you're obsessed when..." and "HATS!" threads (both of which I have located some of already).

Posted by: gobabushka on 2007-05-10 11:37:34
Hey, it won't let me download the archive.org file from your server. I'm getting a 403 error.

Posted by: sunder on 2007-05-15 17:17:51
gobabushka wrote: "Hey, it won't let me download the archive.org file from your server. I'm getting a 403 error."

Oops! Sorry about that. It should be good now.

Posted by: Bunsen on 2007-06-03 18:40:28
*bump*

Any news, guys?

Posted by: funkytoad on 2007-06-03 18:55:50
Well, we also have these:

http://web.archive.org/web/*/68kmla.org

I love the Wayback Machine, great tool!

Posted by: Bunsen on 2007-06-19 07:56:58
sunder wrote: "Next, I was able to grab all the pages from archive.org"

Posted by: Flash! on 2007-06-20 03:06:58
Dan 7.1 wrote: "...this might be the minority viewpoint here, but why exactly do we need to restore all the old data?"

I tend to agree. Though there is a lot of information there, there is also a lot of junk. If the whole lot can be put online somewhere as HTML, that would probably suffice. I don't really see the point of someone spending however many hours trying to get the data sorted out again (unless of course they really, really want to do it, and they don't care if 80% of the MLA don't appreciate their effort).

If a question/problem pops up, I'm sure that we (the MLA) will just post an answer. We've got a new Feets thread running again, I'm sure that we can repeat ourselves on a few other topics too 😉

Posted by: ~tl on 2007-06-20 05:21:52
Yeah, maybe the best solution would be to archive any useful information that crops up on the forums in some sort of information database. Some sort of wiki type setup would probably be the best. What do you think? It would be a lot easier to find useful information if it was all in one place... plus you wouldn't need to sort through all the crap that gets posted on the forums.

Posted by: Mr. 680x0 on 2007-06-20 06:27:56
We should release a yearly book with all the info from the forum. 😀 Or a monthly magazine - I know I'd subscribe.

Posted by: Bunsen on 2007-07-30 19:14:57
Just wondering who's working on this and how it's going? What kind of assistance would be helpful?

Posted by: Mars478 on 2010-01-28 08:22:08
Any headway on this topic? I know this is almost 2 years old. 🙂

Posted by: Osgeld on 2010-01-28 08:39:32
I thought it was archived somewhere around here ... restoration and inclusion with this forum could be easy if the old database is intact (aside from changing all the post numbers)

Posted by: johnklos on 2010-01-28 12:33:07
Osgeld wrote: "I thought it was archived somewhere around here ... restoration and inclusion with this forum could be easy if the old database is intact (aside from changing all the post numbers)"

Since the database is what is lost, perhaps we need to look at this differently. Rather than try to load old static content into a dynamic web site (which has no real advantage since it's not going to change), we should instead try to massage the links to all be relative, create a static site for all of the archives, and put it up outside of the 68kmla.org site. If I were running 68kmla, I wouldn't want to try to import stuff taken from Google or Archive.org directly into the site.

I'll grab those archives and see how usable they are in their current form. Perhaps between sed and mod_rewrite, they can be put on a static domain and used that way.
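The link-massaging step could look roughly like the sketch below, written in Python rather than sed. It assumes the old pages linked to themselves with the prefix http://68kmla.org/forums/ and that all cached pages end up side by side in one directory; both are assumptions to check against the real files. `make_links_relative` is an illustrative name.

```python
import re

# Assumed absolute prefix used by the old forum's internal links.
ABSOLUTE_PREFIX = re.compile(
    r'(href|src)="http://(?:www\.)?68kmla\.org/forums/',
    re.IGNORECASE,
)


def make_links_relative(page):
    """Strip the forum's absolute prefix from href/src attributes."""
    # Keep the attribute name and opening quote, drop the site prefix.
    return ABSOLUTE_PREFIX.sub(r'\1="', page)


if __name__ == "__main__":
    sample = '<a href="http://68kmla.org/forums/viewtopic.php?t=5">topic</a>'
    print(make_links_relative(sample))
```

Links to other sites are left untouched, since only the one assumed prefix is matched.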

John

Posted by: ~tl on 2010-01-28 12:56:06
We (well, I) have the archives for the era beginning with the transition from the Snitz forum (30th Dec. 2003) until a backup was made on the 23rd Feb. 2006. The only thing that is actually missing is between then and the server crash on the 5th Apr. 2007. I've been considering merging that database with the current one, but I haven't found an easy tool to do so for phpBB. If anyone has any ideas, I'd be interested to hear them.

As for the topics in the "gap": sure, it may be possible to recover some of them from archive.org etc. (many have been downloaded already), but sorting them out and getting them back into a usable format is a BIG task. Certainly more than I've been able to take on over the past few years.

Posted by: Osgeld on 2010-01-28 13:00:09
You probably won't find an easy tool. The last couple of times I've had to do something similar, I had to write a script.

It doesn't really have to be an interactive thing, I guess. One could set up the old system and mirror it into static HTML just for reference (and it would be quicker and more reliable than archive.org ... maybe)
