Daniel's Blog: Project SOULTRAIN Notes

December 29, 2004

Project SOULTRAIN Notes

I got a book, “Managing AFS (Andrew File System),” for Christmas, and now I’m rethinking some aspects of how Project SOULTRAIN should work. I’m no longer thinking about a costly and shitty RAID array, so much as a costly and weird network of about three little computers. AFS has some pretty serious advantages over something simple like a RAID array:

The users don’t need to know where the files are, really, they can just get at them at /afs/storytotell.org/soultrain/artist/album/…
The users will get (for free) the cache-action, if they’ve bothered to set up AFS clients on their machines. The cache action reduces my network load while maintaining transparency for them.

This also enables me to use Prolog as my querying system, as I can put the database as a Prolog source file in the filesystem, and anyone who wants to search will just run some program which utilizes the file. The file is then cached locally, reducing the load on my server, and we get to have the beauty of Prolog for the database, meaning that relationships which are hard to model in SQL can be used to the fullest possible extent.
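As a rough sketch of what "the database as a Prolog source file" could look like, here is a small Python helper that renders one album as a Prolog fact. The `album/4` predicate, its argument order, and the sample title and year are all made up for illustration; nothing above settles on a schema.

```python
def prolog_atom(text):
    """Quote a string as a Prolog atom.

    Backslashes are doubled and single quotes are written as '' so the
    result can be read back as a quoted atom.
    """
    escaped = text.replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"


def album_fact(artist, title, year, volume):
    """Render one album as a fact: album(Artist, Title, Year, Volume).

    The predicate name and argument order are hypothetical.
    """
    return "album({}, {}, {}, {}).".format(
        prolog_atom(artist), prolog_atom(title), int(year), prolog_atom(volume)
    )


if __name__ == "__main__":
    # Placeholder title and year, not real catalogue data.
    print(album_fact("Death", "Example Album", 2000, "a.death01"))
```

A file made of lines like these could sit in the filesystem and be consulted by whatever Prolog program does the searching.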

Each album will be an AFS “volume” of size ~800 MB, which is to say, a 700 MB CD image (assuming FLAC wasn’t really able to compress it at all) + 100 MB of Ogg or MP3 rips. I’ll provide tools which will generate whatever formats you like.
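AFS volume quotas are given in 1-kilobyte blocks, so the per-album figure works out roughly as below. This is just the arithmetic from the paragraph above, assuming 1 MB means 1024 KB; the function and constant names are mine.

```python
KB_PER_MB = 1024  # assumption: "MB" here means 1024 kilobytes


def album_quota_kb(image_mb=700, lossy_mb=100):
    """Return the per-album volume quota in kilobyte blocks.

    Defaults follow the estimate above: a 700 MB CD image (worst case,
    no FLAC compression) plus 100 MB of Ogg or MP3 rips.
    """
    return (image_mb + lossy_mb) * KB_PER_MB


if __name__ == "__main__":
    print(album_quota_kb())
```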

This has interesting ramifications. For one, it means I’ll be using Kerberos here, and I’ll also be doing cross-realm authentication with Kerberos. I’ll have three groups of users: shell users, audiophiles, and archivists. Shell users will be people who want accounts on my personal computer (me and Alex, basically), audiophiles will be anyone I trust who wants to have access to the music, and archivists will be people who have permission to run my archiving command.

The archiving command will basically create the disk image and populate the Prolog database. In the process it will have to create an AFS volume (with a name like “a.death01” for the first Death album to be archived, “a.andromeda03” for the third Andromeda album to be archived, or “a.godspeed_yo05” for the fifth Godspeed You Black Emperor! album to be archived). This mapping will need to be stored somewhere else, either in LDAP or an SQL database; I’m really rather torn. This might also mean that “archivist” isn’t implementable as a group: there can only be “afsadmin,” and I wouldn’t be able to restrict it further, but I’m not really sure. A separate group would be nice, because I anticipate getting Dressel in on this, and having him run a fileserver at his place, archiving into my AFS, and then us replicating for each other.
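The volume names in those examples look like "a." plus a lowercased artist name cut to 11 characters plus a two-digit sequence number. That rule is only inferred from the three examples (the 11 comes from "godspeed_yo"), so treat this as a guess at the scheme rather than a spec:

```python
import re

MAX_ARTIST_CHARS = 11  # assumption, inferred from "godspeed_yo"


def volume_name(artist, sequence):
    """Build a volume name like "a.death01" from an artist and a counter.

    Runs of non-alphanumeric characters become single underscores, the
    result is truncated, and the sequence number is zero-padded to two
    digits.
    """
    if not 1 <= sequence <= 99:
        raise ValueError("sequence must fit in two digits")
    slug = re.sub(r"[^a-z0-9]+", "_", artist.lower()).strip("_")
    slug = slug[:MAX_ARTIST_CHARS].rstrip("_")
    return "a.{}{:02d}".format(slug, sequence)


if __name__ == "__main__":
    for artist, n in [("Death", 1), ("Andromeda", 3),
                      ("Godspeed You Black Emperor!", 5)]:
        print(volume_name(artist, n))
```

Names built this way top out at 15 characters. Two artists whose names share the same first 11 characters would collide, which is one more reason the mapping needs to live somewhere authoritative.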

I’m going to draw up some more detailed notes before the night is out. I’ve sworn not to fuck up the BSD box before the Mac gets back from Apple (it’s being shipped out tomorrow, yay for paying for service). Unfortunately, the OpenAFS server can’t be run on FreeBSD due to the brain-damaged way they implemented it. I’d like to implement something like AFS2 someday, when I have thousands of hours of free time and a huge network to experiment with. AFS2 would be just like AFS1, except:

it wouldn’t depend on Kerberos explicitly (or, if it had to, it would depend on version 5 and not have this token cache weirdness)
it wouldn’t have its own weird ACL system, it would instead use a more POSIXly right system
it would use LDAP for some of its own internal directories (users and groups in the protection server comes to mind)
it would discover other cells upon attempted traversal rather than forcing you to add every cell to CellServDB (or else it would at least use LDAP to hold the data)

Apart from these concerns, AFS is pretty much the shit of the hour. NFS4 is going to be closer, but it still won’t have the whole unified tree thing which I like quite a bit about AFS. DFS apparently misses the point by quite a bit, and is pay-ware and really expensive. Coda doesn’t work, and neither does InterMezzo or that other wacky network filesystem that those same guys were working on. Additionally, AFS works on a variety of platforms (indeed, I could run an AFS client on the BSD box, but what I need right now is servers).

So, here’s the rough outline for implementation at home, as far ahead as I can see:

Install Linux on the now-BSD box
Make BIND hand out some random names for the computers around here, to make Kerberos 5 work
Install and make work Kerberos 5
Make OpenAFS work over Kerberos 5 without the ticket translator. Prove that it works via the Mac.
Set up the AFS framework I’ll need, like for the USS tool and the script which deals with making archive folders

We’ll see how it goes.

Posted by FusionGyro at December 29, 2004 10:18 PM

Comments

If you’re into network filesystems, check out http://www.lustre.org/. Intermezzo support in the kernel was dumped in favor of Lustre, and because not enough people were testing Intermezzo. Intermezzo was supposed to be the “next generation” AFS/Coda.

I was investigating AFS and Coda as alternatives to NFS. Coda made a lot of promises about how good it is, but I felt it to have some very annoying setup requirements and constraints (max size shares of 8GB??!).

One of the main reasons I was interested in these is that I do EVERYTHING over NFS (only one of my computers has hard drives in it): root filesystems, home directories, etc. I thought it would be nice if I could put hard drives in my other computers for some cache-action to beat the performance of NFS over 100 Mbit. For some reason I got turned off and got the impression that AFS, Coda, and even Intermezzo (which had the best performance, I think) did not yield performance as good as NFS, but I think I need to reinvestigate this.

Posted by: David Baird at January 7, 2005 07:49 PM
