amXor

Blogging about our lives online.

3.06.2010

File Management Tool - Part 2

I have worked with my program a bit more and there are some interesting aspects of this kind of backup.

At the end of the backup, all duplicate files point to the same inode, so the SHA1-named copy in the backup directory can be deleted.

For example:

inode   name
1299    workingdir/folder1/file1.txt
1299    workingdir/folder5/file1.txt
1299    workingdir/folderx/file1_renamed.txt
1299    backupdir/54817fa363dc294bc03e4a70f51f5411f4a0e9a9

All of these names now point at the same inode, so the backup directory can be erased and no single file owns the inode. All three remaining files would have to be deleted to finally get rid of inode 1299. In general, it seems that many programs (TextEdit, for example) save files under a new inode, so editing any one of the versions breaks its link to the others. UNIXy programs seem to respect the inode better: vim saves to the same inode, so editing any version edits every version.
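The difference between the two saving styles can be sketched in a few lines of Python. This is only an illustration of the general mechanism, not what TextEdit or vim actually do internally; the helper names (`save_in_place`, `save_by_replace`) and the `.tmp` suffix are made up for the example.

```python
import os
import tempfile


def inode(path):
    """Return the inode number that a path currently points at."""
    return os.stat(path).st_ino


def save_in_place(path, text):
    """Truncate and rewrite the existing file; the inode stays the same."""
    with open(path, "w") as f:
        f.write(text)


def save_by_replace(path, text):
    """Write a fresh file, then rename it over the old name (a new inode)."""
    tmp = path + ".tmp"  # hypothetical temp name, for illustration only
    with open(tmp, "w") as f:
        f.write(text)
    os.replace(tmp, path)


def read(path):
    with open(path) as f:
        return f.read()


def demo():
    """Return what the second name sees after each style of save."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "file1.txt")
        b = os.path.join(d, "file1_renamed.txt")
        save_in_place(a, "original")
        os.link(a, b)  # b is a second name for the same inode

        save_in_place(a, "edited in place")
        shared_after_in_place = inode(a) == inode(b)
        b_after_in_place = read(b)

        save_by_replace(a, "edited by replace")
        shared_after_replace = inode(a) == inode(b)
        b_after_replace = read(b)

    return (shared_after_in_place, b_after_in_place,
            shared_after_replace, b_after_replace)
```

After the in-place save both names still share an inode and the second name sees the edit. After the replace-style save the first name has a new inode, and the second name keeps the older contents.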

Removing the "backup" directory also helps Spotlight resolve the names and file types. After deleting that folder, running `mdimport ./workingdir` complained mightily but more or less re-indexed the folder. Here is a quick slice of the errors it produced. I'm not going to try to make sense of them, but I think they're interesting; maybe Spotlight always encounters these kinds of problems and just keeps silent about them.

$ mdimport ./workingdir
...
font `F88' not found in document.
font `F82' not found in document.
font `F88' not found in document.
font `F82' not found in document.
font `F88' not found in document.
font `F82' not found in document.
font `F88' not found in document.
encountered unexpected symbol `c'.
encountered unexpected symbol `c'.
encountered unexpected symbol `c'.
encountered unexpected symbol `c'.
encountered unexpected symbol `c'.
encountered unexpected symbol `c'.
encountered unexpected symbol `c'.
choked on input: `144.255.258'.
choked on input: `630.3.9'.
choked on input: `681.906458.747'.
choked on input: `680.335458.626'.
choked on input: `682.932458.507'.
choked on input: `530.3354382.624'.
font `Fw' not found in document.
font `Fw8' not found in document.
encountered unexpected symbol `w6.8'.
encountered unexpected symbol `w0.5'.
font `Fw8' not found in document.
encountered unexpected symbol `w0.5'.
encountered unexpected symbol `w6.8'.
choked on input: `397.67.'.
choked on input: `370.5g'.
choked on input: `370.5g'.
choked on input: `D42.32 m
314.94 742.32 l
S
306.06 751.2 m
306.06 7...'.
choked on input: `67.l'.
choked on input: `67.l4'
failed to find start of cross-reference table.
missing or invalid cross-reference trailer.

To reiterate, this is a funny trick that my program does. It builds an index of SHA1-named files from the source directory; if you then delete the index it just made, you're left with all the duplicates hard linked to each other. I think that's pretty cool.
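For anyone curious, here is a minimal Python sketch of that trick. It is not the actual program, just one way the idea could work, and it assumes the backup directory sits outside the working directory on the same filesystem (hard links can't cross filesystems). The function names and the `.linktmp` suffix are invented for the example.

```python
import hashlib
import os
import shutil
import tempfile


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of a file's contents."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def link_duplicates(working_dir, backup_dir):
    """Index every file in backup_dir under its SHA1 name via hard links.

    Assumes backup_dir is outside working_dir and on the same filesystem.
    """
    os.makedirs(backup_dir, exist_ok=True)
    for root, _dirs, files in os.walk(working_dir):
        for name in files:
            path = os.path.join(root, name)
            stored = os.path.join(backup_dir, sha1_of(path))
            if not os.path.exists(stored):
                # First time this content is seen: index it.
                os.link(path, stored)
            elif not os.path.samefile(path, stored):
                # Duplicate content: swap the file for a link to the index.
                tmp = path + ".linktmp"  # hypothetical temp name
                os.link(stored, tmp)
                os.replace(tmp, path)


def demo():
    """Build three identical files, link them, then delete the index."""
    with tempfile.TemporaryDirectory() as d:
        work = os.path.join(d, "workingdir")
        backup = os.path.join(d, "backupdir")
        names = ["folder1/file1.txt", "folder5/file1.txt",
                 "folderx/file1_renamed.txt"]
        paths = [os.path.join(work, *n.split("/")) for n in names]
        for p in paths:
            os.makedirs(os.path.dirname(p))
            with open(p, "w") as f:
                f.write("same contents")

        link_duplicates(work, backup)
        indexed = len(os.listdir(backup))
        shutil.rmtree(backup)  # delete the index that was just made

        inodes = {os.stat(p).st_ino for p in paths}
        links = os.stat(paths[0]).st_nlink
    return indexed, len(inodes), links
```

In the demo, the three identical files produce a single index entry, and once the index is removed they all share one inode with a link count of three. Note that the file indexed first is the one whose inode (and therefore timestamps) survives.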

Metadata

One stated aim of this backup tool was to preserve metadata. So far, the tool preserves the time stamps and metadata of whichever file it indexes first, and the filename of every file it indexes. I'm not sure how to implement any more than this in a transparent way. As far as I can tell from the documentation, a single inode can't have multiple access and modification times, and an external database of that kind of information would probably never get used.
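The shared-timestamp limitation is easy to check. The sketch below (Python again, with made-up file names and an arbitrary timestamp) sets the times through one name and reads them back through the other.

```python
import os
import tempfile


def shared_times_demo():
    """Set times via one hard link and read them back via the other."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "file1.txt")
        second = os.path.join(d, "file1_renamed.txt")
        with open(first, "w") as f:
            f.write("contents")
        os.link(first, second)

        # Set access and modification times through the second name only.
        stamp = 1_000_000_000  # arbitrary example timestamp
        os.utime(second, (stamp, stamp))

        st_first = os.stat(first)
        st_second = os.stat(second)
        return (int(st_first.st_mtime), int(st_first.st_atime),
                st_first.st_ino == st_second.st_ino)
```

Both names report the same times because the times live on the inode, not on the directory entry. Only the filename is per-link.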
