fs2svn

Convert a bunch of archive folders into a Subversion repository

If you’ve kept a series of historical snapshots of your work in folders, fs2svn can help you upgrade to a full-fledged Subversion version control system.

fs2svn goes through all the folders under a given parent folder (in filesystem order) and creates a Subversion revision for each one, backdated to the most recent file’s last modified date. The log message is set to the folder name.

Additions, changes, and deletions between one folder and the next are all recorded in the repository.
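The folder walk and backdating described above can be sketched in a few lines of Python. This is a minimal illustration, not fs2svn's actual code: the function names latest_mtime and plan_revisions are hypothetical, and sorting by name is an assumption standing in for filesystem order.

```python
import os


def latest_mtime(folder):
    """Return the newest last-modified time (seconds since epoch) of any file under folder."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            mtime = os.path.getmtime(os.path.join(dirpath, name))
            if newest is None or mtime > newest:
                newest = mtime
    return newest  # None if the folder contains no files


def plan_revisions(parent):
    """List (log_message, revision_time) pairs, one per subfolder of parent."""
    plan = []
    # Assumption: sorting by name stands in for "filesystem order".
    for name in sorted(os.listdir(parent)):
        path = os.path.join(parent, name)
        if os.path.isdir(path):
            # The folder name doubles as the log message.
            plan.append((name, latest_mtime(path)))
    return plan
```

Calling plan_revisions on the parent archive folder would give one entry per future revision, in the order the revisions would be created.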

The input format is very simple. It only covers the mainline trunk, not any tags or branches (though tags for major versions could be manually created later, if your folder names carry enough information).

The format is so simple it could be used as a common intermediary. If you wanted to migrate a mainline trunk from some exotic version control system to Subversion, you could write a script to export it to regular folders, then use this script to import the result into Subversion.

Deltas

As an optional feature, you can mark any subfolders as “deltas” using a configurable naming convention. Files missing from delta folders won’t be marked as deleted in the repository. The idea is that a delta folder holds only the newly changed files. (If it turns out that a file hasn’t changed, it isn’t committed a second time, delta folder or not.) Subfolders of delta folders may be regular full folders or delta folders, depending on the same naming convention. Whatever you add to the name to indicate a delta will be stripped off before the folder hits the repository.
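The difference between a full folder and a delta folder can be illustrated with a small comparison function. This is only a sketch under stated assumptions: snapshots are modeled as plain dictionaries mapping a repository path to file content, and the name diff_snapshots is hypothetical rather than part of fs2svn.

```python
def diff_snapshots(previous, current, is_delta=False):
    """Compare two {path: content} snapshots and return sorted (action, path) pairs."""
    actions = []
    for path, content in current.items():
        if path not in previous:
            actions.append(("add", path))
        elif previous[path] != content:
            actions.append(("change", path))
        # Identical content: nothing to commit, delta folder or not.
    if not is_delta:
        # Only a full folder implies that missing files were deleted.
        for path in previous:
            if path not in current:
                actions.append(("delete", path))
    return sorted(actions)


previous = {"www/index.html": b"v1", "www/about.html": b"v1"}
update = {"www/about.html": b"v2"}

print(diff_snapshots(previous, update, is_delta=True))
print(diff_snapshots(previous, update, is_delta=False))
```

Treated as a delta, the update yields only a change to www/about.html. Treated as a full folder, the same update also yields a deletion of www/index.html.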

You can configure the naming convention with one or more --ignore-deletes-in command line options. See the examples below.
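As a rough illustration of how such a naming convention could be applied, the sketch below runs the three patterns from the sample command line against a folder name. It assumes that a pattern must match the whole folder name and that the first capture group is the name to keep; fs2svn's real matching rules may differ, and strip_delta_marker is a hypothetical helper.

```python
import re

# Patterns taken from the sample command line; group 1 is the name to keep.
DELTA_PATTERNS = [r"(.*?) *delta", r"(.*?) *part", r"from +(.*)"]


def strip_delta_marker(folder_name, patterns=DELTA_PATTERNS):
    """Return (repository_name, is_delta) for a folder name."""
    for pattern in patterns:
        # Assumption: the pattern must match the entire folder name.
        match = re.fullmatch(pattern, folder_name)
        if match:
            return match.group(1), True
    return folder_name, False


print(strip_delta_marker("sql work delta"))  # ('sql work', True)
print(strip_delta_marker("from backup"))     # ('backup', True)
print(strip_delta_marker("wwwroot"))         # ('wwwroot', False)
```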

Folder Renaming: Shorthand Folders

You may find your project has a lot of pieces, and so you’d like to create top-level folders within your project trunk for each one. Unfortunately, in the archive folders you’re using to create the repository, they may not have the right names, or the names are inconsistent. Or, since most of the changes just involve one particular piece, you didn’t bother creating a folder for it on each revision. Changes to that main piece were usually stored loose in the revision’s folder. But now you’d like them to go into a designated subfolder in the repository.

Shorthand folders let you rename folders at the root level of the project as fs2svn reads them. If none of those rules match, everything goes under a default folder. All of this is configurable.

Shorthand folders deserve an example.

Say you have this:

  • My Project Archive
    • 2005-01-05 backup
      • wwwroot
        • index.html
        • about.html
      • scripts
        • helper.py
      • sql work
        • db structure.sql
    • 2005-01-08 - exported db - delta
      • sql work
        • db structure.sql
    • 2005-02-12 - fixed typo - delta
      • about.html
    • 2005-02-18 - fixed datatype - delta
      • database
        • db structure.sql
    • etc.

You want the repository to have a "www" (originally "wwwroot") folder, a "scripts" folder, and a "db" folder (originally "sql work" or "database"). But in many cases, the source folders don’t contain any of those folders, and instead have loose files that really belong in "www". So make "www" the default folder, "wwwroot" a shorthand folder that maps to "www", "sql work" a shorthand folder that maps to "db", "database" a shorthand folder that maps to "db", and "scripts" a shorthand folder (with no renaming). The repository will look like this:

Revisions 1 and 2 (created automatically)

  • add /branches
  • add /tags
  • add /trunk

Revision 3 (date: 2005-01-05, log: "2005-01-05 backup")

  • add /trunk/www/index.html
  • add /trunk/www/about.html
  • add /trunk/scripts/helper.py
  • add /trunk/db/db structure.sql

Revision 4 (date: 2005-01-08, log: "2005-01-08 - exported db - delta")

  • change /trunk/db/db structure.sql

Revision 5 (date: 2005-02-12, log: "2005-02-12 - fixed typo - delta")

  • change /trunk/www/about.html

Revision 6 (date: 2005-02-18, log: "2005-02-18 - fixed datatype - delta")

  • change /trunk/db/db structure.sql

Implementation

fs2svn depends on cvs2svn, a tool for converting CVS repositories to Subversion. When setting out to write it, I considered creating my own Subversion dumpfile writer from scratch, but then decided to use the one the cvs2svn team had already written and tested.

Unfortunately, cvs2svn wasn’t written to be pulled apart. Its SVNRevision class depends on CVSRevision, which in turn depends on everything else.

So fs2svn, in order to use any cvs2svn functionality, has to inject its own replacement CVSRevision class into cvs2svn. Instead of calling out to the CVS command-line tools, the new class reads directly from the filesystem.
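The injection technique itself can be shown in isolation. The sketch below does not use cvs2svn at all: library is a stand-in module built on the spot, and the names OriginalRevision, FilesystemRevision, and export are hypothetical. It only demonstrates the general idea of rebinding a class that library code looks up by name at call time.

```python
import types

# Stand-in for the imported library module (hypothetical, not cvs2svn's real layout).
library = types.ModuleType("library")


class OriginalRevision:
    """Pretend original class that would call an external command-line tool."""

    def read(self, path):
        raise RuntimeError("would shell out to an external tool")


def export(path):
    # Library code resolves the class through the module each time it runs.
    return library.Revision().read(path)


library.Revision = OriginalRevision
library.export = export


class FilesystemRevision:
    """Replacement class that reads straight from the filesystem."""

    def read(self, path):
        with open(path, "rb") as handle:
            return handle.read()


# The injection: rebind the name that the library code resolves.
library.Revision = FilesystemRevision
```

After the last line, library.export(path) returns the file's bytes instead of raising. The coupling cost is visible even here: the replacement must mirror whatever methods the library expects the original class to have.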

A disadvantage of this approach is that fs2svn is now rather tightly coupled to cvs2svn, and may have to be updated along with cvs2svn. It’s open to debate whether this cost is worth the perks, which include many command-line options from cvs2svn that work for free in fs2svn. (Unfortunately, command-line parsing was the one area where I had to copy some of cvs2svn’s source, rather than just importing it.)

Sample Command Lines

This command line generates a dumpfile suitable for svnadmin load, fills in MIME types from your Apache mime.types file, and suppresses native line ending conversion. (The Apache mime.types location shown is correct for Mac OS X, at least.)

python fs2svn.py --dumpfile=../svndumpfile.txt --dump-only --username=$USER --svnadmin=/usr/local/bin/svnadmin --no-default-eol --keywords-off --mime-types=/etc/httpd/mime.types --exclude="[.].*" --exclude="[.]DS_Store" --exclude="_vti_cnf" --ignore-deletes-in="(.*?) *delta" --ignore-deletes-in="(.*?) *part" --ignore-deletes-in="from +(.*)" --shorthand-folders=shorthand-folders.txt ../folder-with-many-revision-subfolders

Making a dumpfile is often useful, since you may want to perform further processing before importing it into a repository. Beware of using a text editor for that, though: some, including BBEdit, aren’t binary-safe and will silently normalize line endings, destroying binary files. You can also skip the dumpfile and go straight to the repository:

python fs2svn.py -s ../myrepository --fs-type=fsfs --username=$USER ... (continue as before)

More on Shorthand Folders (Folder Renaming)

If there’s a shorthand folder config present, then there’s another layer of structure inside the revision folders that fs2svn recognizes.

For each revision folder, either:

  1. All the children are shorthand folders, or
  2. None of them are, and the default shorthand folder will be assumed as a parent.
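The two cases above can be sketched as a small function. This is an illustration only: resolve_children is a hypothetical name, the mapping uses the folder names from the earlier example, and rejecting a mix of shorthand and other children is an assumption, since the rule above does not say what happens in that case.

```python
def resolve_children(children, mapping, default):
    """Map each top-level child name of a revision folder to a path under trunk."""
    shorthand = [name for name in children if name in mapping]
    if children and len(shorthand) == len(children):
        # Case 1: every child is a shorthand folder, so rename each one.
        return {name: mapping[name] for name in children}
    if not shorthand:
        # Case 2: no shorthand folders, so everything goes under the default.
        return {name: default + "/" + name for name in children}
    # Assumption: a mix is not covered by the rule, so this sketch rejects it.
    raise ValueError("mix of shorthand and other children: %r" % sorted(children))


mapping = {"wwwroot": "www", "scripts": "scripts", "sql work": "db", "database": "db"}

print(resolve_children(["wwwroot", "scripts", "sql work"], mapping, "www"))
print(resolve_children(["about.html"], mapping, "www"))
```

The first call renames each folder according to the mapping. The second places the loose about.html under www/.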

Sample ShorthandFolderMapper file (set with --shorthand-folders=)

# Format is dir-name-in-filesystem:repository path (under trunk)
#
# If there's no colon, the name and path are taken to be the same.
# If the first part is empty (line starts with a colon), the second part
# specifies a default repository path. If a revision dir's subfolders don't
# match the other shorthand folders, it's assumed that all its contents were
# meant to be put under this default parent.
#
# Example: ":www" means that revision folders containing no shorthand subfolders
# will have their contents placed in /trunk/www/.
# "mssql:db" means that revision folders containing a direct "mssql" child
# will have that subfolder's contents placed in /trunk/db/.
# "programs-shared" means that programs-shared is recognized as a shorthand
# folder but the name is unchanged in the repository.
#
:www
wwwroot:www
www
programs-shared
mssql:db
db:db
wwwroot dev:www
wwwroot myproject:www
myproject:www
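A file in this format is simple enough to parse in a dozen lines. The sketch below follows the rules stated in the sample's comments. It is not fs2svn's own parser: parse_shorthand_config is a hypothetical name, and stripping surrounding whitespace and splitting at the first colon are assumptions.

```python
def parse_shorthand_config(text):
    """Return (mapping, default) parsed from shorthand-folder config text."""
    mapping = {}
    default = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, colon, path = line.partition(":")
        if not colon:
            mapping[name] = name  # no colon: name is unchanged in the repository
        elif not name:
            default = path        # leading colon: default repository path
        else:
            mapping[name] = path  # "filesystem name:repository path"
    return mapping, default


sample = """\
# comment lines are ignored
:www
wwwroot:www
programs-shared
mssql:db
wwwroot dev:www
"""

mapping, default = parse_shorthand_config(sample)
print(default)
print(mapping)
```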

Installation

Download cvs2svn and its dependencies. The script has been tested with Python 2.3 and higher and cvs2svn 1.2.1. Notably, you may have to install the Python bindings for BerkeleyDB, which cvs2svn really does depend on.

Next, name your downloaded copy of fs2svn “fs2svn.py”, and put it in the same folder as cvs2svn.py. (Or move cvs2svn.py to the same folder as fs2svn.py. Either way works.)

Next, open a command prompt and try one of the commands above (with the path to your actual archive folder).

Look for messages about revisions that delete files. These are often clues that folders aren’t being matched up properly between revisions. (This tends to lead to a file being removed in one revision and added back subsequently.) The error could be in the revision with the deletion, or the revision where the file was originally added, so search back through the log for the filename. If folder names were mismatched between revisions, fix them and run the command again. (If you’ve used the command-line options to create a repository, rather than --dump-only, you’ll need to delete the new repository first.) fs2svn doesn’t change your archive folders, so you can keep running the command until the repository looks right.
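The delete-then-add pattern described above is easy to check for mechanically once the log has been reduced to a list of events. The sketch below assumes you have already extracted (revision, action, path) tuples yourself; it does not parse fs2svn's log output, whose exact format is not described here, and find_readded_paths is a hypothetical helper.

```python
def find_readded_paths(events):
    """Return (path, deleted_in, readded_in) for each path deleted and later added again."""
    deleted_in = {}
    suspects = []
    for revision, action, path in events:
        if action == "delete":
            deleted_in[path] = revision
        elif action == "add" and path in deleted_in:
            # A path that comes back after a deletion hints at mismatched folder names.
            suspects.append((path, deleted_in.pop(path), revision))
    return suspects


events = [
    (3, "add", "/trunk/db/db structure.sql"),
    (4, "delete", "/trunk/db/db structure.sql"),
    (4, "add", "/trunk/database/db structure.sql"),
    (6, "add", "/trunk/db/db structure.sql"),
]

print(find_readded_paths(events))
```

Here the file vanishes in revision 4 and reappears in revision 6, which suggests that the folder for revision 4 used a name ("database") that was not mapped to "db".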

Download

[View/Download fs2svn.py] (26K) Version 1.0, released 2005-06-23.