Lokesh Kumar Bandi: Subversion

What is Subversion's client/server interoperability policy?

The client and server are designed to work as long as they aren't more than one major release version apart. For example, any 1.X client will work with a 1.Y server. However, if the client and server versions don't match, certain features may not be available.

The client/server interoperability policy is documented in the "Compatibility" section of the Subversion Community Guide.

What operating systems does Subversion run on?

All modern flavors of Unix, Windows, BeOS, OS/2, and Mac OS X.

Subversion is written in ANSI C and uses APR, the Apache Portable Runtime library, as a portability layer. The Subversion client will run anywhere APR runs, which is most places. The Subversion server (i.e., the repository side) is the same, except that it will not host a Berkeley DB repository on Win9x platforms (Win95/Win98/WinME), because Berkeley DB has shared-memory segment problems on Win9x. FSFS repositories (introduced in version 1.1) do not have this restriction; however, due to a limitation in Win9x's file-locking support, they also don't work in Win9x.

To reiterate, the Subversion client can be run on any platform where APR runs. The Subversion server can also be run on any platform where APR runs, but cannot host a repository on Win95/Win98/WinMe.

I heard that Subversion is an Apache extension? What does it use for servers?

No. Subversion is a set of libraries. It comes with a command-line client that uses them. There are two different Subversion server processes: either svnserve, which is a small standalone program similar to cvs pserver, or Apache httpd (version 2.0 or later) using a special mod_dav_svn module. svnserve speaks a custom protocol, while mod_dav_svn uses WebDAV as its network protocol. See chapter 6 in the Subversion book to learn more.

Why does the entire repository share the same revision number? I want each of my projects to have their own revision numbers.

First, note that Subversion has no concept of projects. The repository just stores a versioned directory tree; you may consider certain sub-trees to be projects, but Subversion doesn't treat them differently from any other sub-tree. Thus, the interpretation of what constitutes a project in the repository is left entirely up to the users. (This is similar to how branches and tags are conventions built on top of copies, instead of being basic concepts built into Subversion itself.)

Each time you commit a change, the repository stores a new revision of that overall repository tree, and labels the new tree with a new revision number. Of course, most of the tree is the same as the revision before, except for the parts you changed.

The new revision number is a sequential label that applies to the entire new tree, not just to the files and directories you touched in that revision. However, colloquially, a revision number is used to refer to the change committed in that revision; for example, "the change in r588" ("r588" is shorthand for "revision 588") really means "the difference between repository trees 587 and 588", or put another way, "the change made to tree 587 to produce tree 588".

Thus, the advancing revision number marks the progress of the repository as a whole; you generally can't gauge the progress of a particular project within the repository by watching the revision number. Also, the revision number should not be used as the publicly-visible release number of a particular project in the repository. For that, you should devise some other mechanism of distinguishing releases, such as using tags.

Does Subversion have Changesets?

The question is a bit loaded, because everyone seems to have a slightly different definition of "changeset", or at least a slightly different expectation of what it means for a version control system to have "changeset features".

For the purposes of this discussion, here's a simple definition of changeset: it's a collection of changes with a unique name. The changes might include textual edits to file contents, modifications to tree structure, or tweaks to metadata. Put more simply, a changeset is just a patch with a name you can refer to.

Subversion manages versioned trees as first-order objects (the repository is an array of trees), and the changesets are things that are derived (by comparing adjacent trees). Systems like Arch or BitKeeper are built the other way around: they're designed to manage changesets as first-order objects (the repository is a bag of patches), and trees are derived by composing sets of patches together.

Neither philosophy is better in absolute terms: the debate goes back at least 30 years. The two designs are better or worse for different types of software development. We're not going to discuss that here. Instead, here's an explanation of what you can do with Subversion.

In Subversion, a global revision number 'N' names a tree in the repository: it's the way the repository looked after the Nth commit. It's also the name of an implicit changeset: if you compare tree N with tree N-1, you can derive the exact patch that was committed.

For this reason, it's easy to think of "revision N" as not just a tree, but a changeset as well. If you use an issue tracker to manage bugs, you can use the revision numbers to refer to particular patches that fix bugs -- for example, "this issue was fixed by revision 9238." Somebody can then run 'svn log -r9238' to read about the exact changeset which fixed the bug, and run 'svn diff -r9237:9238' to see the patch itself. And svn's merge command also uses revision numbers. You can merge specific changesets from one branch to another by naming them in the merge arguments: 'svn merge -r9237:9238 branchURL' would merge changeset #9238 into your working copy.

This is nowhere near as complicated as a system built around changesets as primary objects, but it's still a vast convenience over CVS.
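To make the revision-as-changeset idea concrete, here is a minimal shell sketch that runs the log and diff commands described above for one revision. The function name show_changeset is hypothetical, and the URL you pass is up to you; the function only computes N-1 and calls svn log and svn diff.

```shell
# Hypothetical helper: show the changeset committed in revision N.
# Usage: show_changeset N URL
show_changeset() (
  rev=$1
  url=$2
  prev=$((rev - 1))   # changeset N = difference between trees N-1 and N
  svn log -v -r "$rev" "$url" &&
    svn diff -r "$prev:$rev" "$url"
)
```

For example, show_changeset 9238 URL prints the log entry (with its changed-path list, thanks to -v) and the same patch as 'svn diff -r9237:9238'. Since Subversion 1.4, 'svn diff -c 9238 URL' is a shorter way to ask for that diff.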

How do I check out the Subversion code?

Use the Subversion client:

$ svn co http://svn.apache.org/repos/asf/subversion/trunk subversion

That will check out a copy of the Subversion source tree into a directory named subversion on your local machine.

How do I create a repository? How do I import data into it?

Steps:

1. Create a Repository:

svnadmin create /svnrepos

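2. Configure Access and Create an SVN User

vi /svnrepos/conf/svnserve.conf

In the [general] section of that file, set:

anon-access = none

auth-access = write

password-db = passwd

Then add users to /svnrepos/conf/passwd, under its [users] section, in the format: user = password

E.g.:

tony = mypassword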

3. Import Your Project

(Assuming you’ve put your project files in /projects/myrailsproject)

svn import /projects/myrailsproject file:///svnrepos/myrailsproject -m "Initial import"

4. Start the SVN Server as Daemon

svnserve -d

Done! You should now have a Subversion server (svnserve) running with one project named myrailsproject.

Try checking it out of the repository:

svn co svn://192.168.0.2/svnrepos/myrailsproject

Since we set anon-access to none, you should be prompted for the username and password you created in /svnrepos/conf/passwd.
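If you set up several repositories this way, step 2 can be scripted. The sketch below is a hypothetical helper (configure_svnserve is not part of Subversion); it assumes the repository was created with svnadmin create, so REPO/conf already exists. It replaces the commented template svnserve.conf with just the three settings from step 2 and appends the user to passwd, adding the [users] header if the file lacks one.

```shell
# Hypothetical helper (not part of Subversion): apply step 2 to REPO.
# Usage: configure_svnserve REPO USER PASSWORD
configure_svnserve() (
  conf=$1/conf
  # Options must start in column 0 and sit in the [general] section;
  # this replaces the commented template file entirely.
  cat > "$conf/svnserve.conf" <<'EOF'
[general]
anon-access = none
auth-access = write
password-db = passwd
EOF
  # Entries in passwd belong under [users] (the only section in the
  # default template); add the header if it is missing, then append.
  if ! grep -q '^\[users\]' "$conf/passwd" 2>/dev/null; then
    printf '[users]\n' >> "$conf/passwd"
  fi
  printf '%s = %s\n' "$2" "$3" >> "$conf/passwd"
)
```

Usage: configure_svnserve /svnrepos tony mypassword. The passwd file stores passwords in plain text, so keep its permissions tight.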

How do I manage several different projects under Subversion?

It depends upon the projects involved. If the projects are related, and are likely to share data, then it's best to create one repository with several subdirectories like this:

$ svnadmin create /repo/svn

$ svn mkdir file:///repo/svn/projA -m "Create projA"

$ svn mkdir file:///repo/svn/projB -m "Create projB"

$ svn mkdir file:///repo/svn/projC -m "Create projC"
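If each project should also follow the usual trunk/branches/tags convention, the layout for several projects can be created in a single commit instead of one commit per svn mkdir. This is a sketch with a hypothetical function name; it relies on svn mkdir --parents (available since Subversion 1.5) to create each project directory together with its children.

```shell
# Hypothetical helper: create PROJECT/{trunk,branches,tags} for every
# project in one shared repository, in a single commit.
# Usage: make_project_layout REPO_URL PROJECT...
make_project_layout() (
  root=$1
  shift
  [ "$#" -gt 0 ] || exit 1   # need at least one project name
  for p in "$@"; do
    # Drop this project's name from the front of the list and append
    # its three URLs at the end.
    shift
    set -- "$@" "$root/$p/trunk" "$root/$p/branches" "$root/$p/tags"
  done
  svn mkdir --parents -m "Create project layout" "$@"
)
```

Usage: make_project_layout file:///repo/svn projA projB projC.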

If the projects are completely unrelated, and not likely to share data between them, then it's probably best to create separate and unrelated repositories.

$ mkdir /repo/svn

$ svnadmin create /repo/svn/projA

$ svnadmin create /repo/svn/projB

$ svnadmin create /repo/svn/projC

The difference between these two approaches

In the first case, code can easily be copied or moved around between projects, and the history is preserved. ('svn cp/mv' currently only works within a single repository.)

· Because revision numbers are repository-wide, a commit to any project in the first case causes a global revision bump. So it might seem a bit odd if somebody has 'projB' checked out, notices that 10 revisions have happened, but projB hasn't changed at all. Not a big deal, really. Just a little weird at first. This used to happen to svn every time people committed to rapidsvn, when rapidsvn was in the same repository. :-)

· The second case might be easier to secure; it's easier to insulate projects from each other (in terms of users and permissions) using Apache's access control. In the first case, you'll need a fancy hook script in the repository that distinguishes projects ("is this user allowed to commit to this particular subdir?"). Of course, we already have such a script, ready for you to use.

How do I merge two completely separate repositories?

If you don't care about retaining all the history of one of the repositories, you can just create a new directory under one project's repository, then import the other.

If you care about retaining the history of both, then you can use 'svnadmin dump' to dump one repository, and 'svnadmin load' to load it into the other repository. The revision numbers will be off, but you'll still have the history.

For Example:

There was a point in time when I created a copy of a project and it was then committed into another repository. This as such is not a big problem, but merging those repositories back together while keeping all the changes in the history is a challenge.

The challenge

Subversion does not support combining two repositories directly, because of the way it stores revisions. When you combine two repositories, the revisions of the same directory in the two repositories cannot be merged into each other; however, you can merge two repositories into one by importing them into two different directories of a single repository. Let's assume we have the following source repositories, where repository A came first and was later moved to repository B.

repo_A
|-- branch
|-- tags
\-- trunk
    |-- file1.txt
    \-- file2.txt

repo_B
|-- branch
|-- tags
|   |-- tag_1
|   \-- tag_2
\-- trunk
    |-- file1.txt
    \-- file2.txt

Notice that both repositories contain the same files in the trunk. If we want to combine these repositories, we cannot merge the two trunk directories into one, but we can merge both repositories and their history into one repository. The resulting repository might look like the following:

repo_combined
|-- branch
|   \-- repo_A_trunk (created from trunk of repository A)
|       |-- file1.txt
|       \-- file2.txt
|-- tags
|   |-- tag_1 (the tag from repository B)
|   \-- tag_2 (the tag from repository B)
\-- trunk (the trunk from repository B)
    |-- file1.txt
    \-- file2.txt

Subversion does not enforce a fixed directory structure, so you can place the “trunk” from repo_A wherever you want; this is just the layout I used to merge them.

This is how it works

The following steps will explain the procedure to merge the two repositories. As you will see, this procedure will dump both repositories and merge them into a completely new repository.

It would be possible to import one repository directly into the other one, but for safety reasons I decided not to do that. With this procedure you always have the possibility to go back to the two unchanged repositories in case something goes wrong or you forgot to merge something.

To start merging the repositories, we need a dump from each of them. This is done with the following commands:

svnadmin dump /path/to/repo_A > repo_A.dump

svnadmin dump /path/to/repo_B > repo_B.dump

These dump files contain all the commits of the respective repositories; each one is a full copy of the complete repository in a single file. As explained earlier, the content of repository A should go into the “branch” directory of the new repository, but without the “tags” and “branch” directories of repo_A.

So the next step is to filter the unneeded content out of the dump files. For repo_A we only want the content of the “trunk” directory, not “tags” and “branch”; for repo_B we need “trunk” and “tags”, but not “branch”. This is done with the following commands: for repo_A the filter names the content to include, while for repo_B it names what to exclude. The --drop-empty-revs option removes revisions that no longer contain any changes after filtering, and --renumber-revs renumbers the remaining revisions; omit either option if you don't want that behavior.

cat repo_A.dump | svndumpfilter include "trunk" --drop-empty-revs --renumber-revs >repo_A_trunk.dump

cat repo_B.dump | svndumpfilter exclude "branch" --drop-empty-revs --renumber-revs >repo_B_trunk_tags.dump

If you need any other directories from the dump as well, you need to adapt the filter accordingly. See the svndumpfilter manpage for details.

Now we need to build up the new repository structure as described above. To do so we need to create a new repository and check this out locally to build up the structure.

svnadmin create /path/to/repo_combined

svn checkout file:///path/to/repo_combined/ /path/to/checkout_combined

Create the parent directory for repository A's content (“branch/repo_A_trunk”) and commit it. Do not create “trunk” or “tags” here: the filtered dump of repo_B adds those directories itself, and svnadmin load stops with an error when a dump tries to add a path that already exists. All this is done with the following commands:

cd /path/to/checkout_combined

mkdir branch

mkdir branch/repo_A_trunk

svn add branch

svn commit -m "commit message for the structure of the new repository"

Now that we have the structure created we can load the dump into the new repository. When doing this, the parent directory you load it to needs to already exist. That’s why we needed to create the parent directory (“branch/repo_A_trunk”) before loading the dump.

svnadmin load /path/to/repo_combined --parent-dir branch/repo_A_trunk --ignore-uuid < repo_A_trunk.dump

svnadmin load /path/to/repo_combined --ignore-uuid < repo_B_trunk_tags.dump

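After this, the repository contains the “trunk” from repository A and, from repository B, the “trunk” and “tags”. Because the filtered dump of repository A still contains the “trunk” directory itself, its files end up in /branch/repo_A_trunk/trunk; if you want them directly under /branch/repo_A_trunk, as in the structure shown above, move the contents of that trunk directory up one level with “svn mv” and commit. The “trunk” from repository B is the “trunk” of the new repository. By loading the repo_A dump first and the repo_B dump afterwards, we keep the revisions in their chronological order.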

To check that everything worked, run “svn update” in the already checked-out directory. “svn log -v” will then print the complete history.

Alternative structure

Of course, you can use the same procedure to create a structure like the following, simply by not filtering anything out and loading the repositories into the directories repo_A and repo_B.

repo_combined
|-- repo_A
|   |-- branch
|   |-- tags
|   \-- trunk
|       |-- file1.txt
|       \-- file2.txt
\-- repo_B
    |-- branch
    |-- tags
    |   |-- tag_1
    |   \-- tag_2
    \-- trunk
        |-- file1.txt
        \-- file2.txt

With this procedure you can create any structure you want, but keep in mind that you can not load a dump into a directory which does not already exist in the repository or which already contains files with the same names as those you would be importing.

Why is my repository taking up so much disk space?

If your repository uses the Berkeley DB backend, it stores all your data in a Berkeley DB "environment" in the repos/db/ subdirectory. The environment contains a collection of tables and a bunch of logfiles (log.*). Berkeley DB journals all changes made to the tables, so that the tables can be recovered to a consistent state in case of interruptions.

The logfiles will grow forever, eating up disk space, unless you (as the repository administrator) do something about it. At any given moment, Berkeley DB is only using a few logfiles actively; the rest can be safely deleted. If you keep all the logfiles around forever, then in theory Berkeley DB can replay every change to your repository from the day it was born. But in practice, if you're making backups, it's probably not worth the cost in disk space.

Use svnadmin to see which log files can be deleted. You may want a cron job to do this.

$ svnadmin list-unused-dblogs /repos

/repos/db/log.000003

/repos/db/log.000004

[...]

$ svnadmin list-unused-dblogs /repos | xargs rm

# disk space reclaimed!

You could instead use Berkeley DB's db_archive command:

$ db_archive -a -h /repos/db | xargs rm

# disk space reclaimed!

See also svnadmin hotcopy or hotbackup.py.

Note: If you use Berkeley DB 4.2, Subversion will create new repositories with automatic log file removal enabled. You can change this by passing the --bdb-log-keep option to svnadmin create. Refer to the section about the DB_LOG_AUTOREMOVE flag in the Berkeley DB manual.
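The list-unused-dblogs pipeline above can be wrapped for a cron job on repositories where automatic log removal is not enabled. This is a minimal sketch with a hypothetical function name, for Berkeley DB repositories only (FSFS repositories have no such log files). Reading one path per line avoids word-splitting surprises, and nothing is removed when svnadmin reports no unused logs.

```shell
# Hypothetical cron-friendly helper: delete the Berkeley DB log files
# that svnadmin reports as no longer in use.
# Usage: remove_unused_dblogs REPOS
remove_unused_dblogs() {
  svnadmin list-unused-dblogs "$1" |
  while IFS= read -r log; do
    rm -f -- "$log"   # one path per line, never word-split
  done
}
```

Put the function in a script and call it from cron with the repository path, e.g. remove_unused_dblogs /repos.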

How do I completely remove a file from the repository's history?

There are special cases where you might want to destroy all evidence of a file or commit. (Perhaps somebody accidentally committed a confidential document.) This isn't so easy, because Subversion is deliberately designed to never lose information. Revisions are immutable trees which build upon one another. Removing a revision from history would cause a domino effect, creating chaos in all subsequent revisions and possibly invalidating all working copies.

The project has plans, however, to someday implement an svnadmin obliterate command which would accomplish the task of permanently deleting information. (See issue 516.)

In the meantime, your only recourse is to svnadmin dump your repository, then pipe the dumpfile through svndumpfilter (excluding the bad path) into an svnadmin load command. See chapter 5 of the Subversion book for details about this.
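The dump/filter/load pipeline might look like the sketch below. The function name rebuild_without_path is hypothetical, and it writes the filtered history into a new repository so the original stays untouched until you have checked the result. Note that svndumpfilter stops with an error if a path you keep was copied from the path you exclude; that is the situation the svnsync approach described next handles better.

```shell
# Hypothetical helper: rebuild a repository without one path.
# Usage: rebuild_without_path OLD_REPOS NEW_REPOS PATH
# OLD_REPOS is only read; NEW_REPOS should not exist yet.
rebuild_without_path() (
  old=$1 new=$2 bad=$3
  svnadmin create "$new" || exit 1
  # Dump every revision, drop the bad path from the stream, load the rest.
  svnadmin dump "$old" | svndumpfilter exclude "$bad" | svnadmin load "$new"
)
```

Because history changes, users should check out fresh working copies from the new repository afterwards.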

An alternative approach is to replicate the repository with svnsync after configuring path-based authorization rules that deny read access to any paths that need to be filtered from history. Unlike svndumpfilter, svnsync will automatically translate copy operations with an unreadable source path into normal additions, which is useful if history involving copy operations needs to be filtered.

How do I change the log message for a revision after it's been committed?

Log messages are kept in the repository as properties attached to each revision. By default, the log message property (svn:log) cannot be edited once it is committed. That is because changes to revision properties (of which svn:log is one) cause the property's previous value to be permanently discarded, and Subversion tries to prevent you from doing this accidentally. However, there are a couple of ways to get Subversion to change a revision property.

The first way is for the repository administrator to enable revision property modifications. This is done by creating a hook called "pre-revprop-change" (see the Subversion book for more details about how to do this). The "pre-revprop-change" hook has access to the old log message before it is changed, so it can preserve it in some way (for example, by sending an email). Once revision property modifications are enabled, you can change a revision's log message by passing the --revprop switch to svn propedit or svn propset, like either one of these:

$ svn propedit -r N --revprop svn:log URL

$ svn propset -r N --revprop svn:log "new log message" URL

where N is the revision number whose log message you wish to change, and URL is the location of the repository. If you run this command from within a working copy, you can leave off the URL.
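The pre-revprop-change hook mentioned above can be as small as the one below, which allows only modifications of svn:log. This is a sketch: install_log_edit_hook is a hypothetical helper that writes the hook into REPOS/hooks and makes it executable, and the hook body is modeled on the pre-revprop-change.tmpl template that svnadmin create places in every new repository. Subversion passes the hook five arguments (repository path, revision, user, property name, and action, where M means modify), and a non-zero exit status rejects the change.

```shell
# Hypothetical helper: install a pre-revprop-change hook that permits
# only modifications of svn:log. Usage: install_log_edit_hook REPOS
install_log_edit_hook() (
  hook=$1/hooks/pre-revprop-change
  cat > "$hook" <<'EOF'
#!/bin/sh
# Subversion calls this hook with: REPOS REV USER PROPNAME ACTION
PROPNAME="$4"
ACTION="$5"
# Allow changing (M = modify) the log message; reject everything else.
if [ "$ACTION" = "M" ] && [ "$PROPNAME" = "svn:log" ]; then
  exit 0
fi
echo "Only modifications of svn:log are allowed." >&2
exit 1
EOF
  chmod +x "$hook"
)
```

Usage: install_log_edit_hook /svnrepos. The message written to stderr is shown to the user whose change was rejected.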

The second way of changing a log message is to use svnadmin setlog. This must be done by referring to the repository's location on the filesystem. You cannot modify a remote repository using this command.

$ svnadmin setlog REPOS_PATH -r N FILE

where REPOS_PATH is the repository location, N is the revision number whose log message you wish to change, and FILE is a file containing the new log message. If the "pre-revprop-change" hook is not in place (or you want to bypass the hook script for some reason), you can also use the --bypass-hooks option. However, if you decide to use this option, be very careful. You may be bypassing such things as email notifications of the change, or backup systems that keep track of revision properties.

How can I do an in-place 'import' (i.e. add a tree to Subversion such that the original data becomes a working copy directly)?

Suppose, for example, that you wanted to put some of /etc under version control inside your repository:

# svn mkdir file:///root/svn-repository/etc \
      -m "Make a directory in the repository to correspond to /etc"

# cd /etc

# svn checkout file:///root/svn-repository/etc ./

# svn add apache samba alsa X11

# svn commit -m "Initial version of my config files"

This takes advantage of a not-immediately-obvious feature of svn checkout: you can check out a directory from the repository directly into an existing directory. Here, we first make a new empty directory in the repository, and then check it out into /etc, transforming /etc into a working copy. Once that is done, you can use normal svn add commands to select files and subtrees to add to the repository.

If the entire contents of the directory are to be imported, rather than a subset, this shorter sequence of commands can be used to perform the import and then transform the directory into a Subversion working copy:

# cd /etc

# svn import -m "Initial import of /etc" file:///root/svn-repository/etc

# svn checkout --force file:///root/svn-repository/etc .

There is an issue filed for enhancing svn import to be able to convert the imported tree to a working copy automatically; see issue 1328.

What is this "dump/load cycle" people sometimes talk about when upgrading a Subversion server?

Subversion's repository database schema has changed occasionally during development. Old repositories, created with a pre-1.0 development version of Subversion, may require the following operation when upgrading. If a schema change happens between Subversion releases X and Y, then repository administrators upgrading to Y must do the following:

1. Shut down svnserve, Apache, and anything else that might be accessing the repository.

2. svnadmin dump /path/to/repository > dumpfile.txt, using version X of svnadmin.

3. mv /path/to/repository /path/to/saved-old-repository

4. Now upgrade to Subversion Y (i.e., build and install Y, replacing X).

5. svnadmin create /path/to/repository, using version Y of svnadmin.

6. svnadmin load /path/to/repository < dumpfile.txt, again using version Y of svnadmin.

7. Copy over hook scripts, etc, from the old repository to the new one.

8. Restart svnserve, Apache, etc.

See the Subversion book for more details on dumping and loading.

Note: Most upgrades of Subversion do not involve a dump and load. When one is required, the release announcement and the CHANGES file for the new version will carry prominent notices about it. If you don't see such a notice, then there has been no schema change, and no dump/load is necessary.
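If both versions of svnadmin are installed side by side (for example, X left under its old prefix after installing Y elsewhere), steps 2, 3, 5, 6 and 7 can be run as one script. This is a sketch under that assumption: dump_load_cycle is a hypothetical helper, the two svnadmin paths are parameters, and you still have to stop the servers first (step 1) and restart them afterwards (step 8).

```shell
# Hypothetical helper for the dump/load cycle.
# Usage: dump_load_cycle REPOS SAVED_REPOS OLD_SVNADMIN NEW_SVNADMIN DUMPFILE
dump_load_cycle() (
  repos=$1 saved=$2 old=$3 new=$4 dumpfile=$5
  "$old" dump "$repos" > "$dumpfile" || exit 1   # step 2 (version X)
  mv "$repos" "$saved" || exit 1                  # step 3
  "$new" create "$repos" || exit 1                # step 5 (version Y)
  "$new" load "$repos" < "$dumpfile" || exit 1    # step 6 (version Y)
  # Step 7: carry over hook scripts and configuration (svnserve.conf etc.).
  cp -Rp "$saved/hooks/." "$repos/hooks/" &&
    cp -Rp "$saved/conf/." "$repos/conf/"
)
```

Usage, with placeholder paths: dump_load_cycle /path/to/repository /path/to/saved-old-repository /opt/svn-X/bin/svnadmin /usr/bin/svnadmin dumpfile.txt.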

I can't use tags to merge changes from a branch into the trunk like I used to with CVS, can I?

As shown below, it is possible to merge from a branch to the trunk without remembering a single revision number, and vice versa (not shown in the example).

The example below presumes an existing repository in /home/repos in which you want to start a branch named bar containing a file named foo.txt you are going to edit.

For the purpose of tracing branch merges, this repository has set up tags/branch_traces/ to keep tags.

# setup branch and tags

$ svn copy file:///home/repos/trunk \
    file:///home/repos/branches/bar_branch \
    -m "start of bar branch"

$ svn copy file:///home/repos/branches/bar_branch \
    file:///home/repos/tags/branch_traces/bar_last_merge \
    -m "start"

# checkout branch working copy

$ svn checkout file:///home/repos/branches/bar_branch wc

$ cd wc

# edit foo.txt file and commit

$ echo "some text" >>foo.txt

$ svn commit -m "edited foo"

# switch to trunk and merge changes from branch

$ svn switch file:///home/repos/trunk

$ svn merge file:///home/repos/tags/branch_traces/bar_last_merge \
    file:///home/repos/branches/bar_branch

# Now check the file content of 'foo.txt', it should contain the changes.

# commit the merge

$ svn commit -m "Merge change X from bar_branch."

# finally, update the trace branch to reflect the new state of things

$ svn delete -m "Remove old trace branch in preparation for refresh." \
    file:///home/repos/tags/branch_traces/bar_last_merge

$ svn copy file:///home/repos/branches/bar_branch \
    file:///home/repos/tags/branch_traces/bar_last_merge \
    -m "Reflect merge of change X."
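The merge, commit and trace-refresh steps above can be wrapped in a helper that you run from a trunk working copy. This is a minimal sketch; merge_with_trace is a hypothetical name, and the URLs are the ones from the example. Keep in mind that since Subversion 1.5, built-in merge tracking records merged revisions in the svn:mergeinfo property, so with newer clients and servers a plain svn merge of the branch URL usually makes the trace tag unnecessary.

```shell
# Hypothetical helper: merge a branch into the current trunk working copy
# and move the trace tag forward.
# Usage: merge_with_trace BRANCH_URL TRACE_URL "log message"
merge_with_trace() (
  branch=$1 trace=$2 msg=$3
  # Merge everything that changed on the branch since the last trace copy.
  svn merge "$trace" "$branch" || exit 1
  svn commit -m "$msg" || exit 1
  # Refresh the trace so the next merge starts from this point.
  svn delete -m "Remove old trace branch in preparation for refresh." "$trace" || exit 1
  svn copy -m "Reflect merge: $msg" "$branch" "$trace"
)
```

If someone commits to the branch between the merge and the copy, the refreshed trace points past what you merged; copying a pegged revision (BRANCH_URL@N, with N the revision you merged) avoids that.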

How do I check out a single file?

Subversion does not support checkout of a single file; it only supports checkout of directory structures.

However, you can use 'svn export' to export a single file. This will retrieve the file's contents; it just won't create a versioned working copy.
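Since Subversion 1.5, sparse directories let you get close to a single-file checkout: check out the parent directory with no children, then update just the file you want. The sketch below wraps those two commands in a hypothetical helper; unlike svn export, the result is a real working copy, so you can edit and commit the file.

```shell
# Hypothetical helper: a working copy containing just one file.
# Usage: checkout_one_file DIR_URL FILE_NAME WC_DIR
checkout_one_file() (
  dir_url=$1 file=$2 wc=$3
  # Check out the parent directory without any of its children...
  svn checkout --depth empty "$dir_url" "$wc" || exit 1
  cd "$wc" || exit 1
  # ...then bring in only the one file you want.
  svn update "$file"
)
```

Usage, with a placeholder URL: checkout_one_file http://svn.example.com/repos/trunk README wc.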

How do I detect adds, deletes, copies and renames in a working copy after they've already happened?

You don't. It's a bad idea to try.

The basic design of the working copy has two rules: (1) edit files as you please, and (2) use a Subversion client to make any tree-changes (add, delete, move, copy). If these rules are followed, the client can successfully manage the working copy. If renames or other rearrangements happen outside of Subversion, then the UI has been violated and the working copy might be broken. The client cannot guess what happened.

People sometimes run into this problem because they want to make version control "transparent". They trick users into using a working copy, then have a script run later that tries to guess what happened and run appropriate client commands. Unfortunately, this technique only goes a short distance. 'svn status' will show missing items and unversioned items, which the script can then automatically 'svn rm' or 'svn add'. But if a move or copy has happened, you're out of luck. Even if the script has a foolproof way of detecting these things, 'svn mv' and 'svn cp' can't operate after the action has already occurred.

In summary: a working copy is wholly under Subversion's control, and Subversion wasn't designed to be transparent. If you're looking for transparency, try setting up an Apache server and using the "SVNAutoversioning" feature described in appendix C of the book. This will allow users to mount the repository as a network disk, and any changes made to the volume cause automatic commits on the server.
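For reference, an autoversioning setup in httpd.conf might look like the output of the sketch below. It assumes mod_dav and mod_dav_svn are already loaded with LoadModule; autoversioning_location is a hypothetical helper that only prints the Location block, and the URL path and repository path are placeholders. A real deployment would also need authentication directives.

```shell
# Hypothetical helper: print an httpd.conf Location block that serves
# REPOS_PATH through mod_dav_svn with autoversioning switched on.
# Usage: autoversioning_location URL_PATH REPOS_PATH >> httpd.conf
autoversioning_location() {
  cat <<EOF
<Location $1>
  DAV svn
  SVNPath $2
  SVNAutoversioning on
</Location>
EOF
}
```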

How can I make svn diff show me just the names of the changed files, not their contents?

svn diff doesn't have an option to do this, but

· If you are only interested in the diffs between, say, revision 10 and the revision just before it,

svn log -vq -r10

does exactly what you want;

· otherwise, if you're using Unix, this works for any range of revisions:

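svn log -vq -r123:456 | grep -E '^ {3}[ADMR] ' | cut -c6- | sort | uniq

Since version 1.4, svn diff has a "--summarize" option that prints one line per changed item (its status letter and path) instead of the full diff, e.g. svn diff --summarize -r123:456 URL.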
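The grep/cut pipeline above can be packaged as a filter that also drops the " (from PATH:REV)" note that svn log -v appends to copied paths. This is a sketch with a hypothetical function name; it assumes the changed-path lines have the usual form of three spaces, an action letter, a space, and the path.

```shell
# Sketch: list each path changed in a range of revisions once, reading
# `svn log -vq` output on stdin. Changed-path lines look like
# "   M /trunk/foo.txt"; copies add " (from /path:REV)", stripped here.
changed_paths() {
  grep -E '^   [ADMR] ' | cut -c6- | sed 's/ (from [^)]*)$//' | sort -u
}
```

Usage: svn log -vq -r123:456 URL | changed_paths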

How can I use wildcards or globbing to move many files at once?

You want to do something like

svn mv svn://server/trunk/stuff/* svn://server/trunk/some-other-dir

but it fails with

svn: Path 'svn://server/trunk/stuff/*' does not exist in revision 123

... or some other inscrutable error message.

Subversion doesn't expand wildcards like "*" in URL arguments. (Technically speaking, Subversion does not expand wildcards in local paths either, but on most operating systems the shell expands wildcards in local paths in the command line before passing the resulting list to Subversion.)

You have to generate the list of source URLs yourself. You could do it like this (in Bash):

s=svn://server/trunk/stuff

urls=()

while IFS= read -r item; do urls+=("$s/$item"); done < <(svn ls "$s")

svn mv "${urls[@]}" svn://server/trunk/some-other-dir -m "Moved all at once"

In Subversion 1.4 and earlier, svn did not allow you to "cp" or "mv" multiple paths or URLs in one command; you had to issue multiple commands. If you happen to have a working copy that contains all the source files as well as the destination directory, then you can exploit your shell's wildcard feature to do the move, like this (for Bash):

for i in stuff/*; do svn mv "$i" some-other-dir; done

svn ci -m "moved all the stuff into some other dir"

In any case, you can always accumulate a list of the names of the source files, and then run "svn mv" on each item in that list, like this:

s=svn://server/trunk/stuff

svn ls "$s" | while IFS= read -r f; do
  svn mv "$s/$f" svn://server/trunk/some-other-dir -m "Moved just one file"
done

Note, however, that this will generate one commit per source file; that's in contrast to the above method (using a working copy) which generates just one commit total.

There is a program called "svnmucc" (previously "mucc"), whose source is distributed with Subversion, that enables you to combine multiple commands into one commit. See the Tools and Contrib page.
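With svnmucc, the wildcard move can become a single commit even without a working copy. This is a sketch: mucc_move_all is a hypothetical helper that builds one "mv SRC DST" pair per entry reported by svn ls. Each destination is the full new URL of the moved item, which is what svnmucc's mv takes, and the trailing "/" that svn ls prints for directories is removed first.

```shell
# Hypothetical helper: move every entry of SRC_URL into DST_URL in a
# single commit using svnmucc.
# Usage: mucc_move_all SRC_URL DST_URL "log message"
mucc_move_all() (
  src=$1 dst=$2 msg=$3
  set --    # reuse the positional parameters as the svnmucc action list
  while IFS= read -r item; do
    [ -n "$item" ] || continue
    item=${item%/}                # svn ls marks directories with a trailing /
    set -- "$@" mv "$src/$item" "$dst/$item"
  done <<EOF
$(svn ls "$src")
EOF
  [ "$#" -gt 0 ] || exit 1        # nothing to move (or svn ls failed)
  svnmucc -m "$msg" "$@"
)
```

Usage: mucc_move_all svn://server/trunk/stuff svn://server/trunk/some-other-dir "Moved all at once"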

Lokesh Kumar Bandi

Thursday, 2 June 2016

Subversion - General Questions
