Resgrp:comp-photo-version control

From ChemWiki

Introduction

This document will teach you the basics of using mercurial, a distributed version control system (DVCS). It is particularly directed towards using mercurial with the development version of Gaussian, so the examples will use Gaussian. For more information on mercurial you can use the built-in help system hg help, look at the official website: http://mercurial.selenic.com/wiki/Mercurial, or check out the official O'Reilly book, the text of which is freely available at: http://hgbook.red-bean.com/.

A version control system can sometimes seem an onerous imposition unless the user understands how it is going to help them in their work, so I'll try to briefly explain. Version control is about tracking changes to data. If one has a collection of files, say a distribution of Gaussian, then one is interested in changes that have been made to those files, whether from a new distribution from Gaussian or our own modifications. We are particularly interested in cases where overlapping changes have been made. These might be updates and bug fixes from Gaussian conflicting with our own modifications to a particular link, or it might be different researchers in a group working on the same piece of code independently. A version control system will not only detect such cases but provide automated and safe methods for merging conflicting changes.

Central to the idea of a VCS is the concept of a revision or changeset. This is like a snapshot of some set of files and directories at some particular instant in their history. A repository is where a VCS stores revisions. So, using Gaussian as an example, some revisions in a particular repository might be the Gaussian source tree corresponding to Gaussian development versions H01, H08, H10, H11, etc. The data that are actually kept inside the repository are the differences, or deltas, between the revisions. This saves space compared to storing all the revisions.

Short version

I've made a summary version of this document here. I've also made an example workflow. Also here is how to import a new version of the gdv.

Setting up your environment (.hgrc)

There are some one-off things you must do. Firstly, edit your login scripts (.bash_profile or similar) to include the command:

module load mercurial

(If you're not on an IC HPC machine, mercurial is available anywhere that runs Python. It is generally available in distribution package systems; if not, you can get it from http://mercurial.selenic.com/wiki/Mercurial.) Note that the mercurial package provided by Red Hat on RHEL 5 and 6 is quite old and does not, at the time of writing, support the repositories I have created.
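If you set up several machines, the login-script edit can be made repeatable. The following is a minimal sketch, not part of mercurial: ensure_line is a hypothetical helper name, and it assumes a POSIX shell with grep.

```shell
# Append a line to a file only if that exact line is not already present.
# "ensure_line" is a hypothetical helper, not a mercurial or module command.
ensure_line() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

For example, ensure_line "$HOME/.bash_profile" "module load mercurial" adds the command the first time it is run and does nothing on later runs.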

Next, create a file called .hgrc in your home directory and insert the following lines:

[ui]
username = Your Name <your@email.address>

[extensions]
graphlog=

(so for example, my .hgrc contains this line: username = Simon Clifford <simon.j.clifford@gmail.com>). Mercurial uses this username field to record who has changed what. (If you're not on an IC HPC machine your version of mercurial may not have the graphlog extension. If it doesn't, don't worry, it's merely a tool for visualising changesets).

Checking out a repository (hg clone, hg log)

Now let's get started and check out a repository. Once you have loaded the mercurial module you will be able to type:

tmp$ hg clone /home/gaussian-devel/example-gaussian-repo testrepo
tmp$

This will create a copy of the repository at /home/gaussian-devel/example-gaussian-repo and place it in a directory called testrepo inside the current directory. You can give a full path as the second argument, or you can leave it off altogether, in which case mercurial will clone the repository into a directory named example-gaussian-repo.

A note here for Imperial HPC users: copying a repository involves quite a bit of filesystem activity. I have found that on cx1 this can be quite slow on the /home filesystems. You may wish to try these examples in the /tmp filesystem, which is a lot faster. Be aware, however, that this raises two potentially serious problems. The first is that /tmp is globally readable, so before you clone you must make sure that your umask is set to 0077. The second, of course, is that /tmp is periodically wiped. If you plan on working in /tmp there are simple ways of transferring information between repositories (hg push and hg pull) that I will cover later.
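If you do work in /tmp, the umask check can be scripted so that it is not forgotten. This is a minimal sketch under stated assumptions: umask_is_private is a hypothetical helper name, and it accepts only the exact value 0077 recommended above (printed as 0077 or 077 depending on the shell).

```shell
# Succeed only if the current umask is exactly 0077.
# "umask_is_private" is a hypothetical helper name used for illustration.
umask_is_private() {
    case "$(umask)" in
        0077|077) return 0 ;;
        *) return 1 ;;
    esac
}
```

A guard such as umask_is_private || umask 0077 before the hg clone command then ensures the new clone is readable only by you.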

Now change into the testrepo directory. Inside you will see three scripts and a gdv directory; also some hidden files and a hidden directory. Inside the gdv directory you will see the Gaussian source in the new layout. The hidden .hg directory inside the testrepo directory is the repository proper; you will almost never need to do anything in here. The rest of the files and directories are called the working directory; this is where you will do your work.

The first thing you might want to do is check the history of this repository to see past revisions that have been checked into it. You do this with the hg log command.

testrepo$ hg log | head -20
changeset:   23:1de40efd1c4b
tag:         tip
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Wed Jan 18 14:43:13 2012 +0000
summary:     Added tag h13 for changeset 6c81eb8dcbab

changeset:   22:6c81eb8dcbab
tag:         h13
parent:      21:5b6ac3665a29
parent:      11:a0c273aeeb2b
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Wed Jan 18 14:43:08 2012 +0000
summary:     Gaussian devel version H13 with our makefiles

changeset:   21:5b6ac3665a29
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Wed Jan 18 14:41:40 2012 +0000
summary:     Added tag h12p for changeset 0ef14d7dff56
...
changeset:   1:515a93bfc3b5
branch:      raw
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Sun Jan 15 11:39:10 2012 +0000
summary:     Added tag h01-raw for changeset 073d6aa63ea7

changeset:   0:073d6aa63ea7
branch:      raw
tag:         h01-raw
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Sun Jan 15 11:39:09 2012 +0000
summary:     Output from old-to-new.sh on version H01

You will see that there are various fields that may appear for each changeset. The changeset field shows two numbers separated by a colon. The second, longer, hexadecimal number is the unique identifier. A particular hex identifier will always refer to the same changeset, even in different copies of the repository. The first number is a local identifier. These local IDs refer to particular changesets in one copy of the repository; if these changesets are present in another repository they may have a different local identifier. You can use either identifier as an argument to hg commands. The user and date fields show who committed the change and when. The summary field shows the first line of the log entry for that revision. This means it is important to try to make the first line of your log entries a useful summary of what you have done!
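Because only the hexadecimal identifier is stable between copies of a repository, it is the one to quote when referring to a changeset in notes or emails. The following sketch pulls those identifiers out of the default hg log output shown above; global_ids is a hypothetical helper name, and it assumes the plain "changeset: N:HEX" layout (it will not match the indented output of the graphlog extension).

```shell
# Print the global (hexadecimal) ID of every changeset found in default
# "hg log" output read from standard input. The local revision number
# before the colon is dropped. "global_ids" is a hypothetical helper.
global_ids() {
    awk '$1 == "changeset:" { split($2, parts, ":"); print parts[2] }'
}
```

It would be used as a filter, for example hg log -l 5 | global_ids.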

To see the entry for a particular revision use the -r flag with a revision number, either short or long. To see more information, including the full log entry and a complete list of all files involved in the change, use the -v flag, e.g.:

testrepo$ hg log -r 2 -v | less
changeset:   2:9f69ed4616ad
branch:      raw
tag:         h08-raw
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Sun Jan 15 11:54:50 2012 +0000
files:       gdv/archlib/arcvib.F gdv/archlib/brcpyw.F gdv/archlib/letset.F gdv/arctmp.F gdv/bsd/GauDiff_Compare.pm gdv/bsd/G
....
at.F gdv/utilnz/xpndnb.F gdv/utilnz/zerock.F gdv/utilnz/zerod.F gdv/wrappers/ggeev.F gdv/wrappers/lappar.F gdv/wrappers/xgetrf.F  gdv/wrappers/xgetrs.F gdv/wrappers/ygeru.F gdv/wrappers/ytrsv.F
description:
Import of gdv H08.  See below for details.

Result of these commands on the H01 raw rev:
rm -fr gdv
cp -pr h08/new-style-gdv .
hg add gdv/bsd/*.a
hg addremove -s 50

h08/new-style is from old-to-new.sh on freshly untarred h08.tgz

As the log entry explains, this particular revision is the change from the H01 version of Gaussian to the H08 version, so there are a lot of changed files. The description entry is intended to provide some human readable explanation of the changes. When you start committing data to the repository you should aim to at least make your log entries understandable.

As with all the commands that I will mention you can get more information by typing hg help <command>.

Committing changes (hg summary, hg status, hg diff)

Now let's make some alterations to this repository. This is quite safe; we can't break the original repository because our copy is an entirely separate clone. In fact, this is a very common working practice with mercurial. As long as the filesystem isn't slow, cloning a repository can be quite quick. Indeed, cloning a repository into a destination on the same filesystem as the source makes use of hard linking, which is very fast and saves space. Therefore, it is very common to clone a repository, make some alterations to it, and then decide whether to push them back to the original repository or just delete the new clone.

First, we want to know where it is we are starting from. To get a quick summary you use the hg summary command. This should show something like this:

testrepo$ hg summary
parent: 23:1de40efd1c4b tip
Added tag h13 for changeset 6c81eb8dcbab
branch: default
commit: (clean)
update: (current)

This shows, on the parent line, the revision that has been checked out into the working directory. Below that is the summary line from the log entry for that revision. The commit line shows if any files have been modified since the check out. The update line shows if there are any applicable newer revisions in the repository. In this example the output shows us that we checked out the revision with the local identifier 23, and unique identifier of 1de40efd1c4b, that we have modified nothing and that there are no appropriate newer revisions. This checkout occurred automatically as a part of the hg clone command. You can clone without checking anything out into the working directory (say to make a backup copy of the repository) by passing the -U flag, or you can check out a particular revision with the -u REV flag. See hg help clone for how mercurial chooses what to check out by default.

Let's say we're adding a feature to link 510. If you have something in mind go ahead and do it. For this example I have just added a few lines to the file a1tdep.F, and will assume that your current working directory is gdv/l510.

As you work you may be curious to know what it is you have changed. The command to show this is hg status:

testrepo$ hg status
M gdv/l510/a1tdep.F

The output shows a list of files that have been altered in some way. Here the capital M shows that the file a1tdep.F has been modified. By default hg status does not report on unchanged files. If we wish to see how a1tdep.F has been modified we can use the hg diff command:

testrepo$ hg diff a1tdep.F
diff -r 1de40efd1c4b gdv/l510/a1tdep.F
--- a/gdv/l510/a1tdep.F Wed Jan 18 14:43:13 2012 +0000
+++ b/gdv/l510/a1tdep.F Wed Jan 25 11:07:55 2012 +0000
@@ -26,6 +26,9 @@
 C
 C     **OPTIONS FOR TDEP CODE
 C
+c
+c     Add super new feature
+c
       iTDep=813
       iTrans=822
       jTDep=iTDep

This shows the output in unified diff format; refer to the man page for diff for more information.

Adding, copying, renaming, and deleting files (hg add, hg forget, hg mv, hg rm, hg cp)

If you have created a new file then you must let mercurial know about its existence with the hg add command:

testrepo$ echo "a new subroutine" > foo.F
testrepo$ hg status
M gdv/l510/a1tdep.F
? gdv/l510/foo.F
testrepo$ hg add foo.F
testrepo$ hg status
M gdv/l510/a1tdep.F
A gdv/l510/foo.F

Here, I create a new file called foo.F. hg status shows the file with a ?, to indicate that it is unknown to the repository. I then run hg add foo.F, after which hg status shows the file with an A. This indicates that the file is scheduled to be added at the next commit. If you change your mind about the file before the next commit you can use hg forget to undo the add.
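The single-letter codes printed by hg status can be collected into a small lookup. This is a sketch only: describe_status is a hypothetical helper, not part of mercurial, and the meanings of the letters not shown in the example above (R, C, I, and !) are taken from hg help status.

```shell
# Translate an "hg status" code letter into words.
# "describe_status" is a hypothetical helper for illustration.
describe_status() {
    case "$1" in
        M) echo "modified" ;;
        A) echo "added" ;;
        R) echo "removed" ;;
        C) echo "clean" ;;
        '!') echo "missing" ;;
        '?') echo "not tracked" ;;
        I) echo "ignored" ;;
        *) echo "unrecognised code" ;;
    esac
}
```

For example, describe_status A prints "added", matching the state of foo.F after hg add.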

If you are renaming a file, including the situation where you move the file from one directory to another (e.g. from l510 to utilam) then you may use the hg mv command. This will actually perform the move just like the normal mv command. If you've already moved the file you can give hg mv the -A flag to record the rename after the fact. Similarly, the hg rm command removes files from the working directory and records this fact in the repository, and hg cp copies a file.

It is important to realise that these commands act on the working directory immediately but only affect the repository when they are committed. Also, they do not affect the repository's previous history, so removing a file and then committing that change does not affect that file's existence in previous revisions in the repository. However, hg mv and hg cp create a relationship between the source and destination files. This becomes useful when merging in changes from somewhere else. So for example, above I have made changes to l510/a1tdep.F. Let's say that someone else has decided to make this into a utility routine and has done the command hg mv l510/a1tdep.F utilam/a1tdep.F. If I choose to merge my changes with this other person's, mercurial will correctly apply the changes and move the file. If instead of hg mv they had used hg cp then my changes would be applied to both files. This means that you should only use hg cp when it is appropriate that changes that are recorded before the copy are applied to both files. Note that this does not mean that changes to the source file are forever applied to the destination file; this will only occur when merging a revision that does not contain a copy with a revision that does. In general, you can expect mercurial to do the right thing.

Committing the changes (hg commit, hg tip)

The hg commit command creates a new revision and records the differences between the entire working directory and the parent revision (as shown on the parent line of hg summary). Like many hg commands, hg commit has an alias, in this case hg ci, which I tend to use. The changes recorded will be the same as those listed by hg status. When you run the command mercurial will open a text editor. If you wish to specify which editor is used, add an editor=vim (for example) line to the [ui] part of your ~/.hgrc file. The editor will start with a couple of empty lines and then some lines beginning with "HG:". These are there to give you helpful reminders of what you've changed while you write your log message and will be ignored by mercurial when you commit. If you close the editor without writing anything, or if the editor quits with an error, mercurial will abort the commit. So for our current example:

testrepo$ hg ci
... [opens vim]
HG: Enter commit message.  Lines beginning with 'HG:' are removed.
HG: Leave message empty to abort commit.
HG: --
HG: user: Simon Clifford <simon.j.clifford@gmail.com>
HG: branch 'default'
HG: added gdv/l510/foo.F
HG: changed gdv/l510/a1tdep.F
~
~

For short messages you can skip the editor step by passing the -m "Log message here" flag.

Understanding what you are doing here is essential to knowing what to write in the log and, importantly, when to commit. There are two schools of thought. When you commit you are creating a new revision in your local repository. Since it is your repository, and as I have previously mentioned, it is very simple to create new repositories, you should commit whenever you feel like it and feel free to write cryptic log messages that only you will understand. On the other hand, you are engaged in a collaborative exercise with other people: creating what will be a single version of Gaussian that contains yours and others' modifications and Gaussian's updates. Therefore, you should only commit when your code satisfies pre-agreed group requirements, and your log message should be detailed and comprehensible to anybody who reads it, even 50 years in the future. The correct approach, of course, is to do both.

Mercurial is a distributed version control system, which means that each repository is the responsibility of its owner, who can feel free to commit as frequently (or infrequently) as needed. Sometimes adding a feature or removing a bug might take weeks, and you might feel that there is no point taking snapshots of the code until it works. Other modifications might proceed incrementally with naturally defined stopping points between the start and the end. Committing at these points makes perfect sense: not only does it leave a history of the modification, it provides checkpoints, versions of the code that you can retreat to without having to start again. The log messages may tersely explain what has happened since the last revision (and note there is no point listing modified, added, etc., files as this information is stored in the commit anyway) or they may contain detailed essays on how a bug arose and the steps you have taken to fix it. Information in the logs is searchable and should be thought of as both an aide-mémoire for yourself and a journal entry for others.

When the time comes to merge your changes into other people's repositories, you might start thinking about the quality of your commit. Does the code compile? Does it run the test suite? Can it run any test? Have you committed a test that checks the code you have added or altered? The group may decide that revisions that are being shared must answer yes to some or all of these questions. You may decide that some or all of your revisions must pass these quality checks. Or, you may be satisfied with noting in the log entry what this particular revision can or can't do (such as compile or run).

It should be said though, that if you ever think "perhaps I should check in at this point", then go for it. Revisions live in the history of your repository forever (although see hg rollback), but when you transfer them to other people's repositories multiple revisions can be collapsed into one, or fixed in other ways.

Returning to our example I enter a simple message. Since hg log only shows the first line of any log entry by default it is a good idea to make this line a summary of the rest of the message. I close the editor and the commit is done.

We can see the results in the repository with an hg log command (the -l 2 flag shows the last two revisions):

testrepo$ hg log -l 2
changeset:   24:13ecff0136c2
tag:         tip
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Wed Jan 25 17:13:00 2012 +0000
summary:     A test message.

changeset:   23:1de40efd1c4b
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Wed Jan 18 14:43:13 2012 +0000
summary:     Added tag h13 for changeset 6c81eb8dcbab

Alternatively we can use the hg tip command to show the most recently added revision, whether added by ourselves or by pulling from another repository (see later).

Fixing mistakes (hg revert, hg rollback)

You can return one or more files, or the entire working directory, to the state they were in when you last checked them out with the hg revert command. Let's make an ill-advised alteration to our now committed change to a1tdep.F:

testrepo$ sed -i 's/super/super duper/' a1tdep.F
testrepo$ hg status
M gdv/l510/a1tdep.F
testrepo$ hg diff a1tdep.F
diff -r 13ecff0136c2 gdv/l510/a1tdep.F
--- a/gdv/l510/a1tdep.F Wed Jan 25 17:13:00 2012 +0000
+++ b/gdv/l510/a1tdep.F Wed Jan 25 17:23:42 2012 +0000
@@ -27,7 +27,7 @@
 C     **OPTIONS FOR TDEP CODE
 C
 c
-c     Add super new feature
+c     Add super duper new feature
 c
       iTDep=813
       iTrans=822
testrepo$ hg revert a1tdep.F
testrepo$ hg status
? gdv/l510/a1tdep.F.orig
testrepo$ hg diff a1tdep.F
testrepo$ rm a1tdep.*
testrepo$ hg status
! gdv/l510/a1tdep.F
testrepo$

Here I use revert to undo the modifications to a file. Notice that hg revert leaves a copy of the modified file in a1tdep.F.orig. This shows up in hg status as an unknown file ("?"). Also notice that while trying to delete the .orig file I have accidentally deleted a1tdep.F, which now shows up in hg status as a missing file ("!"). I can revert this mistake too:

testrepo$ hg revert a1tdep.F
testrepo$ hg status
testrepo$ ls a1tdep.F
a1tdep.F
testrepo$

hg revert can also be used to cancel scheduled adds, removes, copies, and renames.
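The accidental deletion above came from a wildcard that matched more than the .orig backup. A safer habit is to list exactly the backup files before removing anything. The following is a minimal sketch: list_orig_files is a hypothetical helper name, and it assumes a POSIX find.

```shell
# List every *.orig backup file under a directory (default: the current
# directory), skipping the .hg repository directory itself.
# "list_orig_files" is a hypothetical helper name.
list_orig_files() {
    find "${1:-.}" -name .hg -prune -o -type f -name '*.orig' -print
}
```

Running list_orig_files from the top of the working directory shows only files ending in .orig, so a1tdep.F itself would never be a candidate for deletion.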

If you have just committed a revision and then change your mind you may be able to undo the effect with hg rollback. Doing this for our example gives us:

testrepo$ hg rollback
repository tip rolled back to revision 23 (undo commit)
working directory now based on revision 23
testrepo$ hg status
M gdv/l510/a1tdep.F
A gdv/l510/foo.F
testrepo$

This removes the revision we just checked in from the repository but does not alter the current working directory. If we choose, we could now use hg revert to restore these files to their revision 23 state. There are two important caveats with hg rollback: it can only remove the last checked-in revision, and it is usually pointless to roll back a revision that has already been pushed or pulled (see later) into somebody else's repository, as we can only affect our own repository. For the purposes of this example, I will once more check in the changes I have made:

testrepo$ hg ci -m 'A test message'
testrepo$ hg tip
changeset:   24:13ecff0136c2
tag:         tip
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Wed Jan 25 17:38:52 2012 +0000
summary:     A test message
testrepo$

Collaboration (hg incoming, hg pull, hg outgoing, hg push, hg update, hg parent)

The tools described are already quite useful, and form the basis of the very earliest revision control systems. The benefits of being able to detect what you've changed, and how you've changed it, being able to undo the changes selectively, and being able to take a snapshot of your work at any point of your choosing should be apparent. The true power, however, of a modern version control system is how it mediates different streams of changes, particularly those from other users. Note that a DVCS does not solve problems of merging and so on, but provides you, the user, with tools to solve them.

Revisions in the repository form a directed acyclic graph (DAG). Every revision apart from the first has at least one parent revision and may have zero or more child revisions. A revision with no children is called a head. Generally development takes place at a head of the repository: checking in a new revision onto a head creates and consumes a head. However, it is possible to add a child to any revision, possibly creating a new head. This is known as a branch, and may be named or unnamed. A revision with children may still be a branch head if it has no children on its branch.

There are various ways of managing branches. You can have a few repositories with many branches, named and unnamed. Or you can have many repositories with largely unbranched graphs; it's up to you. Since mercurial uses heads as the default targets for some of its operations the latter approach is probably best for beginners as it makes operations within each repository simpler.

This is probably best illustrated with an example. I'll make a copy of the original repository, make a different change, and then merge the changes. Change directory out of the testrepo repository and type:

tmp$ hg clone /home/gaussian-devel/example-gaussian-repo testmerge
updating to branch default
13427 files updated, 0 files merged, 0 files removed, 0 files unresolved
tmp$

Now we go into the new repository and make some modifications. This time I alter the files a1tdep.F and a2tdep.F, and add a file bar.F in the l510 directory:

…[alterations to a1tdep.F, a2tdep.F, and bar.F]...
testmerge$ hg add gdv/l510/bar.F
testmerge$ hg diff
diff -r 1de40efd1c4b gdv/l510/a1tdep.F
--- a/gdv/l510/a1tdep.F Wed Jan 18 14:43:13 2012 +0000
+++ b/gdv/l510/a1tdep.F Thu Jan 26 12:15:13 2012 +0000
@@ -26,6 +26,9 @@
 C
 C     **OPTIONS FOR TDEP CODE
 C
+c
+c     Some new feature
+c
       iTDep=813
       iTrans=822
       jTDep=iTDep
diff -r 1de40efd1c4b gdv/l510/a2tdep.F
--- a/gdv/l510/a2tdep.F Wed Jan 18 14:43:13 2012 +0000
+++ b/gdv/l510/a2tdep.F Thu Jan 26 12:15:13 2012 +0000
@@ -16,6 +16,9 @@
 C
 C     **DETERMINE IF WE RUN THE TIMEDEP CODE
 C
+c
+c     Super duper feature here too
+c
       LenTst=0
       CALL FILEIO(11,jTDep,LenTst,0,0)
       IF (LenTst.EQ.0.OR.iopv.NE.23)RETURN
diff -r 1de40efd1c4b gdv/l510/bar.F
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/gdv/l510/bar.F    Thu Jan 26 12:15:13 2012 +0000
@@ -0,0 +1,1 @@
+Stuff in here
testmerge$
testmerge$ hg ci -m 'Added second new feature'
testmerge$ hg log -l 2
changeset:   24:457db23c0b1a
tag:         tip
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Thu Jan 26 12:15:54 2012 +0000
summary:     Added second new feature

changeset:   23:1de40efd1c4b
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Wed Jan 18 14:43:13 2012 +0000
summary:     Added tag h13 for changeset 6c81eb8dcbab
testmerge$

The situation now is that we have two repositories where there are revisions whose parent is revision 23 (23:1de40efd1c4b, in fact). This sort of scenario might arise because we have been working on two different features in two different repositories, or it may be that two users have been working separately. At some point you may wish to merge the work. It's up to you when and how you do this: you may wish to merge in bug fixes quite frequently, and incorporate brand-new features much less frequently. You can of course clone another repository in which to do the merge so that if it isn't satisfactory you can just delete the repository.

First we must pull the revisions from one repository to another. We use the hg incoming and pull commands to do this:

testmerge$ hg incoming ../testrepo
comparing with ../testrepo
searching for changes
changeset:   24:13ecff0136c2
tag:         tip
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Wed Jan 25 17:39:50 2012 +0000
summary:     A test message
testmerge$ hg pull ../testrepo
pulling from ../testrepo
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 2 changes to 2 files (+1 heads)
(run 'hg heads' to see heads, 'hg merge' to merge)
testmerge$ hg log -l 3
changeset:   25:13ecff0136c2
tag:         tip
parent:      23:1de40efd1c4b
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Wed Jan 25 17:39:50 2012 +0000
summary:     A test message

changeset:   24:457db23c0b1a
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Thu Jan 26 12:15:54 2012 +0000
summary:     Added second new feature

changeset:   23:1de40efd1c4b
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Wed Jan 18 14:43:13 2012 +0000
summary:     Added tag h13 for changeset 6c81eb8dcbab
testmerge$

The hg incoming command takes another repository as its argument and shows a list of revisions that are not present in the current repository. The hg pull command brings those revisions over into the current repository. The hg log command shows changeset 25 in the testmerge repository corresponds to number 24 in the original testrepo repository. Note the long ID is the same in both cases. The hg pull command reports that it has created a new head. We can see this clearly with the hg glog command (from the graphlog extension). The -r 23: flag tells glog to show revisions 23 and greater, for clarity:

testmerge$ hg glog -r 23:
o  changeset:   25:13ecff0136c2
|  tag:         tip
|  parent:      23:1de40efd1c4b
|  user:        Simon Clifford <simon.j.clifford@gmail.com>
|  date:        Wed Jan 25 17:39:50 2012 +0000
|  summary:     A test message
|
| @  changeset:   24:457db23c0b1a
|/   user:        Simon Clifford <simon.j.clifford@gmail.com>
|    date:        Thu Jan 26 12:15:54 2012 +0000
|    summary:     Added second new feature
|
o  changeset:   23:1de40efd1c4b
|  user:        Simon Clifford <simon.j.clifford@gmail.com>
|  date:        Wed Jan 18 14:43:13 2012 +0000
|  summary:     Added tag h13 for changeset 6c81eb8dcbab
|
testmerge$

The hg push command can also be used to copy revisions from one repository to another. It works much as you would expect, except that by default it does not permit the creation of new heads in the destination repository. The idea is that you tend to pull into your own repository, where you're expected to know what you're doing, while you might be pushing into someone else's repository, where creating a new head might cause confusion. hg outgoing is the analogue of hg incoming for pushes.

In our example we now have a situation where we have two heads. This can arise from revisions being created in different repositories and then pulled together, as shown. Alternatively you can just create new heads in one repository. A disadvantage of the single repository technique is that if you decide a branch is going nowhere and elect not to merge or proceed with it, it still remains in your repository. With multiple repositories you can just delete the offending repository.
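The definition of a head (a revision that is nobody's parent) can be checked on a toy description of the graph. This is an illustrative sketch only, not how mercurial stores or computes heads: list_heads is a hypothetical helper, and the "REV PARENT [PARENT2]" input format is invented for this example.

```shell
# Read lines of the form "REV PARENT [PARENT2]" from standard input and
# print every REV that is never named as a parent, i.e. every head.
# "list_heads" and its input format are hypothetical.
list_heads() {
    awk '
        { revs[$1] = 1; for (i = 2; i <= NF; i++) is_parent[$i] = 1 }
        END { for (r in revs) if (!(r in is_parent)) print r }
    ' | sort -n
}
```

Fed the three lines "23 22", "24 23", and "25 23", which describe the graph above, it prints 24 and 25. Adding a fourth line "26 25 24" for a revision with both of those as parents leaves 26 as the only head.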

Merging two sets of changes (hg merge, hg resolve)

Let's say that I decide that these two features will work together and I want to merge the two branches. I can use the hg up (hg update) command to update the current working directory to a particular revision in the repository. If no revision is specified it will update to the current branch head. The revision updated to becomes the parent (as shown by hg summary or hg parent) of the working directory. If there are modified files in the working directory the update may attempt a merge. The revision of the working directory when you start to merge is called the base of the merge. This is important if the two revisions you are merging are on different branches, because the newly merged revision will stay on the base branch. Note that the hg pull command does not update the working directory, so in this case it will still be at revision 24. Let's update to revision 25 and then merge the changes from revision 24:

[Which revision to merge from: in general you merge changes into whatever is your current baseline. So, if you're merging the latest updates from Gaussian (from H10 to H11, for example) into your code, you update to your code and merge in the H11 revision. If you're merging your changes into the group development version that is soon to be sent to Gaussian, you'd start with that version and merge in the revision(s) containing your changes.]

testmerge$ hg up 25
3 files updated, 0 files merged, 1 files removed, 0 files unresolved
testmerge$ hg merge 24
merging gdv/l510/a1tdep.F
warning: conflicts during merge.
merging gdv/l510/a1tdep.F failed!
2 files updated, 0 files merged, 0 files removed, 1 files unresolved
use 'hg resolve' to retry unresolved file merges or 'hg update -C .' to abandon
testmerge$

The software automatically brings in the changes to foo.F, bar.F, and a2tdep.F (as hg status will show). However, in our contrived example a1tdep.F is subject to changes from both revisions. Mercurial will attempt to merge automatically, and in this case it fails. If you edit a1tdep.F you will see it contains the lines:

<<<<<<< local
c     Add super new feature
=======
c     Some new feature
>>>>>>> other

which were inserted during the failed merge. There is also an a1tdep.F.orig file created as a result of the failed merge. As the output from hg merge says you can either now resolve this failed merge or use hg update -C to undo it (the .orig file will remain).

Since we want to resolve the merge we should edit a1tdep.F so that it works. At a bare minimum we will have to remove the “<<<<<<< local”, “=======”, and “>>>>>>> other” marker lines. In a more complex situation we might have to completely rewrite this and other files. This is what I mean when I say that mercurial only provides tools to do merging. Even a completely successful merge, from mercurial's point of view, may be completely bogus code.
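Before marking a file as resolved it is worth confirming that no marker lines were left behind. The following is a minimal sketch, assuming the marker layout shown above: has_conflict_markers is a hypothetical helper, not a mercurial command.

```shell
# Succeed if the file still contains merge conflict marker lines:
# a line starting "<<<<<<< ", a line that is exactly "=======",
# or a line starting ">>>>>>> ".
# "has_conflict_markers" is a hypothetical helper name.
has_conflict_markers() {
    grep -qE '^(<<<<<<< |=======$|>>>>>>> )' "$1"
}
```

For example, has_conflict_markers a1tdep.F && echo "still unresolved" warns if the file still needs editing. A clean result only shows that the markers are gone, not that the merged code is correct.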

There are tools available to make this task easier. For example I use vimdiff, a part of the popular vim package. To enable this I have altered my ~/.hgrc file to look like this:

[ui]
merge = vimdiff
username = Simon Clifford <simon.j.clifford@gmail.com>

[extensions]
graphlog=

[merge-tools]
vimdiff.executable = vim
vimdiff.args = -d $base $local $output $other +close +close

as per http://mercurial.selenic.com/wiki/MergingWithVim. The page http://mercurial.selenic.com/wiki/MergeProgram contains instructions for other tools, including Emacs and graphical tools. If you edit your ~/.hgrc file in this way, you can then type hg resolve --all to run your chosen tool on all unresolved files. Note that, by default, if your chosen tool exits without an error then hg resolve will regard that file as resolved. To make vim exit with an error, use :cq. See hg help resolve for more details.

In the example my resolution is to have the relevant lines in a1tdep.F look like this:

c
c     Add super new feature
c     Some new feature
c

Once you have resolved all the files that mercurial thinks need fixing, you should check that the final merged result makes sense, for example by checking that it compiles, or by running tests appropriate to both of the original revisions. When you are satisfied, you should check in the merged revision:

testmerge$ hg ci -m 'Merged super and new features'

testmerge$ hg tip

changeset:   26:857f81e59d66
tag:         tip
parent:      25:13ecff0136c2
parent:      24:457db23c0b1a
user:        Simon Clifford <simon.j.clifford@gmail.com>
date:        Thu Jan 26 13:10:52 2012 +0000
summary:     Merged super and new features


testmerge$

You can see that the new revision has two parent revisions; a merge always consumes a head. This is clear from the output of:

testmerge$ hg glog -l 4
o    changeset:   26:857f81e59d66
|\   tag:         tip
| |  parent:      25:13ecff0136c2
| |  parent:      24:457db23c0b1a
| |  user:        Simon Clifford <simon.j.clifford@gmail.com>
| |  date:        Thu Jan 26 13:10:52 2012 +0000
| |  summary:     Merged super and new features
| |
| o  changeset:   25:13ecff0136c2
| |  parent:      23:1de40efd1c4b
| |  user:        Simon Clifford <simon.j.clifford@gmail.com>
| |  date:        Wed Jan 25 17:39:50 2012 +0000
| |  summary:     A test message
| |
o |  changeset:   24:457db23c0b1a
|/   user:        Simon Clifford <simon.j.clifford@gmail.com>
|    date:        Thu Jan 26 12:15:54 2012 +0000
|    summary:     Added second new feature
|
o  changeset:   23:1de40efd1c4b
|  user:        Simon Clifford <simon.j.clifford@gmail.com>
|  date:        Wed Jan 18 14:43:13 2012 +0000
|  summary:     Added tag h13 for changeset 6c81eb8dcbab
|
testmerge$


Tags and named branches (hg tag, hg tags, hg branch, hg branches)

It is often useful to give a name to a particular revision. This might be a changeset that corresponds to a given version of the code, or it might simply be a revision that marks a particular milestone. In the example-gaussian-repo you have probably noticed there are revisions in the history with descriptions like "Added tag h13 for changeset 6c81eb8dcbab". These are revisions corresponding to particular versions of the Gaussian development version, such as H01, H10, H13, etc. You can see a list of the tags in a repository by doing:

testmerge$ hg tags
tip                               26:857f81e59d66
h13                               22:6c81eb8dcbab
h12p                              20:0ef14d7dff56
h11                               18:7daba638a830
h10                               16:e1d0af4e84d9
h08                               14:656073c38db9
h01                               12:737d1720e79a
h13-raw                           10:b039f5c274e2
h12p-raw                           8:522a8fe79d22
h11-raw                            6:a28b789a9555
h10-raw                            4:356081dab79d
h08-raw                            2:9f69ed4616ad
h01-raw                            0:073d6aa63ea7
testmerge$

(The tip tag is automatically assigned to the most recently checked-in revision.) You may use a tag name anywhere you might use a revision number, so, for example, to check out the changeset corresponding to the official Gaussian development release H10, you would type:

testrepo$ hg up h10
13887 files updated, 0 files merged, 0 files removed, 0 files unresolved
testrepo$

(this will look different if you have unchecked-in modifications in your working directory). You could also pass h10 to the hg clone -u flag, and so on. You tag a revision with the hg tag command. Issue it in a working directory and it will tag the revision that is the parent of the working directory. Tags become part of the repository, that is they are shared during hg pull and hg push, so choose wisely. If you'd like to have tags that are only part of the local repository, use the -l flag to hg tag.

Tags are stored in an .hgtags file in the root directory of the repository. Running hg tag alters this file and then commits the change to the repository. Sometimes you can see conflicts in this file during merges. I normally just resolve them by keeping as much information in the merged file as possible.

The hg branches command can be used to show the named branches in the repository:

testrepo$ hg branches
default                       24:13ecff0136c2
raw                           11:a0c273aeeb2b (inactive)
testrepo$

Here I have two named branches, raw and default (the default branch is, er, the default). If you examine the full output of hg glog, which is quite long and I won't reproduce it here, you'll see, starting at the bottom, that revisions 0 to 11 are all in the raw branch. I have created and named this branch to contain the official Gaussian development code as processed by the old-to-new.sh script, without any of our makefiles or other alterations. The default branch branches off from revision 1 and starts at revision 12. Its descendants, revisions 12 to 23, are also in the default branch. These were created by starting at revision 1, adding our makefiles and modifications to the build system, renaming this branch to the default branch, and then committing revision 12. I tagged revision 12 as "h01", which created revision 13. I then merged revision 3, fixed the conflicts, and committed revision 14, and so on. A more detailed guide on how to import a new official version of gdv exists in LINKY.

You can use branch names just like tags wherever mercurial expects a revision identifier. If you use a branch name, mercurial will attempt to give you the most appropriate revision, which will usually be the newest head on that branch.

To create a new branch you type hg branch <branchname>. This takes effect when you next commit the working directory. As I've previously mentioned, a simpler alternative to having branches, named or otherwise, in your repository is to have multiple repositories.

Ignoring files

You may be wondering how the system will work when you start compiling files. Won't all the .o, .a and .exe files start showing up in the output to hg status? The answer is that they will not, because I've told mercurial to ignore such files. This information is stored in the .hgignore file in the root of the repository. It's quite long but it starts like this:

syntax: glob
gdv.make
*.o
*.lo
*.a
*.exe
*.exel
.*.swp
*~
*.log
*.log.gz
*.chk
gdv/arc
...

Any file that matches one of the shell patterns or file names in .hgignore is ignored by mercurial. This means it does not show up in the output of hg status and is not picked up by hg commit. hg add, hg rm, and so on will also skip it unless it is explicitly named on the command line. So hg add gdv will add all non-ignored files in the gdv directory, while hg add gdv/* will add every file directly inside the gdv directory, ignored or not, because the shell expands gdv/* into an explicit list of file names before mercurial sees it.

The .hgignore file is tracked, so please only commit changes to it if you think they will be useful to everyone.

Other commands

There are quite a few other commands available in mercurial. I will cover just a few here; if you're interested in becoming a power user, refer to the documentation I have mentioned.

hg addremove

You can use this command when you have large numbers of files to add and remove. It also has the -s flag, which takes a similarity percentage (from 0 to 100) and attempts to detect when files have been moved or renamed. Using this flag does require some subtlety, as it's not helpful to mark files as renamed unless they really have been. Also note that it is possible to inadvertently include ignored files in a mass hg add or hg addremove; check what's being added before you commit.

hg grep and hg locate

Similar to the standard UNIX grep and locate commands, but cognisant of the mercurial repository.

hg init

Creates a new mercurial repository in the current directory.

hg serve

This starts a mini web server serving information about the repository. I would strongly recommend you do not use this while working on the Gaussian development version as it would be all too easy to allow any user on the system to view all of the files in your repository. I only mention it here because some of the other documentation may refer to it.