Data Management
This is a "crowd-sourced" page for tips and best practice in (thesis) data management techniques. See here for a typical disaster no-one would ever wish to experience. Please feel free to contribute by logging in and adding to the items below.
Hard drives
There are two kinds of hard drive: hard drives that have died and hard drives that are about to die.
All hard drives will fail, and when one does it is highly likely that you will lose all the data on it. It is sometimes possible to recover some data from a failed drive, but doing so is very expensive (thousands of dollars). Consequently, backups are essential. If your data is on a single drive, it might as well not exist.
Laptop Backups
Laptops are the most ubiquitous hardware for assembling a thesis or research article. Hard drives contain moving parts, which wear out and fail. Solid state drives have no moving parts, but it is unclear whether they are more reliable than hard drives (see here for a detailed discussion). Unfortunately, a laptop HD is a smaller version of the desktop form factor and, put simply, it wears out faster (it can reach 70-80 °C inside a laptop casing). So if you are reading this now, ask yourself when you last made a full backup of your laptop's contents, or when the last incremental backup was made. If your data hasn't been backed up, it effectively doesn't exist. Fortunately, there are various ways of making a backup:
- The easiest option is to use an external drive and software that makes automatic backups, for example every hour. On a Mac, this is easily done using Time Machine. If your computer dies, you can get a new machine, plug in your Time Machine drive and restore everything to its previous state within a few hours. However, your efforts will be in vain if you carry the external drive around in your backpack together with your laptop and leave the whole lot in the pub.
- For Macs, an excellent idea is to use a drive cloning tool, such as SuperDuper!, in addition to Time Machine. SuperDuper! will create an exact copy of your hard drive. If your hard drive fails, you can plug in the clone, boot your computer from it and be up and running within minutes of a failure (handy if you have a deadline).
- As a last resort, you can burn your crucial files to a DVD or CD. The downside to this method is that it's slow and you have to remember to do it.
- Mac users (Lion) will shortly have the option of syncing up to 5 GB of data to iCloud (plus $20 per year for each additional 10 GB). There are plenty of cloud options for Windows users. Dropbox is a convenient cloud service that works on PC, Mac, Linux, iPad, iPhone, Android and BlackBerry. It keeps one month of history, so changes can be undone and deleted files restored. Most importantly, it is free.
- Or you could even use your H: drive, which is backed up nightly and available worldwide. Many groups also buy additional central storage space. Be aware, however, that without a fast network connection your H: drive will be neither readily available nor (most importantly) operationally unobtrusive.
- ICT offer a laptop backup service (at a cost of £15 per year). It is intended to back up your data, not the entire machine.
- Run a laptop health check periodically. At the very least, check your hard drive using a SMART utility (this is the one I use for a Mac laptop). This may give you enough warning to get all your data off before the drive fails (or, of course, it may not).
- Keep the HD (and laptop) cool by running the fans faster than normal. I use smcFanControl, which runs the fans fast when you are using a power adaptor and slower (but still faster than the default) on battery. I have done this for five years on one laptop and thus far it has worked, in the sense that the original HD is still running and showing no SMART errors.
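For anyone without a tool like Time Machine, the idea of an incremental backup to an external drive can be illustrated with a short script. This is only a minimal sketch, assuming Python 3 and an external drive that is already mounted. The function name `incremental_backup` and any paths are hypothetical, and it is no substitute for proper backup software. It copies files that are new or have changed since the last run and never deletes anything from the backup.

```python
import shutil
from pathlib import Path
from typing import List


def incremental_backup(source: Path, destination: Path) -> List[Path]:
    """Copy files that are new or modified since the last run.

    Returns the relative paths that were copied. Files deleted from
    `source` are deliberately left untouched in `destination`.
    """
    copied = []
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        if dst_file.exists():
            src_stat, dst_stat = src_file.stat(), dst_file.stat()
            # Skip the file when the backup has the same size and is
            # at least as recent as the original.
            unchanged = (src_stat.st_size == dst_stat.st_size
                         and src_stat.st_mtime <= dst_stat.st_mtime)
            if unchanged:
                continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also preserves timestamps
        copied.append(relative)
    return copied
```

Calling something like `incremental_backup(Path.home() / "thesis", Path("/Volumes/Backup/thesis"))` from an hourly scheduled job would approximate the hourly backups described above; both paths are placeholders.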
Laptop Encryption
If the documents on your laptop are even remotely likely to be valuable (in terms of IPR etc.), then you should take measures to ensure that the data is protected if the laptop is lost. This is done by encrypting the device's hard drive, using BitLocker (Windows 7) or FileVault (MacOS). These tools generate unlocking keys, which need to be stored in a recoverable manner. Systems for depositing these keys safely are being developed. Watch this space.
USB drive encryption
Bibliographic Managers
- Mendeley for managing that (possibly vast) collection of reprints acquired over years.
Citation Managers
- EndNote.
Handling large documents in Word
Common wisdom holds that a document created using Word's default procedures should be no longer than about 30 pages. A thesis therefore requires special treatment.
Data archives for deposition with theses
Currently, submitting a thesis requires only the Word version of the thesis. Any data which may have future use has to be handled separately. There are many possible solutions:
- Digital data repositories. These are currently used by computational researchers using the Gaussian program. Each calculation is in effect assigned a DOI, and quoting the DOI in the thesis is all that is required to make that data available.
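Quoting a DOI is sufficient because any DOI can be turned into a web link by appending it to the `https://doi.org/` resolver. The following is a minimal sketch, assuming Python 3. The helper name `doi_to_url` is hypothetical, and the DOI in the docstring is a made-up placeholder, not a real dataset.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Return the resolver link for a DOI such as '10.1234/example'."""
    doi = doi.strip()
    # Accept the common 'doi:' prefix as well as a bare DOI.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the '/' separator.
    return "https://doi.org/" + quote(doi, safe="/")
```

A reader of the thesis could then paste the resulting link into a browser to reach the deposited data.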