Pg:data
Data Management
This is a "crowd-sourced" page for tips and best practice in (thesis) data management techniques. See here for a typical disaster no-one would ever wish to experience. Please feel free to contribute by logging in and adding to the items below.
Laptop Backups
Laptops are the most ubiquitous hardware for assembling a thesis or research article. Most (the MacBook Air excepted) contain moving parts which wear out, most obviously the hard drive. Unfortunately, a laptop HD is a scaled-down version of the desktop form factor and, put simply, it wears out faster (it can reach 70-80 °C inside a laptop casing). So if you are reading this now, ask yourself when you last made a full backup of your laptop's contents, or when the last incremental backup was made. There are various ways of achieving a backup.
- Burn a DVD. If you are writing a thesis, do this daily! However, this cannot back up the entire laptop, e.g. for restoration to a new unit.
- Attach an external hard drive (via USB, FireWire, or even Thunderbolt cable) and run backup software to back up hourly. On Macs, for example, this could be via the system Time Machine option. This allows the entire laptop to be backed up, operating system and all, so that a full restore can get you back in action (on a new laptop if necessary) quite quickly. However, your efforts will be in vain if you carry the external drive around in your backpack together with your laptop and leave the whole lot in the pub.
- Mac users (Lion) will shortly have the option of syncing up to 5 Gbyte of data onto iCloud (+ $20 pa for each additional 10 Gbyte). There are plenty of cloud options for Windows users. Dropbox is a convenient cloud service that works on PC, Mac, Linux, iPad, iPhone, Android and BlackBerry. It keeps one month of history, so that changes can be undone and deleted files restored. Most importantly, it is free.
- Or you could even use your H: drive, which is backed up nightly and available worldwide. Many groups also buy additional central storage space. Be aware, however, that if you do not have access to a fast network connection, your H: drive will be neither available nor (most importantly) operationally unobtrusive.
- ICT offer a laptop backup service (at a cost of £15 pa). This is intended to back up data, not the entire machine.
- Run a laptop health check periodically. At the very least, check your hard drive using a SMART utility (this is the one I use for a Mac laptop). This may give you enough warning to get all your data off before the drive fails (or, of course, it may not).
- Keep the HD (and laptop) cool by running the fans faster than normal. I use smcFanControl, which runs the fans fast when you are using a power adaptor and slower (but still faster than the default) on battery. I have done this for five years on one laptop, and thus far it has worked in the sense that the original HD is still running and showing no SMART errors.
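For readers who want to script their own incremental backup to an external drive or network share, the idea can be sketched in a few lines of Python. This is only a minimal illustration, not a replacement for Time Machine or the ICT service: the function names `file_digest` and `incremental_backup` are made up for this example, it copies only files that are missing or changed at the destination, it never deletes anything there, and it assumes the destination folder lies outside the source folder.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def incremental_backup(source: Path, dest: Path) -> list[Path]:
    """Copy files from source to dest if they are missing or changed there.

    Returns the relative paths of the files that were copied.
    Assumes dest is NOT inside source.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = dest / rel
        if dst.exists() and file_digest(dst) == file_digest(src):
            continue  # identical content already backed up
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(rel)
    return copied
```

A call might look like `incremental_backup(Path.home() / "thesis", Path("/Volumes/Backup/thesis"))`, where both paths are placeholders for your own folders. Comparing content hashes is slower than comparing modification times, but it does not depend on the clocks or timestamps of the two drives agreeing.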
Bibliographic Managers
- Mendeley for managing that (possibly vast) collection of reprints acquired over years.
Citation Managers
- EndNote.
Handling large documents in Word
Common wisdom has it that the largest document you should create in Word using default procedures is about 30 pages. A thesis therefore requires special treatment.
Data archives for deposition with theses
Currently, submitting a thesis requires only the Word version of the thesis. Any data which may have future use has to be handled separately. There are many possible solutions:
- Digital data repositories. These are currently used by computational researchers using the Gaussian program. Each calculation is in effect assigned a DOI, and quoting the DOI in the thesis is all that is required to make that data available.
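A DOI quoted in a thesis becomes a clickable link once it is prefixed with the public resolver address `https://doi.org/`. The helper below is a small sketch of that step; the function name `doi_to_url` and the DOI used in the usage note are hypothetical and chosen only for illustration, and the validity check is deliberately crude.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Common ways a DOI is written in running text; stripped before resolving.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Turn a DOI as quoted in a document into a resolver URL."""
    doi = doi.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Crude sanity check: DOIs start with "10." and contain a "/".
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")
```

For example, `doi_to_url("doi:10.1234/example.5678")` gives `https://doi.org/10.1234/example.5678`; the DOI here is a made-up placeholder, not a real deposition.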