Rdm:data

From ChemWiki
Revision as of 09:38, 9 February 2017 by Rzepa (The Web-based Mpublish procedure for NMR Spectra)


Research data management system

This is a lightweight digital repository for data, based on the concept of collections of filesets. Each collection and each fileset is assigned a DOI by the DataCite organisation, and these DOIs can be quoted in articles.

There are two rather different approaches to such RDM:

  1. To use a simple web based interface: https://data.hpc.imperial.ac.uk/
  2. To use command-line scripts based on Python, which can themselves be used to create alternative graphical user interfaces if desired.

Preparing the data

This section will contain notes about the kinds of data that can be deposited/published. If there is one rule of thumb, it is to keep each individual data file small and to consider how YOU might find access to that file useful in the future. This means you should NOT convert the data to e.g. a PDF file, whose main purpose is printability and human readability rather than reuse of the data. Examples of chemical data are welcome here.
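The rule of thumb above can be turned into a quick pre-deposit check. This is a minimal sketch, not part of the repository's tools: the 10 MB limit and the list of print-oriented suffixes are hypothetical choices made for illustration, since the page only asks that files be kept small and not converted to PDF.

```python
from pathlib import Path

# Hypothetical limit: the guidance only says "keep each data file small";
# 10 MB is an illustrative value, not a repository rule.
MAX_BYTES = 10 * 1024 * 1024

# Formats aimed at printing and human reading rather than data reuse
# (illustrative list; extend as needed).
PRINT_ONLY_SUFFIXES = {".pdf"}


def check_files(paths, max_bytes=MAX_BYTES):
    """Return (path, reason) warnings for files that may not suit deposition."""
    warnings = []
    for p in map(Path, paths):
        if p.suffix.lower() in PRINT_ONLY_SUFFIXES:
            warnings.append((str(p), "print-oriented format; deposit the original data instead"))
        size = p.stat().st_size
        if size > max_bytes:
            warnings.append((str(p), f"{size} bytes exceeds {max_bytes}"))
    return warnings
```

Calling `check_files(["spectrum.jdx", "summary.pdf"])` before uploading would flag the PDF; an empty list means nothing was flagged.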

  1. The deposition collection DOI:10.14469/hpc/200 (shortform DOI:bbnw) contains one dataset (DOI:10.14469/hpc/202) and two further associated DOIs, one to a published article and one to a calculation held on another repository.
    1. The dataset DOI:10.14469/hpc/202 (shortform DOI:bbnx) contains three files: a ChemDraw file, which was used by the system to automatically generate InChI identifiers for the chemical system shown in the entry as metadata, and two data files containing the input and output of the KINISOT program, which is itself referenced with a DOI.
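The collection/fileset hierarchy in this example can be modelled as a small data structure. This is a sketch only: the class and field names are hypothetical and do not mirror the repository's internal schema, and the file entries are descriptive labels because the actual file names are not given here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Fileset:
    doi: str
    # Descriptive labels for the files; contents are not modelled.
    files: list[str] = field(default_factory=list)


@dataclass
class Collection:
    doi: str
    filesets: list[Fileset] = field(default_factory=list)
    # DOIs of related items held elsewhere (articles, external calculations).
    associated_dois: list[str] = field(default_factory=list)

    def citable_dois(self) -> list[str]:
        """The collection's own DOI followed by the DOI of each fileset in it."""
        return [self.doi] + [fs.doi for fs in self.filesets]


example = Collection(
    doi="10.14469/hpc/200",
    filesets=[Fileset(doi="10.14469/hpc/202",
                      files=["ChemDraw file", "KINISOT input", "KINISOT output"])],
    # The two associated DOIs (an article and an external calculation)
    # are not reproduced on this page, so they are left out here.
)
```

`example.citable_dois()` lists the two repository DOIs from the example, collection first.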

Please add your own examples here.

The Web-based deposition tool

This requires a standard web browser and hence can be used on desktop, laptop and tablet devices.

  1. If you are a member of staff, you will already have an ORCID assigned to you; have its password handy. If you do not have an ORCID, go to http://orcid.org and sign up for one.
  2. Log in to https://data.hpc.imperial.ac.uk/ using your College account.
  3. Next comes a one-off operation to associate your ORCID with the data repository. Follow the prompts and, at the ORCID site, allow https://data.hpc.imperial.ac.uk/ to authenticate using these credentials.
  4. If you are a research group director, you may wish to create one or more project collections.
    • Go to Add collection and add a title and brief description. These two properties will be used as metadata and eventually sent to DataCite for use in its data search interfaces. If you have an existing collection (perhaps a superset of the group's work), add the new collection as a member of it. Click Submit to register the collection.
  5. Now go to Browse in the left-hand taskbar and click on the DOI of the newly created collection.
    • This provides you with an access code that can be used by others to access the collection. The collection is embargoed (i.e. private) except to those to whom you send the access code.
    • You can also choose to invite other group members who have already completed the ORCID registration steps described above. To do this, click Edit and add whoever you wish to the Collaborators list.
    • If you wish, you can now also add associated DOIs, such as previous publications arising from the project; more can be added at a later stage.
    • Back on the Browse page you now have a summary of your entries. There should now be one collection, which you (and others) can start to populate with datasets.
  6. If you are a research student, your group may already have collections created by another member (e.g. your supervisor). You can either follow the procedure above to create a collection, or start adding data to an existing collection.
    • Select Deposit data in the left-hand taskbar and add a title and description. Using Choose files, select the files from your local hard drive or cloud storage (Box). You can select multiple files for upload, but all will inherit the same title and description. In the Member of dropdown, select either a collection you have previously created or one that your supervisor has invited you into.
      • If one of the files in the uploaded set contains chemical connectivity information (e.g. a ChemDraw .cdx file), it will be used to generate metadata associated with the fileset. For this reason, you should ideally create a separate fileset for each distinctly different molecule.
  7. Back in Browse you should see the recently uploaded fileset, with the DOI assigned to it.
    • Click on the DOI of any fileset and you should see the files listed and their descriptions.
      • If you click on edit you can add an associated DOI to this fileset (for example an article about to be published on the topic). This can be added at a later stage if it is not yet known.
  8. To summarise, you now have a collection with its own DOI, and within that collection one or more filesets, each also having its own DOI. All of this is embargoed until the original creator of the collection releases the embargo. At that point, all the collected metadata is released and sent to DataCite, and shortly thereafter the entries become accessible using the standard http://doi.org/DOI resolution.
  9. Even before the embargo is released, you can cite both the collection DOI and the individual dataset DOIs in any articles that make use of the data.
  10. A recent example illustrating some of these aspects, with the embargo released, is DOI:10.14469/hpc/200.
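ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 scheme that ORCID documents, so a mistyped iD can be caught before it is quoted in metadata or used in an ORCID-based search query. The helper below is a stand-alone sketch, not part of the repository's tools; the function names are hypothetical.

```python
import re

# Four blocks of four characters; only the final check character may be X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if `orcid` has the 0000-0000-0000-000X form and a correct check character."""
    orcid = orcid.strip().upper()
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]
```

For instance, `is_valid_orcid("0000-0002-8635-8390")`, the iD used in the search query examples, returns `True`, while changing its last digit makes it return `False`.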

The Web-based Mpublish procedure for NMR Spectra

The instructions in this section are available as a printable document.

The command line deposition tool

The command-line Python script should be downloaded and run on the user's own computer.

  • First, create a collection. This could be used for depositing data associated with a specific project or sub-project:

The simplest form is:

    publish.py --make-collection --title "Collection title" --description "Collection description"

For long descriptions contained in a pre-prepared file, use:

    publish.py --make-collection --title "Collection title" --description=@filename

The script will ask for your College username and password and then return a DOI for the collection, unless you also pass these in, as in:

    publish.py --make-collection --title "Collection title" --description "Collection description" --username --password
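If you want to drive the script from your own Python code (for example to build a graphical front end), a small wrapper can assemble the argument list. This is a minimal sketch using only the flags shown above; the function names are hypothetical, and running it assumes publish.py sits in the current directory and can be run with the current Python interpreter. Credentials are left to the script's own prompt.

```python
import shlex
import subprocess
import sys


def make_collection_args(title, description=None, description_file=None):
    """Argument list for `publish.py --make-collection`.

    Give either an inline `description` or the path of a pre-prepared
    `description_file`, which is passed as --description=@filename.
    """
    if (description is None) == (description_file is None):
        raise ValueError("give exactly one of description or description_file")
    args = ["publish.py", "--make-collection", "--title", title]
    if description_file is not None:
        args.append(f"--description=@{description_file}")
    else:
        args += ["--description", description]
    return args


def run_publish(args):
    """Run publish.py (assumed to be in the current directory) with this interpreter.

    The script itself prompts for the College username and password and returns the DOI.
    """
    return subprocess.run([sys.executable] + args, check=True)


if __name__ == "__main__":
    # Show the command that would be run, quoted for a POSIX shell.
    print(shlex.join(make_collection_args("Collection title",
                                          description="Collection description")))
```

Building a list rather than a single command string avoids shell-quoting problems with titles and descriptions that contain spaces.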

  • To create a fileset associated with a collection:

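    publish.py --title "dataset title" --description "dataset description" --collection <collection DOI> --file filename1 "file 1 description" --file filename2 "file 2 description"

As with collections, the description can instead be read from a file using @filename, and --collection accepts either the collection DOI or the title of an existing collection in quotes ("existing collection title"). Repeat the --file flag as many times as desired.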
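The repeated --file flag maps naturally onto a list of (filename, description) pairs. The sketch below builds the argument list for a fileset from such pairs; the function name and the at-least-one-file check are assumptions made for illustration, not documented behaviour of publish.py.

```python
def make_fileset_args(title, description, collection, files):
    """Argument list for depositing a fileset with publish.py.

    `collection` is the collection DOI or the title of an existing collection.
    `files` is a sequence of (filename, file_description) pairs; each pair
    becomes one --file flag followed by its two values.
    """
    # Assumed guard: a fileset with no files is treated as a mistake here.
    if not files:
        raise ValueError("a fileset needs at least one file")
    args = ["publish.py", "--title", title, "--description", description,
            "--collection", collection]
    for filename, file_description in files:
        args += ["--file", filename, file_description]
    return args
```

Passing two pairs reproduces the two --file flags of the command above; adding more pairs simply repeats the flag.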

shortDOIs

The DOI returned by the scripts above can be shortened using http://shortdoi.org/
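Both full DOIs and shortDOIs resolve through doi.org; shortDOIs are written with a 10/ prefix, so the shortform bbnw quoted earlier would be resolved as 10/bbnw. The sketch below builds resolver URLs and a shortdoi.org request URL. That shortdoi.org answers `?format=json` with a JSON object containing a `ShortDOI` field is an assumption about the service, so the network helper is illustrative only; the function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def resolver_url(doi: str) -> str:
    """doi.org URL for a full DOI (10.14469/hpc/200) or a shortDOI (10/bbnw)."""
    return "https://doi.org/" + doi


def shortdoi_request_url(doi: str) -> str:
    """shortdoi.org URL asking for the short form of `doi` (format=json is assumed)."""
    return "http://shortdoi.org/" + quote(doi, safe="/") + "?format=json"


def fetch_shortdoi(doi: str) -> str:
    """Network call; the "ShortDOI" response field name is an assumption."""
    with urlopen(shortdoi_request_url(doi), timeout=10) as response:
        return json.load(response)["ShortDOI"]
```

`resolver_url("10.14469/hpc/200")` gives the same kind of http://doi.org/DOI link described for released entries, using https.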

Search Queries

Examples of search queries that make use of the metadata collected during deposition are shown here.

  1. http://search.datacite.org/ui?q=ORCID:0000-0002-8635-8390+publicationYear:[2015+TO+2016]
  2. http://search.datacite.org/ui?q=has_media:true&fq=prefix:10.14469
  3. http://search.datacite.org/ui?q=ORCID:*+prefix:10.14469
  4. http://search.datacite.org/ui?q=InChIKey=LQPOSWKBQVCBKS-PGMHMLKASA-N
  5. http://search.datacite.org/ui?&q=alternateIdentifier:smiles\:*.*+alternateIdentifier:NCI\:*
  6. http://search.datacite.org/ui?q=ORCID:*+doi:10.14469\/CH\/*
  7. http://search.datacite.org/ui?q=alternateIdentifier:InChIKey\:*
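Queries like these can also be generated programmatically, for example to loop over several ORCIDs or years. This sketch assumes the search.datacite.org/ui endpoint and its q and fq parameters exactly as used in the URLs above; the function name is hypothetical. Search terms are joined with spaces, which URL-encoding turns into the '+' separators seen above, and characters such as ':' and '[' are percent-encoded, which the server decodes back to the same query.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_BASE = "http://search.datacite.org/ui"


def search_url(*terms: str, fq: Optional[str] = None) -> str:
    """search.datacite.org URL; terms are space-joined (encoded as '+') into q."""
    params = {"q": " ".join(terms)}
    if fq is not None:
        params["fq"] = fq  # filter query, e.g. "prefix:10.14469"
    return SEARCH_BASE + "?" + urlencode(params)


# Query 1 above: one person's records published in 2015-2016.
by_orcid = search_url("ORCID:0000-0002-8635-8390", "publicationYear:[2015 TO 2016]")
# Query 2 above: records with media under the 10.14469 prefix.
with_media = search_url("has_media:true", fq="prefix:10.14469")
```

Keeping the terms as separate arguments makes it easy to add or drop a condition without hand-editing encoded URLs.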

Bug reporting and Suggestions for future enhancements

Please log all queries via https://github.com/ICHPC/hpc-repo/issues (you will need to create a GitHub account to do this).