
It:lectures-2011

From ChemWiki

Go to Introduction | Go to Workshops | Go to Coursework | Go to Assignment | List of Software | List of Searches

Chemical Information Technology 2015-2016

IT for lecture courses, tutorials and set projects

Two pages available for general IT skills; not fully part of this course, but important for reference

Self Study - Managing your location

It is important to manage your location while accessing scientific journals as access to almost every electronic resource is controlled by your internet (IP) address. You will not need to worry about this when you are connecting to your resources on campus (which by definition also includes South Kensington Halls of residence). If you are outside this catchment area however, some IT services will not work unless you enter the campus virtually by switching on something called a VPN, or Virtual Private Network. It allows you to access resources through the campus network from your own internet connection.

Self Study - Accessing lecture notes and submission of assignments

Lecture materials online

Blackboard VLE

Course notes may be found either using the College Virtual Learning Environment, called Blackboard or using the Chemistry Wiki (which you are using now). Most lecture notes will be available as a Portable Document Format (a PDF document) or, less commonly, a Powerpoint slide show. These may be downloaded to your computer and stored in a document library (e.g., EndNote or Mendeley), printed or viewed on the computer. You are encouraged to annotate lecture notes during the lecture and to review them afterwards, rather than simply filing them away until just before your exams!

Lecture notes are not intended to be exhaustive in their content and you are encouraged to support them with further information from textbooks and other sources as appropriate.

Lecture notes will be placed on Blackboard by the lecturers for your convenience; however, the PDF document alone will not necessarily contain all the content required for understanding. If you miss a lecture, you should go over the content with the aid of textbooks to ensure you understand the concepts.

Online submission of assignments

Almost all assignments are now submitted electronically through Blackboard. Submissions will then be checked by Turn-It-In, a software package designed to 'read' assignments and, through comparison with a huge number of resources, assign each one an "originality score". This is also known as "Plagiarism Detection Software". Instructions for the submission of assignments can be found within each module on Blackboard. Always ask in advance if you are unsure how any course submission works, or if you are having problems.

If you know you will not be able to submit work on time, contact your tutor or course leader as soon as possible - the sooner we know if you are having problems the more we can do to help you find a solution. If you miss a deadline for whatever reason, you must contact the course leader as soon as possible or you will not receive a mark for that assignment.

Plagiarism

The College has a simple attitude towards plagiarism: "Do not plagiarise".

The College's official statement on plagiarism is thus:

"Plagiarism is interpreted by the College as the act of presenting the work of others as one's own work, without acknowledgement. Plagiarism is considered as academically fraudulent, and an offence against College discipline. The College considers plagiarism to be a major offence, and subject to the disciplinary procedures of the College.
"Plagiarism can arise from deliberate actions and also through careless thinking and/or methodology. The offence lies not in the attitude or intention of the perpetrator, but in the action and in its consequences."[1]

The Library has extensive information on plagiarism and how to avoid it; you are encouraged to read it and learn about plagiarism there.


  1. Humanities Student Handbook, Imperial College London, 2010

Workshop material - Accessing scientific journals and Bibliographic Searches

The world's scientific and chemical data, information and knowledge resides in the following types of reliable resources:

  1. Course Notes, written by experts in their field, which will often themselves cite:
  2. Primary Peer-reviewed scientific journals (1665 - onwards), as articles with identified authors and hence provenance;
  3. Secondary Peer-reviewed scientific journals, as review articles/books with identified authors
  4. Tertiary sources such as abstracts gleaned from the above two sources, collaboratively authored (peer review by a different name) wikipedia-like entries, some blogs and curated/edited database collections.

Golden rule: Always cite your sources, and if possible always cite a primary source.

Silver rule: If you cite something, it means you have read it - do not cite something you haven't read first hand. (If Smith et al. tell you that "Jones et al. tell us that oranges actually smell of peas", do not cite Jones yourself unless you have read that paper. If you cite Smith, you should say "Smith et al. report Jones' findings that oranges smell of peas".)

Bronze rule: Don't cite Wikipedia. It is far too transient to be a reliable resource.

Scientific Journals (Primary and Secondary sources)

Journals and books are identified using a formal citation within the article (or course notes). The style of this citation varies according to publication, but within chemistry it traditionally takes the form of a numeric superscript[1]. Most journals are available online and, as long as the institution has paid for access, are readily available for viewing and searching. It has become almost universal to add a digital object identifier (DOI) link to the citation, which allows you to visit the journal directly[2]. Clicking the DOI link will take you directly to the journal page, presenting you with an abstract. Such links can be embedded in HTML or in PDF files. If only the DOI is given, you can reach the article by typing http://dx.doi.org/the-DOI-itself into the browser's address bar (a DOI can also be looked up if you know the author and title). The article can then be viewed in either HTML or PDF format.
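
As a minimal sketch of the "type the DOI after the resolver address" step, the Python snippet below turns a DOI into a link. The function name doi_to_url is hypothetical (it is not part of any library); the resolver prefix is the one quoted in the text, and the example DOI is taken from the reference list on this page.

```python
from urllib.parse import quote

# Resolver prefix as quoted in the text above.
DOI_RESOLVER = "http://dx.doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI, or one written as 'DOI:10.xxxx/yyyy', into a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10."):
        raise ValueError(f"this does not look like a DOI: {doi!r}")
    # Keep '/' literal and percent-encode characters that are unsafe in a URL.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("DOI:10.1021/ol0611346"))
# prints http://dx.doi.org/10.1021/ol0611346
```

Pasting the printed address into a browser should have the same effect as clicking a DOI link in an article.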

  1. HTML (Hypertext markup language) vs PDF (Portable document format)
    • PDF is the format preferred for producing printed copies, and is just starting to be deployed in new Bibliographic database systems such as Mendeley and in 3D forms[3].
    • HTML is nowadays viewed by an increasing number of publishers as the medium best suited for enhancing the journal article beyond the printable form. Many articles nowadays include rotatable molecules, and other interactive media.
  2. Journals themselves divide into those published by learned societies and by purely commercial organisations. Between them, the four below should cover perhaps 90% of the journals that you will need to access.
  3. The above represent the primary literature, and the articles there are designed primarily for researchers. An excellent journal which addresses the more pedagogic aspects of chemistry is the Journal of Chemical Education (abbreviated to J. Chem. Ed.), which not only covers aspects of lectures, but also describes new and interesting laboratory experiments (some of which materialise in our own labs!).
  4. The central library has a chemistry librarian (Katharine Thompson), many chemistry collections and a complete alphabetic list of journals, together with an Inter-library loan (ILL) system for requesting reprints of journals and books not held on campus. A fully digital version of the ILL has recently been introduced, although (unlike most digital music) its documents carry DRM (digital rights management).

When writing a laboratory report (and in later years literature reports, essays and perhaps even your own published article), you will be expected to cite your sources, in the manner shown below.

Tertiary sources (Wikipedia and others)

Wikipedia and scientific blogs are very good sources of information; however, be aware that these may be skewed by opinion, and facts may be distorted (though Wikipedia tries very hard to balance its articles for neutrality). Most chemistry entries tend to be fairly good as a first point of reference, but do not assume that they are always correct; aim to back up the information with primary sources such as textbooks (particularly for equations), and cite the primary source.

Do not think that Wikipedia is bad - on the contrary, Wikipedia is a marvelous resource, and its culture of knowledge philanthropy is something of which it should be rightfully proud. It is an excellent place to start your reading on a given subject, but when it comes to your authoritative writing you should cite a permanent, peer-reviewed primary or secondary source, not Wikipedia.



  1. S. D. Rychnovsky, Org. Lett., 2006, 8, 2895-2898. DOI:10.1021/ol0611346
  2. N. Paskin, Digital Object Identifiers for scientific data, Data Science Journal, 2005, 4, 12-20. DOI:10.2481/dsj.4.12
  3. P. Kumar, A. Ziegler, J. Ziegler, B. Uchanska-Ziegler and A. Ziegler, Trend. Biochem. Sci., 2008, 33, 408-412. DOI:10.1016/j.tibs.2008.06.004
  4. C. S. Wannere, H. S. Rzepa, B. C. Rinderspacher, A. Paul, H. F. Schaefer III, P. v. R. Schleyer and C. S. M. Allan, J. Phys. Chem., 2009, DOI:10.1021/jp902176a

Bibliographic Searches

Conversion of a penicillin core to a cephalosporin core

This part of the course deals with how to find information using search strings. To illustrate this, we will define the following search:

The conversion of penicillin to cephalosporin.

In order to locate information, we need to introduce a few concepts:

  • Boolean logical operators: AND, SAME, OR, NOT, XOR
  • Wildcard (Stemming) characters: ? vs * vs $, *SULPHUR vs SUL*UR
  • Grouping: A AND (B OR C) vs (A AND B) OR C
  • Metadata-driven searches (fielded searches): author, year-of-publication with the syntax author:Bloggs or au=Bloggs
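
The effect of the Boolean operators and of grouping can be sketched with Python sets, where each set holds the record numbers that contain a given word. The record numbers below are invented purely for illustration; no real database is being queried.

```python
# Hypothetical mini-index: record numbers whose titles contain each word.
reaction = {1, 2, 3}
penicillin = {2, 4}
cephalosporin = {3, 5, 6}

# AND = intersection, OR = union, NOT = difference, XOR = symmetric difference.
print(sorted(reaction & penicillin))       # [2]
print(sorted(penicillin | cephalosporin))  # [2, 3, 4, 5, 6]
print(sorted(reaction - penicillin))       # [1, 3]
print(sorted(penicillin ^ cephalosporin))  # [2, 3, 4, 5, 6]

# Grouping changes the answer: A AND (B OR C) versus (A AND B) OR C.
narrow = reaction & (penicillin | cephalosporin)
broad = (reaction & penicillin) | cephalosporin
print(sorted(narrow))  # [2, 3]
print(sorted(broad))   # [2, 3, 5, 6]
```

The second grouping returns every cephalosporin record whether or not it mentions a reaction, which is why such a query typically produces far more hits.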

A summary of these features for the four main search engines can be found here.

Web of Science, WOS

WOS (Web of Science) uses:

  • field tags (such as title, author, publication name or organization)
  • Booleans: AND, OR, NOT, plus SAME as a proximity operator
  • Wildcards: ? = 1 wild character; * = 1 or more wild characters, allowed in the middle or on the right of a term (SUL*UR and BIOLOG*, but not *NATAL)
  • (...) for grouped expressions, e.g. A NOT (B OR C). Examples:
    • au=Welton t* and og=imperial and py=2001-2010 and SO=(CHEMICAL COMMUNICATIONS)
    • TI=Reaction AND (TI=penicillin OR TI=cephalosporin) (141 hits)
    • (TI=Reaction AND TI=Penicillin) OR TI=cephalosporin (2683 hits)
    • TI=Carbapenem AND TI=Penicillin AND TI=synthesis (8 hits)
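
Wildcard matching can be tried out locally with Python's standard fnmatch module, which uses the same ? and * symbols. This is only an approximation of a search engine's behaviour: fnmatch lets * match zero or more characters and places no restriction on where it appears, so it would also accept a left-hand wildcard such as *NATAL. The word list is invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical index terms to test the patterns against.
terms = ["SULPHUR", "SULFUR", "SULFATE", "BIOLOGY", "BIOLOGICAL", "WOMAN", "WOMEN"]


def hits(pattern):
    """Return the terms that match a wildcard pattern (case-sensitive)."""
    return [term for term in terms if fnmatchcase(term, pattern)]


print(hits("SUL*UR"))   # ['SULPHUR', 'SULFUR']
print(hits("BIOLOG*"))  # ['BIOLOGY', 'BIOLOGICAL']
print(hits("WOM?N"))    # ['WOMAN', 'WOMEN']
```

A middle wildcard is a convenient way of catching British and American spellings in a single search.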

Other Search Engines

Robot Based Internet Indices

Using software for preparation of reports and assignments

There are a great many software combinations out there for preparing reports; however, we shall only deal here with the most common combination.

Using Microsoft Office with Endnote: Bibliographic Citation Software

Many sources of bibliographic information allow the export of the hit list to citation management software. Here the use of just one combination, WOS and Word+EndNote, will be demonstrated, and you will have a chance to try it for yourselves in the lab sessions.

Using Mendeley as an organiser

Mendeley is a document organiser and knowledge mining system. The inputs to the program are citation lists obtained from bibliographic searches, and the associated Acrobat (PDF) files for the documents themselves. Mendeley will index these, and allow you to search a collection of documents in much the same way as iTunes searches music tracks. It also has a feature similar in concept to the iTunes Genius feature, whereby articles in your collection can be compared with related articles found by others. For example, you could add a reprint associated with a lab course, and find similar articles which may provide you with additional information.

Document submission through Blackboard and Turn-It-In

When submitting a document through the online system, the only formats accepted are PDF and Microsoft Word (.doc or .docx). Windows does not have the ability to produce a PDF without dedicated software, but PrimoPDF is a package that is available free of charge and behaves as a virtual printer, converting any document to PDF format through the "print document" dialog box.

"Print to PDF" is readily available in Mac OS-X through the "print document" dialog box without installing additional software.

Submissions in Microsoft Word format are preferable, as documents can be annotated by markers for delivering feedback; however, conversion to PDF means that a wider range of software can be used, e.g. LaTeX, OpenOffice, Microsoft Works, Apple's Pages, etc.

Workshop Exercises - IT for Laboratories and Reports

Within the laboratory, many of the same bibliographic search skills are used as for lecture support; however, in a laboratory we are more interested in locating physical properties, synthetic routes and safety data.

MSDS Safety Sheets

The Aldrich catalogues can be searched for compounds and their MSDS safety sheets. These are useful for completing COSHH forms (Control of Substances Hazardous to Health). A search can also be used to find an Aldrich catalogue number (e.g. 254738) and so acquire an MSDS data sheet, which can then be stored in your Mendeley library for future access (e.g. using a mobile device).

Property based searches

A number of search tools can be used to find properties of compounds:

  1. Reaxys
    • "penicillin AND cephalosporin" as a "Text, Authors and more" search. Available Booleans: AND, NOT, OR, PROXIMITY, NEAR and NEXT, with * as a wildcard anywhere in the query (unlike WOS). Grouping is not supported.
    • MP.MP=155-156 and IDE.MF=C29H28N2O6S1 and ORP.ORP=190-200 as a field search using Properties (Advanced) from the Substances and Properties option. This illustrates property ranges, which implies you have to be aware of the typical errors in many of the experimental measurements made on chemical instrumentation, such as melting points, optical rotations or (as below) NMR chemical shifts.
  2. The Spectral Database for Organic Compounds SDBS can be searched for matching observed spectral peaks (with estimated errors).
    • 13C peaks: 163, 141, 133, 130, 129, 128, 98 (how big is the error?)
    • 1H peaks: 8.1, 7.5, 5.1, 4.7 (how big is the error?)
    • IR peak: 1733 (how big is the error?)
  3. The NIST Chemistry WebBook can be searched for thermodynamic and spectral properties
  4. Use of "added-value" tools such as ChemCalc for molecular mass calculations. Given a formula such as C47H51NO14 (Taxol), it predicts how the mass spectrum (MS) may look; given an (MS-derived) accurate mass (say 148.052±0.0005), it finds the most likely formula.
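
The arithmetic behind a formula-to-mass calculation, and the reverse search from an accurate mass to candidate formulae, can be sketched in a few lines of Python. This is not how ChemCalc itself is implemented: the element tables are limited to C, H, N and O, the search limits are arbitrary assumptions, and no valence or isotope-pattern checks are applied, so the reverse search may list formulae that are chemically impossible.

```python
import re
from itertools import product

# Standard (average) atomic weights and most-abundant-isotope masses, in u.
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def parse_formula(formula):
    """Split 'C47H51NO14' into {'C': 47, 'H': 51, 'N': 1, 'O': 14}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts


def formula_mass(formula, table):
    """Sum the atomic masses from the chosen table over the whole formula."""
    return sum(table[s] * n for s, n in parse_formula(formula).items())


def formulas_for_mass(target, tolerance, limits=(12, 26, 4, 6)):
    """Brute-force CHNO formulae whose monoisotopic mass is within tolerance."""
    max_c, max_h, max_n, max_o = limits
    matches = []
    for c, h, n, o in product(range(1, max_c + 1), range(max_h + 1),
                              range(max_n + 1), range(max_o + 1)):
        m = (c * MONOISOTOPIC["C"] + h * MONOISOTOPIC["H"]
             + n * MONOISOTOPIC["N"] + o * MONOISOTOPIC["O"])
        if abs(m - target) <= tolerance:
            name = "".join(sym + (str(k) if k > 1 else "")
                           for sym, k in zip("CHNO", (c, h, n, o)) if k)
            matches.append((name, round(m, 4)))
    return matches


print(round(formula_mass("C47H51NO14", AVERAGE), 3))       # average mass of Taxol
print(round(formula_mass("C47H51NO14", MONOISOTOPIC), 4))  # monoisotopic mass
print(formulas_for_mass(148.052, 0.0005))                  # candidates for 148.052
```

The average mass (about 853.9) is the one used for weighing out a sample, whereas the monoisotopic mass (about 853.33) is the one to compare with an accurate-mass MS peak. The candidate list for 148.052 includes C9H8O2; widening the tolerance quickly adds further formulae, which is why the size of the error matters.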

Using ChemDraw for 2D searches

Using ChemDraw we can produce simple, tidy chemical structures for use in reports; it can also produce a SMILES string, which is a very powerful feature. This string is a way of 'encoding' chemical structure information, e.g. O=C1C(N)C2N1C(C(O)=O)C(C)(C)S2.

To output a SMILES string, select a structure in ChemDraw, and go to "Edit"->"Copy as"->SMILES.

Another type of string which encodes structural information is an InChI string; this can be copied out of ChemDraw in a very similar manner.
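
To see that the two kinds of string carry the same structural information, the Python sketch below counts the non-hydrogen atoms in the SMILES string given above and compares them with the formula layer of the corresponding InChI string quoted on this page. It is a deliberately simplified reader and an assumption-laden one: it only recognises unbracketed, non-aromatic atoms, which is enough for this example but not for SMILES in general.

```python
import re
from collections import Counter

SMILES = "O=C1C(N)C2N1C(C(O)=O)C(C)(C)S2"
INCHI = ("InChI=1S/C8H12N2O3S/c1-8(2)4(7(12)13)10-5(11)3(9)6(10)14-8"
         "/h3-4,6H,9H2,1-2H3,(H,12,13)")


def heavy_atoms_from_smiles(smiles):
    """Count unbracketed, non-aromatic atoms; hydrogens are implicit in SMILES."""
    return Counter(re.findall(r"Cl|Br|[BCNOPSFI]", smiles))


def heavy_atoms_from_inchi(inchi):
    """Count non-hydrogen atoms from the formula layer (the part after '1S/')."""
    formula = inchi.split("/")[1]
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol != "H":
            counts[symbol] += int(number or 1)
    return counts


print(dict(heavy_atoms_from_smiles(SMILES)))
print(heavy_atoms_from_smiles(SMILES) == heavy_atoms_from_inchi(INCHI))  # True
```

Both strings describe 8 carbons, 2 nitrogens, 3 oxygens and 1 sulphur; the InChI additionally states the hydrogen count explicitly.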

These strings can then be used to search structural databases:

  • ChemSpider from the RSC can take a SMILES string and deliver information on that compound.
  • The PubChem database can be searched using a SMILES or InChI string from ChemDraw; O=C1C(N)C2N1C(C(O)=O)C(C)(C)S2 or InChI=1S/C8H12N2O3S/c1-8(2)4(7(12)13)10-5(11)3(9)6(10)14-8/h3-4,6H,9H2,1-2H3,(H,12,13) for "95% similar" (XX Hits)
  • ChemNetBase has compilations of drugs, inorganic and organometallic and natural products which might prove useful for laboratories
  • Organic Syntheses for specific molecule queries.
  • Application of Reaxys for specific molecule queries: search for the melting point of aspirin.
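
A SMILES search can also be expressed as a web address rather than typed into a search box. The Python sketch below only builds such addresses and does not contact any server. The URL layout follows PubChem's PUG REST interface as best understood here (including the fastsimilarity_2d operation and its Threshold option); treat it as an assumption to be checked against the PubChem documentation. The function names are hypothetical.

```python
from urllib.parse import quote, unquote

# Base address of PubChem's PUG REST service (assumed layout, see lead-in).
PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_cid_url(smiles):
    """Address asking for the compound ID(s) that match a SMILES string exactly."""
    encoded = quote(smiles, safe="")  # '=', '(' and ')' must be percent-encoded
    return f"{PUG}/compound/smiles/{encoded}/cids/TXT"


def pubchem_similarity_url(smiles, threshold=95):
    """Address asking for compounds at least `threshold` percent similar (2D)."""
    encoded = quote(smiles, safe="")
    return f"{PUG}/compound/fastsimilarity_2d/smiles/{encoded}/cids/TXT?Threshold={threshold}"


smiles = "O=C1C(N)C2N1C(C(O)=O)C(C)(C)S2"
print(pubchem_cid_url(smiles))
print(pubchem_similarity_url(smiles, threshold=95))
```

Percent-encoding is needed because characters such as = and parentheses have special meanings inside a web address; decoding the address gives back the original SMILES string unchanged.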

Extra Material for Future Reference - 3D Structure based searches

SMILES strings are a 1D way of encoding 2D information; when we start to consider larger molecules, and particularly biological molecules, the three-dimensional properties become important, particularly within medicinal chemistry.

  • The on-line Corina service will take and convert a 1D SMILES string into 3D molecular coordinates - a 1D to 3D conversion!
  • Substructure searching of the Cambridge Structural Database of organic and organometallic molecules for specific molecules and intermolecular interactions (e.g. unusual π-H-O hydrogen bonds). Using the ConQuest program:
    • Name based search: penicillin (34 hits).
    • 2D structure based search: penicillin (54 hits; a SMILES string is NOT accepted by this program)
    • 2D structure based search for one of the four molecules shown above
    • 3D structure based search for hydrogen bonds is shown in the lab course pages.
  • Use of JMol to display complex protein structures; there is also a demo page and Nanotech model for use.
  • The Protein Databank (DOI:10.1107/S0108767307035623) can be used to locate many protein structures - keywords penicillin and tetrahedral should reveal any enzyme inhibited with an analogue of a transition state and relating to penicillin
  • Also Protein Explorer (direct entry; try entering 1blh).
  • Finally, there are alternative searches of biomolecules, including DNA, available.

The materials presented here are by no means exhaustive, and there are many more search engines out there - if you find any good ones, feel free to add to this Wiki!

