IT Relevant to Lecture Courses, Tutorials and Set Projects
Objectives of these lectures:
To define Chemo-informatics as the collection, representation and organisation of chemical data to create chemical information, to which theories and models can be applied to create chemical knowledge[1]
To introduce the background to the course, and the skills to be acquired during the course laboratories, including the use of computers, their software and network information resources available, prioritising and organising the information obtained using these tools and how to cite the chemical literature in your laboratory reports and essays.
To introduce the chemistry computer laboratory sessions and what you are expected to achieve during these sessions.
This course does not deal with any aspects of data logging, analysis and mining (often called Chemometrics) e.g. Excel spreadsheets, Mathematica, MatLab etc.
Lecture 1: Managing your Computer Desktop
Organisation of files
This course is all about managing data/information/knowledge with the help of computers.
They do so with the help of an:
Operating System (OS), examples of which include:
Microsoft Windows: Windows 7.
Unix: Mac OS X, Redhat Linux
Mobile devices/SmartPhones: Symbian, Windows Mobile, Android, iOS (+ iTunes)
Access to which is controlled by authentication against User names/passwords and via Web-pages by the same authentication, and which serves to identify the author/curator of data and information so created.
Organisation: is historically a metaphor based on Files or Documents, which are located in Hierarchical Folders (Directories). Directories referred to as Home or My Documents have special status for each authenticated user.
Files: the naming convention allows names of up to 256 characters, but with some caveats:
do not use characters such as space, $, /, :, ? .
If you are tempted to use a space, use the underscore _ instead!
On Linux (only), Filenames are case sensitive. Often the cause of much confusion!
File Content/Data type: is normally (approximately) indicated by adding a 2-4 character extension after a period (.docx) to the name.
This extension may or may not be visible. Chemical files reserve ~8 different extensions, so you may end up with up to 8 files with apparently the same name!
Special types of file, used by the operating system, may be invisible by virtue of their name starting with a period.
The (free text) content of a file may have been indexed and hence may become searchable by the utilities provided by the operating system.
File Metadata (Properties): Creation/Modification Dates, sizes, access permissions, "ownership", content, etc is also organised by the OS.
File Location is in a hierarchy and is located by searches using file metadata as criteria.
File Size: In "bytes" (approximately, 1 character = 1 byte, sometimes 2 bytes). 10^6 bytes = ~1 Mbyte, 10^9 bytes = ~1 Gbyte, 10^12 bytes = ~1 Tbyte. Maximum size for any file is normally 2 Gbyte (Windows) or very much larger (Linux, Mac OS X).
File Archives: A collection of Folders and Files which preserves the hierarchy and file metadata (.zip, .tar). A .docx file is in fact a (zip) archive
"clipboard" in "System Memory" (capacity not known by user, but probably < 10 Mbyte)
cache or temporary files, not normally seen by the user but can wreak havoc if corrupt!
File Usage: Data Files are created and exchanged using:
Combinations of programs, typically a Word processor (Word), a chemical drawing program (Chemdraw) and Bibliographic database (EndNote/Mendeley).
Data exchange between these programs using copy/paste via clipboards or via files (drag-n-drop, save/open or sync).
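The points above about file names, metadata and archives can be tried out in a few lines of Python. This is only an illustrative sketch: the function names safe_name and describe, the example file name and the folder name year1 are invented here, and the set of characters replaced is simply the one listed in the caveats above.

```python
import re
import tempfile
import zipfile
from pathlib import Path

# Characters the notes advise against (space, $, /, :, ?), plus other whitespace.
UNSAFE = re.compile(r"[\s$/:?]")

def safe_name(name: str) -> str:
    """Replace each awkward character with an underscore."""
    return UNSAFE.sub("_", name)

def describe(path: Path) -> dict:
    """Collect a few items of metadata that the operating system keeps for a file."""
    info = path.stat()
    return {
        "name": path.name,
        "extension": path.suffix,      # the part after the final period
        "size_bytes": info.st_size,
        "modified": info.st_mtime,     # seconds since the epoch
    }

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    report = folder / safe_name("lab report 1.txt")
    report.write_text("melting point 155-156", encoding="ascii")
    details = describe(report)

    # An archive keeps the folder hierarchy as well as the file contents.
    archive = folder / "reports.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.write(report, arcname=f"year1/{report.name}")
    with zipfile.ZipFile(archive) as zf:
        archived_names = zf.namelist()

print(details)
print(archived_names)
```

The text written to the file is 21 ASCII characters, so the reported size is 21 bytes, in line with the rule of thumb of one byte per character.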
Managing your Location
All the above applies when you are connecting to your resources on campus (which by definition also includes South Kensington Halls of residence). If you are outside this catchment area, some IT services will not work unless you enter the campus virtually by switching on something called a VPN. These services include most Scientific journals and the important databases.
Accessing Lecture Notes and Scientific Journals
The world's scientific and chemical data, information and knowledge resides in the following types of reliable resources:
1: Course Notes, written by experts in their field, which will often themselves cite:
2: Primary Peer-reviewed scientific journals (1665 - onwards), as articles with identified authors and hence provenance;
3: Secondary Peer-reviewed scientific journals, as review articles/books with identified authors
4: Tertiary sources such as abstracts gleaned from the above two sources, collaboratively authored (peer review by a different name) wikipedia-like entries, some blogs and curated/edited database collections.
Golden rule: Always cite your sources, and if possible the primary ones.
Course Notes and Other Parochial Materials
These can be found in two principal locations: the College Blackboard virtual learning environment and the chemistry department Wiki. They are mostly available in Acrobat (PDF) format, and rather less commonly as Powerpoint slide shows. You would normally download these to your computer and store them in a library (EndNote or Mendeley), view them on the computer, or print them off. The notes are periodically placed into Blackboard by the lecturers, although they may not be regularly updated. Some lecturers also use blogs and podcasts (but not yet Twitter!).
Scientific Journals (Primary and secondary sources)
Journals and Books are identified using a formal citation, embedded in course notes or in other journals. In chemistry, this traditionally takes the form of a numeric superscript [2]. As recently as ~2005, you would have had to visit a real library, track down the journal to a specific shelf and then read the printed pages. Since then, it has become almost universal to add to the citation something called a DOI[3], which allows you to visit the journal electronically. Thus we have instead[2] with the DOI appended. Clicking on the link takes you directly to the journal page, where you will be presented with an abstract. These links can be embedded in HTML pages (such as the one we are looking at now) or in PDF files. If you are given a DOI in unlinked form, you can resolve it by typing http://dx.doi.org/the-DOI-itself into a browser (or you can track it down here if you know the authors and title). You can then view the article itself in either HTML or PDF form.
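Resolving an unlinked DOI amounts to joining it onto the address of the resolver. The short Python sketch below illustrates this; the function name doi_to_url is invented for the example, and the default resolver is assumed to be https://doi.org/, the current form of the http://dx.doi.org/ address quoted above.

```python
from urllib.parse import quote

def doi_to_url(doi: str, resolver: str = "https://doi.org/") -> str:
    """Build a resolvable web address from a DOI given in unlinked form."""
    doi = doi.strip()
    # Citations often carry a "DOI:" label, which is not part of the identifier.
    if doi.upper().startswith("DOI:"):
        doi = doi[4:].strip()
    # Escape any characters that are not safe in a URL; "/" is left alone.
    return resolver + quote(doi, safe="/")

url = doi_to_url("DOI:10.1021/jo8001276")
print(url)
```

Pasting the printed address into a browser should lead to the same journal page as clicking a linked DOI.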
HTML (Hypertext markup language) vs PDF (Portable document format)
PDF is the format preferred for producing printed copies, and is just starting to be deployed in new Bibliographic database systems such as Mendeley and in 3D forms[4].
HTML is nowadays viewed by an increasing number of publishers as the medium best suited for enhancing the journal article beyond the printable form. Many articles nowadays include rotatable molecules, and other interactive media.
Journals themselves divide into those published by learned societies and by purely commercial organisations. Between them, the four below should cover perhaps 90% of the journals that you will need to access.
Science Direct and Wiley online library represent two major commercial publishers, each offering an aggregation of journals (a shopping mall if you like).
Most publishers now offer e.g. iPhone and iPad apps to facilitate reading journals in this manner.
The above represent the primary literature, and the articles there are designed primarily for researchers. An excellent journal which addresses the more pedagogic aspects of chemistry is the Journal of Chemical Education (abbreviated to J. Chem. Ed.), which not only covers aspects of lectures, but also describes new and interesting laboratory experiments (some of which materialise in our own labs!).
The central library has a chemistry librarian (Katharine Thompson), many chemistry collections and a complete alphabetic list of Journals, together with an Inter-library loan (ILL) system for requesting reprints of journals and books not held on campus. A fully digital version of the ILL has recently been introduced, although (unlike most digital music) this has DRM (digital rights management).
When writing a laboratory report (and in later years literature reports, essays and perhaps even your own published article), you will be expected to cite your sources, in the manner shown below.
For an example of one bird's-eye view of chemistry, see A. H. Lipkus, Q. Yuan, K. A. Lucas, S. A. Funk, W. F. Bartelt, R. J. Schenck and A. J. Trippe, J. Org. Chem., 2008, 73, 4443–4451. DOI:10.1021/jo8001276
N. Paskin, Digital Object Identifiers for scientific data, Data Science Journal, 2005, 12–20. DOI:10.2481/dsj.4.12
P. Kumar, A. Ziegler, J. Ziegler, B. Uchanska-Ziegler and A. Ziegler, Trends Biochem. Sci., 2008, 33, 408–412. DOI:10.1016/j.tibs.2008.06.004
C. S. Wannere, H. S. Rzepa, B. C. Rinderspacher, A. Paul, H. F. Schaefer III, P. v. R. Schleyer and C. S. M. Allan, J. Phys. Chem., 2009, DOI:10.1021/jp902176a
Tertiary Sources (Wikipedia)
Wikipedia and scientific blogs can be used as a source of information. Wikipedia is normally pretty good for chemistry, but do not assume it is always correct!
Lecture 2: Bibliographic Searches using Scientific Databases
Eugene Garfield, Konrad Beilstein, George Boole
This part of the course is centred on how to search for information using search strings. To illustrate it, we will define the following search:
The conversion of penicillin to cephalosporin.
The following concepts will be introduced:
Boolean logical operators: AND (and the slightly more specific SAME), OR, NOT, XOR.
Wildcard (Stemming) characters: ? vs * vs $, *SULPHUR vs SULPHU* vs SUL*UR
Grouping: A AND (B OR C) vs (A AND B) OR C
Metadata-driven searches (fielded searches): author, year-of-publication with the syntax author:Blogs or au=Blogs
A summary of these features for the four main search engines can be found here
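The concepts listed above can be illustrated without any real database by treating each search term as the set of records it matches. The sketch below is only an analogy: the four records are invented, the helper names hits and field are hypothetical, and the wildcards follow Python's fnmatch rules (* for any run of characters, ? for exactly one) rather than those of any particular search engine. The SAME operator is not covered.

```python
from fnmatch import fnmatchcase

# Invented records standing in for a bibliographic database.
RECORDS = [
    {"id": 1, "author": "Smith", "year": 1999,
     "title": "conversion of penicillin to cephalosporin"},
    {"id": 2, "author": "Jones", "year": 2004,
     "title": "penicillin resistance in bacteria"},
    {"id": 3, "author": "Blogs", "year": 2008,
     "title": "cephalosporin synthesis from sulphur ylides"},
    {"id": 4, "author": "Blogs", "year": 2010,
     "title": "sulphonamide antibiotics reviewed"},
]

def hits(pattern):
    """Ids of records in which some title word matches the wildcard pattern."""
    pattern = pattern.lower()
    return {r["id"] for r in RECORDS
            if any(fnmatchcase(word, pattern) for word in r["title"].split())}

def field(name, value):
    """Fielded (metadata-driven) search, e.g. author:Blogs."""
    return {r["id"] for r in RECORDS if r[name] == value}

A = hits("penicillin")
B = hits("cephalosporin")
C = hits("sulph*")

print("A AND B      ", A & B)
print("A OR B       ", A | B)
print("A NOT B      ", A - B)
print("A XOR B      ", A ^ B)
print("A AND (B OR C)", A & (B | C))
print("(A AND B) OR C", (A & B) | C)
print("author:Blogs AND B", field("author", "Blogs") & B)
```

With these records the two groupings give different hit lists, which is why the placement of brackets matters in a real search.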
Using Microsoft Office with EndNote: Bibliographic citation software
Many sources of bibliographic information allow the export of the hit list to citation management software. Here the use of just one combination, WOS and Word+EndNote, will be demonstrated, and you will have a chance to try it for yourselves in the lab sessions.
Using Mendeley as an organiser
Mendeley is a document organiser and knowledge mining system. The inputs to the program are citation lists obtained from bibliographic searches, and the associated Acrobat files for the documents themselves. Mendeley will index these, and allow you to search a collection of documents in a very similar manner to the iTunes music tracks. It also has a feature similar in concept to the iTunes Genius bar, whereby articles in your collection can be compared with related articles found by others. For example, you could add a reprint associated with a lab course, and find similar articles which may provide you with additional information.
Introduction to Lab courses
A quick overview of the lab, and what will be done in the first session.
IT Relevant to Laboratories and Reports
Objectives of these lectures: To demonstrate how to search for information relevant to laboratory courses, and lab. write-ups. This will include how to search for properties of chemicals (physical, spectroscopic), safety sheets, and 3D coordinates.
Lecture 3. MSDS Safety Sheets
The Aldrich catalogues can be searched for compounds and their MSDS safety sheets, which is useful for completing COSHH forms. You can also search by e.g. an Aldrich catalogue number (e.g. 254738) to acquire an MSDS data sheet, and insert this into your Mendeley library for future access (using e.g. a mobile device).
"penicillin and cephalosporin" as a text authors and more search. Available Booleans: AND, NOT, OR, PROXIMITY, NEAR and NEXT with * as a wildcard anywhere in the query (unlike WOS). Grouping not supported.
MP.MP=155-156 and IDE.MF=C29H28N2O6S1 and ORP.ORP=190-200 as a field search using Properties (Advanced) from the Substances and Properties option and illustrating property ranges (which implies you have to be aware of the typical errors in many of the experimental measurements made on chemical instrumentation, such as melting points, optical rotations or as below NMR chemical shifts).
Use of "added-value" properties such as ChemCalc for molecular mass calculations as either C47H51NO14 (Taxol), which predicts how the mass spectrum (MS) may look given a formula, or as input of a (MS-derived) accurate mass (say 148.052±0.0005) which is converted to the most likely formula.
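The two directions of the mass calculation described above (formula to mass, and accurate mass to formula) can be sketched in Python. This is not how ChemCalc itself works: it is a brute-force illustration restricted to C, H, N and O, using monoisotopic masses of the most abundant isotopes. The function names and the upper limits on atom counts are assumptions made for the example.

```python
import re
from itertools import product

# Monoisotopic masses (most abundant isotope), in unified atomic mass units.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

def parse_formula(formula):
    """Turn 'C47H51NO14' into {'C': 47, 'H': 51, 'N': 1, 'O': 14}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts

def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[s] * n for s, n in parse_formula(formula).items())

def candidate_formulae(target, tolerance, limits=(12, 26, 4, 8)):
    """Try every C/H/N/O combination up to the limits; keep those within tolerance."""
    max_c, max_h, max_n, max_o = limits
    found = []
    for c, h, n, o in product(range(1, max_c + 1), range(max_h + 1),
                              range(max_n + 1), range(max_o + 1)):
        counts = {"C": c, "H": h, "N": n, "O": o}
        mass = sum(MONOISOTOPIC[s] * k for s, k in counts.items())
        if abs(mass - target) <= tolerance:
            found.append("".join(f"{s}{k if k > 1 else ''}"
                                 for s, k in counts.items() if k))
    return found

taxol_mass = monoisotopic_mass("C47H51NO14")
candidates = candidate_formulae(148.052, 0.0005)
print(f"Taxol monoisotopic mass: {taxol_mass:.4f}")
print("Candidates for 148.052 +/- 0.0005:", candidates)
```

A tighter tolerance gives fewer candidate formulae, which is why the quoted error on an accurate mass matters as much as the mass itself.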
Using Chemdraw and Structure based searches (2D)
Searching the ChemSpider database using a SMILES string generated from Chemdraw: O=C1C(N)C2N1C(C(O)=O)C(C)(C)S2
Searching the PubChem database using a SMILES or InChI string generated from Chemdraw: O=C1C(N)C2N1C(C(O)=O)C(C)(C)S2 or InChI=1S/C8H12N2O3S/c1-8(2)4(7(12)13)10-5(11)3(9)6(10)14-8/h3-4,6H,9H2,1-2H3,(H,12,13) for "95% similar" (49 hits)
ChemNetBase has compilations of Drugs, inorganic and organometallic and natural products which might prove useful to you for laboratories.
Sub-structure searching of the Cambridge crystal database (183/5E20C9) of organic and organometallic molecules for specific molecules, and intermolecular interactions (e.g. unusual π-H-O hydrogen bonds).
Name based search: penicillin, 34.
2D structure based search: penicillin, 54 (SMILES string is NOT accepted by this program)
2D structure based search for one of the four molecules shown above
3D structure based search for hydrogen bonds is shown in the lab course pages.
The guidelines for demarcation between small molecule and bio- or macromolecule databases are found here.
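The SMILES and InChI strings quoted above for the ChemSpider and PubChem searches describe the same molecule, and a quick consistency check is to compare the atoms they contain. The Python sketch below is deliberately crude: it only counts non-hydrogen atoms written as plain upper-case symbols in the SMILES string (no aromatic lower-case atoms or bracketed atoms), and it reads the molecular formula from the first layer of the InChI. The function names are invented for the example, and a real cheminformatics toolkit would be needed for anything more general.

```python
import re

SMILES = "O=C1C(N)C2N1C(C(O)=O)C(C)(C)S2"
INCHI = ("InChI=1S/C8H12N2O3S/c1-8(2)4(7(12)13)10-5(11)3(9)6(10)14-8"
         "/h3-4,6H,9H2,1-2H3,(H,12,13)")

def heavy_atoms_from_smiles(smiles):
    """Count non-hydrogen atoms in a simple SMILES string (organic subset only)."""
    counts = {}
    for symbol in re.findall(r"Cl|Br|[BCNOPSFI]", smiles):
        counts[symbol] = counts.get(symbol, 0) + 1
    return counts

def formula_from_inchi(inchi):
    """Read the molecular formula layer, the text after the first '/'."""
    layer = inchi.split("/")[1]
    return {symbol: int(number or 1)
            for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", layer)}

smiles_counts = heavy_atoms_from_smiles(SMILES)
inchi_counts = formula_from_inchi(INCHI)
inchi_heavy = {s: n for s, n in inchi_counts.items() if s != "H"}

print(smiles_counts)
print(inchi_counts)
print("Same heavy atoms:", smiles_counts == inchi_heavy)
```

Hydrogens are left out of the comparison because a SMILES string normally leaves them implicit, whereas the InChI formula layer states them.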
Wikis
Pentahelicene
The traditional stand-alone (=printable) document is being replaced by equivalent formats designed for an on-line existence. You will here be introduced to the Wiki, a presentation system some lecture and lab courses have adopted.