1da-workshops-ml-2011

From ChemWiki
Revision as of 16:36, 12 October 2011 by Amckinle

MatLab

MatLab is an interactive and intuitive front end that allows you to reap many of the efficiency benefits of programming in C, C++, and Fortran without the difficulties associated with those languages. In this section we will build a MatLab example that automates the data analysis of 30 data sets, removing the need to sit in front of the computer and perform repetitive tasks, thus maximising your free or thinking time.

Before continuing, if you are not familiar with MatLab, please spend 5 minutes watching this.

MatLab for Visualisation

MatLab is one of the most powerful data visualisation tools available to scientists. Below are a few examples of plots that MatLab can produce and Excel simply cannot!

Flow in mechanical systems
Directional plots
Dimples at an interface of two liquids
Wind movements measured in 3D


MatLab offers you many possibilities for displaying the same data, and data visualisation is central to science. Excel has very limited tools: it could not produce most of the plots on this page, and the ones it could would take hours. All these plots were produced in MatLab using no more than 10 lines of code per plot. Once you have made a plot format you are happy with, you can save it and reuse it time and time again without any manual input.


Flow in a jet 1
Flow in a jet 2
Flow in a jet 3
Flow in a jet 4


MatLab can also do all the usual stuff (and a bit more besides). MatLab is incredibly powerful and, unlike Excel, it produces publication-quality plots.


A simple bar chart
A pie chart
Contoured spheres
Using images to map data


The code for all of the plots here can be found here

Basic Programming

Before moving on to specific functions for data analysis there are 5 essential pieces of programming knowledge:


  • Comment characters: anything prefixed with a % symbol is treated as a comment by MatLab and will not be read as part of your program. Comments are important because they let you describe in plain English how a program works and what it is doing. This means your script can be read and used by others, and that you can still use and edit your script years after you have written it.


  • Learn to love the semi-colon: suffixing a line with ; means that the outcome of that line will not be printed. Printing the results of operations to the command window is slow and can dramatically reduce the efficiency of your process.


x = 1:1000000 % tells MatLab to define x as all the integers from one to one million and then print x.
x = 1:1000000; % tells MatLab to define x as all the integers from one to one million, without printing x.


  • Avoid mistakes by starting afresh each time. The first things you should have in any MatLab program are the 3 golden commands:


clear all;  % This command clears all variables from the MatLab workspace, i.e. the computer's 'active' memory.
close all;  % This command closes all open figure windows.
clc;  % This command clears all the text from the command window, so anything printed there is from your script.


  • Defining tables: a table is a matrix. MatLab uses the syntax TableName(n, m), where n is the row number and m is the column number.


  • The for loop is a programmer's dream: it allows processes to be completed iteratively.


for x = 1:1000 % take every integer value of x from 1 to 1000
    y(x, 1) = x.*2; % make a table of y values where, in row x, the first column is x times 2
    y(x, 2) = x.*3; % and the second column is x times 3
end % repeat until x reaches 1000
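
The last two points can be combined in a short sketch. The variable names below (N, y, a, col1, xv, yVec, same) are illustrative only and do not come from the workshop files. The block builds the same table as the loop above, reads entries back with the TableName(n, m) syntax, and compares the result with a loop-free version of the same calculation.

```matlab
% Illustrative sketch: all variable names here are hypothetical.
N = 1000;
y = zeros(N, 2);          % pre-allocate a table with N rows and 2 columns

for x = 1:N
    y(x, 1) = x .* 2;     % column 1 holds x times 2
    y(x, 2) = x .* 3;     % column 2 holds x times 3
end

a = y(10, 2);             % single entry: row 10, column 2
col1 = y(:, 1);           % the colon selects every row of column 1

% The same table built without a loop
xv = (1:N)';              % column vector 1, 2, ..., N
yVec = [xv .* 2, xv .* 3];

same = isequal(y, yVec);  % true when both tables hold identical values
```

Pre-allocating y with zeros is optional for a table this small, but it stops MatLab from resizing the matrix on every pass through the loop.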


This is a very basic introduction to MatLab; examples to work through are here.

Error Analysis in MatLab

Before doing this exercise, download this file. The zip file contains a help sheet with MatLab commands and the data for this exercise. Open a new window in the MatLab editor, and write at the top of the file:

clear all;
close all;
clc;

Load the Data

  • The first thing we need to do is load the data. Place the folder data in your MatLab working directory.
  • Use the load command to load the data set Set1; type: load Set1.txt
  • Press F5
  • The data held in the file Set1.txt is now loaded into your MatLab workspace as a variable named Set1.

Plot the Data

  • The data is arranged in 2 columns, x is column 1 and y is column 2.
  • Write:
figure %opens a new figure window
plot(Set1(:, 1), Set1(:, 2)) % plot command
xlabel('Wavelength (nm)') % x-axis label
ylabel('Re(Dielectric Function)') % y-axis label
title('Plot of Data') % title of plot
  • Press F5.
  • The data is now plotted and a new figure will open.


Linear Regression

MatLab has an inbuilt linear regression tool called polyfit. The syntax is polyfit(x, y, n), where x is the x-data, y is the y-data and n is the degree of the polynomial fitted to the data. n=1 gives y=mx + c, n=2 adds a square term, n=3 a cube term and so on.

  • Make a linear model with n=1 by adding the following to your code:
n = 1; % degree of the polynomial (1 gives a straight line)
fit = polyfit(Set1(:, 1), Set1(:, 2), n);

  • Then calculate your fit for the values of x you know. MatLab has an inbuilt function, polyval, to do this, which saves you writing the polynomial out each time. Add to your code:
yfit = polyval(fit, Set1(:, 1));
  • Now plot the fit and real data to see how good the fit is, add to your code:
figure
plot(Set1(:, 1), Set1(:, 2))
hold on % this tells MatLab you want to add more data to the plot
plot(Set1(:, 1), yfit, '--k') % '--k' tells MatLab to make this line dashed and black
xlabel('Wavelength (nm)') % x-axis label
ylabel('Re(Dielectric Function)') % y-axis label
title('Plot of Data') % title of plot
legend('Data', 'Fit') % add a legend to the plot
  • You will see that the fit is not very good. Try repeating the process with n=3 to see the quality of the fit improve.
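
One way to put a number on the quality of each fit is to compare the root-mean-square (RMS) residual for different values of n. The sketch below is not part of the workshop files: it uses a synthetic cubic data set and hypothetical variable names (x, y, p, rmsErr) so that it runs on its own. To apply it to the exercise, the x and y lines would be replaced with Set1(:, 1) and Set1(:, 2).

```matlab
% Synthetic data for illustration only, not the workshop data.
x = linspace(0, 10, 50)';
y = 0.5 .* x.^3 - 2 .* x.^2 + x + 4;   % an exact cubic, chosen arbitrarily

rmsErr = zeros(1, 3);                  % one RMS residual per polynomial degree
for n = 1:3
    p = polyfit(x, y, n);              % coefficients of a degree-n polynomial
    yfit = polyval(p, x);              % evaluate the fit at the known x values
    residuals = y - yfit;              % difference between data and fit
    rmsErr(n) = sqrt(mean(residuals.^2));
end

disp(rmsErr)                           % smaller values mean a closer fit
```

Because the synthetic data is an exact cubic, the RMS residual should fall as n goes from 1 to 3 and be close to zero at n=3. Real data will contain noise, so the residual will level off rather than vanish.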


Fully commented code for all of the above can be found here.

Exercise in Data Analysis

In many experiments data comes in a large number of files. Use this code to analyse the data from an experiment; questions are assigned in the code. (!!THINK!! how long this would take in Excel!)
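
As a rough idea of what such a batch analysis can look like, the sketch below loops over 30 files, loads each one and stores a straight-line fit. It is not the workshop code. The file names (Set1.txt to Set30.txt), the scratch folder and the synthetic straight-line data are all assumptions made so that the example runs on its own.

```matlab
% Hypothetical sketch: file names, folder and per-file analysis are assumptions.
dataDir = tempname;                    % scratch folder so the sketch is self-contained
mkdir(dataDir);
nFiles = 30;

% Write synthetic two-column files standing in for real measurements
for k = 1:nFiles
    x = (1:20)';
    data = [x, k .* x + 1];            % a straight line with slope k and intercept 1
    fname = fullfile(dataDir, sprintf('Set%d.txt', k));
    save(fname, 'data', '-ascii');
end

% Load each file in turn and store the fitted slope and intercept
results = zeros(nFiles, 2);
for k = 1:nFiles
    fname = fullfile(dataDir, sprintf('Set%d.txt', k));
    data = load(fname);                % numeric matrix, one row per data point
    results(k, :) = polyfit(data(:, 1), data(:, 2), 1);
end

rmdir(dataDir, 's');                   % remove the scratch folder and its files
```

With real data, the first loop would be dropped and dataDir would point at the folder that holds the measured files. Each row of results then holds the slope and intercept for one data set.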

Getting MatLab

MatLab is available to you free of charge and has excellent help guides.