1da-workshops-ml-2011
MatLab
MatLab is an interactive and intuitive front end that gives you much of the numerical efficiency of compiled languages such as C, C++ and Fortran without the difficulties of programming in those languages. In this section we will build a MatLab example that automates the data analysis of 30 data sets. This removes the need to sit in front of the computer performing repetitive tasks, maximising your free (or thinking) time.
Before continuing, if you are not familiar with MatLab, please spend 5 minutes watching this.
MatLab for Visualisation
MatLab is one of the most powerful data visualisation tools available to scientists. Below are a few examples; MatLab can plot things Excel simply cannot!
MatLab offers many ways of displaying the same data, and data visualisation is central to science. Excel has very limited plotting tools: it could not produce most of the plots on this page, and the ones it could would take hours. Each of these plots was produced in MatLab with no more than 10 lines of code. Once you have a plot format you are happy with, you can save it and reuse it again and again without any manual input.
MatLab can also do all the usual plots (and a bit more besides). It is incredibly powerful and, unlike Excel, it produces publication-quality plots.
The code for all of the plots on this page can be found here.
Basic Programming
Before moving on to specific functions for data analysis, there are 5 essential pieces of programming knowledge:
- Comment characters: anything prefixed with a % symbol is treated by MatLab as a comment and is ignored when your program runs. Comments are important because they let you explain in plain English what a program is doing and how it works. This means your script can be read and used by others, and that you can still use and edit it years after you have written it.
- Learn to love the semi-colon: ending a line with ; stops the result of that line from being printed. Printing the result of every operation to the Command Window can slow a script down dramatically, especially when the result is a large array. Compare:
- x = 1:1000000 % defines x as all the integers from one to one million and then prints every one of them.
- x = 1:1000000; % defines x as all the integers from one to one million without printing anything.
- Avoid mistakes by starting afresh each time. The first lines of any MatLab script should be the 3 golden commands:
- clear all; % This command removes all variables and data from the MatLab workspace (the computer's 'active' memory).
- close all; % This command closes all open figure windows.
- clc; % This command clears the Command Window, so anything printed there afterwards comes from your script.
- Defining tables: in MatLab a table is a matrix, and an element is referred to with the syntax TableName(n, m), where n is the row number and m is the column number.
- The for loop is a programmer's dream: it allows a process to be repeated iteratively, for example:
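- y = zeros(1000, 2); % pre-allocate a table of 1000 rows and 2 columns (faster than growing it inside the loop)
- for x = 1:1000 % take every value of x between 1 and 1000
- y(x, 1) = x*2; % the first column of row x is x times 2
- y(x, 2) = x*3; % the second column of row x is x times 3
- end % repeat until every value of x up to 1000 has been used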
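The loop above fills the table one row at a time. As a sketch that goes slightly beyond the original exercise, the same 1000-by-2 table can also be built without a loop using element-wise operations on a whole column at once (usually called vectorisation). The variable names yLoop and yVec are illustrative only.

```matlab
% Build the same 1000-by-2 table in two ways and check that they agree.

% 1) With a for loop, filling one row (n) at a time
yLoop = zeros(1000, 2);          % pre-allocate: 1000 rows, 2 columns
for x = 1:1000                   % take every value of x between 1 and 1000
    yLoop(x, 1) = x * 2;         % column 1: x times 2
    yLoop(x, 2) = x * 3;         % column 2: x times 3
end

% 2) Vectorised: operate on the whole column of x values at once
x = (1:1000)';                   % column vector holding 1 to 1000
yVec = [x .* 2, x .* 3];         % place the two result columns side by side

disp(isequal(yLoop, yVec))       % 1 (true) when both tables are identical
disp(yVec(10, :))                % row 10 (x = 10) holds 20 and 30
```

Both versions give identical tables; the loop version mirrors the step-by-step logic, while the vectorised version is shorter and is the style MatLab is optimised for.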
This is a very basic introduction to MatLab; examples to work through are here.
Error Analysis in MatLab
Before doing this exercise, download this file. The zip file contains a help sheet with MatLab commands and the data for this exercise. Open a new script in the MatLab Editor and write at the top of the file:
- clear all;
- close all;
- clc;
Load the Data
- The first thing we need to do is load the data. Place the data files (such as Set1.txt) in your MatLab working directory so that MatLab can find them.
- Use the load command to load the data set Set1 by typing: load Set1.txt
- Press F5
- The data held in Set1.txt is now loaded into your MatLab workspace as a variable called Set1.
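If the workshop files are not to hand, the loading step can still be tried out. The sketch below writes a hypothetical two-column stand-in for Set1.txt (the wavelength range and values are invented purely for illustration and are not the workshop data) into a temporary folder, so no existing Set1.txt is overwritten. It then loads the file with the functional form Set1 = load(file), which does the same job as load Set1.txt but accepts a full path.

```matlab
% Hypothetical stand-in data: these numbers are invented for illustration only.
wavelength = (400:10:800)';              % column 1: x data, 41 points
response   = 2 + 0.001 * wavelength;     % column 2: y data (made up)
data = [wavelength, response];           % two-column table

% Write the file into a fresh temporary folder
folder = tempname;                       % unique temporary folder name
mkdir(folder);
file = fullfile(folder, 'Set1.txt');
save(file, 'data', '-ascii');            % plain text, space-separated columns

% Functional form of "load Set1.txt"; the result is a numeric matrix
Set1 = load(file);
disp(size(Set1))                         % 41 rows and 2 columns
```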
Plot the Data
- The data is arranged in 2 columns: x is column 1 and y is column 2.
- Add the following to your script:
- figure %opens a new figure window
- plot(Set1(:, 1), Set1(:, 2)) % plot command
- xlabel('Wavelength (nm)') % x-axis label
- ylabel('Re(Dielectric Function)') % y-axis label
- title('Plot of Data') % title of plot
- Press F5.
- A new figure window opens showing the plotted data.
Linear Regression
MatLab has a built-in regression tool called polyfit, which fits a polynomial to your data by least squares. The syntax is polyfit(x, y, n), where x is the x-data, y is the y-data and n is the degree of the polynomial: n=1 gives the straight line y = mx + c, n=2 adds a squared term, n=3 a cubed term, and so on. polyfit returns the polynomial coefficients, starting with the highest power.
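To see what polyfit returns, here is a small check on made-up data that lie exactly on the line y = 2x + 1. With n = 1, polyfit should return the slope m and then the intercept c (highest power first), and polyval evaluates that polynomial at any x. The data are hypothetical, chosen only so that the answer is known in advance.

```matlab
x = (0:0.5:5)';                 % hypothetical x data
y = 2 .* x + 1;                 % y data lying exactly on y = 2x + 1

p = polyfit(x, y, 1);           % degree-1 (straight line) least-squares fit
m = p(1);                       % slope: coefficient of the highest power
c = p(2);                       % intercept: the constant term comes last
fprintf('m = %.3f, c = %.3f\n', m, c);

% Evaluate the fitted line at a new point, x = 10
disp(polyval(p, 10))            % 2*10 + 1 = 21, up to rounding error
```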
- Make a linear model with n = 1 by adding the following to your code:
- n = 1; % degree of the polynomial (1 = straight line)
- fit = polyfit(Set1(:, 1), Set1(:, 2), n);
- Then evaluate your fit at the x values you know. MatLab's built-in polyval function does this, so you do not have to write the polynomial out each time. Add to your code:
- yfit = polyval(fit, Set1(:, 1));
- Now plot the fit and the real data together to see how good the fit is. Add to your code:
- figure
- plot(Set1(:, 1), Set1(:, 2))
- hold on % this tells MatLab you want to add more data to the plot
- plot(Set1(:,1), yfit, '--k') % '--k' tells MatLab to draw this line black and dashed
- xlabel('Wavelength (nm)') % x-axis label
- ylabel('Re(Dielectric Function)') % y-axis label
- title('Plot of Data') % title of plot
- legend('Data', 'Fit') % add a legend to the plot
- You will see that the straight-line fit is not very good. Try repeating the process with n = 3 to see the quality of the fit improve.
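Instead of judging the fit only by eye, the steps above can be repeated for several values of n while comparing a number: the sum of squared residuals, which is the quantity polyfit minimises. This comparison is an addition to the original exercise, and the curved data below (y = e^x) are hypothetical, chosen only so that a straight line is clearly a poor fit.

```matlab
x = linspace(0, 2, 41)';          % hypothetical x data
y = exp(x);                       % hypothetical curved y data

sse = zeros(1, 3);                % sum of squared residuals for n = 1, 2, 3
for n = 1:3
    fit  = polyfit(x, y, n);      % least-squares polynomial of degree n
    yfit = polyval(fit, x);       % fitted values at the known x
    sse(n) = sum((y - yfit).^2);  % smaller means the fit follows the data more closely
    fprintf('n = %d: sum of squared residuals = %.3g\n', n, sse(n));
end
```

On the same data a higher n can never increase this number, so it measures how closely the curve follows the points, not whether the polynomial is a physically sensible model.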
Fully commented code for all of the above can be found here.
Exercise in Data Analysis
In many experiments the data comes in a large number of files. Use this code to analyse the data from an experiment; the questions are set within the code. (!!THINK!! how long this would take in Excel!)
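The exercise code itself is linked rather than reproduced here, but the core idea of analysing many files can be sketched as a loop that builds each file name with sprintf, loads the file, fits it and stores one result per data set. The file names Set1.txt to Set30.txt, the temporary folder and the invented straight-line data are all assumptions made so the sketch runs on its own; with real data you would skip Part 1 and set folder to the directory holding your files.

```matlab
% Part 1: create 30 hypothetical data files (skip this with real data)
nSets  = 30;
folder = tempname;                        % temporary folder for the made-up files
mkdir(folder);
trueSlopes = (1:nSets) * 0.5;             % invented slope for each hypothetical set
x = (0:0.1:10)';
for k = 1:nSets
    data = [x, trueSlopes(k) .* x + 3];   % two columns: x and y
    save(fullfile(folder, sprintf('Set%d.txt', k)), 'data', '-ascii');
end

% Part 2: analyse every file without any manual input
slopes = zeros(nSets, 1);                 % one result per data set
for k = 1:nSets
    fname = fullfile(folder, sprintf('Set%d.txt', k));  % Set1.txt, Set2.txt, ...
    d = load(fname);                      % x in column 1, y in column 2
    p = polyfit(d(:, 1), d(:, 2), 1);     % straight-line fit, as in the exercise
    slopes(k) = p(1);                     % keep the fitted slope
end

disp([(1:nSets)', slopes])                % set number next to its fitted slope
```

Adding another analysis step (a different n, another plot, saving results) means editing the body of the second loop once, and it is then applied to all 30 data sets.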
Getting MatLab
MatLab is available to you free of charge and has excellent help guides.