Version Issues
One of the key issues to be mindful of is that there are two new versions of MATLAB every year, one in the spring (e.g., 2016a) and one in the fall (e.g., 2016b). Although the changes in each release are incremental, they add up to transformational change over time.
For instance, between MATLAB 2014a and now, the default colormap was changed from “jet” to “parula.” If this changes again in the future, the figures might look different from what you see. If that is the case, you can set your colormap explicitly, by typing
colormap(jet)
or
colormap(parula)
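As a minimal sketch of pinning the colormap for one figure, the snippet below plots sample data and sets the colormap explicitly, so the figure does not depend on the release default. The sample data from peaks and the variable names data and cmap are assumptions made for illustration. Note that parula is only available in releases that ship it (2014b and later).

```matlab
data = peaks(50);      % built-in demo surface, used here only as sample data
figure;
imagesc(data);         % draw the matrix as a color-coded image
colormap(parula);      % pin the colormap; use colormap(jet) for the older look
colorbar;              % show the value-to-color mapping
cmap = colormap;       % the active colormap as an N-by-3 matrix of RGB rows
```

Calling colormap with no arguments returns the map currently in use, which is a quick way to check what a figure is actually drawn with.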
Sometimes, changes to how MATLAB works require updating your code. For instance, the way to initialize (“seed”) the random number generator used to be to call the setDefaultStream method of the RandStream class, like this:
RandStream.setDefaultStream(s);
after defining s as
s = RandStream('mt19937ar', 'seed', sum(100*clock))
This specifies a particular random number generation method (the Mersenne Twister) and seeds it from the system clock.
But this no longer works. The method setDefaultStream has been replaced with setGlobalStream, so the correct code is now:
RandStream.setGlobalStream(s);
Be on the lookout for things like that.
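To check that seeding behaves as expected on your release, a sketch along the following lines may help. It uses a fixed seed instead of the system clock so that the run is repeatable. The seed value 42 and the variable names firstDraw and secondDraw are arbitrary choices for illustration.

```matlab
% Seed the global stream with a fixed value so the draws are repeatable
s = RandStream('mt19937ar', 'Seed', 42);
RandStream.setGlobalStream(s);
firstDraw = rand(1,5);

% Re-create the same stream and draw again
s = RandStream('mt19937ar', 'Seed', 42);
RandStream.setGlobalStream(s);
secondDraw = rand(1,5);

% Identical seeds should give identical numbers
drawsMatch = isequal(firstDraw, secondDraw);
disp(drawsMatch)
```

Recent releases also provide the rng function (for example, rng(42) or rng('shuffle')) as a shorter way to seed the global stream.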
VECTORIZATION
MATLAB is an interpreted language, so each line is interpreted and executed one after the other. This is fine, but if there are a lot of lines, it can take a long time to execute the code. This is a particular concern if there are long loops, and possibly even nested loops. In principle, many loops can be replaced by a vector operation, and MATLAB is optimized for those, so this can speed up your code considerably. Here, we provide three simple examples of how to do this that generalize easily. Note that this has become less of a concern lately, which ties in with the “version issues” point made above: recent versions of MATLAB compile code automatically before it is run (under the hood and out of sight), which speeds up code considerably.
1. A single loop:
Say you have data from 1,000,000 trials and need to calculate the total number of photons presented in each trial (from illumination level and presentation time in milliseconds). If you do this with a loop, it will take a while:
numTrials = 1e6;
numPhotons = randi(100,[numTrials,1]);
exposureDuration = randi(1000,[numTrials,1]);

tic
for ii = 1:numTrials
    totalExposure(ii,1) = numPhotons(ii)*exposureDuration(ii);
end
toc
If you replace the timed loop (everything from tic to toc) with the following loop-free version, the result will be the same, but it will be much faster:
tic
totalExposure = numPhotons.*exposureDuration;
toc
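To confirm that the loop and the vectorized version agree, one could run a check like the sketch below. It uses a smaller number of trials than the text so that it finishes quickly, and it preallocates the loop result. The names loopResult, vectorResult, and sameResult are introduced here for illustration only.

```matlab
numTrials = 1e4;   % smaller than in the text so the check runs quickly
numPhotons = randi(100,[numTrials,1]);
exposureDuration = randi(1000,[numTrials,1]);

% Loop version, with the result preallocated
loopResult = zeros(numTrials,1);
for ii = 1:numTrials
    loopResult(ii,1) = numPhotons(ii)*exposureDuration(ii);
end

% Vectorized version: element-wise product of the two columns
vectorResult = numPhotons.*exposureDuration;

% Both inputs hold whole numbers, so the products match exactly
sameResult = isequal(loopResult, vectorResult);
disp(sameResult)
```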
2. Nested loops:
Say you have a 10,000 by 10,000 matrix that results from multiplying all numbers from 1 to 10,000 with all other numbers from 1 to 10,000 (a full cross). You can do this element by element, first going through all rows, then all columns.
Note that we always preallocate. Otherwise, this would take even longer:
howBig = 1e4;

tic
M = zeros(howBig,howBig); %Always preallocate
for ii = 1:howBig
    for jj = 1:howBig
        M(ii,jj) = ii*jj; %Each ii, jjth entry of M is ii * jj
    end
end
toc
This works, but it takes a long time.
Now, we vectorize the last dimension (columns), so instead of a nested loop, we have a single loop:
tic
M = zeros(howBig,howBig); %Always preallocate
for ii = 1:howBig
    M(ii,:) = ii*(1:howBig); %Doing each row at once
end
toc
This should already be much faster.
Finally, let’s vectorize both dimensions and get rid of loops altogether:
tic
M = zeros(howBig,howBig); %Always preallocate
M(:,:) = (1:howBig)'*(1:howBig); %Doing it all at once
toc
Note that we have to transpose the first vector to get the outer product. All three code versions yield the same result, but you should be able to realize considerable time savings with the last one. These would be even more dramatic if you compared it to a version of the code without preallocation.
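The claim that all three versions yield the same matrix can be checked on a small case, as in the sketch below. The size n = 5 is an assumption chosen so that the result can be inspected by eye. A 10,000 by 10,000 matrix of doubles takes about 800 MB, which makes it awkward for a quick check. The names M1, M2, M3, and allEqual are illustrative.

```matlab
n = 5;   % small size chosen for illustration

% Version 1: nested loops, one entry at a time
M1 = zeros(n,n);
for ii = 1:n
    for jj = 1:n
        M1(ii,jj) = ii*jj;
    end
end

% Version 2: single loop, one row at a time
M2 = zeros(n,n);
for ii = 1:n
    M2(ii,:) = ii*(1:n);
end

% Version 3: no loop, outer product of a column and a row vector
M3 = (1:n)'*(1:n);

allEqual = isequal(M1, M2, M3);
disp(M3)
disp(allEqual)
```

In the third version, the matrix is created by a single expression, so assigning the outer product directly works without a preceding call to zeros.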
3. Conditionals:
Say you want to add a number (e.g., performance) to a running total, but only if another number (e.g., percentage of trials completed) is big enough. You could do this in a loop, checking the condition each time.
numParticipants = 1e6;
numTrials = randi(100,[numParticipants,1]);
performance = rand(numParticipants,1);
cumPerf = 0;

tic
for ii = 1:numParticipants
    if numTrials(ii,1) > 50
        cumPerf = cumPerf + performance(ii,1);
    end
end
toc
It is straightforward to replace the timed loop (everything from tic to toc) with faster code that produces the same result and gets rid of the loop:
tic
temp = find(numTrials > 50);
cumPerf = sum(performance(temp));
toc
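A variant of the same idea, sketched below, skips find and indexes with the logical condition directly. Because the loop and sum may add the floating-point values in a different order, the two totals can differ in the last few digits, so the comparison uses a small relative tolerance instead of exact equality. The tolerance of 1e-10, the smaller sample size, and the names meetsCriterion, cumPerfLoop, cumPerfVec, and closeEnough are assumptions for illustration.

```matlab
numParticipants = 1e5;   % smaller than in the text so the check runs quickly
numTrials = randi(100,[numParticipants,1]);
performance = rand(numParticipants,1);

% Loop version: add performance only where the condition holds
cumPerfLoop = 0;
for ii = 1:numParticipants
    if numTrials(ii) > 50
        cumPerfLoop = cumPerfLoop + performance(ii);
    end
end

% Vectorized version: a logical mask selects the qualifying entries
meetsCriterion = numTrials > 50;
cumPerfVec = sum(performance(meetsCriterion));

% Compare with a relative tolerance, not exact equality
relDiff = abs(cumPerfLoop - cumPerfVec) / max(abs(cumPerfLoop), eps);
closeEnough = relDiff < 1e-10;
disp(closeEnough)
```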