Pattern raiders: regressions of complex time series

NB In this example I use most of the code and concepts already presented when explaining simple time series regressions using genetic algorithms. Please refer to that text for the basic explanations.

In the previous example we were happy with the idea to get rid of many features in the target time series that were considered unimportant. This may be a good solution if we consider small fluctuation as the result of noise that must be ignored to extract the correct features concerning the trends. The clear disadvantage of this strategy is that, besides noise, the regression will also potentially ignore features which may be significant. Consider for example the fluctuations expressed in these target time series:

Here we have wave patterns characterised by a period, an amplitude and a central line which is important to replicate if we want a fine grain prediction of the evolution of these activities or if we are trying to understand the causal relations among each node in a network (e.g. in studies estimating the effective connectivity).

Once again then I use in this example a genetic algorithm as a regression tool. Unfortunately, if we rely on a simple distance from the target to estimate the error (as in the previous example), the result is not acceptable:

The averaging effect takes the lead and the oscillations are treated like noise. To avoid this problem, we have to change the error function accordingly. The one I propose is as follows:

%comparison of period, amplitude and centre line of wave functions
for n=1:length(output(1,:))
    for int=2:4
    output_wave=compute_wave(output(:,n), [T/5*int T]);
    target_wave=compute_wave(target_output(:,n), [T/5*int T]);
    %comparison: period
    error=error+abs(target_wave(1)-output_wave(1))^2;
    %comparison: amplitude
    error=error+abs((target_wave(2)-output_wave(2))*500)^2;
    %comparison: centre line distance
    error=error+abs((target_wave(3)-output_wave(3))*500)^2;
    %comparison: peak index
    error=error+abs(target_wave(4)-output_wave(4));
    end
end

%comparison of period, amplitude and centre line of wave functions

for n=1:length(output(1,:))

for int=2:4

output_wave=compute_wave(output(:,n), [T/5*int T]);

target_wave=compute_wave(target_output(:,n), [T/5*int T]);

%comparison: period

error=error+abs(target_wave(1)-output_wave(1))^2;

%comparison: amplitude

error=error+abs((target_wave(2)-output_wave(2))*500)^2;

%comparison: centre line distance

error=error+abs((target_wave(3)-output_wave(3))*500)^2;

%comparison: peak index

error=error+abs(target_wave(4)-output_wave(4));

end

Having in “compute_wave” (file included in the zip folder) a simple function that extract in the provided interval features such as: wave period, local max and min (ordinates), amplitude, centre line and peak position (abscissa). Since the waves are not constant (the amplitude diminishes), I use the function several times for the same time series (three in the example) in different time intervals, arbitrarily chosen. The weights ascribed to each value comparison have been themselves tuned running the genetic algorithm several times to find a good balance. Consider in this context that distance between target and phenotype is naturally unbalanced in favour of the period (which has a minimum error = 1) in comparison with, e.g., amplitude (which has a maximum error = 1). Hence the mutiplier here used (times 500) for those differences which are otherwise always below 1 in value.

The result looks now more interesting:

But once again, if we analyse the matrix of weights responsible for the target time series and the one selected by the genetic algorithm we will notice important differences:

Target parameters	Optimised parameters
w_11: -0.1310	w_11: 0.7787
w_12: 2.8140	w_12: 2.2731
w_13: -2.0500	w_13: 1.5834
w_21: 1.4760	w_21: -0.9940
w_22: -3.3740	w_22: -0.4927
w_23: -1.2930	w_23: 3.3073
w_31: 2.4480	w_31: -1.0431
w_32: 1.6330	w_32: 0.9901
w_33: 1.0820	w_33: -0.0775
bl1: -1	bl1: 0.7522
bl2: -0.4578	bl2: -0.7349
bl3: 0.9496	bl3: -0.8425

The whole point of this example is to show that each target requires a specific error function to extract the desired feature and allow a regression specifically focussing on those. So what happens if, like in the previous example, we decide to add noise?

This is an ambiguous case: if we try to replicate all the oscillations, we are entailing we want to replicate the noise as well and there is no (easy) way for the error function to tell the difference between fluctuations caused by noise and fluctuations caused by significant alterations in the activity of the recorded units. Thus, the easiest way to approach the problem is to alter the target data, smoothing the curves and fluctuation until we have once again a pattern we are interested in replicating.

smooth=15;

%average for cluster
for ind=smooth:length(all_units.activity(:,1))
    noisy_a(ind,:)=sum(all_units.activity(ind-smooth+1:ind,:))/smooth;    
end
all_units.activity=noisy_a;

plotting

smooth=15;

%average for cluster

for ind=smooth:length(all_units.activity(:,1))

noisy_a(ind,:)=sum(all_units.activity(ind-smooth+1:ind,:))/smooth;

end

all_units.activity=noisy_a;

plotting

This operation result in the following:

Note the ratio noise to signal favours the former in the last part of the target data series as the amplitude of the waves diminishes, whilst the noise range in this case is constant. This effect is detrimental for the regression and has to be weighed accordingly.

Please consider some of these simulations may take several hours, depending on the speed of the processor(s) you are using.

Download all the files here. Regression with genetic algorithms – part 2 (Matlab files)

– Run the genetic algorithm from file GAmain.m

– Check the error function in exp_error

– Run single test to plot the time series using separate_test.m

Pattern raiders: regressions of complex time series

Please consider some of these simulations may take several hours, depending on the speed of the processor(s) you are using.

Like this:

Leave a Reply Cancel reply

In this section

Meta

Pattern raiders: regressions of complex time series

Please consider some of these simulations may take several hours, depending on the speed of the processor(s) you are using.

Share this:

Like this:

Leave a Reply Cancel reply

In this section

Meta