nsim.analyses1 package

Submodules

nsim.analyses1.epochs module

Functions to identify pseudo-stationary epochs, where a timeseries has low ‘variability’. There is flexibility to give your own function to define ‘variability’ in the way you want, or you can use the default.

An example of a ‘variability’ function is given based on Blenkinsop2012, using centroid frequency and height of spectral peak to define ‘variability’.

functions:
epochs Find pseudo-stationary epochs, based on any defn of ‘variability’. epochs_distributed Same as above, but compute all channels in parallel. epochs_joint Find epochs where x% of channels are simultaneously stationary. variability_fp Example 2D variability function, extending Blenkinsop2012.
See also: Blenkinsop et al. (2012) The dynamic evolution of focal-onset
epilepsies - combining theoretical and clinical observations
nsim.analyses1.epochs.epochs(ts, variability=None, threshold=0.0, minlength=1.0, plot=True)

Identify “stationary” epochs within a time series, based on a continuous measure of variability. Epochs are defined to contain the points of minimal variability, and to extend as wide as possible with variability not exceeding the threshold.

Parameters:
  • Timeseries of m variables, shape (n, m). (ts) –
  • (optional) Timeseries of shape (n, m, q), giving q scalar (variability) – measures of the variability of timeseries ts near each point in time. (if None, we will use variability_fp()) Epochs require the mean of these to be below the threshold.
  • The maximum variability permitted in stationary epochs. (threshold) –
  • Shortest acceptable epoch length (in seconds) (minlength) –
  • bool Whether to display the output (plot) –
Returns: (variability, allchannels_epochs)
variability: as above allchannels_epochs: (list of) list of tuples For each variable, a list of tuples (start, end) that give the starting and ending indices of stationary epochs. (epochs are inclusive of start point but not the end point)
nsim.analyses1.epochs.epochs_distributed(ts, variability=None, threshold=0.0, minlength=1.0, plot=True)

Same as epochs(), but computes channels in parallel for speed.

(Note: This requires an IPython cluster to be started first,
e.g. on a workstation type ‘ipcluster start’)

Identify “stationary” epochs within a time series, based on a continuous measure of variability. Epochs are defined to contain the points of minimal variability, and to extend as wide as possible with variability not exceeding the threshold.

Parameters:
  • Timeseries of m variables, shape (n, m). (ts) –
  • (optional) Timeseries of shape (n, m, q), giving q scalar (variability) – measures of the variability of timeseries ts near each point in time. (if None, we will use variability_fp()) Epochs require the mean of these to be below the threshold.
  • The maximum variability permitted in stationary epochs. (threshold) –
  • Shortest acceptable epoch length (in seconds) (minlength) –
  • bool Whether to display the output (plot) –
Returns: (variability, allchannels_epochs)
variability: as above allchannels_epochs: (list of) list of tuples For each variable, a list of tuples (start, end) that give the starting and ending indices of stationary epochs. (epochs are inclusive of start point but not the end point)
nsim.analyses1.epochs.epochs_joint(ts, variability=None, threshold=0.0, minlength=1.0, proportion=0.75, plot=True)

Identify epochs within a multivariate time series where at least a certain proportion of channels are “stationary”, based on a previously computed variability measure.

(Note: This requires an IPython cluster to be started first,
e.g. on a workstation type ‘ipcluster start’)
Parameters:
  • Timeseries of m variables, shape (n, m). (ts) –
  • (optional) Timeseries of shape (n, m), giving a scalar (variability) – measure of the variability of timeseries ts near each point in time. (if None, we will use variability_fp())
  • The maximum variability permitted in stationary epochs. (threshold) –
  • Shortest acceptable epoch length (in seconds) (minlength) –
  • Require at least this fraction of channels to be "stationary" (proportion) –
  • bool Whether to display the output (plot) –
Returns: (variability, joint_epochs)
joint_epochs: list of tuples A list of tuples (start, end) that give the starting and ending indices of time epochs that are stationary for at least proportion of channels. (epochs are inclusive of start point but not the end point)
nsim.analyses1.epochs.variability_fp(ts, freqs=None, ncycles=6, plot=True)

Example variability function. Gives two continuous, time-resolved measures of the variability of a time series, ranging between -1 and 1. The two measures are based on variance of the centroid frequency and variance of the height of the spectral peak, respectively. (Centroid frequency meaning the power-weighted average frequency) These measures are calculated over sliding time windows of variable size. See also: Blenkinsop et al. (2012) The dynamic evolution of focal-onset

epilepsies - combining theoretical and clinical observations
Parameters:
  • Timeseries of m variables, shape (n, m). Assumed constant timestep. (ts) –
  • (optional) List of frequencies to examine. If None, defaults to (freqs) – 50 frequency bands ranging 1Hz to 60Hz, logarithmically spaced.
  • Window size, in number of cycles of the centroid frequency. (ncycles) –
  • bool Whether to display the output (plot) –
Returns:

variability Timeseries of shape (n, m, 2) variability[:, :, 0] gives a measure of variability between -1 and 1 based on variance of centroid frequency. variability[:, :, 1] gives a measure of variability between -1 and 1 based on variance of maximum power.

nsim.analyses1.freq module

Various frequency domain analyses that apply to a single time series.

functions:
psd lowpass highpass bandpass notch hilbert hilbert_amplitude hilbert_phase epochs cwt cwt_distributed
nsim.analyses1.freq.bandpass(ts, low_hz, high_hz, order=3)

forward-backward butterworth band-pass filter

nsim.analyses1.freq.cwt(ts, freqs=array([ 1., 1.09854114, 1.20679264, 1.32571137, 1.45634848, 1.59985872, 1.75751062, 1.93069773, 2.12095089, 2.32995181, 2.55954792, 2.8117687, 3.0888436, 3.39322177, 3.72759372, 4.09491506, 4.49843267, 4.94171336, 5.42867544, 5.96362332, 6.55128557, 7.19685673, 7.90604321, 8.68511374, 9.54095476, 10.48113134, 11.51395399, 12.64855217, 13.89495494, 15.26417967, 16.76832937, 18.42069969, 20.23589648, 22.22996483, 24.42053095, 26.82695795, 29.47051703, 32.37457543, 35.56480306, 39.06939937, 42.9193426, 47.14866363, 51.79474679, 56.89866029, 62.50551925, 68.6648845, 75.43120063, 82.86427729, 91.0298178, 100. ]), wavelet=<function cwtmorlet>, plot=True)

Continuous wavelet transform Note the full results can use a huge amount of memory at 64-bit precision

Parameters:
  • ts – Timeseries of m variables, shape (n, m). Assumed constant timestep.
  • freqs – list of frequencies (in Hz) to use for the tranform. (default is 50 frequency bins logarithmic from 1Hz to 100Hz)
  • wavelet – the wavelet to use. may be complex. see scipy.signal.wavelets
  • plot – whether to plot time-resolved power spectrum
Returns:

Continuous wavelet transform output array, shape (n,len(freqs),m)

Return type:

coefs

nsim.analyses1.freq.cwt_distributed(ts, freqs=array([ 1., 1.09854114, 1.20679264, 1.32571137, 1.45634848, 1.59985872, 1.75751062, 1.93069773, 2.12095089, 2.32995181, 2.55954792, 2.8117687, 3.0888436, 3.39322177, 3.72759372, 4.09491506, 4.49843267, 4.94171336, 5.42867544, 5.96362332, 6.55128557, 7.19685673, 7.90604321, 8.68511374, 9.54095476, 10.48113134, 11.51395399, 12.64855217, 13.89495494, 15.26417967, 16.76832937, 18.42069969, 20.23589648, 22.22996483, 24.42053095, 26.82695795, 29.47051703, 32.37457543, 35.56480306, 39.06939937, 42.9193426, 47.14866363, 51.79474679, 56.89866029, 62.50551925, 68.6648845, 75.43120063, 82.86427729, 91.0298178, 100. ]), wavelet=<function cwtmorlet>, plot=True)

Continuous wavelet transform using distributed computation. (Currently just splits the data by channel. TODO split it further.) Note: this function requires an IPython cluster to be started first.

Parameters:
  • ts – Timeseries of m variables, shape (n, m). Assumed constant timestep.
  • freqs – list of frequencies (in Hz) to use for the tranform. (default is 50 frequency bins logarithmic from 1Hz to 100Hz)
  • wavelet – the wavelet to use. may be complex. see scipy.signal.wavelets
  • plot – whether to plot time-resolved power spectrum
Returns:

Continuous wavelet transform output array, shape (n,len(freqs),m)

Return type:

coefs

nsim.analyses1.freq.highpass(ts, cutoff_hz, order=3)

forward-backward butterworth high-pass filter

nsim.analyses1.freq.hilbert(ts)

Analytic signal, using the Hilbert transform

nsim.analyses1.freq.hilbert_amplitude(ts)

Amplitude of the analytic signal, using the Hilbert transform

nsim.analyses1.freq.hilbert_phase(ts)

Phase of the analytic signal, using the Hilbert transform

nsim.analyses1.freq.lowpass(ts, cutoff_hz, order=3)

forward-backward butterworth low-pass filter

nsim.analyses1.freq.notch(ts, freq_hz, bandwidth_hz=1.0)

notch filter to remove remove a particular frequency Adapted from code by Sturla Molden

nsim.analyses1.freq.psd(ts, plot=True)

plot Welch estimate of power spectral density

nsim.analyses1.misc module

functions:

nsim.analyses1.misc.autocorrelation(ts, normalized=False, unbiased=False)

Returns the discrete, linear convolution of a time series with itself, optionally using unbiased normalization.

N.B. Autocorrelation estimates are necessarily inaccurate for longer lags, as there are less pairs of points to convolve separated by that lag. Therefore best to throw out the results except for shorter lags, e.g. keep lags from tau=0 up to one quarter of the total time series length.

Parameters:
  • normalized (boolean) – If True, the time series will first be normalized to a mean of 0 and variance of 1. This gives autocorrelation 1 at zero lag.
  • unbiased (boolean) – If True, the result at each lag m will be scaled by 1/(N-m). This gives an unbiased estimation of the autocorrelation of a stationary process from a finite length sample.

Ref: S. J. Orfanidis (1996) “Optimum Signal Processing”, 2nd Ed.

nsim.analyses1.misc.crossing_indices(ts, c=0.0, d=0.0)

For a single variable timeseries, find the time indices each time the value crosses c from above or below. Can optionally set a non-zero d to impose the condition that the value must wander at least d units away from c between crossings.

If the timeseries begins (or ends) exactly at c, then time zero (or the ending time) is also included as a crossing event, so that the boundaries of the first and last excursions are included.

Parameters:
  • ts – Timeseries (single variable)
  • c (float) – Critical value at which to report crossings.
  • d (float) – Optional min distance from c to be attained between crossings.
Returns:

array of indices

nsim.analyses1.misc.first_return_times(ts, c=None, d=0.0)

For a single variable time series, first wait until the time series attains the value c for the first time. Then record the time intervals between successive returns to c. If c is not given, the default is the mean of the time series.

Parameters:
  • ts – Timeseries (single variable)
  • c (float) – Optional target value (default is the mean of the time series)
  • d (float) – Optional min distance from c to be attained between returns
Returns:

array of time intervals (Can take the mean of these to estimate the expected first return time)

nsim.analyses1.phase module

Various analyses that apply to the phase time-series of an oscillator

functions:

nsim.analyses1.phase.circmean(ts, axis=2)

Circular mean phase

nsim.analyses1.phase.circstd(ts, axis=2)

Circular standard deviation

nsim.analyses1.phase.mod2pi(ts)

For a timeseries where all variables represent phases (in radians), return an equivalent timeseries where all values are in the range (-pi, pi]

nsim.analyses1.phase.periods(ts, phi=0.0)

For a single variable timeseries representing the phase of an oscillator, measure the period of each successive oscillation.

An individual oscillation is defined to start and end when the phase passes phi (by default zero) after completing a full cycle.

If the timeseries begins (or ends) exactly at phi, then the first (or last) oscillation will be included.

Parameters:
  • ts – Timeseries (single variable) The timeseries of an angle variable (radians)
  • phi (float) – A single oscillation starts and ends at phase phi (by default zero).
nsim.analyses1.phase.phase_crossings(ts, phi=0.0)

For a single variable timeseries representing the phase of an oscillator, find the time indices each time the phase crosses angle phi, with the condition that the phase must visit phi+pi between crossings.

(Thus if noise causes the phase to wander back and forth across angle phi without the oscillator doing a full revolution, then this is recorded as a single crossing event, giving the index of the earliest arrival.)

If the timeseries begins (or ends) exactly at phi, then time zero (or the ending time) is also included as a crossing event, so that the boundaries of the first and last oscillations are included.

Parameters:
  • ts – Timeseries (single variable) The timeseries of an angle variable (radians)
  • phi (float) – Critical phase angle (radians) at which to report crossings.
Returns:

array of indices

nsim.analyses1.plots module

Various plotting routines for time series

functions:
plot
nsim.analyses1.plots.plot(ts, title=None, show=True)

Plot a Timeseries :param ts Timeseries: :param title str: :param show bool whether to display the figure or just return a figure object:

nsim.analyses1.pyeeg module

Copyleft 2010-2015 Forrest Sheng Bao http://fsbao.net
Copyleft 2010 Xin Liu Copyleft 2014-2015 Borzou Alipour Fard

PyEEG, a Python module to extract EEG feature.

Project homepage: http://pyeeg.org

Data structure

PyEEG only uses standard Python and numpy data structures, so you need to import numpy before using it. For numpy, please visit http://numpy.scipy.org

Naming convention

I follow “Style Guide for Python Code” to code my program http://www.python.org/dev/peps/pep-0008/

Constants: UPPER_CASE_WITH_UNDERSCORES, e.g., SAMPLING_RATE, LENGTH_SIGNAL.

Function names: lower_case_with_underscores, e.g., spectrum_entropy.

Variables (global and local): CapitalizedWords or CapWords, e.g., Power.

If a variable name consists of one letter, I may use lower case, e.g., x, y.

Functions listed alphabetically

nsim.analyses1.pyeeg.ap_entropy(X, M, R)

Computer approximate entropy (ApEN) of series X, specified by M and R.

Suppose given time series is X = [x(1), x(2), ... , x(N)]. We first build embedding matrix Em, of dimension (N-M+1)-by-M, such that the i-th row of Em is x(i),x(i+1), ... , x(i+M-1). Hence, the embedding lag and dimension are 1 and M-1 respectively. Such a matrix can be built by calling pyeeg function as Em = embed_seq(X, 1, M). Then we build matrix Emp, whose only difference with Em is that the length of each embedding sequence is M + 1

Denote the i-th and j-th row of Em as Em[i] and Em[j]. Their k-th elements are Em[i][k] and Em[j][k] respectively. The distance between Em[i] and Em[j] is defined as 1) the maximum difference of their corresponding scalar components, thus, max(Em[i]-Em[j]), or 2) Euclidean distance. We say two 1-D vectors Em[i] and Em[j] match in tolerance R, if the distance between them is no greater than R, thus, max(Em[i]-Em[j]) <= R. Mostly, the value of R is defined as 20% - 30% of standard deviation of X.

Pick Em[i] as a template, for all j such that 0 < j < N - M + 1, we can check whether Em[j] matches with Em[i]. Denote the number of Em[j], which is in the range of Em[i], as k[i], which is the i-th element of the vector k. The probability that a random row in Em matches Em[i] is simga_1^{N-M+1} k[i] / (N - M + 1), thus sum(k)/ (N - M + 1), denoted as Cm[i].

We repeat the same process on Emp and obtained Cmp[i], but here 0<i<N-M since the length of each sequence in Emp is M + 1.

The probability that any two embedding sequences in Em match is then sum(Cm)/ (N - M +1 ). We define Phi_m = sum(log(Cm)) / (N - M + 1) and Phi_mp = sum(log(Cmp)) / (N - M ).

And the ApEn is defined as Phi_m - Phi_mp.

Notes

  1. Please be aware that self-match is also counted in ApEn.
  2. This function now runs very slow. We are still trying to speed it up.

References

Costa M, Goldberger AL, Peng CK, Multiscale entropy analysis of biological signals, Physical Review E, 71:021906, 2005

See also

samp_entropy()
sample entropy of a time series

Notes

Extremely slow implementation. Do NOT use if your dataset is not small.

nsim.analyses1.pyeeg.bin_power(X, Band, Fs)

Compute power in each frequency bin specified by Band from FFT result of X. By default, X is a real signal.

A real signal can be synthesized, thus not real.

Band

list

boundary frequencies (in Hz) of bins. They can be unequal bins, e.g. [0.5,4,7,12,30] which are delta, theta, alpha and beta respectively. You can also use range() function of Python to generate equal bins and pass the generated list to this function.

Each element of Band is a physical frequency and shall not exceed the Nyquist frequency, i.e., half of sampling frequency.

X

list

a 1-D real time series.

Fs

integer

the sampling rate in physical frequency

Returns:
  • Power – list

    spectral power in each frequency bin.

  • Power_ratio – list

    spectral power in each frequency bin normalized by total power in ALL frequency bins.

nsim.analyses1.pyeeg.dfa(X, Ave=None, L=None)

Compute Detrended Fluctuation Analysis from a time series X and length of boxes L.

The first step to compute DFA is to integrate the signal. Let original series be X= [x(1), x(2), ..., x(N)].

The integrated signal Y = [y(1), y(2), ..., y(N)] is obtained as follows y(k) = sum_{i=1}^{k}{x(i)-Ave} where Ave is the mean of X.

The second step is to partition/slice/segment the integrated sequence Y into boxes. At least two boxes are needed for computing DFA. Box sizes are specified by the L argument of this function. By default, it is from 1/5 of signal length to one (x-5)-th of the signal length, where x is the nearest power of 2 from the length of the signal, i.e., 1/16, 1/32, 1/64, 1/128, ...

In each box, a linear least square fitting is employed on data in the box. Denote the series on fitted line as Yn. Its k-th elements, yn(k), corresponds to y(k).

For fitting in each box, there is a residue, the sum of squares of all offsets, difference between actual points and points on fitted line.

F(n) denotes the square root of average total residue in all boxes when box length is n, thus Total_Residue = sum_{k=1}^{N}{(y(k)-yn(k))} F(n) = sqrt(Total_Residue/N)

The computing to F(n) is carried out for every box length n. Therefore, a relationship between n and F(n) can be obtained. In general, F(n) increases when n increases.

Finally, the relationship between F(n) and n is analyzed. A least square fitting is performed between log(F(n)) and log(n). The slope of the fitting line is the DFA value, denoted as Alpha. To white noise, Alpha should be 0.5. Higher level of signal complexity is related to higher Alpha.

Parameters:
  • X – 1-D Python list or numpy array a time series
  • Ave – integer, optional The average value of the time series
  • L – 1-D Python list of integers A list of box size, integers in ascending order
Returns:

integer the result of DFA analysis, thus the slope of fitting line of log(F(n)) vs. log(n). where n is the

Return type:

Alpha

Examples

>>> import pyeeg
>>> from numpy.random import randn
>>> print pyeeg.dfa(randn(4096))
0.490035110345

Peng C-K, Havlin S, Stanley HE, Goldberger AL. Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. _Chaos_ 1995;5:82-87

Notes

This value depends on the box sizes very much. When the input is a white noise, this value should be 0.5. But, some choices on box sizes can lead to the value lower or higher than 0.5, e.g. 0.38 or 0.58.

Based on many test, I set the box sizes from 1/5 of signal length to one (x-5)-th of the signal length, where x is the nearest power of 2 from the length of the signal, i.e., 1/16, 1/32, 1/64, 1/128, ...

You may generate a list of box sizes and pass in such a list as a parameter.

nsim.analyses1.pyeeg.embed_seq(X, Tau, D)

Build a set of embedding sequences from given time series X with lag Tau and embedding dimension DE. Let X = [x(1), x(2), ... , x(N)], then for each i such that 1 < i < N - (D - 1) * Tau, we build an embedding sequence, Y(i) = [x(i), x(i + Tau), ... , x(i + (D - 1) * Tau)]. All embedding sequence are placed in a matrix Y.

Parameters:
  • X

    list

    a time series

  • Tau

    integer

    the lag or delay when building embedding sequence

  • D

    integer

    the embedding dimension

Returns:

  • Y – 2-D list

    embedding matrix built

  • Examples

  • —————

  • >>> import pyeeg

  • >>> a=range(0,9)

  • >>> pyeeg.embed_seq(a,1,4)

  • array([[ 0., 1., 2., 3.], – [ 1., 2., 3., 4.], [ 2., 3., 4., 5.], [ 3., 4., 5., 6.], [ 4., 5., 6., 7.], [ 5., 6., 7., 8.]])

  • >>> pyeeg.embed_seq(a,2,3)

  • array([[ 0., 2., 4.], – [ 1., 3., 5.], [ 2., 4., 6.], [ 3., 5., 7.], [ 4., 6., 8.]])

  • >>> pyeeg.embed_seq(a,4,1)

  • array([[ 0.], – [ 1.], [ 2.], [ 3.], [ 4.], [ 5.], [ 6.], [ 7.], [ 8.]])

nsim.analyses1.pyeeg.fisher_info(X, Tau, DE, W=None)

Compute SVD Entropy from either two cases below: 1. a time series X, with lag tau and embedding dimension dE (default) 2. a list, W, of normalized singular values of a matrix (if W is provided, recommend to speed up.)

If W is None, the function will do as follows to prepare singular spectrum:

First, computer an embedding matrix from X, Tau and DE using pyeeg function embed_seq():

M = embed_seq(X, Tau, DE)

Second, use scipy.linalg function svd to decompose the embedding matrix M and obtain a list of singular values:

W = svd(M, compute_uv=0)
At last, normalize W:
W /= sum(W)

To speed up, it is recommended to compute W before calling this function because W may also be used by other functions whereas computing it here again will slow down.

nsim.analyses1.pyeeg.hfd(X, Kmax)

Compute Hjorth Fractal Dimension of a time series X, kmax is an HFD parameter

nsim.analyses1.pyeeg.hjorth(X, D=None)

Compute Hjorth mobility and complexity of a time series from either two cases below:

  1. X, the time series of type list (default)
  2. D, a first order differential sequence of X (if D is provided, recommended to speed up)

In case 1, D is computed using Numpy’s Difference function.

Notes

To speed up, it is recommended to compute D before calling this function because D may also be used by other functions whereas computing it here again will slow down.

Parameters:
  • X

    list

    a time series

  • D

    list

    first order differential sequence of a time series

Returns:

  • As indicated in return line
  • Hjorth mobility and complexity

nsim.analyses1.pyeeg.hurst(X)

Compute the Hurst exponent of X. If the output H=0.5,the behavior of the time-series is similar to random walk. If H<0.5, the time-series cover less “distance” than a random walk, vice verse.

Parameters:X

list

a time series

Returns:
  • H

    float

    Hurst exponent

  • Notes
  • ——–
  • Author of this function is Xin Liu

Examples

>>> import pyeeg
>>> from numpy.random import randn
>>> a = randn(4096)
>>> pyeeg.hurst(a)
0.5057444
nsim.analyses1.pyeeg.in_range(Template, Scroll, Distance)

Determines whether one vector is the the range of another vector.

The two vectors should have equal length.

Template
list The template vector, one of two vectors being compared
Scroll
list The scroll vector, one of the two vectors being compared
Distance
float Two vectors match if their distance is less than D

The distance between two vectors can be defined as Euclidean distance according to some publications.

The two vector should of equal length

nsim.analyses1.pyeeg.information_based_similarity(x, y, n)

Calculates the information based similarity of two time series x and y.

Parameters:
  • x

    list

    a time series

  • y

    list

    a time series

  • n

    integer

    word order

IBS

float

Information based similarity

Information based similarity is a measure of dissimilarity between two time series. Let the sequences be x and y. Each sequence is first replaced by its first ordered difference(Encoder). Calculating the Heaviside of the resulting sequences, we get two binary sequences, SymbolicSeq. Using PyEEG function, embed_seq, with lag of 1 and dimension of n, we build an embedding matrix from the latter sequence.

Each row of this embedding matrix is called a word. Information based similarity measures the distance between two sequence by comparing the rank of words in the sequences; more explicitly, the distance, D, is calculated using the formula:

“1/2^(n-1) * sum( abs(Rank(0)(k)-R(1)(k)) * F(k) )” where Rank(0)(k) and Rank(1)(k) are the rank of the k-th word in each of the input sequences. F(k) is a modified “shannon” weighing function that increases the weight of each word in the calculations when they are more frequent in the sequences.

It is advisable to calculate IBS for numerical sequences using 8-tupple words.

References

Yang AC, Hseu SS, Yien HW, Goldberger AL, Peng CK: Linguistic analysis of the human heartbeat using frequency and rank order statistics. Phys Rev Lett 2003, 90: 108103

>>> import pyeeg
>>> from numpy.random import randn
>>> x = randn(100)
>>> y = randn(100)
>>> pyeeg.information_based_similarity(x,y,8)
0.64512947848249214
nsim.analyses1.pyeeg.permutation_entropy(x, n, tau)
Compute Permutation Entropy of a given time series x, specified by

permutation order n and embedding lag tau.

x

list

a time series

n

integer

Permutation order

tau

integer

Embedding lag

PE

float

permutation entropy

Suppose the given time series is X =[x(1),x(2),x(3),

x(N)].

We first build embedding matrix Em, of dimension(n*N-n+1), such that the ith row of Em is x(i),x(i+1),..x(i+n-1). Hence the embedding lag and the embedding dimension are 1 and n respectively. We build this matrix from a given time series, X, by calling pyEEg function embed_seq(x,1,n).

We then transform each row of the embedding matrix into a new sequence, comprising a set of integers in range of 0,..,n-1. The order in which the integers are placed within a row is the same as those of the original elements:0 is placed where the smallest element of the row was and n-1 replaces the largest element of the row.

To calculate the Permutation entropy, we calculate the entropy of PeSeq. In doing so, we count the number of occurrences of each permutation in PeSeq and write it in a sequence, RankMat. We then use this sequence to calculate entropy by using Shanon’s entropy formula.

Permutation entropy is usually calculated with n in range of 3 and 7.

Bandt, Christoph, and Bernd Pompe. “Permutation entropy: a natural complexity measure for time series.” Physical Review Letters 88.17 (2002): 174102.

>>> import pyeeg
>>> x = [1,2,4,5,12,3,4,5]
>>> pyeeg.permutation_entropy(x,5,1)
2.0
nsim.analyses1.pyeeg.pfd(X, D=None)

Compute Petrosian Fractal Dimension of a time series from either two cases below:

  1. X, the time series of type list (default)
  2. D, the first order differential sequence of X (if D is provided, recommended to speed up)

In case 1, D is computed using Numpy’s difference function.

To speed up, it is recommended to compute D before calling this function because D may also be used by other functions whereas computing it here again will slow down.

nsim.analyses1.pyeeg.samp_entropy(X, M, R)

Computer sample entropy (SampEn) of series X, specified by M and R.

SampEn is very close to ApEn.

Suppose given time series is X = [x(1), x(2), ... , x(N)]. We first build embedding matrix Em, of dimension (N-M+1)-by-M, such that the i-th row of Em is x(i),x(i+1), ... , x(i+M-1). Hence, the embedding lag and dimension are 1 and M-1 respectively. Such a matrix can be built by calling pyeeg function as Em = embed_seq(X, 1, M). Then we build matrix Emp, whose only difference with Em is that the length of each embedding sequence is M + 1

Denote the i-th and j-th row of Em as Em[i] and Em[j]. Their k-th elements are Em[i][k] and Em[j][k] respectively. The distance between Em[i] and Em[j] is defined as 1) the maximum difference of their corresponding scalar components, thus, max(Em[i]-Em[j]), or 2) Euclidean distance. We say two 1-D vectors Em[i] and Em[j] match in tolerance R, if the distance between them is no greater than R, thus, max(Em[i]-Em[j]) <= R. Mostly, the value of R is defined as 20% - 30% of standard deviation of X.

Pick Em[i] as a template, for all j such that 0 < j < N - M , we can check whether Em[j] matches with Em[i]. Denote the number of Em[j], which is in the range of Em[i], as k[i], which is the i-th element of the vector k.

We repeat the same process on Emp and obtained Cmp[i], 0 < i < N - M.

The SampEn is defined as log(sum(Cm)/sum(Cmp))

References

Costa M, Goldberger AL, Peng C-K, Multiscale entropy analysis of biological signals, Physical Review E, 71:021906, 2005

See also

ap_entropy()
approximate entropy of a time series

Notes

Extremely slow computation. Do NOT use if your dataset is not small and you are not patient enough.

nsim.analyses1.pyeeg.spectral_entropy(X, Band, Fs, Power_Ratio=None)

Compute spectral entropy of a time series from either two cases below: 1. X, the time series (default) 2. Power_Ratio, a list of normalized signal power in a set of frequency bins defined in Band (if Power_Ratio is provided, recommended to speed up)

In case 1, Power_Ratio is computed by bin_power() function.

Notes

To speed up, it is recommended to compute Power_Ratio before calling this function because it may also be used by other functions whereas computing it here again will slow down.

Parameters:
  • Band

    list

    boundary frequencies (in Hz) of bins. They can be unequal bins, e.g. [0.5,4,7,12,30] which are delta, theta, alpha and beta respectively. You can also use range() function of Python to generate equal bins and pass the generated list to this function.

    Each element of Band is a physical frequency and shall not exceed the Nyquist frequency, i.e., half of sampling frequency.

    X
    list

    a 1-D real time series.

  • Fs

    integer

    the sampling rate in physical frequency

Returns:

Return type:

As indicated in return line

See also

bin_power()
pyeeg function that computes spectral power in frequency bins
nsim.analyses1.pyeeg.svd_entropy(X, Tau, DE, W=None)

Compute SVD Entropy from either two cases below: 1. a time series X, with lag tau and embedding dimension dE (default) 2. a list, W, of normalized singular values of a matrix (if W is provided, recommend to speed up.)

If W is None, the function will do as follows to prepare singular spectrum:

First, computer an embedding matrix from X, Tau and DE using pyeeg function embed_seq():

M = embed_seq(X, Tau, DE)

Second, use scipy.linalg function svd to decompose the embedding matrix M and obtain a list of singular values:

W = svd(M, compute_uv=0)
At last, normalize W:
W /= sum(W)

To speed up, it is recommended to compute W before calling this function because W may also be used by other functions whereas computing it here again will slow down.

Module contents

nsim.analyses1.psd(ts, plot=True)

plot Welch estimate of power spectral density

nsim.analyses1.lowpass(ts, cutoff_hz, order=3)

forward-backward butterworth low-pass filter

nsim.analyses1.highpass(ts, cutoff_hz, order=3)

forward-backward butterworth high-pass filter

nsim.analyses1.bandpass(ts, low_hz, high_hz, order=3)

forward-backward butterworth band-pass filter

nsim.analyses1.notch(ts, freq_hz, bandwidth_hz=1.0)

notch filter to remove remove a particular frequency Adapted from code by Sturla Molden

nsim.analyses1.hilbert(ts)

Analytic signal, using the Hilbert transform

nsim.analyses1.hilbert_amplitude(ts)

Amplitude of the analytic signal, using the Hilbert transform

nsim.analyses1.hilbert_phase(ts)

Phase of the analytic signal, using the Hilbert transform

nsim.analyses1.cwt(ts, freqs=array([ 1., 1.09854114, 1.20679264, 1.32571137, 1.45634848, 1.59985872, 1.75751062, 1.93069773, 2.12095089, 2.32995181, 2.55954792, 2.8117687, 3.0888436, 3.39322177, 3.72759372, 4.09491506, 4.49843267, 4.94171336, 5.42867544, 5.96362332, 6.55128557, 7.19685673, 7.90604321, 8.68511374, 9.54095476, 10.48113134, 11.51395399, 12.64855217, 13.89495494, 15.26417967, 16.76832937, 18.42069969, 20.23589648, 22.22996483, 24.42053095, 26.82695795, 29.47051703, 32.37457543, 35.56480306, 39.06939937, 42.9193426, 47.14866363, 51.79474679, 56.89866029, 62.50551925, 68.6648845, 75.43120063, 82.86427729, 91.0298178, 100. ]), wavelet=<function cwtmorlet>, plot=True)

Continuous wavelet transform Note the full results can use a huge amount of memory at 64-bit precision

Parameters:
  • ts – Timeseries of m variables, shape (n, m). Assumed constant timestep.
  • freqs – list of frequencies (in Hz) to use for the tranform. (default is 50 frequency bins logarithmic from 1Hz to 100Hz)
  • wavelet – the wavelet to use. may be complex. see scipy.signal.wavelets
  • plot – whether to plot time-resolved power spectrum
Returns:

Continuous wavelet transform output array, shape (n,len(freqs),m)

Return type:

coefs

nsim.analyses1.cwt_distributed(ts, freqs=array([ 1., 1.09854114, 1.20679264, 1.32571137, 1.45634848, 1.59985872, 1.75751062, 1.93069773, 2.12095089, 2.32995181, 2.55954792, 2.8117687, 3.0888436, 3.39322177, 3.72759372, 4.09491506, 4.49843267, 4.94171336, 5.42867544, 5.96362332, 6.55128557, 7.19685673, 7.90604321, 8.68511374, 9.54095476, 10.48113134, 11.51395399, 12.64855217, 13.89495494, 15.26417967, 16.76832937, 18.42069969, 20.23589648, 22.22996483, 24.42053095, 26.82695795, 29.47051703, 32.37457543, 35.56480306, 39.06939937, 42.9193426, 47.14866363, 51.79474679, 56.89866029, 62.50551925, 68.6648845, 75.43120063, 82.86427729, 91.0298178, 100. ]), wavelet=<function cwtmorlet>, plot=True)

Continuous wavelet transform using distributed computation. (Currently just splits the data by channel. TODO split it further.) Note: this function requires an IPython cluster to be started first.

Parameters:
  • ts – Timeseries of m variables, shape (n, m). Assumed constant timestep.
  • freqs – list of frequencies (in Hz) to use for the tranform. (default is 50 frequency bins logarithmic from 1Hz to 100Hz)
  • wavelet – the wavelet to use. may be complex. see scipy.signal.wavelets
  • plot – whether to plot time-resolved power spectrum
Returns:

Continuous wavelet transform output array, shape (n,len(freqs),m)

Return type:

coefs

nsim.analyses1.variability_fp(ts, freqs=None, ncycles=6, plot=True)

Example variability function. Gives two continuous, time-resolved measures of the variability of a time series, ranging between -1 and 1. The two measures are based on variance of the centroid frequency and variance of the height of the spectral peak, respectively. (Centroid frequency meaning the power-weighted average frequency) These measures are calculated over sliding time windows of variable size. See also: Blenkinsop et al. (2012) The dynamic evolution of focal-onset

epilepsies - combining theoretical and clinical observations
Parameters:
  • Timeseries of m variables, shape (n, m). Assumed constant timestep. (ts) –
  • (optional) List of frequencies to examine. If None, defaults to (freqs) – 50 frequency bands ranging 1Hz to 60Hz, logarithmically spaced.
  • Window size, in number of cycles of the centroid frequency. (ncycles) –
  • bool Whether to display the output (plot) –
Returns:

variability Timeseries of shape (n, m, 2) variability[:, :, 0] gives a measure of variability between -1 and 1 based on variance of centroid frequency. variability[:, :, 1] gives a measure of variability between -1 and 1 based on variance of maximum power.

nsim.analyses1.epochs(ts, variability=None, threshold=0.0, minlength=1.0, plot=True)

Identify “stationary” epochs within a time series, based on a continuous measure of variability. Epochs are defined to contain the points of minimal variability, and to extend as wide as possible with variability not exceeding the threshold.

Parameters:
  • Timeseries of m variables, shape (n, m). (ts) –
  • (optional) Timeseries of shape (n, m, q), giving q scalar (variability) – measures of the variability of timeseries ts near each point in time. (if None, we will use variability_fp()) Epochs require the mean of these to be below the threshold.
  • The maximum variability permitted in stationary epochs. (threshold) –
  • Shortest acceptable epoch length (in seconds) (minlength) –
  • bool Whether to display the output (plot) –
Returns: (variability, allchannels_epochs)
variability: as above allchannels_epochs: (list of) list of tuples For each variable, a list of tuples (start, end) that give the starting and ending indices of stationary epochs. (epochs are inclusive of start point but not the end point)
nsim.analyses1.epochs_distributed(ts, variability=None, threshold=0.0, minlength=1.0, plot=True)

Same as epochs(), but computes channels in parallel for speed.

(Note: This requires an IPython cluster to be started first,
e.g. on a workstation type ‘ipcluster start’)

Identify “stationary” epochs within a time series, based on a continuous measure of variability. Epochs are defined to contain the points of minimal variability, and to extend as wide as possible with variability not exceeding the threshold.

Parameters:
  • Timeseries of m variables, shape (n, m). (ts) –
  • (optional) Timeseries of shape (n, m, q), giving q scalar (variability) – measures of the variability of timeseries ts near each point in time. (if None, we will use variability_fp()) Epochs require the mean of these to be below the threshold.
  • The maximum variability permitted in stationary epochs. (threshold) –
  • Shortest acceptable epoch length (in seconds) (minlength) –
  • bool Whether to display the output (plot) –
Returns: (variability, allchannels_epochs)
variability: as above allchannels_epochs: (list of) list of tuples For each variable, a list of tuples (start, end) that give the starting and ending indices of stationary epochs. (epochs are inclusive of start point but not the end point)
nsim.analyses1.epochs_joint(ts, variability=None, threshold=0.0, minlength=1.0, proportion=0.75, plot=True)

Identify epochs within a multivariate time series where at least a certain proportion of channels are “stationary”, based on a previously computed variability measure.

(Note: This requires an IPython cluster to be started first,
e.g. on a workstation type ‘ipcluster start’)
Parameters:
  • Timeseries of m variables, shape (n, m). (ts) –
  • (optional) Timeseries of shape (n, m), giving a scalar (variability) – measure of the variability of timeseries ts near each point in time. (if None, we will use variability_fp())
  • The maximum variability permitted in stationary epochs. (threshold) –
  • Shortest acceptable epoch length (in seconds) (minlength) –
  • Require at least this fraction of channels to be "stationary" (proportion) –
  • bool Whether to display the output (plot) –
Returns: (variability, joint_epochs)
joint_epochs: list of tuples A list of tuples (start, end) that give the starting and ending indices of time epochs that are stationary for at least proportion of channels. (epochs are inclusive of start point but not the end point)
nsim.analyses1.plot(ts, title=None, show=True)

Plot a Timeseries :param ts Timeseries: :param title str: :param show bool whether to display the figure or just return a figure object:

nsim.analyses1.crossing_indices(ts, c=0.0, d=0.0)

For a single variable timeseries, find the time indices each time the value crosses c from above or below. Can optionally set a non-zero d to impose the condition that the value must wander at least d units away from c between crossings.

If the timeseries begins (or ends) exactly at c, then time zero (or the ending time) is also included as a crossing event, so that the boundaries of the first and last excursions are included.

Parameters:
  • ts – Timeseries (single variable)
  • c (float) – Critical value at which to report crossings.
  • d (float) – Optional min distance from c to be attained between crossings.
Returns:

array of indices

nsim.analyses1.first_return_times(ts, c=None, d=0.0)

For a single variable time series, first wait until the time series attains the value c for the first time. Then record the time intervals between successive returns to c. If c is not given, the default is the mean of the time series.

Parameters:
  • ts – Timeseries (single variable)
  • c (float) – Optional target value (default is the mean of the time series)
  • d (float) – Optional min distance from c to be attained between returns
Returns:

array of time intervals (Can take the mean of these to estimate the expected first return time)

nsim.analyses1.autocorrelation(ts, normalized=False, unbiased=False)

Returns the discrete, linear convolution of a time series with itself, optionally using unbiased normalization.

N.B. Autocorrelation estimates are necessarily inaccurate for longer lags, as there are less pairs of points to convolve separated by that lag. Therefore best to throw out the results except for shorter lags, e.g. keep lags from tau=0 up to one quarter of the total time series length.

Parameters:
  • normalized (boolean) – If True, the time series will first be normalized to a mean of 0 and variance of 1. This gives autocorrelation 1 at zero lag.
  • unbiased (boolean) – If True, the result at each lag m will be scaled by 1/(N-m). This gives an unbiased estimation of the autocorrelation of a stationary process from a finite length sample.

Ref: S. J. Orfanidis (1996) “Optimum Signal Processing”, 2nd Ed.

nsim.analyses1.mod2pi(ts)

For a timeseries where all variables represent phases (in radians), return an equivalent timeseries where all values are in the range (-pi, pi]

nsim.analyses1.phase_crossings(ts, phi=0.0)

For a single variable timeseries representing the phase of an oscillator, find the time indices each time the phase crosses angle phi, with the condition that the phase must visit phi+pi between crossings.

(Thus if noise causes the phase to wander back and forth across angle phi without the oscillator doing a full revolution, then this is recorded as a single crossing event, giving the index of the earliest arrival.)

If the timeseries begins (or ends) exactly at phi, then time zero (or the ending time) is also included as a crossing event, so that the boundaries of the first and last oscillations are included.

Parameters:
  • ts – Timeseries (single variable) The timeseries of an angle variable (radians)
  • phi (float) – Critical phase angle (radians) at which to report crossings.
Returns:

array of indices

nsim.analyses1.periods(ts, phi=0.0)

For a single variable timeseries representing the phase of an oscillator, measure the period of each successive oscillation.

An individual oscillation is defined to start and end when the phase passes phi (by default zero) after completing a full cycle.

If the timeseries begins (or ends) exactly at phi, then the first (or last) oscillation will be included.

Parameters:
  • ts – Timeseries (single variable) The timeseries of an angle variable (radians)
  • phi (float) – A single oscillation starts and ends at phase phi (by default zero).
nsim.analyses1.circmean(ts, axis=2)

Circular mean phase

nsim.analyses1.circstd(ts, axis=2)

Circular standard deviation

nsim.analyses1.variability_fp(ts, freqs=None, ncycles=6, plot=True)

Example variability function. Gives two continuous, time-resolved measures of the variability of a time series, ranging between -1 and 1. The two measures are based on variance of the centroid frequency and variance of the height of the spectral peak, respectively. (Centroid frequency meaning the power-weighted average frequency) These measures are calculated over sliding time windows of variable size. See also: Blenkinsop et al. (2012) The dynamic evolution of focal-onset

epilepsies - combining theoretical and clinical observations
Parameters:
  • Timeseries of m variables, shape (n, m). Assumed constant timestep. (ts) –
  • (optional) List of frequencies to examine. If None, defaults to (freqs) – 50 frequency bands ranging 1Hz to 60Hz, logarithmically spaced.
  • Window size, in number of cycles of the centroid frequency. (ncycles) –
  • bool Whether to display the output (plot) –
Returns:

variability Timeseries of shape (n, m, 2) variability[:, :, 0] gives a measure of variability between -1 and 1 based on variance of centroid frequency. variability[:, :, 1] gives a measure of variability between -1 and 1 based on variance of maximum power.

nsim.analyses1.epochs(ts, variability=None, threshold=0.0, minlength=1.0, plot=True)

Identify “stationary” epochs within a time series, based on a continuous measure of variability. Epochs are defined to contain the points of minimal variability, and to extend as wide as possible with variability not exceeding the threshold.

Parameters:
  • Timeseries of m variables, shape (n, m). (ts) –
  • (optional) Timeseries of shape (n, m, q), giving q scalar (variability) – measures of the variability of timeseries ts near each point in time. (if None, we will use variability_fp()) Epochs require the mean of these to be below the threshold.
  • The maximum variability permitted in stationary epochs. (threshold) –
  • Shortest acceptable epoch length (in seconds) (minlength) –
  • bool Whether to display the output (plot) –
Returns: (variability, allchannels_epochs)
variability: as above allchannels_epochs: (list of) list of tuples For each variable, a list of tuples (start, end) that give the starting and ending indices of stationary epochs. (epochs are inclusive of start point but not the end point)
nsim.analyses1.epochs_distributed(ts, variability=None, threshold=0.0, minlength=1.0, plot=True)

Same as epochs(), but computes channels in parallel for speed.

(Note: This requires an IPython cluster to be started first,
e.g. on a workstation type ‘ipcluster start’)

Identify “stationary” epochs within a time series, based on a continuous measure of variability. Epochs are defined to contain the points of minimal variability, and to extend as wide as possible with variability not exceeding the threshold.

Parameters:
  • Timeseries of m variables, shape (n, m). (ts) –
  • (optional) Timeseries of shape (n, m, q), giving q scalar (variability) – measures of the variability of timeseries ts near each point in time. (if None, we will use variability_fp()) Epochs require the mean of these to be below the threshold.
  • The maximum variability permitted in stationary epochs. (threshold) –
  • Shortest acceptable epoch length (in seconds) (minlength) –
  • bool Whether to display the output (plot) –
Returns: (variability, allchannels_epochs)
variability: as above allchannels_epochs: (list of) list of tuples For each variable, a list of tuples (start, end) that give the starting and ending indices of stationary epochs. (epochs are inclusive of start point but not the end point)
nsim.analyses1.epochs_joint(ts, variability=None, threshold=0.0, minlength=1.0, proportion=0.75, plot=True)

Identify epochs within a multivariate time series where at least a certain proportion of channels are “stationary”, based on a previously computed variability measure.

(Note: This requires an IPython cluster to be started first,
e.g. on a workstation type ‘ipcluster start’)
Parameters:
  • Timeseries of m variables, shape (n, m). (ts) –
  • (optional) Timeseries of shape (n, m), giving a scalar (variability) – measure of the variability of timeseries ts near each point in time. (if None, we will use variability_fp())
  • The maximum variability permitted in stationary epochs. (threshold) –
  • Shortest acceptable epoch length (in seconds) (minlength) –
  • Require at least this fraction of channels to be "stationary" (proportion) –
  • bool Whether to display the output (plot) –
Returns: (variability, joint_epochs)
joint_epochs: list of tuples A list of tuples (start, end) that give the starting and ending indices of time epochs that are stationary for at least proportion of channels. (epochs are inclusive of start point but not the end point)
nsim.analyses1.hurst(X)

Compute the Hurst exponent of X. If the output H=0.5,the behavior of the time-series is similar to random walk. If H<0.5, the time-series cover less “distance” than a random walk, vice verse.

Parameters:X

list

a time series

Returns:
  • H

    float

    Hurst exponent

  • Notes
  • ——–
  • Author of this function is Xin Liu

Examples

>>> import pyeeg
>>> from numpy.random import randn
>>> a = randn(4096)
>>> pyeeg.hurst(a)
0.5057444
nsim.analyses1.bin_power(X, Band, Fs)

Compute power in each frequency bin specified by Band from FFT result of X. By default, X is a real signal.

A real signal can be synthesized, thus not real.

Band

list

boundary frequencies (in Hz) of bins. They can be unequal bins, e.g. [0.5,4,7,12,30] which are delta, theta, alpha and beta respectively. You can also use range() function of Python to generate equal bins and pass the generated list to this function.

Each element of Band is a physical frequency and shall not exceed the Nyquist frequency, i.e., half of sampling frequency.

X

list

a 1-D real time series.

Fs

integer

the sampling rate in physical frequency

Returns:
  • Power – list

    spectral power in each frequency bin.

  • Power_ratio – list

    spectral power in each frequency bin normalized by total power in ALL frequency bins.

nsim.analyses1.pfd(X, D=None)

Compute Petrosian Fractal Dimension of a time series from either two cases below:

  1. X, the time series of type list (default)
  2. D, the first order differential sequence of X (if D is provided, recommended to speed up)

In case 1, D is computed using Numpy’s difference function.

To speed up, it is recommended to compute D before calling this function because D may also be used by other functions whereas computing it here again will slow down.

nsim.analyses1.hfd(X, Kmax)

Compute Hjorth Fractal Dimension of a time series X, kmax is an HFD parameter

nsim.analyses1.hjorth(X, D=None)

Compute Hjorth mobility and complexity of a time series from either two cases below:

  1. X, the time series of type list (default)
  2. D, a first order differential sequence of X (if D is provided, recommended to speed up)

In case 1, D is computed using Numpy’s Difference function.

Notes

To speed up, it is recommended to compute D before calling this function because D may also be used by other functions whereas computing it here again will slow down.

Parameters:
  • X

    list

    a time series

  • D

    list

    first order differential sequence of a time series

Returns:

  • As indicated in return line
  • Hjorth mobility and complexity

nsim.analyses1.spectral_entropy(X, Band, Fs, Power_Ratio=None)

Compute spectral entropy of a time series from either two cases below: 1. X, the time series (default) 2. Power_Ratio, a list of normalized signal power in a set of frequency bins defined in Band (if Power_Ratio is provided, recommended to speed up)

In case 1, Power_Ratio is computed by bin_power() function.

Notes

To speed up, it is recommended to compute Power_Ratio before calling this function because it may also be used by other functions whereas computing it here again will slow down.

Parameters:
  • Band

    list

    boundary frequencies (in Hz) of bins. They can be unequal bins, e.g. [0.5,4,7,12,30] which are delta, theta, alpha and beta respectively. You can also use range() function of Python to generate equal bins and pass the generated list to this function.

    Each element of Band is a physical frequency and shall not exceed the Nyquist frequency, i.e., half of sampling frequency.

    X
    list

    a 1-D real time series.

  • Fs

    integer

    the sampling rate in physical frequency

Returns:

Return type:

As indicated in return line

See also

bin_power()
pyeeg function that computes spectral power in frequency bins
nsim.analyses1.svd_entropy(X, Tau, DE, W=None)

Compute SVD Entropy from either two cases below: 1. a time series X, with lag tau and embedding dimension dE (default) 2. a list, W, of normalized singular values of a matrix (if W is provided, recommend to speed up.)

If W is None, the function will do as follows to prepare singular spectrum:

First, computer an embedding matrix from X, Tau and DE using pyeeg function embed_seq():

M = embed_seq(X, Tau, DE)

Second, use scipy.linalg function svd to decompose the embedding matrix M and obtain a list of singular values:

W = svd(M, compute_uv=0)
At last, normalize W:
W /= sum(W)

To speed up, it is recommended to compute W before calling this function because W may also be used by other functions whereas computing it here again will slow down.

nsim.analyses1.fisher_info(X, Tau, DE, W=None)

Compute SVD Entropy from either two cases below: 1. a time series X, with lag tau and embedding dimension dE (default) 2. a list, W, of normalized singular values of a matrix (if W is provided, recommend to speed up.)

If W is None, the function will do as follows to prepare singular spectrum:

First, computer an embedding matrix from X, Tau and DE using pyeeg function embed_seq():

M = embed_seq(X, Tau, DE)

Second, use scipy.linalg function svd to decompose the embedding matrix M and obtain a list of singular values:

W = svd(M, compute_uv=0)
At last, normalize W:
W /= sum(W)

To speed up, it is recommended to compute W before calling this function because W may also be used by other functions whereas computing it here again will slow down.

nsim.analyses1.ap_entropy(X, M, R)

Computer approximate entropy (ApEN) of series X, specified by M and R.

Suppose given time series is X = [x(1), x(2), ... , x(N)]. We first build embedding matrix Em, of dimension (N-M+1)-by-M, such that the i-th row of Em is x(i),x(i+1), ... , x(i+M-1). Hence, the embedding lag and dimension are 1 and M-1 respectively. Such a matrix can be built by calling pyeeg function as Em = embed_seq(X, 1, M). Then we build matrix Emp, whose only difference with Em is that the length of each embedding sequence is M + 1

Denote the i-th and j-th row of Em as Em[i] and Em[j]. Their k-th elements are Em[i][k] and Em[j][k] respectively. The distance between Em[i] and Em[j] is defined as 1) the maximum difference of their corresponding scalar components, thus, max(Em[i]-Em[j]), or 2) Euclidean distance. We say two 1-D vectors Em[i] and Em[j] match in tolerance R, if the distance between them is no greater than R, thus, max(Em[i]-Em[j]) <= R. Mostly, the value of R is defined as 20% - 30% of standard deviation of X.

Pick Em[i] as a template, for all j such that 0 < j < N - M + 1, we can check whether Em[j] matches with Em[i]. Denote the number of Em[j], which is in the range of Em[i], as k[i], which is the i-th element of the vector k. The probability that a random row in Em matches Em[i] is simga_1^{N-M+1} k[i] / (N - M + 1), thus sum(k)/ (N - M + 1), denoted as Cm[i].

We repeat the same process on Emp and obtained Cmp[i], but here 0<i<N-M since the length of each sequence in Emp is M + 1.

The probability that any two embedding sequences in Em match is then sum(Cm)/ (N - M +1 ). We define Phi_m = sum(log(Cm)) / (N - M + 1) and Phi_mp = sum(log(Cmp)) / (N - M ).

And the ApEn is defined as Phi_m - Phi_mp.

Notes

  1. Please be aware that self-match is also counted in ApEn.
  2. This function now runs very slow. We are still trying to speed it up.

References

Costa M, Goldberger AL, Peng CK, Multiscale entropy analysis of biological signals, Physical Review E, 71:021906, 2005

See also

samp_entropy()
sample entropy of a time series

Notes

Extremely slow implementation. Do NOT use if your dataset is not small.

nsim.analyses1.samp_entropy(X, M, R)

Computer sample entropy (SampEn) of series X, specified by M and R.

SampEn is very close to ApEn.

Suppose given time series is X = [x(1), x(2), ... , x(N)]. We first build embedding matrix Em, of dimension (N-M+1)-by-M, such that the i-th row of Em is x(i),x(i+1), ... , x(i+M-1). Hence, the embedding lag and dimension are 1 and M-1 respectively. Such a matrix can be built by calling pyeeg function as Em = embed_seq(X, 1, M). Then we build matrix Emp, whose only difference with Em is that the length of each embedding sequence is M + 1

Denote the i-th and j-th row of Em as Em[i] and Em[j]. Their k-th elements are Em[i][k] and Em[j][k] respectively. The distance between Em[i] and Em[j] is defined as 1) the maximum difference of their corresponding scalar components, thus, max(Em[i]-Em[j]), or 2) Euclidean distance. We say two 1-D vectors Em[i] and Em[j] match in tolerance R, if the distance between them is no greater than R, thus, max(Em[i]-Em[j]) <= R. Mostly, the value of R is defined as 20% - 30% of standard deviation of X.

Pick Em[i] as a template, for all j such that 0 < j < N - M , we can check whether Em[j] matches with Em[i]. Denote the number of Em[j], which is in the range of Em[i], as k[i], which is the i-th element of the vector k.

We repeat the same process on Emp and obtained Cmp[i], 0 < i < N - M.

The SampEn is defined as log(sum(Cm)/sum(Cmp))

References

Costa M, Goldberger AL, Peng C-K, Multiscale entropy analysis of biological signals, Physical Review E, 71:021906, 2005

See also

ap_entropy()
approximate entropy of a time series

Notes

Extremely slow computation. Do NOT use if your dataset is not small and you are not patient enough.

nsim.analyses1.dfa(X, Ave=None, L=None)

Compute Detrended Fluctuation Analysis from a time series X and length of boxes L.

The first step to compute DFA is to integrate the signal. Let original series be X= [x(1), x(2), ..., x(N)].

The integrated signal Y = [y(1), y(2), ..., y(N)] is obtained as follows y(k) = sum_{i=1}^{k}{x(i)-Ave} where Ave is the mean of X.

The second step is to partition/slice/segment the integrated sequence Y into boxes. At least two boxes are needed for computing DFA. Box sizes are specified by the L argument of this function. By default, it is from 1/5 of signal length to one (x-5)-th of the signal length, where x is the nearest power of 2 from the length of the signal, i.e., 1/16, 1/32, 1/64, 1/128, ...

In each box, a linear least square fitting is employed on data in the box. Denote the series on fitted line as Yn. Its k-th elements, yn(k), corresponds to y(k).

For fitting in each box, there is a residue, the sum of squares of all offsets, difference between actual points and points on fitted line.

F(n) denotes the square root of average total residue in all boxes when box length is n, thus Total_Residue = sum_{k=1}^{N}{(y(k)-yn(k))} F(n) = sqrt(Total_Residue/N)

The computing to F(n) is carried out for every box length n. Therefore, a relationship between n and F(n) can be obtained. In general, F(n) increases when n increases.

Finally, the relationship between F(n) and n is analyzed. A least square fitting is performed between log(F(n)) and log(n). The slope of the fitting line is the DFA value, denoted as Alpha. To white noise, Alpha should be 0.5. Higher level of signal complexity is related to higher Alpha.

Parameters:
  • X – 1-D Python list or numpy array a time series
  • Ave – integer, optional The average value of the time series
  • L – 1-D Python list of integers A list of box size, integers in ascending order
Returns:

integer the result of DFA analysis, thus the slope of fitting line of log(F(n)) vs. log(n). where n is the

Return type:

Alpha

Examples

>>> import pyeeg
>>> from numpy.random import randn
>>> print pyeeg.dfa(randn(4096))
0.490035110345

Peng C-K, Havlin S, Stanley HE, Goldberger AL. Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. _Chaos_ 1995;5:82-87

Notes

This value depends on the box sizes very much. When the input is a white noise, this value should be 0.5. But, some choices on box sizes can lead to the value lower or higher than 0.5, e.g. 0.38 or 0.58.

Based on many test, I set the box sizes from 1/5 of signal length to one (x-5)-th of the signal length, where x is the nearest power of 2 from the length of the signal, i.e., 1/16, 1/32, 1/64, 1/128, ...

You may generate a list of box sizes and pass in such a list as a parameter.