nsim.analyses1 package¶
Submodules¶
nsim.analyses1.epochs module¶
Functions to identify pseudo-stationary epochs, where a timeseries has low ‘variability’. There is flexibility to give your own function to define ‘variability’ in the way you want, or you can use the default.
An example of a ‘variability’ function is given based on Blenkinsop2012, using centroid frequency and height of spectral peak to define ‘variability’.
- functions:
- epochs Find pseudo-stationary epochs, based on any defn of ‘variability’. epochs_distributed Same as above, but compute all channels in parallel. epochs_joint Find epochs where x% of channels are simultaneously stationary. variability_fp Example 2D variability function, extending Blenkinsop2012.
- See also: Blenkinsop et al. (2012) The dynamic evolution of focal-onset
- epilepsies - combining theoretical and clinical observations
-
nsim.analyses1.epochs.epochs(ts, variability=None, threshold=0.0, minlength=1.0, plot=True)¶ Identify “stationary” epochs within a time series, based on a continuous measure of variability. Epochs are defined to contain the points of minimal variability, and to extend as wide as possible with variability not exceeding the threshold.
Parameters: - Timeseries of m variables, shape (n, m). (ts) –
- (optional) Timeseries of shape (n, m, q), giving q scalar (variability) – measures of the variability of timeseries ts near each point in time. (if None, we will use variability_fp()) Epochs require the mean of these to be below the threshold.
- The maximum variability permitted in stationary epochs. (threshold) –
- Shortest acceptable epoch length (in seconds) (minlength) –
- bool Whether to display the output (plot) –
- Returns: (variability, allchannels_epochs)
- variability: as above allchannels_epochs: (list of) list of tuples For each variable, a list of tuples (start, end) that give the starting and ending indices of stationary epochs. (epochs are inclusive of start point but not the end point)
-
nsim.analyses1.epochs.epochs_distributed(ts, variability=None, threshold=0.0, minlength=1.0, plot=True)¶ Same as epochs(), but computes channels in parallel for speed.
- (Note: This requires an IPython cluster to be started first,
- e.g. on a workstation type ‘ipcluster start’)
Identify “stationary” epochs within a time series, based on a continuous measure of variability. Epochs are defined to contain the points of minimal variability, and to extend as wide as possible with variability not exceeding the threshold.
Parameters: - Timeseries of m variables, shape (n, m). (ts) –
- (optional) Timeseries of shape (n, m, q), giving q scalar (variability) – measures of the variability of timeseries ts near each point in time. (if None, we will use variability_fp()) Epochs require the mean of these to be below the threshold.
- The maximum variability permitted in stationary epochs. (threshold) –
- Shortest acceptable epoch length (in seconds) (minlength) –
- bool Whether to display the output (plot) –
- Returns: (variability, allchannels_epochs)
- variability: as above allchannels_epochs: (list of) list of tuples For each variable, a list of tuples (start, end) that give the starting and ending indices of stationary epochs. (epochs are inclusive of start point but not the end point)
-
nsim.analyses1.epochs.epochs_joint(ts, variability=None, threshold=0.0, minlength=1.0, proportion=0.75, plot=True)¶ Identify epochs within a multivariate time series where at least a certain proportion of channels are “stationary”, based on a previously computed variability measure.
- (Note: This requires an IPython cluster to be started first,
- e.g. on a workstation type ‘ipcluster start’)
Parameters: - Timeseries of m variables, shape (n, m). (ts) –
- (optional) Timeseries of shape (n, m), giving a scalar (variability) – measure of the variability of timeseries ts near each point in time. (if None, we will use variability_fp())
- The maximum variability permitted in stationary epochs. (threshold) –
- Shortest acceptable epoch length (in seconds) (minlength) –
- Require at least this fraction of channels to be "stationary" (proportion) –
- bool Whether to display the output (plot) –
- Returns: (variability, joint_epochs)
- joint_epochs: list of tuples A list of tuples (start, end) that give the starting and ending indices of time epochs that are stationary for at least proportion of channels. (epochs are inclusive of start point but not the end point)
-
nsim.analyses1.epochs.variability_fp(ts, freqs=None, ncycles=6, plot=True)¶ Example variability function. Gives two continuous, time-resolved measures of the variability of a time series, ranging between -1 and 1. The two measures are based on variance of the centroid frequency and variance of the height of the spectral peak, respectively. (Centroid frequency meaning the power-weighted average frequency) These measures are calculated over sliding time windows of variable size. See also: Blenkinsop et al. (2012) The dynamic evolution of focal-onset
epilepsies - combining theoretical and clinical observationsParameters: - Timeseries of m variables, shape (n, m). Assumed constant timestep. (ts) –
- (optional) List of frequencies to examine. If None, defaults to (freqs) – 50 frequency bands ranging 1Hz to 60Hz, logarithmically spaced.
- Window size, in number of cycles of the centroid frequency. (ncycles) –
- bool Whether to display the output (plot) –
Returns: variability Timeseries of shape (n, m, 2) variability[:, :, 0] gives a measure of variability between -1 and 1 based on variance of centroid frequency. variability[:, :, 1] gives a measure of variability between -1 and 1 based on variance of maximum power.
nsim.analyses1.freq module¶
Various frequency domain analyses that apply to a single time series.
- functions:
- psd lowpass highpass bandpass notch hilbert hilbert_amplitude hilbert_phase epochs cwt cwt_distributed
-
nsim.analyses1.freq.bandpass(ts, low_hz, high_hz, order=3)¶ forward-backward butterworth band-pass filter
-
nsim.analyses1.freq.cwt(ts, freqs=array([ 1., 1.09854114, 1.20679264, 1.32571137, 1.45634848, 1.59985872, 1.75751062, 1.93069773, 2.12095089, 2.32995181, 2.55954792, 2.8117687, 3.0888436, 3.39322177, 3.72759372, 4.09491506, 4.49843267, 4.94171336, 5.42867544, 5.96362332, 6.55128557, 7.19685673, 7.90604321, 8.68511374, 9.54095476, 10.48113134, 11.51395399, 12.64855217, 13.89495494, 15.26417967, 16.76832937, 18.42069969, 20.23589648, 22.22996483, 24.42053095, 26.82695795, 29.47051703, 32.37457543, 35.56480306, 39.06939937, 42.9193426, 47.14866363, 51.79474679, 56.89866029, 62.50551925, 68.6648845, 75.43120063, 82.86427729, 91.0298178, 100. ]), wavelet=<function cwtmorlet>, plot=True)¶ Continuous wavelet transform Note the full results can use a huge amount of memory at 64-bit precision
Parameters: - ts – Timeseries of m variables, shape (n, m). Assumed constant timestep.
- freqs – list of frequencies (in Hz) to use for the tranform. (default is 50 frequency bins logarithmic from 1Hz to 100Hz)
- wavelet – the wavelet to use. may be complex. see scipy.signal.wavelets
- plot – whether to plot time-resolved power spectrum
Returns: Continuous wavelet transform output array, shape (n,len(freqs),m)
Return type: coefs
-
nsim.analyses1.freq.cwt_distributed(ts, freqs=array([ 1., 1.09854114, 1.20679264, 1.32571137, 1.45634848, 1.59985872, 1.75751062, 1.93069773, 2.12095089, 2.32995181, 2.55954792, 2.8117687, 3.0888436, 3.39322177, 3.72759372, 4.09491506, 4.49843267, 4.94171336, 5.42867544, 5.96362332, 6.55128557, 7.19685673, 7.90604321, 8.68511374, 9.54095476, 10.48113134, 11.51395399, 12.64855217, 13.89495494, 15.26417967, 16.76832937, 18.42069969, 20.23589648, 22.22996483, 24.42053095, 26.82695795, 29.47051703, 32.37457543, 35.56480306, 39.06939937, 42.9193426, 47.14866363, 51.79474679, 56.89866029, 62.50551925, 68.6648845, 75.43120063, 82.86427729, 91.0298178, 100. ]), wavelet=<function cwtmorlet>, plot=True)¶ Continuous wavelet transform using distributed computation. (Currently just splits the data by channel. TODO split it further.) Note: this function requires an IPython cluster to be started first.
Parameters: - ts – Timeseries of m variables, shape (n, m). Assumed constant timestep.
- freqs – list of frequencies (in Hz) to use for the tranform. (default is 50 frequency bins logarithmic from 1Hz to 100Hz)
- wavelet – the wavelet to use. may be complex. see scipy.signal.wavelets
- plot – whether to plot time-resolved power spectrum
Returns: Continuous wavelet transform output array, shape (n,len(freqs),m)
Return type: coefs
-
nsim.analyses1.freq.highpass(ts, cutoff_hz, order=3)¶ forward-backward butterworth high-pass filter
-
nsim.analyses1.freq.hilbert(ts)¶ Analytic signal, using the Hilbert transform
-
nsim.analyses1.freq.hilbert_amplitude(ts)¶ Amplitude of the analytic signal, using the Hilbert transform
-
nsim.analyses1.freq.hilbert_phase(ts)¶ Phase of the analytic signal, using the Hilbert transform
-
nsim.analyses1.freq.lowpass(ts, cutoff_hz, order=3)¶ forward-backward butterworth low-pass filter
-
nsim.analyses1.freq.notch(ts, freq_hz, bandwidth_hz=1.0)¶ notch filter to remove remove a particular frequency Adapted from code by Sturla Molden
-
nsim.analyses1.freq.psd(ts, plot=True)¶ plot Welch estimate of power spectral density
nsim.analyses1.misc module¶
functions:
-
nsim.analyses1.misc.autocorrelation(ts, normalized=False, unbiased=False)¶ Returns the discrete, linear convolution of a time series with itself, optionally using unbiased normalization.
N.B. Autocorrelation estimates are necessarily inaccurate for longer lags, as there are less pairs of points to convolve separated by that lag. Therefore best to throw out the results except for shorter lags, e.g. keep lags from tau=0 up to one quarter of the total time series length.
Parameters: - normalized (boolean) – If True, the time series will first be normalized to a mean of 0 and variance of 1. This gives autocorrelation 1 at zero lag.
- unbiased (boolean) – If True, the result at each lag m will be scaled by 1/(N-m). This gives an unbiased estimation of the autocorrelation of a stationary process from a finite length sample.
Ref: S. J. Orfanidis (1996) “Optimum Signal Processing”, 2nd Ed.
-
nsim.analyses1.misc.crossing_indices(ts, c=0.0, d=0.0)¶ For a single variable timeseries, find the time indices each time the value crosses
cfrom above or below. Can optionally set a non-zerodto impose the condition that the value must wander at leastdunits away fromcbetween crossings.If the timeseries begins (or ends) exactly at
c, then time zero (or the ending time) is also included as a crossing event, so that the boundaries of the first and last excursions are included.Parameters: - ts – Timeseries (single variable)
- c (float) – Critical value at which to report crossings.
- d (float) – Optional min distance from c to be attained between crossings.
Returns: array of indices
-
nsim.analyses1.misc.first_return_times(ts, c=None, d=0.0)¶ For a single variable time series, first wait until the time series attains the value c for the first time. Then record the time intervals between successive returns to c. If c is not given, the default is the mean of the time series.
Parameters: - ts – Timeseries (single variable)
- c (float) – Optional target value (default is the mean of the time series)
- d (float) – Optional min distance from c to be attained between returns
Returns: array of time intervals (Can take the mean of these to estimate the expected first return time)
nsim.analyses1.phase module¶
Various analyses that apply to the phase time-series of an oscillator
functions:
-
nsim.analyses1.phase.circmean(ts, axis=2)¶ Circular mean phase
-
nsim.analyses1.phase.circstd(ts, axis=2)¶ Circular standard deviation
-
nsim.analyses1.phase.mod2pi(ts)¶ For a timeseries where all variables represent phases (in radians), return an equivalent timeseries where all values are in the range (-pi, pi]
-
nsim.analyses1.phase.periods(ts, phi=0.0)¶ For a single variable timeseries representing the phase of an oscillator, measure the period of each successive oscillation.
An individual oscillation is defined to start and end when the phase passes phi (by default zero) after completing a full cycle.
If the timeseries begins (or ends) exactly at phi, then the first (or last) oscillation will be included.
Parameters: - ts – Timeseries (single variable) The timeseries of an angle variable (radians)
- phi (float) – A single oscillation starts and ends at phase phi (by default zero).
-
nsim.analyses1.phase.phase_crossings(ts, phi=0.0)¶ For a single variable timeseries representing the phase of an oscillator, find the time indices each time the phase crosses angle phi, with the condition that the phase must visit phi+pi between crossings.
(Thus if noise causes the phase to wander back and forth across angle phi without the oscillator doing a full revolution, then this is recorded as a single crossing event, giving the index of the earliest arrival.)
If the timeseries begins (or ends) exactly at phi, then time zero (or the ending time) is also included as a crossing event, so that the boundaries of the first and last oscillations are included.
Parameters: - ts – Timeseries (single variable) The timeseries of an angle variable (radians)
- phi (float) – Critical phase angle (radians) at which to report crossings.
Returns: array of indices
nsim.analyses1.plots module¶
Various plotting routines for time series
- functions:
- plot
-
nsim.analyses1.plots.plot(ts, title=None, show=True)¶ Plot a Timeseries :param ts Timeseries: :param title str: :param show bool whether to display the figure or just return a figure object:
nsim.analyses1.pyeeg module¶
- Copyleft 2010-2015 Forrest Sheng Bao http://fsbao.net
- Copyleft 2010 Xin Liu Copyleft 2014-2015 Borzou Alipour Fard
PyEEG, a Python module to extract EEG feature.
Project homepage: http://pyeeg.org
Data structure
PyEEG only uses standard Python and numpy data structures, so you need to import numpy before using it. For numpy, please visit http://numpy.scipy.org
Naming convention
I follow “Style Guide for Python Code” to code my program http://www.python.org/dev/peps/pep-0008/
Constants: UPPER_CASE_WITH_UNDERSCORES, e.g., SAMPLING_RATE, LENGTH_SIGNAL.
Function names: lower_case_with_underscores, e.g., spectrum_entropy.
Variables (global and local): CapitalizedWords or CapWords, e.g., Power.
If a variable name consists of one letter, I may use lower case, e.g., x, y.
Functions listed alphabetically¶
-
nsim.analyses1.pyeeg.ap_entropy(X, M, R)¶ Computer approximate entropy (ApEN) of series X, specified by M and R.
Suppose given time series is X = [x(1), x(2), ... , x(N)]. We first build embedding matrix Em, of dimension (N-M+1)-by-M, such that the i-th row of Em is x(i),x(i+1), ... , x(i+M-1). Hence, the embedding lag and dimension are 1 and M-1 respectively. Such a matrix can be built by calling pyeeg function as Em = embed_seq(X, 1, M). Then we build matrix Emp, whose only difference with Em is that the length of each embedding sequence is M + 1
Denote the i-th and j-th row of Em as Em[i] and Em[j]. Their k-th elements are Em[i][k] and Em[j][k] respectively. The distance between Em[i] and Em[j] is defined as 1) the maximum difference of their corresponding scalar components, thus, max(Em[i]-Em[j]), or 2) Euclidean distance. We say two 1-D vectors Em[i] and Em[j] match in tolerance R, if the distance between them is no greater than R, thus, max(Em[i]-Em[j]) <= R. Mostly, the value of R is defined as 20% - 30% of standard deviation of X.
Pick Em[i] as a template, for all j such that 0 < j < N - M + 1, we can check whether Em[j] matches with Em[i]. Denote the number of Em[j], which is in the range of Em[i], as k[i], which is the i-th element of the vector k. The probability that a random row in Em matches Em[i] is simga_1^{N-M+1} k[i] / (N - M + 1), thus sum(k)/ (N - M + 1), denoted as Cm[i].
We repeat the same process on Emp and obtained Cmp[i], but here 0<i<N-M since the length of each sequence in Emp is M + 1.
The probability that any two embedding sequences in Em match is then sum(Cm)/ (N - M +1 ). We define Phi_m = sum(log(Cm)) / (N - M + 1) and Phi_mp = sum(log(Cmp)) / (N - M ).
And the ApEn is defined as Phi_m - Phi_mp.
Notes
- Please be aware that self-match is also counted in ApEn.
- This function now runs very slow. We are still trying to speed it up.
References
Costa M, Goldberger AL, Peng CK, Multiscale entropy analysis of biological signals, Physical Review E, 71:021906, 2005
See also
samp_entropy()- sample entropy of a time series
Notes
Extremely slow implementation. Do NOT use if your dataset is not small.
-
nsim.analyses1.pyeeg.bin_power(X, Band, Fs)¶ Compute power in each frequency bin specified by Band from FFT result of X. By default, X is a real signal.
A real signal can be synthesized, thus not real.
- Band
list
boundary frequencies (in Hz) of bins. They can be unequal bins, e.g. [0.5,4,7,12,30] which are delta, theta, alpha and beta respectively. You can also use range() function of Python to generate equal bins and pass the generated list to this function.
Each element of Band is a physical frequency and shall not exceed the Nyquist frequency, i.e., half of sampling frequency.
- X
list
a 1-D real time series.
- Fs
integer
the sampling rate in physical frequency
Returns: - Power –
list
spectral power in each frequency bin.
- Power_ratio –
list
spectral power in each frequency bin normalized by total power in ALL frequency bins.
-
nsim.analyses1.pyeeg.dfa(X, Ave=None, L=None)¶ Compute Detrended Fluctuation Analysis from a time series X and length of boxes L.
The first step to compute DFA is to integrate the signal. Let original series be X= [x(1), x(2), ..., x(N)].
The integrated signal Y = [y(1), y(2), ..., y(N)] is obtained as follows y(k) = sum_{i=1}^{k}{x(i)-Ave} where Ave is the mean of X.
The second step is to partition/slice/segment the integrated sequence Y into boxes. At least two boxes are needed for computing DFA. Box sizes are specified by the L argument of this function. By default, it is from 1/5 of signal length to one (x-5)-th of the signal length, where x is the nearest power of 2 from the length of the signal, i.e., 1/16, 1/32, 1/64, 1/128, ...
In each box, a linear least square fitting is employed on data in the box. Denote the series on fitted line as Yn. Its k-th elements, yn(k), corresponds to y(k).
For fitting in each box, there is a residue, the sum of squares of all offsets, difference between actual points and points on fitted line.
F(n) denotes the square root of average total residue in all boxes when box length is n, thus Total_Residue = sum_{k=1}^{N}{(y(k)-yn(k))} F(n) = sqrt(Total_Residue/N)
The computing to F(n) is carried out for every box length n. Therefore, a relationship between n and F(n) can be obtained. In general, F(n) increases when n increases.
Finally, the relationship between F(n) and n is analyzed. A least square fitting is performed between log(F(n)) and log(n). The slope of the fitting line is the DFA value, denoted as Alpha. To white noise, Alpha should be 0.5. Higher level of signal complexity is related to higher Alpha.
Parameters: - X – 1-D Python list or numpy array a time series
- Ave – integer, optional The average value of the time series
- L – 1-D Python list of integers A list of box size, integers in ascending order
Returns: integer the result of DFA analysis, thus the slope of fitting line of log(F(n)) vs. log(n). where n is the
Return type: Alpha
Examples
>>> import pyeeg >>> from numpy.random import randn >>> print pyeeg.dfa(randn(4096)) 0.490035110345
Peng C-K, Havlin S, Stanley HE, Goldberger AL. Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. _Chaos_ 1995;5:82-87
Notes
This value depends on the box sizes very much. When the input is a white noise, this value should be 0.5. But, some choices on box sizes can lead to the value lower or higher than 0.5, e.g. 0.38 or 0.58.
Based on many test, I set the box sizes from 1/5 of signal length to one (x-5)-th of the signal length, where x is the nearest power of 2 from the length of the signal, i.e., 1/16, 1/32, 1/64, 1/128, ...
You may generate a list of box sizes and pass in such a list as a parameter.
-
nsim.analyses1.pyeeg.embed_seq(X, Tau, D)¶ Build a set of embedding sequences from given time series X with lag Tau and embedding dimension DE. Let X = [x(1), x(2), ... , x(N)], then for each i such that 1 < i < N - (D - 1) * Tau, we build an embedding sequence, Y(i) = [x(i), x(i + Tau), ... , x(i + (D - 1) * Tau)]. All embedding sequence are placed in a matrix Y.
Parameters: - X –
list
a time series
- Tau –
integer
the lag or delay when building embedding sequence
- D –
integer
the embedding dimension
Returns: Y – 2-D list
embedding matrix built
Examples
—————
>>> import pyeeg
>>> a=range(0,9)
>>> pyeeg.embed_seq(a,1,4)
array([[ 0., 1., 2., 3.], – [ 1., 2., 3., 4.], [ 2., 3., 4., 5.], [ 3., 4., 5., 6.], [ 4., 5., 6., 7.], [ 5., 6., 7., 8.]])
>>> pyeeg.embed_seq(a,2,3)
array([[ 0., 2., 4.], – [ 1., 3., 5.], [ 2., 4., 6.], [ 3., 5., 7.], [ 4., 6., 8.]])
>>> pyeeg.embed_seq(a,4,1)
array([[ 0.], – [ 1.], [ 2.], [ 3.], [ 4.], [ 5.], [ 6.], [ 7.], [ 8.]])
- X –
-
nsim.analyses1.pyeeg.fisher_info(X, Tau, DE, W=None)¶ Compute SVD Entropy from either two cases below: 1. a time series X, with lag tau and embedding dimension dE (default) 2. a list, W, of normalized singular values of a matrix (if W is provided, recommend to speed up.)
If W is None, the function will do as follows to prepare singular spectrum:
First, computer an embedding matrix from X, Tau and DE using pyeeg function embed_seq():
M = embed_seq(X, Tau, DE)Second, use scipy.linalg function svd to decompose the embedding matrix M and obtain a list of singular values:
W = svd(M, compute_uv=0)- At last, normalize W:
- W /= sum(W)
To speed up, it is recommended to compute W before calling this function because W may also be used by other functions whereas computing it here again will slow down.
-
nsim.analyses1.pyeeg.hfd(X, Kmax)¶ Compute Hjorth Fractal Dimension of a time series X, kmax is an HFD parameter
-
nsim.analyses1.pyeeg.hjorth(X, D=None)¶ Compute Hjorth mobility and complexity of a time series from either two cases below:
- X, the time series of type list (default)
- D, a first order differential sequence of X (if D is provided, recommended to speed up)
In case 1, D is computed using Numpy’s Difference function.
Notes
To speed up, it is recommended to compute D before calling this function because D may also be used by other functions whereas computing it here again will slow down.
Parameters: - X –
list
a time series
- D –
list
first order differential sequence of a time series
Returns: - As indicated in return line
- Hjorth mobility and complexity
-
nsim.analyses1.pyeeg.hurst(X)¶ Compute the Hurst exponent of X. If the output H=0.5,the behavior of the time-series is similar to random walk. If H<0.5, the time-series cover less “distance” than a random walk, vice verse.
Parameters: X – list
a time series
Returns: - H –
float
Hurst exponent
- Notes
- ——–
- Author of this function is Xin Liu
Examples
>>> import pyeeg >>> from numpy.random import randn >>> a = randn(4096) >>> pyeeg.hurst(a) 0.5057444
- H –
-
nsim.analyses1.pyeeg.in_range(Template, Scroll, Distance)¶ Determines whether one vector is the the range of another vector.
The two vectors should have equal length.
- Template
- list The template vector, one of two vectors being compared
- Scroll
- list The scroll vector, one of the two vectors being compared
- Distance
- float Two vectors match if their distance is less than D
The distance between two vectors can be defined as Euclidean distance according to some publications.
The two vector should of equal length
-
nsim.analyses1.pyeeg.information_based_similarity(x, y, n)¶ Calculates the information based similarity of two time series x and y.
Parameters: - x –
list
a time series
- y –
list
a time series
- n –
integer
word order
IBS
float
Information based similarity
Information based similarity is a measure of dissimilarity between two time series. Let the sequences be x and y. Each sequence is first replaced by its first ordered difference(Encoder). Calculating the Heaviside of the resulting sequences, we get two binary sequences, SymbolicSeq. Using PyEEG function, embed_seq, with lag of 1 and dimension of n, we build an embedding matrix from the latter sequence.
Each row of this embedding matrix is called a word. Information based similarity measures the distance between two sequence by comparing the rank of words in the sequences; more explicitly, the distance, D, is calculated using the formula:
“1/2^(n-1) * sum( abs(Rank(0)(k)-R(1)(k)) * F(k) )” where Rank(0)(k) and Rank(1)(k) are the rank of the k-th word in each of the input sequences. F(k) is a modified “shannon” weighing function that increases the weight of each word in the calculations when they are more frequent in the sequences.
It is advisable to calculate IBS for numerical sequences using 8-tupple words.
References
Yang AC, Hseu SS, Yien HW, Goldberger AL, Peng CK: Linguistic analysis of the human heartbeat using frequency and rank order statistics. Phys Rev Lett 2003, 90: 108103
>>> import pyeeg >>> from numpy.random import randn >>> x = randn(100) >>> y = randn(100) >>> pyeeg.information_based_similarity(x,y,8) 0.64512947848249214
- x –
-
nsim.analyses1.pyeeg.permutation_entropy(x, n, tau)¶ - Compute Permutation Entropy of a given time series x, specified by
permutation order n and embedding lag tau.
- x
list
a time series
- n
integer
Permutation order
- tau
integer
Embedding lag
- PE
float
permutation entropy
Suppose the given time series is X =[x(1),x(2),x(3),
- x(N)].
We first build embedding matrix Em, of dimension(n*N-n+1), such that the ith row of Em is x(i),x(i+1),..x(i+n-1). Hence the embedding lag and the embedding dimension are 1 and n respectively. We build this matrix from a given time series, X, by calling pyEEg function embed_seq(x,1,n).
We then transform each row of the embedding matrix into a new sequence, comprising a set of integers in range of 0,..,n-1. The order in which the integers are placed within a row is the same as those of the original elements:0 is placed where the smallest element of the row was and n-1 replaces the largest element of the row.
To calculate the Permutation entropy, we calculate the entropy of PeSeq. In doing so, we count the number of occurrences of each permutation in PeSeq and write it in a sequence, RankMat. We then use this sequence to calculate entropy by using Shanons entropy formula.
Permutation entropy is usually calculated with n in range of 3 and 7.
Bandt, Christoph, and Bernd Pompe. “Permutation entropy: a natural complexity measure for time series.” Physical Review Letters 88.17 (2002): 174102.
>>> import pyeeg >>> x = [1,2,4,5,12,3,4,5] >>> pyeeg.permutation_entropy(x,5,1) 2.0
-
nsim.analyses1.pyeeg.pfd(X, D=None)¶ Compute Petrosian Fractal Dimension of a time series from either two cases below:
- X, the time series of type list (default)
- D, the first order differential sequence of X (if D is provided, recommended to speed up)
In case 1, D is computed using Numpy’s difference function.
To speed up, it is recommended to compute D before calling this function because D may also be used by other functions whereas computing it here again will slow down.
-
nsim.analyses1.pyeeg.samp_entropy(X, M, R)¶ Computer sample entropy (SampEn) of series X, specified by M and R.
SampEn is very close to ApEn.
Suppose given time series is X = [x(1), x(2), ... , x(N)]. We first build embedding matrix Em, of dimension (N-M+1)-by-M, such that the i-th row of Em is x(i),x(i+1), ... , x(i+M-1). Hence, the embedding lag and dimension are 1 and M-1 respectively. Such a matrix can be built by calling pyeeg function as Em = embed_seq(X, 1, M). Then we build matrix Emp, whose only difference with Em is that the length of each embedding sequence is M + 1
Denote the i-th and j-th row of Em as Em[i] and Em[j]. Their k-th elements are Em[i][k] and Em[j][k] respectively. The distance between Em[i] and Em[j] is defined as 1) the maximum difference of their corresponding scalar components, thus, max(Em[i]-Em[j]), or 2) Euclidean distance. We say two 1-D vectors Em[i] and Em[j] match in tolerance R, if the distance between them is no greater than R, thus, max(Em[i]-Em[j]) <= R. Mostly, the value of R is defined as 20% - 30% of standard deviation of X.
Pick Em[i] as a template, for all j such that 0 < j < N - M , we can check whether Em[j] matches with Em[i]. Denote the number of Em[j], which is in the range of Em[i], as k[i], which is the i-th element of the vector k.
We repeat the same process on Emp and obtained Cmp[i], 0 < i < N - M.
The SampEn is defined as log(sum(Cm)/sum(Cmp))
References
Costa M, Goldberger AL, Peng C-K, Multiscale entropy analysis of biological signals, Physical Review E, 71:021906, 2005
See also
ap_entropy()- approximate entropy of a time series
Notes
Extremely slow computation. Do NOT use if your dataset is not small and you are not patient enough.
-
nsim.analyses1.pyeeg.spectral_entropy(X, Band, Fs, Power_Ratio=None)¶ Compute spectral entropy of a time series from either two cases below: 1. X, the time series (default) 2. Power_Ratio, a list of normalized signal power in a set of frequency bins defined in Band (if Power_Ratio is provided, recommended to speed up)
In case 1, Power_Ratio is computed by bin_power() function.
Notes
To speed up, it is recommended to compute Power_Ratio before calling this function because it may also be used by other functions whereas computing it here again will slow down.
Parameters: - Band –
list
boundary frequencies (in Hz) of bins. They can be unequal bins, e.g. [0.5,4,7,12,30] which are delta, theta, alpha and beta respectively. You can also use range() function of Python to generate equal bins and pass the generated list to this function.Each element of Band is a physical frequency and shall not exceed the Nyquist frequency, i.e., half of sampling frequency.
- X
- list
a 1-D real time series.
- Fs –
integer
the sampling rate in physical frequency
Returns: Return type: As indicated in return line
See also
bin_power()- pyeeg function that computes spectral power in frequency bins
- Band –
-
nsim.analyses1.pyeeg.svd_entropy(X, Tau, DE, W=None)¶ Compute SVD Entropy from either two cases below: 1. a time series X, with lag tau and embedding dimension dE (default) 2. a list, W, of normalized singular values of a matrix (if W is provided, recommend to speed up.)
If W is None, the function will do as follows to prepare singular spectrum:
First, computer an embedding matrix from X, Tau and DE using pyeeg function embed_seq():
M = embed_seq(X, Tau, DE)Second, use scipy.linalg function svd to decompose the embedding matrix M and obtain a list of singular values:
W = svd(M, compute_uv=0)- At last, normalize W:
- W /= sum(W)
To speed up, it is recommended to compute W before calling this function because W may also be used by other functions whereas computing it here again will slow down.
Module contents¶
-
nsim.analyses1.psd(ts, plot=True)¶ plot Welch estimate of power spectral density
-
nsim.analyses1.lowpass(ts, cutoff_hz, order=3)¶ forward-backward butterworth low-pass filter
-
nsim.analyses1.highpass(ts, cutoff_hz, order=3)¶ forward-backward butterworth high-pass filter
-
nsim.analyses1.bandpass(ts, low_hz, high_hz, order=3)¶ forward-backward butterworth band-pass filter
-
nsim.analyses1.notch(ts, freq_hz, bandwidth_hz=1.0)¶ notch filter to remove remove a particular frequency Adapted from code by Sturla Molden
-
nsim.analyses1.hilbert(ts)¶ Analytic signal, using the Hilbert transform
-
nsim.analyses1.hilbert_amplitude(ts)¶ Amplitude of the analytic signal, using the Hilbert transform
-
nsim.analyses1.hilbert_phase(ts)¶ Phase of the analytic signal, using the Hilbert transform
-
nsim.analyses1.cwt(ts, freqs=array([ 1., 1.09854114, 1.20679264, 1.32571137, 1.45634848, 1.59985872, 1.75751062, 1.93069773, 2.12095089, 2.32995181, 2.55954792, 2.8117687, 3.0888436, 3.39322177, 3.72759372, 4.09491506, 4.49843267, 4.94171336, 5.42867544, 5.96362332, 6.55128557, 7.19685673, 7.90604321, 8.68511374, 9.54095476, 10.48113134, 11.51395399, 12.64855217, 13.89495494, 15.26417967, 16.76832937, 18.42069969, 20.23589648, 22.22996483, 24.42053095, 26.82695795, 29.47051703, 32.37457543, 35.56480306, 39.06939937, 42.9193426, 47.14866363, 51.79474679, 56.89866029, 62.50551925, 68.6648845, 75.43120063, 82.86427729, 91.0298178, 100. ]), wavelet=<function cwtmorlet>, plot=True)¶ Continuous wavelet transform Note the full results can use a huge amount of memory at 64-bit precision
Parameters: - ts – Timeseries of m variables, shape (n, m). Assumed constant timestep.
- freqs – list of frequencies (in Hz) to use for the tranform. (default is 50 frequency bins logarithmic from 1Hz to 100Hz)
- wavelet – the wavelet to use. may be complex. see scipy.signal.wavelets
- plot – whether to plot time-resolved power spectrum
Returns: Continuous wavelet transform output array, shape (n,len(freqs),m)
Return type: coefs
-
nsim.analyses1.cwt_distributed(ts, freqs=array([ 1., 1.09854114, 1.20679264, 1.32571137, 1.45634848, 1.59985872, 1.75751062, 1.93069773, 2.12095089, 2.32995181, 2.55954792, 2.8117687, 3.0888436, 3.39322177, 3.72759372, 4.09491506, 4.49843267, 4.94171336, 5.42867544, 5.96362332, 6.55128557, 7.19685673, 7.90604321, 8.68511374, 9.54095476, 10.48113134, 11.51395399, 12.64855217, 13.89495494, 15.26417967, 16.76832937, 18.42069969, 20.23589648, 22.22996483, 24.42053095, 26.82695795, 29.47051703, 32.37457543, 35.56480306, 39.06939937, 42.9193426, 47.14866363, 51.79474679, 56.89866029, 62.50551925, 68.6648845, 75.43120063, 82.86427729, 91.0298178, 100. ]), wavelet=<function cwtmorlet>, plot=True)¶ Continuous wavelet transform using distributed computation. (Currently just splits the data by channel. TODO split it further.) Note: this function requires an IPython cluster to be started first.
Parameters: - ts – Timeseries of m variables, shape (n, m). Assumed constant timestep.
- freqs – list of frequencies (in Hz) to use for the tranform. (default is 50 frequency bins logarithmic from 1Hz to 100Hz)
- wavelet – the wavelet to use. may be complex. see scipy.signal.wavelets
- plot – whether to plot time-resolved power spectrum
Returns: Continuous wavelet transform output array, shape (n,len(freqs),m)
Return type: coefs
-
nsim.analyses1.variability_fp(ts, freqs=None, ncycles=6, plot=True)¶ Example variability function. Gives two continuous, time-resolved measures of the variability of a time series, ranging between -1 and 1. The two measures are based on variance of the centroid frequency and variance of the height of the spectral peak, respectively. (Centroid frequency meaning the power-weighted average frequency) These measures are calculated over sliding time windows of variable size. See also: Blenkinsop et al. (2012) The dynamic evolution of focal-onset
epilepsies - combining theoretical and clinical observationsParameters: - Timeseries of m variables, shape (n, m). Assumed constant timestep. (ts) –
- (optional) List of frequencies to examine. If None, defaults to (freqs) – 50 frequency bands ranging 1Hz to 60Hz, logarithmically spaced.
- Window size, in number of cycles of the centroid frequency. (ncycles) –
- bool Whether to display the output (plot) –
Returns: variability Timeseries of shape (n, m, 2) variability[:, :, 0] gives a measure of variability between -1 and 1 based on variance of centroid frequency. variability[:, :, 1] gives a measure of variability between -1 and 1 based on variance of maximum power.
-
nsim.analyses1.epochs(ts, variability=None, threshold=0.0, minlength=1.0, plot=True)¶ Identify “stationary” epochs within a time series, based on a continuous measure of variability. Epochs are defined to contain the points of minimal variability, and to extend as wide as possible with variability not exceeding the threshold.
Parameters: - Timeseries of m variables, shape (n, m). (ts) –
- (optional) Timeseries of shape (n, m, q), giving q scalar (variability) – measures of the variability of timeseries ts near each point in time. (if None, we will use variability_fp()) Epochs require the mean of these to be below the threshold.
- The maximum variability permitted in stationary epochs. (threshold) –
- Shortest acceptable epoch length (in seconds) (minlength) –
- bool Whether to display the output (plot) –
- Returns: (variability, allchannels_epochs)
- variability: as above allchannels_epochs: (list of) list of tuples For each variable, a list of tuples (start, end) that give the starting and ending indices of stationary epochs. (epochs are inclusive of start point but not the end point)
-
nsim.analyses1.epochs_distributed(ts, variability=None, threshold=0.0, minlength=1.0, plot=True)¶ Same as epochs(), but computes channels in parallel for speed.
- (Note: This requires an IPython cluster to be started first,
- e.g. on a workstation type ‘ipcluster start’)
Identify “stationary” epochs within a time series, based on a continuous measure of variability. Epochs are defined to contain the points of minimal variability, and to extend as wide as possible with variability not exceeding the threshold.
Parameters: - Timeseries of m variables, shape (n, m). (ts) –
- (optional) Timeseries of shape (n, m, q), giving q scalar (variability) – measures of the variability of timeseries ts near each point in time. (if None, we will use variability_fp()) Epochs require the mean of these to be below the threshold.
- The maximum variability permitted in stationary epochs. (threshold) –
- Shortest acceptable epoch length (in seconds) (minlength) –
- bool Whether to display the output (plot) –
- Returns: (variability, allchannels_epochs)
- variability: as above allchannels_epochs: (list of) list of tuples For each variable, a list of tuples (start, end) that give the starting and ending indices of stationary epochs. (epochs are inclusive of start point but not the end point)
-
nsim.analyses1.epochs_joint(ts, variability=None, threshold=0.0, minlength=1.0, proportion=0.75, plot=True)¶ Identify epochs within a multivariate time series where at least a certain proportion of channels are “stationary”, based on a previously computed variability measure.
- (Note: This requires an IPython cluster to be started first,
- e.g. on a workstation type ‘ipcluster start’)
Parameters: - Timeseries of m variables, shape (n, m). (ts) –
- (optional) Timeseries of shape (n, m), giving a scalar (variability) – measure of the variability of timeseries ts near each point in time. (if None, we will use variability_fp())
- The maximum variability permitted in stationary epochs. (threshold) –
- Shortest acceptable epoch length (in seconds) (minlength) –
- Require at least this fraction of channels to be "stationary" (proportion) –
- bool Whether to display the output (plot) –
- Returns: (variability, joint_epochs)
- joint_epochs: list of tuples A list of tuples (start, end) that give the starting and ending indices of time epochs that are stationary for at least proportion of channels. (epochs are inclusive of start point but not the end point)
-
nsim.analyses1.plot(ts, title=None, show=True)¶ Plot a Timeseries :param ts Timeseries: :param title str: :param show bool whether to display the figure or just return a figure object:
-
nsim.analyses1.crossing_indices(ts, c=0.0, d=0.0)¶ For a single variable timeseries, find the time indices each time the value crosses
cfrom above or below. Can optionally set a non-zerodto impose the condition that the value must wander at leastdunits away fromcbetween crossings.If the timeseries begins (or ends) exactly at
c, then time zero (or the ending time) is also included as a crossing event, so that the boundaries of the first and last excursions are included.Parameters: - ts – Timeseries (single variable)
- c (float) – Critical value at which to report crossings.
- d (float) – Optional min distance from c to be attained between crossings.
Returns: array of indices
-
nsim.analyses1.first_return_times(ts, c=None, d=0.0)¶ For a single variable time series, first wait until the time series attains the value c for the first time. Then record the time intervals between successive returns to c. If c is not given, the default is the mean of the time series.
Parameters: - ts – Timeseries (single variable)
- c (float) – Optional target value (default is the mean of the time series)
- d (float) – Optional min distance from c to be attained between returns
Returns: array of time intervals (Can take the mean of these to estimate the expected first return time)
-
nsim.analyses1.autocorrelation(ts, normalized=False, unbiased=False)¶ Returns the discrete, linear convolution of a time series with itself, optionally using unbiased normalization.
N.B. Autocorrelation estimates are necessarily inaccurate for longer lags, as there are less pairs of points to convolve separated by that lag. Therefore best to throw out the results except for shorter lags, e.g. keep lags from tau=0 up to one quarter of the total time series length.
Parameters: - normalized (boolean) – If True, the time series will first be normalized to a mean of 0 and variance of 1. This gives autocorrelation 1 at zero lag.
- unbiased (boolean) – If True, the result at each lag m will be scaled by 1/(N-m). This gives an unbiased estimation of the autocorrelation of a stationary process from a finite length sample.
Ref: S. J. Orfanidis (1996) “Optimum Signal Processing”, 2nd Ed.
-
nsim.analyses1.mod2pi(ts)¶ For a timeseries where all variables represent phases (in radians), return an equivalent timeseries where all values are in the range (-pi, pi]
-
nsim.analyses1.phase_crossings(ts, phi=0.0)¶ For a single variable timeseries representing the phase of an oscillator, find the time indices each time the phase crosses angle phi, with the condition that the phase must visit phi+pi between crossings.
(Thus if noise causes the phase to wander back and forth across angle phi without the oscillator doing a full revolution, then this is recorded as a single crossing event, giving the index of the earliest arrival.)
If the timeseries begins (or ends) exactly at phi, then time zero (or the ending time) is also included as a crossing event, so that the boundaries of the first and last oscillations are included.
Parameters: - ts – Timeseries (single variable) The timeseries of an angle variable (radians)
- phi (float) – Critical phase angle (radians) at which to report crossings.
Returns: array of indices
-
nsim.analyses1.periods(ts, phi=0.0)¶ For a single variable timeseries representing the phase of an oscillator, measure the period of each successive oscillation.
An individual oscillation is defined to start and end when the phase passes phi (by default zero) after completing a full cycle.
If the timeseries begins (or ends) exactly at phi, then the first (or last) oscillation will be included.
Parameters: - ts – Timeseries (single variable) The timeseries of an angle variable (radians)
- phi (float) – A single oscillation starts and ends at phase phi (by default zero).
-
nsim.analyses1.circmean(ts, axis=2)¶ Circular mean phase
-
nsim.analyses1.circstd(ts, axis=2)¶ Circular standard deviation
-
nsim.analyses1.variability_fp(ts, freqs=None, ncycles=6, plot=True) Example variability function. Gives two continuous, time-resolved measures of the variability of a time series, ranging between -1 and 1. The two measures are based on variance of the centroid frequency and variance of the height of the spectral peak, respectively. (Centroid frequency meaning the power-weighted average frequency) These measures are calculated over sliding time windows of variable size. See also: Blenkinsop et al. (2012) The dynamic evolution of focal-onset
epilepsies - combining theoretical and clinical observationsParameters: - Timeseries of m variables, shape (n, m). Assumed constant timestep. (ts) –
- (optional) List of frequencies to examine. If None, defaults to (freqs) – 50 frequency bands ranging 1Hz to 60Hz, logarithmically spaced.
- Window size, in number of cycles of the centroid frequency. (ncycles) –
- bool Whether to display the output (plot) –
Returns: variability Timeseries of shape (n, m, 2) variability[:, :, 0] gives a measure of variability between -1 and 1 based on variance of centroid frequency. variability[:, :, 1] gives a measure of variability between -1 and 1 based on variance of maximum power.
-
nsim.analyses1.epochs(ts, variability=None, threshold=0.0, minlength=1.0, plot=True) Identify “stationary” epochs within a time series, based on a continuous measure of variability. Epochs are defined to contain the points of minimal variability, and to extend as wide as possible with variability not exceeding the threshold.
Parameters: - Timeseries of m variables, shape (n, m). (ts) –
- (optional) Timeseries of shape (n, m, q), giving q scalar (variability) – measures of the variability of timeseries ts near each point in time. (if None, we will use variability_fp()) Epochs require the mean of these to be below the threshold.
- The maximum variability permitted in stationary epochs. (threshold) –
- Shortest acceptable epoch length (in seconds) (minlength) –
- bool Whether to display the output (plot) –
- Returns: (variability, allchannels_epochs)
- variability: as above allchannels_epochs: (list of) list of tuples For each variable, a list of tuples (start, end) that give the starting and ending indices of stationary epochs. (epochs are inclusive of start point but not the end point)
-
nsim.analyses1.epochs_distributed(ts, variability=None, threshold=0.0, minlength=1.0, plot=True) Same as epochs(), but computes channels in parallel for speed.
- (Note: This requires an IPython cluster to be started first,
- e.g. on a workstation type ‘ipcluster start’)
Identify “stationary” epochs within a time series, based on a continuous measure of variability. Epochs are defined to contain the points of minimal variability, and to extend as wide as possible with variability not exceeding the threshold.
Parameters: - Timeseries of m variables, shape (n, m). (ts) –
- (optional) Timeseries of shape (n, m, q), giving q scalar (variability) – measures of the variability of timeseries ts near each point in time. (if None, we will use variability_fp()) Epochs require the mean of these to be below the threshold.
- The maximum variability permitted in stationary epochs. (threshold) –
- Shortest acceptable epoch length (in seconds) (minlength) –
- bool Whether to display the output (plot) –
- Returns: (variability, allchannels_epochs)
- variability: as above allchannels_epochs: (list of) list of tuples For each variable, a list of tuples (start, end) that give the starting and ending indices of stationary epochs. (epochs are inclusive of start point but not the end point)
-
nsim.analyses1.epochs_joint(ts, variability=None, threshold=0.0, minlength=1.0, proportion=0.75, plot=True) Identify epochs within a multivariate time series where at least a certain proportion of channels are “stationary”, based on a previously computed variability measure.
- (Note: This requires an IPython cluster to be started first,
- e.g. on a workstation type ‘ipcluster start’)
Parameters: - Timeseries of m variables, shape (n, m). (ts) –
- (optional) Timeseries of shape (n, m), giving a scalar (variability) – measure of the variability of timeseries ts near each point in time. (if None, we will use variability_fp())
- The maximum variability permitted in stationary epochs. (threshold) –
- Shortest acceptable epoch length (in seconds) (minlength) –
- Require at least this fraction of channels to be "stationary" (proportion) –
- bool Whether to display the output (plot) –
- Returns: (variability, joint_epochs)
- joint_epochs: list of tuples A list of tuples (start, end) that give the starting and ending indices of time epochs that are stationary for at least proportion of channels. (epochs are inclusive of start point but not the end point)
-
nsim.analyses1.hurst(X)¶ Compute the Hurst exponent of X. If the output H=0.5,the behavior of the time-series is similar to random walk. If H<0.5, the time-series cover less “distance” than a random walk, vice verse.
Parameters: X – list
a time series
Returns: - H –
float
Hurst exponent
- Notes
- ——–
- Author of this function is Xin Liu
Examples
>>> import pyeeg >>> from numpy.random import randn >>> a = randn(4096) >>> pyeeg.hurst(a) 0.5057444
- H –
-
nsim.analyses1.bin_power(X, Band, Fs)¶ Compute power in each frequency bin specified by Band from FFT result of X. By default, X is a real signal.
A real signal can be synthesized, thus not real.
- Band
list
boundary frequencies (in Hz) of bins. They can be unequal bins, e.g. [0.5,4,7,12,30] which are delta, theta, alpha and beta respectively. You can also use range() function of Python to generate equal bins and pass the generated list to this function.
Each element of Band is a physical frequency and shall not exceed the Nyquist frequency, i.e., half of sampling frequency.
- X
list
a 1-D real time series.
- Fs
integer
the sampling rate in physical frequency
Returns: - Power –
list
spectral power in each frequency bin.
- Power_ratio –
list
spectral power in each frequency bin normalized by total power in ALL frequency bins.
-
nsim.analyses1.pfd(X, D=None)¶ Compute Petrosian Fractal Dimension of a time series from either two cases below:
- X, the time series of type list (default)
- D, the first order differential sequence of X (if D is provided, recommended to speed up)
In case 1, D is computed using Numpy’s difference function.
To speed up, it is recommended to compute D before calling this function because D may also be used by other functions whereas computing it here again will slow down.
-
nsim.analyses1.hfd(X, Kmax)¶ Compute Hjorth Fractal Dimension of a time series X, kmax is an HFD parameter
-
nsim.analyses1.hjorth(X, D=None)¶ Compute Hjorth mobility and complexity of a time series from either two cases below:
- X, the time series of type list (default)
- D, a first order differential sequence of X (if D is provided, recommended to speed up)
In case 1, D is computed using Numpy’s Difference function.
Notes
To speed up, it is recommended to compute D before calling this function because D may also be used by other functions whereas computing it here again will slow down.
Parameters: - X –
list
a time series
- D –
list
first order differential sequence of a time series
Returns: - As indicated in return line
- Hjorth mobility and complexity
-
nsim.analyses1.spectral_entropy(X, Band, Fs, Power_Ratio=None)¶ Compute spectral entropy of a time series from either two cases below: 1. X, the time series (default) 2. Power_Ratio, a list of normalized signal power in a set of frequency bins defined in Band (if Power_Ratio is provided, recommended to speed up)
In case 1, Power_Ratio is computed by bin_power() function.
Notes
To speed up, it is recommended to compute Power_Ratio before calling this function because it may also be used by other functions whereas computing it here again will slow down.
Parameters: - Band –
list
boundary frequencies (in Hz) of bins. They can be unequal bins, e.g. [0.5,4,7,12,30] which are delta, theta, alpha and beta respectively. You can also use range() function of Python to generate equal bins and pass the generated list to this function.Each element of Band is a physical frequency and shall not exceed the Nyquist frequency, i.e., half of sampling frequency.
- X
- list
a 1-D real time series.
- Fs –
integer
the sampling rate in physical frequency
Returns: Return type: As indicated in return line
See also
bin_power()- pyeeg function that computes spectral power in frequency bins
- Band –
-
nsim.analyses1.svd_entropy(X, Tau, DE, W=None)¶ Compute SVD Entropy from either two cases below: 1. a time series X, with lag tau and embedding dimension dE (default) 2. a list, W, of normalized singular values of a matrix (if W is provided, recommend to speed up.)
If W is None, the function will do as follows to prepare singular spectrum:
First, computer an embedding matrix from X, Tau and DE using pyeeg function embed_seq():
M = embed_seq(X, Tau, DE)Second, use scipy.linalg function svd to decompose the embedding matrix M and obtain a list of singular values:
W = svd(M, compute_uv=0)- At last, normalize W:
- W /= sum(W)
To speed up, it is recommended to compute W before calling this function because W may also be used by other functions whereas computing it here again will slow down.
-
nsim.analyses1.fisher_info(X, Tau, DE, W=None)¶ Compute SVD Entropy from either two cases below: 1. a time series X, with lag tau and embedding dimension dE (default) 2. a list, W, of normalized singular values of a matrix (if W is provided, recommend to speed up.)
If W is None, the function will do as follows to prepare singular spectrum:
First, computer an embedding matrix from X, Tau and DE using pyeeg function embed_seq():
M = embed_seq(X, Tau, DE)Second, use scipy.linalg function svd to decompose the embedding matrix M and obtain a list of singular values:
W = svd(M, compute_uv=0)- At last, normalize W:
- W /= sum(W)
To speed up, it is recommended to compute W before calling this function because W may also be used by other functions whereas computing it here again will slow down.
-
nsim.analyses1.ap_entropy(X, M, R)¶ Computer approximate entropy (ApEN) of series X, specified by M and R.
Suppose given time series is X = [x(1), x(2), ... , x(N)]. We first build embedding matrix Em, of dimension (N-M+1)-by-M, such that the i-th row of Em is x(i),x(i+1), ... , x(i+M-1). Hence, the embedding lag and dimension are 1 and M-1 respectively. Such a matrix can be built by calling pyeeg function as Em = embed_seq(X, 1, M). Then we build matrix Emp, whose only difference with Em is that the length of each embedding sequence is M + 1
Denote the i-th and j-th row of Em as Em[i] and Em[j]. Their k-th elements are Em[i][k] and Em[j][k] respectively. The distance between Em[i] and Em[j] is defined as 1) the maximum difference of their corresponding scalar components, thus, max(Em[i]-Em[j]), or 2) Euclidean distance. We say two 1-D vectors Em[i] and Em[j] match in tolerance R, if the distance between them is no greater than R, thus, max(Em[i]-Em[j]) <= R. Mostly, the value of R is defined as 20% - 30% of standard deviation of X.
Pick Em[i] as a template, for all j such that 0 < j < N - M + 1, we can check whether Em[j] matches with Em[i]. Denote the number of Em[j], which is in the range of Em[i], as k[i], which is the i-th element of the vector k. The probability that a random row in Em matches Em[i] is simga_1^{N-M+1} k[i] / (N - M + 1), thus sum(k)/ (N - M + 1), denoted as Cm[i].
We repeat the same process on Emp and obtained Cmp[i], but here 0<i<N-M since the length of each sequence in Emp is M + 1.
The probability that any two embedding sequences in Em match is then sum(Cm)/ (N - M +1 ). We define Phi_m = sum(log(Cm)) / (N - M + 1) and Phi_mp = sum(log(Cmp)) / (N - M ).
And the ApEn is defined as Phi_m - Phi_mp.
Notes
- Please be aware that self-match is also counted in ApEn.
- This function now runs very slow. We are still trying to speed it up.
References
Costa M, Goldberger AL, Peng CK, Multiscale entropy analysis of biological signals, Physical Review E, 71:021906, 2005
See also
samp_entropy()- sample entropy of a time series
Notes
Extremely slow implementation. Do NOT use if your dataset is not small.
-
nsim.analyses1.samp_entropy(X, M, R)¶ Computer sample entropy (SampEn) of series X, specified by M and R.
SampEn is very close to ApEn.
Suppose given time series is X = [x(1), x(2), ... , x(N)]. We first build embedding matrix Em, of dimension (N-M+1)-by-M, such that the i-th row of Em is x(i),x(i+1), ... , x(i+M-1). Hence, the embedding lag and dimension are 1 and M-1 respectively. Such a matrix can be built by calling pyeeg function as Em = embed_seq(X, 1, M). Then we build matrix Emp, whose only difference with Em is that the length of each embedding sequence is M + 1
Denote the i-th and j-th row of Em as Em[i] and Em[j]. Their k-th elements are Em[i][k] and Em[j][k] respectively. The distance between Em[i] and Em[j] is defined as 1) the maximum difference of their corresponding scalar components, thus, max(Em[i]-Em[j]), or 2) Euclidean distance. We say two 1-D vectors Em[i] and Em[j] match in tolerance R, if the distance between them is no greater than R, thus, max(Em[i]-Em[j]) <= R. Mostly, the value of R is defined as 20% - 30% of standard deviation of X.
Pick Em[i] as a template, for all j such that 0 < j < N - M , we can check whether Em[j] matches with Em[i]. Denote the number of Em[j], which is in the range of Em[i], as k[i], which is the i-th element of the vector k.
We repeat the same process on Emp and obtained Cmp[i], 0 < i < N - M.
The SampEn is defined as log(sum(Cm)/sum(Cmp))
References
Costa M, Goldberger AL, Peng C-K, Multiscale entropy analysis of biological signals, Physical Review E, 71:021906, 2005
See also
ap_entropy()- approximate entropy of a time series
Notes
Extremely slow computation. Do NOT use if your dataset is not small and you are not patient enough.
-
nsim.analyses1.dfa(X, Ave=None, L=None)¶ Compute Detrended Fluctuation Analysis from a time series X and length of boxes L.
The first step to compute DFA is to integrate the signal. Let original series be X= [x(1), x(2), ..., x(N)].
The integrated signal Y = [y(1), y(2), ..., y(N)] is obtained as follows y(k) = sum_{i=1}^{k}{x(i)-Ave} where Ave is the mean of X.
The second step is to partition/slice/segment the integrated sequence Y into boxes. At least two boxes are needed for computing DFA. Box sizes are specified by the L argument of this function. By default, it is from 1/5 of signal length to one (x-5)-th of the signal length, where x is the nearest power of 2 from the length of the signal, i.e., 1/16, 1/32, 1/64, 1/128, ...
In each box, a linear least square fitting is employed on data in the box. Denote the series on fitted line as Yn. Its k-th elements, yn(k), corresponds to y(k).
For fitting in each box, there is a residue, the sum of squares of all offsets, difference between actual points and points on fitted line.
F(n) denotes the square root of average total residue in all boxes when box length is n, thus Total_Residue = sum_{k=1}^{N}{(y(k)-yn(k))} F(n) = sqrt(Total_Residue/N)
The computing to F(n) is carried out for every box length n. Therefore, a relationship between n and F(n) can be obtained. In general, F(n) increases when n increases.
Finally, the relationship between F(n) and n is analyzed. A least square fitting is performed between log(F(n)) and log(n). The slope of the fitting line is the DFA value, denoted as Alpha. To white noise, Alpha should be 0.5. Higher level of signal complexity is related to higher Alpha.
Parameters: - X – 1-D Python list or numpy array a time series
- Ave – integer, optional The average value of the time series
- L – 1-D Python list of integers A list of box size, integers in ascending order
Returns: integer the result of DFA analysis, thus the slope of fitting line of log(F(n)) vs. log(n). where n is the
Return type: Alpha
Examples
>>> import pyeeg >>> from numpy.random import randn >>> print pyeeg.dfa(randn(4096)) 0.490035110345
Peng C-K, Havlin S, Stanley HE, Goldberger AL. Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. _Chaos_ 1995;5:82-87
Notes
This value depends on the box sizes very much. When the input is a white noise, this value should be 0.5. But, some choices on box sizes can lead to the value lower or higher than 0.5, e.g. 0.38 or 0.58.
Based on many test, I set the box sizes from 1/5 of signal length to one (x-5)-th of the signal length, where x is the nearest power of 2 from the length of the signal, i.e., 1/16, 1/32, 1/64, 1/128, ...
You may generate a list of box sizes and pass in such a list as a parameter.