SMap

Description :
SMap projection of the input data file or DataFrame.

Python :

SMap(dataFrame=None, columns='', target='',
lib='', pred='', E=0, Tp=1, knn=0, tau=-1,
theta=0, exclusionRadius=0, solver=None, embedded=False,
validLib=[], noTime=False, ignoreNan=True,
showPlot=False, verbose=False, returnObject=False)

R :

SMap(pathIn="./", dataFile="", dataFrame=NULL,
lib="", pred="", E=0, Tp=1, knn=0, tau=-1,
theta=0, exclusionRadius=0, columns="", target="", 
embedded=FALSE, verbose=FALSE, validLib=vector(), 
ignoreNan=TRUE, generateSteps=0, parameterList=FALSE, 
showPlot=FALSE, noTime=FALSE) 

Parameter Type Default Purpose
dataFrame pandas DataFrame (pyEDM), data.frame (rEDM) None Input DataFrame
columns string or [] "" Column names for library
target string "" Prediction target column name or index
lib string or [] "" Pairs of library start stop row indices
pred string or [] "" Pairs of prediction start stop row indices
E int 0 Data dimension
Tp int 1 Prediction Interval
knn int 0 Number of nearest neighbors
tau int -1 Embedding time shift (time series rows)
theta float 0 SMap localization
exclusionRadius int 0 Prediction vector exclusion radius
solver sklearn.linear_model None Linear system solver
embedded bool False Is data an embedding? If False, embed to E
validLib bool [] [] or None Conditional embedding
noTime bool False Do not require first data column of time or index
ignoreNan bool True Adjust lib to exclude NaN
showPlot bool False Plot results (pyEDM, rEDM)
verbose bool False Echo messages
generateSteps int 0 Number of recursive time step predictions
generateLibrary bool False Add generated data to library
parameterList bool False Include parameter dictionary in return
const_pred bool False Include non projected forecast data
pathIn string "./" Input data file path
dataFile string "" Data file name
pathOut string "./" Output file path
predictFile string "" Prediction output file
smapCoefFile string "" SMap coefficient output file
smapSVFile string "" SMap singular value output file


Refer to the parameters table for general parameter definitions.

Notes :
If embedded is false, data columns are embedded to dimension E with shift tau. If knn is not specified, it is set to the full library size.
If knn is specified, it must be greater than E.
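
As a rough illustration of the embedding described above, the sketch below builds time-delay vectors in plain Python. The helper name time_delay_embed is hypothetical and is not part of pyEDM or rEDM. It assumes a negative tau means lagging backward in time, and it simply drops rows that lack a complete vector, which may differ from how the packages treat partial rows.

```python
def time_delay_embed(series, E, tau=-1):
    """Return rows [x(t), x(t+tau), ..., x(t+(E-1)*tau)] for a negative tau."""
    if E < 1:
        raise ValueError("E must be at least 1")
    if tau >= 0:
        raise ValueError("this sketch only handles negative tau (lags)")
    lag = -tau
    first = (E - 1) * lag  # first row index that has a complete lag vector
    return [
        [series[t - k * lag] for k in range(E)]
        for t in range(first, len(series))
    ]


vectors = time_delay_embed([1, 2, 3, 4, 5], E=3, tau=-1)
print(vectors)
```

Each returned row holds the current value followed by its E-1 lagged values, so a series of N points yields N - (E-1)*|tau| complete vectors in this sketch.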

nan :
Version 1.x : Any prediction row (pred) containing nan results in an SMap nan prediction. Any library vector containing nan, whether in the observations or introduced by time-delay embedding, that is used as a nearest neighbor also results in an SMap nan prediction. By default SMap uses all library vectors as neighbors, so a single nan in the library can affect every prediction. To address this, if nan values are detected in columns or target and ignoreNan = True (default), the library is automatically redefined to exclude data and embedding vectors containing nan. If ignoreNan = False the library is not changed. The user can manually specify library (lib) row segments that avoid the nan values.

Version 2.x : nan values are removed from the data unless ignoreNan = False.
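
For the manual route (ignoreNan = False with a hand-built lib), the following sketch shows one way to derive library segments that avoid nan. The function lib_without_nan is hypothetical and not part of pyEDM or rEDM. It assumes row indices are 1-based and inclusive and that several start stop pairs can be joined into one space-separated lib string; both assumptions should be checked against the installed version.

```python
import math


def lib_without_nan(series, E=1, tau=-1):
    """Return 1-based inclusive (start, stop) pairs whose lag vectors are nan-free."""
    lag = abs(tau)
    good = []
    for t in range(len(series)):
        # Rows feeding the lag vector of row t (partial near the series start).
        rows = [t - k * lag for k in range(E) if t - k * lag >= 0]
        good.append(not any(math.isnan(series[r]) for r in rows))

    segments, start = [], None
    for t, ok in enumerate(good):
        if ok and start is None:
            start = t
        elif not ok and start is not None:
            # 0-based index t of the first bad row equals the 1-based
            # index of the last good row.
            segments.append((start + 1, t))
            start = None
    if start is not None:
        segments.append((start + 1, len(series)))
    return segments


data = [1.0, 2.0, float("nan"), 4.0, 5.0, 6.0, 7.0]
pairs = lib_without_nan(data, E=2, tau=-1)
lib = " ".join(f"{a} {b}" for a, b in pairs)
print(lib)
```

With E = 2 the row after the nan is also excluded, because its lag vector would contain the nan value.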

Multivariate Embedding :
SMap should be called with columns that explicitly correspond to the E dimensions. In the univariate case (number of columns = 1) with the default embedded = false, the time series is time-delay embedded to dimension E, and the returned SMap coefficients correspond to each dimension.

If multivariate data is used (number of columns > 1), SMap must use embedded = true with E equal to the number of columns. This prevents the function from internally time-delay embedding the multiple columns to dimension E. If internal time-delay embedding were performed, the state-space columns would not correspond to the intended dimensions in the matrix inversion, coefficient assignment, and prediction. In this multivariate case, the user can first prepare the embedding (using Embed() for time-delay embedding if desired, and adding a first column of time), then pass this embedding to SMap with appropriately specified columns, E, and embedded = true. The Embedding.py application can be used to perform the embedding and insert the time vector for input to SMap.
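
The rule above can be captured in a small argument helper. This is only a sketch: smap_embedding_args is a hypothetical function, not part of pyEDM or rEDM, and the column names x, y, z are placeholders.

```python
def smap_embedding_args(columns, E=0):
    """Return SMap keyword arguments consistent with the number of columns."""
    columns = list(columns)
    if not columns:
        raise ValueError("at least one column is required")
    if len(columns) == 1:
        # Univariate: SMap time-delay embeds the single column to dimension E.
        if E < 1:
            raise ValueError("E must be set for a univariate embedding")
        return {"columns": columns, "E": E, "embedded": False}
    # Multivariate: the columns already are the state-space dimensions.
    if E not in (0, len(columns)):
        raise ValueError("E must equal the number of columns when embedded=True")
    return {"columns": columns, "E": len(columns), "embedded": True}


args = smap_embedding_args(["x", "y", "z"])
print(args)
```

The resulting dictionary could then be unpacked into a call such as SMap(dataFrame=df, target="x", **args), where df is assumed to be a prepared embedding with time in the first column.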

Conditional Embedding :
validLib implements conditional embedding (CE). It is a boolean vector the same length as the number of time series rows. A false entry means that the state-space vector derived from the corresponding time series row will not be included in the state-space library. See examples.
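
A minimal sketch of building such a vector follows. The temperature values and the threshold are made-up illustration data, and the names are hypothetical; only the shape of the result (one boolean per time series row) comes from the description above.

```python
# Hypothetical driver variable observed alongside the time series, one value per row.
temperature = [11.2, 14.8, 15.1, 9.7, 16.3, 12.0]
threshold = 12.5

# True keeps the state-space vector for that row in the library; False excludes it.
valid_lib = [value > threshold for value in temperature]

# The vector must have the same length as the number of time series rows.
assert len(valid_lib) == len(temperature)
print(valid_lib)
```

The list would then be passed as validLib=valid_lib, so that only rows observed under the chosen condition contribute library vectors.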

Generative Mode :
If generateSteps > 0, SMap operates in feedback generative mode. The values of pred are overridden to start at the end of the data. At each step one prediction is made and added to the columns data, a new time-delay embedding is created, and the cycle is repeated for generateSteps. Feedback generation only operates on a univariate time series that is time-delay embedded. The columns and target variables must be the same.
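
The feedback cycle can be sketched in plain Python as below. This is not the package implementation: generate and linear_extrapolation are hypothetical names, and the predictor is a simple stand-in used only to make the loop runnable, not an SMap projection.

```python
def generate(series, steps, E, predict_next):
    """Feed each one-step prediction back into the series, `steps` times."""
    data = list(series)  # columns and target are the same univariate series
    generated = []
    for _ in range(steps):
        # Rebuild the most recent lag vector [x(t), x(t-1), ..., x(t-E+1)].
        state = data[-1:-E - 1:-1]
        value = predict_next(state)
        data.append(value)  # the prediction becomes new input data
        generated.append(value)
    return generated


def linear_extrapolation(state):
    """Stand-in predictor, NOT SMap: continue the last observed difference."""
    return 2 * state[0] - state[1]


result = generate([1.0, 2.0, 3.0], steps=3, E=2, predict_next=linear_extrapolation)
print(result)
```

Because every generated value is appended before the next lag vector is built, errors in early steps propagate into later ones, which is inherent to any feedback scheme of this kind.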

Linear System Solver :
In pyEDM: The default LAPACK SVD solver dgelsd() can be replaced with an object instantiated from a class in the sklearn.linear_model module. Supported solvers include LinearRegression, Ridge, Lasso, ElasticNet, RidgeCV, LassoCV, ElasticNetCV. See examples.

Version 1.x Note: Windows does not support community compiler standards, which creates binary library compatibility barriers, specifically for the use of OpenBLAS for the SVD solver. As a result, the Windows pyEDM implementation does not use the cppEDM default solver dgelss from BLAS/LAPACK. All other implementations use BLAS/LAPACK dgelss directly.

Returns :
Dict in pyEDM, named List in rEDM, with three DataFrames:
predictions [ 3 columns : "Time", "Observations", "Predictions" ],
coefficients [ E+2 columns : "Time", and E+1 SMap coefficients ],
singularValues [ E+2 columns : "Time", and E+1 SVD singular values ], if available from the linear system solver.

Version 1.x : If parameterList = True, a dictionary of parameters is added.

Version 2.x : If returnObject = True returns the SMap class object with all data and variables.