SMap
Description :
SMap projection of the input data file or DataFrame.
Python :
SMap(dataFrame=None, columns='', target='',
lib='', pred='', E=0, Tp=1, knn=0, tau=-1,
theta=0, exclusionRadius=0, solver=None, embedded=False,
validLib=[], noTime=False, ignoreNan = True,
showPlot=False, verbose=False, returnObject=False)
R :
SMap(pathIn="./", dataFile="", dataFrame=NULL,
lib="", pred="", E=0, Tp=1, knn=0, tau=-1,
theta=0, exclusionRadius=0, columns="", target="",
embedded=FALSE, verbose=FALSE, validLib=vector(),
ignoreNan=TRUE, generateSteps=0, parameterList=FALSE,
showPlot=FALSE, noTime=FALSE)
| Parameter | Type | Default | Purpose |
|---|---|---|---|
| dataFrame | pyEDM: pandas DataFrame; rEDM: data.frame | None | Input DataFrame |
| columns | string or [] | "" | Column names for library |
| target | string | "" | Prediction target column name or index |
| lib | string or [] | "" | Pairs of library start stop row indices |
| pred | string or [] | "" | Pairs of prediction start stop row indices |
| E | int | 0 | Embedding dimension |
| Tp | int | 1 | Prediction Interval |
| knn | int | 0 | Number nearest neighbors |
| tau | int | -1 | Embedding time shift (time series rows) |
| theta | float | 0 | SMap localization |
| exclusionRadius | int | 0 | Prediction vector exclusion radius |
| solver | sklearn.linear_model | None | Linear system solver |
| embedded | bool | False | Is data an embedding? If False, embed to E |
| validLib | bool [] | [] or None | Conditional embedding |
| noTime | bool | False | Do not require first data column of time or index |
| ignoreNan | bool | True | Adjust lib to exclude NaN |
| showPlot | bool | False | Plot results (pyEDM, rEDM) |
| verbose | bool | False | Echo messages |
| generateSteps | int | 0 | Number of recursive time step predictions |
| generateLibrary | bool | False | Add generated data to library |
| parameterList | bool | False | Include parameter dictionary in return |
| const_pred | bool | False | Include non projected forecast data |
| pathIn | string | "./" | Input data file path |
| dataFile | string | "" | Data file name |
| pathOut | string | "./" | Output file path |
| predictFile | string | "" | Prediction output file |
| smapCoefFile | string | "" | SMap coefficient output file |
| smapSVFile | string | "" | SMap singular value output file |
Refer to the parameters table for general parameter definitions.
Notes :
If embedded is false, data columns are embedded to dimension E with shift tau.
If knn is not specified, it is set to the full library size.
If knn is specified, it must be greater than E.
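The embedding and knn rules above can be sketched in plain Python. This is an illustration only, not the pyEDM implementation: `time_delay_embed` and `resolve_knn` are hypothetical helpers, the sketch assumes a negative tau (lags into the past), and it assumes the default knn = 0 means "not specified".

```python
def time_delay_embed(series, E, tau=-1):
    """Return rows [x(t), x(t+tau), ..., x(t+(E-1)*tau)] for a negative tau."""
    if E < 1:
        raise ValueError("E must be at least 1")
    if tau >= 0:
        raise ValueError("this sketch only handles negative tau (lags)")
    lag = -tau
    first = (E - 1) * lag  # first row index that has a complete set of lags
    return [[series[t - j * lag] for j in range(E)]
            for t in range(first, len(series))]


def resolve_knn(knn, E, lib_size):
    """Apply the knn rules from the notes (assumes knn = 0 means unspecified)."""
    if knn == 0:
        return lib_size  # unspecified: use the full library
    if knn <= E:
        raise ValueError("knn must be greater than E")
    return knn


x = [1, 2, 3, 4, 5, 6]
embedding = time_delay_embed(x, E=3, tau=-1)
knn = resolve_knn(0, E=3, lib_size=len(embedding))
```

Note that the first (E-1)*|tau| rows are lost because they do not have a complete set of lagged values.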
nan :
Version 1.x : Any prediction row (pred) containing nan results in a nan SMap prediction. Any library vector containing nan that is used as a nearest neighbor, whether the nan is in the observation or arises from time-delay embedding, also results in a nan SMap prediction. By default SMap uses all library vectors as neighbors. To address this, if nan are detected in columns or target and ignoreNan = True (default), the library is automatically redefined to exclude data and embedding vectors containing nan. If ignoreNan = False the library is not changed. The user can then manually specify library (lib) row segments that avoid nan values.
Version 2.x : nan values are removed from the data unless ignoreNan = False.
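As a rough sketch of manually specifying lib row segments that avoid nan, the helper below finds contiguous runs of rows whose time-delay vectors contain no nan. `nan_free_lib_segments` is a hypothetical name, not part of pyEDM or rEDM; the sketch assumes 1-based inclusive row indices for lib, a negative tau, and a univariate series.

```python
import math


def nan_free_lib_segments(values, E=1, tau=-1):
    """Return 1-based (start, stop) row pairs whose embedding vectors hold no NaN."""
    lag = -tau
    span = (E - 1) * lag  # rows before this index lack a full set of lags
    segments = []
    start = None
    for t in range(len(values)):
        # Row t is usable only if it and its E-1 lagged values exist and are not NaN.
        usable = t >= span and not any(
            math.isnan(values[t - j * lag]) for j in range(E))
        if usable and start is None:
            start = t
        elif not usable and start is not None:
            segments.append((start + 1, t))  # 1-based, inclusive stop
            start = None
    if start is not None:
        segments.append((start + 1, len(values)))
    return segments


nan = float("nan")
series = [1.0, 2.0, nan, 4.0, 5.0, 6.0, 7.0]  # illustrative values only
segments = nan_free_lib_segments(series, E=2, tau=-1)
# Flatten to the "start stop start stop" string form accepted by lib.
lib = " ".join(f"{a} {b}" for a, b in segments)
```

With E = 2 the row immediately after the nan is also excluded, because its lagged value is nan.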
Multivariate Embedding :
SMap should be called with columns that explicitly correspond to the E dimensions. In the univariate case (number of columns = 1) with the default embedded = false, the time series is time-delay embedded to dimension E, and the returned SMap coefficients correspond to each dimension.
If multivariate data is used (number of columns > 1), SMap
must use embedded = true with E equal to the number of columns.
This prevents the function from internally time-delay embedding the
multiple columns to dimension E. If internal time-delay embedding
were performed, the state-space columns would not correspond to the
intended dimensions in the matrix inversion, coefficient assignment,
and prediction. In this multivariate case, the user can first prepare
the embedding (using Embed() for time-delay
embedding if desired, and adding a first column of time), then pass this embedding
to SMap with appropriately specified columns, E, and embedded = true.
The Embedding.py application can be used to perform the embedding and
insert the time vector for input to SMap.
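A minimal sketch of assembling the keyword arguments for the multivariate case follows. `multivariate_smap_args` is a hypothetical helper, and the column names, lib, pred, and theta values are placeholders. The SMap call itself is left as a comment because it needs pyEDM and a prepared DataFrame.

```python
def multivariate_smap_args(columns, target, lib, pred, theta=0):
    """Collect SMap keyword arguments for a pre-built multivariate embedding."""
    columns = list(columns)
    if len(columns) < 2:
        raise ValueError("the multivariate case expects more than one column")
    return {
        "columns": columns,
        "target": target,
        "lib": lib,
        "pred": pred,
        "E": len(columns),   # E must equal the number of columns
        "theta": theta,
        "embedded": True,    # prevent internal time-delay embedding
    }


args = multivariate_smap_args(["x", "y", "z"], target="x",
                              lib="1 100", pred="101 200", theta=3)

# With pyEDM installed and a DataFrame `df` whose first column is time:
# result = pyEDM.SMap(dataFrame=df, **args)
```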
Conditional Embedding :
validLib implements conditional embedding (CE). It is a boolean vector the same length as the number of time series rows. A false entry means that the state-space vector derived from the corresponding time series row will not be included in the state-space library. See examples.
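For illustration, a validLib vector can be built by applying a condition to each time series row. The helper name `build_valid_lib`, the threshold, and the data values below are assumptions made for this sketch.

```python
def build_valid_lib(values, condition):
    """Return one boolean per time series row; False rows are left out of the library."""
    return [bool(condition(v)) for v in values]


temperature = [12.0, 18.5, 21.0, 9.5, 19.0]  # illustrative values only
valid_lib = build_valid_lib(temperature, lambda v: v > 15.0)

# The vector must have one entry per time series row.
assert len(valid_lib) == len(temperature)
```

The resulting list would be passed as the validLib argument, so that only state-space vectors from rows meeting the condition enter the library.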
Generative Mode :
If generateSteps > 0 SMap operates in feedback generative mode. The values of pred are overridden to start at the end of the data. At each step one prediction is made and added to the columns data, a new time-delay embedding is created, and the cycle is repeated for generateSteps. Feedback generation only operates on a univariate time series that is time-delay embedded. The columns and target variables must be the same.
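The feedback cycle can be sketched as a simple loop. This is not the library's implementation: `generate_feedback` is a hypothetical helper, and `toy_predictor` is a stand-in linear extrapolation used in place of the actual SMap projection.

```python
def generate_feedback(series, E, generate_steps, predict_next, tau=-1):
    """Predict one step, append it to the data, re-embed, and repeat."""
    lag = -tau
    data = list(series)
    if len(data) < (E - 1) * lag + 1:
        raise ValueError("series is too short to form one embedding vector")
    generated = []
    for _ in range(generate_steps):
        # Newest time-delay vector: [x(t), x(t-lag), ..., x(t-(E-1)*lag)]
        state = [data[-1 - j * lag] for j in range(E)]
        nxt = predict_next(state)
        data.append(nxt)       # feed the prediction back into the data
        generated.append(nxt)
    return generated


def toy_predictor(state):
    """Stand-in one-step predictor: linear extrapolation of the last two values."""
    return 2 * state[0] - state[1]


generated = generate_feedback([1.0, 2.0, 3.0], E=2, generate_steps=3,
                              predict_next=toy_predictor)
```

Because each prediction becomes input to the next step, errors can compound over generateSteps.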
Linear System Solver :
In pyEDM: The default LAPACK SVD solver dgelsd() can be replaced with
a class object instantiated from the sklearn.linear_model class.
Supported solvers include LinearRegression, Ridge, Lasso,
ElasticNet, RidgeCV, LassoCV, ElasticNetCV.
See examples.
Version 1.x Note: Windows does not support community compiler standards, which creates binary library compatibility barriers, specifically for the use of OpenBLAS for the SVD solver. As a result, the Windows pyEDM implementation does not use the cppEDM default solver dgelss from BLAS/LAPACK. All other implementations use BLAS/LAPACK dgelss directly.
Returns :
A dict in pyEDM, or a named list in rEDM, with three DataFrames:
predictions [ 3 columns : "Time", "Observations", "Predictions" ],
coefficients [ E+2 columns : "Time", and E+1 SMap coefficients ],
singularValues [ E+2 columns : "Time", and E+1 SVD singular values ], if available from the linear system solver.
Version 1.x : If parameterList = True, a dictionary of parameters is added.
Version 2.x : If returnObject = True returns the SMap class object with all data and variables.
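As a hedged usage sketch, forecast skill can be summarized from the "Observations" and "Predictions" columns of the predictions DataFrame, for example with a Pearson correlation. The helper `forecast_skill` is hypothetical and works on plain lists so that it runs without pyEDM. The numbers are illustrative only, and the nan entries stand for rows where an observation or prediction is not available.

```python
import math


def forecast_skill(observations, predictions):
    """Pearson correlation over rows where both values are present."""
    pairs = [(o, p) for o, p in zip(observations, predictions)
             if not (math.isnan(o) or math.isnan(p))]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete rows")
    mean_o = sum(o for o, _ in pairs) / n
    mean_p = sum(p for _, p in pairs) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in pairs)
    var_o = sum((o - mean_o) ** 2 for o, _ in pairs)
    var_p = sum((p - mean_p) ** 2 for _, p in pairs)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / math.sqrt(var_o * var_p)


nan = float("nan")
observations = [1.0, 2.0, 3.0, 4.0, nan]  # illustrative values only
predictions = [nan, 2.1, 2.9, 4.2, 5.0]   # illustrative values only
rho = forecast_skill(observations, predictions)

# With a pyEDM result dict `result`, the same columns would be read as:
# rho = forecast_skill(list(result["predictions"]["Observations"]),
#                      list(result["predictions"]["Predictions"]))
```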