pyChemometrics objects

Reference guide for the pyChemometrics objects.

ChemometricsPCA

class pyChemometrics.ChemometricsPCA(ncomps=2, pca_algorithm=<class 'sklearn.decomposition._pca.PCA'>, scaler=ChemometricsScaler(), **pca_type_kwargs)

ChemometricsPCA object - Wrapper for sklearn.decomposition PCA algorithms, with tailored methods for Chemometric Data analysis.

Parameters:
  • ncomps (int) – Number of PCA components desired.
  • pca_algorithm (sklearn.decomposition._BasePCA) – scikit-learn PCA models (inheriting from _BasePCA).
  • scaler (ChemometricsScaler object, scaling/preprocessing objects from scikit-learn or None) – The object which will handle data scaling.
  • pca_type_kwargs (kwargs) – Keyword arguments to be passed during initialization of pca_algorithm.
Raises:

TypeError – If the pca_algorithm or scaler objects are not of the right class.

fit(x, **fit_params)

Perform model fitting on the provided x data matrix and calculate basic goodness-of-fit metrics. Equivalent to scikit-learn’s default BaseEstimator method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PCA model.
  • fit_params (kwargs) – Keyword arguments to be passed to the .fit() method of the core sklearn model.
Raises:

ValueError – If any problem occurs during fitting.

fit_transform(x, **fit_params)

Fit a model and return the scores, as per scikit-learn’s TransformerMixin method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit and project.
  • fit_params (kwargs) – Optional keyword arguments to be passed to the fit method.
Returns:

PCA projections (scores) corresponding to the samples in X.

Return type:

numpy.ndarray, shape [n_samples, n_comps]

Raises:

ValueError – If there are problems with the input or during model fitting.

transform(x)

Calculate the projections (scores) of the x data matrix. Similar to scikit-learn’s TransformerMixin method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to project.
  • transform_params (kwargs) – Optional keyword arguments to be passed to the transform method.
Returns:

PCA projections (scores) corresponding to the samples in X.

Return type:

numpy.ndarray, shape [n_samples, n_comps]

Raises:

ValueError – If there are problems with the input or during projection.

score(x, sample_weight=None)

Return the average log-likelihood of all samples. Same as the underlying score method from the scikit-learn PCA objects.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to score model on.
  • sample_weight (numpy.ndarray) – Optional sample weights during scoring.
Returns:

Average log-likelihood over all samples.

Return type:

float

Raises:

ValueError – If the data matrix x provided is invalid.

inverse_transform(scores)

Transform scores to the original data space using the principal component loadings. Similar to scikit-learn’s default TransformerMixin method.

Parameters:
  • scores (numpy.ndarray, shape [n_samples, n_comps]) – The projections (scores) to be converted back to the original data space.
Returns:

Data matrix in the original data space.

Return type:

numpy.ndarray, shape [n_samples, n_features]

Raises:

ValueError – If the dimensions of scores do not match the number of components in the model.

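The relationship between transform and inverse_transform can be illustrated with plain NumPy. This is a minimal sketch of the underlying linear algebra, assuming mean centering only and an SVD-based PCA; it does not call pyChemometrics, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 5))          # [n_samples, n_features]
ncomps = 2

# Mean-centre, then take the leading right singular vectors as loadings.
x_mean = x.mean(axis=0)
u, s, vt = np.linalg.svd(x - x_mean, full_matrices=False)
loadings = vt[:ncomps]                # [n_comps, n_features]

# Projection step: scores are the centred data projected onto the loadings.
scores = (x - x_mean) @ loadings.T    # [n_samples, n_comps]

# Inverse step: map the scores back to the original data space.
x_hat = scores @ loadings + x_mean    # [n_samples, n_features]
```

With fewer components than features, x_hat is only an approximation of x; the difference between the two is the model residual.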
hotelling_T2(comps=None, alpha=0.05)

Obtain the parameters for the Hotelling T2 ellipse at the desired significance level.

Parameters:
  • comps (list) – List of components to calculate the Hotelling T2.
  • alpha (float) – Significance level
Returns:

The radii of the Hotelling T2 ellipsoid.

Return type:

numpy.ndarray

Raises:
  • AttributeError – If the model is not fitted.
  • ValueError – If the components requested exceed the number of components in the model.
  • TypeError – If comps is not None, a list or a 1D numpy array, or if alpha is not a float.
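As a rough illustration of what the ellipse radii represent, the sketch below uses one common textbook form of the Hotelling T2 limit. The exact scaling used inside pyChemometrics may differ, so treat the formula, the function name t2_ellipse_radii and the f_crit argument as assumptions. The F critical value is passed in by the caller (it could be obtained, for example, from scipy.stats.f.ppf(1 - alpha, n_comps, n_samples - n_comps)).

```python
import numpy as np

def t2_ellipse_radii(scores, f_crit):
    """Semi-axis lengths of a Hotelling T2 ellipse, one per component.

    scores : [n_samples, n_comps] array holding the selected score columns.
    f_crit : critical value of the F distribution with
             (n_comps, n_samples - n_comps) degrees of freedom.
    """
    n, a = scores.shape
    # Textbook T2 limit: A (n - 1) / (n - A) * F_crit (assumed form).
    t2_limit = a * (n - 1) / (n - a) * f_crit
    # Each radius scales with the standard deviation of that component.
    return np.sqrt(scores.var(axis=0, ddof=1) * t2_limit)

rng = np.random.default_rng(1)
scores = rng.normal(size=(30, 2)) * np.array([3.0, 1.0])
radii = t2_ellipse_radii(scores, f_crit=3.34)   # roughly F(0.95; 2, 28)
```

Components with larger score variance get a proportionally longer ellipse axis.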
x_residuals(x, scale=True)
Parameters:
  • x – data matrix [n samples, m variables]
  • scale – Return the residuals in the scale the model is using or in the raw data scale
Returns:

X matrix model residuals

dmodx(x)

Normalised DmodX measure

Parameters:
  • x – Data matrix [n samples, m variables]
Returns:

The normalised DmodX measure for each sample.

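For orientation, the sketch below computes a normalised DmodX from a residual matrix using the conventional definition (per-sample residual standard deviation divided by the pooled residual standard deviation). The degrees-of-freedom corrections shown are an assumption and may not match the library's implementation exactly.

```python
import numpy as np

def dmodx(residuals, ncomps):
    """Normalised distance to the model for each row of a residual matrix."""
    n, k = residuals.shape
    ss_row = (residuals ** 2).sum(axis=1)
    # Residual standard deviation of each sample.
    s_i = np.sqrt(ss_row / (k - ncomps))
    # Pooled residual standard deviation of the whole model.
    s_0 = np.sqrt(ss_row.sum() / ((n - ncomps - 1) * (k - ncomps)))
    return s_i / s_0

rng = np.random.default_rng(2)
x = rng.normal(size=(20, 6))
ncomps = 2

# Residuals of a 2-component PCA model on mean-centred data.
xc = x - x.mean(axis=0)
_, _, vt = np.linalg.svd(xc, full_matrices=False)
loadings = vt[:ncomps]
residuals = xc - xc @ loadings.T @ loadings

d = dmodx(residuals, ncomps)
```

Samples with a DmodX well above the rest are poorly described by the model and are natural outlier candidates.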
leverages()

Calculate the leverages for each observation

Returns:

The leverage (H) for each observation.

Return type:

numpy.ndarray

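Leverage is commonly defined as the diagonal of the hat matrix built from the scores, H = T (T'T)^-1 T'. The sketch below assumes that definition; the function name and the random scores are illustrative only.

```python
import numpy as np

def leverages(scores):
    """Diagonal of T (T'T)^-1 T' for a score matrix T [n_samples, n_comps]."""
    gram_inv = np.linalg.inv(scores.T @ scores)
    # einsum evaluates only the diagonal, avoiding the full n x n hat matrix.
    return np.einsum("ij,jk,ik->i", scores, gram_inv, scores)

rng = np.random.default_rng(3)
scores = rng.normal(size=(25, 3))
h = leverages(scores)
```

Under this definition the leverages sum to the number of components, which gives a quick sanity check on the result.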
cross_validation(x, cv_method=KFold(n_splits=7, random_state=None, shuffle=True), outputdist=False, press_impute=True)

Cross-validation method for the model. Calculates cross-validated estimates for Q2X and other model parameters using row-wise cross validation.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix.
  • cv_method (BaseCrossValidator) – An instance of a scikit-learn CrossValidator object.
  • outputdist (bool) – Output the whole distribution for the cross validated parameters. Useful when using ShuffleSplit or CrossValidators other than KFold.
  • press_impute (bool) – Use imputation of test set observations instead of row wise cross-validation. Slower but more reliable.
Returns:

Adds a dictionary cvParameters to the object, containing the cross validation results.

Return type:

dict

Raises:
  • TypeError – If the cv_method passed is not a scikit-learn CrossValidator object.
  • ValueError – If the x data matrix is invalid.
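The idea behind row-wise cross-validation of Q2X can be sketched as follows: hold out a block of rows, fit the PCA on the remaining rows, project and reconstruct the held-out rows, and accumulate the prediction error (PRESS). This is a simplified NumPy-only illustration assuming mean centering and a shuffled 7-fold split; it does not cover the press_impute variant, and q2x_rowwise is a hypothetical helper name.

```python
import numpy as np

def q2x_rowwise(x, ncomps, n_splits=7, seed=0):
    """Q2X = 1 - PRESS / TSS from shuffled K-fold, row-wise cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(x.shape[0]), n_splits)
    press, tss = 0.0, 0.0
    for test_idx in folds:
        train = np.delete(x, test_idx, axis=0)
        test = x[test_idx]
        # Fit the PCA (centering + loadings) on the training rows only.
        mean = train.mean(axis=0)
        _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
        loadings = vt[:ncomps]
        # Project and reconstruct the held-out rows.
        test_c = test - mean
        test_hat = test_c @ loadings.T @ loadings
        press += ((test_c - test_hat) ** 2).sum()
        tss += (test_c ** 2).sum()
    return 1.0 - press / tss

# Rank-2 data with a little noise: two components should predict it well.
rng = np.random.default_rng(4)
x = rng.normal(size=(42, 2)) @ rng.normal(size=(2, 6))
x += 0.01 * rng.normal(size=x.shape)
q2 = q2x_rowwise(x, ncomps=2)
```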

outlier(x, comps=None, measure='T2', alpha=0.05)

Use the Hotelling T2 or DmodX measure and F statistic to screen for outlier candidates.

Parameters:
  • x – Data matrix [n samples, m variables]
  • comps – Which components to use (for Hotelling T2 only)
  • measure – Hotelling T2 or DmodX
  • alpha – Significance level
Returns:

List with row indices of X matrix

permutationtest_loadings(x, nperms=1000)

Permutation test to assess the significance of the magnitude of each variable's value in a component loading vector. Can be used to test the importance of a variable to the loading vector.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix.
  • nperms (int) – Number of permutations.
Returns:

Permuted null distribution for loading vector values.

Return type:

numpy.ndarray, shape [ncomps, n_perms, n_features]

Raises:

ValueError – If there is a problem with the input x data or during the procedure.
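One way to build such a null distribution is to shuffle every column of x independently, which destroys the correlation between variables, and refit the PCA on each shuffled copy. The sketch below assumes that scheme; whether pyChemometrics permutes in exactly this way is not stated here, and permuted_loadings is a hypothetical helper.

```python
import numpy as np

def permuted_loadings(x, ncomps, nperms, seed=0):
    """Null loadings with shape [ncomps, nperms, n_features]."""
    rng = np.random.default_rng(seed)
    null = np.zeros((ncomps, nperms, x.shape[1]))
    for perm in range(nperms):
        # Shuffle each variable on its own to break the correlation structure.
        x_perm = np.column_stack([rng.permutation(col) for col in x.T])
        x_perm = x_perm - x_perm.mean(axis=0)
        _, _, vt = np.linalg.svd(x_perm, full_matrices=False)
        null[:, perm, :] = vt[:ncomps]
    return null

rng = np.random.default_rng(5)
x = rng.normal(size=(30, 4))
null = permuted_loadings(x, ncomps=2, nperms=50)
```

Because the sign of a loading vector is arbitrary, comparisons against the observed loadings are usually made on absolute values.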

permutationtest_components(x, nperms=1000)

Permutation test for a whole component (unfinished). Also outputs permuted null distributions for the loadings.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix.
  • nperms (int) – Number of permutations.
Returns:

Permuted null distribution for the component metrics (VarianceExplained and R2).

Return type:

numpy.ndarray, shape [ncomps, n_perms, n_features]

Raises:

ValueError – If there is a problem with the input data.

ChemometricsPLS

class pyChemometrics.ChemometricsPLS(ncomps=2, pls_algorithm=<class 'sklearn.cross_decomposition._pls.PLSRegression'>, xscaler=ChemometricsScaler(), yscaler=None, **pls_type_kwargs)

ChemometricsPLS object - Wrapper for sklearn.cross_decomposition PLS algorithms, with tailored methods for Chemometric Data analysis.

Parameters:
  • ncomps (int) – Number of PLS components desired.
  • pls_algorithm (sklearn._PLS) – Scikit-learn PLS algorithm to use - PLSRegression or PLSCanonical are supported.
  • xscaler (ChemometricsScaler object, scaling/preprocessing objects from scikit-learn or None.) – Scaler object for X data matrix.
  • yscaler (ChemometricsScaler object, scaling/preprocessing objects from scikit-learn or None.) – Scaler object for the Y data vector/matrix.
  • pls_type_kwargs (kwargs) – Keyword arguments to be passed during initialization of pls_algorithm.
Raises:

TypeError – If the pls_algorithm or scaler objects are not of the right class.

fit(x, y, **fit_params)

Perform model fitting on the provided x and y data and calculate basic goodness-of-fit metrics. Similar to scikit-learn’s BaseEstimator method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • fit_params (kwargs) – Keyword arguments to be passed to the .fit() method of the core sklearn model.
Raises:

ValueError – If any problem occurs during fitting.

fit_transform(x, y, **fit_params)

Fit a model to supplied data and return the scores. Equivalent to scikit-learn’s TransformerMixin method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • fit_params (kwargs) – Optional keyword arguments to be passed to the pls_algorithm .fit() method.
Returns:

Latent Variable scores (T) for the X matrix and for the Y vector/matrix (U).

Return type:

tuple of numpy.ndarray, shape [[n_tscores], [n_uscores]]

Raises:

ValueError – If any problem occurs during fitting.

transform(x=None, y=None)

Calculate the scores for a data block from the original data. Equivalent to sklearn’s TransformerMixin method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
Returns:

Latent Variable scores (T) for the X matrix and for the Y vector/matrix (U).

Return type:

tuple with 2 numpy.ndarray, shape [n_samples, n_comps]

Raises:
  • ValueError – If dimensions of input data are mismatched.
  • AttributeError – When calling the method before the model is fitted.
inverse_transform(t=None, u=None)

Transform scores to the original data space using their corresponding loadings. Same logic as in scikit-learn’s TransformerMixin method.

Parameters:
  • t (numpy.ndarray, shape [n_samples, n_comps] or None) – T scores corresponding to the X data matrix.
  • u (numpy.ndarray, shape [n_samples, n_comps] or None) – Y scores corresponding to the Y data vector/matrix.
Return x:

X Data matrix in the original data space.

Return type:

numpy.ndarray, shape [n_samples, n_features] or None

Return y:

Y Data matrix in the original data space.

Return type:

numpy.ndarray, shape [n_samples, n_features] or None

Raises:

ValueError – If dimensions of input data are mismatched.

score(x, y, block_to_score='y', sample_weight=None)

Predict and calculate the R2 for the model using one of the data blocks (X or Y) provided. Equivalent to the scikit-learn RegressorMixin score method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
  • block_to_score (str) – Which of the data blocks (X or Y) to calculate the R2 goodness of fit.
  • sample_weight (numpy.ndarray, shape [n_samples] or None) – Optional sample weights to use in scoring.
Return R2Y:

The model’s R2Y, calculated by predicting Y from X and scoring.

Return type:

float

Return R2X:

The model’s R2X, calculated by predicting X from Y and scoring.

Return type:

float

Raises:

ValueError – If the block_to_score argument is not acceptable or there are dimension mismatch issues with the provided data.
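The R2 reported by a score method of this kind is the coefficient of determination, 1 - RSS/TSS, computed between a data block and its prediction. The helper below is a minimal sketch that pools all columns into one value; scikit-learn's own multi-output R2 averages per column by default, so the pooling here is an assumption made for brevity.

```python
import numpy as np

def r2(actual, predicted):
    """Coefficient of determination, pooled over all columns."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rss = ((actual - predicted) ** 2).sum()                  # residual sum of squares
    tss = ((actual - actual.mean(axis=0)) ** 2).sum()        # total sum of squares
    return 1.0 - rss / tss

y = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
r2_value = r2(y, y_pred)   # RSS = 0.10, TSS = 5.0, so R2 is about 0.98
```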

predict(x=None, y=None)

Predict the values in one data block using the other. Same as its namesake method in scikit-learn’s RegressorMixin.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
Returns:

Predicted data block (X or Y) obtained from the other data block.

Return type:

numpy.ndarray, shape [n_samples, n_features]

Raises:
  • ValueError – If no data matrix is passed, or dimensions mismatch issues with the provided data.
  • AttributeError – Calling the method without fitting the model before.
VIP(mode='w', direction='y')

Output the Variable importance for projection metric (VIP). With the default values it is calculated using the x variable weights and the variance explained of y.

Note: Code not adequate to obtain a VIP for each individual variable in the multi-Y case, as SSY should be changed so that it is calculated for each y and not for the whole Y matrix

Parameters:
  • mode (str) – The type of model parameter to use in calculating the VIP. Default value is weights (w), and other acceptable arguments are p, ws, cs, c and q.
  • direction (str) – The data block used to calculate the model fit and regression sum of squares.
Returns:

The vector with the calculated VIP values.

Return type:

numpy.ndarray, shape [n_features]

Raises:
  • ValueError – If mode or direction is not a valid option.
  • AttributeError – Calling method without a fitted model.
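For the default mode (x weights, variance explained in y), the VIP is conventionally computed as a weighted sum of squared, normalised weights, with the weights given by the sum of squares of Y explained by each component. The sketch below assumes that standard formula; the inputs are made-up numbers and vip is an illustrative function, not the library method.

```python
import numpy as np

def vip(weights, ssy_per_comp):
    """VIP for each variable.

    weights      : [n_features, n_comps] x weights (w).
    ssy_per_comp : [n_comps] sum of squares of Y explained by each component.
    """
    n_features = weights.shape[0]
    # Normalise each weight vector to unit length.
    w_norm = weights / np.linalg.norm(weights, axis=0)
    explained = (w_norm ** 2) @ ssy_per_comp
    return np.sqrt(n_features * explained / ssy_per_comp.sum())

rng = np.random.default_rng(6)
weights = rng.normal(size=(8, 3))
ssy = np.array([5.0, 2.0, 0.5])
vip_values = vip(weights, ssy)
```

With this definition the mean of the squared VIP values is 1, which is why VIP > 1 is often used as a rule of thumb for "important" variables.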
hotelling_T2(comps=[0, 1], alpha=0.05)

Obtain the parameters for the Hotelling T2 ellipse at the desired significance level.

Parameters:
  • comps (list) – List of components to calculate the Hotelling T2.
  • alpha (float) – Significance level for the F statistic.
Returns:

List with the Hotelling T2 ellipse radii

Return type:

list

Raises:

ValueError – If the dimensions request

dmodx(x)

Normalised DmodX measure

Parameters:
  • x – Data matrix [n samples, m variables]
Returns:

The normalised DmodX measure for each sample.

leverages(block='X')

Calculate the leverages for each observation.

outlier(x, comps=None, measure='T2', alpha=0.05)

Use the Hotelling T2 or DmodX measure and F statistic to screen for outlier candidates.

Parameters:
  • x – Data matrix [n samples, m variables]
  • comps – Which components to use (for Hotelling T2 only)
  • measure – Hotelling T2 or DmodX
  • alpha – Significance level
Returns:

List with row indices of X matrix

cross_validation(x, y, cv_method=KFold(n_splits=7, random_state=None, shuffle=True), outputdist=False, **crossval_kwargs)

Cross-validation method for the model. Calculates Q2 and cross-validated estimates for all model parameters.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • cv_method (BaseCrossValidator or BaseShuffleSplit) – An instance of a scikit-learn CrossValidator object.
  • outputdist (bool) – Output the whole distribution for the cross validated parameters. Useful when using ShuffleSplit or CrossValidators other than KFold.
  • crossval_kwargs (kwargs) – Keyword arguments to be passed to the sklearn.Pipeline during cross-validation
Returns:

Return type:

dict

Raises:
  • TypeError – If the cv_method passed is not a scikit-learn CrossValidator object.
  • ValueError – If the x and y data matrices are invalid.
permutation_test(x, y, nperms=1000, cv_method=KFold(n_splits=7, random_state=None, shuffle=True), **permtest_kwargs)

Permutation test for the classifier. Outputs permuted null distributions for model performance metrics (Q2X/Q2Y) and most model parameters.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • nperms (int) – Number of permutations to perform.
  • cv_method (BaseCrossValidator or BaseShuffleSplit) – An instance of a scikit-learn CrossValidator object.
  • permtest_kwargs (kwargs) – Keyword arguments to be passed to the .fit() method during cross-validation and model fitting.
Returns:

Permuted null distributions for model parameters and the permutation p-value for the Q2Y value.

Return type:

dict
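A permutation p-value is typically obtained by counting how many permuted statistics are at least as large as the observed one. The sketch below uses the common "add one" estimator so that the p-value is never exactly zero; this particular estimator, and the function name, are assumptions rather than a description of the library's internals.

```python
import numpy as np

def permutation_p_value(observed, null_values):
    """Fraction of null statistics >= observed, with the +1 correction."""
    null_values = np.asarray(null_values)
    return (1 + np.sum(null_values >= observed)) / (1 + null_values.size)

# Q2Y values from four hypothetical permuted models.
null_q2y = [0.1, 0.2, -0.05, 0.3]

p_strong = permutation_p_value(0.6, null_q2y)    # no null value reaches 0.6
p_weak = permutation_p_value(0.15, null_q2y)     # two null values exceed 0.15
```

The smallest attainable p-value is 1 / (nperms + 1), so the number of permutations limits the resolution of the test.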

ChemometricsPLSDA

class pyChemometrics.ChemometricsPLSDA(ncomps=2, pls_algorithm=<class 'sklearn.cross_decomposition._pls.PLSRegression'>, xscaler=ChemometricsScaler(), **pls_type_kwargs)

Chemometrics PLS-DA object - Similar to ChemometricsPLS, but with extra functions to handle Y vectors encoding class membership and classification assessment metrics.

Parameters:
  • ncomps (int) – Number of PLS components desired.
  • pls_algorithm (sklearn._PLS) – Scikit-learn PLS algorithm to use - PLSRegression or PLSCanonical are supported.
  • xscaler (ChemometricsScaler object, scaling/preprocessing objects from scikit-learn or None.) – Scaler object for X data matrix.
  • yscaler (ChemometricsScaler object, scaling/preprocessing objects from scikit-learn or None.) – Scaler object for the Y data vector/matrix.
  • pls_type_kwargs (kwargs) – Keyword arguments to be passed during initialization of pls_algorithm.
Raises:

TypeError – If the pls_algorithm or scaler objects are not of the right class.
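In PLS-DA, class membership is usually encoded as a dummy (one-hot) Y matrix, and a continuous prediction is turned back into a class by picking the column with the largest value. The sketch below shows that general idea with NumPy; it is an assumption about the usual PLS-DA workflow, not a description of how ChemometricsPLSDA encodes or decodes Y internally.

```python
import numpy as np

labels = np.array(["control", "case", "case", "control", "treated"])

# Encode: one column per class, 1 where the sample belongs to that class.
classes, class_idx = np.unique(labels, return_inverse=True)
y_dummy = np.eye(classes.size)[class_idx]        # [n_samples, n_classes]

# Decode: continuous predictions (made-up values) are assigned to the
# class whose column has the highest predicted value.
y_pred = np.array([
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.9, -0.1],
    [0.0, 0.3, 0.7],
])
predicted = classes[y_pred.argmax(axis=1)]
```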

fit(x, y, **fit_params)

Perform model fitting on the provided x and y data and calculate basic goodness-of-fit metrics. Similar to scikit-learn’s BaseEstimator method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • fit_params (kwargs) – Keyword arguments to be passed to the .fit() method of the core sklearn model.
Raises:

ValueError – If any problem occurs during fitting.

fit_transform(x, y, **fit_params)

Fit a model to supplied data and return the scores. Equivalent to scikit-learn’s TransformerMixin method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • fit_params (kwargs) – Optional keyword arguments to be passed to the pls_algorithm .fit() method.
Returns:

Latent Variable scores (T) for the X matrix and for the Y vector/matrix (U).

Return type:

tuple of numpy.ndarray, shape [[n_tscores], [n_uscores]]

Raises:

ValueError – If any problem occurs during fitting.

transform(x=None, y=None)

Calculate the scores for a data block from the original data. Equivalent to sklearn’s TransformerMixin method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
Returns:

Latent Variable scores (T) for the X matrix and for the Y vector/matrix (U).

Return type:

tuple with 2 numpy.ndarray, shape [n_samples, n_comps]

Raises:
  • ValueError – If dimensions of input data are mismatched.
  • AttributeError – When calling the method before the model is fitted.
inverse_transform(t=None, u=None)

Transform scores to the original data space using their corresponding loadings. Same logic as in scikit-learn’s TransformerMixin method.

Parameters:
  • t (numpy.ndarray, shape [n_samples, n_comps] or None) – T scores corresponding to the X data matrix.
  • u (numpy.ndarray, shape [n_samples, n_comps] or None) – Y scores corresponding to the Y data vector/matrix.
Return x:

X Data matrix in the original data space.

Return type:

numpy.ndarray, shape [n_samples, n_features] or None

Return y:

Y Data matrix in the original data space.

Return type:

numpy.ndarray, shape [n_samples, n_features] or None

Raises:

ValueError – If dimensions of input data are mismatched.

score(x, y, sample_weight=None)

Predict and calculate the R2 for the model using one of the data blocks (X or Y) provided. Equivalent to the scikit-learn ClassifierMixin score method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
  • block_to_score (str) – Which of the data blocks (X or Y) to calculate the R2 goodness of fit.
  • sample_weight (numpy.ndarray, shape [n_samples] or None) – Optional sample weights to use in scoring.
Return R2Y:

The model’s R2Y, calculated by predicting Y from X and scoring.

Return type:

float

Return R2X:

The model’s R2X, calculated by predicting X from Y and scoring.

Return type:

float

Raises:

ValueError – If the block to score argument is not acceptable or there are dimension mismatch issues with the provided data.

predict(x)

Predict the values in one data block using the other. Same as its namesake method in scikit-learn’s RegressorMixin.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
Returns:

Predicted data block (X or Y) obtained from the other data block.

Return type:

numpy.ndarray, shape [n_samples, n_features]

Raises:
  • ValueError – If no data matrix is passed, or dimensions mismatch issues with the provided data.
  • AttributeError – Calling the method without fitting the model before.
VIP(mode='w', direction='y')

Output the Variable importance for projection metric (VIP). With the default values it is calculated using the x variable weights and the variance explained of y. Default mode is recommended (mode = ‘w’ and direction = ‘y’)

Parameters:
  • mode (str) – The type of model parameter to use in calculating the VIP. Default value is weights (w), and other acceptable arguments are p, ws, cs, c and q.
  • direction (str) – The data block used to calculate the model fit and regression sum of squares.
Returns:

The vector with the calculated VIP values.

Return type:

numpy.ndarray, shape [n_features]

Raises:
  • ValueError – If mode or direction is not a valid option.
  • AttributeError – Calling method without a fitted model.
cross_validation(x, y, cv_method=KFold(n_splits=7, random_state=None, shuffle=True), outputdist=False, **crossval_kwargs)

Cross-validation method for the model. Calculates Q2 and cross-validated estimates for all model parameters.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • cv_method (BaseCrossValidator or BaseShuffleSplit) – An instance of a scikit-learn CrossValidator object.
  • outputdist (bool) – Output the whole distribution for the cross validated parameters. Useful when using ShuffleSplit or CrossValidators other than KFold.
  • crossval_kwargs (kwargs) – Keyword arguments to be passed to the sklearn.Pipeline during cross-validation
Returns:

Return type:

dict

Raises:
  • TypeError – If the cv_method passed is not a scikit-learn CrossValidator object.
  • ValueError – If the x and y data matrices are invalid.
permutation_test(x, y, nperms=1000, cv_method=KFold(n_splits=7, random_state=None, shuffle=True), **permtest_kwargs)

Permutation test for the classifier. Outputs permuted null distributions for model performance metrics (Q2X/Q2Y) and many other model parameters.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • nperms (int) – Number of permutations to perform.
  • cv_method (BaseCrossValidator or BaseShuffleSplit) – An instance of a scikit-learn CrossValidator object.
  • permtest_kwargs (kwargs) – Keyword arguments to be passed to the .fit() method during cross-validation and model fitting.
Returns:

Permuted null distributions for model parameters and the permutation p-value for the Q2Y value.

Return type:

dict

ChemometricsPLS_Logistic

class pyChemometrics.ChemometricsPLS_Logistic(ncomps=2, pls_algorithm=<class 'sklearn.cross_decomposition._pls.PLSRegression'>, logreg_algorithm=<class 'sklearn.linear_model._logistic.LogisticRegression'>, xscaler=ChemometricsScaler(), **pls_type_kwargs)

ChemometricsPLS object - Wrapper for sklearn.cross_decomposition PLS algorithms, with tailored methods for Chemometric Data analysis.

Parameters:
  • ncomps (int) – Number of PLS components desired.
  • pls_algorithm (sklearn._PLS) – Scikit-learn PLS algorithm to use - PLSRegression or PLSCanonical are supported.
  • xscaler (ChemometricsScaler object, scaling/preprocessing objects from scikit-learn or None.) – Scaler object for X data matrix.
  • yscaler (ChemometricsScaler object, scaling/preprocessing objects from scikit-learn or None.) – Scaler object for the Y data vector/matrix.
  • pls_type_kwargs (kwargs) – Keyword arguments to be passed during initialization of pls_algorithm.
Raises:

TypeError – If the pls_algorithm or scaler objects are not of the right class.

fit(x, y, **fit_params)

Perform model fitting on the provided x and y data and calculate basic goodness-of-fit metrics. Similar to scikit-learn’s BaseEstimator method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • fit_params (kwargs) – Keyword arguments to be passed to the .fit() method of the core sklearn model.
Raises:

ValueError – If any problem occurs during fitting.

fit_transform(x, y, **fit_params)

Fit a model to supplied data and return the scores. Equivalent to scikit-learn’s TransformerMixin method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • fit_params (kwargs) – Optional keyword arguments to be passed to the pls_algorithm .fit() method.
Returns:

Latent Variable scores (T) for the X matrix and for the Y vector/matrix (U).

Return type:

tuple of numpy.ndarray, shape [[n_tscores], [n_uscores]]

Raises:

ValueError – If any problem occurs during fitting.

transform(x=None, y=None)

Calculate the scores for a data block from the original data. Equivalent to sklearn’s TransformerMixin method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
Returns:

Latent Variable scores (T) for the X matrix and for the Y vector/matrix (U).

Return type:

tuple with 2 numpy.ndarray, shape [n_samples, n_comps]

Raises:
  • ValueError – If dimensions of input data are mismatched.
  • AttributeError – When calling the method before the model is fitted.
inverse_transform(t=None, u=None)

Transform scores to the original data space using their corresponding loadings. Same logic as in scikit-learn’s TransformerMixin method.

Parameters:
  • t (numpy.ndarray, shape [n_samples, n_comps] or None) – T scores corresponding to the X data matrix.
  • u (numpy.ndarray, shape [n_samples, n_comps] or None) – Y scores corresponding to the Y data vector/matrix.
Return x:

X Data matrix in the original data space.

Return type:

numpy.ndarray, shape [n_samples, n_features] or None

Return y:

Y Data matrix in the original data space.

Return type:

numpy.ndarray, shape [n_samples, n_features] or None

Raises:

ValueError – If dimensions of input data are mismatched.

score(x, y, sample_weight=None)

Predict and calculate the R2 for the model using one of the data blocks (X or Y) provided. Equivalent to the scikit-learn ClassifierMixin score method.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
  • block_to_score (str) – Which of the data blocks (X or Y) to calculate the R2 goodness of fit.
  • sample_weight (numpy.ndarray, shape [n_samples] or None) – Optional sample weights to use in scoring.
Return R2Y:

The model’s R2Y, calculated by predicting Y from X and scoring.

Return type:

float

Return R2X:

The model’s R2X, calculated by predicting X from Y and scoring.

Return type:

float

Raises:

ValueError – If the block to score argument is not acceptable or there are dimension mismatch issues with the provided data.

predict(x)

Predict the values in one data block using the other. Same as its namesake method in scikit-learn’s RegressorMixin.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features] or None) – Data matrix to fit the PLS model.
Returns:

Predicted data block (X or Y) obtained from the other data block.

Return type:

numpy.ndarray, shape [n_samples, n_features]

Raises:
  • ValueError – If no data matrix is passed, or dimensions mismatch issues with the provided data.
  • AttributeError – Calling the method without fitting the model before.
VIP(mode='w', direction='y')

Output the Variable importance for projection metric (VIP). With the default values it is calculated using the x variable weights and the variance explained of y.

Parameters:
  • mode (str) – The type of model parameter to use in calculating the VIP. Default value is weights (w), and other acceptable arguments are p, ws, cs, c and q.
  • direction (str) – The data block used to calculate the model fit and regression sum of squares.
Returns:

The vector with the calculated VIP values.

Return type:

numpy.ndarray, shape [n_features]

Raises:
  • ValueError – If mode or direction is not a valid option.
  • AttributeError – Calling method without a fitted model.
cross_validation(x, y, cv_method=KFold(n_splits=7, random_state=None, shuffle=True), outputdist=False, **crossval_kwargs)

Cross-validation method for the model. Calculates Q2 and cross-validated estimates for all model parameters.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • cv_method (BaseCrossValidator or BaseShuffleSplit) – An instance of a scikit-learn CrossValidator object.
  • outputdist (bool) – Output the whole distribution for the cross validated parameters. Useful when using ShuffleSplit or CrossValidators other than KFold.
  • crossval_kwargs (kwargs) – Keyword arguments to be passed to the sklearn.Pipeline during cross-validation
Returns:

Return type:

dict

Raises:
  • TypeError – If the cv_method passed is not a scikit-learn CrossValidator object.
  • ValueError – If the x and y data matrices are invalid.
permutation_test(x, y, nperms=1000, cv_method=KFold(n_splits=7, random_state=None, shuffle=True), **permtest_kwargs)

Permutation test for the classifier. Outputs permuted null distributions for model performance metrics (Q2X/Q2Y) and most model parameters.

Parameters:
  • x (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • y (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to fit the PLS model.
  • nperms (int) – Number of permutations to perform.
  • cv_method (BaseCrossValidator or BaseShuffleSplit) – An instance of a scikit-learn CrossValidator object.
  • permtest_kwargs (kwargs) – Keyword arguments to be passed to the .fit() method during cross-validation and model fitting.
Returns:

Permuted null distributions for model parameters and the permutation p-value for the Q2Y value.

Return type:

dict

ChemometricsScaler

class pyChemometrics.ChemometricsScaler(scale_power=1, copy=True, with_mean=True, with_std=True)

Extension of Scikit-learn’s StandardScaler which allows scaling by different powers of the standard deviation.

Parameters:
  • scale_power (float) – Power to which the standard deviation of each variable is raised for scaling. 0: Mean centering; 0.5: Pareto; 1: Unit Variance.
  • copy (bool) – Copy the array containing the data.
  • with_mean (bool) – Perform mean centering.
  • with_std (bool) – Scale the data.
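The effect of scale_power can be shown with a few lines of NumPy: each variable is mean centred and then divided by its standard deviation raised to the chosen power. This sketch assumes the sample standard deviation (ddof=1); the library may use a different estimator, and power_scale is an illustrative name.

```python
import numpy as np

def power_scale(x, scale_power=1.0):
    """Mean-centre x and divide each column by std ** scale_power."""
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)   # assumption: sample standard deviation
    return (x - mean) / std ** scale_power

rng = np.random.default_rng(7)
x = rng.normal(loc=10.0, scale=[1.0, 5.0, 20.0], size=(50, 3))

centred = power_scale(x, scale_power=0)     # mean centering only
pareto = power_scale(x, scale_power=0.5)    # Pareto scaling
unit_var = power_scale(x, scale_power=1)    # unit variance scaling
```

Pareto scaling sits between the two extremes: it shrinks high-variance variables without forcing every variable onto the same scale.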
fit(X, y=None)

Compute the mean and standard deviation from a dataset to use in future scaling operations.

Parameters:
  • X (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to scale.
  • y (None) – Passthrough for Scikit-learn Pipeline compatibility.
Returns:

Fitted object.

Return type:

pyChemometrics.ChemometricsScaler

partial_fit(X, y=None)

Performs online computation of mean and standard deviation on X for later scaling. All of X is processed as a single batch. This is intended for cases when fit is not feasible because n_samples is very large or because X is read from a continuous stream.

The algorithm for incremental mean and std is given in Equation 1.5a,b in Chan, Tony F., Gene H. Golub, and Randall J. LeVeque. “Algorithms for computing the sample variance: Analysis and recommendations.” The American Statistician 37.3 (1983): 242-247

Parameters:
  • X (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to scale.
  • y (None) – Passthrough for Scikit-learn Pipeline compatibility.
Returns:

Fitted object.

Return type:

pyChemometrics.ChemometricsScaler
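The incremental update referenced above combines the running count, mean and sum of squared deviations with the statistics of each new batch. The sketch below follows the pairwise combination formula from Chan et al.; the function name and the state layout (n, mean, m2) are illustrative and not taken from the library.

```python
import numpy as np

def combine_stats(n_a, mean_a, m2_a, batch):
    """Merge running statistics with a new batch [n_batch, n_features].

    m2 is the running sum of squared deviations from the mean.
    """
    n_b = batch.shape[0]
    mean_b = batch.mean(axis=0)
    m2_b = ((batch - mean_b) ** 2).sum(axis=0)
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2

rng = np.random.default_rng(8)
x = rng.normal(size=(100, 3))

# Feed the data in four batches, starting from an empty state.
state = (0, np.zeros(3), np.zeros(3))
for batch in np.array_split(x, 4):
    state = combine_stats(*state, batch)
n, mean, m2 = state
variance = m2 / n   # population variance; use m2 / (n - 1) for the sample variance
```

The result matches a single pass over the full matrix, which is what makes the approach suitable for streamed data.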

transform(X, y=None, copy=None)

Perform standardization by centering and scaling using the parameters.

Parameters:
  • X (numpy.ndarray, shape [n_samples, n_features]) – Data matrix to scale.
  • y (None) – Passthrough for scikit-learn Pipeline compatibility.
  • copy (bool) – Copy the X matrix.
Returns:

Scaled version of the X data matrix.

Return type:

numpy.ndarray, shape [n_samples, n_features]

inverse_transform(X, copy=None)

Scale back the data to the original representation.

Parameters:
  • X (numpy.ndarray, shape [n_samples, n_features]) – Scaled data matrix.
  • copy (bool) – Copy the X data matrix.
Returns:

X data matrix with the scaling operation reverted.

Return type:

numpy.ndarray, shape [n_samples, n_features]