Reference

Layers

Linear Regression

class theano_wrapper.layers.LinearRegression(regularizer_fn=None, shape=None, X=None)

Simple linear regression. Linear regression is a linear predictor that models the relationship between a scalar dependent variable y and one or more explanatory variables (the set \mathcal{D}) taken from an input sample X. The target value is given by the formula:

y = \sum_{i=0}^{|\mathcal{D}|} (W_i \cdot X_i) +  b

Parameters:
  • n_in (int) – Number of input nodes
  • n_out (int) – Number of output nodes
X

theano variable – Symbolic input.

y

theano variable – Symbolic output.

W

theano variable – Weights matrix, shape=(n_in, n_out).

b

theano variable – Bias vector, shape=(n_out,).

predict

theano expression – Predict target value for input X.

cost

theano expression – Mean squared error loss function.
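As a rough illustration of the predict and cost expressions above, the following sketch evaluates the linear formula and a mean squared error in plain Python. It is not the library's Theano graph: the helper names predict and mse and the sample values are assumptions made for this example, and W is laid out as (n_in, n_out) to match the attribute description.

```python
def predict(X, W, b):
    """Return X.W + b for a batch X of shape (n_samples, n_in)."""
    n_out = len(b)
    return [
        [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(n_out)]
        for x in X
    ]


def mse(y_pred, y_true):
    """Mean squared error over every output of every sample."""
    diffs = [
        (p - t) ** 2
        for row_p, row_t in zip(y_pred, y_true)
        for p, t in zip(row_p, row_t)
    ]
    return sum(diffs) / len(diffs)


X = [[1.0, 2.0], [3.0, 4.0]]   # two samples, n_in = 2
W = [[0.5], [-1.0]]            # shape (n_in, n_out) = (2, 1)
b = [0.25]                     # shape (n_out,)
y_hat = predict(X, W, b)
loss = mse(y_hat, [[-1.0], [-2.0]])
```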

Logistic Regression

class theano_wrapper.layers.LogisticRegression(regularizer_fn=None, shape=None, X=None)

Multi-class Logistic Regression.

Logistic regression is a probabilistic, linear classifier. It is parametrized by a weight matrix W and a bias vector b. Classification is done by projecting an input vector onto a set of hyperplanes, each of which corresponds to a class. The distance from the input to a hyperplane reflects the probability that the input is a member of the corresponding class.

P(Y=i|x, W,b) &= softmax_i(W x + b) \\
              &= \frac {e^{W_i x + b_i}} {\sum_j e^{W_j x + b_j}}

The model’s prediction y_{pred} is the class whose probability is maximal, specifically:

y_{pred} = {\rm argmax}_i P(Y=i|x,W,b)
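The two formulas above can be sketched numerically as follows. This is a plain-Python illustration rather than the library's implementation: softmax and class_scores are hypothetical helper names, the weights are made-up values, and W is stored as (n_in, n_out) so that column j holds the weights of class j.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(x, W, b):
    """Scores x.W + b, with W of shape (n_in, n_out)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


x = [1.0, 2.0]
W = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]   # n_in = 2, n_out = 3 classes
b = [0.0, 0.0, 0.0]
probas = softmax(class_scores(x, W, b))
# y_pred is the index of the class with the largest probability.
y_pred = max(range(len(probas)), key=probas.__getitem__)
```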

Parameters:
  • n_in (int) – Number of input nodes
  • n_out (int) – Number of output nodes
X

theano variable – Symbolic input.

y

theano variable – Symbolic output.

W

theano variable – Weights matrix, shape=(n_in, n_out).

b

theano variable – Bias vector, shape=(n_out,).

predict

theano expression – Return the most probable class (the probability function as described above).

cost

theano expression – Negative log-likelihood, where the log-likelihood \mathcal{L} and the loss \ell are defined as:

\mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
  \sum_{i=0}^{|\mathcal{D}|} \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\
\ell (\theta=\{W,b\}, \mathcal{D}) =
   - \mathcal{L} (\theta=\{W,b\}, \mathcal{D})
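A minimal sketch of this loss, assuming the class probabilities have already been computed: it sums the negative log-probability of the true class over the data set, exactly as the formula is written. The function name and the sample probabilities are invented for the example, and whether the library sums or averages over samples is not stated here.

```python
import math


def negative_log_likelihood(probas, labels):
    """Sum of -log P(Y = y_i | x_i) over the data set."""
    return -sum(math.log(p[y]) for p, y in zip(probas, labels))


# Two samples, three classes; each row holds the class probabilities of one sample.
probas = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
labels = [0, 2]   # true class index of each sample
nll = negative_log_likelihood(probas, labels)
```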

probas

theano expression – Calculate probabilities for input X.

Multi-layer Regression

class theano_wrapper.layers.MultiLayerRegression(n_in, n_hidden, n_out, random=None)

Multilayer Regression.

A multilayer regression (MLR) model can be viewed as a linear regression predictor where the input is first transformed using a transformation \Phi. This transformation projects the input data into a sparser or denser space. This intermediate layer is referred to as a hidden layer. Formally, a one-hidden-layer MLR is a function f: R^D \rightarrow R^L, where D is the size of the input vector x and L is the size of the output vector f(x), such that, in matrix notation:

f(x) = G( b^{(2)} + W^{(2)}( s( b^{(1)} + W^{(1)} x))),

with bias vectors b^{(1)}, b^{(2)}; weight matrices W^{(1)}, W^{(2)} and activation functions G and s. The vector h(x) = \Phi(x) = s(b^{(1)} + W^{(1)} x) constitutes the hidden layer. W^{(1)} \in R^{D \times D_h} is the weight matrix connecting the input vector to the hidden layer. Each column W^{(1)}_{\cdot i} represents the weights from the input units to the i-th hidden unit. This estimator’s s is the rectified linear unit (ReLU) function.
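The forward pass in the formula above can be sketched in plain Python as follows. This is only an illustration: the helper names are hypothetical, the weights are made up, and G is assumed to be the identity because the text does not say which output activation the regression variant uses.

```python
def affine(x, W, b):
    """Return x.W + b, where column j of W feeds output unit j."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def relu(v):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, a) for a in v]


def identity(v):
    return v


def forward(x, W1, b1, W2, b2, s=relu, G=identity):
    """f(x) = G(b2 + W2 . s(b1 + W1 . x)); G defaults to the identity (assumed)."""
    h = s(affine(x, W1, b1))       # hidden layer h(x) = Phi(x)
    return G(affine(h, W2, b2))    # output layer


W1 = [[1.0, -1.0], [0.5, 2.0]]   # D = 2 inputs, D_h = 2 hidden units
b1 = [0.0, -1.0]
W2 = [[2.0], [1.0]]              # D_h = 2 hidden units, L = 1 output
b2 = [0.5]
out = forward([2.0, 1.0], W1, b1, W2, b2)
```

Swapping s for another element-wise function, such as a tanh wrapper, changes only the hidden activation; the structure of the pass stays the same.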

Parameters:
  • n_in (int) – number of input nodes
  • n_hidden (int or list(int)) – if an int, the number of hidden-layer nodes in a single-hidden-layer network; if a list of ints, the number of nodes in each of len(n_hidden) successive hidden layers
  • n_out (int) – number of output nodes
  • random (Optional(int or numpy.random.RandomState instance)) – an integer seed or random state generator. Default: None, links to np.random
layers

list – List of all the estimator layers, with layers[0] being the input layer, layers[1:-1] being the hidden layers and layers[-1] the output layer.

X

theano variable – Symbolic input of first layer.

y

theano variable – Symbolic output of last layer.

params

list – Vector of all the estimator parameters, i.e. weights and biases of all the layers

predict

theano expression – Return the most probable class (the probability function as described above).

cost

theano expression – Negative log-likelihood from LogisticRegression.

Multi-Layer Perceptron

class theano_wrapper.layers.MultiLayerPerceptron(n_in, n_hidden, n_out, random=None)

Multilayer Perceptron.

A multilayer perceptron (MLP) can be viewed as a logistic regression classifier where the input is first transformed using a learnt non-linear transformation \Phi. This transformation projects the input data into a space where it becomes linearly separable. This intermediate layer is referred to as a hidden layer.

Formally, a one-hidden-layer MLP is a function f: R^D \rightarrow R^L, where D is the size of the input vector x and L is the size of the output vector f(x), such that, in matrix notation:

f(x) = G( b^{(2)} + W^{(2)}( s( b^{(1)} + W^{(1)} x))),

with bias vectors b^{(1)}, b^{(2)}; weight matrices W^{(1)}, W^{(2)} and activation functions G and s. The vector h(x) = \Phi(x) = s(b^{(1)} + W^{(1)} x) constitutes the hidden layer. W^{(1)} \in R^{D \times D_h} is the weight matrix connecting the input vector to the hidden layer. Each column W^{(1)}_{\cdot i} represents the weights from the input units to the i-th hidden unit. This estimator’s s is the tanh function.

Parameters:
  • n_in (int) – number of input nodes
  • n_hidden (int or list(int)) – if an int, the number of hidden-layer nodes in a single-hidden-layer network; if a list of ints, the number of nodes in each of len(n_hidden) successive hidden layers
  • n_out (int) – number of output nodes
  • random (Optional(int or numpy.random.RandomState instance)) – an integer seed or random state generator. Default: None, links to np.random
layers

list – List of all the estimator layers, with layers[0] being the input layer, layers[1:-1] being the hidden layers and layers[-1] the output layer.
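One way to picture how n_in, n_hidden and n_out determine the weight matrices is the sketch below. It only derives (fan_in, fan_out) pairs from the constructor arguments; layer_shapes is a hypothetical helper, not part of theano_wrapper, and how the library actually builds its layers list may differ.

```python
def layer_shapes(n_in, n_hidden, n_out):
    """Return the (fan_in, fan_out) pair of every weight matrix."""
    # An int means a single hidden layer; a list gives one entry per hidden layer.
    hidden = [n_hidden] if isinstance(n_hidden, int) else list(n_hidden)
    sizes = [n_in] + hidden + [n_out]
    return list(zip(sizes[:-1], sizes[1:]))


single = layer_shapes(4, 8, 3)       # one hidden layer of 8 nodes
deep = layer_shapes(4, [8, 6], 3)    # two hidden layers of 8 and 6 nodes
```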

X

theano variable – Symbolic input of first layer.

y

theano variable – Symbolic output of last layer.

params

list – Vector of all the estimator parameters, i.e. weights and biases of all the layers

predict

theano expression – Return the most probable class (the probability function as described above).

cost

theano expression – Negative log-likelihood from LogisticRegression.

Trainers

Epoch-based

class theano_wrapper.trainers.EpochTrainer(clf, alpha=0.01, max_iter=10000, patience=5000, p_inc=2.0, imp_thresh=0.995, random=None, verbose=None)

Simple epoch-based trainer using gradient descent with patience. Training runs for at least n (patience) epochs; after that, as long as the score keeps improving by more than the imp_thresh margin, the training session is extended by a factor of p_inc.
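The patience mechanism can be sketched as follows. This is a guess at the general scheme, not the trainer's actual code: it replays a hypothetical pre-recorded list of validation losses instead of training a model, counts patience in epochs, and assumes lower scores are better.

```python
def epochs_run(val_losses, patience=5, p_inc=2.0, imp_thresh=0.995, max_iter=10000):
    """Return how many epochs run before patience or max_iter is exhausted.

    val_losses is a hypothetical sequence of validation losses, one per
    epoch, standing in for real training.
    """
    best = float("inf")
    epoch = 0
    for epoch, loss in enumerate(val_losses[:max_iter], start=1):
        if loss < best * imp_thresh:
            # A significant improvement extends the training budget.
            patience = max(patience, int(epoch * p_inc))
        if loss < best:
            best = loss
        if epoch >= patience:
            break   # patience used up without a recent improvement
    return epoch


stalled = epochs_run([1.0] * 20)
improved_then_stalled = epochs_run([1.0, 0.5, 0.25, 0.125] + [0.125] * 20)
```

In the second call the improvement at epoch 4 raises the budget to 8 epochs, so training stops later than in the first call, where no extension is ever earned.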

Parameters:
  • clf – the estimator to train
  • alpha (float) – learning rate
  • max_iter (int) – maximum number of iterations to go through
  • patience (int) – look at at least this many samples
  • p_inc (float) – how many more samples to fit after each improvement
  • imp_thresh (float) – the threshold for what counts as an improvement
  • random (int or random state generator) – a random state for predictable results
  • verbose (int) – verbosity factor. None = off, n = every n periods
gradients

theano symbolic function – The gradient for each parameter.

updates

theano symbolic function – Compute update values.
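For orientation, a plain gradient descent update subtracts the learning rate alpha times the gradient from each parameter. The sketch below shows that rule on scalar parameters; sgd_updates is a hypothetical name, and the library's symbolic updates may be organised differently.

```python
def sgd_updates(params, grads, alpha=0.01):
    """Return new parameter values after one gradient descent step."""
    return [p - alpha * g for p, g in zip(params, grads)]


# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = [0.0]
for _ in range(200):
    w = sgd_updates(w, [2.0 * (w[0] - 3.0)], alpha=0.1)
```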

fit(X, y)

Train the estimator using input samples. This implementation automatically splits the input into an 80% training set and a 20% validation set.
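An 80/20 split of this kind might look like the sketch below. Shuffling before splitting and the helper name train_validation_split are assumptions for illustration; the text does not say how the trainer selects its validation samples.

```python
import random


def train_validation_split(X, y, validation_fraction=0.2, seed=None):
    """Shuffle sample indices and split them into training and validation sets."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    n_val = int(len(X) * validation_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_val = [X[i] for i in val_idx]
    y_val = [y[i] for i in val_idx]
    return X_train, y_train, X_val, y_val


X = [[float(i)] for i in range(10)]
y = list(range(10))
X_train, y_train, X_val, y_val = train_validation_split(X, y, seed=0)
```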

predict(X)

Return estimator prediction for input X

Stochastic Gradient Descent

class theano_wrapper.trainers.SGDTrainer(clf, batch_size=None, alpha=0.01, max_iter=10000, patience=5000, p_inc=2.0, imp_thresh=0.995, random=None, verbose=None)

Stochastic gradient descent trainer with patience. This trainer works in a similar way to EpochTrainer, but instead of fitting all the samples at once it splits them into minibatches and goes through a subset of the samples in each fit period. This allows for speed improvements with large datasets and for online training, i.e. training without all the samples available at once.

Parameters:
  • clf – the estimator to train
  • batch_size (int or None) – how many samples to consider for each training batch. If None, it is set to int(n_samples/100)
  • alpha (float) – learning rate
  • max_iter (int) – maximum number of iterations to go through
  • patience (int) – look at at least this many samples
  • p_inc (float) – how many more samples to fit after each improvement
  • imp_thresh (float) – the threshold for what counts as an improvement
  • random (int or random state generator) – a random state for predictable results
  • verbose (int) – verbosity factor. None = off, n = every n periods
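The minibatch splitting controlled by batch_size can be sketched as a simple generator. The name iter_minibatches is hypothetical, and clamping the default to at least one sample is an assumption added here so that small inputs still yield batches.

```python
def iter_minibatches(X, y, batch_size=None):
    """Yield successive (X_batch, y_batch) slices of the training data."""
    n_samples = len(X)
    if batch_size is None:
        # Default from the parameter list; the max(1, ...) clamp is an assumption.
        batch_size = max(1, int(n_samples / 100))
    for start in range(0, n_samples, batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]


data = list(range(250))
default_batches = list(iter_minibatches(data, data))
large_batches = list(iter_minibatches(data, data, batch_size=100))
```

With 250 samples the default gives batches of two samples each, while batch_size=100 gives two full batches and a final partial batch of 50.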
gradients

theano symbolic function – The gradient for each parameter.

updates

theano symbolic function – Compute update values.

fit(X, y)

Train the estimator using input samples. This implementation automatically splits the input into an 80% training set and a 20% validation set.

predict(X)

Return estimator prediction for input X

Regularizers

L1 / L2 squared

theano_wrapper.trainers.l1_l2_reg(l1_reg=0.0, l2_reg=0.0001)

L1 and L2 squared regularization.

L1 and L2 regularization involve adding an extra term to the loss function, which penalizes certain parameter configurations. For a loss function \ell(\theta, \cal{D}) of the prediction function f parameterized by \theta on data set \cal{D}, the regularized loss will be:

E(\theta, \mathcal{D}) =  \ell(\theta, \mathcal{D}) +
                          \lambda R(\theta)\\

or, in our case:

E(\theta, \mathcal{D}) =  NLL(\theta, \mathcal{D}) +
                          \lambda||\theta||_p^p

where

||\theta||_p =
\left(\sum_{j=0}^{|\theta|}{|\theta_j|^p}\right)^{\frac{1}{p}}

Here \theta is the set of all parameters for a given model, \lambda is the hyper-parameter that controls the relative importance of the regularization term, and R is the regularization function. Commonly used values for p are 1 and 2, hence the L1/L2 nomenclature. If p=2, then the regularizer is also called “weight decay”.

In this model both L1 and L2 regularization are supported.
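As a numeric illustration of the penalty term, the sketch below computes l1_reg times the L1 norm plus l2_reg times the squared L2 norm over a flat list of parameters. It is not the library function: l1_l2_penalty is a hypothetical name, the values are made up, and whether the library penalizes biases as well as weights is not stated here.

```python
def l1_l2_penalty(params, l1_reg=0.0, l2_reg=0.0001):
    """Return l1_reg * ||theta||_1 + l2_reg * ||theta||_2^2 for flat params."""
    l1 = sum(abs(p) for p in params)
    l2_sq = sum(p * p for p in params)
    return l1_reg * l1 + l2_reg * l2_sq


theta = [1.0, -2.0, 0.5]
penalty = l1_l2_penalty(theta, l1_reg=0.5, l2_reg=0.25)
regularized = 2.0 + penalty   # 2.0 stands in for an unregularized loss value
```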

Parameters:
  • clf – an estimator
  • l1_reg (float) – The L1 regularization parameter. Defaults to 0.0
  • l2_reg (float) – The L2 regularization parameter. Defaults to 0.0001
Returns:

Symbolic expression that calculates the regularized cost.

Return type:

cost (theano expression)

Example:

clf = SomeClassifier(*args)
reg = l1_l2_reg(clf, 0.0001, 0.001)
trn = SomeTrainer(clf, reg=reg)
[...]