Canopy 1.0
The header-only random forests library
canopy::discreteDistribution Class Reference

A distribution that defines the probabilities over a number of discrete (integer-valued) class labels.

#include <discreteDistribution.hpp>


Public Member Functions

 discreteDistribution ()
 Default constructor.
 
 discreteDistribution (const int num_classes)
 Constructor.
 
void initialise (const int num_classes)
 Initialise with a certain number of classes and reset probabilities to zero.
 
void reset ()
 Reset function - return probabilities to zero.
 
float pdf (const int x) const
 Returns the probability of a particular label.
 
void normalise ()
 Normalise the distribution to ensure it is valid.
 
void printOut (std::ofstream &stream) const
 Prints the defining parameters of the distribution to an output filestream.
 
void readIn (std::ifstream &stream)
 Reads the defining parameters of the distribution from a filestream.
 
void raiseDistributionTemperature (const double T)
 Smooth the distribution using the softmax function.
 
template<class TLabelIterator , class TIdIterator >
void fit (TLabelIterator first_label, TLabelIterator last_label, TIdIterator)
 Fit the distribution to a set of labels.
 
template<class TId >
float pdf (const int x, const TId) const
 Returns the probability of a particular label.
 
template<class TId >
void combineWith (const discreteDistribution &dist, const TId)
 Combine this distribution with a second by summing the probability values, without normalisation.
 

Protected Attributes

int n_classes
 The number of discrete classes.
 
std::vector< float > prob
 Vector containing the probabilities of each class.
 

Friends

std::ofstream & operator<< (std::ofstream &stream, const discreteDistribution &dist)
 Allows the distribution to be written to a file via the streaming operator '<<'.
 
std::ifstream & operator>> (std::ifstream &stream, discreteDistribution &dist)
 Allows the distribution to be read from a file via the streaming operator '>>'.
 

Detailed Description

A distribution that defines the probabilities over a number of discrete (integer-valued) class labels.

The discreteDistribution class has the characteristics of both a node distribution and an output distribution, and serves as both the node distribution and the output distribution for the classifier.

Constructor & Destructor Documentation

canopy::discreteDistribution::discreteDistribution ( )
inline

Default constructor.

Initialises with 0 classes.

canopy::discreteDistribution::discreteDistribution ( const int  num_classes)
inline

Constructor.

Initialises with a given number of classes.

Parameters
num_classes: The number of discrete classes.

Member Function Documentation

template<class TId >
void canopy::discreteDistribution::combineWith ( const discreteDistribution & dist,
const TId   
)

Combine this distribution with a second by summing the probability values, without normalisation.

This method is used by the randomForestBase methods to aggregate the distributions in several leaf nodes into one output distribution.

Template Parameters
TId: The type of the IDs of the data points. The ID is unused but required for compatibility with randomForestBase.
Parameters
dist: The distribution that this distribution should be combined with.
(unnamed): The second parameter is unused but required for compatibility with randomForestBase.
template<class TLabelIterator , class TIdIterator >
void canopy::discreteDistribution::fit ( TLabelIterator  first_label,
TLabelIterator  last_label,
TIdIterator   
)

Fit the distribution to a set of labels.

Fits the discrete distribution to the set of labels between first_label and last_label. The labels are expected to take values between 0 and N-1 inclusive, where N is the number of classes that the distribution has been initialised with; there are no checks to ensure this.

Template Parameters
TLabelIterator: The type of the iterator used to access the labels of the training data. Must be a forward iterator that dereferences to an integral type.
TIdIterator: The type of the iterator used to access the IDs of the data points. The ID is unused but required for compatibility with randomForestBase.
Parameters
first_label: Iterator to the first label.
last_label: Iterator to the last label.
(unnamed): The third parameter is unused but required for compatibility with randomForestBase.
void canopy::discreteDistribution::initialise ( const int  num_classes)
inline

Initialise with a certain number of classes and reset probabilities to zero.

Parameters
num_classes: The number of discrete classes.
void canopy::discreteDistribution::normalise ( )
inline

Normalise the distribution to ensure it is valid.

This may be used after several combineWith() operations to ensure that the resulting distribution is a valid probability distribution (that is, its probabilities sum to one).

float canopy::discreteDistribution::pdf ( const int  x) const
inline

Returns the probability of a particular label.

This overloaded version does not require the ID and is intended for use by user code.

Parameters
x: The label for which the probability is sought.
template<class TId >
float canopy::discreteDistribution::pdf ( const int  x,
const TId   
) const

Returns the probability of a particular label.

This is the version used by the randomForestBase methods.

Template Parameters
TId: The type of the IDs of the data points. The ID is unused but required for compatibility with randomForestBase.
Parameters
x: The label for which the probability is sought.
(unnamed): The second parameter is unused but required for compatibility with randomForestBase.
void canopy::discreteDistribution::printOut ( std::ofstream &  stream) const
inline

Prints the defining parameters of the distribution to an output filestream.

Parameters
stream: The stream to which the parameters (the probability values for each class) are printed.
void canopy::discreteDistribution::raiseDistributionTemperature ( const double  T)
inline

Smooth the distribution using the softmax function.

This alters the probability distribution by replacing the probability of class \( i \) according to

\[ p_i \leftarrow \frac{e^{p_i / T}}{\sum_{j=1}^{N} e^{p_j / T}} \]

where \( N \) is the number of classes and \( T \) is a temperature parameter. This has the effect of regularising the distribution, reducing the certainty.

Parameters
T: The temperature parameter. The higher the temperature, the more the certainty is reduced. T must be strictly positive; otherwise this function has no effect.
void canopy::discreteDistribution::readIn ( std::ifstream &  stream)
inline

Reads the defining parameters of the distribution from a filestream.

Parameters
stream: The stream from which the parameters (the probability values for each class) are read.
void canopy::discreteDistribution::reset ( )
inline

Reset function - return probabilities to zero.

Use this when using the class as an output distribution, to obtain a blank distribution before combining it with new node distributions.


The documentation for this class was generated from the following file: