Headers and Namespaces

All the code in this library is inside namespace boost::math. In order to use a distribution my_distribution you will need to include either the header <boost/math/distributions/my_distribution.hpp> or the "include everything" header <boost/math/distributions.hpp>. For example, to use the Students-t distribution include either <boost/math/distributions/students_t.hpp> or <boost/math/distributions.hpp>.

Distributions are Objects

Each kind of distribution in this library is a class type. Policies provide fine-grained control of the behaviour of these classes, allowing the user to customise behaviour such as how errors are handled, or how the quantiles of discrete distributions behave.
Making distributions class types does two things:

- It encapsulates the kind of distribution in the C++ type system; so, for example, a Students-t distribution is always a different C++ type from a Chi-Squared distribution.
- The distribution parameters are stored in the class object, so they only need to be set once.
Although the distribution classes in this library are templates, there
are typedefs on type double that mostly take the usual
name of the distribution (except where there is a clash with a function
of the same name: beta and gamma, in which case the typedef keeps the
_distribution suffix and uses the default template arguments):

```cpp
using namespace boost::math;

// Construct a students_t distribution with 4 degrees of freedom:
students_t d1(4);

// Construct a double-precision beta distribution
// with parameters a = 10, b = 20
beta_distribution<> d2(10, 20); // Note: _distribution<> suffix !
```
If you need to use the distributions with a type other than double,
then you can instantiate the template directly:

```cpp
// Construct a students_t distribution, of float type,
// with 4 degrees of freedom:
students_t_distribution<float> d3(4);

// Construct a binomial distribution, of long double type,
// with probability of success 0.3
// and 20 trials in total:
binomial_distribution<long double> d4(20, 0.3);
```

The parameters passed to the distributions can be accessed via getter member functions:

```cpp
d1.degrees_of_freedom(); // returns 4.0
```

This is all well and good, but not very useful so far. What we often want is to be able to calculate the cumulative distribution functions and quantiles etc for these distributions.

Generic operations common to all distributions are non-member functions

Want to calculate the PDF (Probability Density Function) of a distribution? No problem, just use:

```cpp
pdf(my_dist, x); // Returns PDF (density) at point x of distribution my_dist.
```

Or how about the CDF (Cumulative Distribution Function):

```cpp
cdf(my_dist, x); // Returns CDF (integral from -infinity to point x)
                 // of distribution my_dist.
```

And quantiles are just the same:

```cpp
quantile(my_dist, p); // Returns the value of the random variable x
                      // such that cdf(my_dist, x) == p.
```

If you're wondering why these aren't member functions, it's to make the library more easily extensible: if you want to add additional generic operations - let's say the n'th moment - then all you have to do is add the appropriate non-member functions, overloaded for each implemented distribution type.
For example, the binomial distribution has two parameters: n (the number of trials) and p (the probability of success on one trial).
For this distribution the random variate is k: the number of successes observed. The probability density/mass function (pdf) is therefore written as f(k; n, p).
As noted above, the non-member function pdf takes the distribution object as its first argument and the random variate as its second, so for the binomial example it is written pdf(binomial_distribution<>(n, p), k).
The ranges of random variate values that are permitted and are supported
can be tested by using two functions, range and support.

The distribution (effectively the random variate) is said to be 'supported' over a range that is "the smallest closed set whose complement has probability zero". MathWorld uses the word 'defined' for this range. Non-mathematicians might say it means the 'interesting' smallest range of random variate x that has the cdf going from zero to unity. Outside are uninteresting zones where the pdf is zero, and the cdf zero or unity.
For most distributions, with probability distribution functions one might
describe as 'well-behaved', we have decided that it is most useful for
the supported range to exclude random variate values like exact zero if the end point is discontinuous. For example,
the Weibull (scale 1, shape 1) distribution smoothly heads for unity as
the random variate x declines towards zero. But at x = zero, the value
of the pdf is suddenly exactly zero, by definition. If you are plotting
the PDF, or otherwise calculating, zero is not the most useful value for
the lower limit of supported, as we discovered. So for this, and similar
distributions, we have decided it is most numerically useful to use the
closest value to zero, min_value, for the limit of the supported range.
(The value min_value is the smallest positive normalized number representable in the floating-point type, i.e. std::numeric_limits<RealType>::min().)
Mathematically, the functions may make sense with an (+ or -) infinite
value, but except for a few special cases (in the Normal and Cauchy distributions)
this implementation limits random variates to finite values, from the
negative to the positive maximum representable in the RealType.
Complements are supported too
Often you don't want the value of the CDF, but its complement, which is
to say 1 - p rather than p. In this library, whenever you want to receive a complement, just wrap all
the function arguments in a call to complement(...), for example:

```cpp
students_t dist(5);
cout << "CDF at t = 1 is " << cdf(dist, 1.0) << endl;
cout << "Complement of CDF at t = 1 is " << cdf(complement(dist, 1.0)) << endl;
```
But wait, now that we have a complement, we have to be able to use it as
well. Any function that accepts a probability as an argument can also accept
a complement by wrapping all of its arguments in a call to complement(...), for example:

```cpp
students_t dist(5);

for(double i = 10; i < 1e10; i *= 10)
{
   // Calculate the quantile for a 1 in i chance:
   double t = quantile(complement(dist, 1/i));
   // Print it out:
   cout << "Quantile of students-t with 5 degrees of freedom\n"
           "for a 1 in " << i << " chance is " << t << endl;
}
```
Parameters can be calculated

Sometimes it's the parameters that define the distribution that you need to find. Suppose, for example, you have conducted a Students-t test for equal means and the result is borderline. Maybe your two samples differ from each other, or maybe they don't; based on the result of the test you can't be sure. A legitimate question to ask then is "How many more measurements would I have to take before I would get an X% probability that the difference is real?" Parameter finders can answer questions like this, and are necessarily different for each distribution. They are implemented as static member functions of the distributions, for example:

```cpp
students_t::find_degrees_of_freedom(
   1.3,   // difference from true mean to detect
   0.05,  // maximum risk of falsely rejecting the null-hypothesis.
   0.1,   // maximum risk of falsely failing to reject the null-hypothesis.
   0.13); // sample standard deviation
```

Returns the number of degrees of freedom required to obtain a 95% probability that the observed differences in means is not down to chance alone. In the case that a borderline Students-t test result was previously obtained, this can be used to estimate how large the sample size would have to become before the observed difference was considered significant. It assumes, of course, that the sample mean and standard deviation are invariant with sample size.

Summary
Now that you have the basics, the next section looks at some worked examples.