Student's t-distribution

probability distribution

Student's t-distribution is a probability distribution which was developed by William Sealy Gosset in 1908. Student is the pseudonym he used when he published the paper describing the distribution.[1][2][3]

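Student's t
[Plots: probability density function and cumulative distribution function]
Parameters: ν > 0 degrees of freedom (real)
Support: x ∈ (−∞, +∞)
Probability density function (pdf): $\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$
Cumulative distribution function (cdf): $\frac{1}{2}+x\,\Gamma\left(\frac{\nu+1}{2}\right)\frac{{}_2F_1\left(\frac{1}{2},\frac{\nu+1}{2};\frac{3}{2};-\frac{x^2}{\nu}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)}$, where ${}_2F_1$ is the hypergeometric function
Mean: 0 for ν > 1, otherwise undefined
Median: 0
Mode: 0
Variance: $\frac{\nu}{\nu-2}$ for ν > 2, ∞ for 1 < ν ≤ 2, otherwise undefined
Skewness: 0 for ν > 3, otherwise undefined
Excess kurtosis: $\frac{6}{\nu-4}$ for ν > 4, ∞ for 2 < ν ≤ 4, otherwise undefined
Entropy: $\frac{\nu+1}{2}\left[\psi\left(\frac{\nu+1}{2}\right)-\psi\left(\frac{\nu}{2}\right)\right]+\ln\left[\sqrt{\nu}\,B\left(\frac{\nu}{2},\frac{1}{2}\right)\right]$ (in nats), where ψ is the digamma function and B is the beta function
Moment-generating function (mgf): undefined
Characteristic function: $\frac{K_{\nu/2}\left(\sqrt{\nu}\,|t|\right)\left(\sqrt{\nu}\,|t|\right)^{\nu/2}}{\Gamma\left(\frac{\nu}{2}\right)2^{\nu/2-1}}$ for ν > 0, where $K_{\nu/2}$ is the modified Bessel function of the second kind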

A normal distribution describes a full population, whereas t-distributions describe samples drawn from that population. Accordingly, there is a different t-distribution for each sample size, and the larger the sample, the more closely the distribution resembles a normal distribution.
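
As a rough illustration of how the t-distribution approaches the normal distribution, the sketch below compares the two density functions on a grid of points and reports the largest gap between them for several degrees of freedom ν. The function names (t_pdf, normal_pdf, max_gap) and the grid settings are chosen only for this example; the density formula is the standard one for Student's t.

```python
import math


def t_pdf(x, nu):
    """Density of Student's t with nu degrees of freedom, evaluated at x."""
    # Work on the log scale so large nu does not overflow the gamma function.
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def normal_pdf(x):
    """Density of the standard normal distribution, evaluated at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def max_gap(nu, step=0.01, limit=6.0):
    """Largest absolute difference between the two densities on [-limit, limit]."""
    points = int(round(2 * limit / step)) + 1
    return max(
        abs(t_pdf(-limit + i * step, nu) - normal_pdf(-limit + i * step))
        for i in range(points)
    )


if __name__ == "__main__":
    for nu in (1, 2, 5, 30, 100):
        print(f"nu = {nu:3d}: largest density gap = {max_gap(nu):.4f}")
```

The gap shrinks as ν grows, which matches the statement that larger samples give a distribution closer to the normal one.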

The t-distribution plays a role in many widely used statistical analyses, including the Student's t-test for assessing the statistical significance of the difference between two sample means, the construction of confidence intervals for the difference between two population means, and linear regression analysis. The Student's t-distribution also arises in the Bayesian analysis of data from a normal family.

History

Gosset worked at a brewery and was interested in the problems of small samples, for example the chemical properties of barley. In the problems he analyzed, the sample size might be as low as three. With so few observations, the standard deviation of the population cannot be estimated reliably from the sample. Also, in many cases Gosset encountered, the probability distribution of the samples was not known.

One version of the origin of the pseudonym is that Gosset's employer preferred staff to use pen names (instead of their real names) when publishing scientific papers, so he used the name "Student" to hide his identity. Another version is that the brewery did not want its competitors to know that it was using the t-test to test the quality of raw material.[4]

Properties

If we take a sample of n observations from a normal distribution, then the t-distribution with ν = n−1 degrees of freedom can be defined as the distribution of the location of the sample mean $\bar{x}$, relative to the true mean $\mu$, divided by the sample standard deviation $s$ over the normalizing term $\sqrt{n}$ (that is, $t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$).[5] In this way, the t-distribution can be used to estimate how likely it is that the true mean lies in any given range.
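
A minimal sketch of this definition: the code below computes the statistic (sample mean minus true mean) divided by (sample standard deviation over the square root of n), and then a range for the true mean. The function names and the sample data are made up for illustration. The critical value 2.776 is the commonly tabulated two-sided 95% value for ν = 4 and is passed in by hand, because the Python standard library does not provide t-distribution quantiles.

```python
import math
import statistics


def t_statistic(sample, mu):
    """Return (sample mean - mu) / (s / sqrt(n)) for a hypothesised true mean mu."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (xbar - mu) / (s / math.sqrt(n))


def confidence_interval(sample, t_crit):
    """Range xbar +/- t_crit * s / sqrt(n).

    t_crit must be looked up for nu = n - 1 degrees of freedom and the
    desired confidence level; it is an input here, not computed.
    """
    n = len(sample)
    xbar = statistics.mean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return xbar - half_width, xbar + half_width


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]  # hypothetical measurements, n = 5, so nu = 4
    print("t statistic for mu = 2:", t_statistic(data, 2))
    print("95% range for the true mean:", confidence_interval(data, 2.776))
```

For other sample sizes or confidence levels, the critical value has to be replaced by the matching entry from a t table.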

The t-distribution is symmetric and bell-shaped, like the normal distribution, but has heavier tails, meaning that it is more prone to producing values that fall far from its mean.[6] This makes it useful for understanding the statistical behavior of certain types of ratios of random quantities, in which variation in the denominator is amplified and may produce outlying values when the denominator of the ratio falls close to zero. The Student's t-distribution is a special case of the generalised hyperbolic distribution.
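
To make "heavier tails" concrete, the sketch below estimates the probability of a value more than 3 units from the centre, for several t-distributions and for the standard normal distribution. The tail probability of the t-distribution is obtained by integrating its density numerically with Simpson's rule, which is a choice made for this example only; the function names are also hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x, nu):
    """Density of Student's t with nu degrees of freedom, evaluated at x."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_two_sided_tail(c, nu, intervals=2000):
    """P(|T| > c), computed as 1 minus the integral of the density over [-c, c].

    Uses composite Simpson's rule; intervals must be even.
    """
    h = 2 * c / intervals
    total = t_pdf(-c, nu) + t_pdf(c, nu)
    for i in range(1, intervals):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(-c + i * h, nu)
    return 1.0 - total * h / 3


def normal_two_sided_tail(c):
    """P(|Z| > c) for a standard normal variable Z."""
    return 2 * (1 - NormalDist().cdf(c))


if __name__ == "__main__":
    c = 3.0
    for nu in (1, 3, 10, 30):
        print(f"t, nu = {nu:2d}: P(|T| > {c}) = {t_two_sided_tail(c, nu):.4f}")
    print(f"normal:     P(|Z| > {c}) = {normal_two_sided_tail(c):.4f}")
```

For every ν the t-distribution puts more probability beyond 3 than the normal distribution does, and the difference is largest when ν is small.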

References

  1. "Student" (William Sealy Gosset), original Biometrika paper as a scan
  2. "Student" [William Sealy Gosset] (March 1908). "The probable error of a mean" (PDF). Biometrika. 6 (1): 1–25. doi:10.1093/biomet/6.1.1.
  3. Weisstein, Eric W. "Student's t-Distribution". mathworld.wolfram.com. Retrieved 2020-09-14.
  4. Mortimer, Robert G. (2005). Mathematics for Physical Chemistry. Academic Press. 3rd edition. ISBN 0-12-508347-5 (page 326)
  5. "List of Probability and Statistics Symbols". Math Vault. 2020-04-26. Retrieved 2020-09-14.
  6. "1.3.6.6.4. t Distribution". www.itl.nist.gov. Retrieved 2020-09-14.