Math_Stats

Class: Math_Stats

Source Location: /Math_Stats-0.9.1/Math/Stats.php

Class Overview


A class to calculate descriptive statistics from a data set.


Version:

  • 0.9


Class Details

[line 119]
A class to calculate descriptive statistics from a data set.

Data sets can be simple arrays of data, or a cumulative hash that maps each value to its frequency. The second form is useful when passing large data sets. For example, the data set:

 $data1 = array (1,2,1,1,1,1,3,3,4.1,3,2,2,4.1,1,1,2,3,3,2,2,1,1,2,2);

can be expressed more compactly as:

 $data2 = array('1'=>9, '2'=>8, '3'=>5, '4.1'=>2);
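
To make the equivalence of the two forms concrete, the following standalone sketch expands a cumulative hash back into a simple list and compares it with the original data. The helper expandCumulative() is hypothetical and is not part of Math_Stats; it only illustrates the data layout.

```php
<?php
// Hypothetical helper (not part of Math_Stats): expand a cumulative hash
// of value => frequency pairs into a flat, sorted list of values.
function expandCumulative(array $cumulative): array
{
    $expanded = array();
    foreach ($cumulative as $value => $frequency) {
        // PHP turns integer-like string keys ('1') into ints and keeps
        // others ('4.1') as strings, so cast each key back to a float.
        for ($i = 0; $i < $frequency; $i++) {
            $expanded[] = (float) $value;
        }
    }
    sort($expanded);
    return $expanded;
}

$data1 = array(1,2,1,1,1,1,3,3,4.1,3,2,2,4.1,1,1,2,3,3,2,2,1,1,2,2);
$data2 = array('1' => 9, '2' => 8, '3' => 5, '4.1' => 2);

$expanded = expandCumulative($data2);
echo count($expanded), ' values, sum = ', array_sum($expanded), "\n";
```

Both forms describe the same 24 observations; only the storage differs.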

Example of use:

 include_once 'Math/Stats.php';
 $s = new Math_Stats();
 $s->setData($data1);
 // or
 // $s->setData($data2, STATS_DATA_CUMMULATIVE);
 $stats = $s->calcBasic();
 echo 'Mean: '.$stats['mean'].' StDev: '.$stats['stdev']."\n";

 // using data with nulls
 // first ignoring them:
 $data3 = array(1.2, 'foo', 2.4, 3.1, 4.2, 3.2, null, 5.1, 6.2);
 $s->setNullOption(STATS_IGNORE_NULL);
 $s->setData($data3);
 $stats3 = $s->calcFull();

 // and then assuming nulls == 0
 $s->setNullOption(STATS_USE_NULL_AS_ZERO);
 $s->setData($data3);
 $stats3 = $s->calcFull();

Originally this class was part of NumPHP (Numeric PHP package)



[ Top ]


Method Detail

Math_Stats (Constructor)   [line 176]

object Math_Stats Math_Stats( [optional $nullOption = STATS_REJECT_NULL])

Constructor for the class
  • Access: public

Parameters:

int   $nullOption   —  how to handle null values (optional)

[ Top ]

absDev   [line 777]

mixed absDev( )

Calculates the absolute deviation of the data points in the set. Handles cumulative data sets correctly.

[ Top ]

absDevWithMean   [line 800]

mixed absDevWithMean( numeric $mean)

Calculates the absolute deviation of the data points in the set given a fixed mean (average) value. Not used in calcBasic(), calcFull() or calc().

Handles cumulative data sets correctly.

  • Return: the absolute deviation on success, a PEAR_Error object otherwise
  • See: Math_Stats::absDev()
  • See: __sumabsdev()
  • Access: public

Parameters:

numeric   $mean   —  the fixed mean value

[ Top ]

calc   [line 326]

mixed calc( int $mode, [boolean $returnErrorObject = true])

Calculates the basic or full statistics for the data set

Parameters:

int   $mode   —  one of STATS_BASIC or STATS_FULL
boolean   $returnErrorObject   —  if an error occurs, whether to return the raw PEAR_Error object (true, the default) or only the error message (false)

[ Top ]

calcBasic   [line 349]

mixed calcBasic( [boolean $returnErrorObject = true])

Calculates a basic set of statistics

Parameters:

boolean   $returnErrorObject   —  if an error occurs, whether to return the raw PEAR_Error object (true, the default) or only the error message (false)

[ Top ]

calcFull   [line 373]

mixed calcFull( [boolean $returnErrorObject = true])

Calculates a full set of statistics

Parameters:

boolean   $returnErrorObject   —  if an error occurs, whether to return the raw PEAR_Error object (true, the default) or only the error message (false)

[ Top ]

center   [line 295]

mixed center( )

Transforms the data by subtracting the mean from each entry.

This will reset all pre-calculated values to their original (unset) defaults.


[ Top ]

coeffOfVariation   [line 1144]

mixed coeffOfVariation( )

Calculates the coefficient of variation of a data set.

The coefficient of variation measures the spread of a set of data as a proportion of its mean. It is often expressed as a percentage. Handles cumulative data sets correctly.
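
As an illustration of that definition, here is a minimal standalone sketch that divides the standard deviation by the mean. The function name coeffOfVariationSketch() is hypothetical, and the n - 1 denominator is an assumption based on stDev() being documented as unbiased; the library's own code may differ in detail.

```php
<?php
// Hypothetical sketch, not the Math_Stats implementation.
// Assumes at least two values and a non-zero mean.
function coeffOfVariationSketch(array $data): float
{
    $n = count($data);
    $mean = array_sum($data) / $n;

    $sumSquares = 0.0;
    foreach ($data as $x) {
        $sumSquares += ($x - $mean) ** 2;
    }
    // Assumption: sample standard deviation with an n - 1 denominator.
    $stDev = sqrt($sumSquares / ($n - 1));

    return $stDev / $mean;
}

$cv = coeffOfVariationSketch(array(2, 4, 4, 4, 5, 5, 7, 9));
printf("CV = %.1f%%\n", 100 * $cv);
```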


[ Top ]

count   [line 626]

mixed count( )

Calculates the number of data points in the set. Handles cumulative data sets correctly.
  • Return: the count on success, a PEAR_Error object otherwise
  • See: Math_Stats::calc()
  • Access: public

[ Top ]

frequency   [line 1206]

mixed frequency( )

Calculates the value frequency table of a data set.

Handles cumulative data sets correctly.


[ Top ]

geometricMean   [line 992]

mixed geometricMean( )

Calculates the geometric mean of the data points in the set. Handles cumulative data sets correctly.

[ Top ]

getData   [line 217]

mixed getData( [boolean $expanded = false])

Returns the data which might have been modified according to the current null handling options.
  • Return: array of data on success, a PEAR_Error object otherwise
  • See: _validate()
  • Access: public

Parameters:

boolean   $expanded   —  whether to return an expanded list, default is false

[ Top ]

harmonicMean   [line 1030]

mixed harmonicMean( )

Calculates the harmonic mean of the data points in the set. Handles cumulative data sets correctly.
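
For reference, the textbook definitions of the geometric and harmonic means can be sketched as below. The helper names are hypothetical and the code is independent of Math_Stats; it assumes every value is strictly positive and uses a sum of logarithms for the geometric mean, which is one common way to sidestep the overflow a long running product can cause.

```php
<?php
// Hypothetical standalone helpers, not the Math_Stats implementation.
// Both assume a non-empty array of strictly positive values.
function geometricMeanSketch(array $data): float
{
    // exp(mean(log x)) equals the nth root of the product of the values.
    $logSum = 0.0;
    foreach ($data as $x) {
        $logSum += log($x);
    }
    return exp($logSum / count($data));
}

function harmonicMeanSketch(array $data): float
{
    // n divided by the sum of reciprocals.
    $reciprocalSum = 0.0;
    foreach ($data as $x) {
        $reciprocalSum += 1 / $x;
    }
    return count($data) / $reciprocalSum;
}

$data = array(1, 2, 4);
$geometric = geometricMeanSketch($data);
$harmonic = harmonicMeanSketch($data);
printf("geometric: %.4f, harmonic: %.4f\n", $geometric, $harmonic);
```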

[ Top ]

interquartileMean   [line 1273]

mixed interquartileMean( )

The interquartile mean is defined as the mean of the values left after discarding the lower 25% and top 25% ranked values, i.e.:

interquart mean = mean(<P(25),P(75)>)

where: P = percentile

  • Return: a numeric value on success, a PEAR_Error otherwise
  • See: Math_Stats::quartiles()
  • Todo: need to double check the equation
  • Access: public

[ Top ]

interquartileRange   [line 1311]

mixed interquartileRange( )

The interquartile range is the distance between the 75th and 25th percentiles. It is the range of the middle 50% of the data set, and thus is not affected by outliers or extreme values.

interquart range = P(75) - P(25)

where: P = percentile


[ Top ]

kurtosis   [line 859]

mixed kurtosis( )

Calculates the kurtosis of the data distribution in the set. The kurtosis measures the degree of peakedness of a distribution.

It is also called the "excess" or "excess coefficient", and is a normalized form of the fourth central moment of a distribution.

  • A normal distribution has a kurtosis = 0
  • A narrow and peaked (leptokurtic) distribution has a kurtosis > 0
  • A flat and wide (platykurtic) distribution has a kurtosis < 0
Handles cumulative data sets correctly.


[ Top ]

max   [line 451]

mixed max( )

Calculates the maximum of a data set.

Handles cumulative data sets correctly.


[ Top ]

mean   [line 651]

mixed mean( )

Calculates the mean (average) of the data points in the set. Handles cumulative data sets correctly.

[ Top ]

median   [line 891]

mixed median( )

Calculates the median of a data set.

The median is the value such that half of the points are below it in a sorted data set. If the number of values is odd, it is the middle item. If the number of values is even, it is the average of the two middle items. Handles cumulative data sets correctly.


[ Top ]

midrange   [line 967]

mixed midrange( )

Calculates the midrange of a data set.

The midrange is the average of the minimum and maximum of the data set. Handles cumulative data sets correctly.


[ Top ]

min   [line 427]

mixed min( )

Calculates the minimum of a data set.

Handles cumulative data sets correctly.


[ Top ]

mode   [line 927]

mixed mode( )

Calculates the mode of a data set.

The mode is the value with the highest frequency in the data set. There can be more than one mode. Handles cumulative data sets correctly.
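
Because a cumulative hash already stores frequencies, finding every mode reduces to picking the keys with the highest count. The sketch below shows that idea with a hypothetical helper, modesFromCumulativeSketch(); it is not the library's code and assumes a non-empty hash.

```php
<?php
// Hypothetical helper, not part of Math_Stats: return every value whose
// frequency equals the highest frequency in a cumulative hash.
function modesFromCumulativeSketch(array $cumulative): array
{
    $highest = max($cumulative);
    // array_keys() with a search value returns all matching keys,
    // so ties produce more than one mode.
    return array_keys($cumulative, $highest);
}

$modes = modesFromCumulativeSketch(array('1' => 9, '2' => 8, '3' => 9, '4.1' => 2));
echo 'Modes: ', implode(', ', $modes), "\n";
```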


[ Top ]

percentile   [line 1427]

mixed percentile( numeric $p)

The pth percentile is the value such that p% of a sorted data set is smaller than it, and (100 - p)% of the data is larger.

A quick algorithm to pick the appropriate value from a sorted data set is as follows:

  • Count the number of values: n
  • Calculate the position of the value in the data list: i = (p / 100) * (n + 1), where p is given as a percentage
  • if i is an integer, return the data at that position
  • if i < 1, return the minimum of the data set
  • if i > n, return the maximum of the data set
  • otherwise, average the entries at adjacent positions to i
The median is the 50th percentile value.


Parameters:

numeric   $p   —  the percentile to estimate, e.g. 25 for 25th percentile
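
The selection steps listed for percentile() can be sketched as a standalone function. This is an illustration only: percentileSketch() is a hypothetical name, the code is not taken from the library, and it assumes $p is a percentage (0 to 100) that is scaled to a fraction before computing the position.

```php
<?php
// Hypothetical sketch of the selection steps; not the Math_Stats source.
// Assumes a non-empty array of numbers.
function percentileSketch(array $data, float $p): float
{
    sort($data);
    $n = count($data);

    // Assumption: $p is a percentage, so scale it to a fraction first.
    $i = ($p / 100) * ($n + 1);

    if ($i < 1) {
        return (float) $data[0];          // below the first position
    }
    if ($i > $n) {
        return (float) $data[$n - 1];     // beyond the last position
    }

    $lower = (int) floor($i);
    if ($i == $lower) {
        return (float) $data[$lower - 1]; // 1-based position, 0-based index
    }

    // Otherwise average the entries at the two adjacent positions.
    return ($data[$lower - 1] + $data[$lower]) / 2;
}

echo percentileSketch(array(10, 20, 30, 40), 50), "\n";
```

With an even number of values the 50th percentile falls between two positions, so the sketch averages the two middle entries, which agrees with the definition of the median.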

[ Top ]

product   [line 548]

numeric|array|PEAR_Error product( )

Calculates PROD { (xi) }, the product of all observations. Handles cumulative data sets correctly.
  • Return: the product as a number or an array of numbers (if there is numeric overflow) on success, a PEAR_Error object otherwise
  • See: Math_Stats::productN()
  • Access: public

[ Top ]

productN   [line 571]

numeric|array|PEAR_Error productN( numeric $n)

Calculates PROD { (xi)^n }, the product of all observations each raised to the nth power. Handles cumulative data sets correctly.
  • Return: the product as a number or an array of numbers (if there is numeric overflow) on success, a PEAR_Error object otherwise
  • See: Math_Stats::product()
  • Access: public

Parameters:

numeric   $n   —  the exponent

[ Top ]

quartileDeviation   [line 1336]

mixed quartileDeviation( )

The quartile deviation is half of the interquartile range:

quart dev = (P(75) - P(25)) / 2

where: P = percentile


[ Top ]

quartiles   [line 1237]

mixed quartiles( )

The quartiles are defined as the values that divide a sorted data set into four equal-sized subsets, and correspond to the 25th, 50th, and 75th percentiles.
  • Return: an associative array of quartiles on success, a PEAR_Error otherwise
  • See: Math_Stats::percentile()
  • Access: public

[ Top ]

quartileSkewnessCoefficient   [line 1387]

mixed quartileSkewnessCoefficient( )

The quartile skewness coefficient (also known as Bowley skewness) is defined as follows:

quart skewness coeff = (P(25) - 2*P(50) + P(75)) / (P(75) - P(25))

where: P = percentile

  • Return: a numeric value on success, a PEAR_Error otherwise
  • See: Math_Stats::quartiles()
  • Todo: need to double check the equation
  • Access: public

[ Top ]

quartileVariationCoefficient   [line 1359]

mixed quartileVariationCoefficient( )

The quartile variation coefficient is defined as follows:

quart var coeff = 100 * (P(75) - P(25)) / (P(75) + P(25))

where: P = percentile

  • Return: a numeric value on success, a PEAR_Error otherwise
  • See: Math_Stats::quartiles()
  • Todo: need to double check the equation
  • Access: public
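
The quartile-based measures documented for this class (interquartile range, quartile deviation, quartile skewness coefficient and quartile variation coefficient) are all simple functions of P(25), P(50) and P(75). The sketch below evaluates the documented formulas for given quartile values; quartileMeasuresSketch() and its array keys are hypothetical names, not the library's API.

```php
<?php
// Hypothetical helper, not part of Math_Stats: evaluate the documented
// quartile formulas from known quartile values.
// Assumes $q3 > $q1 and $q3 + $q1 != 0, otherwise a division by zero occurs.
function quartileMeasuresSketch(float $q1, float $q2, float $q3): array
{
    $iqr = $q3 - $q1;
    return array(
        'interquartile_range'      => $iqr,
        'quartile_deviation'       => $iqr / 2,
        'quartile_skewness_coeff'  => ($q1 - 2 * $q2 + $q3) / $iqr,
        'quartile_variation_coeff' => 100 * $iqr / ($q3 + $q1),
    );
}

$measures = quartileMeasuresSketch(2, 4, 10);
foreach ($measures as $name => $value) {
    printf("%s = %.4f\n", $name, $value);
}
```

A positive skewness coefficient here reflects a median that sits closer to the lower quartile than to the upper one.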

[ Top ]

range   [line 672]

mixed range( )

Calculates the range of the data set = max - min
  • Return: the value of the range on success, a PEAR_Error object otherwise.
  • Access: public

[ Top ]

sampleCentralMoment   [line 1075]

mixed sampleCentralMoment( integer $n)

Calculates the nth central moment (m{n}) of a data set.

The definition of a sample central moment is:

m{n} = 1/N * SUM { (xi - avg)^n }

where: N = sample size, avg = sample mean.

  • Return: the numeric value of the moment on success, PEAR_Error otherwise
  • Access: public

Parameters:

integer   $n   —  moment to calculate
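
The definition of m{n} given for sampleCentralMoment() translates almost directly into code. The following is a minimal sketch with a hypothetical function name, written independently of the library, and it assumes a simple (non-cumulative) data set.

```php
<?php
// Hypothetical sketch of m{n} = 1/N * SUM { (xi - avg)^n };
// not the Math_Stats implementation. Assumes a non-empty simple array.
function sampleCentralMomentSketch(array $data, int $n): float
{
    $size = count($data);             // N, the sample size
    $avg = array_sum($data) / $size;  // sample mean

    $sum = 0.0;
    foreach ($data as $x) {
        $sum += ($x - $avg) ** $n;
    }
    return $sum / $size;
}

$data = array(2, 4, 4, 4, 5, 5, 7, 9);
printf("m1 = %.2f, m2 = %.2f, m3 = %.2f\n",
    sampleCentralMomentSketch($data, 1),
    sampleCentralMomentSketch($data, 2),
    sampleCentralMomentSketch($data, 3));
```

The first central moment is always zero, and the second is the variance computed with an N denominator rather than N - 1.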

[ Top ]

sampleRawMoment   [line 1111]

mixed sampleRawMoment( integer $n)

Calculates the nth raw moment (m{n}) of a data set.

The definition of a sample raw moment is:

m{n} = 1/N * SUM { xi^n }

where: N = sample size.

  • Return: the numeric value of the moment on success, PEAR_Error otherwise
  • Access: public

Parameters:

integer   $n   —  moment to calculate

[ Top ]

setData   [line 189]

mixed setData( array $arr, [optional $opt = STATS_DATA_SIMPLE])

Sets and verifies the data, checking for nulls and using the current null handling option
  • Return: true on success, a PEAR_Error object otherwise
  • Access: public

Parameters:

array   $arr   —  the data set
int   $opt   —  data format: STATS_DATA_CUMMULATIVE or STATS_DATA_SIMPLE (default); optional

[ Top ]

setNullOption   [line 236]

mixed setNullOption( $nullOption)

Sets the null handling option.

Must be called before assigning a new data set containing null values

  • Return: true on success, a PEAR_Error object otherwise
  • See: _validate()
  • Access: public

Parameters:

int   $nullOption   —  how to handle null values

[ Top ]

skewness   [line 822]

mixed skewness( )

Calculates the skewness of the data distribution in the set. The skewness measures the degree of asymmetry of a distribution, and is related to the third central moment of a distribution.

  • A normal distribution has a skewness = 0
  • A distribution with a tail off towards the high end of the scale (positive skew) has a skewness > 0
  • A distribution with a tail off towards the low end of the scale (negative skew) has a skewness < 0
Handles cumulative data sets correctly.


[ Top ]

stdErrorOfMean   [line 1181]

mixed stdErrorOfMean( )

Calculates the standard error of the mean.

It is the standard deviation of the sampling distribution of the mean. The formula is:

S.E. Mean = SD / (N)^(1/2)

This formula does not assume a normal distribution, and shows that the size of the standard error of the mean is inversely proportional to the square root of the sample size.
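
A minimal sketch of the formula follows. The function name stdErrorOfMeanSketch() is hypothetical, and the n - 1 denominator for SD is an assumption based on stDev() being documented as unbiased; the library's own code may differ.

```php
<?php
// Hypothetical sketch of S.E. Mean = SD / sqrt(N); not the library source.
// Assumes a simple array with at least two values.
function stdErrorOfMeanSketch(array $data): float
{
    $n = count($data);
    $mean = array_sum($data) / $n;

    $sumSquares = 0.0;
    foreach ($data as $x) {
        $sumSquares += ($x - $mean) ** 2;
    }
    // Assumption: SD is the sample standard deviation (n - 1 denominator).
    $sd = sqrt($sumSquares / ($n - 1));

    return $sd / sqrt($n);
}

$se = stdErrorOfMeanSketch(array(2, 4, 4, 4, 5, 5, 7, 9));
printf("S.E. Mean = %.4f\n", $se);
```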


[ Top ]

stDev   [line 718]

mixed stDev( )

Calculates the standard deviation (unbiased) of the data points in the set. Handles cumulative data sets correctly.

[ Top ]

stDevWithMean   [line 758]

mixed stDevWithMean( numeric $mean)

Calculates the standard deviation (unbiased) of the data points in the set given a fixed mean (average) value. Not used in calcBasic(), calcFull() or calc().

Handles cumulative data sets correctly.


Parameters:

numeric   $mean   —  the fixed mean value

[ Top ]

studentize   [line 259]

mixed studentize( )

Transforms the data by subtracting the mean from each entry and dividing the result by the standard deviation. This will reset all pre-calculated values to their original (unset) defaults.
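
The transformation can be sketched outside the class as follows. studentizeSketch() is a hypothetical name, the n - 1 denominator is an assumption based on stDev() being documented as unbiased, and unlike the method the sketch returns a new array instead of modifying stored data.

```php
<?php
// Hypothetical sketch, not the Math_Stats implementation: rescale each
// value to (x - mean) / standard deviation.
// Assumes at least two values that are not all identical.
function studentizeSketch(array $data): array
{
    $n = count($data);
    $mean = array_sum($data) / $n;

    $sumSquares = 0.0;
    foreach ($data as $x) {
        $sumSquares += ($x - $mean) ** 2;
    }
    // Assumption: sample standard deviation with an n - 1 denominator.
    $stDev = sqrt($sumSquares / ($n - 1));

    $result = array();
    foreach ($data as $x) {
        $result[] = ($x - $mean) / $stDev;
    }
    return $result;
}

$z = studentizeSketch(array(2, 4, 4, 4, 5, 5, 7, 9));
printf("first = %.4f, last = %.4f\n", $z[0], $z[count($z) - 1]);
```

After the transformation the values have a mean of zero and a standard deviation of one, which is why any previously computed statistics no longer apply.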

[ Top ]

sum   [line 476]

mixed sum( )

Calculates SUM { xi }. Handles cumulative data sets correctly.

[ Top ]

sum2   [line 498]

mixed sum2( )

Calculates SUM { (xi)^2 }. Handles cumulative data sets correctly.

[ Top ]

sumN   [line 521]

mixed sumN( numeric $n)

Calculates SUM { (xi)^n }. Handles cumulative data sets correctly.

Parameters:

numeric   $n   —  the exponent

[ Top ]

variance   [line 698]

mixed variance( )

Calculates the variance (unbiased) of the data points in the set. Handles cumulative data sets correctly.

[ Top ]

varianceWithMean   [line 742]

mixed varianceWithMean( numeric $mean)

Calculates the variance (unbiased) of the data points in the set given a fixed mean (average) value. Not used in calcBasic(), calcFull() or calc().

Handles cumulative data sets correctly.


Parameters:

numeric   $mean   —  the fixed mean value

[ Top ]


Documentation generated on Mon, 11 Mar 2019 15:39:18 -0400 by phpDocumentor 1.4.4. Copyright © PHP Group 2004.