Calculating the Mean and Median of a Data Set

Question:

Data from 20 customers’ feedback on the services of three employees at a company with scores ranging from 1 to 10 are as follows:

Table 1: Scores on the performance of three employees of a service from 20 customers

Calculate the median and the mean of each data set. Provide comments on each employee’s performance.


Answer:

The median and mean are measures of central tendency. Each summarises the main feature of a data set.

Let \mathbf{x} =\{x_1, x_2, \ldots, x_N\} be the data set of N values. Its mean is its average value, \bar{x}, which is calculated as

 \textrm{(1)}\hspace{10px}\displaystyle{\bar{x}=\frac{\sum_{j=1}^{N}x_j}{N}=\frac{x_1+x_2+\cdots+x_N}{N}}

The median is the middle value of the data once they are sorted. To compute it, the set is first sorted into an ordered list; the value at the middle position of that list is the set's median. A general formula for the median of an ordered data set \mathbf{X} is

 \textrm{(2a)}\hspace{10px}\displaystyle{Med(\mathbf{X})=X_{\frac{N+1}{2}} \qquad \qquad \qquad \textrm{if } N \textrm{ is odd}}

 \textrm{(2b)}\hspace{10px}\displaystyle{Med(\mathbf{X})=\frac{X_{\frac{N}{2}}+X_{\frac{N+2}{2}}}{2} \qquad \textrm{if } N \textrm{ is even}}

where \mathbf{X} is the ordered list of data set \mathbf{x} and X_j is the value at position j in that list.
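Eq. (1), (2a), and (2b) can be sketched in a few lines of Python. This is only an illustration: the function names and the two score lists are hypothetical and are not taken from Table 1. Note that the formulas use 1-based positions, while Python lists are 0-based.

```python
def mean(values):
    """Arithmetic mean, Eq. (1): sum of the values divided by their count N."""
    return sum(values) / len(values)


def median(values):
    """Median per Eq. (2a)/(2b), applied to the sorted copy X of the data."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Eq. (2a): 1-based position (N+1)/2 is 0-based index n // 2
        return ordered[n // 2]
    # Eq. (2b): 1-based positions N/2 and (N+2)/2 are indices n//2 - 1 and n//2
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Hypothetical scores for illustration only (not the data in Table 1)
scores_odd = [7, 3, 9, 5, 6]
scores_even = [7, 3, 9, 5, 6, 8]

print(mean(scores_odd), median(scores_odd))
print(mean(scores_even), median(scores_even))
```

For the odd-length list the median is the single middle value (6); for the even-length list it is the average of the two middle values (6 and 7, giving 6.5).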

Calculations

Let \mathbf{x_1}, \mathbf{x_2}, \mathbf{x_3} be the score sets of employees #1, #2, and #3, respectively. Using Eq. (1), the respective means of the three data sets are

 \begin{alignat*}{5} &\bar{x}_1 &&=\frac{6+7+4+\cdots+4}{20} &&=5.35 \\ &\bar{x}_2 &&=\frac{7+8+5+\cdots+6}{20} &&=6.35 \\ &\bar{x}_3 &&=\frac{5+6+3+\cdots+4}{20} &&=5.35 \end{alignat*}

To calculate the median, these data sets are sorted into ordered lists \mathbf{X_1}, \mathbf{X_2}, \mathbf{X_3}, respectively, as shown in Table 2.

Table 2: Ordered lists, X1, X2 and X3, created by sorting data sets in Table 1 in ascending order; j is the position of a data value in the respective ordered lists.

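Each set has N = 20 values, which is even, so Eq. (2b) applies. The median of set #1 is

 \displaystyle{Median = \frac{X_{\frac{N}{2}} + X_{\frac{N+2}{2}}}{2} = \frac{X_{10} + X_{11}}{2} = \frac{6 + 6}{2} = 6}

Similarly, the median of set #2 is 6, and that of set #3 is 5.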

Comments

Sets #1 and #2 have the same median value (6) even though employee #2 has better scores overall, which is reflected in the higher mean score of 6.35.

Employee #3 received scores equal to or lower than those of employee #1 in 17 of the 20 entries and therefore has a lower median (5), but thanks to 3 extreme scores (9 and 10) the means of the two sets are the same (5.35). Table 1 shows that while customer #5 could be seen as generous, giving high scores to all three employees, customers #9 and #15 clearly favor employee #3. Unlike the median, the mean is greatly affected by extreme scores. On the other hand, the median alone may hide real differences between data sets, as the identical medians of sets #1 and #2 show.
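The sensitivity of the mean to extreme scores can be illustrated with a small sketch that uses Python's standard `statistics` module. The two lists below are hypothetical and are not the data in Table 1; the second differs from the first only in that two scores are replaced by extreme values.

```python
import statistics

# Hypothetical scores for illustration only (not the data in Table 1)
base = [4, 5, 5, 5, 6, 6]
with_extremes = [4, 5, 5, 5, 10, 10]  # two scores replaced by extreme values

# The two middle values of each sorted list are unchanged, so the median stays put
print(statistics.median(base), statistics.median(with_extremes))

# The mean is pulled upward by the two extreme scores
print(statistics.mean(base), statistics.mean(with_extremes))
```

Both lists have a median of 5, while the mean rises from about 5.17 to 6.5.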

Extra Notes

A more meaningful way to compute the median is to consider the area under the graph (the distribution) of the data series: the median is the point at which a vertical line divides this area exactly in half. This method is intended for continuous data series or grouped data sets but can be applied to other types of data series in general. The formula for this median is

 \textrm{(3)}\hspace{10px}\displaystyle{Median = L + I\left(\frac{\frac{N}{2} - F}{f}\right)}

where

  • L = lower limit of the interval containing the median
  • I = width of the interval containing the median
  • N = total number of data values
  • F = cumulative frequency at the lower limit of the interval containing the median. The cumulative frequency is the sum of the frequencies of all values below a reference value in a frequency distribution
  • f = the frequency of the interval containing the median.

The meaning of these symbols is illustrated in the following graph:

Figure 1: The median is the point at which a vertical line divides the area under the distribution exactly in half

Let’s apply Eq. 3 to find this new median for data set #1.

First, we calculate the frequency f_i and cumulative frequency F_i for each score i (see Table 3).
The total number of data values is 20, therefore N = 20.
The median is at the middle of the data list, that is, at position \frac{N}{2} = \frac{20}{2} = 10. The first interval whose cumulative frequency reaches 10 is the one with F_i = 15, so this interval must contain the set's median (colored in pink). The frequency f of this interval is 6.
F is the cumulative frequency just before F_i = 15, therefore F = 9.
Scores are 1, 2, 3, …, 10, therefore I = 1 (the width of the interval with f = 6).
In this exercise the interval for plotting score i is [i - 0.5, i + 0.5], with value i at the midpoint of the interval; therefore L (the lower limit of the interval for i = 6) is 5.5.
Substituting these values into Eq. 3 gives

 \displaystyle{Median = 5.5 + 1\left(\frac{\frac{20}{2} - 9}{6}\right) = 5.5 + \frac{1}{6} \approx 5.7}
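The substitution can be checked numerically with a minimal sketch of Eq. 3. The function name `grouped_median` is a hypothetical choice; the argument values are the ones derived in the text for data set #1.

```python
def grouped_median(L, I, N, F, f):
    """Eq. 3: interpolate within the interval that contains the median.

    L: lower limit of the interval, I: interval width, N: number of values,
    F: cumulative frequency below the interval, f: frequency of the interval.
    """
    return L + I * ((N / 2 - F) / f)


# Values derived in the text for data set #1
result = grouped_median(L=5.5, I=1, N=20, F=9, f=6)
print(round(result, 1))  # 5.7
```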

The median of data set #1 and the intermediate calculations are shown in Table 3; a visual presentation of these values is shown in Figure 2.

Table 3: List of frequency fi and cumulative frequency Fi calculated for each score i for data set #1. Values of N, f, F, L are shown in matching background color. The median Med(X) is calculated using Eq. 3. Value of I is explained in the text.
Figure 2: Visual presentation of f, F, L, and Med(X) as calculated in Table 3

The medians of data sets #2 and #3 can be calculated similarly; the results are shown in Table 4.

Table 4: List of frequency fi and cumulative frequency Fi calculated for each score i for data sets #2 and #3. Respective values of N, f, F, L are shown in matching background color. The median Med(X) is calculated using Eq. 3. Value of I is explained in the text.

Using this method, the medians of sets #1, #2, and #3 are 5.7, 6.3, and 5.2, respectively. These values separate the three sets more clearly than the positional medians (6, 6, and 5) and so give a more accurate picture of them.
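The whole procedure (build the frequencies, accumulate them, locate the interval containing position N/2, then apply Eq. 3) can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function name and the sample scores are hypothetical (not the data in Table 1), and each integer score i is assumed to stand for the interval [i - 0.5, i + 0.5], as in the text.

```python
from collections import Counter


def grouped_median_from_scores(scores, width=1):
    """Apply Eq. 3 to raw scores, treating score i as the interval
    [i - width/2, i + width/2]."""
    n = len(scores)
    freq = Counter(scores)  # f_i for each distinct score i
    cumulative = 0  # F: cumulative frequency below the current interval
    for score in sorted(freq):
        f = freq[score]
        if cumulative + f >= n / 2:
            # First interval whose cumulative frequency reaches N/2
            lower = score - width / 2  # L
            return lower + width * (n / 2 - cumulative) / f
        cumulative += f
    raise ValueError("scores must not be empty")


# Hypothetical scores for illustration only (not the data in Table 1)
scores = [3, 4, 4, 5, 5, 5, 6, 6, 7, 8]
print(grouped_median_from_scores(scores))
```

Here N = 10, so the median position is 5. The cumulative frequency below score 5 is 3 and the frequency of score 5 is 3, which gives 4.5 + (5 - 3)/3, roughly 5.17. Python's standard library also provides `statistics.median_grouped`, which performs a similar interpolation and may be worth comparing against.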