15 LAB V: Foundations for Statistical Inference

When we have finished this Lab, we should be able to:

Learning objectives

Understand the hypothesis testing
Know how to apply the one sample z-test

In this Lab we are going to test whether the sample mean and the mean of the population of a variable of interest differ significantly (Figure 15.1) using the theory of hypothesis testing.

Figure 15.1: Example of a simple research question

We are going to answer in the research question with the help of a Shiny application. So, clink on the following link Z-Test.

A Shiny app opens in a web window as shown below (Figure 15.2):

Figure 15.2: The Shiny application for hypothesis testing

To the left is the interactive panel with drop down menus, radio buttons and slider bars, and to the right is the place where the results appear (Figure 15.2).

15.1 The Research question

Research question: Does a new experimental treatment affect the number of days it takes for patients to recover from a disease?

As reported by hospitals across the nation, the patients in the population who received the usual care have a mean number of days to recover equal to 8 (μ= 8 days) with standard deviation 2 days (σ= 2 days). Additionally, assume that the the variable of “number of days for recovering from the disease” is normally distributed.

As we run the experiment, we track how long it takes for each patient to fully recover from the disease and we record for 10 patients the following values: 7, 6, 9, 8, 8, 7, 7, 6, 8, 6.

We summarize the previous information in Table 15.1.

Table 15.1: Organize the information of the example
Research question	Does a new experimental treatment affect the number of days it takes for patients to recover from a disease?
Variable of interest	Time to recover from the disease in days (discrete numerical variable; normally distributed).
Population group (μ=8 days, σ=2 days)	Patients in the population received the usual care as reported by hospitals across the nation.
Treatment group	Ten patients received the experimental medical treatment with the following times in days to recover from the disease: 7, 6, 9, 8, 8, 7, 7, 6, 8, 6

15.2 Applying the theory of Hypothesis Testing (One Sample Z-Test)

Step 1: Determine the appropriate null hypothesis and alternative hypothesis

The null hypothesis, \(H_{0}\), states that the treatment group and the population group will recover from the disease in about the same number of days, on average (\(μ = 8\)).

The alternative hypothesis, \(H_{1}\), states that the experimental medical treatment will affect the number of days it takes for patients to recover from the disease. The two groups will differ significantly (\(μ \neq 8\)).

First, we select from the drop down menu Inference for one mean. Then we set the Null hypothesis: \(H_{0}: μ= 8\) and the Alternative \(\neq\) (two sided test). We also check the box of variance of the population, where we type \(\sigma^2 = 4\), as shown below (Figure 15.3):

Figure 15.3: Settings for one sample Z-test

Step 2: Set the level of significance, α

We set the value α=0.05 for the level of significance (type I error).

At the slider bar we keep the default value α=0.05 for the level of significance (Figure 15.4):

Figure 15.4: We set the value α=0.05 for the level of significance

Step 3: Identify the appropriate test statistic and check the assumptions. Calculate the test statistic.

Supposing the distribution of number of days until the recover is Normal, the appropriate test statistic is the z-statistic:

\[z = \frac{\bar{x} - \mu_{o}}{SE_{z}}= \frac{\bar{x} - \mu_{o}}{\sigma/ \sqrt{n}} \] To calculate the z-statistic we also need the data from the treatment group.

In the Sample box type the values of the treatment group 7, 6, 9, 8, 8, 7, 7, 6, 8, 6 (Figure 15.5):

Figure 15.5: The sample data of the treatment group

The sample mean is equal to \(\bar{x} = 7.2\) days. Therefore:

\[z = \frac{\bar{x} - \mu_{o}}{\sigma/ \sqrt{n}}= \frac{7.2 - 8}{2/ \sqrt{10}}= \frac{- 0.8}{2/ \sqrt{10}}= \frac{- 0.8}{0.632}= -1.265\]

Step 4: Decide whether or not the result is statistically significant

When we run the analysis, we get a z-value and a p-value.

The z-value is a measure of how different the two groups are on our variable of interest.
A p-value is the probability of obtaining the observed results, or something more extreme, if the null hypothesis is true. In our example, the chance of observing our results assuming that the new treatment does not differ from the usual care in population regarding the number of recovery days.

A p-value less than 0.05 means that our result is statistically significant and there is evidence that the difference in recovery days is not due to chance alone (reject \(H_{0}\)).

A p-value greater than or equal to 0.05 means that our result is not statistically significant. There is no evidence of a significant difference between the mean recovery days in the treatment group and the population (not reject \(H_{0}\)).

In our example, we observe that the p-value for a two-sided test is equal to 2*0.103= 0.206 > 0.05 (Figure 15.6). So, the sample mean and the mean of the population does not differ significantly in days to recover from the disease, we can not reject the null hypothesis, \(H_{0}\).

Figure 15.6: The blue vertical line of Test stististic=-1.265 is outside of the rejection region. Therefore, we failed to reject Ho.

Step 5: Interpret the results

There is no evidence that the new experimental treatment affect significantly the number of days it takes for patients to recover from the disease compared to the usual care in the population.

Finally, we can calculate the 95%CI of mean as following:

\[ 95\%CI= \bar{x} \ \pm 1.96 \frac{\sigma}{\sqrt{n}}= 7.2 \ \pm 1.96 \frac{2}{\sqrt{10}}= 7.2 \ \pm \frac{3.92}{3.162} = 7.2 \ \pm 1.24 = [5.96, 8.44]\] Note that, in our example, the value of population mean (μ=8) is included in the range of values of the 95%CI.