Quantitative user researchers often need to answer a question like this: is there a significant difference between the experimental group and the control group?
Usually, this question can be answered by conducting a t-test. However, there is a prerequisite: the t-test assumes that the data is (at least approximately) normally distributed. In practice, normality can often only be safely assumed when there is a large amount of data. So what if I am not sure whether my data follows a normal distribution?
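If you want a quick sanity check first, base R ships the Shapiro–Wilk normality test. Here is a minimal sketch with made-up scores (keep in mind that with very small samples such tests have little power, so a non-significant result is only weak evidence of normality):

```r
# Made-up example ratings; shapiro.test() is part of base R's stats package
scores <- c(4.2, 3.8, 4.5, 4.0, 4.3, 3.9, 4.1)

# Null hypothesis: the data comes from a normal distribution;
# a small p-value suggests the data is NOT normal
shapiro.test(scores)
```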
Parametric and nonparametric tests
When I was working on user research for the SPEAKER project, I faced exactly this problem. I had designed a user experiment on a Voice User Interface with three different user groups: A (neutral responses), B (responses with sentiment), and C (apologetic responses). I collected data by asking users to fill in a SASSI (Subjective Assessment of Speech System Interfaces) questionnaire afterward.
I wanted to conduct a t-test to see whether the response variant significantly influenced the user experience, but my supervisor, who majored in mathematics at university, told me I could not do so because our user pool was too small: only around 5 to 8 participants per group. She therefore suggested that I conduct a nonparametric test.
So what is the difference between parametric and nonparametric tests?
Generally speaking, a parametric test assumes that the data follows a particular distribution, typically the normal distribution. In contrast, a nonparametric test makes no assumption about the underlying distribution, which makes it a safer choice when only a small amount of data has been collected.
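To make the distinction concrete, here is an illustrative sketch in R with made-up ratings: both tests answer the same kind of question, but they rest on different assumptions.

```r
# Made-up ratings for two groups
a <- c(4.2, 3.8, 4.5, 4.0, 4.3)
b <- c(3.1, 3.5, 2.9, 3.3, 3.0)

t.test(a, b)       # parametric: assumes (approximately) normal data
wilcox.test(a, b)  # nonparametric: compares ranks, no normality assumption
```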
Mann–Whitney U test
After some research, I chose to conduct the Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon test). Its benefit is that it is suitable even for a small number of participants (roughly 5 to 20 per group). The original question was whether there is a difference between two groups; rephrased for the U test, the question becomes: do the two groups come from the same distribution?
For my analysis, I used R to conduct the Mann–Whitney U test; the tutorial I taught myself from was very clear and easy to understand :)
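In case it helps, here is a minimal sketch of what such a test looks like in R, with made-up SASSI scores (roughly matching the 5 to 8 participants per group in my study):

```r
# Hypothetical SASSI scores for two of the groups (made-up numbers)
neutral   <- c(4.4, 4.1, 3.9, 4.6, 4.2, 4.0)  # group A
sentiment <- c(3.2, 3.6, 3.0, 3.4, 3.1)       # group B

# wilcox.test() is R's implementation of the Mann-Whitney U test;
# the null hypothesis is that both groups come from the same distribution
res <- wilcox.test(neutral, sentiment, alternative = "two.sided")
res
res$p.value  # reject the null hypothesis if this falls below alpha
```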
Back to my experiment: the result was actually quite interesting. Users preferred the neutral responses over the other two categories of responses, and the difference in preference was significant.
Bonferroni correction when conducting multiple comparisons
When making multiple comparisons (in my case, I had to compare the data from groups A and B, B and C, and also A and C), there is an increased risk of incorrectly rejecting a null hypothesis, which means an increased risk of wrongly concluding “there is a significant difference between user groups.” One solution is to apply a Bonferroni correction.
To explain it simply: because of the increased risk of seeing a “fake” significant difference, we have to make the criterion stricter. For example, with a desired significance level of α ≤ 0.05 and 2 comparisons per group (in my experiment, each group was compared twice), the Bonferroni correction tests each individual hypothesis at α = 0.05/2 = 0.025. Only when a comparison’s p-value is ≤ 0.025 can I reject the null hypothesis and conclude that there is a significant difference between the groups. (More generally, the correction divides α by the total number of comparisons performed; with all three pairwise comparisons, that would be α = 0.05/3 ≈ 0.017.)
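In R, applying the correction could look like the following minimal sketch (the p-values are made up; note that base R’s p.adjust() corrects by the total number of comparisons supplied, here 3, rather than by the per-group factor of 2 described above):

```r
# Hypothetical raw p-values from the three pairwise U tests (made-up numbers)
p_raw <- c(A_vs_B = 0.012, B_vs_C = 0.041, A_vs_C = 0.300)

# p.adjust() multiplies each p-value by the number of tests (here 3),
# which is equivalent to testing each raw p-value against alpha / 3
p.adjust(p_raw, method = "bonferroni")

# Or run all pairwise Mann-Whitney tests with the correction in one call,
# given a numeric vector `sassi` of scores and a factor `group` (A/B/C):
# pairwise.wilcox.test(sassi, group, p.adjust.method = "bonferroni")
```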
Wrap up
To sum up: when trying to answer the question “Is there a significant difference between group A and group B?”, conduct a parametric test if the data follows a normal distribution; when it is unclear whether the data is normally distributed, it is better to conduct a nonparametric test. And when making multiple comparisons, a Bonferroni correction can help prevent Type I errors.