Hypothesis Testing
What is Hypothesis Testing?
Hypothesis testing is a statistical procedure for deciding whether sample data provide enough evidence to reject a baseline assumption about a population. It compares that baseline assumption with the observed sample data. The outcome determines whether the assumption should be rejected in favor of an alternative, based on a probability threshold that is fixed before the data are analyzed.
What are the Null and Alternative Hypotheses?
The Null Hypothesis is the default statement asserting that there is no relationship, no difference, or no effect between groups or variables in a dataset. The Alternative Hypothesis is the opposing statement asserting that a specific relationship or difference does exist. The testing process mathematically evaluates the data to determine if there is enough evidence to reject the Null Hypothesis.
What is a p-value and how does it determine the result?
A p-value is a calculated probability. It is the probability of obtaining results at least as extreme as the observed sample data, assuming the Null Hypothesis is true. It is not the probability that the Null Hypothesis itself is true. The evaluator compares this p-value against a significance level (often written as alpha and commonly set to 0.05) that is chosen before the test is run. If the p-value is lower than this level, the result is deemed statistically significant, which leads to the rejection of the Null Hypothesis.
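The comparison between a p-value and the significance level can be sketched in a few lines of Python. This is a minimal illustration, not a complete test: it assumes the test statistic follows a standard normal distribution under the Null Hypothesis, and the observed value 2.3, the 0.05 threshold, and the helper names two_sided_p_value and decide are all hypothetical choices made for the example.

```python
from scipy.stats import norm


def two_sided_p_value(z: float) -> float:
    """Probability of a statistic at least as extreme as |z|,
    assuming a standard normal distribution under the Null Hypothesis."""
    # norm.sf is the survival function, 1 - CDF; doubling covers both tails.
    return 2.0 * norm.sf(abs(z))


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject only when p is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


z_observed = 2.3  # hypothetical test statistic, for illustration only
p = two_sided_p_value(z_observed)
print(f"p-value = {p:.4f}, decision: {decide(p)}")
```

The threshold is passed as a parameter so that it is fixed up front rather than adjusted after the p-value has been seen.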
What are the final conclusions of this testing process?
There are exactly two possible outcomes: rejecting the Null Hypothesis or failing to reject the Null Hypothesis.
Rejecting it implies that the sample data provides sufficient statistical evidence to support the Alternative Hypothesis.
Failing to reject it means the data lacks sufficient evidence to support the Alternative Hypothesis; however, it does not definitively prove that the Null Hypothesis is true.
In which programming languages is Hypothesis Testing implemented?
Hypothesis testing is typically carried out in statistical programming languages, most notably R and Python. In Python, the scipy.stats module and the statsmodels library provide built-in functions for common tests. These functions calculate the test statistic and the p-value directly from the supplied data, so no manual mathematical computation is needed.
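As a rough sketch of what such a library call looks like, the snippet below runs a two-sample t-test with scipy.stats.ttest_ind. The two groups are synthetic data generated only for this illustration (the means, spread, sample sizes, and seed are assumptions), and Welch's variant is chosen here simply because it does not require equal variances.

```python
import numpy as np
from scipy import stats

# Synthetic samples for illustration only; real data would be loaded instead.
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=50.0, scale=5.0, size=200)
group_b = rng.normal(loc=53.0, scale=5.0, size=200)

# Welch's t-test: H0 says the two population means are equal.
result = stats.ttest_ind(group_a, group_b, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

alpha = 0.05  # significance level fixed before looking at the result
print(f"t = {t_stat:.3f}, p = {p_value:.3g}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The library returns both the test statistic and the p-value, so the only step left to the analyst is the comparison against the chosen significance level.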
A practical data science example of using Hypothesis Testing
In a data science project analyzing e-commerce website traffic, a developer uses A/B testing to evaluate a new checkout page design. The Null Hypothesis states that the new design has no effect on the user conversion rate compared to the old design. The developer collects purchase data from user sessions for both page versions and uses the scipy library in Python to run a statistical test. If the resulting p-value is below the defined significance level, the developer rejects the Null Hypothesis and concludes that the conversion rates of the two designs differ by a statistically significant amount. Because a significant difference can point in either direction, the developer approves the permanent deployment of the new page only if its conversion rate is the higher one.
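One way the A/B test described above might be run is a chi-square test of independence on the conversion counts, using scipy.stats.chi2_contingency. This is a sketch under stated assumptions: the source does not say which test the developer uses, the session and purchase counts are invented for illustration, and the helper name ab_test and its decision strings are hypothetical.

```python
from scipy.stats import chi2_contingency


def ab_test(conv_old, n_old, conv_new, n_new, alpha=0.05):
    """Compare conversion counts of two page versions.

    Returns (p_value, rate_old, rate_new, decision).
    """
    # 2x2 table: rows are page versions, columns are converted / not converted.
    table = [
        [conv_old, n_old - conv_old],
        [conv_new, n_new - conv_new],
    ]
    # For 2x2 tables scipy applies Yates' continuity correction by default.
    chi2, p_value, dof, expected = chi2_contingency(table)

    rate_old = conv_old / n_old
    rate_new = conv_new / n_new

    if p_value < alpha and rate_new > rate_old:
        decision = "deploy new design"
    elif p_value < alpha:
        decision = "keep old design (new design converts worse)"
    else:
        decision = "no significant difference detected"
    return p_value, rate_old, rate_new, decision


# Hypothetical counts: 5000 sessions per version.
p_value, rate_old, rate_new, decision = ab_test(200, 5000, 260, 5000)
print(f"old = {rate_old:.3f}, new = {rate_new:.3f}, p = {p_value:.4f}")
print(decision)
```

The direction check is kept separate from the significance check because the test itself only reports that the rates differ, not which version performs better.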