**Confidence Intervals for Population Proportions: A Basic Understanding in Python**

Today I am going to explain confidence intervals for population proportions. Before going through this article, please review the theory behind confidence intervals.

**The first major topic is learning about proportions.**

In a sample of size *𝑁* there are *𝑀* “successes” (say, people who clicked on an advertisement) and *𝑁* − *𝑀* “failures” (everyone else, who did not click on the advertisement). The **sample proportion** is then:

$$\hat{p} = \frac{M}{N}$$

In fact, if your data *𝑥ᵢ* is 1 for every “success” and 0 for every “failure”, then we can write:

$$\hat{p} = \frac{1}{N}\sum_{i=1}^{N} x_i = \bar{x}$$

*That is, the sample proportion is the sample mean of the dataset.*
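As a quick illustration, here is a minimal sketch (using made-up 0/1 click data, not data from the article) showing that the two computations agree:

```python
import numpy as np

# Hypothetical data: 1 = visitor clicked the ad, 0 = visitor did not
clicks = np.array([1, 0, 0, 1, 0, 1, 0, 0])

M = clicks.sum()        # Number of "successes"
N = clicks.size         # Sample size
print(M / N)            # Sample proportion: 0.375
print(clicks.mean())    # Sample mean of the 0/1 data: also 0.375
```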

Let’s say we want to know what proportion of visitors (including future visitors, not yet seen) will click on our ad, based on previous data. How can we go from a sample proportion to a statement about the **population proportion**? This is where the confidence interval comes into play.

**Confidence Interval for Population Proportion**

We can construct a **confidence interval**, an interval we believe will contain the true population proportion of visitors who click our ad. The interval has a lower and an upper bound, and we believe that the true population proportion lies within it with some level of confidence. For a 95% confidence interval, we are “95% confident” the true proportion is in the interval (in the sense that such intervals contain the population proportion 95% of the time).

The classical way to construct this interval is:

$$\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{N}}$$

where *𝑧ₚ* is the 100 × *p*-th percentile of the standard Normal distribution, and *α* is the significance level (1 minus the confidence level).

In Python, the **statsmodels** package can be used for statistical computations such as computing a confidence interval.

Let’s suppose that on a certain website, out of 1126 visitors on a given day, 310 clicked on an ad purchased by a sponsor. Let’s construct a confidence interval for the *population* proportion of visitors who click the ad.

```python
import statsmodels.api as sm

310 / 1126  # Sample proportion, ≈ 0.2753
```

```python
# Function for computing confidence intervals
from statsmodels.stats.proportion import proportion_confint

proportion_confint(count=310,         # Number of "successes"
                   nobs=1126,         # Number of trials
                   alpha=(1 - 0.95))  # Alpha, which is 1 minus the confidence level
```

OUTPUT:

```
(0.24922129423231776, 0.30140037539468045)
```
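As a sanity check, the same interval can be computed by hand from the formula above. Here is a minimal sketch using `scipy.stats.norm` for the Normal percentile (an illustration of the formula, not the library's internals):

```python
import numpy as np
from scipy.stats import norm

p_hat = 310 / 1126          # Sample proportion
z = norm.ppf(1 - 0.05 / 2)  # 97.5th percentile of the standard Normal, ≈ 1.96

margin = z * np.sqrt(p_hat * (1 - p_hat) / 1126)
print(p_hat - margin, p_hat + margin)  # ≈ (0.2492, 0.3014), matching proportion_confint
```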

If we wanted a 99% confidence interval, we would have a wider interval, but more confidence that the true proportion lies in this interval.

```python
proportion_confint(310, 1126, alpha=(1 - 0.99))
```

OUTPUT:

```
(0.24102336643386685, 0.30959830319313136)
```

Now we will use hypothesis testing for a business use case, starting with the problem statement.

The website administrator claims that 30% of visitors to the website click the advertisement. Is this true? The sample proportion does not match the administrator’s claim, but that alone does not discredit the claim; the difference could be due to sampling variation.

We will do a **statistical test** of the administrator’s claim. We test the **null hypothesis**:

$$H_0: p = 0.3$$

(where *𝑝* denotes the true proportion of visitors who click the ad on the site) against the **alternative hypothesis**:

$$H_1: p \neq 0.3$$

How do we do this? We first compute a **test statistic**:

$$z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/N}}$$

where *p₀* = 0.3 is the hypothesized proportion. (Note: `proportions_ztest()` uses the sample proportion *𝑝̂* in the standard error by default, which is why it appears there rather than *p₀*; this matches the output below.)

We then compute a *𝑝*-value, which can be interpreted as the probability of observing a test statistic at least as “extreme” as the test statistic actually observed. If the *𝑝*-value is small, we will reject *𝐻*0 and conclude that the administrator’s claim is false; the proportion of visitors who click the ad is not 0.3. If the *𝑝*-value is not small, then we do not reject *𝐻*0; the evidence from our data does not contradict his claim.

What counts as a “small” *𝑝*-value? Here, we will decide that if a *𝑝*-value is less than 0.05, then the *𝑝*-value is “small” and we reject the null hypothesis. If we see a *𝑝*-value greater than 0.05, we will not reject the null hypothesis. (We could have chosen a number other than 0.05; maybe 0.01 if we wanted to err on the side of not contradicting the administrator.)

I now conduct the test and compute the *𝑝*-value.

```python
# Performs the test just described
from statsmodels.stats.proportion import proportions_ztest

res = proportions_ztest(count=310,
                        nobs=1126,
                        value=0.3,                # The hypothesized value of population proportion p
                        alternative='two-sided')  # Tests the "not equal to" alternative hypothesis
res  # A tuple; the first entry is the value of the test statistic,
     # and the second is the p-value
```

OUTPUT:

```
(-1.8547614674673856, 0.063630296776840831)
```

Here, we got a test statistic of *𝑧*≈−1.85 and a *𝑝*-value of ≈0.0636>0.05. We conclude there is not enough statistical evidence to disagree with the website administrator.
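For intuition, the same result can be reproduced by hand from the test-statistic formula above. Here is a minimal sketch using `scipy.stats.norm` for the Normal CDF (a mirror of the computation, not the library's internals):

```python
import numpy as np
from scipy.stats import norm

p_hat = 310 / 1126  # Sample proportion
p0 = 0.3            # Hypothesized population proportion

# Test statistic, with the sample proportion in the standard error
se = np.sqrt(p_hat * (1 - p_hat) / 1126)
z = (p_hat - p0) / se

# Two-sided p-value: probability of a statistic at least this extreme
p_value = 2 * norm.cdf(-abs(z))

print(z, p_value)  # ≈ -1.8548 and 0.0636, matching proportions_ztest
```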

**Testing for a Common Proportion.**

The website decides to conduct an experiment. One day, the website shows its visitors different versions of an advertisement created by a sponsor. Users are randomly assigned to Version A and Version B. The website tracks how often Version A was clicked and how often Version B was clicked.

On this day, 516 visitors saw Version A of the ad, and 510 saw Version B. Of those who saw Version A, 108 clicked the ad, while 144 clicked Version B when shown.

Which ad generates more clicks?

Here we test the following hypotheses:

$$H_0: p_A = p_B \qquad \text{against} \qquad H_1: p_A \neq p_B$$

where *p_A* and *p_B* are the population proportions of visitors who click Version A and Version B, respectively.

The test statistic for this test is:

$$z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{N_A} + \frac{1}{N_B}\right)}}$$

where *𝑝̂_A* and *𝑝̂_B* are the sample proportions for groups A and B, and *𝑝̂* is the proportion from the pooled sample (grouping A and B together). `proportions_ztest()` can perform this test.

```python
import numpy as np

proportions_ztest(count=np.array([108, 144]),  # Clicks on Version A and Version B
                  nobs=np.array([516, 510]),   # Visitors shown Version A and Version B
                  alternative='two-sided')
```

OUTPUT:

```
(-2.7179204953199174, 0.0065693621488401655)
```

With a p-value of about 0.0066, which is small, we reject the null hypothesis; it appears that the two ads do not have the same proportion of clicks.
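For intuition, here is a hand computation of the pooled test statistic from the formula above (a minimal sketch, not the library's internals):

```python
import numpy as np
from scipy.stats import norm

p_a = 108 / 516                     # Sample proportion for Version A
p_b = 144 / 510                     # Sample proportion for Version B
p_pool = (108 + 144) / (516 + 510)  # Pooled sample proportion

se = np.sqrt(p_pool * (1 - p_pool) * (1 / 516 + 1 / 510))
z = (p_a - p_b) / se
p_value = 2 * norm.cdf(-abs(z))

print(z, p_value)  # ≈ -2.7179 and 0.0066, matching proportions_ztest
```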

Thanks for reading. If you liked this article please follow me and share.