What would it mean for $\bar{X}$ to be an unbiased estimate of the population mean?

Terminology

In the context of estimation, a parameter is a fixed number associated with the population. That's the same as the way we have used the term before: the parameter is a constant in the distribution of each sampled element.

For example, if the population consists of all U.S. adults, the parameter could be the average annual income in the population. We will denote this parameter by $\mu$ for "mean". Data scientists commonly use $\mu$ to stand for means, in vastly different contexts. When you read the description of a model or an analysis and see the notation $\mu$, make sure you understand exactly how it is defined in that context.

Now suppose you draw a random sample from the population. A statistic is any number computed based on the data in the sample. Thus, for example, the average income of the sampled people is a statistic.

In general, if $X_i$ represents the $i$th element in the sample, then a statistic is a function $g(X_1, X_2, \ldots, X_n)$. The sample average is the statistic $\bar{X}$ defined as the function

$$ \bar{X} ~ = ~ \frac{1}{n} \sum_{i=1}^n X_i $$

One important difference between a parameter and a statistic, as they have been defined above, is that a parameter is a fixed but possibly unknown number, whereas a statistic is a random variable. The value of the statistic depends on the elements that get randomly selected to be in the sample.

In our example about incomes, the parameter $\mu$ is the average income in the whole population. Even if we don't know what it is, it's a fixed number. The statistic $\bar{X}$ is the average income in the sample. This is a random quantity since it depends on $X_1, X_2, \ldots, X_n$, which are all random variables.
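To make the distinction concrete, here is a minimal simulation sketch using NumPy, with a made-up population of hypothetical incomes (none of these numbers come from the text). The parameter $\mu$ is fixed once the population is fixed, while the statistic $\bar{X}$ comes out differently for different random samples.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A hypothetical population of annual incomes (made up purely for illustration).
population = rng.gamma(shape=2.0, scale=30_000, size=100_000)

mu = population.mean()   # the parameter: fixed once the population is fixed
n = 400                  # sample size

# Two independent random samples drawn with replacement from the same population.
sample_1 = rng.choice(population, size=n, replace=True)
sample_2 = rng.choice(population, size=n, replace=True)

print("parameter mu:          ", round(mu, 2))
print("sample mean, sample 1: ", round(sample_1.mean(), 2))  # a statistic: varies
print("sample mean, sample 2: ", round(sample_2.mean(), 2))  # different sample, different value
```

The two printed sample means differ from each other and from $\mu$, which is exactly the randomness the definition is pointing at.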

If a statistic is being used to estimate a parameter, the statistic is sometimes called an estimator of the parameter.

Thus if you use the sample mean $\bar{X}$ to estimate the population mean $\mu$, then $\bar{X}$ is an estimator of $\mu$.

This section is about a property that is often, but not always, considered desirable in an estimator.

Sample Mean

Suppose you want to estimate the mean of a population based on a sample $X_1, X_2, \ldots, X_n$ drawn at random with replacement from the population.

It is natural to want to use the sample mean $\bar{X}$ as an estimator of the population mean $\mu$. To see whether $\bar{X}$ is an unbiased estimator of $\mu$, we have to calculate its expectation. We can do this by using the linear function rule and additivity.

$$ E(\bar{X}) ~ = ~ E\big( \frac{1}{n}\sum_{i=1}^n X_i \big) ~ = ~ \frac{1}{n}\sum_{i=1}^n E(X_i) ~ = ~ \frac{1}{n} \cdot n\mu ~ = ~ \mu $$

Thus $\bar{X}$ is an unbiased estimator of $\mu$.

Notice that in the calculation above we have also discovered many other unbiased estimators of $\mu$.

For example, $X_1$ is an unbiased estimator of $\mu$ because $E(X_1) = \mu$. Indeed, if you fix any $i$, then $X_i$ is an unbiased estimator of $\mu$.

Even though both $\bar{X}$ and $X_1$ are unbiased estimators, it seems like a better idea to use $\bar{X}$ to estimate $\mu$ than to use just $X_1$. Why throw away the rest of the data?

This intuition is correct: it is indeed better to use $\bar{X}$, because it is likely to be closer to $\mu$ than $X_1$ is. We will prove this later in the course. For now, just note that the same sample can be used to construct more than one unbiased estimator for the parameter.
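Here is a quick simulation sketch of that claim, again with a made-up population and an arbitrary sample size; it is an illustration, not the proof. Over many repetitions, both estimators average out close to $\mu$, but the values of $\bar{X}$ are far less spread out than the values of $X_1$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical population (for illustration only).
population = rng.gamma(shape=2.0, scale=30_000, size=100_000)
mu = population.mean()

n = 100
reps = 10_000

x_bar = np.empty(reps)   # sample mean in each repetition
x_1 = np.empty(reps)     # first sampled element in each repetition

for r in range(reps):
    sample = rng.choice(population, size=n, replace=True)
    x_bar[r] = sample.mean()
    x_1[r] = sample[0]

print("mu:                     ", round(mu, 2))
print("average of X-bar values:", round(x_bar.mean(), 2))  # close to mu: unbiased
print("average of X_1 values:  ", round(x_1.mean(), 2))    # also close to mu: unbiased
print("SD of X-bar values:     ", round(x_bar.std(), 2))   # much smaller spread
print("SD of X_1 values:       ", round(x_1.std(), 2))
```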

Sample Proportion

An important special case of the sample mean is when the population consists of zeros and ones.

You know that the sum of a sequence of zeros and ones is equal to the number of ones in the sequence. It follows that the average of a sequence of zeros and ones is the proportion of ones in the sequence.

Suppose a population has a proportion $p$ of ones and $1-p$ of zeros. Then the mean of the population is $p$, the population proportion of ones.

Let $X_1, X_2, \ldots, X_n$ be draws at random with replacement from the population. Then $X_1, X_2, \ldots, X_n$ are independent identically distributed indicator random variables, each with chance $p$ of being 1.

The sample mean $\bar{X}$ is the sample proportion of ones, and is an unbiased estimator of the population proportion of ones.

Note that in this example the sample sum $S_n = X_1 + X_2 + \ldots + X_n$ is the number of ones in the sample and has the binomial $(n, p)$ distribution. The sample mean is $\bar{X} = S_n/n$.

The graph below shows the relation between the sample proportion $\bar{X}$ and the population proportion $p$ in an example.

Suppose you roll a die 30 times and find the sample proportion of sixes. The histogram below shows the results of 20,000 repetitions of this experiment. On average, the 20,000 sample proportions are almost indistinguishable from $p = 1/6$.
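The histogram itself is not reproduced here, but the experiment is easy to simulate. Here is a minimal sketch of the simulation described above (30 rolls per experiment, 20,000 repetitions), using NumPy:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

n = 30          # rolls per experiment
reps = 20_000   # number of repetitions

# Each row is one experiment of 30 rolls.
rolls = rng.integers(1, 7, size=(reps, n))

# Sample proportion of sixes in each experiment.
sample_proportions = (rolls == 6).mean(axis=1)

print("p = 1/6:                      ", round(1 / 6, 4))
print("average of sample proportions:", round(sample_proportions.mean(), 4))
```

The average of the simulated sample proportions comes out very close to $p = 1/6$, as the unbiasedness of $\bar{X}$ predicts.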

Estimating the Largest Possible Value

Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed (i.i.d.), each uniform on $1, 2, 3, \ldots, N$ for some fixed but unknown $N$. Let us construct an unbiased estimator of $N$.

The population mean is $(N+1)/2$. If $\bar{X}$ is the sample mean, then

$$ E(\bar{X}) ~ = ~ \frac{N+1}{2} $$

so $\bar{X}$ is not an unbiased estimator of $N$. We wouldn't expect it to be, because $N$ is the largest that any of the sampled elements could be, whereas $\bar{X}$ is likely to be somewhere in the middle of the sample.

But we can see that

$$ 2E(\bar{X}) - 1 ~ = ~ N $$

By the linear function rule,

$$ 2E(\bar{X}) - 1 ~ = ~ E(2\bar{X} - 1) $$

So the statistic $T = 2\bar{X} - 1$ is an unbiased estimator of $N$.
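A short simulation sketch can confirm this numerically. The values $N = 300$ and $n = 30$ below are chosen just for illustration (they match the example later in this section); the code uses NumPy.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

N = 300        # "pretend" value of the unknown parameter
n = 30         # sample size
reps = 10_000  # number of simulated samples

# Draws at random with replacement from 1, 2, ..., N (i.i.d. uniform).
samples = rng.integers(1, N + 1, size=(reps, n))

x_bar = samples.mean(axis=1)
T = 2 * x_bar - 1

print("N:               ", N)
print("average of X-bar:", round(x_bar.mean(), 2))  # close to (N + 1)/2 = 150.5
print("average of T:    ", round(T.mean(), 2))      # close to N: unbiased
```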

World War II Tanks

The calculation above stems from a problem the Allied forces faced in World War II. Germany had a seemingly never-ending fleet of Panzer tanks, and the Allies needed to estimate how many they had. They decided to base their estimates on the serial numbers of the tanks that they saw.

Here is a picture of one from Wikipedia.

Panzer Tank

Notice the serial number on the top left. When tanks were disabled or destroyed, it was discovered that their parts had serial numbers too. The ones from the gear boxes proved very useful.

The idea was to model the observed serial numbers as random draws from $1, 2, \ldots, N$ and then estimate $N$. This is of course a very simplified model of reality, and we will make some additional simplifications. But estimates based on even such simple probabilistic models proved to be quite a bit more accurate than those based on the intelligence gathered by the Allies. For example, in August 1942, intelligence estimates were that Germany was producing 1,550 tanks per month. The prediction based on the probability model was 327 per month. After the war, German records showed that the actual production rate was 342 per month.

The model was that the draws were made at random without replacement from the integers 1 through $N$. But for even more simplicity, let's pretend that the draws were made with replacement. That is, if we saw the same tank twice then we would record it twice.

In the example above, we constructed the random variable $T$ to be an unbiased estimator of $N$.

The Allied statisticians instead started with $M$, the sample maximum:

$$ M ~ = ~ \max\{X_1, X_2, \ldots, X_n\} $$

The sample maximum $M$ is not an unbiased estimator of $N$, because we know that its value is always less than or equal to $N$. Its average value therefore will be somewhat less than $N$.

Just how much less? The histograms below show a comparison of the two estimators in the case where $N=300$ and the sample size is $n=30$, based on 5,000 repetitions of the sampling process. Of course the Allies didn't know $N$. But simulating the sample for "pretend" values of $N$ helps us understand how the estimators behave.
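The histograms are not reproduced here, but the comparison is straightforward to simulate. Here is a minimal sketch with $N = 300$, $n = 30$, and 5,000 repetitions, using the simplified with-replacement model and NumPy:

```python
import numpy as np

rng = np.random.default_rng(seed=4)

N = 300       # "pretend" true value of the parameter
n = 30        # sample size
reps = 5_000  # repetitions of the sampling process

# Simplified model: draws at random with replacement from 1, 2, ..., N.
samples = rng.integers(1, N + 1, size=(reps, n))

T = 2 * samples.mean(axis=1) - 1   # unbiased estimator from the previous section
M = samples.max(axis=1)            # sample maximum

print("average of T:", round(T.mean(), 2))   # close to N = 300
print("average of M:", round(M.mean(), 2))   # noticeably below N: biased downward
print("SD of T:     ", round(T.std(), 2))
print("SD of M:     ", round(M.std(), 2))
```

Comparing the two averages to $N$ shows the downward bias of $M$, while comparing the spreads shows that the two estimators also differ in how variable they are.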


Source: http://stat88.org/textbook/notebooks/Chapter_05/04_Unbiased_Estimators.html
