class: center, middle, inverse, title-slide

# Point Estimation

### Sebastian Hoyos-Torres

---

# What we will talk about in the next two weeks:

- General Concepts
- Methods of point estimation

---

# The Basis of Statistical Inference

- We use statistics because we often want to learn the characteristics of a population from the characteristics of a sample.
- Often, this means taking samples (for a variety of practical reasons) in order to estimate features of the population.
- The rule we use is an **estimator**, a function defined on the sample space. The value it produces for a particular sample is called an **estimate**.
- We want our estimator to be a sensible rule that gives estimates which are close, in some sense, to the population characteristic we want to determine. But how do we know if our estimates are sensible?

---

# The "Sniff Test"

- A simple way of choosing your estimator is to pick the sample version of the parameter to be estimated:
- This means that when we want to estimate the expected value of a population, we should use the sample mean.
- Likewise, if we are interested in the population standard deviation, we can use the sample standard deviation to generate estimates.

---

# Example

- Suppose we had the following random sample of observations on coating thickness for low-viscosity paint. Assuming the distribution of coating thickness is normal, find the following:

```
## [1] 0.84 0.88 0.88 1.04 1.10 1.15 1.29 1.31 1.48 1.49 1.60 1.65 1.75 1.71
## [15] 1.86
```

- Calculate a point estimate of the mean value of coating thickness.
- Calculate a point estimate of the median.
- Calculate a point estimate of the 95th percentile of coating thickness.
- Estimate P(X < 1.5).
- **As always, we can do this in R**

---

# Doing it in R

- For the first problem, we just take the sample mean of the coating thickness values (done with `mean`):

```r
mean(vectors)
```

```
## [1] 1.335333
```

- The median is calculated as follows:

```r
median(vectors)
```

```
## [1] 1.31
```

- Of course, to speed things up, we can simply use `summary` from base R:

```r
summary(vectors)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.840   1.070   1.310   1.335   1.625   1.860
```

---

# Example Solution continued

- The point estimate of the 95th percentile of coating thickness is simply:

```r
quantile(vectors, .95)
```

```
##   95% 
## 1.783
```

- And finally, for our estimate of P(X < 1.5), we can do a little R subsetting to speed things up:

```r
length(vectors[vectors <= 1.5])/length(vectors)
```

```
## [1] 0.6666667
```

- **OR**, since we are assuming the population is normal:

```r
pnorm(1.5, mean = mean(vectors), sd = sd(vectors))
```

```
## [1] 0.6847211
```

---

# General concepts of point estimation

- A point estimate of a parameter `\(\theta\)` is a single number which can sensibly be regarded as a value of `\(\theta\)`; it is written `\(\hat{\theta}\)`.
- The corresponding random variable is called an estimator and is written `\(\hat{\Theta}\)`.
- Often, there is a simple and obvious estimator.
- For binomial data, the parameter is p, the probability of success, and the obvious estimator is `\(\hat{P} = X/n\)`, the proportion of successes.
- For a sample of continuous measurements from a distribution with mean `\(\mu\)` and variance `\(\sigma^2\)`, we use the estimator `\(\hat{\mu} = \bar{X} = \frac{1}{n}\Sigma{x_i}\)`.
- The usual estimator for `\(\sigma^2\)` is `\(\hat{\sigma}^2 = S^2\)`, with value `\(s^2\)`.
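---

# The obvious estimators in R (a sketch)

- A minimal sketch of the "obvious" estimators above, using simulated data; the sample sizes and parameter values here are made up purely for illustration:

```r
set.seed(1)
x_binom <- rbinom(1, size = 40, prob = 0.3)  # number of successes in n = 40 trials
x_cont  <- rnorm(25, mean = 10, sd = 2)      # 25 continuous measurements

x_binom / 40   # p-hat = X/n, the sample proportion, estimates p
mean(x_cont)   # x-bar, the sample mean, estimates mu
var(x_cont)    # s^2, the sample variance, estimates sigma^2
```

- Each line is simply the sample version of the population quantity it estimates.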
---

# Example

- Let's look at another example, on dielectric breakdown voltage for pieces of epoxy resin.

```
## [1] 24.46 25.65 26.25 26.50 26.66 27.20 27.30 27.54 27.76 27.94 27.90
## [12] 28.05 28.29 28.50 28.87 29.11 29.13 29.50 30.90 31.37
```

- Assuming breakdown voltage has a normal distribution with unknown mean `\(\mu\)`, we have 20 independent, identically distributed (iid) normal random variables `\(X_1,...,X_{20}\)` with mean `\(\mu\)`.

---

# Example continued:

- The sample mean:

```r
mean(dielectric)
```

```
## [1] 27.944
```

- The sample median:

```r
median(dielectric)
```

```
## [1] 27.92
```

- The average of the largest and smallest values:

```r
mean(dielectric[c(1,20)])
```

```
## [1] 27.915
```

- A trimmed mean:

```r
mean(dielectric, trim = 0.1)
```

```
## [1] 27.90625
```

---

# Things to think about:

- Which of the prior estimators is best when the data come from a normal distribution?
- Any one of these rules could produce a value close to the true value for a particular sample.
- Ideally, we want an estimator that gives the correct value on average and does not have too much variation around that value.
- Simulation time!

---

# The Simulation

- Let's run 5000 simulations, each drawing a sample of size 20 from a normal distribution with mean 20 and standard deviation 5, and compute all four estimators each time.

```r
mn1 <- mn2 <- mn3 <- mn4 <- c()
for (i in 1:5000) {
  x <- rnorm(20, 20, 5)
  mn1[i] <- mean(x)
  mn2[i] <- median(x)
  mn3[i] <- mean(x, trim = .1)
  mn4[i] <- (min(x) + max(x))/2  # average of the smallest and largest values
}
meanlist <- list(sample_mean = mn1, sample_median = mn2,
                 trimmed_mean = mn3, min_max_mean = mn4)
par(mfrow = c(2,2))
invisible(lapply(meanlist, hist))  # invisible() suppresses the printed return value
```

<img src="week7_pres_files/figure-html/unnamed-chunk-13-1.png" width="20%" style="display: block; margin: auto;" />
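---

# Comparing the simulated estimators (a sketch)

- A quick numerical summary of the simulation; this is not part of the original slides and assumes the loop above has been run so that `meanlist` exists:

```r
sapply(meanlist, mean)  # centers: roughly unbiased estimators should sit near 20
sapply(meanlist, sd)    # spreads: smaller means less sampling variation around the target
```

- The estimator whose simulated values center on 20 and vary least is the one we would prefer here.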
---

# Unbiasedness:

- An estimator `\(\hat{\Theta}\)` for the parameter `\(\theta\)` is said to be unbiased if `$$E[\hat{\Theta}] = \theta$$`
- The sample mean `\(\bar{X}\)` is an unbiased estimator for `\(\mu\)` if we can assume that `\(X_1,...,X_n\)` are a random sample (independent and identically distributed).
- The sample median is an unbiased estimator for `\(\mu\)` if we can assume that `\(X_1,...,X_n\)` are a random sample and the distribution of the `\(X_i\)` is continuous and symmetric.
- The sample variance `\(S^2 = \frac{\Sigma(X_i - \bar{X})^2}{n-1}\)` is an unbiased estimator for `\(\sigma^2\)` for a random sample from a normal population.
- **NOTE:** even when `\(S^2\)` is an unbiased estimator for `\(\sigma^2\)`, `\(\sqrt{S^2}\)` is **NOT** an unbiased estimator for `\(\sigma\)`.

---

# More Notes on Unbiasedness

- Every time we take a random sample from a population we get different values, which we use to compute an estimate `\(\hat{\Theta}_i\)` of some population parameter `\(\theta\)`. We will almost never know the true value of `\(\theta\)`.
- `\(\hat{\Theta}_i\)` is calculated from DATA, so it in turn has its own probability distribution.
- Usually, we want as unbiased an estimator as we can get.
- Formally, if `\(\hat{\Theta}_i\)` does not systematically over- or underestimate `\(\theta\)`, we refer to it as an unbiased estimator.

---

# Variance of estimators:

- Given two unbiased estimators, we generally prefer the one with the smaller variance.
- Occasionally, it is possible to prove mathematically that an estimator is a minimum variance unbiased estimator: it has the smallest variance among all unbiased estimators, so it is a good one to use.
- Generally, the desirability of an estimator depends on the form of the underlying distribution, and when working with actual data we often don't know that distribution.
- Let's try it out!

---

# Simulation example:

- Suppose we have several populations of interest following the Cauchy, normal, and uniform distributions shown below (normal in red, Cauchy in blue).

<img src="week7_pres_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />
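- The plotting code is not shown here; a minimal sketch of one way to draw comparable density curves (the parameter values are taken from the simulation on the next slide) is:

```r
curve(dnorm(x, mean = 6, sd = 1), from = 0, to = 12,
      col = "red", ylab = "density", ylim = c(0, 0.45))            # normal(6, 1)
curve(dcauchy(x, location = 6, scale = 1), add = TRUE, col = "blue")  # Cauchy(6, 1)
curve(dunif(x, min = 3, max = 9), add = TRUE)                         # uniform(3, 9)
```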
---

# The Simulation continued:

Let's continue with a for loop:

```r
mns1 <- mns2 <- mns3 <- c()
for (i in 1:5000) {
  mns1[i] <- mean(rnorm(200, 6, 1))
  mns2[i] <- mean(rcauchy(200, 6, 1))
  mns3[i] <- mean(runif(200, 3, 9))
}
meanslist <- list(normal_means = mns1, cauchy_means = mns2, unif_means = mns3)
par(mfrow = c(1,3))
invisible(lapply(meanslist, hist))  # invisible() suppresses the printed return value
```

<img src="week7_pres_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---

# Further look at the distributions and their estimates:

```r
fnctlist <- list(sample_mean = mean, sample_sd = sd)
unlist(lapply(fnctlist, function(f){ lapply(meanslist, f) }))
```

```
## sample_mean.normal_means sample_mean.cauchy_means   sample_mean.unif_means 
##                6.0005443                5.9078670                6.0023537 
##   sample_sd.normal_means   sample_sd.cauchy_means     sample_sd.unif_means 
##                0.0705207               36.0687311                0.1209933
```

From the prior plot and the standard deviations above, we should begin to notice that the sample mean is a horrendous estimator for the Cauchy distribution.

---

# Standard Error of an Estimator

- Once we compute a point estimate, we are often interested in how precise that estimate would be from sample to sample.
- The standard deviation of the estimator is a reasonable measure to use, since it measures the dispersion of the estimator's distribution. This standard deviation is called the standard error of the estimator.
- For a binomial model, the estimator of the success probability p, `\(\hat{P} = \frac{X}{n}\)`, has standard deviation `\(\sqrt{\frac{p(1-p)}{n}}\)`. This depends on p, which we often don't know and are trying to estimate.
- For a normal (or near-normal) model, to estimate `\(\mu\)` we use the estimator `\(\bar{X}\)`, whose standard deviation is `\(\frac{\sigma}{\sqrt{n}}\)`.
- To address unknown parameters in the population, we often rely on the estimated standard error of the estimator, which for binomially distributed data is `$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$` and for normally distributed data is `\(\frac{s}{\sqrt{n}}\)`.
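---

# Estimated standard errors in R (a sketch)

- A quick sketch of these formulas in R. The coating-thickness data (`vectors`) from earlier supplies the normal-model example; the binomial counts below are made-up numbers used purely for illustration:

```r
# normal-ish model: estimated standard error of the sample mean, s/sqrt(n)
sd(vectors) / sqrt(length(vectors))

# binomial model: suppose we observed x = 12 successes in n = 40 trials (hypothetical)
x <- 12; n <- 40
phat <- x / n
sqrt(phat * (1 - phat) / n)  # estimated standard error of p-hat
```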
---

# The Formulation of Estimators

- We have discussed how to compare different estimators of a parameter based on their expected values and variances.
- However, knowing how to compare estimators does not help us formulate an estimator in the first place.
- Two methods: the method of moments and maximum likelihood estimation.

---

# Method of Moments - Formal Definition

Let `\(X_1,...,X_n\)` be a random sample from a pmf or pdf. For k = 1, 2, 3,..., the kth population moment is `\(E(X^k)\)` and the kth sample moment is `\(\frac{1}{n}\Sigma{X_i^k}\)`.

Let `\(X_1,...,X_n\)` be a random sample from a pmf/pdf `\(f(x;\theta_1,...,\theta_m)\)`, where `\(\theta_1,...,\theta_m\)` are parameters whose values are unknown. The moment estimators are obtained by equating the first m sample moments to the corresponding first m population moments and solving for `\(\theta_1,...,\theta_m\)`.

---

# An example from the gamma distribution

```r
x <- c(152,115,109,94,88,137,152,77,160,165,125,40,128,123,136,101,62,153,
       83,69)
(n <- length(x))
```

```
## [1] 20
```

```r
(EX <- sum(x)/n)  # 1st sample moment
```

```
## [1] 113.45
```

```r
(EX2 <- sum(x^2)/n)  # 2nd sample moment
```

```
## [1] 14087.75
```

```r
(alphahat <- -(EX^2/(EX^2 - EX2)))  # moment estimate of the shape parameter
```

```
## [1] 10.57725
```

```r
(betahat <- -(-EX2 + EX^2)/EX)  # moment estimate of the scale parameter
```

```
## [1] 10.72585
```

---

# Problems with the Method of Moments

- The math behind method of moments estimators can get complicated quickly. For example, for the Weibull distribution we would need to solve the following system for `\(\alpha\)` and `\(\beta\)`:

`$$\mu = \beta \Gamma{(1 + \frac{1}{\alpha})}$$`

`$$\sigma^2 = \beta^2[\Gamma(1+\frac{2}{\alpha})-[\Gamma(1 + \frac{1}{\alpha})]^2]$$`

- The method of moments uses only a few moments of the distribution and the data to formulate estimators, and the resulting estimates do not have to be consistent with the rest of the distribution.

---

# Maximum Likelihood Estimation:

- The essence of maximum likelihood estimation is that we want to find the value of `\(\theta\)` which maximizes the probability/likelihood of obtaining the data we actually observed.
- To avoid some of the problems of the method of moments, R.A. Fisher suggested that we consider the joint density of the responses as a function of the parameters, with the data fixed:
`$$L(\theta|y) = f(y|\theta)$$`
where L is the likelihood, f is the probability density, y is the vector of responses, and `\(\theta\)` is the vector of parameters of the distribution.
- Fisher argued that we should choose the estimates of the parameters which give the greatest likelihood to the data we observed. In other words,
`$$\hat{\theta} = \arg\max_{\theta} L(\theta|y)$$`

---

# Maximum Likelihood - Likelihood function:

Let `\(f(x,\theta)\)` be the probability mass or density function of a random variable X. Suppose `\(x_1,...,x_n\)` are the actual observed values from a random sample of size n on X. Then the likelihood function is

`$$L(x_1,x_2,...,x_n,\theta) = L(\theta|x_1,x_2,...,x_n) = \Pi_{i = 1}^n f(x_i,\theta)$$`

considered as a function of `\(\theta\)` with the `\(x_i\)` fixed. `\(\theta\)` can be multidimensional.
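---

# Numerical maximum likelihood (a sketch)

- The likelihood can also be maximized numerically. As a sketch that is not part of the original slides, we can fit the gamma model to the data `x` from the method of moments example by minimizing the negative log-likelihood with `optim`, starting from the moment estimates:

```r
negloglik <- function(par) {
  # par[1] = shape (alpha), par[2] = scale (beta)
  -sum(dgamma(x, shape = par[1], scale = par[2], log = TRUE))
}
fit <- optim(c(10.58, 10.73), negloglik)  # start the search at the moment estimates
fit$par                                   # maximum likelihood estimates of alpha and beta
```

- The two sets of estimates will usually be similar but not identical: the MLE maximizes the likelihood of the observed data rather than matching moments.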
---

# Example from (https://onlinecourses.science.psu.edu/stat414/node/191/)

Suppose the weights of randomly selected American female college students are normally distributed with unknown mean μ and standard deviation σ. A random sample of 10 American female college students yielded the following weights (in pounds):

```r
weights <- c(115,122,130,127,149,160,152,138,149,180)
```

Based on the definitions given above, identify the likelihood function and the maximum likelihood estimator of μ, the mean weight of all American female college students. Using the given sample, find a maximum likelihood estimate of μ as well.

---

# Example worked out

- In chapter 4 we talked briefly about normal random variables and their pdfs.
- Recall that the probability density function of a normal random variable is

`$$f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$`

The likelihood function then becomes

`$$L(\theta_1,\theta_2) = \sigma^{-n}(2\pi)^{-n/2}\exp[-\frac{1}{2\sigma^2}\Sigma(x_i-\mu)^2]$$`

Maximizing this with respect to `\(\mu\)` ultimately gives

`$$\hat{\mu} = \frac{1}{n}\Sigma{x_i}$$`

or

```r
sum(weights)/length(weights)
```

```
## [1] 142.2
```

---

# Properties of the Maximum Likelihood Estimate:

- The likelihood (or log-likelihood) measures how well the parameters of the probability model suit the data.
- For distributions with one or two parameters, we can examine the log-likelihood function, or its contours, to get an idea of the precision of the estimates.
- There is typically a way to associate a confidence level with particular contours of the log-likelihood.
- The invariance principle: the MLE of a function of the parameters is that function of the MLEs (see p. 260 of the text).
- For large samples, MLEs are approximately the best estimators that can be formulated.
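---

# Looking at the log-likelihood in R (a sketch)

- As a sketch not taken from the original slides, we can examine the normal log-likelihood for `\(\mu\)` using the `weights` data, fixing `\(\sigma\)` at a plausible value (here, the sample standard deviation), and see that it peaks at the sample mean:

```r
loglik_mu <- function(mu, sigma = sd(weights)) {
  # log-likelihood of mu for the normal model, with sigma held fixed
  sum(dnorm(weights, mean = mu, sd = sigma, log = TRUE))
}
mu_grid <- seq(120, 165, by = 0.1)
plot(mu_grid, sapply(mu_grid, loglik_mu), type = "l",
     xlab = expression(mu), ylab = "log-likelihood")
abline(v = mean(weights), lty = 2)  # the MLE of mu, 142.2
```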