Tuesday, December 10, 2019

Statistical Models for Data

Question: Describe the report for Statistical Models for Data.

Answer:

1: In this problem a random variable z follows a standard normal distribution, which has mean 0 and standard deviation 1. The probability that z lies within the interval (a, b] is required to be 0.95:

P[a < z <= b] = P[z <= b] - P[z <= a] = 0.95

The value of a is given as -2, and b has to be found from this equation so that the probability is 0.95. The calculation is shown below:

P[z <= b] = 0.95 + P[z <= -2] = 0.95 + 0.02275 = 0.97275

The value of b is read from the standard normal tables as the point where the cumulative probability is 0.97275, which gives b of approximately 1.92.

In the next part one has to find the values of a for which the equation has no solution for b. The value of b is determined by solving the following equation:

P[z <= b] = 0.95 + P[z <= a]

A solution exists only if the right-hand side is a valid cumulative probability. It can never be less than zero: that would require P[z <= a] to be less than -0.95, and P[z <= a], being a probability, cannot be negative. It can, however, reach or exceed 1. This happens when P[z <= a] is at least 0.05, that is, when a is about -1.645 or larger, and for those values of a no finite b satisfies the equation.

b. In the second part, an interval (a, b] has to be found for which the interval length (b - a) is the shortest. Take a trial value of b = 3.9. Then P[z <= b] is 0.99995. To get 0.95, P[z <= a] = 0.04995 has to be subtracted from this, and the value of z with this cumulative probability is about -1.65. Therefore the length of the interval is 3.9 - (-1.65) = 5.55. At the other extreme, take a = -3.99, for which P[z <= a] is almost zero (0.00003). Then P[z <= b] needs to be 0.95, and the value of b is about 1.65. Starting from the first trial the interval gets shorter as b is made smaller, and starting from the second it gets shorter as a is made larger, so the shortest interval lies between these two extremes.
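The trial-and-error search described above can also be checked numerically. The following is a minimal sketch, assuming only Python's standard library; the helper names `upper_limit` and `shortest_interval`, the starting point of -4.0 and the grid step of 0.001 are illustrative choices and are not part of the original solution. It uses exact normal probabilities rather than table values rounded to two decimals.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1
COVERAGE = 0.95


def upper_limit(a, coverage=COVERAGE):
    """Return b with P[a < z <= b] = coverage, or None when no such b exists."""
    target = coverage + Z.cdf(a)  # required value of P[z <= b]
    if target >= 1.0:
        # A cumulative probability of 1 or more is never reached by a finite b.
        return None
    return Z.inv_cdf(target)


def shortest_interval(lowest=-4.0, step=0.001):
    """Scan a upward from `lowest` and keep the (a, b) pair with the smallest b - a."""
    best = None
    k = 0
    while True:
        a = lowest + k * step
        b = upper_limit(a)
        if b is None:  # a has passed the point where a solution exists
            break
        if best is None or (b - a) < (best[1] - best[0]):
            best = (a, b)
        k += 1
    return best


b_for_a_minus_2 = upper_limit(-2.0)
a_best, b_best = shortest_interval()

print(f"a = -2 gives b = {b_for_a_minus_2:.4f}")
print(f"shortest interval: ({a_best:.3f}, {b_best:.3f}], length {b_best - a_best:.3f}")
```

Under these assumptions the scan settles on an interval that is symmetric about zero, which is what one would expect for a symmetric, single-peaked density.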
The smallest value of b with P[z <= b] of at least 0.95 is 1.65, and the value of a then has to be chosen so that P[z <= a] is nearly equal to zero. Consider the following table, in which the probabilities are rounded to two decimals:

| a | b | P[z <= b] - P[z <= a] | Length of interval |
|---|---|---|---|
| -1.6 | 3.9 | 1.00 - 0.05 = 0.95 | 5.5 |
| -1.7 | 2.3 | 0.99 - 0.04 = 0.95 | 4.0 |
| -1.8 | 2.0 | 0.98 - 0.03 = 0.95 | 3.8 |
| -2.0 | 1.8 | 0.97 - 0.02 = 0.95 | 3.8 |
| -2.3 | 1.7 | 0.96 - 0.01 = 0.95 | 4.0 |
| -3.99 | 1.6 | 0.95 - 0.00 = 0.95 | 5.6 |

Therefore, from the table, the shortest interval (a, b] is approximately (-1.8, 2.0] (or equivalently (-2.0, 1.8]), and its length is approximately 3.8. These figures rely on probabilities rounded to two decimals; with exact probabilities the shortest interval is the symmetric one, (-1.96, 1.96], with length 3.92.

2: The scores of the students for the assignment have been given. On the basis of these scores, a 95% confidence interval has to be constructed for the average score. The confidence interval can be constructed by treating the distribution of the scores as approximately normal with mean mu and standard deviation sigma. The sample mean is denoted by x-bar and the sample standard deviation by s. The size of the sample is denoted by n, which is equal to 20. The confidence interval for the mean is then given by the following formula:

C.I = ( x-bar - 1.96 * s/sqrt(n) , x-bar + 1.96 * s/sqrt(n) )

The value of x-bar is 11.25 and the sample standard deviation is 2.14905. The confidence interval is calculated to be (10.30814, 12.19186).

The confidence interval specifies a range within which the population mean is expected to lie. The phrase "95% confidence interval for the mean" signifies that, over repeated samples, intervals constructed in this way contain the true mean with probability 0.95.

b. The tolerance interval, on the other hand, gives the interval within which a specified proportion of the population lies with a certain confidence (Liao, Lin & Iyer, 2012). The tolerance interval is given by the following formula:

Tolerance interval = x-bar ± k2 * s

where k2 is the two-sided tolerance factor. The k2 value for a tolerance limit of 95% and sample size n = 20 is 3.895.
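Both intervals in this problem depend only on the summary statistics quoted above, so they can be reproduced in a few lines. This is a minimal sketch, assuming Python's standard library; the function names `z_interval` and `tolerance_interval` are hypothetical, and the factor k2 = 3.895 is simply taken from the text rather than derived.

```python
import math
from statistics import NormalDist

# Summary statistics and tolerance factor as quoted in the text.
n = 20
x_bar = 11.25
s = 2.14905
k2 = 3.895  # two-sided tolerance factor, taken from the text as given


def z_interval(mean, sd, size, level=0.95):
    """Normal-theory confidence interval for the mean: mean -/+ z * sd / sqrt(size)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level = 0.95
    half_width = z * sd / math.sqrt(size)
    return mean - half_width, mean + half_width


def tolerance_interval(mean, sd, factor):
    """Two-sided tolerance limits: mean -/+ factor * sd."""
    return mean - factor * sd, mean + factor * sd


ci_low, ci_high = z_interval(x_bar, s, n)
tol_low, tol_high = tolerance_interval(x_bar, s, k2)

print(f"95% confidence interval for the mean: ({ci_low:.3f}, {ci_high:.3f})")
print(f"tolerance interval: ({tol_low:.3f}, {tol_high:.3f})")
```

The sketch uses the unrounded normal quantile, so its limits can differ from a hand calculation with 1.96 in the fourth or fifth decimal place.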
Therefore, the tolerance interval is (2.87944558, 19.62055). The interpretation of the tolerance interval is simple: with 95% confidence, at least 95% of the future values of the population will lie inside this interval.

3. The assignment scores of 20 students are given. Out of them, 11 students have a score of more than 10, so the proportion of students with a score of more than 10 is 11/20 = 0.55. On the basis of these data, a confidence interval for the population proportion has to be calculated. The variance of the sample proportion is given by the following formula:

Var(p) = p (1 - p)/n

The confidence interval is given by the following formula:

C.I = ( p - z * sqrt(p (1 - p)/n) , p + z * sqrt(p (1 - p)/n) )

Here p is the estimated sample proportion, which is equal to 0.55 for the given dataset, and z is the tabulated value from the standard normal distribution; at 95% the value is 1.96 for a two-sided confidence interval. The confidence interval is calculated to be (0.33196376, 0.76803624).

In the second part, the size of the sample needs to be determined so that the confidence interval is doubled, that is, so that the width of the interval is multiplied by 2. The value of n calculated from this condition is 6.512621.

4: There are 3 random variables: B, P and N. The random variable B follows a binomial distribution with parameters n = 100 and p = 0.001m. The variable P follows a Poisson distribution with parameter lambda = m, and N follows a normal distribution with mean m and variance equal to m * (100 - 0.001m).
The values that were missing in the table are given below:

| Distribution | Variable (x values) | Parameters | Probability density function / probability mass function |
|---|---|---|---|
| Binomial | m=0 | 100, 0.001 | 0.366 |
| Poisson | m=1 | 1 | 0.367879 |
| Normal | m=1 (1.5, 2.5) | 1, 1 * (100 - 0.001) | 0.003989 |
| Poisson | m=50 | 49 | 0.0557 |
| Normal | m=50 (49.5, 50.5) | 50, 50 * (100 - 0.001 * 50) | 0.00798 |
| Binomial | m=99 | 100, 0.001 * 99 | 0.0003697 |
| Normal | m=99 (97.5, 98.5) | 99, 99 * (100 - 0.001 * 99) | 0 |
| Normal | m=99 (99.5, 100.5) | 99, 99 * (100 - 0.001 * 99) | 0.0399 |

The binomial distribution can be approximated by the normal distribution through the central limit theorem (CLT). This works when the sample size is large, more than 100. The normal distribution is a continuous distribution, and the probability at any single point of a continuous distribution is equal to zero. A discrete distribution has probability only at certain points. In order to approximate a binomial distribution by a normal distribution, a correction for continuity is therefore required. The binomial distribution takes values only at the points 0, 1, 2, 3, ..., n, where n is the size of the sample. These points can be termed x. When the binomial distribution is converted into a normal distribution, each point x is replaced by the interval (x - 0.5, x + 0.5). The binomial variable has mean np and variance np(1 - p), so the standardised normal variable is given by the following formula:

z = (x - np) / sqrt(np(1 - p))

When the value of p is very small and the value of n is large, the density and distribution functions of the binomial distribution become awkward to calculate directly. In that case the binomial distribution tends to follow a Poisson distribution with parameter lambda = np. There is a rule of thumb for this transformation: the values of n and p should be such that np < 10. With n = 100 and p = 0.001m, np stays below 10 for the values of m considered in this problem. Therefore this binomial distribution can be converted to a Poisson distribution. The conversion is shown below.

- When n -> infinity and p -> 0 with np -> lambda, Binomial(n, p) -> Poisson(lambda).
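The two approximations discussed above can be compared directly against an exact binomial probability. The following is a minimal sketch, assuming Python's standard library; the values n = 100, p = 0.05 and x = 5 are chosen purely for illustration and are not taken from the table, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def binomial_pmf(x, n, p):
    """Exact binomial probability P[X = x]."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """Poisson probability P[X = x] with parameter lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def normal_approx(x, n, p):
    """Normal approximation to P[X = x] using the continuity correction:
    the point x is replaced by the interval (x - 0.5, x + 0.5)."""
    dist = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return dist.cdf(x + 0.5) - dist.cdf(x - 0.5)


# Illustrative values only (assumed, not from the original table).
n, p, x = 100, 0.05, 5

exact = binomial_pmf(x, n, p)
via_poisson = poisson_pmf(x, n * p)  # lambda = np
via_normal = normal_approx(x, n, p)

print(f"exact binomial      : {exact:.4f}")
print(f"Poisson (lambda=np) : {via_poisson:.4f}")
print(f"normal + correction : {via_normal:.4f}")
```

Without the continuity correction the normal "probability" at the single point x would be zero, which is why the interval of width 1 around x is used.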
The variance used in the normal distribution is given by the formula 0.001m * (100 - m). This has the form p * (n - m). The variance of the binomial variable was np * (1 - p), and the normal variance has been derived from the binomial variance (Huber-Carol et al., 2012).

5: A sample of size 5 is taken from a population having the following density function:

f(x) = theta * (1 + x)^(-1 - theta)

The value of theta has to be estimated on the basis of the observed values from the population. The value can be estimated by solving the likelihood equation. The likelihood function is given by the formula:

L(theta) = product over i = 1, ..., 5 of f(x_i)

So the likelihood function is the product of the density functions. The likelihood function can be converted into a simpler function by taking the logarithmic transformation. This does not affect the location of the maximum, as the transformation is one to one. The estimated value and the standard error calculation of theta are shown below:

l(theta) = ln L(theta) = 5 ln(theta) - (1 + theta) * sum over i of ln(1 + x_i)

The first order derivative of the log likelihood is:

l`(theta) = 5/theta - sum over i of ln(1 + x_i)

The estimate of theta is obtained by equating the first derivative to zero, which gives theta-hat = 5 / (sum over i of ln(1 + x_i)). The second order derivative is used to calculate the standard error of the parameter. The second order derivative is given by the formula:

l``(theta) = -5/theta^2

The expected value of the negative of the second order derivative of the log likelihood function, evaluated at theta-hat (the estimated value of theta), gives the Fisher information. The inverse of the Fisher information gives the variance of the estimate, and its square root gives the standard error.

Estimated value of theta = 55.6235. The value of the likelihood function at the estimated value of the population parameter is 0.001616045. The standard error in the estimate of the parameter has been calculated to be 0.00161045.

Reference:

Huber-Carol, C., Balakrishnan, N., Nikulin, M., & Mesbah, M. (Eds.). (2012). Goodness-of-fit tests and model validity. Springer Science & Business Media.

Liao, C. T., Lin, T. Y., & Iyer, H. K. (2012). One- and two-sided tolerance intervals for general balanced mixed models and unbalanced one-way random models. Technometrics.
