Chapter 1 Introduction to Statistics

1.1 Preliminary Definitions
Definition 1.1. Data are observations (such as measurements, genders, survey responses) that have been collected.
Definition 1.2. Statistics is a collection of methods for planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on data.
Definition 1.3. A Population is the entire collection of individuals or measurements about which information is desired.
Definition 1.4. A Sample is a subset of the population that has been selected for study.
Definition 1.5. A statistic is a numerical description of a SAMPLE.
Definition 1.6. A parameter is a numerical description of a POPULATION.
Definition 1.7. Statistical Inference consists of methods or techniques for generalizing from a sample to the population from which the sample is selected.
Definition 1.8. Sampling Variability describes the extent to which samples differ from one another.

1.2 Framework of Statistics
[Figure: diagram relating Population, Sample, Parameter, and statistic]

Idea for a Confidence Interval
[Figure: number line from 0 to 10]

Idea for a Hypothesis Test
[Figure: number line from 0 to 10]

Chapter 2 Probability
Remark 2.1. The information regarding probability can be found in Chapter 4 of your textbook.
• How do we measure likeliness?
• How do we determine what is considered (un)likely?
Definition 2.1. The Probability of an event is
•
•
[Figure: scale from 0 to 1]
Definition 2.2. A significance level α is the largest probability an unlikely event can have.

2.1 Definitions & Examples
Definition 2.3. The result of a single trial of a given procedure is called an outcome.
Definition 2.4. An event is any collection of results or outcomes of a procedure.
Definition 2.5. A simple event is an outcome or an event that cannot be further broken down into simpler components.
Definition 2.6. The sample space for a procedure consists of all possible simple events.
Example 2.1. A bucket contains some numbered balls.
Eventually, one ball will be removed at random.
1. Find the sample space for this procedure.
2. Let A denote the event that the outcome is even. Describe A in terms of simple events.
Example 2.2. Two numbered balls are removed individually from a bucket, and each is replaced after it is removed. The numbers on the balls are written down.
1. Find the sample space for this procedure.
2. Let B denote the event that at least one ball is a three. Describe B in terms of simple events.
3. Is the event “at least one ball is a three” a simple event?
4. If the numbers on the balls were added together, what would be the sample space?

2.2 Some Methods for Computing Probabilities of an Event
There are three approaches to determining the probability of an event:
1. Subjective Probabilities
2. Relative Frequency Approximation
3. Classical Approach
Theorem 2.1. The Law of Large Numbers states that as a procedure is repeated again and again, the relative frequency approximation for the probability of an event tends to approach the actual probability.
Example 2.3. 65 men and women were surveyed. They were asked “Which do you like better: Pollen or Propolis?” The answers are tallied below.

         Pollen  Propolis  Total
Men          12        14     26
Women        24        15     39
Total        36        29     65

a. What is the probability that a randomly selected survey respondent will be a woman?
b. What is the probability that a randomly selected survey respondent will prefer pollen?
c. If you consider only the female responses, what is the probability that you would randomly select one of the women who prefer pollen?
Example 2.4. A colored ball is removed, at random, from a bucket. What is the probability that the ball will be green?
Example 2.5. Two fair four-sided dice are rolled. What is the probability that both numbers will be even?
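The classical approach in Example 2.5 can be checked by listing the sample space and counting. The sketch below is only an illustration, not part of the original notes: it assumes the faces of each die are numbered 1 through 4 and that all ordered pairs are equally likely. The variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumption: each four-sided die shows 1, 2, 3, or 4 with equal probability.
faces = range(1, 5)

# Sample space: every ordered pair (first die, second die).
sample_space = list(product(faces, repeat=2))

# Event: both numbers are even.
both_even = [(a, b) for (a, b) in sample_space if a % 2 == 0 and b % 2 == 0]

# Classical approach: P(event) = (outcomes in event) / (outcomes in sample space).
p_both_even = Fraction(len(both_even), len(sample_space))
print(len(sample_space), len(both_even), p_both_even)
```

Listing the pairs is practical only for small sample spaces. For many dice, the same count follows from the multiplication of the per-die probabilities, since the rolls are independent.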
Example 2.6. 256 fair four-sided dice are rolled. What is the probability that all the numbers will be even?

2.3 Counting
2.3.1 Fundamental Counting Rule
Given two sequential events, if the first can occur m ways, and the second event can occur n ways, then the number of ways both events can occur in sequence is equal to m × n.
Example 2.7. An airline has 6 routes from City A to City B, and 9 routes from City B to City C. If you were to use this airline, how many routes could you take from City A to City C?
Example 2.8. How many ways can a family with 6 members be lined up to take a family portrait?

2.3.2 Order or no order? Repeats or not?
Definition 2.7. Permutations of items are arrangements in which different sequences of the same items are counted separately.
Definition 2.8. Combinations of items are arrangements in which different sequences of the same items are not counted separately.
Selecting r of n distinct objects.

            Repeats    No Repeats
Unordered              ${}_nC_r$
Ordered     $n^r$      ${}_nP_r$

Example 2.9. You have 4 extra tickets for a concert and 7 friends. How many different groups of your friends could accompany you to the concert?
Example 2.10. You have three astronauts, Anna, George, and Michele, on the first Mission to Mars. For the first Marswalk, two of them will be allowed to leave their flying saucer and walk on the planet; one will have to remain behind. How many different ways can they be assigned a job for their first landing? If they are randomly given their assignment, what is the probability that George will be left on the ship?
Example 2.11. How many five-letter words can be made with the letters F, S, H, E? A letter can be used more than once. What is the probability that a five-letter “word” will start with the letter F? Only the letters F, S, H, E can be used. Letters can be repeated.

When some Items are Identical to Others - Another Permutation Rule
Example 2.12. How many different ways can the letters in TENNESSEE be arranged?
If these letters are randomly arranged, what is the probability that they will spell TENNESSEE?

2.4 The Addition Rule for Probabilities
Definition 2.9. A compound event is any event combining two or more simple events.
Notation 2.1. More notation that will be used
• (A or B) =
• (A and B) =
Formal Addition Rule
P(A or B) = P(A) + P(B) − P(A and B)
Example 2.13. Suppose the following: P(A) = .9, P(B) = .8, P(A and B) = .77. Find P(A or B).
Example 2.14. In a group of 101 students, 40 are juniors, 50 are female, and 22 are female juniors. Find the probability that a student picked from this group at random is either a junior or female.
Example 2.15. A family of 6 is going to have their picture taken. The photographer is going to randomly line everyone up. What is the probability that the mother ends up in the first chair or the father ends up in the sixth chair?
Example 2.16. A single card is chosen at random from a standard deck of 52 playing cards. What is the probability of choosing a king or a club?
Example 2.17. Two dice are rolled. The first is a fair 6-sided die. The second is a fair 4-sided die. Once they are rolled, the two numbers on the two dice are used to create a 2-digit number. The number from the 6-sided die is used to make the tens digit. The number from the 4-sided die is used to make the ones digit. What is the probability that the resulting number is odd or begins with an even number?
Definition 2.10. Events A and B are disjoint (or mutually exclusive) if they cannot occur at the same time. (That is, they do not overlap.)
Probability of the Intersection of Two Disjoint Events
If events A and B are disjoint, then P(A and B) =
Addition Rule for DISJOINT Events
If events A and B are disjoint, then P(A or B) = P(A) + P(B) − P(A and B) =
Example 2.18. Suppose that A and B are disjoint events such that the following is true: P(A) = .9, P(B) = .06. Find P(A or B).
Example 2.19.
In a group of 201 students, 70 are freshmen, 41 are sophomores, 30 are juniors, 50 are seniors, and 10 are graduate students. Find the probability that a student picked from this group at random is either a freshman or sophomore.
Example 2.20. A family of 6 is going to have their picture taken. The photographer is going to randomly line everyone up. What is the probability that the mother ends up in the first chair or the father ends up in the first chair?
Example 2.21. Two dice are rolled. The first is a fair 6-sided die. The second is a fair 4-sided die. Once they are rolled, the two numbers on the two dice are used to create a 2-digit number. The number from the 6-sided die is used to make the tens digit. The number from the 4-sided die is used to make the ones digit. What is the probability that the resulting number is odd or ends with a 2?
Example 2.22. A bucket contains some bouncy balls that are colored as well as numbered. The following table indicates the number of each kind of ball in the bucket.

        Yellow  Green  Orange  Red  Blue  Brown  Purple  Total
Odd          4      6       3   23     2      7      71    116
Even        45     68      13   25     9      7      11    178
Total       49     74      16   48    11     14      82    294

1. If a ball is randomly chosen, what is the probability that the ball will be a blue even ball or a purple odd ball?
2. If a ball is randomly chosen, what is the probability that the ball will be blue or purple?
3. If a ball is randomly chosen, what is the probability that the ball will be even or purple?

Rule for Complementary Events
Example 2.23. Find the indicated probabilities.
1. Suppose $P(A) = .23$. Find $P(\bar{A})$.
2. Suppose $P(\bar{A}) = .12$, $P(\bar{B}) = .21$, $P(\bar{C}) = .22$. Find $P(B)$.
Example 2.24. Same bucket as used in Example 2.22 (see the table there). What is the probability that a randomly selected ball is neither brown nor even?
Example 2.25. Two dice are rolled. The first is a fair 6-sided die.
The second is a fair 4-sided die. Once they are rolled, the two numbers on the two dice are used to create a 2-digit number. The number from the 6-sided die is used to make the tens digit. The number from the 4-sided die is used to make the ones digit. What is the probability that the resulting number is not 42?
Example 2.26. A single card is chosen at random from a standard deck of 52 playing cards. What is the probability of choosing neither a king nor a club?
Example 2.27. In a group of 700 families, 75 had more than 3 children, 125 had exactly 3 children, 300 had 2 children, and 100 had only a single child. If one family is randomly selected, what is the probability that it will have no children?

2.5 Conditional Probability & The Multiplication Rule
Definition 2.11. Let A and B be two events. The conditional probability of A given B, denoted by P(A|B), is the probability that A happens given the information that B occurs. It is the probability of an event with the additional information that some other event has already occurred.
Example 2.28. A bucket contains some bouncy balls that are colored as well as numbered. The following table indicates the number of each kind of ball in the bucket.

        Yellow  Green  Orange  Red  Blue  Brown  Purple  Total
Odd          4      6       3   23     2      7      71    116
Even        45     68      13   25     9      7      11    178
Total       49     74      16   48    11     14      82    294

The contents of the bucket are separated into two buckets, odds and evens. If we randomly select a single ball from the odd bucket, what is the probability that the ball is red?
Example 2.29. Two dice are rolled. The first is a fair 6-sided die. The second is a fair 4-sided die. Once they are rolled, the two numbers on the two dice are used to create a 2-digit number. The number from the 6-sided die is used to make the tens digit. The number from the 4-sided die is used to make the ones digit. If a two appeared on the 6-sided die, what is the probability that the resulting number is odd?
Example 2.30.
Five cards are dealt from a freshly shuffled deck of cards. Suppose the first four cards are kings. What is the probability that the fifth card will be an ace?
Definition 2.12. Two events A and B are independent if the occurrence of one event does not affect the probability of the occurrence of the other event. If A and B are not independent, they are said to be dependent.
If two events A and B are independent, then
• P(A|B) = P(A)
• P(B|A) = P(B)
• P(B and A) = P(A)P(B)
Example 2.31. Given two events A and B. Suppose that P(A|B) = .8 and P(A) = .81. Are the events A and B independent?
Example 2.32. Given two independent events A and B. Suppose that P(B) = .8 and P(A) = .42. Find P(B and A).
Example 2.33. An urn contains 2 colored balls: 1 blue & 1 red. Two balls are removed, one at a time, replacing each after it is drawn. What is the probability that the second ball is red, if the first was blue?
Example 2.34. An urn contains 2 colored balls: 1 blue & 1 red. Two balls are removed, one at a time, without replacing each after it is drawn. What is the probability that the second ball is red, if the first was blue?
Remark 2.2. The method used for selecting, or sampling, items is very important and can determine whether two events are independent or dependent.
• Selections (Sampling) without replacement: Dependent events.
• Selections (Sampling) with replacement: Independent events.
Formal Multiplication Rule
P(A and B) = P(A) × P(B|A)
Example 2.35. A bucket contains several colored bouncy balls: red, yellow and blue. One at a time, two balls are removed from the bucket. After the first ball is removed, it will not be replaced. What is the probability that the first ball is red and the second bouncy ball is green?
Example 2.36. A bucket contains several colored bouncy balls: red, yellow and blue. One at a time, two balls are removed from the bucket. After the first ball is removed, it is replaced.
What is the probability that the first ball is red and the second bouncy ball is green?
Example 2.37. If two cards are dealt from a deck without replacing them, what is the probability that an ace will be dealt first and a two will be dealt second?
Example 2.38. If two cards are dealt from a deck with replacement, what is the probability that an ace will be dealt first and a two will be dealt second?

2.5.1 More Conditional Probability
Definition 2.13. Let A and B be two events. The conditional probability of A given B, P(A|B), is the probability that A happens given the information that B occurs. It is the probability of an event with the additional information that some other event has already occurred.
$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$$
Example 2.39. A statistics professor tosses two coins that cannot be seen by any of the students. One student asks: “Did one of the coins turn up heads?” Suppose the professor answered “yes”. Find the probability that both coins turned up heads.
Example 2.40. An urn contains 3 colored balls: 2 blue & 1 red. Two balls are removed, one at a time, without replacing each after it is drawn. What is the probability that the second ball is red, if the first was blue?
Example 2.41. A student answers a multiple choice examination question that has 4 possible answers. Suppose that the probability that the student knows the answer to the question is 0.80 and the probability that the student guesses is 0.20. Also, if the student guesses, the probability of a correct guess is 0.25. If the question is answered correctly, what is the probability that the student really knew the correct answer?

Chapter 3 Probability & Random Variables
Remark 3.1. This is Chapter 5 in the textbook.
Our goal is to compute probabilities for Random Procedures/Phenomena whose outcomes are numbers.
Definition 3.1. A random variable is a variable (typically represented by x) that has a single numerical value, determined by chance, for each outcome of a procedure.
A random variable is a variable whose value is a numerical outcome of a random procedure/phenomenon.
Example 3.1. Examples of Random Variables.
• The weight of a randomly selected package taken from the post office.
• The amount of time it takes to walk from the first floor to the fourth floor.
• The temperature of a randomly selected popsicle.
• The amount of money you spend on your next tank of gas.
• The number of lunches served in the cafeteria on a given day.
• The color of a ball pulled out of a bucket.
There are two ways to assign probabilities to a random variable. These provide two types of random variables:
Definition 3.2. A Continuous Random Variable has infinitely many values, and the collection of values is not countable.
Definition 3.3. A Discrete Random Variable has a collection of possible values that is finite or countable.
• Random variables will usually (but not always) be denoted by capital letters from the end of the alphabet.
• When a random variable describes a random phenomenon, the sample space S lists the possible values of the random variable.
Definition 3.4. A Probability Distribution is a description that gives the probability for each possible value of a random variable. It is often expressed as a table, a formula, or a graph.
Examples of Probability Distributions
Example 3.2. A bucket contains 4 green, 3 brown and 3 purple bouncy balls. A ball is randomly selected from the bucket. We check the color of the ball. (We could say that we count the number of green balls observed.)
Example 3.3. A bucket contains 4 green, 3 brown and 3 purple bouncy balls. One at a time, four balls are randomly removed, and replaced, from the bucket. We count the number of green balls observed.
Definition 3.5. A Binomial Probability Distribution results from a procedure that meets all the following requirements:
a.) The procedure has a fixed number of trials. A trial is a single observation.
b.) The trials must be independent.
The outcome of any one trial has no effect on the probabilities in the other trials.
c.) Each trial must have all outcomes classified into two categories (commonly referred to as success and failure).
d.) The probability of a success remains the same for all trials.
If X has the Binomial distribution B(n, p) with n observations and probability p of success on each experiment, or observation, the possible values of X are 0, 1, 2, . . . , n. If k is any one of these values, the binomial probability is
$$P(X = k) = {}_nC_k \, p^k (1 - p)^{n-k}.$$
The mean and standard deviation of a binomial random variable X are
$$\mu = np, \qquad \sigma = \sqrt{np(1 - p)}.$$
Example 3.4. A coin is tossed four times.
1. What is the probability distribution of the discrete random variable X that counts the number of heads?
2. Find P(X > 1).
3. Find P(X ≥ 1).
4. Find P(X ≤ 1).
Remark 3.2. A Binomial Probability Distribution results from a procedure that meets all the following requirements:
a.) The procedure has a fixed number of trials. A trial is a single observation.
b.) The trials must be independent. The outcome of any one trial has no effect on the probabilities in the other trials.
c.) Each trial must have all outcomes classified into two categories (commonly referred to as success and failure).
d.) The probability of a success remains the same for all trials.
Definition 3.6. If X has the Poisson distribution, Poisson(µ), with mean number of occurrences equal to µ, the possible values of X are 0, 1, 2, 3, . . . . If k is any one of these values, the Poisson probability is
$$P(X = k) = \frac{\mu^k e^{-\mu}}{k!}.$$
The mean is µ. The standard deviation of a Poisson random variable X is $\sigma = \sqrt{\mu}$.
Remark 3.3. A Poisson Probability Distribution results from a procedure that meets all the following requirements:
a.) The random variable counts the number of occurrences of an event over a time interval;
b.) The occurrences must be random, independent, and uniformly distributed over the time interval.
Example 3.5.
Assume that the mean number of aircraft accidents in the United States is 8.5 per month. Use the Poisson distribution to find the probability that in a month there will be
a.) 6 aircraft accidents.
b.) at least 5 aircraft accidents.
c.) no more than 7 aircraft accidents.
d.) Over a one year period, how many aircraft accidents would you expect there to be?

PDF vs CDF

More Examples of Random Variables - Continuous
• The probability distribution of X is described by a density curve (a graph).
• The probability of any event is the area under the density curve and above the x axis, and between the values of X that make up the event.
• The total area under a density curve is equal to 1, and a density curve never goes below the x-axis.
• Every individual outcome for a continuous random variable has probability zero.
Definition 3.7. A continuous random variable has a uniform distribution if its values are spread evenly over the range of possible values. The density curve (graph) of a uniformly distributed random variable is a rectangle.
Example 3.6. The amount of time a particular subway train will wait at a station is uniformly distributed between 5 and 10 minutes. Find the probability that the train will wait
1. exactly 6 minutes.
2. at most 6 minutes.
3. at least 7 minutes.
Definition 3.8. A continuous random variable X has a normal distribution with mean µ and standard deviation σ if its density curve is given by
$$y = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}.$$
[Figure: four density curves, each titled "Normal Distribution"; horizontal axis "x value" marked from µ−3σ to µ+3σ, vertical axis "Density"]
• The probability distribution of X is described by a density curve (a graph).
• The probability of any event is the area under the density curve and above the x axis, and between the values of X that make up the event.
• The total area under a density curve is equal to 1, and a density curve never goes below the x-axis.
• Every individual outcome for a continuous random variable has probability zero.
Example 3.7. The heights of fully grown white oak trees are normally distributed with a mean height of 90 feet and standard deviation of 3.5 feet.
1. What is the probability that a randomly selected fully grown white oak tree is less than 87 feet tall?
2. What is the probability that a randomly selected fully grown white oak tree is greater than 94 feet tall?
Example 3.8. The ACT is an exam used by colleges and universities to evaluate undergraduate applicants. The test scores are normally distributed. In a recent year, the mean test score was 20.1 and the standard deviation was 4.3.
1. What is the probability that a randomly selected ACT score is between 16 and 24?
2. What is the probability that a randomly selected ACT score is greater than 22.5?
[Figure: density curves titled "t−distributions"; horizontal axis X from −3 to 3]
Definition 3.9. A continuous random variable X has a t-distribution with k degrees of freedom if its density curve is given by
$$y = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\left(\frac{k}{2}\right)} \left(1 + \frac{x^2}{k}\right)^{-\frac{k+1}{2}}.$$
[Figure: four panels of density curves titled "t−distributions"; horizontal axis X from −3 to 3]
• The probability distribution of X is described by a density curve (a graph).
• The probability of any event is the area under the density curve and above the x axis, and between the values of X that make up the event.
• The total area under a density curve is equal to 1, and a density curve never goes below the x-axis.
• Every individual outcome for a continuous random variable has probability zero.
[Figure: density curves titled "chi−square distributions"; horizontal axis X from 0 to 12]
Definition 3.10.
A continuous random variable X has a χ²-distribution with k degrees of freedom if its density curve is given by
$$y = \frac{1}{2^{k/2}\,\Gamma\left(\frac{k}{2}\right)} \, x^{\frac{k}{2} - 1} e^{-\frac{x}{2}}.$$
[Figure: three panels of density curves titled "chi−square distributions"; horizontal axis X from 0 to 12]
• The probability distribution of X is described by a density curve (a graph).
• The probability of any event is the area under the density curve and above the x axis, and between the values of X that make up the event.
• The total area under a density curve is equal to 1, and a density curve never goes below the x-axis.
• Every individual outcome for a continuous random variable has probability zero.

3.1 Measuring the Center of a Distribution
[Figure: four probability histograms with panel titles p = 0.1, p = 0.25, p = 0.5, p = 0.8; horizontal axis x]
Definition 3.11. The mean of a probability distribution, or the mean of a random variable, is a number that indicates the center, or location, of the random variable's distribution.
• If X is a discrete random variable whose distribution is

Possible Value of X   $x_1$      $x_2$      ...   $x_k$
Probability           $P(x_1)$   $P(x_2)$   ...   $P(x_k)$

then the mean of X is computed as follows:
$$\mu_X = x_1 P(x_1) + x_2 P(x_2) + \cdots + x_k P(x_k)$$
• The mean for a random variable X is also called the EXPECTED VALUE OF X.
• If you repeat a random procedure an extremely large number of times, the average of the observed values of the random variable will be very close to the mean of the random variable.
• The mean is what you expect to see on average.
• If a random variable X has a Binomial Distribution with n trials and probability of success p, then $\mu_X = np$.
• You will not need to compute the mean for a continuous random variable.
[Figure: density curve; horizontal axis X from −10 to 20]

3.2 Measuring the Spread of a Distribution
Definition 3.12.
The standard deviation of a probability distribution, or the standard deviation of a random variable, is a number that indicates the spread, or dispersion, of the random variable's distribution.
• If X is a discrete random variable with mean µ, and distribution

Possible Value of X   $x_1$      $x_2$      ...   $x_k$
Probability           $P(x_1)$   $P(x_2)$   ...   $P(x_k)$

then the standard deviation of X is
$$\sigma = \sqrt{(x_1 - \mu)^2 P(x_1) + (x_2 - \mu)^2 P(x_2) + \cdots + (x_k - \mu)^2 P(x_k)}$$
Example 3.9. Determine the mean, standard deviation, and variance for the following distribution:

X       -1     2     10
P(X)    .25    .6    .15

• If a random variable X has a Binomial Distribution with n trials and probability of success p, then $\sigma_X = \sqrt{np(1 - p)}$.
• You will not need to compute the mean for a continuous random variable.
[Figure: density curve; horizontal axis X from −20 to 20]
Variance
• The Variance of a random variable X is its standard deviation squared.
• The Variance of a random variable is another measure of the spread of a random variable's distribution.

3.3 Percentiles & Critical Values
Percentiles
Definition 3.13. The 100α-th percentile is a number, $P_{100\alpha}$, that divides the probability distribution of a random variable X into two parts where $P(X \le P_{100\alpha}) \ge \alpha$ and $P(X \ge P_{100\alpha}) \ge 1 - \alpha$.
• The 100α-th percentile is a number, $P_{100\alpha}$, that separates the bottom 100α% of a distribution from the top 100(1 − α)%.

Normal:          InvNorm(α, µ, σ)
t-distribution:  InvT(α, df), or MATH ↓ Solver... with 0 = α − tcdf(−2^99, X, df), then ENTER, ALPHA ENTER
Chi-Square:      MATH ↓ Solver... with 0 = α − χ²cdf(0, X, df), then ENTER, ALPHA ENTER

Example 3.10. Find $P_{99}$ for a t-distributed random variable with 5 degrees of freedom.
Example 3.11. Find $P_{95}$ for a χ²-distributed random variable with 3 degrees of freedom.
Example 3.12. Find $P_{90}$ for a normally distributed random variable with µ = 5 and σ = 3.
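For readers working without the calculator, the normal percentile in Example 3.12 can also be sketched in Python. This is an illustration under stated assumptions, not the method the notes prescribe: it uses `statistics.NormalDist` from the Python standard library (3.8 or later), and the helper name `normal_percentile` is made up here. It plays the same role as InvNorm(α, µ, σ).

```python
from statistics import NormalDist

def normal_percentile(alpha, mu, sigma):
    """Return the value x with P(X <= x) = alpha for X normal with mean mu, std dev sigma.

    Hypothetical helper for illustration; mirrors the calculator's InvNorm(alpha, mu, sigma).
    """
    return NormalDist(mu, sigma).inv_cdf(alpha)

# Example 3.12: the 90th percentile when mu = 5 and sigma = 3.
p90 = normal_percentile(0.90, mu=5, sigma=3)

# Sanity check: the area to the left of p90 should come back as 0.90.
area_left = NormalDist(5, 3).cdf(p90)
print(round(p90, 2), round(area_left, 4))
```

The result is about 8.84. The standard library has no inverse for the t or χ² distributions; a third-party package such as SciPy (`scipy.stats.t.ppf`, `scipy.stats.chi2.ppf`) would be one option for Examples 3.10 and 3.11.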
Example 3.13. In a large section of a statistics class, the points for the final exam are normally distributed with a mean of 72 and a standard deviation of 9. Find the lowest score on the final exam that would qualify a student for an A, if an A should include the top 10% of the class.
Example 3.14. The annual per capita utilization of apples (in pounds) in the United States can be approximated by a normal distribution with µ = 17.4 lb. and σ = 4 lb. What annual per capita utilization of apples represents the 10th percentile?

Critical Values
Definition 3.14. A critical value is a number that is used to separate unusual (unlikely) values for a random variable from those values that are expected (likely) to occur.
• The placement of a critical value will depend on:
– the distribution of the random variable;
– the significance level α used to define what it means for an event to be unlikely.
• Some questions will require the determination of two critical values.
• (Usual, Expected, Common, Likely) values will generally be considered values “close” to the mean.
• (Unusual, Unexpected, Surprising, Unlikely) values will generally be considered values “far” from the mean.

Critical Values for Specific Distributions
Notation 3.1. $z_\alpha$, or $z^*$, denotes a critical value for a Standard Normal Random Variable with an area, or probability, of α to its right.
Example 3.15. Find $z_{.05}$.
[Figure: standard normal density curve]
Notation 3.2. $t_{\alpha,k}$, or $t^*$, denotes a critical value for a t-Random Variable, with k degrees of freedom, with an area, or probability, of α to its right.
Example 3.16. Find $t_{.05,3}$.
[Figure: t-distribution density curve]
Notation 3.3. $\chi^2_{\alpha,k}$ denotes a critical value for a χ²-Random Variable, with k degrees of freedom, with an area, or probability, of α to its right.
Example 3.17. Find $\chi^2_{.05,4}$.
[Figure: chi-square density curve]
• The critical values given above define (Unusual, Unexpected, Surprising, Unlikely) values to be numbers that are “far” from zero.
• Later, we will define these values to be the distance between what we expect to happen, and what actually happens.
• This translates into the idea that unlikely values are those that are a “great distance” (relatively) from what we expect.

Tail Events & Tail Probabilities
Definition 3.15. A one-tail event for a random variable X is an event such as {X ≥ t} or {X ≤ t}, where t is any number.
Definition 3.16. A two-tail event for a random variable X is an event such as {X > t or X < r}, where r < t are any numbers.
Definition 3.17. A tail probability is the probability of a (two) tail event.
– Percentiles and Critical Values are defined in terms of tail events.
– If a tail probability is smaller than a given significance level, α, then the tail event will be considered unlikely.
– If a tail probability is smaller than a given significance level, α, then any outcome within that tail event will be considered (Unusual, Unexpected, Surprising, Unlikely).
• Depending upon the situation, and significance level α, we may define “(Unusual, Unexpected, Surprising, Unlikely) values” to be values that are
– Far from the mean AND too small
– Far from the mean AND too big
– Far from the mean AND either too big or too small
[Figure: three density curves, one per case, each with the mean µ marked]

Chapter 4 Samples
[Figure: diagram relating Population, Sample, Parameter, and statistic]
Remark 4.1. You should read Chapter 1 from your textbook. We will cover only the information necessary for the procedures that will be introduced later.

4.1 Goals:
• Describe a population’s unknown distribution;
• Describe a population’s unknown parameters;
• Describe the nature of the relationship between populations.

4.2 Collecting Data
Definition 4.1. SAMPLE:
1. VERB To sample a population is the act of selecting individuals, items, objects, or members of a population.
2. NOUN A Sample is the subset of the population that has been selected.
Definition 4.2.
A simple random sample of n subjects is selected in such a way that every possible sample of the same size n has the same probability of being selected.
• All of the procedures that will be discussed later will use a simple random sample.
• A simple random sample is a selection of n subjects without replacement. This means we have dependent selections from a finite population.
• If the sample size is no more than 5% of the overall population, we will treat the selections as being independent.
• We will think of our samples as selections made with replacement.
• For examples in class, we will take samples (make selections) with replacement.

Other Sample Types
Definition 4.3. In systematic sampling, we select some starting point and then select every k-th element in a population.
Definition 4.4. In stratified sampling, we subdivide the population into at least two different subgroups (or strata) so that subjects within the same subgroup share the same characteristics. Then we draw a sample from each subgroup (or stratum).
Definition 4.5. In cluster sampling, we first divide the population area into sections (or clusters). Then we randomly select some of those clusters and choose all the members from those selected clusters.
Definition 4.6. With convenience sampling, we simply use results that are very easy to get.
Definition 4.7. In an observational study, we observe and measure specific characteristics, but do not attempt to modify the subjects being studied.
Definition 4.8. In an experiment, we apply some treatment and then proceed to observe its effects on the subjects. (Subjects in experiments are called experimental units.)

Types of Observational Studies
Definition 4.9. In a cross-sectional study, data are observed, measured, and collected at one point in time.
Definition 4.10. In a retrospective study, data are collected from the past by going back in time (through examination of records, interviews, and so on).
Definition 4.11.
In a prospective study, data are collected in the future from groups sharing common factors.

4.3 Describing Populations using Graphs of Sample Data

Graphs of sample (quantitative) data can be used to make guesses about the distribution of a population. We will look at the graphs to determine whether they appear to be:
• Normal
• Uniform
• Symmetric
• Skewed

Definition 4.12. A (relative) frequency histogram is a graph consisting of bars of equal width drawn adjacent to each other (unless there are gaps in the data). The horizontal scale represents classes of quantitative data values and the vertical scale represents (relative) frequencies. The heights of the bars correspond to the (relative) frequency values.

Remark 4.2. Having a guess about the SHAPE of a distribution allows you to make a guess about how to compute probabilities about future samples from the same type of distribution.
• If we do not know the SHAPE of a distribution, we CAN NOT make any GOOD guesses about the probability of an event.

Assessing Normality with a Small Data Set

With a small data set, the shape of a distribution may not be very clear. It is very important for us to be able to identify populations with Normal Distributions. A normal quantile plot can assist us with this.
• Normal Distribution
• Non-Normal Distribution

Stemplot

A Stemplot (Stem & Leaf plot) is a quick way to look at the SHAPE of a distribution if you’re working by hand and have a relatively small data set.

Stem | Leaf
1    | 1
2    |
3    | 1 1
4    | 2 3
5    | 7 7 7
6    | 2 5 6 6
7    | 0 2 2 7 8
8    | 2 3 3 4 5 8 9
9    | 0 2 3 4 4 4 4 4 8 9

4.3.1 Other Types of Graphics

Definition 4.13. A scatterplot is a plot of paired (x, y) quantitative data with a horizontal x-axis and vertical y-axis.
Definition 4.14. A time-series graph is a graph of time-series data, which are quantitative data that have been collected over a period of time.
Definition 4.15.
A Pareto chart is a bar graph for categorical data, with the bars arranged in descending order according to frequencies.
Definition 4.16. A Pie Chart is a graph that depicts categorical data as slices of a circle, in which each slice is proportional to the frequency count for the category.

4.4 Estimating Population Parameters using Sample Data

With a probability distribution for a random variable, we defined several numbers that could be used to describe the characteristics of the distribution.
• Center
– Mean
–
• Spread
– Standard Deviation
–
–
• Proportion of Successes
• Percentiles
–
–
–
–

If we have a population but don’t know its distribution, we probably don’t know some of these parameters. We will need a method to estimate these parameters, based on samples that we take.

Remark 4.3. Not every parameter is interesting for every population.

4.4.1 Estimating a Population Mean

Definition 4.17. The sample mean is an estimate of the mean of a probability distribution. It can be found by adding all the sample data values together and dividing by the sample size.

$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$

Example 4.1. Find the mean of the following sample values:

• It is a statistic.
• It is one possible measure of the center of a SAMPLE.
• It is an estimate of a center of a probability distribution.
• Its value will change depending upon the sample taken.
• One extreme value can change the value of the mean substantially.
• Sample means drawn from the same population tend to vary less than other measures of center.

Estimating the SAMPLE MEAN from a Frequency Distribution

#            Frequency
0.5 − 1.4    9
1.5 − 2.4    0
2.5 − 3.4    81
3.5 − 4.4    1
4.5 − 5.4    3
5.5 − 6.4    12
N            106

Estimating the SAMPLE MEAN from a Relative Frequency Distribution

#            Frequency
0.5 − 1.4    0.25
1.5 − 2.4    0.30
2.5 − 3.4    0.10
3.5 − 4.4    0.20
4.5 − 5.4    0.00
5.5 − 6.4    0.15
Total        1.0

4.4.2 Estimating a Population Standard Deviation

Definition 4.18.
The sample standard deviation is an estimate of the standard deviation of a probability distribution. It is denoted by s and is a measure of how much the sample data deviate from the sample mean $\bar{x}$.

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Example 4.2. Find the sample standard deviation of the following sample values:

Facts about the sample standard deviation
• s ≥ 0
• s = 0 only if all of the data values are the same.
• s can increase greatly if even one additional data value is added that looks very different from the others.
• The units for s are the same as the units on the original data.
• $s^2$ = the sample variance is another measure of variation. It is the square of the sample standard deviation.

Estimating the STANDARD DEVIATION from a Dataset

#             Frequency
0.5 − 1.4     9
2.5 − 3.4     81
5.5 − 6.4     1
6.5 − 7.4     3
9.5 − 10.4    12
N             106

Definition 4.19. The range of a data set is the measure of spread found by subtracting the smallest data value from the largest data value.

Range Rule of Thumb

$\sigma \approx \dfrac{\text{Range}}{4}$

4.4.3 Estimating a Proportion of Successes

Definition 4.20. The sample proportion is an estimate of the probability of a success p for some random procedure. It is denoted by $\hat{p}$.

$\hat{p} = \dfrac{\#\text{ of successes}}{n}$

Example 4.3. Find the sample proportion for the following samples:

4.4.4 Estimating Percentiles

Definition 4.21. The 100α-Percentile of a dataset, $P_{100\alpha}$, is a number that breaks the ordered dataset into two groups, with about 100α% of the dataset less than or equal to $P_{100\alpha}$ and about 100(1 − α)% of the dataset greater than or equal to $P_{100\alpha}$.

Finding the Percentile of a Data Value

$\text{Percentile of } x = \dfrac{\#\text{ of data values} < x}{n} \times 100$ (Round up)

Example 4.4. Find the percentile of 18 for the following data:
2, 3, 4, 6, 7, 7, 8, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78

Converting a Percentile to a Data Value

$L = \dfrac{k}{100} \times n$

Example 4.5.
Find the value of the 20th percentile, $P_{20}$, for the following data:
2, 3, 4, 6, 7, 7, 8, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78

Example 4.6. Find the value of the 33rd percentile, $P_{33}$, for the following data:
2, 3, 4, 6, 7, 7, 8, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78

4.4.5 Boxplot - Using Sample Percentiles

Definition 4.22. For a set of data, the 5-number summary consists of these five values: Minimum, $Q_1$, $Q_2$, $Q_3$, Maximum

Example 4.7. Give the 5-number summary for the following data:
2, 3, 4, 6, 7, 7, 8, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78

Definition 4.23. A boxplot is a graph of a data set that consists of a number line extending from the minimum to the maximum data value, and a box drawn at the first, second, and third quartiles.

Example 4.8. Construct a boxplot for the following data:
2, 3, 4, 6, 7, 7, 8, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78

1.5 × IQR Guideline for outliers

It is always important to look for data values that don’t appear to fit with the rest. With IQR = $Q_3 - Q_1$, potential outliers can be identified as those data values that are
• less than $Q_1 - 1.5 \times \text{IQR}$.
• greater than $Q_3 + 1.5 \times \text{IQR}$.

Example 4.9. Identify any potential outliers for the following data:
2, 3, 4, 6, 7, 7, 7, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78
• This rule helps identify values that are “far” away from the central 50% of the data values.

4.4.6 Relative Distance From the Center

Definition 4.24. A z-score or standardized value is the number of standard deviations that a given value x is above or below the mean. A z-score is calculated as follows:

$z = \dfrac{x - \bar{x}}{s}$ (sample) or $z = \dfrac{x - \mu}{\sigma}$ (population)

Facts about z-scores
• A z-score allows a comparison of distances between two distributions that are spread out in different manners.
• In many cases, a z-score will represent the relative distance between an observation and a distribution’s expected value.
• Large z-scores (in absolute value) will represent observations that are “far” from what is expected.
These observations would be considered (Unusual, Unexpected, Surprising, Unlikely).
• Small z-scores will represent observations that are “close” to what we expect. These observations would be considered (Usual, Expected, Common, Likely).

Example 4.10. Two statistics classes take an exam. The distributions of the test scores looked relatively normal. Class A had a mean of 72 and a standard deviation of 3. Class B had a mean of 83 and a standard deviation of 6. Michele is in Class A. She received a score of 81. Elaine is in Class B. She received a 91. Elaine obviously has the higher overall score, but who did better with respect to their class? Does either one of them have an unusually high score compared to their class?

4.5 Probability distribution of a z-score

• The observations used in the computation of a z-score are generally the outcomes of some random procedure.
• The observation represents the outcome of some random variable.
• If the observation has a Normal distribution, then the z-score
– is a random variable,
– has a standard normal distribution.

If $X \sim \text{Normal}(\mu_X, \sigma_X)$, then $z = \dfrac{X - \mu_X}{\sigma_X} \sim \text{Normal}(0, 1)$.

We can use this idea to make estimates about the probabilities of future events, or about proportions of a dataset.

Example 4.11. A sample was taken and the following histogram was made. Estimate the proportion of the data that was within 1 standard deviation of the mean. Which data values appear to be within 1 standard deviation of the mean?
[Histogram: horizontal axis from 5 to 11]

Example 4.12. A sample was taken and the following histogram was made. Estimate the proportion of the data that was within 1 standard deviation of the mean. Which data values appear to be within 1 standard deviation of the mean?
[Histogram: horizontal axis from 10 to 30]

Example 4.13. A sample was taken and the following histogram was made. Estimate the proportion of the data that was within 2 standard deviations of the mean.
Which data values appear to be within 2 standard deviations of the mean?
[Histogram: horizontal axis from 7 to 13]

Example 4.14. A sample was taken and the following histogram was made. Estimate the proportion of the data that was within 3 standard deviations of the mean. Which data values appear to be within 3 standard deviations of the mean?
[Histogram: horizontal axis from 2 to 14]

Empirical Rule: 68-95-99.7
[Figure: Normal Distribution density curve; x values marked at µ−3σ, µ−2σ, µ−1σ, µ, µ+1σ, µ+2σ, µ+3σ]

4.6 Sampling Distributions

Definition 4.25. The sampling distribution of a statistic is the distribution of that statistic based on a fixed sample size.

Recall. The following statistics are random variables:
• Sample Mean $\bar{x}$
• Sample Proportion $\hat{p}$
• Sample Standard Deviation s

Remark 4.4. Many other statistics exist.

4.6.1 Central Limit Theorem

Theorem 4.1. Central Limit Theorem. Suppose that a random variable X has a mean $\mu_X$ and a standard deviation $\sigma_X < \infty$. Then the (sampling) distribution of $\bar{x}$ (based on a simple random sample of size n) will be:
• Normally distributed with mean $\mu_X$ and standard deviation $\sigma_X/\sqrt{n}$, if X has a normal distribution.
• Approximately Normally distributed with mean $\mu_X$ and standard deviation $\sigma_X/\sqrt{n}$, if n > 30 and the distribution of X is not heavily skewed.

$\bar{x} \sim \text{Normal}\left(\mu_X, \dfrac{\sigma_X}{\sqrt{n}}\right)$

Example 4.15. The height of adult females is normally distributed with a mean of 205.5 cm and a standard deviation of 8.6 cm.
1. What is the probability that a randomly selected female will be taller than 210 cm?
2. What is the probability that the average height of 25 randomly selected females will be greater than 210 cm?
3. (α = .01) What heights of females would be considered unusually tall?
4. (α = .01) If 25 women are randomly selected, what would be considered an unusually high average height?

Example 4.16. Suppose that the amount of time that you will wait for a bus, at a particular bus stop, has a mean of 10 minutes with a standard deviation of 1 minute.
1.
What is the probability that on a randomly selected day, you will wait longer than 12 minutes?
2. What is the probability that over 31 randomly selected days you will wait longer than 12 minutes on average?
3. (α = .05) What would be considered an unusually long wait time?
4. (α = .05) Over the course of 31 randomly selected days, what would be considered an unusually long average wait?

Corollary 4.2. If a population can be split into two disjoint groups, success and failure, the proportion of successes is equal to p, and a sample of size n is taken, where np ≥ 5 and n(1 − p) ≥ 5, then

$\hat{p} \sim \text{Normal}\left(p, \sqrt{\dfrac{p(1-p)}{n}}\right)$

Example 4.17. Seventy percent of a town is Republican. A random sample of 100 residents will be taken. What is the probability that more than 71% of those sampled will be Republicans?

Example 4.18. A coin is flipped 25 times. What is the probability that more than 60% of the flips will be tails?

Chapter 5 Inference: Confidence Intervals

Idea for a Confidence Interval
[Figure: number line from 0 to 10]

5.1 Confidence Intervals for a Single Population

Definition 5.1. A Confidence Level 100(1 − α)% indicates that there is a 1 − α probability that a random procedure produced an acceptable result.
Definition 5.2. An Interval Estimate is a range of numbers, determined by following a random procedure, used to estimate an unknown population parameter.
Definition 5.3. A 100(1 − α)% Confidence Interval is an Interval Estimate produced by following a procedure that correctly estimates an unknown population parameter at least 100(1 − α)% of the time, i.e. the procedure has a 100(1 − α)% Confidence Level.

General Procedure for Constructing a Confidence Interval for a Mean or Proportion
1. Decide how confident you want to be in your interval estimate.
2. Decide how precise you want your estimate to be.
3. Using Step 1 and Step 2, determine the necessary sample size n.
4.
If the sample size determined in Step 3 is too large to manage, revisit Step 1 and Step 2.
5. Take a sample of at least size n.
6. Compute $\bar{x}$ or $\hat{p}$.
7. Compute your margin of error E.
8. Construct your Confidence Interval: (Estimate − Margin of Error, Estimate + Margin of Error)
9. State with 100(1 − α)% Confidence that the unknown parameter is captured by the confidence interval.

5.1.1 Confidence Interval for a Population Mean

One possible way to produce a confidence interval for a mean is given below. However, it is unrealistic: it assumes that we know the population standard deviation σ.

$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$, where $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

[Figure: standard normal curve with $-z_{\alpha/2}$, 0, and $z_{\alpha/2}$ marked]

Real Life
• We don’t know the distribution.
• In real life, we don’t know σ.
• We estimate σ with s.
• We estimate the z-score with a t-score:

$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$

[Figure: t curve with $-t_{\alpha/2}$, 0, and $t_{\alpha/2}$ marked]

100(1 − α)% Confidence Interval for µ

$\bar{x} - t_{\alpha/2}\dfrac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\dfrac{s}{\sqrt{n}}$

5.1.2 Confidence Interval for a Population Proportion

In a similar manner to the mean, we can make an estimate for a population proportion.

$\hat{p} - z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$, where $z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$

[Figure: standard normal curve with $-z_{\alpha/2}$, 0, and $z_{\alpha/2}$ marked]

We ended with a method for estimating the unknown population proportion p. This has a problem: we would need to know the population proportion in order to estimate the population proportion. In practice, we replace p with $\hat{p}$ in the margin of error.

100(1 − α)% Confidence Interval for p

$\hat{p} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

5.1.3 Examples

Example 5.1. Twelve leaves were randomly selected from the ground below a single tree and their lengths (cm) were measured. Use the following information to estimate the mean length of all leaves found under this tree. (95% Confidence)

13.65  15.3  15.45  15.7  11.9  10.4  13.6  16  11.30  12.2  11.6  10.5

$\bar{x} = 13.133$, $s = 2.086$

[Figures: Histogram of Data (Frequency vs. Data); Normal Q−Q Plot (Sample Quantiles vs. Theoretical Quantiles)]

Example 5.2.
A survey of 17 randomly selected UTM students was conducted. (Not really.) They were each asked if they had ever seen an episode of The Walking Dead. Their responses are recorded below. A ‘1’ indicates that they said “yes”. A ‘0’ indicates that they said “no”. Estimate with 99% Confidence the true proportion of UTM students that have seen an episode of The Walking Dead.

0 1 0 1 1 0 1 1 1 1 1 1 0 0 1 1 0

5.1.4 Precision

A short Confidence Interval gives a more precise estimate for the unknown population parameter. Precision is controlled by three things:
• The desired and acceptable precision
• The Confidence Level
• The Sample Size

Example 5.3. A moving company is asked to move 10,000 identical blocks. The moving company wants to know how much each block weighs in order to determine what equipment is needed to move the blocks. The owner of the blocks knows that they all weigh about the same amount. Which would be a more useful guess?
• Between 2 and 300 pounds;
• Between 30 and 40 pounds.

Sample Size for Estimating a Population Mean

$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{E}\right)^2$ (round up)

where σ is
• the known population standard deviation,
• an estimate of the population standard deviation taken from a previous study, or
• estimated using the range rule of thumb.

Sample Size for Estimating a Population Proportion

When an estimate $\hat{p}$ of p is known: $n = \hat{p}(1-\hat{p})\left(\dfrac{z_{\alpha/2}}{E}\right)^2$ (round up)

When no estimate of p is known: $n = 0.25\left(\dfrac{z_{\alpha/2}}{E}\right)^2$ (round up)

Example 5.4. You want to estimate the mean SAT score of all college applicants. Possible SAT scores range from 600 to 2400. How many scores must be sampled if you would like to estimate the population mean score to within 100 points with 98% confidence?

Example 5.5. Find the sample size needed to estimate the percentage of Republicans among registered voters in California to within 3 percentage points with 90% confidence.

Example 5.6. A prior Pew Research Center report suggests that 15% of adults have consulted fortune tellers.
Determine the sample size necessary to estimate the percentage of adults that consult fortune tellers to within 3 percentage points with 98% confidence.

5.2 Confidence Intervals for Comparing Two Populations

Many times, it is of interest to compare two populations. We might be interested in the following parameters:
• $p_1 - p_2$
• $\mu_1 - \mu_2$

These differences will still be unknown to us, and we will need to estimate them with confidence intervals in the same manner as with a single population.

(Estimated Difference − Margin of Error, Estimated Difference + Margin of Error)

5.2.1 100(1 − α)% Confidence Intervals for $p_1 - p_2$

Margin of Error: $E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}$, where $\hat{q}_i = 1 - \hat{p}_i$

Confidence Interval: $(\hat{p}_1 - \hat{p}_2) - E < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + E$

Example 5.7. A study was conducted to determine the proportion of people who dream in black and white instead of color. Among 306 people over the age of 55, 68 dream in black and white, and among 298 people under the age of 25, 13 dream in black and white. Construct a 99% confidence interval estimate for the difference in proportions between the two age groups.

5.2.2 100(1 − α)% Confidence Intervals for $\mu_1 - \mu_2$ (Independent)

Margin of Error: $E = t_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

Confidence Interval: $(\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E$

Example 5.8. The accompanying table gives results from a study of the words spoken in a day by men and women. The original data can be found in your textbook, if you’re curious. Construct a 95% confidence interval estimate for the difference in mean number of words spoken by men and women. The data collected from each population looked relatively normal.

Men                      Women
$n_1 = 186$              $n_2 = 210$
$\bar{x}_1 = 15668.5$    $\bar{x}_2 = 16215.0$
$s_1 = 8632.5$           $s_2 = 7301.2$

5.2.3 100(1 − α)% Confidence Intervals for $\mu_d = \mu_1 - \mu_2$ (Dependent Samples)

Margin of Error: $E = t^{*}\dfrac{s_d}{\sqrt{n}}$

Confidence Interval: $\bar{d} - E < \mu_d < \bar{d} + E$

Example 5.9.
A sample of students from two classes of statistics were given a sheet of paper with a straight line drawn on it. Each student was asked to estimate the length of the line in two units of measure, centimeters and inches. The estimates taken in inches were then converted into centimeters. The two estimates could then be compared. Construct a 99% confidence interval for the mean of the difference in the students’ estimates.

cm         9     8.5    10     10    9     8     8.5    9      8.5    9      13     11.5   9      9.5
converted  8.89  9.525  10.16  7.62  7.62  12.7  11.43  10.16  10.16  11.43  11.43  11.43  11.43  9.525

Chapter 6 Inference: Hypothesis Tests

Definition 6.1. A hypothesis is a claim or statement about a property of a population.
Definition 6.2. A hypothesis test is a procedure for making a decision about a property of a population.

Idea for a Hypothesis Test
[Figure: number line from 0 to 10]

6.1 Hypothesis Tests

General Procedure for a Hypothesis Test
1. Decide which parameter of a population(s) you are interested in.
2. Decide upon a significance level α.
3. Make your claim about the parameter.
4. Determine your hypotheses.
(a) State your claim symbolically.
(b) State the “opposite” of your claim symbolically.
(c) One of (4a) or (4b) contains equality. Call this your null hypothesis.
(d) Call the remaining one of (4a) or (4b) the alternative hypothesis.
5. Pick a statistic that will estimate the parameter of interest.

General Procedure for a Hypothesis Test Continued
6. Make a rule for deciding which hypothesis the estimate is consistent with.
• For the sake of argument, we assume that the null hypothesis is true.
(a) Determine what it means for your estimate to be “far” from the parameter of interest. (Critical Values, Significance Level)
(b) The rule: the estimate is inconsistent with the null hypothesis if the estimate is “far” from the parameter. (P-Value ≤ α)
7. Take a sample and compute your estimate.
8. Make a decision by applying your rule.
9. State your conclusion.
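Steps 6 to 9 can be sketched in code. The sketch below is only an illustration under stated assumptions: it assumes a right-tailed test whose test statistic has a standard normal distribution when the null hypothesis is true, and the function name `right_tailed_decision` and the observed value 2.10 are hypothetical, not taken from these notes. It shows that the critical-value rule and the P-value rule give the same decision.

```python
from statistics import NormalDist

def right_tailed_decision(z_observed, alpha):
    """Apply both decision rules for a right-tailed test.

    Assumes the test statistic is standard normal when H0 is true.
    """
    std_normal = NormalDist(0, 1)
    critical_value = std_normal.inv_cdf(1 - alpha)   # z_alpha
    p_value = 1 - std_normal.cdf(z_observed)         # P(Z >= z_observed)
    reject_by_critical_value = z_observed >= critical_value
    reject_by_p_value = p_value <= alpha
    return critical_value, p_value, reject_by_critical_value, reject_by_p_value

# Hypothetical observed statistic, for illustration only.
cv, p, by_cv, by_p = right_tailed_decision(2.10, alpha=0.05)
print(f"critical value = {cv:.3f}, P-value = {p:.4f}")
print("Reject H0" if by_p else "Fail to reject H0")
```

Both rules compare the same tail area in two different ways, which is why either one may be used in the sections that follow.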
6.1.1 Testing a Claim about a Mean

Test Statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with df = n − 1

Hypotheses and Rejection Region (Reject $H_0$ if):
• Left-tailed: $H_0: \mu = \mu_0$, $H_1: \mu < \mu_0$
– Critical Value: $t \le -t_{\alpha,\,n-1}$
– P-value: tcdf(−299, t, df) ≤ α
• Right-tailed: $H_0: \mu = \mu_0$, $H_1: \mu > \mu_0$
– Critical Value: $t \ge t_{\alpha,\,n-1}$
– P-value: tcdf(t, 299, df) ≤ α
• Two-Tailed: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
– Critical Value: $|t| \ge t_{\alpha/2,\,n-1}$
– P-value: 2 × tcdf(|t|, 299, df) ≤ α

Example 6.1. When 40 people used the Weight Watchers diet for one year, their mean weight loss was 3.0 lb and the standard deviation was 4.9 lb. Use a 0.01 significance level to test the claim that the mean weight loss is greater than 0.
1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “far” must $\bar{x}$ be from 0 before we are convinced that the mean weight loss is greater than 0?
• How “far” is $\bar{x}$ actually from the hypothesized value of 0?
2. What is the P-value? Do you reject or fail to reject the null hypothesis?
• If the mean weight loss is equal to zero, how likely is it that I would get an $\bar{x}$ of 3.0 or more?

Example 6.2. Listed below are brain volumes (cm³) of unrelated subjects used in a study. Use a 0.01 significance level to test the claim that the population of brain volumes has a mean equal to 1100.0 cm³.

963 1027 1272 1079 1070 1173 1067 1347 1100 1204

1. What are the critical values? Do you reject or fail to reject the null hypothesis?
• How “far” must $\bar{x}$ be from 1100 before we are convinced that the mean brain volume is not 1100.0 cm³?
• How “far” is $\bar{x}$ actually from the hypothesized value of 1100?
2. What is the P-value? Do you reject or fail to reject the null hypothesis?
• If the mean brain volume is 1100.0 cm³, how likely is it that I would observe an $\bar{x}$ at least as different as ______?
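The right-tailed test in Example 6.1 can be worked through numerically. The following is a minimal sketch, not the calculator procedure used in these notes: instead of tcdf, it approximates the tail area of the t distribution by numerical integration with Simpson's rule, using only the Python standard library. The helper names `t_pdf` and `t_upper_tail` are assumptions introduced for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) with Simpson's rule (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3      # P(T >= |t|), using symmetry about 0
    return upper if t >= 0 else 1 - upper

# Summary statistics from Example 6.1
n, x_bar, s = 40, 3.0, 4.9
mu_0, alpha = 0.0, 0.01

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
df = n - 1
p_value = t_upper_tail(t_stat, df)   # right-tailed: H1 is mu > mu_0

print(f"t = {t_stat:.3f}, df = {df}, P-value = {p_value:.5f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

For a left-tailed test the P-value would be `1 - t_upper_tail(t_stat, df)`, and for a two-tailed test it would be `2 * t_upper_tail(abs(t_stat), df)`, matching the three P-value rules listed above.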
6.1.2 Testing a Claim about a Proportion

Test Statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$

Hypotheses and Rejection Region (Reject $H_0$ if):
• Left-tailed: $H_0: p = p_0$, $H_1: p < p_0$
– Critical Value: $z \le -z_{\alpha}$
– P-value: normcdf(−299, z, 0, 1) ≤ α
• Right-tailed: $H_0: p = p_0$, $H_1: p > p_0$
– Critical Value: $z \ge z_{\alpha}$
– P-value: normcdf(z, 299, 0, 1) ≤ α
• Two-Tailed: $H_0: p = p_0$, $H_1: p \ne p_0$
– Critical Value: $|z| \ge z_{\alpha/2}$
– P-value: 2 × normcdf(|z|, 299, 0, 1) ≤ α

Example 6.3. In a study of 420,095 Danish cell phone users, 135 subjects developed cancer of the brain or nervous system. Test the claim that cell phone users develop cancer of the brain or nervous system at a rate different from the rate of those that do not use cell phones. The cancer rate of non-cell phone users is 0.0340%.
1. What are the critical values? Do you reject or fail to reject the null hypothesis?
• How “far” from 0.000340 must $\hat{p}$ be before we are convinced that the cancer rate of Danish cell phone users is not 0.000340?
• How “far” is $\hat{p}$ actually from the hypothesized value of 0.000340?
2. What is the P-value? Do you reject or fail to reject the null hypothesis?
• If the cancer rate of Danish cell phone users is 0.000340, how likely is it that I would observe a $\hat{p}$ at least as different as ______?

Example 6.4. A Consumer Reports Research Center survey of 427 women showed that 22.0% of them purchased books online. Test the claim that less than 25% of women purchased books online.
1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “far” must $\hat{p}$ be from 0.25 before we are convinced that the proportion of women who purchased books online is less than 0.25?
• How “far” is $\hat{p}$ actually from the hypothesized value of 0.25?
2. What is the P-value? Do you reject or fail to reject the null hypothesis?
• If the proportion of women who purchased books online is 0.25, how likely is it that I would observe a $\hat{p}$ smaller than ______?
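The test statistic and left-tailed P-value for Example 6.4 can be computed as follows. This is a sketch using Python's standard-library `statistics.NormalDist` in place of the calculator's normcdf. The example does not state a significance level, so the code stops at the P-value, which would then be compared with whatever α is chosen.

```python
import math
from statistics import NormalDist

# Values from Example 6.4
n = 427
p_hat = 0.22      # sample proportion
p_0 = 0.25        # hypothesized proportion under H0

standard_error = math.sqrt(p_0 * (1 - p_0) / n)
z = (p_hat - p_0) / standard_error
p_value = NormalDist().cdf(z)    # left-tailed: H1 is p < p_0

print(f"z = {z:.3f}, P-value = {p_value:.4f}")
```

Note that the standard error uses the hypothesized value p_0 rather than the sample proportion, because the test statistic is computed under the assumption that the null hypothesis is true.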
6.2 Hypothesis Tests for comparing two populations

As with confidence intervals, hypothesis tests can be used to compare two populations. The following parameters will be of interest:
• $p_1 - p_2$
• $\mu_1 - \mu_2$

These differences will still be unknown to us. The procedures will be similar to those used for single populations. What are the differences?
• Two samples will be collected.
• The difference between the parameters will be estimated.
• Our null hypothesis will generally be that the two parameters are the same.
• Our central question will become: how far apart must our estimates be before we are convinced that the parameters are different in some way?

6.2.1 Testing a Claim about $p_1 - p_2$

Test Statistic: $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1-\bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $\bar{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$

Hypotheses and Rejection Region (Reject $H_0$ if):
• Left-tailed: $H_0: p_1 = p_2$, $H_1: p_1 < p_2$
– Critical Value: $z \le -z_{\alpha}$
– P-value: normcdf(−299, z, 0, 1) ≤ α
• Right-tailed: $H_0: p_1 = p_2$, $H_1: p_1 > p_2$
– Critical Value: $z \ge z_{\alpha}$
– P-value: normcdf(z, 299, 0, 1) ≤ α
• Two-Tailed: $H_0: p_1 = p_2$, $H_1: p_1 \ne p_2$
– Critical Value: $|z| \ge z_{\alpha/2}$
– P-value: 2 × normcdf(|z|, 299, 0, 1) ≤ α

Example 6.5. A study was conducted to determine the proportion of people who dream in black and white instead of color. Two populations were considered. The first consisted of people over the age of 55, and the second consisted of people under the age of 25. We want to use a 0.01 significance level to test the claim that the proportion of people over 55 who dream in black and white is greater than the proportion for those under 25. Two hundred people over 55 were surveyed, and 54 said that they dream in black and white. Three hundred people under 25 were surveyed, and 47 said that they dream in black and white.
1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “far” must $\hat{p}_1 - \hat{p}_2$ be from 0 before we are convinced that the proportion of adults over 55 who dream in black and white is greater than the proportion for the under 25 group?
• How “different” is $\hat{p}_1$ from $\hat{p}_2$?
2. What is the P-value? Do you reject or fail to reject the null hypothesis?
• If the proportions are actually the same, how likely is it that I would get a difference of $\hat{p}_1 - \hat{p}_2$ at least as big as ______?

6.2.2 Testing a Claim about $\mu_1 - \mu_2$ (Independent)

Test Statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

$df = \dfrac{(A + B)^2}{\dfrac{A^2}{n_1 - 1} + \dfrac{B^2}{n_2 - 1}}$, where $A = s_1^2/n_1$ and $B = s_2^2/n_2$

Hypotheses and Rejection Region (Reject $H_0$ if):
• Left-tailed: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 < \mu_2$
– Critical Value: $t \le -t_{\alpha,\,df}$
– P-value: tcdf(−299, t, df) ≤ α
• Right-tailed: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 > \mu_2$
– Critical Value: $t \ge t_{\alpha,\,df}$
– P-value: tcdf(t, 299, df) ≤ α
• Two-Tailed: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$
– Critical Value: $|t| \ge t_{\alpha/2,\,df}$
– P-value: 2 × tcdf(|t|, 299, df) ≤ α

Example 6.6. The accompanying table gives results from a study of the words spoken in a day by men (Pop. 1) and women (Pop. 2). The original data can be found in your textbook, if you’re curious. Use a 0.01 significance level to test the claim that the mean number of words spoken in a day by men is less than that for women.

Men                      Women
$n_1 = 186$              $n_2 = 210$
$\bar{x}_1 = 15668.5$    $\bar{x}_2 = 16215.0$
$s_1 = 8632.5$           $s_2 = 7301.2$

1. What is the P-value? Do you reject or fail to reject the null hypothesis?
• If the means are actually the same, how likely is it that I would get a difference of $\bar{x}_1 - \bar{x}_2$ at least as small as ______?
2. What would you conclude?
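Because the degrees-of-freedom formula in this section is easy to misread, here is a short sketch that evaluates the test statistic and df for the summary statistics of Example 6.6. The variable names are assumptions chosen to mirror the symbols A and B in the formula above.

```python
import math

# Summary statistics from Example 6.6 (Pop. 1 = men, Pop. 2 = women)
n1, x_bar1, s1 = 186, 15668.5, 8632.5
n2, x_bar2, s2 = 210, 16215.0, 7301.2

A = s1 ** 2 / n1
B = s2 ** 2 / n2

t_stat = (x_bar1 - x_bar2) / math.sqrt(A + B)
df = (A + B) ** 2 / (A ** 2 / (n1 - 1) + B ** 2 / (n2 - 1))

print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The resulting df is generally not a whole number. The left-tail probability for this t and df can then be found with tcdf(−299, t, df) and compared with α = 0.01.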
6.2.3 Testing a Claim about $\mu_d = \mu_1 - \mu_2$ (Dependent)

Test Statistic: $t = \dfrac{\bar{d} - d_0}{s_d/\sqrt{n}}$, with df = n − 1

Hypotheses and Rejection Region (Reject $H_0$ if):
• Left-tailed: $H_0: \mu_d = d_0$, $H_1: \mu_d < d_0$
– Critical Value: $t \le -t_{\alpha,\,n-1}$
– P-value: tcdf(−299, t, df) ≤ α
• Right-tailed: $H_0: \mu_d = d_0$, $H_1: \mu_d > d_0$
– Critical Value: $t \ge t_{\alpha,\,n-1}$
– P-value: tcdf(t, 299, df) ≤ α
• Two-Tailed: $H_0: \mu_d = d_0$, $H_1: \mu_d \ne d_0$
– Critical Value: $|t| \ge t_{\alpha/2,\,n-1}$
– P-value: 2 × tcdf(|t|, 299, df) ≤ α

Example 6.7. A study was conducted to investigate the effectiveness of hypnosis in reducing pain. Results for randomly selected subjects are given in the accompanying table. The values are before and after hypnosis; the measurements are in centimeters on a pain scale. It is claimed that the treatment is effective.

Subject      A     B     C     D      E      F     G     H
Before       6.6   6.5   9.0   10.3   11.3   8.1   6.3   11.6
After        6.8   2.4   7.4   8.5    8.1    6.1   3.4   2.0
Difference

1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “far” must $\bar{d}$ be from 0 before we are convinced that hypnosis is effective?
• How “far” is $\bar{d}$ actually from 0?
2. What is the P-value? Do you reject or fail to reject the null hypothesis?
• If hypnosis is not effective, how likely is it that I would get a $\bar{d}$ greater than ______?

6.3 Other Types of Tests

Many types of tests exist. They all compare how “closely” our sample matches our expectations, i.e. they compare how close a statistic is to some assumed parameter. However, in the next two tests, we can make conclusions about more than just a single parameter.

6.3.1 Goodness of Fit

A goodness of fit test compares many proportions at one time. It can be used to determine how well the distribution of a sample fits with a given distribution.

Hypotheses:
$H_0: p_1 = p_{1,0},\ p_2 = p_{2,0},\ \ldots,\ p_k = p_{k,0}$
$H_a$: at least one proportion is not as claimed.

Test Statistic: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$, with df = k − 1

Rejection Region at Level α: Reject $H_0$ if:
– Critical Values: $\chi^2 \ge \chi^2_{\alpha,\,df}$
– P-Values: χ²cdf(χ², 299, df) ≤ α

Example 6.8.
For a recent year, the following are the numbers of homicides that occurred each month in NYC: 38, 30, 46, 40, 46, 49, 47, 50, 50, 42, 37, 37. Use a 0.05 significance level to test the claim that homicides in NYC are equally likely for each of the twelve months.
1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “big” must χ² be before we are convinced that homicides are not equally likely for each month?
• How “big” is χ² actually?
2. What is the P-value? Do you reject or fail to reject the null hypothesis?
• If homicides are equally likely for each month, how likely is it that I would get a χ² greater than ______?

Example 6.9. Is the die that we rolled in class unfair? Use a 0.05 significance level to test the claim that the outcomes are not equally likely.
1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “big” must χ² be before we are convinced that the die is unfair?
• How “big” is χ²?
2. What is the P-value? Do you reject or fail to reject the null hypothesis?
• If the die is still fair, how likely is it that I would get a χ² greater than ______?

6.3.2 Contingency Tables - Test for Independence

Hypotheses:
$H_0$: The variables are independent.
$H_a$: The variables are dependent.

Test Statistic: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$, with df = (r − 1)(c − 1)

Rejection Region at Level α: Reject $H_0$ if:
– Critical Values: $\chi^2 \ge \chi^2_{\alpha,\,df}$
– P-Values: χ²cdf(χ², 299, df) ≤ α

Example 6.10. In an imaginary study of the “gender effect”, 120 UTM students were observed. Each was classified by gender (M, F) and by hair color (Light, Dark, Red). Use a 0.05 significance level to test the claim that hair color is independent of gender. The observed counts are listed below.

          Red   Dark   Light
Female    5     10     45
Male      15    30     15

1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “big” must χ² be before we are convinced that hair color is dependent on gender?
• How “big” is χ²?
2.
What is the P-value? Do you reject or fail to reject the null hypothesis?
   • If hair color is independent of gender, how likely is it that I would get a $\chi^2$ greater than ____?

What are the expected counts?

Example 6.11. In a clinical trial of the effectiveness of Echinacea for preventing colds, the results in the table below were obtained. Use a 0.10 significance level to test the claim that getting a cold is independent of the treatment group.

  Treatment Group      Placebo    20% Extract    60% Extract
  Got a cold           88         48             42
  Didn’t get a cold    15         4              10

1. What is the critical value? Do you reject or fail to reject the null hypothesis?
   • How “big” must $\chi^2$ be before we are convinced that getting a cold is dependent on the treatment group?
   • How “big” is $\chi^2$?
2. What is the P-value? Do you reject or fail to reject the null hypothesis?
   • If getting a cold is independent of the treatment group, how likely is it that I would get a $\chi^2$ greater than ____?

What are the expected counts?

6.4 Errors in Hypothesis Testing

Even when a hypothesis test is performed properly, our conclusion may be contrary to what is actually true.

• Sampling variability results in uncertain inferences and the possibility of making errors in our decisions.
• Statistical procedures are designed to minimize the probability of committing an error.
• The two types of errors are called
  – Type I Error: Reject $H_0$ when $H_0$ is true.
    ∗ The significance level is the probability of a Type I error.
    ∗ We choose the significance level, so we choose the Type I error rate.
  – Type II Error: Fail to reject $H_0$ when $H_0$ is false.

Hypothesis Test Outcomes

                                        Population
                                  $H_0$ True          $H_0$ False
  Sample   Reject $H_0$           Type I Error        Correct Decision
           Fail to Reject $H_0$   Correct Decision    Type II Error

Example 6.12. Suppose a test of $H_0: \mu = 9$ vs $H_a: \mu \neq 9$ is performed. Describe what a Type I and Type II error would be.

Example 6.13. Suppose a test of $H_0: p = .9$ vs $H_a: p < .9$ is performed.
At the end of the test, you determined that you would fail to reject the null hypothesis. It was later determined that $p$ was actually equal to .76. Did your test produce an erroneous result? If so, what type of error did you make?

Example 6.14. Suppose a test of $H_0: \sigma \le 87$ vs $H_a: \sigma > 87$ is performed. At the end of the test, you determined that you would fail to reject the null hypothesis. It was later determined that $\sigma$ was actually equal to 73. Did your test produce an erroneous result? If so, what type of error did you make?

Chapter 7 Correlation & Regression

Regression techniques allow us to describe the relationship between paired random variables.

7.1 Correlation

Definition 7.1. The linear correlation $\rho$ measures the strength & direction of the linear relationship between a collection of paired random variables.

Definition 7.2. The linear correlation coefficient $r$ measures the strength & direction of the linear relationship between a collection of paired data values. It is used to estimate the linear correlation $\rho$.

7.1.1 Test for Linear Correlation

Hypotheses:

  $H_0: \rho = 0$      There is no linear correlation.
  $H_a: \rho \neq 0$   There is a linear correlation.

Test Statistic:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\ \sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}, \qquad df = n - 2$$

Rejection Region at Level $\alpha$: Reject $H_0$ if:

  Critical Values:  $t \ge t_{\alpha/2,\,df}$ or $t \le -t_{\alpha/2,\,df}$
  P-Values:         $2 \times \text{tcdf}(|t|, 2^{99}, df) \le \alpha$

Example 7.1. Listed below are annual data for various years. The data are weights (metric tons) of lemons imported from Mexico and U.S. car crash fatality rates per 100,000 population. Estimate the strength of the linear relationship between the lemon import data and the crash fatality rates. Is there a linear correlation between car crash fatalities and lemon imports from Mexico?

  Lemon Imports       230     265     358     480     530
  Crash Fatalities    15.9    15.7    15.4    15.3    14.9

Example 7.2.
One classic application of correlation involves the association between the temperature and the number of times a cricket chirps in a minute. Listed below are the number of chirps in one minute and the corresponding temperatures. Estimate the strength of the linear relationship between the two variables. Is there a correlation between the temperature and the number of times a cricket chirps?

  Chirps in 1 min      882     1188    1104    864     1200    1032    960     900
  Temperature (°F)     69.7    93.3    84.3    76.3    88.6    82.6    71.6    79.6

7.2 Regression

Definition 7.3. Given a collection of paired sample data, the regression line is the straight line that “best” fits the scatterplot of data. The regression equation describes the regression line.

$$\hat{y} = b_0 + b_1 x$$

Example 7.3. Listed below are annual data for various years. The data are weights (metric tons) of lemons imported from Mexico and U.S. car crash fatality rates per 100,000 population. Measure the strength of the linear relationship between the lemon import data and the crash fatality data. Additionally, find the best predicted crash fatality rate for a year in which there are 500 metric tons of lemon imports.

  Lemon Imports       230     265     358     480     530
  Crash Fatalities    15.9    15.7    15.4    15.3    14.9

Example 7.4. One classic application of correlation involves the association between the temperature and the number of times a cricket chirps in a minute. Listed below are the number of chirps in one minute and the corresponding temperatures. Measure the strength of the linear relationship between the two variables. Find the best predicted temperature at a time when a cricket chirps 950 times in one minute.

  Chirps in 1 min      882     1188    1104    864     1200    1032    960     900
  Temperature (°F)     69.7    93.3    84.3    76.3    88.6    82.6    71.6    79.6
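
As a rough check on the lemon import data used in Examples 7.1 and 7.3, the formulas for $r$ and $t$ from Section 7.1.1 can be evaluated directly. The sketch below uses only the Python standard library. Several pieces are assumptions rather than part of these notes: the function names are illustrative, the significance level $\alpha = 0.05$ is assumed because the examples do not state one, the critical value $t_{0.025,\,3} = 3.182$ is copied from a standard t table instead of being computed, and the slope and intercept use the usual least-squares formulas $b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ and $b_0 = \bar{y} - b_1\bar{x}$, which the notes do not write out.

```python
from math import sqrt

# Data from Examples 7.1 and 7.3: lemon imports (metric tons) and
# U.S. car crash fatality rates per 100,000 population.
lemons = [230, 265, 358, 480, 530]
fatalities = [15.9, 15.7, 15.4, 15.3, 14.9]

# Assumed inputs: alpha = 0.05 is not given in the examples, and the
# critical value t_{0.025, 3} = 3.182 is read from a t table.
ALPHA = 0.05
T_CRIT = 3.182


def building_blocks(x, y):
    """Return n, sum(x), sum(y) and the three terms shared by r and b1."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = n * sum(a * b for a, b in zip(x, y)) - sx * sy  # n*Sxy - Sx*Sy
    sxx = n * sum(a * a for a in x) - sx ** 2             # n*Sx^2 - (Sx)^2
    syy = n * sum(b * b for b in y) - sy ** 2             # n*Sy^2 - (Sy)^2
    return n, sx, sy, sxy, sxx, syy


def correlation(x, y):
    """Linear correlation coefficient r (Section 7.1.1)."""
    _, _, _, sxy, sxx, syy = building_blocks(x, y)
    return sxy / (sqrt(sxx) * sqrt(syy))


def t_statistic(r, n):
    """Test statistic t = r / sqrt((1 - r^2) / (n - 2)), with df = n - 2."""
    return r / sqrt((1 - r ** 2) / (n - 2))


def regression_line(x, y):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n, sx, sy, sxy, sxx, _ = building_blocks(x, y)
    b1 = sxy / sxx
    b0 = sy / n - b1 * sx / n
    return b0, b1


r = correlation(lemons, fatalities)
t = t_statistic(r, len(lemons))
reject_h0 = abs(t) >= T_CRIT          # two-tailed critical value rule
b0, b1 = regression_line(lemons, fatalities)
y_hat_500 = b0 + b1 * 500             # predicted rate at 500 metric tons

print(f"r = {r:.3f}, t = {t:.2f}, reject H0: {reject_h0}")
print(f"y-hat = {b0:.2f} + ({b1:.5f})x, prediction at x = 500: {y_hat_500:.2f}")
```

Under these assumptions the data give $r \approx -0.959$ and $t \approx -5.86$, so $|t|$ exceeds 3.182 and $H_0: \rho = 0$ is rejected. The fitted line is roughly $\hat{y} = 16.49 - 0.00282x$, which predicts a rate of about 15.08 at 500 metric tons. The same functions can be applied to the cricket data in Examples 7.2 and 7.4 by swapping in those lists, with a critical value looked up for $df = 6$.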
