### here - Jeremy Entner

```
Chapter 1
Introduction to Statistics
1.1 Preliminary Definitions
Definition 1.1. Data are observations (such as measurements, genders, survey responses) that have been
collected.
Definition 1.2. Statistics is a collection of methods for planning studies and experiments, obtaining data,
and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on
data.
Definition 1.3. A Population is the entire collection of individuals or measurements about which information is desired.
Definition 1.4. A Sample is a subset of the population that has been selected for study.
Definition 1.5. A statistic is a numerical description of a SAMPLE.
Definition 1.6. A parameter is a numerical description of a POPULATION.
Definition 1.7. Statistical Inference consists of methods or techniques for generalizing from a sample to
the population from which the sample is selected.
Definition 1.8. Sampling Variability describes the extent to which samples differ from one another.
1.2 Framework of Statistics
[Diagram: Population, Sample, Parameter, statistic]
Idea for a Confidence Interval
[Figure: axis marked 0 to 10]
Idea for a Hypothesis Test
[Figure: axis marked 0 to 10]
Chapter 2
Probability
Remark 2.1. The information regarding probability can be found in Chapter 4 of your textbook.
• How do we measure likeliness?
• How do we determine what is considered (un)likely?
Definition 2.1. The Probability of an event is
•
•
[Figure: scale from 0 to 1]
Definition 2.2. A significance level α is the largest probability an unlikely event can have.
2.1 Definitions & Examples
Definition 2.3. The result of a single trial of a given procedure is called an outcome.
Definition 2.4. An event is any collection of results or outcomes of a procedure.
Definition 2.5. A simple event is an outcome or an event that cannot be further broken down into simpler
components.
Definition 2.6. The sample space for a procedure consists of all possible simple events.
Example 2.1. A bucket contains some numbered balls. Eventually, one ball will be removed at random.
1. Find the sample space for this procedure.
2. Let A denote the event that the outcome is even. Describe A in terms of simple events.
Example 2.2. Two numbered balls are removed, one at a time, from a bucket, and each ball is replaced after it
is removed. The numbers on the balls are written down.
1. Find the sample space for this procedure.
2. Let B denote the event that at least one ball is a three. Describe B in terms of simple
events.
3. Is the event “at least one ball is a three” a simple event?
4. If the numbers on the two balls were added together, what would be the sample space?
2.2 Some Methods for Computing Probabilities of an Event
There are three approaches to determining the probability of an event:
1. Subjective Probabilities
2. Relative Frequency Approximation
3. Classical Approach
Theorem 2.1. The Law of Large Numbers states that as a procedure is repeated again and again, the
relative frequency approximation for the probability of an event tends to approach the actual probability.
Example 2.3. 65 men and women were surveyed. They were asked “Which do you like better: Pollen or
Propolis?” The answers are tallied below.
          Pollen  Propolis     Total
Men           12        14        26
Women         24        15        39
Total         36        29        65
a. What is the probability that a randomly selected survey respondent will be a woman?
b. What is the probability that a randomly selected survey respondent will prefer pollen?
c. If you consider only the female responses, what is the probability that you would randomly select one of
the women that prefer pollen?
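A minimal Python sketch for parts a, b, and c (variable names are illustrative): each answer is a count from the table divided by the appropriate total. In part c the total is the 39 women, not all 65 respondents. Exact fractions avoid rounding.

~~~python
from fractions import Fraction

# Counts from the table in Example 2.3.
counts = {
    "Men":   {"Pollen": 12, "Propolis": 14},
    "Women": {"Pollen": 24, "Propolis": 15},
}

grand_total = sum(sum(row.values()) for row in counts.values())  # 65 respondents
women_total = sum(counts["Women"].values())                       # 39 women
pollen_total = sum(row["Pollen"] for row in counts.values())      # 36 prefer pollen

p_woman = Fraction(women_total, grand_total)                             # part a
p_pollen = Fraction(pollen_total, grand_total)                           # part b
p_pollen_among_women = Fraction(counts["Women"]["Pollen"], women_total)  # part c

print(p_woman, p_pollen, p_pollen_among_women)  # 3/5 36/65 8/13
~~~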
Example 2.4. A colored ball is removed, at random, from a bucket. What is the probability that the ball
will be green?
Example 2.5. Two fair four-sided dice are rolled. What is the probability that both numbers will be even?
Example 2.6. 256 fair four-sided dice are rolled. What is the probability that all the numbers will be even?
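Both examples can be checked with the classical approach: favorable outcomes divided by total outcomes. In this minimal Python sketch, Example 2.5 lists all 16 outcomes directly. Example 2.6 counts instead of listing. Each die has 2 even faces out of 4, which gives 2^256 favorable outcomes among 4^256.

~~~python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4]                      # one fair four-sided die
even_faces = [f for f in faces if f % 2 == 0]

# Example 2.5: list all 4 * 4 = 16 equally likely outcomes, count the favorable ones.
outcomes = list(product(faces, repeat=2))
favorable = [o for o in outcomes if all(f % 2 == 0 for f in o)]
p_two_even = Fraction(len(favorable), len(outcomes))

# Example 2.6: 4**256 outcomes cannot be listed, but they can be counted:
# 2 even faces per die gives 2**256 favorable outcomes.
p_256_even = Fraction(len(even_faces) ** 256, len(faces) ** 256)

print(p_two_even)          # 1/4
print(float(p_256_even))   # roughly 8.6e-78
~~~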
2.3 Counting
2.3.1 Fundamental Counting Rule
Given two sequential events, if the first can occur m ways, and the second event can occur n ways, then the
number of ways both events can occur in sequence is equal to m × n.
Example 2.7. An airline has 6 routes from City A to City B, and 9 routes from City B to City C. If you
were to use this airline, how many routes could you take from City A to City C?
Example 2.8. How many ways can a family with 6 members be lined up to take a family portrait?
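This small Python sketch applies the Fundamental Counting Rule to Examples 2.7 and 2.8 (variable names are illustrative). A lineup of 6 people is a sequence of 6 choices: 6 options for the first spot, then 5, and so on down to 1.

~~~python
import math

# Example 2.7: 6 ways for the first leg, then 9 ways for the second.
routes_ab, routes_bc = 6, 9
routes_ac = routes_ab * routes_bc

# Example 2.8: 6 choices for the first spot, 5 for the second, ..., 1 for the last.
lineups = math.prod(range(6, 0, -1))   # 6 * 5 * 4 * 3 * 2 * 1

print(routes_ac, lineups)   # 54 720
~~~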
2.3.2 Order or no order? Repeats or not?
Definition 2.7. Permutations of items are arrangements in which different sequences of the same items
are counted separately.
Definition 2.8. Combinations of items are arrangements in which different sequences of the same items
are not counted separately.
Selecting r of n distinct objects:
               Unordered   Ordered
Repeats                    n^r
No Repeats     nCr         nPr
Example 2.9. You have 4 extra tickets for a concert and 7 friends. How many different groups of your
friends could accompany you to the concert?
Example 2.10. You have three astronauts, Anna, George, and Michele, on the first Mission to Mars. For
the first Marswalk, two of them will be allowed to leave their flying saucer, and walk on the planet; one will
have to remain behind. How many different ways can they be assigned a job for their first landing? If they
are randomly given their assignment, what is the probability that George will be left on the ship?
Example 2.11. How many five-letter “words” can be made with the letters F, S, H, E if only these four letters
are used and a letter can be used more than once? What is the probability that a five-letter “word” will start
with the letter F?
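The three filled cells of the table map onto Python's math.comb (nCr), math.perm (nPr), and exponentiation (n^r). This sketch applies them to Examples 2.9 through 2.11. For Example 2.10 it assumes the two walkers do the same job, so an assignment is fixed by which 2 astronauts walk, or equivalently by who stays behind. The helper function names are illustrative.

~~~python
import math
from fractions import Fraction

def ordered_with_repeats(n, r):
    """n^r: ordered selections in which items may repeat."""
    return n ** r

def permutations(n, r):
    """nPr = n! / (n - r)!: ordered, no repeats."""
    return math.perm(n, r)

def combinations(n, r):
    """nCr = n! / (r! (n - r)!): unordered, no repeats."""
    return math.comb(n, r)

# Example 2.9: a group of 4 from 7 friends; order does not matter, nobody twice.
groups = combinations(7, 4)

# Example 2.10 (assumption: both walkers have the same job), so choosing
# the 2 walkers from 3 astronauts also decides who stays on the ship.
assignments = combinations(3, 2)
p_george_stays = Fraction(1, assignments)   # one assignment leaves George aboard

# Example 2.11: 5 positions, 4 possible letters each, repeats allowed.
words = ordered_with_repeats(4, 5)
p_starts_with_f = Fraction(ordered_with_repeats(4, 4), words)  # first letter fixed as F

print(groups, assignments, p_george_stays, words, p_starts_with_f)
# 35 3 1/3 1024 1/4
~~~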
When Some Items Are Identical to Others: Another Permutation Rule
Example 2.12. How many different ways can the letters in TENNESSEE be arranged? If these letters are
randomly arranged, what is the probability that they will spell TENNESSEE?
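The rule for identical items divides n! by the factorial of each letter's count: n!/(n_1! n_2! ... n_k!). Here is a minimal Python sketch that counts the letters with collections.Counter. The function name is illustrative.

~~~python
import math
from collections import Counter
from fractions import Fraction

def distinct_arrangements(word):
    """n! / (n1! n2! ... nk!) where n1, ..., nk count each repeated letter."""
    denominator = math.prod(math.factorial(c) for c in Counter(word).values())
    return math.factorial(len(word)) // denominator

n = distinct_arrangements("TENNESSEE")   # 9! / (1! 4! 2! 2!)
p_spells_it = Fraction(1, n)             # exactly one arrangement spells TENNESSEE

print(n, p_spells_it)   # 3780 1/3780
~~~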
2.4 The Addition Rule for Probabilities
Definition 2.9. A compound event is any event combining two or more simple events.
Notation 2.1. More notation that will be used
• (A or B) =
• (A and B) =
P (A or B) = P (A) + P (B) − P (A and B)
Example 2.13. Suppose the following: P (A) = .9, P (B) = .8, P (A and B) = .77. Find P (A or B).
Example 2.14. In a group of 101 students, 40 are juniors, 50 are female, and 22 are female juniors. Find
the probability that a student picked from this group at random is either a junior or female.
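This Python sketch writes the addition rule as a one-line function (name illustrative) and applies it to Examples 2.13 and 2.14. The decimal inputs are entered as fractions so that .9 + .8 − .77 comes out exactly .93.

~~~python
from fractions import Fraction

def p_or(p_a, p_b, p_a_and_b):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

# Example 2.13 (decimals as Fractions so the arithmetic is exact)
ex_2_13 = p_or(Fraction("0.9"), Fraction("0.8"), Fraction("0.77"))

# Example 2.14: 101 students; 40 juniors, 50 female, 22 both
n = 101
ex_2_14 = p_or(Fraction(40, n), Fraction(50, n), Fraction(22, n))

print(float(ex_2_13), ex_2_14)   # 0.93 68/101
~~~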
Example 2.15. A family of 6 is going to have their picture taken. The photographer is going to randomly
line everyone up. What is the probability that the mother ends up in the first chair or the father ends up in
the sixth chair?
Example 2.16. A single card is chosen at random from a standard deck of 52 playing cards. What is the
probability of choosing a king or a club?
Example 2.17. Two dice are rolled. The first is a fair 6-sided die. The second is a fair 4-sided die. Once
they are rolled, the two numbers on the two dice are used to create a two-digit number. The number from the
6-sided die is used to make the tens digit. The number from the 4-sided die is used to make the ones digit.
What is the probability that the resulting number is odd or begins with an even number?
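When every outcome is equally likely, Examples 2.15 to 2.17 can each be checked by brute force: list the sample space, then count the outcomes in the event. In this Python sketch, the labels c1 to c4 for the other four family members and the card encoding are hypothetical, chosen only so the outcomes can be enumerated.

~~~python
from fractions import Fraction
from itertools import permutations, product

def prob(outcomes, event):
    """Classical approach: favorable outcomes / total outcomes (all equally likely)."""
    outcomes = list(outcomes)
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# Example 2.15: chairs 1..6 are positions 0..5 of each equally likely ordering.
family = ["mother", "father", "c1", "c2", "c3", "c4"]   # c1..c4: hypothetical labels
ex_2_15 = prob(permutations(family),
               lambda seats: seats[0] == "mother" or seats[5] == "father")

# Example 2.16: a card is a (rank, suit) pair.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
ex_2_16 = prob(product(ranks, suits),
               lambda card: card[0] == "K" or card[1] == "clubs")

# Example 2.17: tens digit from the 6-sided die, ones digit from the 4-sided die.
numbers = [10 * a + b for a, b in product(range(1, 7), range(1, 5))]
ex_2_17 = prob(numbers, lambda x: x % 2 == 1 or (x // 10) % 2 == 0)

print(ex_2_15, ex_2_16, ex_2_17)   # 3/10 4/13 3/4
~~~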
Definition 2.10. Events A and B are disjoint (or mutually exclusive) if they cannot occur at the same
time. (That is, they do not overlap.)
Probability of the Intersection of Two Disjoint Events
If events A and B are disjoint, then
P (A and B) =
Addition Rule for DISJOINT Events
If events A and B are disjoint, then
P (A or B) = P (A) + P (B) − P (A and B)
=
Example 2.18. Suppose that A and B are disjoint events such that the following is true: P (A) = .9, P (B) =
.06. Find P (A or B).
Example 2.19. In a group of 201 students, 70 are freshmen, 41 are sophomores, 30 are juniors, 50 are seniors,
and 10 are graduate students. Find the probability that a student picked from this group at random is either
a freshman or a sophomore.
Example 2.20. A family of 6 is going to have their picture taken. The photographer is going to randomly
line everyone up. What is the probability that the mother ends up in the first chair or the father ends up in
the first chair?
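For disjoint events the P(A and B) term is 0, so the addition rule reduces to P(A) + P(B). Here is a short Python sketch of Examples 2.18 to 2.20. The helper name is illustrative.

~~~python
from fractions import Fraction

def p_or_disjoint(p_a, p_b):
    """Addition rule when P(A and B) = 0."""
    return p_a + p_b

ex_2_18 = p_or_disjoint(Fraction("0.9"), Fraction("0.06"))

# Example 2.19: no student is in two class years, so the events are disjoint.
counts = {"freshman": 70, "sophomore": 41, "junior": 30, "senior": 50, "graduate": 10}
total = sum(counts.values())   # 201
ex_2_19 = p_or_disjoint(Fraction(counts["freshman"], total),
                        Fraction(counts["sophomore"], total))

# Example 2.20: mother and father cannot both sit in the first chair.
ex_2_20 = p_or_disjoint(Fraction(1, 6), Fraction(1, 6))

print(float(ex_2_18), ex_2_19, ex_2_20)   # 0.96 37/67 1/3
~~~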
Example 2.21. Two dice are rolled. The first is a fair 6-sided die. The second is a fair 4-sided die. Once
they are rolled, the two numbers on the two dice are used to create a two-digit number. The number from the
6-sided die is used to make the tens digit. The number from the 4-sided die is used to make the ones digit.
What is the probability that the resulting number is odd or ends with a 2?
Example 2.22. A bucket contains some bouncy balls that are colored as well as numbered. The following
table indicates the number of each kind of ball in the bucket.
        Yellow   Green  Orange     Red    Blue   Brown  Purple   Total
Odd          4       6       3      23       2       7      71     116
Even        45      68      13      25       9       7      11     178
Total       49      74      16      48      11      14      82     294
1. If a ball is randomly chosen, what is the probability that the ball will be blue even ball or a purple
odd ball?
2. If a ball is randomly chosen, what is the probability that the ball will be blue or purple?
3. If a ball is randomly chosen, what is the probability that the ball will be even, or purple?
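This Python sketch answers the three questions by reading the counts directly from the table (the dictionary layout is just one convenient choice). Parts 1 and 2 combine disjoint pieces of the table. Part 3 needs the full addition rule because the purple-even cell belongs to both events.

~~~python
from fractions import Fraction

colors = ["Yellow", "Green", "Orange", "Red", "Blue", "Brown", "Purple"]
# Rows of the table in Example 2.22.
odd = dict(zip(colors, [4, 6, 3, 23, 2, 7, 71]))
even = dict(zip(colors, [45, 68, 13, 25, 9, 7, 11]))
total = sum(odd.values()) + sum(even.values())   # 294 balls

def p(count):
    """Classical probability of drawing one of `count` balls."""
    return Fraction(count, total)

blue = odd["Blue"] + even["Blue"]        # 11
purple = odd["Purple"] + even["Purple"]  # 82
all_even = sum(even.values())            # 178

q1 = p(even["Blue"]) + p(odd["Purple"])            # disjoint cells
q2 = p(blue) + p(purple)                           # disjoint columns
q3 = p(all_even) + p(purple) - p(even["Purple"])   # overlap: purple and even

print(q1, q2, q3)   # 40/147 31/98 83/98
~~~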
Rule for Complementary Events
Example 2.23. Find the indicated probabilities.
1. Suppose $P(A) = .23$. Find $P(\bar{A})$.
2. Suppose $P(\bar{A}) = .12$, $P(\bar{B}) = .21$, $P(\bar{C}) = .22$. Find $P(B)$.
Example 2.24. Same bucket as used in Example 2.22. What is the probability that a randomly selected
ball is neither brown nor even?
Example 2.25. Two dice are rolled. The first is a fair 6-sided die. The second is a fair 4-sided die. Once
they are rolled, the two numbers on the two dice are used to create a two-digit number. The number from the
6-sided die is used to make the tens digit. The number from the 4-sided die is used to make the ones digit.
What is the probability that the resulting number is not 42?
Example 2.26. A single card is chosen at random from a standard deck of 52 playing cards. What is the
probability of choosing neither a king nor a club?
Example 2.27. In a group of 700 families, 75 had more than 3 children, 125 had exactly 3 children, 300
had 2 children, and 100 had only a single child. If one family is randomly selected, what is the probability
that it will have no children?
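Each of Examples 2.23 to 2.27 uses the complement rule, P(Ā) = 1 − P(A). In this Python sketch, Example 2.23 part 2 is read as giving P(B̄) = .21. The helper name is illustrative.

~~~python
from fractions import Fraction

def complement(p):
    """Complement rule: P(not A) = 1 - P(A)."""
    return 1 - p

# Example 2.23
part1 = complement(Fraction("0.23"))      # P(A-bar) from P(A) = .23
part2 = complement(Fraction("0.21"))      # P(B) from P(B-bar) = .21

# Example 2.24: "neither brown nor even" is the complement of "brown or even".
total, even, brown, brown_even = 294, 178, 14, 7
ex_2_24 = complement(Fraction(even + brown - brown_even, total))

# Example 2.25: only 4 then 2 gives 42, one of the 6 * 4 = 24 outcomes.
ex_2_25 = complement(Fraction(1, 6 * 4))

# Example 2.26: complement of "king or club" (4 + 13 - 1 = 16 cards).
ex_2_26 = complement(Fraction(4 + 13 - 1, 52))

# Example 2.27: 75 + 125 + 300 + 100 = 600 families have at least one child.
ex_2_27 = complement(Fraction(75 + 125 + 300 + 100, 700))

print(float(part1), float(part2), ex_2_24, ex_2_25, ex_2_26, ex_2_27)
# 0.77 0.79 109/294 23/24 9/13 1/7
~~~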
2.5 Conditional Probability & The Multiplication Rule
Definition 2.11. Let A and B be two events. The conditional probability of A given B, denoted P(A|B), is the
probability that A happens given the information that B occurs. It is the probability of an event with the
additional information that some other event has already occurred.
Example 2.28. A bucket contains some bouncy balls that are colored as well as numbered. The following
table indicates the number of each kind of ball in the bucket. The contents of the bucket are separated into
two buckets, odds and evens. If we randomly select a single ball from the odd bucket, what is the probability
that the ball is red?
        Yellow   Green  Orange     Red    Blue   Brown  Purple   Total
Odd          4       6       3      23       2       7      71     116
Even        45      68      13      25       9       7      11     178
Total       49      74      16      48      11      14      82     294
Example 2.29. Two dice are rolled. The first is a fair 6-sided die. The second is a fair 4-sided die. Once
they are rolled, the two numbers on the two dice are used to create a two-digit number. The number from the
6-sided die is used to make the tens digit. The number from the 4-sided die is used to make the ones digit.
If a two appeared on the 6-sided die, what is the probability that the resulting number is odd?
Example 2.30. Five cards are dealt from a freshly shuffled deck of cards. Suppose the first four cards are
kings. What is the probability that the fifth card will be an ace?
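Conditional probability can be computed by restricting the sample space: keep only the outcomes where the given event happened, then count. Here is a Python sketch of Examples 2.28 to 2.30. Variable names are illustrative.

~~~python
from fractions import Fraction
from itertools import product

# Example 2.28: only the odd bucket matters once we know the ball is odd.
odd_bucket = {"Yellow": 4, "Green": 6, "Orange": 3, "Red": 23,
              "Blue": 2, "Brown": 7, "Purple": 71}
ex_2_28 = Fraction(odd_bucket["Red"], sum(odd_bucket.values()))

# Example 2.29: keep only outcomes where the 6-sided die (tens digit) shows 2.
given_two = [10 * a + b for a, b in product(range(1, 7), range(1, 5)) if a == 2]
ex_2_29 = Fraction(sum(1 for x in given_two if x % 2 == 1), len(given_two))

# Example 2.30: after four kings, 48 cards remain and all 4 aces are among them.
ex_2_30 = Fraction(4, 52 - 4)

print(ex_2_28, ex_2_29, ex_2_30)   # 23/116 1/2 1/12
~~~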
Definition 2.12. Two events A and B are independent if the occurrence of one event does not affect
the probability of the occurrence of the other event. If A and B are not independent, they are said to be
dependent.
If two events A and B are independent, then
• P (A|B) = P (A)
• P (B|A) = P (B)
• P (B and A) = P (A)P (B)
Example 2.31. Let A and B be two events, and suppose that P(A|B) = .8 and P(A) = .81. Are the events A
and B independent?
Example 2.32. Let A and B be two independent events, and suppose that P(B) = .8 and P(A) = .42. Find
P(B and A).
Example 2.33. An urn contains 2 colored balls: 1 blue & 1 red. Two balls are removed, one at a time, and each
is replaced after it is drawn. What is the probability that the second ball is red, if the first was blue?
Example 2.34. An urn contains 2 colored balls: 1 blue & 1 red. Two balls are removed, one at a time,
without replacement. What is the probability that the second ball is red, if the first was
blue?
Remark 2.2. The method used for selecting, or sampling items, is very important and can determine
whether two events are independent or dependent.
• Selections (Sampling) without replacement: Dependent events.
• Selections (Sampling) with replacement: Independent events.
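This Python sketch works Examples 2.31 to 2.34. The last two compare sampling with replacement (all ordered pairs, via itertools.product) against sampling without replacement (ordered pairs of distinct balls, via itertools.permutations). That is the distinction drawn in Remark 2.2. Helper names are illustrative.

~~~python
from fractions import Fraction
from itertools import permutations, product

# Example 2.31: independence would require P(A|B) to equal P(A).
p_a_given_b, p_a = Fraction("0.8"), Fraction("0.81")
independent_2_31 = p_a_given_b == p_a            # False, so A and B are dependent

# Example 2.32: for independent events, P(B and A) = P(A) P(B).
ex_2_32 = Fraction("0.42") * Fraction("0.8")

urn = ["blue", "red"]

def second_red_given_first_blue(draws):
    """Among equally likely ordered draws, the share of first-blue draws whose second ball is red."""
    first_blue = [d for d in draws if d[0] == "blue"]
    return Fraction(sum(1 for d in first_blue if d[1] == "red"), len(first_blue))

with_replacement = second_red_given_first_blue(list(product(urn, repeat=2)))    # Example 2.33
without_replacement = second_red_given_first_blue(list(permutations(urn, 2)))  # Example 2.34

print(independent_2_31, float(ex_2_32), with_replacement, without_replacement)
# False 0.336 1/2 1
~~~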
Formal Multiplication Rule
P (A and B) = P (A) × P (B|A)
Example 2.35. A bucket contains several colored bouncy balls: red, yellow, and blue. One at a time, two
balls are removed from the bucket. After the first ball is removed, it is not replaced. What is the
probability that the first ball is red and the second bouncy ball is green?
Example 2.36. A bucket contains several colored bouncy balls: red, yellow, and blue. One at a time, two
balls are removed from the bucket. After the first ball is removed, it is replaced. What is the probability
that the first ball is red and the second bouncy ball is green?
Example 2.37. If two cards are dealt from a deck without replacing them, what is the probability that an
ace will be dealt first and a two will be dealt second?
Example 2.38. If two cards are dealt from a deck with replacement, what is the probability that an ace
will be dealt first and a two will be dealt second?
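This Python sketch applies the formal multiplication rule to the card questions in Examples 2.37 and 2.38. The helper name is illustrative. Examples 2.35 and 2.36 are left out because the notes do not give the bucket's counts.

~~~python
from fractions import Fraction

def p_and(p_a, p_b_given_a):
    """Formal multiplication rule: P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a

aces, twos, deck = 4, 4, 52

# Example 2.37: without replacement the deck shrinks to 51 before the second card.
ex_2_37 = p_and(Fraction(aces, deck), Fraction(twos, deck - 1))

# Example 2.38: with replacement the second draw sees the full deck again.
ex_2_38 = p_and(Fraction(aces, deck), Fraction(twos, deck))

print(ex_2_37, ex_2_38)   # 4/663 1/169
~~~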
2.5.1 More Conditional Probability
Definition 2.13. Let A and B be two events. The conditional probability of A given B, P (A|B), is the
probability that A happens given the information that B occurs. It is the probability of an event with the
additional information that some other event has already occurred.
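$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$, and, exchanging the roles of A and B, $P(B \mid A) = \dfrac{P(A \text{ and } B)}{P(A)}$.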
Example 2.39. A statistics professor tosses two coins that cannot be seen by any of the students. One
student asks: “Did one of the coins turn up heads?” Suppose the professor answered “yes”; find the
probability that both coins turned up heads.
Example 2.40. An urn contains 3 colored balls: 2 blue & 1 red. Two balls are removed, one at a time,
without replacement. What is the probability that the second ball is red, if the first was
blue?
Example 2.41. A student answers a multiple choice examination question that has 4 possible answers.
Suppose that the probability that the student knows the answer to the question is 0.80 and the probability
that the student guesses is 0.20. Also, if the student guesses, the probability of a correct guess is 0.25. If
the question is answered correctly, what is the probability that the student really knew the correct answer?
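This Python sketch applies the formula P(A|B) = P(A and B)/P(B) to Examples 2.39 to 2.41. For Example 2.41 it assumes that a student who knows the answer always answers correctly (probability 1). The example implies this but does not state it. Variable names are illustrative.

~~~python
from fractions import Fraction
from itertools import product

# Example 2.39: B = "at least one head", A = "both heads".
tosses = list(product("HT", repeat=2))
p_b = Fraction(sum(1 for t in tosses if "H" in t), len(tosses))
p_a_and_b = Fraction(sum(1 for t in tosses if t == ("H", "H")), len(tosses))
ex_2_39 = p_a_and_b / p_b          # P(A|B) = P(A and B) / P(B)

# Example 2.40: first draw blue (2 of 3), then red from the remaining 1 blue + 1 red.
p_first_blue = Fraction(2, 3)
p_blue_then_red = p_first_blue * Fraction(1, 2)
ex_2_40 = p_blue_then_red / p_first_blue

# Example 2.41: "correct" splits into "knew it" and "guessed and was right".
p_know, p_guess, p_lucky = Fraction("0.8"), Fraction("0.2"), Fraction("0.25")
p_correct = p_know * 1 + p_guess * p_lucky   # assumes knowing always gives a correct answer
ex_2_41 = (p_know * 1) / p_correct

print(ex_2_39, ex_2_40, ex_2_41)   # 1/3 1/2 16/17
~~~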
Chapter 3
Probability & Random Variables
Remark 3.1. This is chapter 5 in the textbook.
Our goal is to compute probabilities for random procedures/phenomena whose outcomes are numbers.
Definition 3.1. A random variable is a variable (typically represented by x) that has a single numerical
value, determined by chance, for each outcome of a procedure. A random variable is a variable whose
value is a numerical outcome of a random procedure/phenomenon.
Example 3.1. Examples of Random Variables.
• The weight of a randomly selected package taken from the post office.
• The amount of time it takes to walk from the first floor to the fourth floor.
• The temperature of a randomly selected popsicle.
• The amount of money you spend on your next tank of gas.
• The number of lunches served in the cafeteria on a given day.
• The color of a ball pulled out of a bucket.
There are two ways to assign probabilities to a random variable. These provide two types of random
variables:
Definition 3.2. A Continuous Random Variable has infinitely many values, and the collection of values
is not countable.
Definition 3.3. A Discrete Random Variable has a collection of possible values that is finite or countable.
• Random variables will usually ( but not always ) be denoted by capital letters from the end of the
alphabet.
• When a random variable describes a random phenomenon, the sample space S lists the possible values
of the random variable.
Definition 3.4. A Probability Distribution is a description that gives the probability for each possible
value of a random variable. It is often expressed as a table, a formula, or a graph.
Examples of Probability Distributions
Example 3.2. A bucket contains 4 green, 3 brown and 3 purple bouncy balls. A ball is randomly selected
from the bucket. We check the color of the ball. (We could say that we count the number of green balls
observed.)
Example 3.3. A bucket contains 4 green, 3 brown and 3 purple bouncy balls. Four balls are randomly removed from
the bucket, one at a time, with each ball replaced after it is drawn. We count the number of green balls observed.
Definition 3.5. A Binomial Probability Distribution results from a procedure that meets all the following requirements:
a.) The procedure has a fixed number of trials. A trial is a single observation.
b.) The trials must be independent. The outcome of any one trial has no effect on the probabilities in the other trials.
c.) Each trial must have all outcomes classified into two categories (commonly referred to as success and failure).
d.) The probability of a success remains the same for all trials.
If X has the Binomial distribution B(n, p) with n observations and probability p of success on each observation,
the possible values of X are 0, 1, 2, . . . , n. If k is any one of these values, the binomial probability is
$P(X = k) = {}_nC_k \, p^k (1 - p)^{n-k}$.
The mean and standard deviation of a binomial random variable X are
$\mu = np$
$\sigma = \sqrt{np(1 - p)}$
Example 3.4. A coin is tossed four times.
1. What is the probability distribution of the discrete random variable X that counts the number of
2. Find P (X > 1).
3. Find P (X ≥ 1).
4. Find P (X ≤ 1).
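This Python sketch implements the binomial formula from Definition 3.5, using math.comb for nCk and exact fractions. The first question in Example 3.4 is cut off, so the sketch assumes X counts heads. For a fair coin the answers are the same if X counts tails. Function names are illustrative.

~~~python
import math
from fractions import Fraction

def binomial_pmf(k, n, p):
    """P(X = k) = nCk p^k (1 - p)^(n - k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_distribution(n, p):
    """The whole probability distribution as {value: probability}."""
    return {k: binomial_pmf(k, n, p) for k in range(n + 1)}

# Example 3.3: 4 draws with replacement, success = green, p = 4/10.
green = binomial_distribution(4, Fraction(4, 10))

# Example 3.4 (assumption: X counts heads in 4 tosses of a fair coin).
n_coin, p_coin = 4, Fraction(1, 2)
coin = binomial_distribution(n_coin, p_coin)
p_more_than_1 = sum(prob for k, prob in coin.items() if k > 1)
p_at_least_1 = 1 - coin[0]
p_at_most_1 = coin[0] + coin[1]

mean = n_coin * p_coin                                # mu = np
sd = math.sqrt(n_coin * p_coin * (1 - p_coin))        # sigma = sqrt(np(1 - p))

print(p_more_than_1, p_at_least_1, p_at_most_1, mean, sd)   # 11/16 15/16 5/16 2 1.0
~~~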
Definition 3.6. If X has the Poisson distribution, Poisson(µ), with mean number of occurrences equal
to µ, the possible values of X are 0, 1, 2, 3, . . . . If k is any one of these values, the Poisson probability is
$P(X = k) = \dfrac{\mu^k e^{-\mu}}{k!}$.
The mean is µ. The standard deviation of a Poisson random variable X is $\sigma = \sqrt{\mu}$.
Remark 3.3. A Poisson Probability Distribution results from a procedure that meets all the following
requirements:
a.) The random variable counts the number of occurrences of an event over a time interval;
b.) The occurrences must be random, independent, and uniformly distributed over the time interval.
Example 3.5. Assume that the mean number of aircraft accidents in the United States is 8.5 per month.
Use the Poisson distribution to find the probability that in a month there will be
a.) 6 aircraft accidents.
b.) at least 5 aircraft accidents.
c.) no more than 7 aircraft accidents.
d.) Over a one year period, how many aircraft accidents would you expect there to be?
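This Python sketch implements the Poisson formula from Definition 3.6 (function names are illustrative). "At least 5" is computed as 1 − P(X ≤ 4). Part d assumes the monthly mean of 8.5 holds for all 12 months, which gives an expected yearly count of 12 × 8.5.

~~~python
import math

def poisson_pmf(k, mu):
    """P(X = k) = mu^k e^(-mu) / k! (Definition 3.6)."""
    return mu ** k * math.exp(-mu) / math.factorial(k)

def poisson_cdf(k, mu):
    """P(X <= k): add the probabilities of 0, 1, ..., k."""
    return sum(poisson_pmf(i, mu) for i in range(k + 1))

mu = 8.5                      # mean aircraft accidents per month (Example 3.5)
a = poisson_pmf(6, mu)        # a) exactly 6
b = 1 - poisson_cdf(4, mu)    # b) at least 5 = 1 - P(X <= 4)
c = poisson_cdf(7, mu)        # c) no more than 7
d = 12 * mu                   # d) expected count over 12 months, assuming the same monthly rate

print(round(a, 4), round(b, 4), round(c, 4), d)   # 0.1066 0.9256 0.3856 102.0
~~~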
PDF vs CDF
More Examples of Random Variables - Continuous
• The probability distribution of X is described by a density curve (a graph).
• The probability of any event is the area under the density curve and above the x axis, and between
the values of X that make up the event.
• The total area under a density curve is equal to 1, and a density curve never goes below the x-axis.
• Every individual outcome for a continuous random variable has probability zero.
Definition 3.7. A continuous random variable has a uniform distribution if its values are spread evenly
over the range of possible values. The density curve (graph) of a uniformly distributed random variable is a
rectangle.
Example 3.6. The amount of time a particular subway train will wait at a station is uniformly distributed
between 5 and 10 minutes. Find the probability that the train will wait
1. exactly 6 minutes.
2. at most 6 minutes.
3. at least 7 minutes.
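For a uniform distribution, a probability is the width of the interval divided by the total width of the rectangle. Here is a Python sketch of Example 3.6 (function name illustrative). A single point has width 0, which matches the rule that every individual outcome of a continuous random variable has probability zero.

~~~python
def uniform_prob(lo, hi, a, b):
    """P(a <= X <= b) for X uniform on [lo, hi]: overlap length / total length."""
    overlap = max(0.0, min(b, hi) - max(a, lo))
    return overlap / (hi - lo)

lo, hi = 5, 10                                  # wait time in minutes (Example 3.6)
exactly_6 = uniform_prob(lo, hi, 6, 6)          # a single point has zero width
at_most_6 = uniform_prob(lo, hi, lo, 6)
at_least_7 = uniform_prob(lo, hi, 7, hi)

print(exactly_6, at_most_6, at_least_7)   # 0.0 0.2 0.6
~~~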
Definition 3.8. A continuous random variable X has a normal distribution with mean µ and standard
deviation σ if its density curve is given by
$$y = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}.$$
[Figure: four panels titled “Normal Distribution”, each plotting Density against x value, with the x value axis marked at µ−3σ, µ−2σ, µ−1σ, µ, µ+1σ, µ+2σ, and µ+3σ]
Example 3.7. The heights of fully grown white oak trees are normally distributed with a mean height of
90 feet and standard deviation of 3.5 feet.
1. What is the probability that a randomly selected fully grown white oak tree is less than 87 feet tall?
2. What is the probability that a randomly selected fully grown white oak tree is greater than 94 feet
tall?
Example 3.8. The ACT is an exam used by colleges and universities to evaluate undergraduate applicants.
The test scores are normally distributed. In a recent year, the mean test score was 20.1 and the standard
deviation was 4.3.
1. What is the probability that a randomly selected ACT score is between 16 and 24?
2. What is the probability that a randomly selected ACT score is greater than 22.5?
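Python's standard library includes statistics.NormalDist (Python 3.8 and later), whose cdf method gives the area to the left of a value. This sketch of Examples 3.7 and 3.8 uses it. An area to the right is 1 minus the cdf, and an area between two values is a difference of two cdfs.

~~~python
from statistics import NormalDist

oak = NormalDist(mu=90, sigma=3.5)      # Example 3.7: heights in feet
p_under_87 = oak.cdf(87)                # area to the left of 87
p_over_94 = 1 - oak.cdf(94)             # area to the right of 94

act = NormalDist(mu=20.1, sigma=4.3)    # Example 3.8: ACT scores
p_16_to_24 = act.cdf(24) - act.cdf(16)  # area between 16 and 24
p_over_22_5 = 1 - act.cdf(22.5)

for prob in (p_under_87, p_over_94, p_16_to_24, p_over_22_5):
    print(round(prob, 4))
~~~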
[Figure: t-distributions, density curves (vertical axis 0.0 to 0.4) plotted against X from −3 to 3]
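Definition 3.9. A continuous random variable X has a t-distribution with k degrees of freedom if its
density curve is given by
$$y = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\left(\frac{k}{2}\right)} \left(1 + \frac{x^2}{k}\right)^{-\frac{k+1}{2}}.$$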
[Figure: chi-square distributions, density curves (vertical axis 0.00 to 0.30) plotted against X from 0 to 12]
Definition 3.10. A continuous random variable X has a χ²-distribution with k degrees of freedom if its
density curve is given by
$$y = \frac{1}{2^{k/2}\,\Gamma\left(\frac{k}{2}\right)} x^{\frac{k}{2} - 1} e^{-\frac{x}{2}}, \quad x > 0.$$
3.1 Measuring the Center of a Distribution
[Figure: four distributions of x, in panels labeled p = 0.8, p = 0.5, p = 0.25, and p = 0.1]
Definition 3.11. The mean of a probability distribution, or the mean of a random variable, is a number
that indicates the center, or location, of the random variable’s distribution.
• If X is a discrete random variable whose distribution is
    Possible Value of X    x_1       x_2       ...    x_k
    Probability            P(x_1)    P(x_2)    ...    P(x_k)
  then the mean of X is computed as follows: $\mu_X = x_1 P(x_1) + x_2 P(x_2) + \cdots + x_k P(x_k)$
• The mean for a random variable X is also called the EXPECTED VALUE OF X.
• If you repeat a random procedure a very large number of times and average the observed values of the
random variable, the average will be very close to the mean of the random variable.
• The mean is what you expect to see on average.
• If a random variable X has a Binomial Distribution with n trials and probability of success p, then
µX = np.
• You will not need to compute the mean for a continuous random variable.
[Figure: density curve of a continuous random variable X]
3.2 Measuring the Spread of a Distribution
Definition 3.12. The standard deviation of a probability distribution, or the standard deviation of a
random variable, is a number that indicates the spread, or dispersion, of the random variable’s distribution.
• If X is a discrete random variable with mean µ, and distribution
    Possible Value of X    x_1       x_2       ...    x_k
    Probability            P(x_1)    P(x_2)    ...    P(x_k)
  then the standard deviation of X is
  $\sigma = \sqrt{(x_1 - \mu)^2 P(x_1) + (x_2 - \mu)^2 P(x_2) + \cdots + (x_k - \mu)^2 P(x_k)}$
Example 3.9. Determine the mean, standard deviation, and variance for the following distribution:
X         -1     2    10
P(X)     .25    .6   .15
• If a random variable X has a Binomial Distribution with n trials and probability of success p, then
$\sigma_X = \sqrt{np(1 - p)}$.
• You will not need to compute the standard deviation for a continuous random variable.
[Figure: density curve of a continuous random variable X]
Variance
• The Variance of a random variable X is its standard deviation squared.
• The Variance of a random variable is another measure of the spread of a random variable’s distribution.
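This Python sketch works Example 3.9 with the formulas from Definitions 3.11 and 3.12. It computes the variance first; the standard deviation is then its square root.

~~~python
import math

values = [-1, 2, 10]          # possible values of X in Example 3.9
probs = [0.25, 0.6, 0.15]     # their probabilities
assert abs(sum(probs) - 1) < 1e-12   # a valid distribution sums to 1

mean = sum(x * p for x, p in zip(values, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
sd = math.sqrt(variance)

print(round(mean, 4), round(variance, 4), round(sd, 4))   # 2.45 11.6475 3.4128
~~~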
3.3 Percentiles & Critical Values
Percentiles
Definition 3.13. The 100αth percentile is a number, $P_{100\alpha}$, that divides the probability distribution of
a random variable X into two parts where
$P(X \le P_{100\alpha}) \ge \alpha$ and $P(X \ge P_{100\alpha}) \ge 1 - \alpha$.
• The 100αth percentile, $P_{100\alpha}$, separates the bottom 100α% of a distribution from the
top 100(1 − α)%.
Normal:          InvNorm(α, µ, σ)
t-distribution:  InvT(α, df)
                 or MATH ↓ Solver...   0 = α − tcdf(−299, X, df)   ENTER ALPHA ENTER
Chi-Square:      MATH ↓ Solver...   0 = α − χ²cdf(0, X, df)   ENTER ALPHA ENTER
Example 3.10. Find $P_{99}$ for a t-distributed random variable with 5 degrees of freedom.
Example 3.11. Find $P_{95}$ for a χ²-distributed random variable with 3 degrees of freedom.
Example 3.12. Find $P_{90}$ for a normally distributed random variable with µ = 5 and σ = 3.
Example 3.13. In a large section of a statistics class, the points for the final exam are normally distributed
with a mean of 72 and a standard deviation of 9. Find the lowest score on the final exam that would qualify
a student for an A, if an A should include the top 10% of the class.
Example 3.14. The annual per capita utilization of apples (in pounds) in the United States can be approximated by a normal distribution with µ = 17.4 lb. and σ = 4 lb. What annual per capita utilization of
apples represents the 10th percentile?
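This is a Python sketch of Examples 3.10 to 3.14. The normal percentiles use statistics.NormalDist.inv_cdf, the counterpart of InvNorm. The standard library has no inverse t or χ² function; a library such as SciPy would normally be used. So, in the spirit of the calculator's Solver step 0 = α − cdf(X), this sketch integrates the densities from Definitions 3.9 and 3.10 with Simpson's rule and solves for X by bisection. The helper names, search intervals, and step count are assumptions chosen for illustration, and the χ² helper assumes k ≥ 2.

~~~python
import math
from statistics import NormalDist

def t_pdf(x, k):
    """t density with k degrees of freedom (Definition 3.9)."""
    c = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return c * (1 + x * x / k) ** (-(k + 1) / 2)

def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom (Definition 3.10); assumes k >= 2."""
    if x < 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for the integral of f from a to b (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def t_cdf(x, k):
    # The t density is symmetric about 0, so P(X <= x) = 0.5 + (area from 0 to x).
    return 0.5 + simpson(lambda u: t_pdf(u, k), 0.0, x)

def chi2_cdf(x, k):
    # Area under the chi-square density from 0 to x.
    return simpson(lambda u: chi2_pdf(u, k), 0.0, x) if x > 0 else 0.0

def solve_percentile(cdf, alpha, lo, hi, tol=1e-6):
    """Bisection: find X with cdf(X) = alpha, i.e. solve 0 = alpha - cdf(X) on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p99_t5 = solve_percentile(lambda x: t_cdf(x, 5), 0.99, -50, 50)       # Example 3.10
p95_chi3 = solve_percentile(lambda x: chi2_cdf(x, 3), 0.95, 0, 100)   # Example 3.11
p90 = NormalDist(5, 3).inv_cdf(0.90)                                  # Example 3.12
a_cutoff = NormalDist(72, 9).inv_cdf(0.90)                            # Example 3.13 (top 10%)
apples_p10 = NormalDist(17.4, 4).inv_cdf(0.10)                        # Example 3.14

for value in (p99_t5, p95_chi3, p90, a_cutoff, apples_p10):
    print(round(value, 3))
~~~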
Critical Values
Definition 3.14. A critical value is a number that is used to separate unusual ( unlikely ) values for a
random variable from those values that are expected ( likely ) to occur.
• The placement of a critical value will depend on:
– the distribution of the random variable;
– the significance level α used to define what it means for an event to be unlikely.
• Some questions will require the determination of two critical values.
• ( Usual, Expected, Common, Likely ) values will generally be considered values “close” to the mean.
• ( Unusual, Unexpected, Surprising, Unlikely ) values will generally be considered values “far” from the
mean.
Critical Values for Specific Distributions
Notation 3.1. $z_\alpha$, or $z^*$, denotes a critical value for a standard normal random variable with an area, or
probability, of α to its right.
Example 3.15. Find $z_{.05}$.
[Figure: standard normal]
Notation 3.2. $t_{\alpha,k}$, or $t^*$, denotes a critical value for a t random variable, with k degrees of freedom, with
an area, or probability, of α to its right.
Example 3.16. Find $t_{.05,3}$.
[Figure: t-distribution]
Notation 3.3. $\chi^2_{\alpha,k}$ denotes a critical value for a χ² random variable, with k degrees of freedom, with an
area, or probability, of α to its right.
Example 3.17. Find $\chi^2_{.05,4}$.
[Figure: chi-square]
• The critical values given above define ( Unusual, Unexpected, Surprising, Unlikely ) values to be
numbers that are “far” from zero.
• Later, we will define these values to be the distance between what we expect to happen, and what
actually happens.
• This translates into the idea that unlikely values are those that are a “great distance” (relatively) from
what we expect.
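Since $z_\alpha$ has area α to its right, it is the 100(1 − α)th percentile, so an inverse normal cdf finds it. This Python sketch uses statistics.NormalDist for Example 3.15. It also shows the two-critical-value case, where α = .05 is split equally between the two tails. By the same reasoning, $t_{.05,3}$ and $\chi^2_{.05,4}$ are the 95th percentiles of their distributions. The helper name is illustrative.

~~~python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

def z_critical(alpha):
    """z_alpha: area alpha to its right, i.e. the 100(1 - alpha)th percentile."""
    return Z.inv_cdf(1 - alpha)

z_05 = z_critical(0.05)                    # Example 3.15
area_right = 1 - Z.cdf(z_05)               # should recover alpha = .05

# Two critical values: alpha = .05 split as .025 in each tail.
lower, upper = -z_critical(0.025), z_critical(0.025)

print(round(z_05, 3), round(area_right, 3), round(lower, 2), round(upper, 2))
# 1.645 0.05 -1.96 1.96
~~~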
Tail Events & Tail Probabilities
Definition 3.15. A one-tail event for a random variable X is an event such as
{X ≥ t}, {X ≤ t},
where t is any number.
Definition 3.16. A two-tail event for a random variable X is an event such as
{X > t or X < r},
where r < t are any numbers.
Definition 3.17. A tail probability is the probability of a tail event (one-tail or two-tail).
– Percentiles and Critical Values are defined in terms of tail events.
– If a tail probability is smaller than a given significance level, α, then the tail event will be
considered unlikely.
– If a tail probability is smaller than a given significance level, α, then any outcome within that tail
event will be considered ( Unusual, Unexpected, Surprising, Unlikely ).
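This Python sketch compares a one-tail and a two-tail probability against α = .05. It uses the standard normal distribution as an illustrative X and t = 1.8 as an arbitrary cutoff, with r = −1.8 for the two-tail event. The same cutoff can be unlikely as a one-tail event but not as a two-tail event.

~~~python
from statistics import NormalDist

X = NormalDist()   # illustrative choice: X is standard normal
alpha = 0.05
t = 1.8            # arbitrary cutoff; the two-tail event uses r = -t

one_tail = 1 - X.cdf(t)                    # P(X >= 1.8)
two_tail = X.cdf(-t) + (1 - X.cdf(t))      # P(X < -1.8 or X > 1.8)

for name, prob in (("one-tail", one_tail), ("two-tail", two_tail)):
    verdict = "unlikely" if prob < alpha else "not unlikely"
    print(name, round(prob, 4), verdict)
# one-tail 0.0359 unlikely
# two-tail 0.0719 not unlikely
~~~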
• Depending upon the situation, and significance level α, we may define “( Unusual, Unexpected, Surprising, Unlikely ) values” to be values that are
– Far from the mean AND too small
– Far from the mean AND too big
– Far from the mean AND either too big or too small
Chapter 4
Samples
[Diagram: Population, Sample, Parameter, statistic]
Remark 4.1. You should read Chapter 1 from your textbook. We will cover only the information necessary
for the procedures that will be introduced later.
4.1 Goals
• Describe a population’s unknown distribution;
• Describe a population’s unknown parameters;
• Describe the nature of the relationship between populations.
4.2 Collecting Data
Definition 4.1. SAMPLE:
1. VERB To sample a population is the act of selecting individuals, items, objects, or members of a
population.
2. NOUN A Sample is the subset of the population that has been selected.
Definition 4.2. A simple random sample of n subjects is selected in such a way that every possible
sample of the same size n has the same probability of being selected.
• All of the procedures that will be discussed later will use a simple random sample.
• A simple random sample is a selection of n subjects without replacement. This means we have dependent selections from a finite population.
• If the sample size is no more than 5% of the overall population, we will treat the selections as being
independent.
• We will think of our samples as selections made with replacement.
• For examples in class, we will take samples ( make selections ) with replacement.
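This Python sketch contrasts the two kinds of selection using the standard library's random module. random.sample draws without replacement, so no subject can appear twice; random.choices draws with replacement. The population of 1000 labeled subjects and the sample size of 40 are hypothetical. The last check applies the 5% guideline above.

~~~python
import random

population = list(range(1, 1001))   # hypothetical population of 1000 labeled subjects
n = 40                              # hypothetical sample size
rng = random.Random(0)              # fixed seed only so the run is repeatable

srs = rng.sample(population, n)            # without replacement: no subject twice
with_repl = rng.choices(population, k=n)   # with replacement: repeats are possible

# 5% guideline: treat selections as independent if n is at most 5% of the population.
treat_as_independent = n <= 0.05 * len(population)
print(len(set(srs)) == n, treat_as_independent)   # True True
~~~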
Other Sample Types
Definition 4.3. In a systematic sample, we select some starting point and then select every kth element
in a population.
Definition 4.4. In a stratified sample, we subdivide the population into at least two different subgroups ( or strata ) so that subjects within the same subgroup share the same characteristics. Then we draw a sample from each subgroup ( or stratum ).
Definition 4.5. In cluster sampling, we first divide the population area into sections ( or clusters ). Then
we randomly select some of those clusters and choose all the members from those selected clusters.
Definition 4.6. With a convenience sample, we simply use results that are very easy to get.
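The sample types in Definitions 4.3 through 4.6 can be sketched in Python as follows. The population of 100 numbered subjects, the odd/even strata, and the ten blocks used as clusters are all hypothetical, chosen only so the differences between the schemes are easy to see.

```python
import random

rng = random.Random(7)
population = list(range(1, 101))  # hypothetical subjects numbered 1..100

def systematic_sample(pop, k, start):
    """Every k-th element, beginning at index `start` (0 <= start < k)."""
    return pop[start::k]

def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample of the same size from every stratum."""
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [x for name in chosen for x in clusters[name]]

systematic = systematic_sample(population, 10, rng.randrange(10))

strata = {"odd": [x for x in population if x % 2 == 1],
          "even": [x for x in population if x % 2 == 0]}
stratified = stratified_sample(strata, 5, rng)

clusters = {f"block {i}": population[i * 10:(i + 1) * 10] for i in range(10)}
cluster = cluster_sample(clusters, 2, rng)

convenience = population[:10]  # simply the first ten subjects, because they are easiest to reach
```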
Definition 4.7. In an observational study, we observe and measure specific characteristics, but do not
attempt to modify the subjects being studied.
Definition 4.8. In an experiment, we apply some treatment and then proceed to observe its effects on the
subjects. ( Subjects in experiments are called experimental units.)
Types of Observational Studies
Definition 4.9. In a cross-sectional study, data are observed, measured, and collected at one point in time.
Definition 4.10. In a retrospective study, data are collected from the past by going back in time ( through examination of records, interviews, and so on ).
Definition 4.11. In a prospective study, data are collected in the future from groups sharing common
factors.
4.3
Describing Populations using Graphs of Sample Data
Graphs of Sample ( Quantitative ) data can be used to make guesses about the distribution of a population.
We will look at the graphs to determine whether they appear to be:
• Normal
• Uniform
• Symmetric
• Skewed
Definition 4.12. A ( relative ) frequency histogram is a graph consisting of bars of equal width drawn adjacent to each other ( unless there are gaps in the data ). The horizontal scale represents classes of quantitative data values and the vertical scale represents ( relative ) frequencies. The heights of the bars correspond to the ( relative ) frequency values.
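The bar heights of a ( relative ) frequency histogram are just class counts. A minimal sketch, assuming classes of equal width that start at a chosen lower boundary; the ten data values and the class layout are hypothetical.

```python
def frequency_table(data, lower, width, n_classes):
    """Count the values in each class [lower + i*width, lower + (i+1)*width).

    Values outside all classes are ignored. Returns (frequencies, relative frequencies).
    """
    counts = [0] * n_classes
    for x in data:
        i = int((x - lower) // width)
        if 0 <= i < n_classes:
            counts[i] += 1
    n = len(data)
    return counts, [c / n for c in counts]

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]   # hypothetical quantitative sample
counts, rel = frequency_table(data, lower=0.5, width=2, n_classes=4)
```

Each entry of `counts` is the height of one bar; `rel` gives the heights of the relative frequency version.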
Remark 4.2. Having a guess about the SHAPE of a distribution allows you to make a guess about how to compute probabilities about future samples from the same type of distribution.
• If we do not know the SHAPE of a distribution, we CANNOT make any GOOD guesses about the probability of an event.
Assessing Normality with a Small Data Set
With a small data set, the shape of a distribution may not be very clear. It is very important to us to be
able to identify populations with Normal Distributions. A normal quantile plot can assist us with this.
[Figure: normal quantile plots for a Normal Distribution and a Non-Normal Distribution]
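A sketch of the points behind a normal quantile plot: each sorted data value is paired with a z-score. Using the cumulative area (i − 0.5)/n for the i-th value is one common convention and is an assumption here; the correlation of the points is added only as a rough numeric "how straight is it" check. The data are the leaf lengths from Example 5.1.

```python
from math import sqrt
from statistics import NormalDist

def normal_quantile_points(data):
    """Pair each sorted value with the z-score whose cumulative area is (i - 0.5)/n."""
    xs = sorted(data)
    n = len(xs)
    zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, xs))

def pearson_r(pairs):
    """Correlation of the plotted points; values near 1 mean the points lie close to a line."""
    n = len(pairs)
    mz = sum(z for z, _ in pairs) / n
    mx = sum(x for _, x in pairs) / n
    szx = sum((z - mz) * (x - mx) for z, x in pairs)
    szz = sum((z - mz) ** 2 for z, _ in pairs)
    sxx = sum((x - mx) ** 2 for _, x in pairs)
    return szx / sqrt(szz * sxx)

leaves = [13.65, 15.3, 15.45, 15.7, 11.9, 10.4, 13.6, 16, 11.30, 12.2, 11.6, 10.5]
points = normal_quantile_points(leaves)
r = pearson_r(points)
```

Plotting `points` (z on one axis, data on the other) gives the quantile plot; a roughly straight pattern suggests a Normal population.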
Stemplot
A Stemplot ( Stem & Leaf plot ) is a quick way to look at the SHAPE of a distribution if you're working by hand and have a relatively small data set.
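A stemplot can also be built mechanically: split each value into a stem ( all digits but the last ) and a leaf ( the last digit ). This sketch assumes non-negative integer data and uses the dataset from Examples 4.4 through 4.8.

```python
from collections import defaultdict

def stemplot(data):
    """Stem = all digits but the last, leaf = last digit (non-negative integers only)."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[stem].append(str(leaf))
    lo, hi = min(leaves), max(leaves)
    lines = []
    for stem in range(lo, hi + 1):          # keep empty stems so gaps stay visible
        row = " ".join(leaves.get(stem, []))
        lines.append(f"{stem:>3} | {row}".rstrip())
    return lines

data = [2, 3, 4, 6, 7, 7, 8, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78]
lines = stemplot(data)
for line in lines:
    print(line)
```

The long tail of short rows toward the large stems shows the right skew of this dataset.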
[Figure: example stemplot with Stem and Leaf columns]
4.3.1
Other Types of Graphics
Definition 4.13. A scatterplot is a plot of paired (x, y) quantitative data with a horizontal x-axis and
vertical y-axis.
Definition 4.14. A time-series graph is a graph of time-series data, which are quantitative data that have been collected over a period of time.
Definition 4.15. A Pareto chart is a bar graph for categorical data, with the bars arranged in descending
order according to frequencies.
Definition 4.16. A Pie Chart is a graph that depicts categorical data as slices of a circle, in which each
slice is proportional to the frequency count for the category.
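The ordering rule of a Pareto chart and the slice sizes of a pie chart are simple computations. This sketch uses the hair-color column totals from Example 6.10 ( Red 20, Dark 40, Light 60 ) as illustrative categorical counts.

```python
def pareto_order(counts):
    """Categories sorted by frequency, largest first (the bar order of a Pareto chart)."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

def pie_angles(counts):
    """Slice angle in degrees for each category, proportional to its frequency count."""
    total = sum(counts.values())
    return {category: 360 * c / total for category, c in counts.items()}

counts = {"Red": 20, "Dark": 40, "Light": 60}
bars = pareto_order(counts)
angles = pie_angles(counts)
```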
4.4
Estimating Population Parameters using Sample Data
With a probability distribution for a random variable, we defined several numbers that could be used to describe the characteristics of the distribution.
• Center
– Mean
–
– Standard Deviation
–
–
• Proportion of Successes
• Percentiles
–
–
–
–
If we have a population, but don’t know its distribution, we probably don’t know some of these parameters.
We will need a method to estimate these parameters, based on samples that we take.
Remark 4.3. Not every parameter is interesting for every population.
4.4.1
Estimating a Population Mean
Definition 4.17. The sample mean is an estimate of the mean of a probability distribution. It can be
found by adding all the sample data values together, and dividing by the sample size.
$$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$$
Example 4.1. Find the mean of the following sample values:
• It is a statistic.
• It is one possible measure of the center of a SAMPLE.
• It is an estimate of the center of a probability distribution.
• Its value will change depending upon the sample taken.
• One extreme value can change the value of the mean substantially.
• Sample means drawn from the same population tend to vary less than other measures of center.
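A small sketch of the sample mean and of the sensitivity listed above: one extreme value pulls $\bar{x}$ a long way. The sample values are hypothetical.

```python
from statistics import mean

def sample_mean(values):
    """x-bar: add the sample values together and divide by the sample size n."""
    return sum(values) / len(values)

sample = [2, 4, 6, 8]            # hypothetical sample values
with_extreme = sample + [100]    # the same sample plus one extreme value
xbar = sample_mean(sample)
xbar_extreme = sample_mean(with_extreme)
```

Here a single added value moves the mean from 5 to 24.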
Estimating the SAMPLE MEAN from a Frequency Distribution
| # | Frequency |
|---|---|
| 0.5 − 1.4 | 9 |
| 1.5 − 2.4 | 0 |
| 2.5 − 3.4 | 81 |
| 3.5 − 4.4 | 1 |
| 4.5 − 5.4 | 3 |
| 5.5 − 6.4 | 12 |
| N | 106 |
Estimating the SAMPLE MEAN from a Relative Frequency Distribution
| # | Relative Frequency |
|---|---|
| 0.5 − 1.4 | 0.25 |
| 1.5 − 2.4 | 0.30 |
| 2.5 − 3.4 | 0.10 |
| 3.5 − 4.4 | 0.20 |
| 4.5 − 5.4 | 0.00 |
| 5.5 − 6.4 | 0.15 |
| Total | 1.0 |
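For both tables, the usual estimate replaces every value in a class by the class midpoint x and computes $\bar{x} \approx \sum f\,x / \sum f$. This sketch assumes the midpoint is the average of the class limits ( e.g. (0.5 + 1.4)/2 = 0.95 ); with relative frequencies, $\sum f = 1$.

```python
classes = [(0.5, 1.4), (1.5, 2.4), (2.5, 3.4), (3.5, 4.4), (4.5, 5.4), (5.5, 6.4)]
freqs = [9, 0, 81, 1, 3, 12]                        # frequency distribution, N = 106
rel_freqs = [0.25, 0.30, 0.10, 0.20, 0.00, 0.15]    # relative frequency distribution

def mean_from_frequencies(classes, freqs):
    """x-bar ≈ Σ(f·x) / Σf, using each class midpoint x to stand in for its values."""
    xs = [(lo + hi) / 2 for lo, hi in classes]
    return sum(f * x for f, x in zip(freqs, xs)) / sum(freqs)

mean_freq = mean_from_frequencies(classes, freqs)    # 337.7 / 106, about 3.186
mean_rel = mean_from_frequencies(classes, rel_freqs) # about 2.8
```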
4.4.2
Estimating a Population Standard Deviation
Definition 4.18. The sample standard deviation is an estimate of the standard deviation of a probability distribution. It is denoted by s and measures how much the sample data deviate from the sample mean $\bar{x}$.
$$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$$
Example 4.2. Find the sample standard deviation of the following sample values:
Facts about the sample standard deviation
• $s \ge 0$
• $s = 0$ only if all of the data values are the same.
• s can increase greatly if a single additional data value that looks very different from the others is added.
• The units for s are the same as the units on the original data.
• $s^2$, the sample variance, is another measure of variation. It is the square of the sample standard deviation.
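A direct translation of the formula for s, checked against Python's `statistics.stdev` ( which also divides by n − 1 ). The data are the leaf lengths from Example 5.1, whose s is reported there as 2.086.

```python
from math import sqrt
from statistics import stdev

def sample_std(values):
    """s = sqrt( Σ(x - x̄)² / (n - 1) )."""
    n = len(values)
    xbar = sum(values) / n
    return sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

leaves = [13.65, 15.3, 15.45, 15.7, 11.9, 10.4, 13.6, 16, 11.30, 12.2, 11.6, 10.5]
s = sample_std(leaves)
s_library = stdev(leaves)
```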
Estimating the STANDARD DEVIATION from a Dataset
| # | Frequency |
|---|---|
| 0.5 − 1.4 | 9 |
| 2.5 − 3.4 | 81 |
| 5.5 − 6.4 | 1 |
| 6.5 − 7.4 | 3 |
| 9.5 − 10.4 | 12 |
| N | 106 |
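One common shortcut formula for a frequency distribution, assumed here, uses class midpoints x and n = Σf:
$s = \sqrt{\dfrac{n\sum(f x^2) - \left(\sum f x\right)^2}{n(n-1)}}$. The sketch applies it to the table above and checks it against listing every midpoint f times.

```python
from math import sqrt
from statistics import stdev

classes = [(0.5, 1.4), (2.5, 3.4), (5.5, 6.4), (6.5, 7.4), (9.5, 10.4)]
freqs = [9, 81, 1, 3, 12]

def std_from_frequencies(classes, freqs):
    """s = sqrt( [nΣ(f·x²) - (Σf·x)²] / [n(n - 1)] ) with class midpoints x and n = Σf."""
    xs = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    sfx = sum(f * x for f, x in zip(freqs, xs))
    sfx2 = sum(f * x * x for f, x in zip(freqs, xs))
    return sqrt((n * sfx2 - sfx ** 2) / (n * (n - 1)))

s = std_from_frequencies(classes, freqs)

# Same value, computed by listing each midpoint f times.
expanded = [(lo + hi) / 2 for (lo, hi), f in zip(classes, freqs) for _ in range(f)]
check = stdev(expanded)
```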
Definition 4.19. The range of a data set is the measure of spread found by subtracting the smallest data
value from the largest data value.
Range Rule of Thumb
$$\sigma \approx \dfrac{\text{Range}}{4}$$
4.4.3
Estimating a Proportion of Successes
Definition 4.20. The sample proportion is an estimate of the probability of a success p for some random procedure. It is denoted by $\hat{p}$.
$$\hat{p} = \dfrac{\#\text{ of successes}}{n}$$
Example 4.3. Find the sample proportion for the following samples:
4.4.4
Estimating Percentiles
Definition 4.21. The 100α-Percentile of a dataset, $P_{100\alpha}$, is a number that breaks the ordered dataset into two groups with about 100α% of the dataset less than, or equal to, $P_{100\alpha}$ and about 100(1 − α)% of the dataset greater than, or equal to, $P_{100\alpha}$.
Finding the Percentile of a Data Value
$$\text{Percentile of } x = \dfrac{\#\text{ of data values} < x}{n} \times 100$$
(Round up)
Example 4.4. Find the percentile of 18 for the following data:
2, 3, 4, 6, 7, 7, 8, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78
Converting a Percentile to a Data Value
$$L = \dfrac{k}{100} \times n$$
Example 4.5. Find the value of the 20th percentile, P20 , for the following data:
2, 3, 4, 6, 7, 7, 8, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78
Example 4.6. Find the value of the 33rd percentile, P33 , for the following data:
2, 3, 4, 6, 7, 7, 8, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78
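A sketch of both conversions for Examples 4.4 through 4.6. The notes give the locator L but not how to use it; the rule coded here ( if L is a whole number, average the L-th and (L+1)-th sorted values, otherwise round L up and take that value ) is an assumption based on the common textbook convention, valid for 0 < k < 100.

```python
data = [2, 3, 4, 6, 7, 7, 8, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78]

def percentile_of(x, data):
    """(# of values < x) / n × 100, rounded up as the notes instruct."""
    below = sum(1 for v in data if v < x)
    return -(-below * 100 // len(data))        # integer ceiling division

def value_of_percentile(k, data):
    """Assumed locator rule with L = (k/100)·n and 1-based positions in the sorted data."""
    xs = sorted(data)
    n = len(xs)
    if (k * n) % 100 == 0:                     # L is a whole number (exact integer test)
        L = k * n // 100
        return (xs[L - 1] + xs[L]) / 2
    L = (k * n + 99) // 100                    # round L up
    return xs[L - 1]

p_18 = percentile_of(18, data)                 # Example 4.4
p20 = value_of_percentile(20, data)            # Example 4.5
p33 = value_of_percentile(33, data)            # Example 4.6
```

Integer arithmetic is used for L so that a whole-number locator is never misjudged because of floating-point rounding.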
4.4.5
Boxplot - Using Sample Percentiles
Definition 4.22. For a set of data, the 5-number summary consists of these five values:
Minimum, $Q_1$, $Q_2$, $Q_3$, Maximum
Example 4.7. Give the 5-number summary for the following data:
2, 3, 4, 6, 7, 7, 8, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78
Definition 4.23. A boxplot is a graph of a data set that consists of a number line extending from the
minimum to the maximum data value, and a box drawn at the first, second and third quartiles.
Example 4.8. Construct a boxplot for the following data:
2, 3, 4, 6, 7, 7, 8, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78
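A sketch of the 5-number summary for Examples 4.7 and 4.8, assuming $Q_1 = P_{25}$, $Q_2 = P_{50}$, $Q_3 = P_{75}$ computed with the same assumed locator rule as the percentile sketch ( average two values when L is whole, otherwise round L up ). Other software, such as `statistics.quantiles`, uses different conventions and can give slightly different quartiles.

```python
def value_of_percentile(k, data):
    """Locator L = (k/100)·n; whole L -> average L-th and (L+1)-th values, else round up."""
    xs = sorted(data)
    n = len(xs)
    if (k * n) % 100 == 0:
        L = k * n // 100
        return (xs[L - 1] + xs[L]) / 2
    return xs[(k * n + 99) // 100 - 1]

def five_number_summary(data):
    """(Minimum, Q1, Q2, Q3, Maximum); the boxplot is drawn from these five values."""
    return (min(data), value_of_percentile(25, data), value_of_percentile(50, data),
            value_of_percentile(75, data), max(data))

data = [2, 3, 4, 6, 7, 7, 8, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78]
summary = five_number_summary(data)
```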
1.5 × IQR Guideline for outliers
It is always important to look for data values that don’t appear to fit with the rest. Potential outliers can be identified as those data values that are
• less than $Q_1 - 1.5 \times IQR$, or
• greater than $Q_3 + 1.5 \times IQR$,
where $IQR = Q_3 - Q_1$ is the interquartile range.
Example 4.9. Identify any potential outliers for the following data:
2, 3, 4, 6, 7, 7, 7, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78
• This rule helps identify values that are “far” away from the central 50% of the data values.
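A sketch of the 1.5 × IQR guideline applied to the data of Example 4.9, again assuming the textbook-style locator rule for the quartiles ( other quartile conventions can move the fences slightly ).

```python
def value_of_percentile(k, data):
    """Locator L = (k/100)·n; whole L -> average L-th and (L+1)-th values, else round up."""
    xs = sorted(data)
    n = len(xs)
    if (k * n) % 100 == 0:
        L = k * n // 100
        return (xs[L - 1] + xs[L]) / 2
    return xs[(k * n + 99) // 100 - 1]

def iqr_outliers(data):
    """Return the fences (Q1 - 1.5·IQR, Q3 + 1.5·IQR) and the values outside them."""
    q1, q3 = value_of_percentile(25, data), value_of_percentile(75, data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (low, high), [x for x in sorted(data) if x < low or x > high]

data = [2, 3, 4, 6, 7, 7, 7, 8, 9, 10, 13, 13, 14, 16, 18, 22, 22, 34, 56, 78]
fences, outliers = iqr_outliers(data)
```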
4.4.6
Relative Distance From the Center
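Definition 4.24. A z-score or standardized value is the number of standard deviations that a given value x is above or below the mean. A z-score is calculated as follows:
$$z = \dfrac{x - \bar{x}}{s} \;\text{( sample )} \qquad z = \dfrac{x - \mu}{\sigma} \;\text{( population )}$$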
• A z-score allows a comparison of distances between two distributions that are spread out in different
manners.
• In many cases, a z-score will represent the relative distance between an observation and a distribution’s expected value.
• Large z-scores ( in absolute value ) will represent observations that are “far” from what is expected. These observations would be considered ( Unusual, Unexpected, Surprising, Unlikely ).
• Small z-scores ( in absolute value ) will represent observations that are “close” to what we expect. These observations would be considered ( Usual, Expected, Common, Likely ).
Example 4.10. Two statistics classes take an exam. The distribution of the test scores looked relatively
normal. Class A has a mean of 72 and a standard deviation of 3. Class B has a mean of 83 and a standard
deviation of 6. Michele is in Class A. She received a score of 81. Elaine is in Class B. She received a 91.
Elaine obviously has the higher overall score, but who did better with respect to their class? Does either
one of them have an unusually high score compared to their class?
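A sketch of the comparison in Example 4.10. What counts as "unusually high" depends on α; the cutoff used here ( the upper 5% tail of a standard normal, since the scores look roughly normal ) is an assumption for illustration.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_michele = z_score(81, 72, 3)   # Class A: mean 72, standard deviation 3
z_elaine = z_score(91, 83, 6)    # Class B: mean 83, standard deviation 6

alpha = 0.05                                  # assumed significance level
z_cutoff = NormalDist().inv_cdf(1 - alpha)    # about 1.645
michele_unusual = z_michele >= z_cutoff
elaine_unusual = z_elaine >= z_cutoff
```

Michele is 3 standard deviations above her class mean, Elaine about 1.33 above hers, so Michele did better relative to her class.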
4.5
Probability distribution of a z-score
• The observations used in the computation of a z-score are generally the outcomes of some random procedure.
• Each observation is the value of some random variable.
• If the probability distribution of the observation has a Normal distribution, then the z-score
– is a random variable,
– has a standard normal distribution.
$$\text{If } X \sim \text{Normal}(\mu_X, \sigma_X) \text{ then } z = \dfrac{X - \mu_X}{\sigma_X} \sim \text{Normal}(0, 1)$$
We can use this idea to make estimates about the probabilities of future events, or about proportions of a
dataset.
Example 4.11. A sample was taken and the following histogram was made. Estimate the proportion of the data that was within 1 standard deviation of the mean. Which data values appear to be within 1 standard deviation of the mean?
[Figure: histogram of the sample; horizontal axis from 5 to 11]
Example 4.12. A sample was taken and the following histogram was made. Estimate the proportion of the data that was within 1 standard deviation of the mean. Which data values appear to be within 1 standard deviation of the mean?
[Figure: histogram of the sample; horizontal axis from 10 to 30]
Example 4.13. A sample was taken and the following histogram was made. Estimate the proportion of the data that was within 2 standard deviations of the mean. Which data values appear to be within 2 standard deviations of the mean?
[Figure: histogram of the sample; horizontal axis from 7 to 13]
Example 4.14. A sample was taken and the following histogram was made. Estimate the proportion of the data that was within 3 standard deviations of the mean. Which data values appear to be within 3 standard deviations of the mean?
[Figure: histogram of the sample; horizontal axis from 2 to 14]
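Empirical Rule: 68-95-99.7
For a Normal distribution, about 68% of the values lie within 1σ of µ, about 95% lie within 2σ, and about 99.7% lie within 3σ.
[Figure: Normal Distribution density curve with the x values µ − 3σ, µ − 2σ, µ − 1σ, µ, µ + 1σ, µ + 2σ, µ + 3σ marked]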
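The 68-95-99.7 percentages come straight from the standard normal CDF, and the "proportion within k standard deviations" asked for in Examples 4.11 through 4.14 can be counted directly from raw data. Since those histograms' values are not listed, this sketch uses the leaf lengths from Example 5.1 as the sample.

```python
from statistics import NormalDist, mean, stdev

Z = NormalDist()  # standard normal: mean 0, standard deviation 1
empirical = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

def proportion_within(data, k):
    """Share of a sample lying within k sample standard deviations of the sample mean."""
    xbar, s = mean(data), stdev(data)
    return sum(1 for x in data if abs(x - xbar) <= k * s) / len(data)

leaves = [13.65, 15.3, 15.45, 15.7, 11.9, 10.4, 13.6, 16, 11.30, 12.2, 11.6, 10.5]
share_1sd = proportion_within(leaves, 1)
```

With only twelve values the sample share ( 50% ) can be far from 68%, which is one reason small data sets make the shape hard to judge.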
4.6
Sampling Distributions
Definition 4.25. The sampling distribution of a statistic is the distribution of that statistic based on a
fixed sample size.
Recall. The following statistics are random variables:
• Sample Mean $\bar{x}$
• Sample Proportion $\hat{p}$
• Sample Standard Deviation s
Remark 4.4. Many other statistics exist.
4.6.1
Central Limit Theorem
Theorem 4.1. Central Limit Theorem Suppose that a random variable X has a mean $\mu_X$ and a standard deviation $\sigma_X < \infty$. Then the (sampling) distribution ( based on a simple random sample of size n ) of $\bar{x}$ will be:
• Normally distributed with mean $\mu_X$ and standard deviation $\sigma_X/\sqrt{n}$, if X has a normal distribution.
• Approximately Normally distributed with mean $\mu_X$ and standard deviation $\sigma_X/\sqrt{n}$, if n > 30 and the distribution of X is not heavily skewed.
$$\bar{x} \sim \text{Normal}\left(\mu_X, \dfrac{\sigma_X}{\sqrt{n}}\right)$$
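A quick simulation makes the theorem concrete: draw many samples of size n, record each $\bar{x}$, and compare the spread of those means with $\sigma_X/\sqrt{n}$. The choice of X ~ Uniform(0, 1) ( mean 1/2, standard deviation $1/\sqrt{12}$, not normal but not skewed ), n = 36, and 5,000 repetitions are assumptions for illustration.

```python
import random
from math import fsum, sqrt
from statistics import pstdev

def simulate_sample_means(n, reps, seed=1):
    """Sample means of `reps` simple random samples of size n from X ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    return [fsum(rng.random() for _ in range(n)) / n for _ in range(reps)]

n = 36
xbars = simulate_sample_means(n, reps=5000)
mu_x, sigma_x = 0.5, 1 / sqrt(12)          # mean and standard deviation of Uniform(0, 1)
observed_mean = fsum(xbars) / len(xbars)
observed_sd = pstdev(xbars)
predicted_sd = sigma_x / sqrt(n)           # Theorem 4.1: σ_X / √n
```

A histogram of `xbars` would look close to a Normal curve centered at 0.5, even though a single uniform draw is flat.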
Example 4.15. The height of adult females is normally distributed with a mean of 205.5 cm and a standard
deviation of 8.6 cm.
1. What is the probability that a randomly selected female will be taller than 210 cm?
2. What is the probability that the average height of 25 randomly selected females will be taller than 210
cm?
3. (α = .01) What heights of females would be considered unusually tall?
4. (α = .01) If 25 women are randomly selected, what would be considered an unusually high average
height?
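A sketch of Example 4.15 with `statistics.NormalDist`. Parts 1 and 3 use the distribution of one height; parts 2 and 4 use Theorem 4.1 ( X is normal, so $\bar{x}$ is normal with standard deviation $\sigma/\sqrt{n}$ ). Reading "unusually tall" as the upper α tail is an assumption about the intended definition.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, alpha = 205.5, 8.6, 25, 0.01
X = NormalDist(mu, sigma)                 # one female's height
XBAR = NormalDist(mu, sigma / sqrt(n))    # mean height of n females (Theorem 4.1)

p_one = 1 - X.cdf(210)                    # 1. P(X > 210)
p_mean = 1 - XBAR.cdf(210)                # 2. P(x̄ > 210)
tall_cutoff = X.inv_cdf(1 - alpha)        # 3. heights above this are unusually tall
mean_cutoff = XBAR.inv_cdf(1 - alpha)     # 4. averages above this are unusually high
```

The same 210 cm is unremarkable for one person ( probability about 0.30 ) but very unlikely as an average of 25 ( about 0.004 ).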
Example 4.16. Suppose that the amount of time that you will wait for a bus, at a particular bus stop, has a mean of 10 minutes with a standard deviation of 1 minute.
1. What is the probability that on a randomly selected day, you will wait longer than 12 minutes?
2. What is the probability that over 31 randomly selected days you will wait longer than 12 minutes on
average?
3. (α = .05) What would be considered an unusually long wait time?
4. (α = .05) Over the course of 31 randomly selected days, what would be considered an unusually long
average wait?
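A sketch for the average-wait parts of Example 4.16. The shape of a single wait time is not given, so, as Remark 4.2 warns, parts 1 and 3 cannot be answered with a normal model. With n = 31 > 30, and assuming wait times are not heavily skewed, Theorem 4.1 makes $\bar{x}$ approximately normal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, alpha = 10, 1, 31, 0.05
se = sigma / sqrt(n)                      # standard deviation of x̄
XBAR = NormalDist(mu, se)

z = (12 - mu) / se
p_mean = NormalDist().cdf(-z)             # 2. P(x̄ > 12), upper tail without cancellation
mean_cutoff = XBAR.inv_cdf(1 - alpha)     # 4. averages above this are unusually long
```

An average of 12 minutes is more than 11 standard errors above 10, so its probability is essentially zero; averages above roughly 10.3 minutes would already be unusual at α = 0.05.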
Corollary 4.2. If a population can be split into two disjoint groups, success and failure, the proportion of successes is equal to p, and a sample of size n is taken where np ≥ 5 and n(1 − p) ≥ 5, then, approximately,
$$\hat{p} \sim \text{Normal}\left(p, \sqrt{\dfrac{p(1-p)}{n}}\right)$$
Example 4.17. Seventy percent of a town is Republican. A random sample of 100 residents will be taken. What is the probability that more than 71% of those sampled will be Republicans?
Example 4.18. A coin is flipped 25 times. What is the probability that more than 60% of the flips will be tails?
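A sketch of Examples 4.17 and 4.18 using the normal approximation of Corollary 4.2 exactly as stated ( no continuity correction ). Treating the coin as fair ( p = 0.5 ) is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def prob_phat_greater(p, n, threshold):
    """Normal approximation to P(p̂ > threshold) from Corollary 4.2."""
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("need np >= 5 and n(1 - p) >= 5")
    return 1 - NormalDist(p, sqrt(p * (1 - p) / n)).cdf(threshold)

p_republican = prob_phat_greater(0.70, 100, 0.71)   # Example 4.17
p_tails = prob_phat_greater(0.50, 25, 0.60)         # Example 4.18
```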
Chapter 5
Inference: Confidence Intervals
Idea for a Confidence Interval
[Figure: number line from 0 to 10]
5.1
Confidence Intervals for a Single Population
Definition 5.1. A Confidence Level 100(1 − α)% indicates that there is a 1 − α probability that a random
procedure produced an acceptable result.
Definition 5.2. An Interval Estimate is a range of numbers, determined by following a random procedure,
used to estimate an unknown population parameter.
Definition 5.3. A 100(1 − α)% Confidence Interval is an Interval Estimate produced by following a
procedure that correctly estimates an unknown population parameter at least 100(1 − α)% of the time, i.e.
the procedure has a 100(1 − α)% Confidence Level.
General Procedure for Constructing a Confidence Interval for a Mean or Proportion
1. Decide how confident you want to be in your interval estimate.
2. Decide how precise you want your estimate to be.
3. Using Step 1 and Step 2, determine the necessary sample size n.
4. If the sample size determined in Step 3 is too large to manage, revisit Step 1 and Step 2.
5. Take a sample of at least size n.
6. Compute $\bar{x}$ or $\hat{p}$.
7. Compute your margin of error E.
8. Construct your Confidence Interval.
(Estimate − Margin of Error, Estimate + Margin of Error)
9. State with 100(1 − α)% Confidence that the unknown parameter is captured by the confidence interval.
5.1.1
Confidence Interval for a Population Mean
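One possible way to produce a confidence interval for a mean is shown below. However, it is unrealistic: it assumes that we know the population standard deviation σ.
$$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$$
[Figure: standard normal curve for $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ with $-z_{\alpha/2}$, 0, and $z_{\alpha/2}$ marked]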
Real Life
• We don’t know the distribution.
• In real life, we don’t know σ.
• We estimate σ with s.
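• We estimate the z-score with a t-score:
$$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$$
[Figure: t distribution curve with $-t_{\alpha/2}$, 0, and $t_{\alpha/2}$ marked]
100(1 − α)% Confidence Interval for µ
$$\bar{x} - t_{\alpha/2}\dfrac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\dfrac{s}{\sqrt{n}}$$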
5.1.2
Confidence Interval for a Population Proportion
In a similar manner to the mean, we can make an estimate for a population proportion.
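$$\hat{p} - z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$$
[Figure: standard normal curve for $z = \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}}$ with $-z_{\alpha/2}$, 0, and $z_{\alpha/2}$ marked]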
This gives a method for estimating the unknown population proportion p, but it has a problem: we need to know the population proportion in order to estimate the population proportion. In practice, we replace p with $\hat{p}$ inside the square root.
100(1 − α)% Confidence Interval for p
$$\hat{p} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$$
5.1.3
Examples
Example 5.1. Twelve leaves were randomly selected from the ground below a single tree and their lengths (cm) were measured. Use the following information to estimate the mean length of all leaves found under this tree. (95% Confidence)
13.65, 15.3, 15.45, 15.7, 11.9, 10.4, 13.6, 16, 11.30, 12.2, 11.6, 10.5
[Figure: Histogram of Data ( Frequency vs. Data ) and Normal Q-Q Plot ( Sample Quantiles vs. Theoretical Quantiles )]
$\bar{x} = 13.133$, $s = 2.086$
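A sketch of the t interval for Example 5.1. Python's standard library has no t distribution, so the critical value $t_{\alpha/2} = 2.201$ ( 95% confidence, df = n − 1 = 11 ) is an assumed input read from a t table. The interval relies on the population of leaf lengths being roughly normal, which is what the histogram and Q-Q plot are checking.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_crit):
    """x̄ ± t_{α/2} · s/√n, with t_crit taken from a t table at df = n - 1."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    E = t_crit * s / sqrt(n)            # margin of error
    return xbar - E, xbar + E

leaves = [13.65, 15.3, 15.45, 15.7, 11.9, 10.4, 13.6, 16, 11.30, 12.2, 11.6, 10.5]
lo, hi = t_interval(leaves, t_crit=2.201)
```

This gives roughly 11.81 cm < µ < 14.46 cm.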
Example 5.2. A survey of 17 randomly selected UTM students was conducted. (Not really) They were
each asked if they had ever seen an episode of The Walking Dead. Their responses are recorded below.
A ‘1’ indicates that they said “yes”. A ‘0’ indicates that they said “no”. Estimate with 99% Confidence the
true proportion of UTM students that have seen an episode of The Walking Dead.
0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0
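A sketch of the 99% interval for Example 5.2, using $\hat{p}$ in the standard error as in the boxed formula. $z_{\alpha/2}$ is computed with `NormalDist().inv_cdf` rather than read from a table, so the last digit may differ slightly from a table-based answer.

```python
from math import sqrt
from statistics import NormalDist

responses = [0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0]

def proportion_interval(successes, n, confidence):
    """p̂ ± z_{α/2} · sqrt(p̂(1 - p̂)/n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{α/2}
    E = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - E, p_hat + E

lo, hi = proportion_interval(sum(responses), len(responses), 0.99)
```

Here $\hat{p} = 11/17$, and with only 17 students the interval is very wide ( about 0.35 to 0.95 ).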
5.1.4
Precision
A short Confidence Interval gives a more precise estimate for the unknown population parameter. Precision
is controlled by three things:
• The desired and acceptable precision
• The Confidence Level
• The Sample Size
Example 5.3. A moving company is asked to move 10,000 identical blocks. The moving company wants
to know how much each block weighs in order to determine what equipment is needed to move the blocks.
The owner of the blocks knows that they all weigh about the same amount. Which would be a more useful
guess?
• Between 2 and 300 pounds;
• Between 30 and 40 pounds.
Sample Size for Estimating a Population Mean
$$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{E}\right)^2 \quad \text{( round up )}$$
where σ is one of the following:
• the known population standard deviation;
• an estimate of the population standard deviation taken from a previous study;
• an estimate found using the range rule of thumb.
Sample Size for Estimating a Population Proportion
When an estimate of p is known:
$$n = \hat{p}(1-\hat{p})\left(\dfrac{z_{\alpha/2}}{E}\right)^2 \quad \text{( round up )}$$
When an estimate of p is unknown:
$$n = 0.25\left(\dfrac{z_{\alpha/2}}{E}\right)^2 \quad \text{( round up )}$$
Example 5.4. You want to estimate the mean SAT score of all college applicants. Possible SAT scores
range from 600 to 2400. How many scores must be sampled if you would like to estimate the population
mean score to within 100 points with 98% confidence?
Example 5.5. Find the sample size needed to estimate the percentage of Republicans among registered
voters in California to within 3 percentage points with 90% confidence.
Example 5.6. A prior Pew Research Center report suggests that 15% of adults have consulted fortune
tellers. Determine the sample size necessary to estimate the percentage of adults that consult fortune tellers
within 3 percentage points with 98% confidence.
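A sketch of the three sample-size formulas applied to Examples 5.4 through 5.6. For Example 5.4, σ comes from the range rule of thumb, (2400 − 600)/4 = 450. The z values are computed exactly with `NormalDist`; table-rounded values ( e.g. 2.33 instead of 2.3263 ) can shift a result by a few subjects ( Example 5.6 becomes 770 with z = 2.33 ).

```python
import math
from statistics import NormalDist

def z_half_alpha(confidence):
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def n_for_mean(sigma, E, confidence):
    """n = (z_{α/2} σ / E)², rounded up."""
    return math.ceil((z_half_alpha(confidence) * sigma / E) ** 2)

def n_for_proportion(E, confidence, p_hat=None):
    """n = p̂(1 - p̂)(z_{α/2}/E)² with a prior estimate, else 0.25(z_{α/2}/E)²; rounded up."""
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return math.ceil(pq * (z_half_alpha(confidence) / E) ** 2)

sigma_sat = (2400 - 600) / 4                          # range rule of thumb
n_sat = n_for_mean(sigma_sat, 100, 0.98)              # Example 5.4
n_rep = n_for_proportion(0.03, 0.90)                  # Example 5.5
n_fortune = n_for_proportion(0.03, 0.98, p_hat=0.15)  # Example 5.6
```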
5.2
Confidence Intervals for Comparing Two Populations
Many times, it is of interest to compare two populations. We might be interested in the following parameters:
• p1 − p2
• µ1 − µ2
These differences will still be unknown to us, and we will need to estimate them with confidence intervals in
the same manner as with a single population.
(Estimated Difference − Margin of Error, Estimated Difference + Margin of Error)
5.2.1
100(1 − α)% Confidence Intervals for $p_1 - p_2$
Margin of Error
$$E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}$$
Confidence Interval
$$(\hat{p}_1 - \hat{p}_2) - E < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + E$$
Example 5.7. A study was conducted to determine the proportion of people who dream in black and white
instead of color. Among 306 people over the age of 55, 68 dream in black and white, and among 298 people
under the age of 25, 13 dream in black and white. Construct a 99% confidence interval estimate for the difference in proportions between the two age groups.
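A sketch of Example 5.7 with the margin of error above ( $\hat{q} = 1 - \hat{p}$ ). Population 1 is taken to be the over-55 group; $z_{\alpha/2}$ is computed rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, confidence):
    """(p̂1 - p̂2) ± z_{α/2} · sqrt(p̂1 q̂1/n1 + p̂2 q̂2/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    E = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - E, d + E

lo, hi = two_proportion_interval(68, 306, 13, 298, 0.99)
```

The whole interval ( about 0.110 to 0.247 ) is above 0, which suggests the over-55 group dreams in black and white more often.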
5.2.2
100(1 − α)% Confidence Intervals for $\mu_1 - \mu_2$ (Independent)
Margin of Error
$$E = t_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$$
Confidence Interval
$$(\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E$$
Example 5.8. The accompanying table gives results from a study of the words spoken in a day by men
and women. The original data can be found in your textbook, if you’re curious. Construct a 95% confidence
interval estimate for the difference in mean number of words spoken by men and women. The data collected
from each population looked relatively normal.
| Men | Women |
|---|---|
| $n_1 = 186$ | $n_2 = 210$ |
| $\bar{x}_1 = 15668.5$ | $\bar{x}_2 = 16215.0$ |
| $s_1 = 8632.5$ | $s_2 = 7301.2$ |
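A sketch of Example 5.8 with men as population 1. The notes do not say which degrees of freedom to use for $t_{\alpha/2}$ here; this sketch assumes the conservative df = min(n₁, n₂) − 1 = 185 and the t-table value 1.973, both of which are assumptions rather than part of the notes.

```python
from math import sqrt

def two_mean_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """(x̄1 - x̄2) ± t_{α/2} · sqrt(s1²/n1 + s2²/n2)."""
    E = t_crit * sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    d = x1 - x2
    return d - E, d + E

# Assumed t-table value: 95% confidence, df = 185.
lo, hi = two_mean_interval(15668.5, 8632.5, 186, 16215.0, 7301.2, 210, t_crit=1.973)
```

The interval ( roughly −2143 to 1050 words ) contains 0, so these data do not show a difference in the mean number of words spoken.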
5.2.3
100(1 − α)% Confidence Intervals for $\mu_d = \mu_1 - \mu_2$ ( Dependent Samples )
Margin of Error
$$E = t_{\alpha/2}\dfrac{s_d}{\sqrt{n}}$$
Confidence Interval
$$\bar{d} - E < \mu_d < \bar{d} + E$$
Example 5.9. A sample of students from two statistics classes was given a sheet of paper with a straight line drawn on it. Each student was asked to estimate the length of the line in two units of measure, centimeters and inches. The estimates taken in inches were then converted into centimeters. The two estimates could then be compared. Construct a 99% confidence interval for the mean of the difference in the students’ estimates.
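| cm | converted |
|---|---|
| 9 | 8.89 |
| 8.5 | 9.525 |
| 10 | 10.16 |
| 10 | 7.62 |
| 9 | 7.62 |
| 8 | 12.7 |
| 8.5 | 11.43 |
| 9 | 10.16 |
| 8.5 | 10.16 |
| 9 | 11.43 |
| 13 | 11.43 |
| 11.5 | 11.43 |
| 9 | 11.43 |
| 9.5 | 9.525 |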
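A sketch of the paired interval for Example 5.9. It assumes the 14 cm values and the 14 converted values pair up in the order they are listed, and it takes d = cm estimate − converted estimate; the critical value $t_{\alpha/2} = 3.012$ ( 99%, df = 13 ) is an assumed t-table input.

```python
from math import sqrt
from statistics import mean, stdev

cm = [9, 8.5, 10, 10, 9, 8, 8.5, 9, 8.5, 9, 13, 11.5, 9, 9.5]
converted = [8.89, 9.525, 10.16, 7.62, 7.62, 12.7, 11.43, 10.16, 10.16,
             11.43, 11.43, 11.43, 11.43, 9.525]

d = [a - b for a, b in zip(cm, converted)]   # one difference per student
n = len(d)
d_bar, s_d = mean(d), stdev(d)
t_crit = 3.012                               # assumed table value: 99% confidence, df = n - 1 = 13
E = t_crit * s_d / sqrt(n)
lo, hi = d_bar - E, d_bar + E
```

Working with the differences turns the two dependent samples into one sample, so the single-mean t interval applies.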
Chapter 6
Inference: Hypothesis Tests
Definition 6.1. A hypothesis is a claim or statement about a property of a population.
Definition 6.2. A hypothesis test is a procedure for making a decision about a property of a population.
Idea for a Hypothesis Test
[Figure: number line from 0 to 10]
6.1
Hypothesis Tests
General Procedure for a Hypothesis Test
1. Decide which parameter of a population(s) you are interested in.
2. Decide upon a significance level α.
3. Make your claim about the parameter.
4. Determine your hypotheses.
(a) State your claim symbolically
(b) State the “opposite” of your claim symbolically.
(c) One of (4a) or (4b) contains equality. Call this your null hypothesis.
(d) Call the remaining one of (4a) or (4b) the alternative hypothesis.
5. Pick a statistic that will estimate the parameter of interest.
General Procedure for a Hypothesis Test Continued
6. Make a rule for deciding which hypothesis the estimate is consistent with.
• For the sake of argument, we assume that the null hypothesis is true.
(a) Determine what it means for your estimate to be “far” from the parameter of interest, using either Critical Values or the Significance Level.
(b) The rule: The estimate is inconsistent with the null hypothesis if
• The estimate is “far” from the parameter ( Critical Values ), or
• P-Value ≤ α ( Significance Level ).
7. Take a sample and compute your estimate.
8. Make a decision by applying your rule.
9. State your conclusion.
6.1.1
Testing a Claim about a Mean
Test Statistic:
$$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad df = n - 1$$
Hypotheses and Rejection Region ( Reject $H_0$ if: )
| | Left-tailed | Right-tailed | Two-Tailed |
|---|---|---|---|
| $H_0$ | $\mu = \mu_0$ | $\mu = \mu_0$ | $\mu = \mu_0$ |
| $H_1$ | $\mu < \mu_0$ | $\mu > \mu_0$ | $\mu \neq \mu_0$ |
| Critical Value | $t \le -t_{\alpha,n-1}$ | $t \ge t_{\alpha,n-1}$ | $\lvert t \rvert \ge t_{\alpha/2,n-1}$ |
| P-value | tcdf($-2^{99}$, t, df) ≤ α | tcdf(t, $2^{99}$, df) ≤ α | 2 × tcdf($\lvert t \rvert$, $2^{99}$, df) ≤ α |
Example 6.1. When 40 people used the Weight Watchers diet for one year, their mean weight loss was 3.0
lb and the standard deviation was 4.9 lb. Use a 0.01 significance level to test the claim that the mean weight
loss is greater than 0.
1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “far” must $\bar{x}$ be from 0 before we are convinced that the mean weight loss is greater than 0?
• How “far” is $\bar{x}$ actually from the hypothesized value of 0?
2. What is the P - Value? Do you reject or fail to reject the null hypothesis?
• If the mean weight loss is equal to zero, how likely is it that I would get an $\bar{x}$ of 3.0 or more?
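A standard-library sketch of the right-tailed test in Example 6.1. The helper `t_upper_tail` plays the role of tcdf(t, $2^{99}$, df): it integrates the Student t density numerically with Simpson's rule, an approach chosen here for illustration because Python's standard library has no t distribution. `t_critical` finds $t_{\alpha, n-1}$ by bisection.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for Student's t: 1/2 minus the integral of the density from 0 to t
    (Simpson's rule, `steps` even). Works for non-integer df as well."""
    if t < 0:
        return 1 - t_upper_tail(-t, df, steps)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """Right-tail critical value t_α: solve P(T >= t) = α by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example 6.1: n = 40, x̄ = 3.0, s = 4.9, H1: µ > 0, α = 0.01
n, xbar, s, mu0, alpha = 40, 3.0, 4.9, 0, 0.01
df = n - 1
t = (xbar - mu0) / (s / math.sqrt(n))
crit = t_critical(alpha, df)
p_value = t_upper_tail(t, df)      # right-tailed P-value
reject = t >= crit                 # same decision as p_value <= alpha
```

Both routes agree: t ≈ 3.87 exceeds $t_{0.01,39}$ ≈ 2.43, and the P-value is well below 0.01, so $H_0$ is rejected.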
Example 6.2. Listed below are brain volumes (cm³) of unrelated subjects used in a study. Use a 0.01 significance level to test the claim that the population of brain volumes has a mean equal to 1100.0 cm³.
963 1027 1272 1079 1070 1173 1067 1347 1100 1204
1. What are the critical values? Do you reject or fail to reject the null hypothesis?
• How “far” must $\bar{x}$ be from 1100 before we are convinced that the mean brain volume is not 1100.0 cm³?
• How “far” is $\bar{x}$ actually from the hypothesized value of 1100?
2. What is the P - Value? Do you reject or fail to reject the null hypothesis?
• If the mean brain volume is 1100.0 cm³, how likely is it that I would observe an $\bar{x}$ at least as different as ____?
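A sketch of the critical-value approach for Example 6.2: compute $\bar{x}$, s, and t from the raw volumes and compare |t| with $t_{\alpha/2, n-1}$. The value 3.250 ( α = 0.01 two-tailed, df = 9 ) is an assumed t-table input.

```python
from math import sqrt
from statistics import mean, stdev

volumes = [963, 1027, 1272, 1079, 1070, 1173, 1067, 1347, 1100, 1204]
mu0 = 1100.0
n = len(volumes)
xbar = mean(volumes)
t = (xbar - mu0) / (stdev(volumes) / sqrt(n))
t_crit = 3.250                  # assumed t-table value: two-tailed, α = 0.01, df = 9
reject = abs(t) >= t_crit
```

Here $\bar{x}$ = 1130.2 and t ≈ 0.81, far inside ±3.250, so we fail to reject $H_0$.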
6.1.2
Testing a Claim about a Proportion
Test Statistic:
$$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$
Hypotheses and Rejection Region ( Reject $H_0$ if: )
| | Left-tailed | Right-tailed | Two-Tailed |
|---|---|---|---|
| $H_0$ | $p = p_0$ | $p = p_0$ | $p = p_0$ |
| $H_1$ | $p < p_0$ | $p > p_0$ | $p \neq p_0$ |
| Critical Value | $z \le -z_{\alpha}$ | $z \ge z_{\alpha}$ | $\lvert z \rvert \ge z_{\alpha/2}$ |
| P-value | normcdf($-2^{99}$, z, 0, 1) ≤ α | normcdf(z, $2^{99}$, 0, 1) ≤ α | 2 × normcdf($\lvert z \rvert$, $2^{99}$, 0, 1) ≤ α |
Example 6.3. In a study of 420,095 Danish cell phone users, 135 subjects developed cancer of the brain
or nervous system. Test the claim that cell phone users develop cancer of the brain or nervous system at a
rate different from the rate of those who do not use cell phones. The cancer rate of non-cell phone users is
0.0340%.
1. What are the critical values? Do you reject or fail to reject the null hypothesis?
• How “far” from 0.000340 must $\hat{p}$ be before we are convinced that the cancer rate of Danish cell phone users is not 0.000340?
• How “far” is $\hat{p}$ actually from the hypothesized value of 0.000340?
2. What is the P - Value? Do you reject or fail to reject the null hypothesis?
• If the cancer rate of Danish cell phone users is 0.000340, how likely is it that I would observe a $\hat{p}$ at least as different as ____?
Example 6.4. A Consumer Reports Research center survey of 427 women showed that 22.0% of them
purchased books online. Test the claim that less than 25% of women purchased books online.
1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “far” must $\hat{p}$ be from 0.25 before we are convinced that the proportion of women who purchased books online is less than 0.25?
• How “far” is $\hat{p}$ actually from the hypothesized value of 0.25?
2. What is the P - Value? Do you reject or fail to reject the null hypothesis?
• If the proportion of women who purchased books online is 0.25, how likely is it that I would observe a $\hat{p}$ smaller than ____?
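A sketch of the one-proportion z test for Examples 6.3 ( two-tailed ) and 6.4 ( left-tailed ). Neither example states α, so the sketch only computes z and the P-value; compare the P-value with whatever α you choose.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(p_hat, p0, n, tail):
    """z = (p̂ - p0) / sqrt(p0(1 - p0)/n); tail is 'left', 'right', or 'two'."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    Z = NormalDist()
    if tail == "left":
        p = Z.cdf(z)
    elif tail == "right":
        p = 1 - Z.cdf(z)
    else:
        p = 2 * (1 - Z.cdf(abs(z)))
    return z, p

z_phone, p_phone = one_proportion_z(135 / 420095, 0.000340, 420095, "two")  # Example 6.3
z_books, p_books = one_proportion_z(0.22, 0.25, 427, "left")                # Example 6.4
```

For Example 6.3, z ≈ −0.66 and P ≈ 0.51; for Example 6.4, z ≈ −1.43 and P ≈ 0.076.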
6.2
Hypothesis Tests for comparing two populations
As with confidence intervals, hypothesis tests can be used to compare two populations. The following parameters will be of interest:
• p1 − p2
• µ1 − µ2
These differences will still be unknown to us. The procedures will be similar to those used for single
populations.
What are the differences?
• Two samples will be collected.
• The difference between the parameters will be estimated.
• Our null hypothesis will generally be that the two parameters are the same.
• Our central question becomes: how far apart must our estimates be before we are convinced that the parameters are different in some way?
6.2.1
Testing a Claim about p1 − p2
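Test Statistic:
$$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1-\bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \bar{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$$
Hypotheses and Rejection Region ( Reject $H_0$ if: )
| | Left-tailed | Right-tailed | Two-Tailed |
|---|---|---|---|
| $H_0$ | $p_1 = p_2$ | $p_1 = p_2$ | $p_1 = p_2$ |
| $H_1$ | $p_1 < p_2$ | $p_1 > p_2$ | $p_1 \neq p_2$ |
| Critical Value | $z \le -z_{\alpha}$ | $z \ge z_{\alpha}$ | $\lvert z \rvert \ge z_{\alpha/2}$ |
| P-value | normcdf($-2^{99}$, z, 0, 1) ≤ α | normcdf(z, $2^{99}$, 0, 1) ≤ α | 2 × normcdf($\lvert z \rvert$, $2^{99}$, 0, 1) ≤ α |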
Example 6.5. A study was conducted to determine the proportion of people who dream in black and white
instead of color. Two populations were considered. The first consisted of people over the age of 55, and the
second consisted of people under the age of 25. We want to use a 0.01 significance level to test the claim
that the proportion of people over 55 who dream in black and white is greater than the proportion for those
under 25. Two hundred people over 55 were surveyed, and 54 said that they dream in black and white.
Three hundred people under 25 were surveyed, and 47 said that they dream in black and white.
1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “far” must $\hat{p}_1 - \hat{p}_2$ be from 0 before we are convinced that the proportion of adults over 55 who dream in black and white is greater than the proportion for the under 25 group?
• How “different” is $\hat{p}_1$ from $\hat{p}_2$?
2. What is the P - Value? Do you reject or fail to reject the null hypothesis?
• If the proportions are actually the same, how likely is it that I would get a difference $\hat{p}_1 - \hat{p}_2$ at least as big as ____?
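A sketch of the pooled two-proportion z test for Example 6.5 ( right-tailed, α = 0.01 ), with $z_{\alpha}$ and the P-value computed from `NormalDist` instead of a table or calculator.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z = (p̂1 - p̂2) / sqrt(p̄(1 - p̄)(1/n1 + 1/n2)) with pooled p̄ = (x1 + x2)/(n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    return (p1 - p2) / sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))

alpha = 0.01
z = two_proportion_z(54, 200, 47, 300)     # population 1: over 55, population 2: under 25
p_value = 1 - NormalDist().cdf(z)          # right-tailed, H1: p1 > p2
z_crit = NormalDist().inv_cdf(1 - alpha)   # z_α
reject = z >= z_crit
```

With z ≈ 3.09 beyond $z_{0.01}$ ≈ 2.33 ( P ≈ 0.001 ), $H_0$ is rejected.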
6.2.2
Testing a Claim about µ1 − µ2 (Independent)
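Test Statistic:
$$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad df = \dfrac{(A + B)^2}{\dfrac{A^2}{n_1 - 1} + \dfrac{B^2}{n_2 - 1}}, \quad A = \dfrac{s_1^2}{n_1}, \; B = \dfrac{s_2^2}{n_2}$$
Hypotheses and Rejection Region ( Reject $H_0$ if: )
| | Left-tailed | Right-tailed | Two-Tailed |
|---|---|---|---|
| $H_0$ | $\mu_1 = \mu_2$ | $\mu_1 = \mu_2$ | $\mu_1 = \mu_2$ |
| $H_1$ | $\mu_1 < \mu_2$ | $\mu_1 > \mu_2$ | $\mu_1 \neq \mu_2$ |
| Critical Value | $t \le -t_{\alpha,df}$ | $t \ge t_{\alpha,df}$ | $\lvert t \rvert \ge t_{\alpha/2,df}$ |
| P-value | tcdf($-2^{99}$, t, df) ≤ α | tcdf(t, $2^{99}$, df) ≤ α | 2 × tcdf($\lvert t \rvert$, $2^{99}$, df) ≤ α |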
Example 6.6. The accompanying table gives results from a study of the words spoken in a day by men (
Pop. 1 ) and women ( Pop. 2 ). The original data can be found in your textbook, if you’re curious. Use a
0.01 significance level to test the claim that the mean number of words spoken in a day by men is less than
that for women.
| Men | Women |
|---|---|
| $n_1 = 186$ | $n_2 = 210$ |
| $\bar{x}_1 = 15668.5$ | $\bar{x}_2 = 16215.0$ |
| $s_1 = 8632.5$ | $s_2 = 7301.2$ |
1. What is the P - Value? Do you reject or fail to reject the null hypothesis?
• If the means are actually the same, how likely is it that I would get a difference $\bar{x}_1 - \bar{x}_2$ at least as small as ____?
2. What would you conclude?
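A sketch of the left-tailed test in Example 6.6 using the t statistic and the df formula above ( A = s₁²/n₁, B = s₂²/n₂ ). The P-value helper integrates the t density numerically with Simpson's rule, an illustrative stand-in for tcdf because the standard library has no t distribution; it accepts the non-integer df this formula produces.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for Student's t via Simpson's rule on the density (works for non-integer df)."""
    if t < 0:
        return 1 - t_upper_tail(-t, df, steps)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

n1, x1, s1 = 186, 15668.5, 8632.5     # men (population 1)
n2, x2, s2 = 210, 16215.0, 7301.2     # women (population 2)
A, B = s1 ** 2 / n1, s2 ** 2 / n2
t = (x1 - x2) / math.sqrt(A + B)
df = (A + B) ** 2 / (A ** 2 / (n1 - 1) + B ** 2 / (n2 - 1))
p_value = t_upper_tail(-t, df)        # left-tailed: P(T <= t) = P(T >= -t)
```

With t ≈ −0.68 ( df ≈ 364 ) the P-value is about 0.25, far above 0.01, so we fail to reject $H_0$.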
6.2.3
Testing a Claim about µd = µ1 − µ2 (Dependent)
Test Statistic:
$$t = \dfrac{\bar{d} - d_0}{s_d/\sqrt{n}}, \qquad df = n - 1$$
Hypotheses and Rejection Region ( Reject $H_0$ if: )
| | Left-tailed | Right-tailed | Two-Tailed |
|---|---|---|---|
| $H_0$ | $\mu_d = d_0$ | $\mu_d = d_0$ | $\mu_d = d_0$ |
| $H_1$ | $\mu_d < d_0$ | $\mu_d > d_0$ | $\mu_d \neq d_0$ |
| Critical Value | $t \le -t_{\alpha,n-1}$ | $t \ge t_{\alpha,n-1}$ | $\lvert t \rvert \ge t_{\alpha/2,n-1}$ |
| P-value | tcdf($-2^{99}$, t, df) ≤ α | tcdf(t, $2^{99}$, df) ≤ α | 2 × tcdf($\lvert t \rvert$, $2^{99}$, df) ≤ α |
Example 6.7. A study was conducted to investigate the effectiveness of hypnosis in reducing pain. Results
for randomly selected subjects are given in the accompanying table. The values are before and after hypnosis;
the measurements are in centimeters on a pain scale. It is claimed that the treatment is effective.
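| Subject | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| Before | 6.6 | 6.5 | 9.0 | 10.3 | 11.3 | 8.1 | 6.3 | 11.6 |
| After | 6.8 | 2.4 | 7.4 | 8.5 | 8.1 | 6.1 | 3.4 | 2.0 |
| Difference | | | | | | | | |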
1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “far” must $\bar{d}$ be from 0 before we are convinced that hypnosis is effective?
• How “far” is $\bar{d}$ actually from 0?
2. What is the P - Value? Do you reject or fail to reject the null hypothesis?
• If hypnosis is not effective, how likely is it that I would get a $\bar{d}$ greater than ____?
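A sketch of the paired test in Example 6.7. It assumes d = before − after, so "effective" ( pain reduced ) becomes the right-tailed claim $\mu_d > 0$; the example does not state α. The P-value helper is the same illustrative Simpson's-rule integration of the t density used above.

```python
import math
from statistics import mean, stdev

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for Student's t via Simpson's rule on the density."""
    if t < 0:
        return 1 - t_upper_tail(-t, df, steps)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

before = [6.6, 6.5, 9.0, 10.3, 11.3, 8.1, 6.3, 11.6]
after = [6.8, 2.4, 7.4, 8.5, 8.1, 6.1, 3.4, 2.0]
d = [b - a for b, a in zip(before, after)]      # positive d means pain went down
n = len(d)
d_bar = mean(d)
t = (d_bar - 0) / (stdev(d) / math.sqrt(n))     # H0: µ_d = 0
p_value = t_upper_tail(t, n - 1)                # right-tailed, H1: µ_d > 0
```

Here $\bar{d}$ = 3.125, t ≈ 3.04 with df = 7, and the P-value is just under 0.01.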
6.3
Other Types of Tests
Many types of tests exist. They all compare how “closely” our sample matches our expectations, i.e. they
compare how close a statistic is to some assumed parameter. However, in the next two tests, we can make
conclusions about more than just a single parameter.
6.3.1
Goodness of Fit
A goodness of fit test compares many proportions at one time. It can be used to determine how well the
distribution of a sample fits with a given distribution.
Hypotheses:
$H_0$: $p_1 = p_{1,0}, \; p_2 = p_{2,0}, \; \ldots, \; p_k = p_{k,0}$
$H_a$: at least one proportion is not as claimed.
Test Statistic:
$$\chi^2 = \sum \dfrac{(O - E)^2}{E}, \qquad df = k - 1$$
Rejection Region at Level α: Reject $H_0$ if:
Critical Values: $\chi^2 \ge \chi^2_{\alpha,df}$
P-Values: χ²cdf(χ², $2^{99}$, df) ≤ α
Example 6.8. For a recent year, the following numbers are the numbers of homicides that occurred each
month in NYC:
38, 30, 46, 40, 46, 49, 47, 50, 50, 42, 37, 37.
Use a 0.05 significance level to test the claim that homicides in NYC are equally likely for each of the twelve
months.
1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “big” must χ² be before we are convinced that homicides are not equally likely for each month?
• How “big” is χ² actually?
2. What is the P - Value? Do you reject or fail to reject the null hypothesis?
• If homicides are equally likely for each month, how likely is it that I would get a χ² greater than ____?
Example 6.9. Is the die that we rolled in class unfair? Use a 0.05 significance level to test the claim that
the outcomes are not equally likely.
1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “big” must χ² be before we are convinced that the die is unfair?
• How “big” is χ²?
2. What is the P-value? Do you reject or fail to reject the null hypothesis?
• If the die is fair, how likely is it that I would get a χ² greater than ______?
6.3.2 Contingency Tables - Test for Independence
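Hypotheses:
H0 : The variables are independent.
Ha : The variables are dependent.
Test Statistic:
$\chi^2 = \sum \frac{(O - E)^2}{E}$, where each expected count is $E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$.
df = (r − 1)(c − 1), where r and c are the numbers of rows and columns of the table.
Rejection Region at Level α:
Reject H0 if:
Critical Values: $\chi^2 \ge \chi^2_{\alpha,df}$
P-Values: $\chi^2\text{cdf}(\chi^2, 2^{99}, df) \le \alpha$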
Example 6.10. In an imaginary study of the “gender effect”, 120 UTM students were observed. Each was
classified by gender (M, F) and by hair color (Light, Dark, Red). Use a 0.05 significance level to test the claim
that hair color is independent of gender. The observed counts are listed below.
            Red   Dark  Light
Female        5     10     45
Male         15     30     15
1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “big” must χ² be before we are convinced that hair color is dependent on gender?
• How “big” is χ²?
2. What is the P-value? Do you reject or fail to reject the null hypothesis?
• If hair color is independent of gender, how likely is it that I would get a χ² greater than ______?
What are the expected counts?
Example 6.11. In a clinical trial of the effectiveness of Echinacea for preventing colds, the results in the
table below were obtained. Use a 0.10 significance level to test the claim that getting a cold is independent
of the treatment group.
                               Treatment Group
                    Placebo   20% Extract   60% Extract
Got a cold               88            48            42
Didn’t get a cold        15             4            10
1. What is the critical value? Do you reject or fail to reject the null hypothesis?
• How “big” must χ² be before we are convinced that getting a cold is dependent on the treatment group?
• How “big” is χ²?
2. What is the P-value? Do you reject or fail to reject the null hypothesis?
• If getting a cold is independent of the treatment group, how likely is it that I would get a χ² greater than ______?
What are the expected counts?
6.4 Errors in Hypothesis Testing
Even when a hypothesis test is performed properly, its conclusion may be contrary to what is actually true.
• Sampling variability makes our inferences uncertain, so there is always a possibility that our decision is wrong.
• Statistical procedures are designed to keep the probability of committing an error small.
• The two types of errors are:
– Type I Error - Reject H0 , when H0 is true.
∗ The significance level is the probability of a Type I error.
∗ We choose the significance level, so we choose the Type I error rate.
– Type II Error - Fail to Reject H0 , when H0 is false.
Hypothesis Test Outcomes
                               Population
                               H0 True             H0 False
Sample   Reject H0             Type I Error        Correct Decision
         Fail to Reject H0     Correct Decision    Type II Error
Example 6.12. Suppose a test of H0 : µ = 9 vs Ha : µ ≠ 9 is performed. Describe what a Type I error and a
Type II error would be.
Example 6.13. Suppose a test of H0 : p = .9 vs Ha : p < .9 is performed. At the end of the test, you
failed to reject the null hypothesis. It was later determined that p was actually
equal to .76. Did your test produce an erroneous result? If so, what type of error did you make?
Example 6.14. Suppose a test of H0 : σ ≤ 87 vs Ha : σ > 87 is performed. At the end of the test, you
failed to reject the null hypothesis. It was later determined that σ was actually
equal to 73. Did your test produce an erroneous result? If so, what type of error did you make?
Chapter 7
Correlation & Regression
Regression techniques allow us to describe the relationship between paired random variables.
7.1 Correlation
Definition 7.1. The linear correlation ρ measures the strength & direction of the linear relationship
between two paired random variables x and y in the population.
Definition 7.2. The linear correlation coefficient r measures the strength & direction of the linear
relationship between a collection of paired data values. It is used to estimate the linear correlation ρ.
7.1.1 Test for Linear Correlation
Hypotheses:
H0 : ρ = 0 There is no linear correlation.
Ha : ρ ≠ 0 There is a linear correlation.
Test Statistic:
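$r = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$
$t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$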
df = n − 2
Rejection Region at Level α:
Reject H0 if:
Critical Values: $t \ge t_{\alpha/2,df}$ or $t \le -t_{\alpha/2,df}$
P-Values: $2 \cdot \text{tcdf}(|t|, 2^{99}, df) \le \alpha$
Example 7.1. Listed below are annual data for various years. The data are weights (metric tons) of lemons
imported from Mexico and U.S. car crash fatality rates per 100,000 population. Estimate the strength of
the linear relationship between the lemon import data and the crash fatality rates. Is there a linear
correlation between car crash fatalities and lemon imports from Mexico?
Lemon Imports (metric tons)           230    265    358    480    530
Crash Fatalities (per 100,000)       15.9   15.7   15.4   15.3   14.9
Example 7.2. One classic application of correlation involves the association between the temperature and
the number of times a cricket chirps in a minute. Listed below are the numbers of chirps in one minute and
the corresponding temperatures. Estimate the strength of the linear relationship between the two variables.
Is there a linear correlation between the temperature and the number of times a cricket chirps?
Chirps in 1 min     882   1188   1104    864   1200   1032    960    900
Temperature (°F)   69.7   93.3   84.3   76.3   88.6   82.6   71.6   79.6
7.2 Regression
Definition 7.3. Given a collection of paired sample data, the regression line is the straight line that
“best” fits the scatterplot of the data, meaning it minimizes the sum of squared vertical distances
$\sum (y - \hat{y})^2$. The regression equation describes the regression line:
$\hat{y} = b_0 + b_1 x$
Example 7.3. Listed below are annual data for various years. The data are weights (metric tons) of lemons
imported from Mexico and U.S. car crash fatality rates per 100,000 population. Measure the strength of the
linear relationship between the lemon import data and the crash fatality data. Additionally, find the best
predicted crash fatality rate for a year in which there are 500 metric tons of lemon imports.
Lemon Imports (metric tons)           230    265    358    480    530
Crash Fatalities (per 100,000)       15.9   15.7   15.4   15.3   14.9
Example 7.4. One classic application of correlation involves the association between the temperature and
the number of times a cricket chirps in a minute. Listed below are the number of chirps in one minute and
the corresponding temperatures. Measure the strength of the linear relationship between the two variables.
Find the best predicted temperature at a time when a cricket chirps 950 times in one minute.
Chirps in 1 min     882   1188   1104    864   1200   1032    960    900
Temperature (°F)   69.7   93.3   84.3   76.3   88.6   82.6   71.6   79.6
```
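For the hypnosis data in Example 6.7, the paired test can be sketched in Python with only the standard library. This is a minimal sketch assuming d = before − after (so a positive d̄ means less pain after hypnosis), the usual paired statistic t = d̄ / (s_d / √n) with df = n − 1, and a right-tailed alternative µd > 0. The notes do not state a significance level for this example, so the code reports t and the P-value and leaves the decision to your chosen α. The helpers `t_pdf` and `t_upper_tail` are hypothetical names; they integrate the t density numerically (Simpson's rule) in place of a calculator's tcdf.

```python
import math

# Example 6.7 data (pain scale, cm). d = before - after, so d > 0 means less pain.
before = [6.6, 6.5, 9.0, 10.3, 11.3, 8.1, 6.3, 11.6]
after = [6.8, 2.4, 7.4, 8.5, 8.1, 6.1, 3.4, 2.0]
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
d_bar = sum(diffs) / n
s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
t_stat = d_bar / (s_d / math.sqrt(n))
df = n - 1


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus the area on [0, t] (Simpson's rule)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Right-tailed test of H0: mu_d = 0 vs Ha: mu_d > 0 (hypnosis reduces pain).
p_value = t_upper_tail(t_stat, df)
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_stat:.3f}, df = {df}")
print(f"one-tailed P-value = {p_value:.4f}")
```

As a cross-check, `scipy.stats.ttest_rel(before, after, alternative="greater")` should report the same t and one-tailed P-value; the hand-written version simply mirrors the formulas.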
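The goodness-of-fit calculation for Example 6.8 (NYC homicides by month) can be sketched as follows. It assumes equal expected counts under H0 (total / 12 for each month) and uses the critical value χ²_{0.05,11} = 19.675 from a standard χ² table. `chi2_upper_tail` is a hypothetical helper that integrates the χ² density with Simpson's rule as a stand-in for the calculator's χ²cdf routine; it assumes df ≥ 2 so the density is finite at 0.

```python
import math

# Monthly NYC homicide counts from Example 6.8.
observed = [38, 30, 46, 40, 46, 49, 47, 50, 50, 42, 37, 37]
alpha = 0.05

k = len(observed)
expected = [sum(observed) / k] * k  # H0: every month is equally likely
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1


def chi2_upper_tail(x, df, steps=2000):
    """P(X >= x) for a chi-square(df) variable, via Simpson's rule on [0, x].

    Assumes df >= 2 so the density is finite at 0.
    """
    half = df / 2
    log_c = -half * math.log(2) - math.lgamma(half)

    def pdf(u):
        if u == 0:
            return 0.5 if df == 2 else 0.0
        return math.exp(log_c + (half - 1) * math.log(u) - u / 2)

    h = x / steps
    total = pdf(0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - total * h / 3


critical = 19.675  # chi^2_{0.05, 11} from a standard chi-square table
reject = chi2 >= critical
p_value = chi2_upper_tail(chi2, df)
decision = "reject H0" if reject else "fail to reject H0"
print(f"chi2 = {chi2:.3f}, df = {df}, critical value = {critical}")
print(f"P-value = {p_value:.4f} -> {decision} at alpha = {alpha}")
```

`scipy.stats.chisquare(observed)` uses equal expected frequencies by default, so it can serve as a quick check of both χ² and the P-value.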
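For the test of independence, a small Python helper can build each expected count as (row total)(column total)/(grand total) and sum (O − E)²/E over every cell. This is a sketch, and `chi2_independence` is a hypothetical name. Both tables (Examples 6.10 and 6.11) are 2 × 3, so df = (2 − 1)(3 − 1) = 2, and for df = 2 the χ² upper-tail area has the closed form e^(−χ²/2), which is used here in place of χ²cdf. The critical values 5.991 (α = 0.05) and 4.605 (α = 0.10) are standard table values.

```python
import math


def chi2_independence(table):
    """Chi-square test of independence for an r x c table of observed counts.

    Returns (chi2, df, expected), where each expected count is
    (row total)(column total) / (grand total).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# Example 6.10: rows = Female, Male; columns = Red, Dark, Light.
hair = [[5, 10, 45], [15, 30, 15]]
# Example 6.11: rows = got a cold, didn't; columns = Placebo, 20%, 60% extract.
colds = [[88, 48, 42], [15, 4, 10]]

chi2_hair, df_hair, exp_hair = chi2_independence(hair)
chi2_colds, df_colds, exp_colds = chi2_independence(colds)

# Valid only because df = 2 here: the chi-square upper tail is exp(-x / 2).
p_hair = math.exp(-chi2_hair / 2)
p_colds = math.exp(-chi2_colds / 2)

print("hair expected:", exp_hair)
print(f"hair: chi2 = {chi2_hair:.3f}, df = {df_hair}, P = {p_hair:.2e}, "
      f"reject at 0.05: {chi2_hair >= 5.991}")
print("colds expected:", [[round(e, 2) for e in row] for row in exp_colds])
print(f"colds: chi2 = {chi2_colds:.3f}, df = {df_colds}, P = {p_colds:.4f}, "
      f"reject at 0.10: {chi2_colds >= 4.605}")
```

`scipy.stats.chi2_contingency(table)` returns the statistic, P-value, df and expected counts as well; its Yates continuity correction only applies when df = 1, so it does not affect these 2 × 3 tables.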
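The statement in 6.4 that the significance level is the probability of a Type I error can be checked by simulation. The sketch below assumes a hypothetical setting in the spirit of Example 6.12: H0 : µ = 9 is actually true, the population is normal with a known σ = 2, and every sample has n = 25, so a two-tailed z-test with critical value 1.96 applies. None of these numbers come from the notes. Because H0 is true, every rejection is a Type I error, and the long-run rejection rate should land close to α = 0.05.

```python
import math
import random

# Hypothetical setup: population N(mu = 9, sigma = 2), so H0: mu = 9 is TRUE.
MU0, SIGMA, N = 9.0, 2.0, 25
ALPHA, Z_CRIT = 0.05, 1.96  # two-tailed z critical value for alpha = 0.05
TRIALS = 20_000

rng = random.Random(2024)  # fixed seed so the run is reproducible
rejections = 0
for _ in range(TRIALS):
    sample = [rng.gauss(MU0, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    z = (x_bar - MU0) / (SIGMA / math.sqrt(N))
    if abs(z) >= Z_CRIT:  # rejecting a true H0 is a Type I error
        rejections += 1

type1_rate = rejections / TRIALS
print(f"Estimated Type I error rate: {type1_rate:.4f} (alpha = {ALPHA})")
```

Changing `Z_CRIT` to a different critical value (for example 2.576 for α = 0.01) should move the estimated rate accordingly, which is exactly the sense in which choosing α chooses the Type I error rate.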
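The computing formulas for r and t in 7.1.1 translate directly into Python. This sketch applies them to Examples 7.1 and 7.2. The notes do not state α for these examples, so α = 0.05 is an assumption here, with two-tailed critical values t_{0.025,3} = 3.182 and t_{0.025,6} = 2.447 taken from a standard t table. `correlation_test` is a hypothetical helper name.

```python
import math


def correlation_test(x, y):
    """Return (r, t, df) from the computing formulas for r and t in 7.1.1."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    r = (n * sum_xy - sum_x * sum_y) / (
        math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    )
    t = r / math.sqrt((1 - r ** 2) / (n - 2))
    return r, t, n - 2


# Example 7.1: lemon imports (metric tons) vs crash fatality rate per 100,000.
lemons = [230, 265, 358, 480, 530]
crashes = [15.9, 15.7, 15.4, 15.3, 14.9]
# Example 7.2: chirps in one minute vs temperature (degrees F).
chirps = [882, 1188, 1104, 864, 1200, 1032, 960, 900]
temps = [69.7, 93.3, 84.3, 76.3, 88.6, 82.6, 71.6, 79.6]

# Assumed alpha = 0.05; two-tailed critical values t_{0.025,df} from a t table.
T_CRIT = {3: 3.182, 6: 2.447}

r_lemon, t_lemon, df_lemon = correlation_test(lemons, crashes)
r_cricket, t_cricket, df_cricket = correlation_test(chirps, temps)

for name, r, t, df in [("lemons", r_lemon, t_lemon, df_lemon),
                       ("crickets", r_cricket, t_cricket, df_cricket)]:
    decision = "reject H0" if abs(t) >= T_CRIT[df] else "fail to reject H0"
    print(f"{name}: r = {r:.3f}, t = {t:.3f}, df = {df} -> {decision}")
```

On Python 3.10 or later, `statistics.correlation(x, y)` computes the same Pearson r and can be used to check the hand-written formula.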
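Definition 7.3 gives the regression equation ŷ = b0 + b1x without formulas for the coefficients. This sketch uses the standard least-squares formulas b1 = (nΣxy − ΣxΣy)/(nΣx² − (Σx)²) and b0 = ȳ − b1x̄ to answer the prediction questions in Examples 7.3 and 7.4. Predicting with ŷ is only reasonable when the linear correlation is significant; for these data r ≈ −0.959 (n = 5) and r ≈ 0.874 (n = 8), both significant at α = 0.05. `least_squares` is a hypothetical helper name.

```python
def least_squares(x, y):
    """Return (b0, b1) for y-hat = b0 + b1 * x (least-squares line)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(a * a for a in x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n  # b0 = y-bar - b1 * x-bar
    return b0, b1


# Example 7.3: predicted crash fatality rate for 500 metric tons of lemons.
lemons = [230, 265, 358, 480, 530]
crashes = [15.9, 15.7, 15.4, 15.3, 14.9]
b0_lemon, b1_lemon = least_squares(lemons, crashes)
pred_lemon = b0_lemon + b1_lemon * 500

# Example 7.4: predicted temperature when a cricket chirps 950 times per minute.
chirps = [882, 1188, 1104, 864, 1200, 1032, 960, 900]
temps = [69.7, 93.3, 84.3, 76.3, 88.6, 82.6, 71.6, 79.6]
b0_temp, b1_temp = least_squares(chirps, temps)
pred_temp = b0_temp + b1_temp * 950

print(f"lemons:   y-hat = {b0_lemon:.4f} + ({b1_lemon:.6f})x, y-hat(500) = {pred_lemon:.2f}")
print(f"crickets: y-hat = {b0_temp:.4f} + ({b1_temp:.6f})x, y-hat(950) = {pred_temp:.2f}")
```

On Python 3.10 or later, `statistics.linear_regression(x, y)` returns the same slope and intercept and can be used as a check.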