Introductory Statistics

Subject: Sciences

Course Level: 1st Year Undergrad

Resource Type: Textbook

Date Released: 2016-05-11

Description: An introductory statistics textbook that covers descriptive statistics, probability (discrete and continuous), sampling distributions, hypothesis testing (one sample, two sample, ANOVA, chi-squared tests), and simple linear regression.


Date of Review:

Review Details

Comprehensiveness: (The learning object covers all areas and ideas of the subject appropriately.) Yes
• Covers the material. Same content as most introductory stats books, except that it is missing:
  o hypothesis tests for two means when the standard deviations are unknown but equal, and for two proportions when the hypothesized difference in proportions is non-zero;
  o confidence intervals for two samples (mean and proportion) and for the variance of one sample;
  o box plots (which is especially problematic because the authors say at the beginning of chapter 2 that they are going to focus on them, then never mention them again).
• Provides a glossary at the end of each section, but not at the end of the book or even of each chapter. It does have a decent search function, though.
• The index is thorough.
• Missing any integration with technology. There is nothing on how to use a calculator or computer program except for linear regression in Excel. I do understand that, since this is an OER, the authors are perhaps choosing not to do this so that the text will fit with a wider range of courses, but it would still be nice to have some level of integration.
Content Accuracy: (Content, including diagrams and other supplementary material, is accurate, error-free and unbiased.) No
This is a list of errors or poorly worded statements that could cause confusion.
• In section 7.2, “The law of large numbers says that if you take samples of larger and larger size from any population, then the mean of the sampling distribution tends to get closer and closer to the true population mean”. Though this is not technically false, it is missing a lot of details, which could lead to a misconception of what the law of large numbers actually says.
• There are exercises with incomplete answers.
• In section 8.1, they say that if you solve for \mu in z_1 = (\bar x - \mu)/(\sigma/\sqrt{n}), you get \mu = \bar x \pm z_1 \sigma/\sqrt{n}. The \pm is incorrect.
• In section 8.1, they say in the intro that the “level of ignorance admitted” is explained. It is not.
• In section 8.1, they switch from z_\alpha in the formula for the margin of error to z_{\alpha/2} in the instructions on how to find it. Very confusing.
• In section 8.3, the text says there is no “correction factor” when np' and nq' are not both greater than or equal to 5, but there is one in the life sciences.
• In section 9.3, they refer to “perform hypotheses tests of a population mean using a normal distribution or a Student's t-distribution”, which suggests that the Student’s t-distribution is not normal. They should say “perform hypotheses tests of a population mean using a standard normal distribution or a Student's t-distribution.” This error is continued throughout the section.
• In section 9.3, “We interpret this Z value as the associated probability that a sample with a sample mean of \bar X could have come from a distribution with a population mean of H0.” But the z-value is not a probability. That’s the p-value.
• In sections 9.2 and 9.3, “This is a sampling distribution of \bar X and by the Central Limit Theorem it is normally distributed.” As the sample size used to determine the sampling distribution has not been stated, this is not necessarily a correct statement. In fact, it is a common misconception among students that the sampling distribution is always normal.
• In section 9.3, it is suggested that using a p-value to make a decision requires an extra step (i.e. calculating the p-value). But if you use the critical value method, you have the additional step of finding the critical value (which is not needed when using the p-value). Thus both methods require the same number of steps.
• In section 9.3, “Take a sample(s) and calculate the relevant parameters” should be “Take a sample(s) and calculate the relevant statistics”.
• In section 9.3, Figure 2 only shows half of the p-value. This wouldn’t be an error if the decision rule for a two-tailed test were to reject H0 if p < \alpha/2, but it is not.
• In chapter 11, they present the hypothesis test for variance as only two-tailed, but then one of their examples is left-tailed.
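The distinction between a z-value and a p-value, and between a one-tail area and a two-tailed p-value, can be illustrated with a minimal sketch. It assumes a z-test with known \sigma; the function names and the sample numbers are hypothetical and are not taken from the textbook.

```python
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """Standardized distance of the sample mean from the hypothesized mean."""
    return (sample_mean - mu0) / (sigma / n ** 0.5)

def one_tail_area(z):
    """Area beyond |z| in a single tail of the standard normal curve."""
    return 1.0 - NormalDist().cdf(abs(z))

def two_tailed_p_value(z):
    """Two-tailed p-value: the area beyond |z| in both tails combined."""
    return 2.0 * one_tail_area(z)

# Hypothetical numbers, chosen only for illustration.
z = z_statistic(sample_mean=52.0, mu0=50.0, sigma=8.0, n=64)

print(f"z statistic   = {z:.2f}")                      # can exceed 1, so not a probability
print(f"one-tail area = {one_tail_area(z):.4f}")       # half of the two-tailed p-value
print(f"two-tailed p  = {two_tailed_p_value(z):.4f}")  # compare this to alpha
```

With these made-up inputs the z statistic is 2, which cannot be a probability, and the shaded area in one tail is only half of the p-value that is compared to \alpha in a two-tailed test.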
Relevance/Longevity: (Content is up-to-date, but not in a way that will quickly make the text obsolete within a short period of time. The content is written and/or arranged in such a way that necessary updates will be relatively easy and straightforward to implement.) Yes
I only saw one minor issue: Example 8.2 – types of phones will get old fast.
Clarity: (The learning object is written in lucid, accessible prose, and provides adequate context for any jargon/technical terminology used.) Yes
• But there is occasional use of overly technical terms (e.g. X ~ N(5,2)) that confuses things, and use of technical terms without clear examples (e.g. PDF, for which not one example is provided). There is also a heavy focus on mathematical notation (e.g. using \delta_0 in the test for the difference between two means).
• In section 10.1, the formulas are fairly complicated and use x_1 and x_2, but in the example they switch to x_b and x_g without showing how to use the formula. This would be very confusing for students.
• The formula for the correlation coefficient is needlessly complicated for students in an introductory statistics course for non-mathematicians.
Consistency: (The learning object is internally consistent in terms of terminology and framework.) Yes
I wish I could say more than yes or no on this one. My answer is "mostly". For the most part it is fine, but chapters 8 and 9 had a fair number of inconsistencies.
• In section 8.3, there is a switch from using mean and proportion to using continuous random variables and binary variables.
• In the intro for Chap. 8, they calculate a 95% CI with z_\alpha = 2, but later in 8.1 state that z_\alpha = 1.96. This would be very confusing for the reader.
• In Chapter 8, they say “Up until the mid-1970s, some statisticians used the normal distribution approximation for large sample sizes and only used the Student's t-distribution only for sample sizes of at most 30.” This suggests that after the 1970s that is not what they did. But in section 9.3, they say “(Remember, use a Student's t-distribution when the population standard deviation is unknown and the sample size is small, where small is considered to be less than 30 observations.)”. It is very unclear when you are supposed to use t or z. This issue is resolved (finally) at the end of section 9.3.
• In chapter 9, the language for the decision changes throughout the chapter: reject H0 in the introduction, cannot accept H0 in 9.2, can reject H0 in 9.4.
• In Chapter 9, they show hypotheses as being opposite (e.g. >= and < are paired), but in Chapter 11, they switch to H0 as always “=” (i.e. = and < are paired).
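The gap between the two critical values quoted for a 95% confidence interval can be checked directly. This is a minimal sketch using the Python standard library; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence, two_sided=True):
    """Standard normal critical value for a given confidence level.

    two_sided=True puts alpha/2 in each tail (z_{alpha/2});
    two_sided=False puts all of alpha in one tail (z_alpha).
    """
    alpha = 1.0 - confidence
    tail = alpha / 2.0 if two_sided else alpha
    return NormalDist().inv_cdf(1.0 - tail)

z_half = z_critical(0.95)                   # z_{alpha/2}, used for a two-sided 95% CI
z_full = z_critical(0.95, two_sided=False)  # z_alpha, a one-sided cutoff

# Coverage actually obtained when the rounded value 2 is used instead.
coverage_with_2 = NormalDist().cdf(2.0) - NormalDist().cdf(-2.0)

print(f"z_(alpha/2) = {z_half:.3f}")
print(f"z_alpha     = {z_full:.3f}")
print(f"coverage of +/- 2 standard errors = {coverage_with_2:.4f}")
```

The value 1.96 is z_{\alpha/2} for \alpha = 0.05, while 2 is a rounded version of it that gives slightly more than 95% coverage. Neither matches z_\alpha itself, which is about 1.645, so the notation matters as much as the number.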
Modularity: (The learning object is easily and readily divisible into smaller reading sections that can be assigned at different points within the course (i.e., enormous blocks of text without subheadings should be avoided). The learning object should not be overly self-referential, and should be easily reorganized and realigned with various subunits of a course without presenting much disruption to the reader. The ranking of N/A can be used if the object is already small and therefore would not be used in smaller parts.) Yes
But sometimes this isn’t a good thing. E.g. z-scores are covered in Chap. 2 and Chap. 6, yet Chap. 6 never refers back to the earlier coverage (e.g. with a hyperlink). Such a reference would give readers a chance to review. That is, some self-references would have helped the text.
Organization/Structure/Flow: (The topics in the learning object are presented in a logical, clear fashion.) No
• There are some ideas that flow nicely. E.g. Chebychev to Empirical rule in Chapter 1.
• Chapter 1 jumps from topic to topic without flow.
• Chapters 1 and 2 cover similar material but no connection is made between them. E.g. the frequency table is discussed in chapter 1, but it is not related to the histograms constructed in chapter 2.
• Chapters 5 and 6 are both about continuous distributions, but are two separate chapters.
• Chap. 8 had major flow problems. The introduction essentially showed how to make a 95% confidence interval (CI), but without explanation. Then 8.1 goes into the same details. There are two “approaches” for CIs presented, but they aren’t different (either calculate E independently then add and subtract, or write it out as one big formula). Then section 8.3 jumps around on how we know the binomial distribution is normal (n has to be large, p not close to 0 or 1; np>5, nq>=5; CLT).
• Section 10.1 starts with two points without any context or preamble. I think they are stating the conditions for using the hypothesis test that will be presented in the section, but that is not clear. They do fix this for later sections.
• In section 10.1, there is a really small section on effect size, about four lines long, with no explanation of what effect size is.
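The observation that the two confidence-interval “approaches” in section 8.1 are the same calculation can be confirmed with a minimal sketch. It assumes a z-interval with known \sigma; the function names and the sample values are hypothetical.

```python
from statistics import NormalDist

def ci_two_step(x_bar, sigma, n, confidence=0.95):
    """Approach 1: compute the margin of error E first, then add and subtract it."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / n ** 0.5
    return (x_bar - margin, x_bar + margin)

def ci_one_formula(x_bar, sigma, n, confidence=0.95):
    """Approach 2: write the whole interval out as one formula."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return (x_bar - z * sigma / n ** 0.5, x_bar + z * sigma / n ** 0.5)

# Hypothetical sample summary, chosen only for illustration.
a = ci_two_step(x_bar=68.0, sigma=3.0, n=36)
b = ci_one_formula(x_bar=68.0, sigma=3.0, n=36)

print(f"two-step:    ({a[0]:.2f}, {a[1]:.2f})")
print(f"one formula: ({b[0]:.2f}, {b[1]:.2f})")
```

Both functions return the same endpoints, because the second only inlines the margin of error that the first computes separately.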
Interface: (The learning object is free of significant interface issues, including navigation problems, distortion of images/charts, and any other display features that may distract or confuse the reader.) Yes
Though the search for two words (e.g. “effect size”) resulted in searches for size and effect separately – even with quotations.
Grammatical Errors: (The learning object contains no grammatical errors.) Yes
Cultural Relevance: (The learning object is not culturally insensitive or offensive in any way. It should make use of examples that are inclusive of a variety of races, ethnicities, and backgrounds.) Yes
• Used various names from multiple cultures.
• But most examples are American-centric and, I would say, middle-class.
Are there any other comments you would like to make about this book, for example, its appropriateness in a Canadian context or specific updates you think need to be made?
My lens
I reviewed this book from the perspective of readability (i.e. will a first year student be able to read and understand the text), explanation of concepts (i.e. are the theoretical concepts explained at an appropriate level for non-mathematicians, but with enough explanation that there is more than just procedures/recipes being taught), and applicability (i.e. are the topics being presented as having real-world applications).
Overall comments from that lens
This textbook is written at a readability level that is more appropriate than other texts that I’ve read, but it still gets weighed down by heavy formulas and math jargon. Some concepts are explained well, with good examples and analogies. Other concepts, in particular confidence intervals and hypothesis tests, are not very well explained. For these concepts, the focus is on recipes rather than on understanding what is going on. There are a multitude of exercises for students to do that could demonstrate how statistics and probability apply to the real world, but I found the actual examples in the sections, in particular on probability (chapters 4-7), to be lacking. The real world applications were usually referred to in the introduction for the chapter but were not reinforced in the sections. I was surprised at the number of errors and inconsistencies. I think many students would get very confused by these. I could use this book in my class as a resource for examples, but I would hesitate to have my students read sections of it before class.
General comments
• This is written for an American audience.
• What an exercise is asking the student to do is sometimes unclear until you see the solution.
• In the online text, it would be helpful if the exercises were numbered so that we can refer to them in a useful way. In the PDF, it would be helpful if the Try-it exercises had answers.
• In the online text, it would be helpful if the “chapter review” were either renamed “section review” or combined and put at the end as an actual chapter review.
• A chapter review section with exercises that combine the various sections in random order would help students learn how to do a question without knowing what type it is.
Chapter specific comments
• The first two chapters are very well written and easy to follow.
• Chapter 3 (Probability) was well-written and had very nice flow to the ideas. But the examples and exercises on in/dependent events and mutually exclusive events, in section 3.2, did not line up with what was being taught.
• Chapter 8 (Confidence intervals) was particularly problematic for me. The introduction was too detailed without enough explanation. The ideas had no flow and there were multiple errors that would confuse students. In particular, in section 8.1, two “approaches” to finding confidence intervals (CIs) are presented, but they aren’t different approaches. They are just two different ways to perform the calculation (as one long formula or broken into two steps). I also found the section full of recipes, e.g. here’s the sentence that you fill in to interpret the interval, with no attempt to go further into what the CI means. Students could be successful here without truly understanding what a CI is.
  o A common issue I have with textbooks is the known vs. unknown standard deviation break-up of the CIs. I felt this was particularly poorly dealt with here. 8.1 starts with “assume \mu is unknown but \sigma is not”. This is a highly problematic statement. How can you possibly know \sigma without knowing \mu? I get why we teach it this way (as students are comfortable with z), but we need to recognize this is an unrealistic situation and address that up front. Then in 8.2, the authors say 8.1 is unrealistic and that up until the 1970s some statisticians used z for large samples and t for small samples, but at no point clarify what these students are expected to do. That is, what do statisticians do now? Do they use t for all CIs for the mean? What happens in the real world? This isn't resolved until the end of section 9.3 (a chapter and a half later!).
  o Chap. 7 doesn’t have the sampling distribution for a proportion. That’s fine as long as section 8.3 (on the CI for a proportion) actually addresses this lack. But it doesn’t.
  o There are no images of the Student-t distribution comparing it to the standard normal distribution.
• Chapter 9 (Hypothesis tests with one sample) starts with a great deal of promise. The introduction provides interesting examples and situates hypothesis testing in the realm of scientific inquiry. But when it gets into the actual idea of hypothesis testing, the details are lacking and the relationship to scientific inquiry is left behind in the introduction. Thus the promise of relating H0 and Ha to the scientific process is gone.
  o Cons:
    - An example of the lack of details is that how to find H0 is clearly stated, but what H0 is isn’t.
    - Section 9.3 is riddled with errors (see Content Accuracy above).
    - In section 9.3, the steps to perform a hypothesis test are only for the critical value method.
    - The explanation on how to find p-values is sorely lacking. For example, the one example on how to find the p-value for the Student-t distribution only has the final p-value without any explanation as to how it was found.
    - Heavy focus on two-tailed tests (this continues into all of the chapters that involve hypothesis tests).
  o Pros:
    - There are a great many examples on type I and II errors.
    - There is a great explanation of the premise of hypothesis testing in section 9.5. But why is it at the end? The chapter should start with it!
• In the hypothesis testing chapters, the first exercise in the homework or “chapter” review has the students decide what type of test to use, but since the section only does one type of test it is a pointless question. For example, in the one-sample proportion test homework, why are they asking what type of test to use? The answer is one-sample proportion test because that’s the only option.
• I like how in the goodness-of-fit test they have examples where the test cannot be used (i.e. fails the criteria). It is nice that they are reinforcing that you can’t use tests willy-nilly.
Level: (For what level would this text be appropriate (i.e. First Year, Second Year, etc)?)
First year
Subject Matter:
Math / Stats
1st Year Undergrad
Interactive, Downloadable Documents (ie: PDF)

Peer Reviewer Name: Collette Lemieux

License for this resource: CC BY 4.0 Creative Commons