# Introductory Statistics

Subject: Sciences

Resource Type: Textbook

Date Released: 2016-05-11

Description: An introductory statistics textbook that covers descriptive statistics, probability (discrete and continuous), sampling distributions, hypothesis testing (one sample, two sample, ANOVA, chi-squared tests), and simple linear regression.

Date of Review:

## Review Details

 Comprehensiveness: (The learning object covers all areas and ideas of the subject appropriately.) Yes Comments:

- Covers the material, with the same content as most introductory statistics books, except that it is missing:
  - hypothesis tests for two means when the standard deviations are unknown but equal, and for two proportions when the hypothesized difference in proportions is non-zero;
  - confidence intervals for two samples (mean and proportion) and for the variance of one sample;
  - box plots (which is especially problematic because the beginning of chapter 2 says it will focus on them, then never mentions them again).
- Provides a glossary at the end of each section, but not at the end of the book or even of each chapter. It does have a decent search function, though.
- The index is thorough.
- Missing any integration with technology. There is nothing on how to use a calculator or computer program except for linear regression in Excel. I understand that, since this is an OER, the authors are perhaps choosing not to do this so that it will fit a wider range of courses, but it would still be nice to have some level of integration.
 Content Accuracy: (Content, including diagrams and other supplementary material, is accurate, error-free and unbiased.) No Comments: This is a list of errors or poorly worded statements that could cause confusion.

- In section 7.2, “The law of large numbers says that if you take samples of larger and larger size from any population, then the mean of the sampling distribution tends to get closer and closer to the true population mean”. Though this is not technically false, it omits many details, which could lead to a misconception of what the law of large numbers actually says.
- There are exercises with incomplete answers.
- In section 8.1, the text says that if you solve for $\mu$ in $z_1 = (\bar{x} - \mu)/(\sigma/\sqrt{n})$, you get $\mu = \bar{x} \pm z_1 \sigma/\sqrt{n}$. The $\pm$ is incorrect.
- In section 8.1, the introduction says that the “level of ignorance admitted” is explained. It is not.
- In section 8.1, the text switches from $z_\alpha$ in the formula for the margin of error to $z_{\alpha/2}$ in the instructions for finding it. Very confusing.
- In section 8.3, the text says there is no “correction factor” when $np'$ and $nq'$ are not both greater than or equal to 5, but there is one in the life sciences.
- In section 9.3, the text refers to “perform hypotheses tests of a population mean using a normal distribution or a Student's t-distribution”, which suggests that the Student’s t-distribution is not normal. It should say “perform hypotheses tests of a population mean using a standard normal distribution or a Student's t-distribution.” This error continues throughout the section.
- In section 9.3, “We interpret this Z value as the associated probability that a sample with a sample mean of $\bar{X}$ could have come from a distribution with a population mean of $H_0$.” But the z-value is not a probability. That is the p-value.
- In sections 9.2 and 9.3, “This is a sampling distribution of $\bar{X}$ and by the Central Limit Theorem it is normally distributed.” As the sample size used to determine the sampling distribution has not been stated, this is not necessarily a correct statement. In fact, it is a common misconception among students that the sampling distribution is always normal.
- In section 9.3, it is suggested that using a p-value to make a decision requires an extra step (i.e. calculating the p-value). But if you use the critical value method, you have the additional step of finding the critical value (which is not needed when using the p-value). Thus both methods require the same number of steps.
- In section 9.3, “Take a sample(s) and calculate the relevant parameters” should be “Take a sample(s) and calculate the relevant statistics”.
- In section 9.3, Figure 2 shows only half of the p-value. This would not be an error if the decision rule for a two-tailed test were to reject $H_0$ if $p < \alpha/2$, but it is not.
- In chapter 11, the hypothesis test for variance is presented as two-tailed only, but one of the examples is left-tailed.
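Several of the points above (the sign when solving for $\mu$, the z-value versus the p-value, and the half-shaded p-value in Figure 2) can be checked numerically. The following is a minimal sketch using only the Python standard library; the numbers are hypothetical values chosen for illustration and do not come from the textbook.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical values, chosen only for illustration (not from the textbook).
x_bar = 52.0   # sample mean
mu_0 = 50.0    # population mean stated in H0
sigma = 6.0    # population standard deviation, assumed known
n = 36         # sample size

std_error = sigma / sqrt(n)
z = (x_bar - mu_0) / std_error   # test statistic: a standardized distance, not a probability

# Solving z = (x_bar - mu) / std_error for mu gives one value, with a minus sign.
mu_recovered = x_bar - z * std_error

# The p-value is the probability; for a two-tailed test it covers both tails.
one_tail_area = 1 - NormalDist().cdf(abs(z))
p_two_tailed = 2 * one_tail_area

print(f"z = {z:.2f}")
print(f"mu recovered from z = {mu_recovered:.2f}")
print(f"one-tail area = {one_tail_area:.4f}")
print(f"two-tailed p-value = {p_two_tailed:.4f}")
```

With these numbers the one-tail area is half of the two-tailed p-value, which is the quantity that should be compared with $\alpha$ in a two-tailed test.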
 Relevance/Longevity: (Content is up-to-date, but not in a way that will quickly make the text obsolete within a short period of time. The content is written and/or arranged in such a way that necessary updates will be relatively easy and straightforward to implement.) Yes Comments: I only saw one minor issue: the types of phones in Example 8.2 will get old fast.
 Clarity: (The learning object is written in lucid, accessible prose, and provides adequate context for any jargon/technical terminology used.) Yes Comments:

- But there is occasional use of overly technical notation (e.g. $X \sim N(5, 2)$) that confuses things, and of technical terms without clear examples (e.g. PDF, for which not one example is provided). There is also a heavy focus on mathematical notation (e.g. using $\delta_0$ in the test for the difference between two means).
- In section 10.1, the formulas are fairly complicated and use $x_1$ and $x_2$, but the example switches to $x_b$ and $x_g$ without showing how to use the formula. This would be very confusing for students.
- The formula for the correlation coefficient is needlessly complicated for students in an introductory statistics course for non-mathematicians.
 Consistency: (The learning object is internally consistent in terms of terminology and framework.) Yes Comments: I wish I could say more than yes or no on this one. My answer is "mostly". For the most part it is fine, but chapters 8 and 9 have a fair number of inconsistencies.

- In section 8.3, the text switches from talking about means and proportions to talking about continuous random variables and binary variables.
- In the introduction to Chap. 8, a 95% CI is calculated with $z_\alpha = 2$, but later, in 8.1, the text states that $z_\alpha = 1.96$. This would be very confusing for the reader.
- In Chapter 8, the text says “Up until the mid-1970s, some statisticians used the normal distribution approximation for large sample sizes and only used the Student's t-distribution only for sample sizes of at most 30.” This suggests that after the 1970s this was no longer the practice. But section 9.3 says “(Remember, use a Student's t-distribution when the population standard deviation is unknown and the sample size is small, where small is considered to be less than 30 observations.)”. It is very unclear when you are supposed to use t or z. This issue is resolved (finally) at the end of section 9.3.
- In chapter 9, the language for the decision changes throughout the chapter: reject $H_0$ in the introduction, cannot accept $H_0$ in 9.2, can reject $H_0$ in 9.4.
- In Chapter 9, the hypotheses are shown as complements (e.g. ≥ and < are paired), but Chapter 11 switches to $H_0$ always being “=” (i.e. = and < are paired).
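The gap between $z_\alpha = 2$ and $z_\alpha = 1.96$ noted above is the difference between a rounded rule of thumb and the actual two-sided critical value of the standard normal distribution. A minimal sketch, assuming a two-sided interval and using only the Python standard library (the function name `z_critical` is a hypothetical helper, not something from the textbook):

```python
from statistics import NormalDist

def z_critical(confidence_level: float) -> float:
    """Two-sided standard normal critical value for a given confidence level."""
    alpha = 1 - confidence_level
    # Leave alpha/2 in each tail, so look up the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For a 95% level this gives approximately 1.96, which is often rounded to 2 for quick estimates.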
 Modularity: (The learning object is easily and readily divisible into smaller reading sections that can be assigned at different points within the course (i.e., enormous blocks of text without subheadings should be avoided). The learning object should not be overly self-referential, and should be easily reorganized and realigned with various subunits of a course without presenting much disruption to the reader. The ranking of N/A can be used if the object is already small and therefore would not be used in smaller parts.) Yes Comments: But sometimes this isn’t a good thing. For example, z-scores are covered in both Chap. 2 and Chap. 6, yet Chap. 6 never refers back to the earlier coverage (e.g. with a hyperlink). A reference would give readers a chance to review. That is, some self-references would have helped the text.
 Organization/Structure/Flow: (The topics in the learning object are presented in a logical, clear fashion.) No Comments:

- There are some ideas that flow nicely, e.g. Chebyshev to the Empirical Rule in Chapter 1.
- Chapter 1 jumps from topic to topic without flow.
- Chapters 1 and 2 cover similar material, but no connection is made between them. For example, frequency tables are discussed in chapter 1, but they are not related to the histograms constructed in chapter 2.
- Chapters 5 and 6 are both about continuous distributions, but are two separate chapters.
- Chap. 8 has major flow problems. The introduction essentially shows how to make a 95% confidence interval (CI), but without explanation. Then 8.1 goes into the same details. Two “approaches” for the CI are presented, but they are not different (either calculate E independently, then add and subtract it; or write it out as one big formula). Then section 8.3 jumps around on how we know the binomial distribution is approximately normal (n has to be large and p not close to 0 or 1; $np > 5$, $nq \ge 5$; CLT).
- Section 10.1 starts with two points without any context or preamble. I think they state the conditions for using the hypothesis test presented in the section, but that is not clear. This is fixed in later sections.
- In section 10.1, there is a very small section on effect size, about four lines long, with no explanation of what effect size is.
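The remark above that the two CI “approaches” in Chap. 8 are not actually different can be illustrated with a short sketch. It assumes a z-based interval for a mean with known $\sigma$; the function names and the sample values are hypothetical and chosen only for illustration.

```python
from math import isclose, sqrt
from statistics import NormalDist

def ci_via_margin(x_bar, sigma, n, confidence_level=0.95):
    """Approach 1: compute the margin of error E first, then add and subtract it."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / sqrt(n)
    return (x_bar - margin, x_bar + margin)

def ci_one_formula(x_bar, sigma, n, confidence_level=0.95):
    """Approach 2: write the whole interval as a single expression."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return (x_bar - z * sigma / sqrt(n), x_bar + z * sigma / sqrt(n))

# Hypothetical sample summary (not taken from the textbook).
first = ci_via_margin(x_bar=68.0, sigma=3.0, n=36)
second = ci_one_formula(x_bar=68.0, sigma=3.0, n=36)

print(first)
print(second)
print(all(isclose(a, b) for a, b in zip(first, second)))
```

Both functions evaluate the same arithmetic, so the endpoints agree; the only difference is whether the margin of error is given its own name.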
 Interface: (The learning object is free of significant interface issues, including navigation problems, distortion of images/charts, and any other display features that may distract or confuse the reader.) Yes Comments: However, searching for a two-word phrase (e.g. “effect size”) returned results for “size” and “effect” separately, even when the phrase was in quotation marks.
 Grammatical Errors: (The learning object contains no grammatical errors.) Yes Comments:
 Cultural Relevance: (The learning object is not culturally insensitive or offensive in any way. It should make use of examples that are inclusive of a variety of races, ethnicities, and backgrounds.) Yes Comments:

- Used various names from multiple cultures.
- But most examples are American-centric and, I would say, middle-class.