Lecture 6: The Logic of Sampling

This lecture for social science research methods at the University of Maine at Augusta presents resources on sampling to augment Chapter 6 of our textbook, The Process of Social Research. This lecture uses a hypothetical example having to do with “cat people” and “dog people” to highlight the connection between sampling and statistical significance, a term you encountered in our week on research design. As we’ll see, the reasonableness of tests of statistical significance has recently been reinforced by a study on the replication of classic psychological research. In a “Do It Yourself” exercise associated with this lesson, you will put sampling patterns to the test. Your observational ground: a population of bears!

From Sampling to Statistical Significance

I’ll admit it: phrases like “sampling distribution” and “statistical significance” can seem scary.  But I think a lot of the scariness behind the phrases is cultural, by which I mean that we’ve been taught to think these ideas are scary, taught to think they’re hard.  But they don’t have to be.  It can be as easy as counting up a passel of pets.  The following video starts with material we’ve already covered (variables, operationalization, and direction of effect) and connects them to the ideas of sampling and statistical significance with a friendly example: the idea of “cat people” and “dog people”:

Now that you’re done with your short voyage to the imaginary Research Island, I hope you have a stronger sense of connection between the idea of a sample and the idea of statistical significance. When a difference regarding a variable emerges in a sample, that difference is called statistically significant or statistically insignificant based on the probability that a difference of that size would happen purely through the kind of random variation that occurs when working with a sample of a certain size. Social scientists call that probability “p,” and have agreed that a research result should be termed statistically significant when the probability of such a result occurring by chance alone is less than or equal to 5 percent.

Statistically significant differences are unlikely to have occurred due to random variation in the makeup of a sample alone: chance would produce a difference that large 5% of the time or less. They are, therefore, the effects most likely to reflect an actual trend in a broader population.
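To make the 5 percent rule concrete, here is a minimal Python sketch. It rests on a made-up setup that is not taken from the video: a survey asks n people whether they prefer cats or dogs, and we ask how likely a split at least this lopsided would be if the broader population were really divided 50/50. The function name and the sample numbers are hypothetical.

```python
from math import comb

def chance_probability(n, k):
    """Two-sided exact binomial p: the chance of a split at least as lopsided
    as k out of n when each person is equally likely to pick either pet."""
    lopsidedness = abs(2 * k - n)  # how far k sits from an even split
    extreme = sum(comb(n, i) for i in range(n + 1)
                  if abs(2 * i - n) >= lopsidedness)
    return extreme / 2 ** n

# Hypothetical samples: both are 60% cat people, but the sizes differ.
for n, k in [(10, 6), (1000, 600)]:
    p = chance_probability(n, k)
    verdict = "statistically significant" if p <= 0.05 else "not statistically significant"
    print(f"{k} of {n} prefer cats: p = {p:.4f} ({verdict})")
```

The same 60 percent split clears the 5 percent threshold only in the larger sample, which is the link between sample size and statistical significance.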

The Implications of Statistical Significance for Replication

 “Scientists Replicated 100 Psychology Studies, and Fewer Than Half Got the Same Results!”

So reads the headline of an article in Smithsonian magazine that describes an astonishingly ambitious paper published in the journal Science just last year (Open Science Collaboration 2015). As this graph from the original Science article (Open Science Collaboration 2015: aac4716-6) shows, the Smithsonian headline isn’t exactly fair. In the graph below, each dot represents a replicated study. Each dot that is below the dotted line represents a positive original effect but a negative replication effect; now that’s an overturning of expectations!

Figure 3 from Science journal article on reproducibility of results in replicated research.

Do you notice what those dots beneath the dotted line have in common? That’s right: they’re all pink. What do pink dots represent? Original research findings that were deemed statistically insignificant due to a high value of “p” (above 5 percent). The green-tinted statistically significant results, on the other hand, were largely replicated. This is another indication that statistical significance should be noticed when reviewing research results. Those asterisks aren’t just jewelry; they indicate when the results are likely to be replicated by future research.

DIY Activity #4: Random Bears

For this DIY Activity, I would like you to describe a population, then draw samples from that population. Why? So you can get your hands on data and experience for yourself what samples of different sizes tend to do. Your population for this week is one of Teddy Bears. Please complete the following steps:

1.  Make sure you have Microsoft Office installed on your computer, or visit a UMA or University College student computer lab to use a university computer with Microsoft Office installed on it if you do not have your own computer.  Because you are a UMA student, Microsoft Office is now available for you to install on your own computer for absolutely free.  To get yourself a free copy of Microsoft Office, visit this page and follow the instructions posted there.

2.  Log in to our course Blackboard page.  Looking to the left-hand side of the page, find and click on the link “Bear Data.”  Then click on the link “Bear Data for Sampling.” This should download a Microsoft Excel file to your system that contains data on the entire population of participants at a social network of people impersonating teddy bears (a defunct social media website called The Bear Club) as of August 31, 2014.

3.  Open the Excel file.  You will notice that the file contains data on 1,313 accounts with seven variables that are organized into seven columns:

  1. Bear: the id number of a teddy bear account
  2. username: the name of a teddy bear account
  3. # Blog Posts: the number of blog posts made by the account as of August 31, 2014
  4. clan: the online group (called a “clan”) to which a teddy bear account belongs
  5. # Comments Made: number of comments posted by the teddy bear account
  6. # Comments Received: number of comments posted by other accounts to this teddy bear account
  7. Betweenness Centrality: a social network measure indicating how much commentary from bear to bear passes through this teddy bear account

4.  By typing the command =AVERAGE(C2:C1314) into any empty cell in that spreadsheet, obtain the mean number of blog posts for the entire population of teddy bears at The Bear Club.  This number is also called the population average.

5.  Now type the command =RANDBETWEEN(2,1314) into another empty cell in the same spreadsheet.  This command will return a random whole number from 2 to 1314, inclusive.  Every time you enter this command, you will get a freshly drawn random number in that range (once in a while a number may repeat).  Because the cell recalculates whenever the spreadsheet changes, write down each number as you draw it.  Use this random number generator to randomly choose bears according to the bears’ row number in the Excel spreadsheet.

6.  Obtain 5 random samples of 2 bears each, and 5 random samples of 5 bears each.  Using the =AVERAGE() command, calculate the average number of blog posts within each sample (you can also do this by hand or by using a calculator if you are uncomfortable with using the =AVERAGE() command, but however you calculate averages, you must share your results).  Then determine how far away from the population average each sample average is.

7.  Draw a conclusion: do the sample means for random samples of 2 bears each tend to be closer to the population average or farther away from the population average than sample means for random samples of 5 bears?

Finally, report the results in a one-page document you upload in the “DIY Activities” area on our course Blackboard page labeled “DIY Activity #4: Random Bears.”
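For readers curious about how the same procedure looks outside Excel, the sampling steps above can be sketched in Python. This is only an illustration under loud assumptions: the population below is made-up data generated on the spot, not the Bear Club spreadsheet, and the function names, the seed, and the shape of the fake blog-post counts are all hypothetical. The assignment itself should still be completed with the real Bear Data file.

```python
import random
from statistics import mean

FIRST_ROW, LAST_ROW = 2, 1314  # spreadsheet rows that hold the 1,313 bears

def make_fake_population(rng):
    """Made-up blog-post counts keyed by row number (NOT the real Bear Club data)."""
    return {row: int(rng.expovariate(1 / 5))
            for row in range(FIRST_ROW, LAST_ROW + 1)}

def sample_mean(population, size, rng):
    """Pick `size` random rows, as =RANDBETWEEN(2,1314) would, and average them.
    Like RANDBETWEEN, this can occasionally draw the same row twice."""
    rows = [rng.randint(FIRST_ROW, LAST_ROW) for _ in range(size)]
    return mean(population[row] for row in rows)

def average_miss(population, size, n_samples, rng):
    """Average distance between the sample means and the population mean."""
    pop_mean = mean(population.values())
    return mean(abs(sample_mean(population, size, rng) - pop_mean)
                for _ in range(n_samples))

rng = random.Random(6)  # arbitrary seed so the run is repeatable
bears = make_fake_population(rng)
print(f"population mean = {mean(bears.values()):.2f}")
for size in (2, 5):
    miss = average_miss(bears, size, 5, rng)
    print(f"5 samples of {size} bears: average miss = {miss:.2f}")
```

With only 5 samples per size, as in the activity, any single run is noisy. Raising `n_samples` to a few thousand shows the long-run tendency much more clearly.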

This walkthrough video demonstrates the essential method for randomly sampling from a population of social media bear profiles using Microsoft Excel.  As you can see, the commands =AVERAGE and =RANDBETWEEN are all that’s required.  If you’re still not sure about exactly what to do, print out the instructions, watch this video and compare the instructions to what you see on the screen:

This DIY Activity is due by October 8.

Class Research Project, Commence! Your First Task…

Finally, this is the week in which we begin our class research project, a project that we will bring from the first stages of the research all the way to completion over the course of the semester.  In this political season, we’ll be considering a frequent sight along Maine roads: the campaign sign.  The first part of our class research project is due by the end of the day on October 15, and is as follows:

To complete the first part of the project, log in to our class Blackboard page, visit the link “Class Research Project,” and consider the campaign signs you will find there. All of the signs there are actual campaign signs for actual candidates for legislative office in the state of Maine in the fall of 2016:

As you review these signs, think about what characteristics they have that vary. In other words, I would like you to identify variables that characterize campaign signs.

To receive credit for participation in this part of the class research project, complete the following steps:

1. Name three variables that characterize these campaign signs.

2. Provide a good conceptual definition of each of the three variables.

3. Provide a good operational definition of each of the three variables.

4. Describe the level of measurement of each of the three variables.

5. Upload your work by logging in to our class Blackboard page, visiting the link “Class Research Project,” then clicking on the link in the title “Class Research Project Part I” to find the page on which you may upload your work.

The work you submit should be carried out independently from other students, without their assistance.

I look forward to finding out what you see in these signs!


Open Science Collaboration. 2015. “Estimating the reproducibility of psychological science.” Science 349(6251): aac4716.