In “Discover unreasonably great research! And exploit it,” I introduced the concept of Unreasonably Great Research, characterized it briefly via a framework, and offered two examples.

This is the second in a series of framework-defining notes exploring elements of that framework. This one focuses on the science of surveys in general.

In a nutshell

Research should advance our objective knowledge.  

    • It should have a theory that it’s testing. That’s the hypothesis. The test determines the degree to which the hypothesis is right or not. The research should be based on a careful review of earlier work (a literature review) and aim to either retest, refine, further generalize, narrow, or reject an older hypothesis, or advance a new one.
    • Surveys with no stated hypothesis to test usually have implicit hypotheses hidden in their survey questions and exposed in the results writeup. Treat their results as pre-tests or exploratory work, a good way to discover specific hypotheses to test in more rigorous work.
    • Survey-based research usually relates to a specific slice of humanity, called the target population (e.g., all marketing managers in US-based e-commerce firms, all drivers of big-rig trucks in Canada, or all adults allergic to peanuts.) To minimize the cost and time required to test the hypothesis, researchers target a smaller group of people they’ll invite to take the survey. That group is called the sample.
    • The sample should represent the larger target population. Mapping data from the sample to the target population is often difficult because samples can have defects. They might not resemble the target population in a variety of ways. That’s sampling bias or sampling error.

This note will help you evaluate the quality and impact of the survey-based research results you’re reading. When you’re done, you should be better able to:

    • Begin to weigh the value of the results and their potential impact on your actions
    • Evaluate the research’s hypotheses and the prior work on which they’re based
    • Distinguish between scientific and unscientific surveys
    • Discover key sampling biases that devalue survey findings
    • Identify systemic errors that may distort the survey’s results
    • Root out sloppy designs that use data dredging techniques

As I did in the first note in this series, Organizations and people in the “unreasonably great research framework,” I’ll start with the final questions you should ask to assess the scientific foundations of the research claims you’re flooded with, what answers to look for (and why), and some conclusions you should reach based on the answers (or non-answers) you receive.

More notes in this series are coming. They are a prelude to a report card to aid you in judging the quality of the numbers and recommendations with which you’re flooded. The flood comes from the media, vendors, consultants, analysts, peers in your firm and industry, academics, and other authors.

Reference Framework

Unreasonably great research is:

  • Honest and transparent, scientific, well-designed and executed, based on very large-scale surveys using validated methods, sampling data from a majority of enterprises in the nation, and passing the rigorous tests typically applied to scholarly work (including but not limited to independent peer review.)
  • Contextually framed, starting with extensive literature reviews in economics and other social sciences, encapsulating the history and prior art related to the specific area under study.
  • Not bound by context or conventional wisdom, it often deviates from or redirects earlier findings, conclusions, and beliefs.

Research that fits well into this framework isn’t perfect. Of course not. But it’s phenomenally better than most of what’s usually pumped out by the technology-industry-complex.

Contents

    • Introduction
    • Question shortlist
    • Question “answer key”
    • More Details behind the questions
    • What’s next

Introduction

I’ll spare you a lot of pain.

This note is a very brief introduction, not a complete review of survey research methods. That’s a vast and valuable domain, well beyond my intentions. There are some great references if you want to learn more, ranging from a guide for academic reviewers to a college textbook[1]. This note covers quantitative survey-based research. Qualitative research (defined and contrasted with quantitative research here) is an important discipline, but it’s outside the scope of this work.

Question shortlist

Expectation setting: Some people will answer some of the questions posed in this series of framework-defining notes, but do not expect most to answer them. These are questions for you to consider, and considering them should be an essential part of your due diligence efforts.

Read through this shortlist. Following it, I’ve provided a short “answer key” and then more explanations of the questions’ terms and reasoning.

    1. Historical context (literature review) and researcher’s hypotheses

Did the research begin with a review of the existing body of literature, highlighting the process by which the researcher came up with their hypothesis to test? Did the researchers expose a specific hypothesis they designed the survey to test? Or was the survey created as a ‘fishing expedition,’ trying to see if there were any interesting and significant differences to note?

    2. Target population and sample selection

Was the target population clearly defined beforehand?

Did the researchers use a random selection process to select people to invite to take the survey? What randomization technique did they use?

How well did the sample represent the target population? Which segments of the target population were over or under-represented in the sample?

    3. Sample sourcing

How well qualified was the invitee selection process? Was it based on a list of email addresses? A previously recruited panel? How recently were the names and email addresses polled to update the panel members’ qualifications? (E.g., people change jobs and acquire new skills from time to time.) How often was the email address list or panel used in other studies? When was the list or panel first put into service?

    4. Sample effort and motivation

How much time did it take an average individual to complete the survey?

What rewards were invitees offered for their survey participation? If they were sourced via a panel, were they compensated for every earlier survey they participated in on that panel?

    5. Response rates

What proportion of the invitees agreed to participate in the survey? What percentage of those who started surveys completed them?

Were response and completion rates within the researchers’ expectations? Were any adjustments made based on the observed response and completion rates?

    6. Sample screening

What pre-screening was done in the survey process? For example, did the survey start with a question asking the respondent if they were familiar with a particular topic like quantum computing?

Were completed surveys removed from the data set after the survey data was collected? Under what conditions?

    7. Results weighting

Samples are never perfect. Researchers will try to mathematically compensate for differences between the sample and target population on dimensions such as demographics. Did they do that? If so, do the differences appear to have been caused by errors in sample selection or just random variation?

    8. Peer review

Was the research reviewed by independent experts whose work is related to the research at hand? Are any of their comments available for you to read?

Question “Answer Key”

    1. Historical context (literature review) and researcher’s hypotheses

Hypothesis testing and literature review are generally missing outside the highest quality academic research. If there is no review as a foundation, this potentially speaks poorly to the researcher’s motive. Absent a literature review, look closely for signs of data dredging, the process of running repeated statistical tests to see if any will produce a nominally significant result. This technique (discussed further below) violates the assumptions underlying many of the standard statistical tests. Data dredging is generally a failure mode.

    2. Target population and sample selection

Sample selection is often not random (probabilistic) and hence not “scientific”; further probing will often turn up an unbalanced representation of various population segments. For example, inviting all subscribers to a digital magazine only maps onto the attributes of all subscribers to that magazine, not a larger population. Most survey-based research touted by consulting firms, analysts, vendors, and other members of the technology-industry-complex does not map well to the claimed target population.

    3. Sample sourcing

Panels present many problems. Email lists are not great either, but they typically have fewer problems than panels.

    4. Sample effort and motivation

People paid to respond to multiple surveys are far more likely to provide unreliable information.

    5. Response rates

The proportion of invitees who responded positively to the survey invitation and the proportion who completed the survey are both important to consider. In the next note in this series, I’ll cover more details related to the design of the survey instrument (the questionnaire, interview script, or web-based tool.) Poor design and execution can lead to cognitive confusion, respondent fatigue, and inconsistent data.

    6. Sample screening

Most studies use pre-screeners to better target a preferred set of respondents. Pre-screeners create big problems when they encourage incorrect generalizations to the broader market.

    7. Results weighting

Weighting doesn’t convert a sample of convenience to a random sample.

    8. Peer review

Peer review is not perfect. Nonetheless, it’s essential to have multiple levels of peer review including, for example, (a)  the researcher’s close peers, (b) independent subject matter experts outside the researcher’s organizational context, and (c) full transparency to attract even broader review. Arguably, (a) and (b) are essential and (c) is good to have. None of these assertions are absolute.

More Details behind the questions

Data dredging 

Data dredging is the act of continuing to fish through data to find any “significant results.” Research that presents results created via data dredging should be ignored. It’s a misuse of data analysis tools to find patterns in data that can be presented as statistically significant, dramatically increasing the risk of false positives while understating that risk.

Data dredging is also known as p-hacking, snooping, fishing, significance-chasing, and double-dipping. It’s “trying multiple things until you get the desired result,” even unconsciously. It may be the first statistical term to rate a definition in the online Urban Dictionary, where the usage examples are telling: “That finding seems to have been obtained through p-hacking, the authors dropped one of the conditions so that the overall p-value would be less than .05”, and “She is a p-hacker, she always monitors data while it is being collected.”

There’s more to it than performing many statistical tests on the data and only reporting those that come back with significant results, but that’s a good starting point in an exploration of a survey-based report. (Source: Regina Nuzzo, Nature, 2014.)
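
To make the risk concrete, here is a minimal sketch in Python (my own illustration, with made-up parameters, not anything from Nuzzo’s article): if a researcher dredges through 20 independent subgroup comparisons on data containing no real effect, the chance that at least one comes back “significant” at p < 0.05 is roughly 1 − 0.95^20, or about 64%.

    # Minimal sketch: dredge 20 comparisons per simulated survey of pure noise
    # and count how often at least one comes back "significant" at p < 0.05.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    n_surveys = 1_000      # simulated surveys, each with no real effect anywhere
    n_questions = 20       # subgroup comparisons dredged per survey
    surveys_with_false_alarm = 0

    for _ in range(n_surveys):
        found_significant = False
        for _ in range(n_questions):
            group_a = rng.normal(size=100)   # both groups drawn from the same distribution
            group_b = rng.normal(size=100)
            _, p_value = ttest_ind(group_a, group_b)
            if p_value < 0.05:
                found_significant = True
        surveys_with_false_alarm += found_significant

    print(f"Surveys reporting at least one 'significant' finding: "
          f"{surveys_with_false_alarm / n_surveys:.0%}")   # roughly 64%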

Scientific versus unscientific samples 

    • Probabilistic samples (also known as scientific samples) use a random selection process to recruit people (aka respondents) to take a survey.
    • Samples of convenience (unscientific samples, aka “opt-in” or “walk-up” surveys) avoid the cost of creating an unbiased random sample. For example, a university sociology professor might use students in her current classes to represent the attitudes of all teenagers in the country. Or a magazine editor might recruit respondents from his magazine’s subscriber list for a study of satisfaction with the metaverse experience. Both cases would be unscientific samples of convenience that cannot ordinarily be used to project anything about the larger target population (see the sketch after this list).
    • Polling and other market research firms recruit people to respond to multiple surveys. They solicit and organize them into groups or panels based on specific attributes, such as branch managers in consumer banking, rookie product managers in enterprise software, and retired truck drivers. Panels sometimes look like probabilistic samples – it depends on the definition of the target population – but those looks may be deceiving. Online panels may be unrepresentative subsets of the target population. Some panels are constructed using random sampling methods, but they require follow-up to measure the degree to which the panel properly represents the target population.
    • Scientific samples can turn into unscientific samples if there are significant differences between responders and non-responders, or between people who abandon the survey partway through and those who complete it.
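
Here is a minimal sketch, using hypothetical population segments of my own invention, of the contrast between the first two bullets above: the probabilistic sample mirrors the population mix, while the convenience sample (drawn only from an “enterprise” subscriber pool) can say nothing about the other segments.

    # Minimal sketch: probabilistic sample vs. sample of convenience,
    # drawn from a hypothetical population of firms tagged by segment.
    import random
    from collections import Counter

    random.seed(1)
    population = ([{"segment": "enterprise"}] * 2_000 +
                  [{"segment": "midmarket"}] * 5_000 +
                  [{"segment": "smb"}] * 3_000)

    # Probabilistic (scientific) sample: every member has a known, equal chance.
    probability_sample = random.sample(population, 500)

    # Sample of convenience: e.g., only subscribers to an enterprise-focused magazine.
    convenience_pool = [p for p in population if p["segment"] == "enterprise"]
    convenience_sample = random.choices(convenience_pool, k=500)

    print("Population mix:        ", Counter(p["segment"] for p in population))
    print("Probability sample mix:", Counter(p["segment"] for p in probability_sample))
    print("Convenience sample mix:", Counter(p["segment"] for p in convenience_sample))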

Sampling biases and other sampling-related errors

These are biases or errors in selecting people to invite to participate in a survey. Sampling biases reduce (or reflect a reduction in) the alignment between the sample and the target population. (Cognitive biases are different. I’ll cover those biases in the upcoming note on survey design and execution.)

    • If the sample significantly underrepresents specific segments of the population, then the statistical power of the conclusions is diminished considerably. Overrepresentation of some segments means other segments have been underrepresented, and overweighting an underrepresented segment can’t completely compensate for the imbalance (see the weighting sketch after this list).
    • If recruiting has completely excluded essential parts of the target population – whether intentionally or not – then the sample doesn’t map to the target population, and it’s at best a sample of convenience. It does not generalize to the target population.
    • Most technology-related surveys use pre-survey questions (screeners) to disqualify some people from the survey. For example, surveys related to AI adoption and investment often disqualify people who indicate they don’t know what AI is or aren’t familiar with their firm’s usage of AI.
    • Screeners can turn a scientific sample into an unscientific sample. They’re most often used when studying early-stage technology, and they often result in gross misrepresentation of the larger population.
    • Screeners are a significant source of selection bias.
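
To illustrate the weighting point from the list above, here is a minimal sketch of post-stratification weighting with hypothetical numbers: each respondent gets a weight equal to their segment’s population share divided by its sample share. Weighting rebalances segments that are merely under-represented, but the few respondents in a badly under-represented segment end up carrying very large weights, and no weighting scheme turns a sample of convenience into a random sample.

    # Minimal sketch of post-stratification weighting (hypothetical shares and counts).
    population_share = {"enterprise": 0.20, "midmarket": 0.50, "smb": 0.30}
    sample_counts    = {"enterprise": 300,  "midmarket": 180,  "smb": 20}   # skewed sample

    total = sum(sample_counts.values())
    weights = {segment: population_share[segment] / (count / total)
               for segment, count in sample_counts.items()}

    for segment, weight in weights.items():
        print(f"{segment:>10}: sample share {sample_counts[segment] / total:.0%}, weight {weight:.2f}")

    # The 20 SMB respondents each carry a weight of 7.5, so a handful of atypical
    # answers can swing the "weighted" result for that entire segment.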

Other errors 

Survey process dropouts: Low survey response rates or high incompletion rates are a warning of potential problems with:

    • The invite sent to prospective respondents
    • The structure or content of the survey itself (the survey instrument)
    • The administration of the survey (for example, interviewer skills and attitudes or tools)

Researchers may compensate for survey process dropouts by expanding the number of invites sent out, but that calls for more work to determine what underlying issues drove the observed dropout rates and whether the dropouts distorted the sample-to-population alignment.
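
The two rates in question are simple ratios; here is a minimal sketch with hypothetical counts (mine, not from any real study).

    # Minimal sketch: response rate vs. completion rate for a hypothetical survey.
    invited = 10_000
    started = 900
    completed = 540

    response_rate = started / invited        # 9% of invitees agreed to take the survey
    completion_rate = completed / started    # 60% of those who started finished it

    print(f"Response rate:   {response_rate:.0%}")
    print(f"Completion rate: {completion_rate:.0%}")

    # Low values on either rate prompt the question of whether the remaining
    # respondents still resemble the target population; sending more invitations
    # does not answer it.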

What’s next?

There are other issues related to the science of survey work that I’m going to defer to the upcoming note on survey design and execution.

(c) 2022 Tom Austin, All rights reserved

 

[1] For reference purposes: