lions and tigers and academic writing…oh my


Here’s a familiar story: A scientist sits down to write a manuscript and, rather than creating a crisp and coherent presentation of information, meanders aimlessly for several pages. This story is a tragedy.

How can we avoid such tragic situations? Try thinking of your manuscripts like a great novel or a blockbuster movie.

To demonstrate what I mean, let’s look at one specific type of story structure–the 3-act structure–and how it can be applied to a generic scientific manuscript.

A brief overview of the 3-act structure

The 3-act structure is the most common plot structure for stories. This structure not only has a beginning, middle, and end, but also has a familiar cadence and predictability in the events that occur during these different stages. This structure is battle-tested. It is rhythmic. And, most relevant to this post, this structure is a familiar framework onto which you can hang your scientific arguments.

Act I, the exposition: This is the part of a story where the settings and the characters are established. It situates the reader in the world in which the story will unfold. It familiarizes the reader with the typical state of affairs. And it gives the reader a reason to care about the fate of your main characters. 

After the lay of the land is established, Act I ends with an “inciting incident.” The inciting incident upends the typical state of affairs, disrupts what has just been established, and heaves the story in an unanticipated direction. This incident could be a positive event such as a desperately poor character winning the lottery or a catastrophe such as a boat wreck that leaves the characters stranded on an uninhabited island. The important thing is that after the inciting incident occurs, the typical state of affairs is disrupted and the protagonist is unwillingly propelled into action. They now must address the changes that have been initiated by the inciting incident. 

Notice that an inciting incident would not have any weight if the reader did not first care about the protagonist or if it did not cause a jarring shift from the exposition. The inciting incident creates a problem that must be solved. 

Act II, rising action: The protagonist understands the problem before them and establishes a goal to solve it. This part of the story typically involves roadblocks and setbacks–an easily-solved problem would not make for a compelling story. During this act, the protagonist’s life is further and further disrupted as they desperately try to solve the problem initiated by the inciting incident. These roadblocks provide tension in the story. The plot sharpens focus, the stakes are ratcheted higher and higher with each setback, and the protagonist must achieve their goal or else all will be lost. 

The plot barrels down the road towards a high-stakes scene where the protagonist will finally have an opportunity to solve the problem initiated by the inciting incident once and for all.

Act III, the resolution: This act includes the climax–the make-or-break scene where the protagonist lays it all on the line and will either achieve their goal or not. Think of the big fight scene of pretty much any action film. Or the scene where an ex-lover busts open the church doors and disrupts a wedding ceremony in a last-ditch effort to win back their ex-lover. Or a perilous escape from the island on a rickety raft. Is the fight won or lost? Does the guy get the girl? Are the castaways rescued? The climax is a compact, focused scene, where the problem initiated by the inciting incident comes to a head.

Act III also includes the denouement: The post-climax scene where the story’s theme is reiterated and any loose ends are tied up. 

Applying this to academic writing

Now let’s look at how the 3-act structure can be applied to academic writing. To do so, I will present the plot to The Wizard of Oz alongside a hypothetical replication study. 

Act I: Set the scene and get the reader to care about the topic. Introduce the inciting incident. 

The Wizard of Oz: Dorothy lives a lonely life on a farm in Kansas and she feels unappreciated by her family. A tornado carries her away from her home and she lands in Oz. 

Hypothetical Replication Study: Present the strongest argument for a previously-published effect and why the reader should care about this effect. Then, as strongly as possible, introduce some motivation for conducting a replication. 

The power of both of these examples comes from the juxtaposition of the first sentence with the second. There should be a contrast. An incompatibility. A problem that must be solved. 

Landing in Oz greatly disrupts Dorothy’s life. She is now forced to confront the consequences of this incident. Similarly, the motivation for conducting the replication must be compelling enough that one cannot look at the previously-published effect the same way without some sort of resolution.

Without a stark contrast, the impetus will not be apparent to the reader. The impact of the inciting incident on the exposition makes it clear what action must be taken. The reader is intrigued.  

Act II: Solving the problem caused by the inciting incident 

The Wizard of Oz: Dorothy must get to the Emerald City, period. Everything else in her life takes a backseat to this goal. Along the way, she encounters challenges: Trees throw apples at her; there are lions and tigers and bears, oh my!; and a field of poppies puts her to sleep. Despite these twists and turns, she keeps plodding towards her goal of reaching the Emerald City. But even reaching the Emerald City turns out not to be the end. There is one more challenge. Dorothy must go to the Witch’s castle, get past the flying monkeys, and confront the Witch. The whole story has been leading to this point. Tensions are sky-high. The viewer knows the confrontation with the Witch is the make-or-break moment for Dorothy. 

Hypothetical Replication Study: In a replication study, the challenges and roadblocks are the methods. If an effect exists, then it must overcome several obstacles the researcher puts in its path. These roadblocks take the form of double-blinding, attention checks, high statistical power, accounting for as many lurking confounds as possible, precisely executing the methods, etc. And if that’s not enough, the researcher throws down preregistrations, open data, sensitivity analyses, and flying monkeys (maybe not the flying monkeys). The goal is to sharpen the focus of the study, maximize the diagnosticity of the methods, and set up the high-stakes presentation of the results. As with The Wizard of Oz, the reader feels that everything has been leading up to this point. What will the results reveal?

Act III: The climax and tying up loose ends. 

The Wizard of Oz: Dorothy defeats the Witch (“I’m melting…melting”) and finds a way home (“There’s no place like home, there’s no place like home, there’s no place like home”). Dorothy overcomes the challenges put forth by the inciting incident. Also, all the loose ends are tied up: The Tin Man gets a heart, the Cowardly Lion gets a medal for bravery, the Scarecrow gets a diploma, and Dorothy gains a new appreciation for her family.

Hypothetical Replication Study: It’s all been leading to this, the big scene. What were the results? What are the implications? The author should create a beautiful figure to display the most important finding. The exciting part of science is that the results are not predetermined: In science, the guy does not always get the girl, the good guys do not always defeat the bad guy, etc. The data are what they are.

In the Discussion, the author reiterates the theme of the study and ties up loose ends. Was there anything unexpected? Does the author want to tease a sequel? 

Wrapping up

Writing the overall flow of information in a scientific manuscript is hard. But framing this process in familiar terms may help you think about how this information could be best presented. If you’re having a hard time thinking of the overall plot of your academic manuscripts, try thinking of the 3-act structure. Your next academic manuscript might just be a page-turner.


self-editing your writing

If you have a draft of your manuscript, congratulations! You have slain the “monster of blank pages,” the fearsome beast that intimidates even the most capable of writers. I know you are tired, perhaps even exhausted. Rest, for now, but do not stop here. Your hero’s journey is not complete; you are merely at the beginning of the next stage of your saga.

You must now turn your attention towards editing your draft–the gentle sanding process that makes your final paper smooth to the touch. 

One way of editing your final paper is to ask another reader for feedback. However, prematurely asking another person to look over your first draft is a sure-fire way to receive feedback about minor grammatical errors, misspellings, and trifles that you could have identified yourself had you put in the effort. Such readers get mired in muck and will be unable to fully provide the feedback you are seeking. Beyond inefficiency, this basic editing work is the responsibility of the author. Thus, sending your draft to a reader too soon is offloading the hard work of editing–your work–onto another person. 

Before sending your manuscript to another person, it is your duty to edit, edit, edit. Unfortunately, editing your own writing is hard. And doing it well is even harder.

Where do you begin? This is a difficult question to answer for novice writers. And sadly, I, a relatively more experienced writer, cannot fully give you an answer. What I can offer is a glimpse into my self-editing process. I sincerely hope this pulling back of the curtain allows you to benefit from my experiences and may give you ideas on how you can move forward. 

Here is my self-editing process. 

1. Give it time. If possible, complete your draft and let it sit. Watch a movie. Go about your weekend. The longer the better. Then you can revisit your draft with fresh eyes. In your first rereading of your manuscript, focus on the “big picture”: Is the major organization of information logical? Does one section naturally lead to the other? Does each paragraph within those sections naturally lead to the next? Perhaps there are gaping holes that must be filled or tangents that must be cut. In your first pass, focus on the forest, not the trees. Do not get bogged down in the minutiae of editing sentences or phrases that may be axed.

Once your sections are in order, and the paragraphs within those sections are also in order, I then read for flow. I look for long sentences that can be broken into shorter, punchier sentences. I add transition words such as “however,” “consequently,” “nevertheless,” “moreover,” “accordingly,” etc. These transition words let the reader know how the different parts of your manuscript are logically related to one another. They take the reader by the hand and guide them through your manuscript.

This first stage is all about kneading your manuscript until it takes the shape you want. It is not uncommon to delete or cut-and-paste entire paragraphs (or sections!) during this stage. You may need to write, step away, and revisit your writing several times before moving on. Yes, this step takes time. 

Now set down your ax and grab a chisel; it is time to move in closer. I tackle the mechanical parts of writing next.

2. Separate editing tasks. Take several passes through your manuscript, focusing on only one thing at a time. First, look for quick word swaps: Swap out “in order to” for “to”; “whether or not” for “whether”; and “due to the fact that” for “because.” Second, look for intensifiers such as “very” and “really.” Most of the time these intensifiers can simply be deleted. Other times, you need to find a stronger adjective to say what you really mean. Third, search for the word “that” and think about whether each one is necessary. They often are not. Fourth, search for implied words. For example, “the participants were in the process of completing” can be more concisely written as “the participants were completing.” Fifth, search for places where you have multiple adjectives and replace those with one better-suited adjective.

This step is not about the flow of your paper, it is about clearing away the clutter. It is not uncommon to snip and shave off several lines, one word at a time, from a manuscript during this stage. Remember, take several passes through your manuscript while focusing on one thing at a time in each pass. Hence, this step also takes time. A lot of time. But the result is a well-groomed manuscript. 

3. Use a spelling and grammar checker. Find the squiggly lines in your manuscript. Think about what is being suggested. Understand why a word or phrase got flagged. But do NOT blindly accept the recommended changes because these programs are not always correct (computers are dumb). Address each place in your writing that has been flagged. 

Next, change it up. Try to look at your own writing as a reader would.

4. Read your paper aloud. Your ear will let you know when your writing sounds off. Really pay attention to places where your speaking stumbles because this is an indication your writing needs a tweak. Do this until your writing flows. 

5. Listen. Microsoft Word has a “read-aloud” function. Have the document read to you at least once. Seriously. Close your eyes and listen to your paper. Again, let your ear tell you when a phrase sounds off. 

6. Go Old School. Print the document and go through it with highlighters and colored pens. These are not anachronistic office supplies, these are the tools of successful writers. Holding your paper between your fingers allows you to really notice things that your mind ignores when you are staring at a screen. 

7. Plant a tree to repent for Step 6. 

8. Incorporate changes. Re-open your manuscript on a computer and incorporate all your changes from Step 6. 

9. Change the aesthetics of your manuscript. Make your font two sizes larger. Change the font. Change the color of the font. Go to a new environment. If you write in 12-point Times New Roman black font from your bedroom, try reading your manuscript in 14-point Garamond in navy blue font while you sit at Starbucks. Changing the aesthetics of a familiar manuscript will call attention to the mistakes that are hiding in plain sight. 

10. Reread. Change your manuscript back to the original formatting. Reread your manuscript. 

11. Repeat. Repeat steps 1-10 again. Seriously. 

Once you’ve completed these steps, you are ready to share your writing with others.
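If you are comfortable with a little code, the word-level passes in Step 2 can be partly automated. Below is a minimal Python sketch, not a finished tool: the names `PASSES` and `run_pass` are made up for illustration, and the phrase lists contain only the examples mentioned in Step 2. It flags candidates one pass at a time and leaves every decision to you.

```python
import re

# Hypothetical pass definitions built only from the examples in Step 2.
# Each pass maps a search phrase to a suggestion for the writer.
PASSES = {
    "word swaps": {
        "in order to": 'try "to"',
        "whether or not": 'try "whether"',
        "due to the fact that": 'try "because"',
    },
    "intensifiers": {
        "very": "delete it or find a stronger word",
        "really": "delete it or find a stronger word",
    },
    "that": {
        "that": "is it necessary?",
    },
    "implied words": {
        "in the process of": "try deleting it",
    },
}


def run_pass(text, pass_name):
    """Flag every phrase from one pass as (line number, match, suggestion)."""
    flags = []
    for number, line in enumerate(text.splitlines(), start=1):
        for phrase, suggestion in PASSES[pass_name].items():
            # Word boundaries keep "that" from matching inside "thatch".
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for match in re.finditer(pattern, line, flags=re.IGNORECASE):
                flags.append((number, match.group(0), suggestion))
    return flags


draft = (
    "We ran the study in order to test the effect.\n"
    "The effect was very large.\n"
    "We found that the participants were in the process of completing the task."
)

# One pass at a time, as Step 2 recommends.
for name in PASSES:
    for number, found, suggestion in run_pass(draft, name):
        print(f"[{name}] line {number}: '{found}' ({suggestion})")
```

The sketch only points at candidates. As with the spelling and grammar checker in Step 3, do not accept a suggestion blindly; read each flagged sentence and decide for yourself.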

A PowerPoint presentation is available here. Feel free to use, share, or modify this presentation any way you want.

the publication recipe

From 8 o’clock in the morning until 5 o’clock in the evening an undergrad research assistant would sit behind the angular metal desk inside the door of the Harris Aggression Lab. Their job was basically that of a human turnstile: They shepherded the parade of participants through the procedures of the lab’s latest study all day long and ensured the flow of participants coming out didn’t bump into the flow of participants going in. To maximize efficiency, the schedule was staggered so that as soon as one of the eight booths was vacated, another participant could fill the void. Like a finely-tuned V8 engine, participants went in-and-out and in-and-out and in-and-out of the eight booths at a perfectly staggered pace.

The lab’s days were filled with the sound of data being collected–the swooshing sound of the door opening and closing sandwiched between the undergrad research assistant’s friendly-sounding “hello” and a “goodbye.” Thousands of participants would make their way through the Harris Aggression Lab over the course of an academic year going through more-or-less the same routine. Hello. Swoosh. Swoosh. Goodbye. Sometimes two at a time. Hello. Hello. Swoosh. Swoosh. Swoosh. Swoosh. Goodbye. Goodbye. 

These participants would press thousands of buttons on the keyboard and leave behind thousands of rows and columns of numbers that would go into the ether of the lab’s computer system. And these thousands of rows and columns of data were the raw materials the graduate students would use to manufacture the lab’s scientific publications.

Henry Ford would be proud of the efficiency of this operation.

“Numbers in, publications out,” Bear would flippantly hum as he hammered out the draft of the lab’s next manuscript. Writing up the lab’s publications became his specialty. It was another one of those things that most people found complex, but he found it to actually be quite simple. He also took a sadistic pleasure in how he was able to pass off a few hours of mundane writing as serious scholarship, like he was pulling one over on the people who actually took the process of publishing academic articles seriously. Sometimes he felt bad about it, his not taking the publication process seriously and all, but never enough to stop him. The peer reviewers and editors were all other academics who probably got into psychology because they thought they were going to be nobly doing science, but they ended up with a crummy job in some small public university they’d never heard of before and sitting in meetings all day. They’d probably get gripes from their students and their department chairs all day and then go home and get it from their spouse. They’d probably be flattered to be invited to peer review an article because it made them feel like they were still doing science and they could say things that seemed to matter. And now Bear was poking fun in his own way through the manuscripts coming out of the Harris Aggression Lab, even though they didn’t know that Bear was poking fun. He just couldn’t help it, the not taking the publication process seriously, that is.

“Do you cook?” Bear asked Cal on one of their long walks along the Mississippi River front. “Because writing these publications is like following a recipe. Once you figure out the recipe, it becomes easy. And I’m not talking about baking bread from scratch either. I’m talking about starting with sliced bread and making toast.” Cal didn’t mind that Bear didn’t care too much about his work, but it really bothered him how brazen and explicit he was about it. At least when Bear was quiet Cal could ignore him. 

Academic articles had a predictable format of Introduction, Methods, Results, and Discussion sections, which the lab acronymized as IMRaD. Boy, they loved their acronyms as much as they liked their military jargon. 

Sometimes they would use the acronym as a verb: “Do you even IMRaD, bro?” And sometimes they used it as a noun. “You start with the IMRaD,” Bear would explain like he was revealing the secret behind how a card trick is done. Even further than the basic IMRaD structure, the Harris Aggression Lab had amazingly similar content within each of these sections so that it became, well, like a recipe that one had to follow, step-by-step, to bake up the next publication. “And then you just add in the information that you know the peer reviewers and the editor are looking for and nothing else. When you do this,” Bear would explain, “you realize that 90% of the manuscript is just filler words. You can write it in your sleep. You just have to focus on 10% of the stuff that really matters.” A thing as complex as an academic article came easily to him, so he talked as if it came just as easily to others. “In the Introduction you want to include some sort of ‘real world’ anecdote so it seems like the study is not merely an academic exercise. You know, mention a school shooting or a crime or something to grab the reader’s attention. Then they think the article’s real important and all. In the Methods you write something along the lines of the driving simulator analogy. In the Results you want to make a really nice figure. The peer reviewers will mostly just look at the figure anyways, they don’t check the numbers, so take your time to make the figure extra nice. And in the Discussion you want to add a limitation, such as ‘this was a sample of college students’, and say that ‘more research is needed.’ You know, a little blah blah so that nobody can claim that you are overgeneralizing your results and all. It’ll come easy-peasy once you’ve gone through this process a few times.”

That was it. The big secret. Follow those steps and the manuscript will fly through the peer-review process and your CV grows one publication longer. Or you end up with toast. Or something. 

“Not to belabor the cooking analogy,” Bear started up again, “but it’s sort of like a health inspector saying that a kitchen is clean based off of how good the food looks. They don’t really want to check the equipment and they don’t really want to check the expiration dates on the ingredients. They just want to ensure the food coming out of the kitchen looks pretty.” 

Cal had a puzzled look on his face. It was an involuntary look of bewilderment. If he had been quicker, he would have just agreed with whatever Bear was saying to get him to stop talking.

“Because the health inspector also is a restaurant owner,” Bear continued to clarify. “If they judge the other restaurant owners easily, then the other owners will judge them easily and they can all keep cooking as fast as they want. It’s a big ol’ food cooking party. And the best part is that, unlike real restaurants, there are no customers. Everybody can make as much food as they can. And then all the restaurant owners can get together once in a while and all enjoy some back-slapping fun over a few beers as they toast about how good their food looks.” As pleased as Cal was that Bear seemed to be passionate about something, Cal really wished Bear would stop talking now. Cal just didn’t want to hear that science was a paint-by-the-numbers activity.

As much as you want to believe there is more of an art to the writing process, or that the peer-review process is a better quality filter, it was impossible to argue with results. The Harris Aggression Lab flat-out knew how to publish scientific articles at a speedy rate. And when the lab’s reputation and livelihood are dependent on the quantity of publications, it seems rational to build the lab’s operation on meeting their needed quota of publications.

“There’s more to it than that though, right?” Cal asked Bear, hoping that Bear had knowingly left out the parts about how doing good science is a purposeful endeavor.

“Sorry, but no. This is what we get paid to do: Turn numbers into publications.”

It was an impersonal response. And perhaps it was too brutally honest for Cal to process at the moment. Cal wanted the numbers to actually be meaningful because that would make their work meaningful and that would mean that he wasn’t just another drone in a factory churning out widgets for “the man.” But it wasn’t. Their work as graduate students was literally just turning rows and columns into scientific publications. It was so far detached from the reality of what they claimed to be studying that it didn’t matter whether those numbers represented aggression or love or hate or the yield of corn per acre in the fields that surrounded Bridgeport.

“That doesn’t sound very satisfying. That doesn’t sound like science, it sounds like factory work.”

The more Cal thought about it, the worse it got. At least people working in a factory know that they’re just making widgets. There’s an honesty to that work. If researchers are just doing factory work too, then they are also delusional because they don’t even know they’re doing factory work.

“Again with this soul-nourishing crap,” Bear let out an exasperated exhale at his idealistic friend. “It must be exhausting to constantly be disappointed. Look. I hate to be the one to tell you, but we are basically doing factory work here.” 

Once a study was published, the Harris Aggression Lab would take the extra step of making hashtags for it, creating figures specifically formatted for sharing on social media, and oftentimes preparing short videos where Dr. Harris himself would give the 30-second summary of the study (which always appeared with a big green “Donate Now!” button that would take you to the donation portal on the lab’s website).

The lab also applied their research skills to monitoring the number of social media likes their posts got. They tinkered with how to word things to get maximum impact, which times of the day were best to post new studies, etc. All this information about social media impact was itself compiled into spreadsheets, those numbers would be crunched, and the lab obsessed about business-sounding jargon such as “impact metrics” and “market reach” and “engagement ratios.” The lab also obsessed about finding new metrics to obsess over. To an outsider, it may be hard to tell whether this was a lab that studied aggression or if this was a lab that studied how to effectively market aggression research. The lab was not merely the manufacturer of its product; it was also its own marketing department. And more than most labs, there was a fairly direct link between the lab’s online presence and their ability to bring in funding.

Another consequence of the emphasis on social media marketing was that the more concise and “shareable” the article summaries became (“shareable” being the lab’s term for formatting articles for social media consumption), the more the nuance got lost and the sharper the take-home message became. Real data can be messy. Studies don’t always work out perfectly. Scientific publications are filled with nuance and qualifications and phrases the researchers use to hedge their certainty. Although scientifically responsible, the hair-splitting nuance of scientific publications makes the take-home messages wordy, limp, and not punchy.

The nuance was sanded out of the lab’s videos and social media posts with clever omissions and lawyerly wording that was technically true but obscured any results that were not quite “on brand.” The end result was a highly-confident final product that was devoid of any qualifications or hesitation. The social media posts were filled with definitive and confident action verbs. The take-home from the studies was boiled down to information that people could digest in the small windows of time they used for browsing the information in their social media feed. The information was made bite-sized and ready to reshare or retweet. A click of the button was all it took to amplify the latest study from the Harris Aggression Lab demonstrating that violent media causes aggressive behaviors.

a first cup of coffee

Cal lay on a pile of blankets in the middle of his boxy studio apartment. He swiped the face of his phone and the glow from the screen pierced the darkness: 5:12 AM. He checked a weather app: Rain. Perfect, he loved rainy days.

Today was the start of a new chapter in his life. He’d received his bachelor’s degree in psychology a few weeks earlier, moved to Bridgeport yesterday afternoon, and would be starting a graduate program in Social Psychology at Wisconsin State University in August. His wide-open eyes stared into the blackness of his apartment as he fantasized about using a “Dr.” and a “Ph.D.” as bookends to his otherwise plain name. He was giddy that in a few years he would be Dr. Calvin Olson, Ph.D.

Cal flipped a switch and the apartment lit up. It was still dark outside, so the glass on his curtainless windows became mirrors that stared back at him. He slowly rotated his head from left to right as he surveyed his apartment: No furniture, no bed, no television; just six cardboard boxes packed with clothes, a laundry basket full of books, and the bivouac of blankets and pillows he’d slept on the night before. He didn’t even attempt to put his things away when he moved in yesterday, he just neatly lined up boxes along the wall, which was a proactive step to prolong how long he could live without unpacking. He was in Bridgeport to devote himself to science, at least that’s what he told himself, which was a vague enough excuse to avoid doing anything he didn’t want to, such as unpack. He also liked the stereotype of the monastic graduate student who sacrificed material comforts in a single-minded pursuit of his passion. So his stuff would remain in the moving boxes both as a rationalization for procrastinating and because he eagerly embraced the role of an overly-devoted graduate student.

Cal broke the silence of his empty apartment with a primal yawn that started as a groan and ended as a grunt. After years of living with roommates, it felt liberating, almost taboo, to make noise at this early hour. So he exercised his new freedom by talking to himself, which, ironically, meant he was talking merely because there was nobody to hear what he was saying.

“Let’s get cleaned up,” he cheerily said to the empty apartment. After showering, he slowly ran his hand across a 3-day beard. “To shave or not to shave,” he said to the man in the mirror. “Nope,” the man answered. A little scruff was more fitting with the “starving graduate student” look he was going for.

Cal picked up the half-read book that was written by his soon-to-be advisor, Violent Media by Dr. Jack Harris, and threw it into his backpack. Just looking at this book roused a fluttering sense of inspiration that tickled his belly. He also tossed a new Moleskine notebook into his backpack, which was an anachronistic way to track his thoughts amongst the techy university crowds that preferred digital note-taking, but Cal joked that he wanted to document his scientific career in a manner that would be preserved for eternity like the da Vinci codices. Finally, he tugged his phone charger out of the wall and shoved it into his backpack as he left his apartment.

“Today’s gonna be a great day.” He wasn’t sure if he actually said that aloud or if he merely thought it as he gently closed his apartment door.

The rain pounded the sidewalk as he rounded his shoulders and shortened his stride so he could hide himself under the canopy of his umbrella. The chimes from the campus clock tower reverberated throughout the valley: Gong! Gong! Gong! Gong! Gong! Gong! 6 AM. The raindrops firmly pelted Cal’s umbrella as he sloshed his way through downtown Bridgeport: Dat! Dat! Dat!

Just around the corner from Cal’s apartment was a lighted storefront in an otherwise dusky downtown Bridgeport. He peered through the rain to steady his course towards a blinking black-and-red neon sign like a ship using a lighthouse to navigate stormy waters. “Just head towards the light,” he reminded himself as he tipped his umbrella into the wind. The sign in the window read The Grind.

The Grind was a small cafe that sat on the eastern edge of campus, which itself sat on the eastern bank of the Mississippi River. The brick walls were filled with shelves of old books of no particular genre and in no particular order. WSU paraphernalia and black-and-white photos of campus peppered the walls as if to imply this place has existed as long as the university itself. The drinks were served in kitschy coffee cups that were collected at yard sales over the years. It was the type of place where strangers felt familiar and the customers shared their coffee with the ghosts of the alumni’s youth.

This coffee shop has the circadian rhythm of a spry old man; it is quiet when the sun is rising and setting and lively during the day. It is confident in its identity and comfortable remaining the same within a quickly-changing world rather than chasing fads.

The mornings are filled with early-risers enjoying a cup of coffee. A handful of professors and graduate students get in a few peaceful hours of work before the rest of the world awakens. Eventually, as the sun rises completely above the eastern bluffs that overlook Bridgeport, the trickle of customers slowly becomes a non-stop parade. The sounds of individuals ordering a morning cup of coffee gradually intensify into the indistinct din of a small crowd. The caffeine and the bustle create a hubbub of energy that lasts throughout the day. The air is filled with the clickety-clack of fingers striking keyboards and the yackety-yack of conversation between friends. There is a non-stop ballet of customers and baristas exchanging money and coffee. Then, as the sun gets low and the shadows of the WSU clocktower get long, the pace inside The Grind slows again. The evening customers typically consist of friends who prefer the quiet cafe atmosphere to the loudness of the local bars and an easily-overlooked student who is in the early stages of an all-nighter. Eventually the lights of the cafe go dark until the next morning when it starts all over again.

Like many businesses surrounding universities, the busyness of The Grind also ebbs and flows with the university’s calendar. Each autumn, the leaves on the trees that fill the bluffs of southwest Wisconsin become brilliantly red and orange, the college football season begins, and thousands of students return to campus full of optimism and hormones. For nine months out of the year, the buzz of students swarming around downtown Bridgeport fills the air. The Grind serves as both a workspace and social gathering place for the WSU community. However, the coming and going of students is like geese migrating north and south with the changing seasons. Students leave a few weeks after the snow melts in the spring and, for a few months, the town is quiet. During the summer The Grind is merely filled with the smell of coffee and old books; a quiet place where it’s possible for a thinker to think a thought and a writer to write a word. Customers stay and sit, they converse, they commune, they do not have someplace else to be. And then, when a new academic year begins, the students return to Bridgeport and the whole cycle begins anew.

And so it continues. The atmosphere of The Grind predictably changes both with the clock and the calendar. Students ride this carousel–the daily up-and-down and yearly round-and-round–for a few laps until a new group of students gets its turn.

Cal sat at a tall table near the window at the front of the cafe. His coffee was served in a Charlie Brown Christmas mug, which amused him considering that it was early June. “OK. Time to get to work,” he said to himself as he opened Violent Media to the page marked with a neon pink Post-it note. A passage struck him. He pulled out his Moleskine notebook and wrote “Exposure to violent media causes increases in aggressive thoughts, feelings, and behaviors, Jack Harris, Violent Media, page 124.” He nodded in satisfaction at this literary gem he’d discovered and silently mouthed the words as he reread the quote he’d just written down.

As much as he wanted to focus, the excitement of being in such a place kept pulling his attention from his book. Occasionally, Cal would study the old photos hanging on the wall and reflect on how he would soon be part of the long WSU tradition. Perhaps in 50 years there would be students looking at a photo of him. Perhaps his would be a generic face of WSU’s history. Perhaps the photo of Cal would someday look as outdated as the black-and-white photos looked to him now. Perhaps someday he would produce a thought worth quoting and future students would point at the picture of the famous Calvin Olson. Perhaps. Someday.

He explored the shelves when he needed to stretch his legs. He’d lean to the side to better read the vertical spines of the books. He pulled out a ragged paperback about ice fishing and read a few lines. Then he closed his eyes and imagined how a blind person would experience the book. The old pages felt fragile like dried leaves and, if he concentrated really hard, he could detect the fishy smell of bluegill from the hands of the original owner. But just holding the book gave him a warm glow of inspiration that he wanted to capture and preserve. Each book was precious. The creation of the hard work of the author thinking each thought and crafting each sentence. He was holding a piece of art. The bookshelves of old books were a museum full of masterpieces. Cal wanted to hide from the world and hold each book–reading was quiet, mindful, and slow; the world was loud, distracting, and fast–even though he knew he wouldn’t have the time to read them once the semester started.

The bell attached to the back of the door would jingle and the sound of rain battering the sidewalk got louder whenever somebody entered. Cal peered over the top of his book to examine each new customer. If somebody looked his way, he lowered his glance to avoid eye contact and pretended to read. For amusement, he used these customers as characters in stories that he would play in his mind’s eye. For example, a middle-aged woman, perhaps 50, casually walked in and ordered a Chamomile tea. She looked bookish and Bohemian. She carried what appeared to be a homemade purse from which a sturdy book peered out the top. She must be an academic, which would make sense being this close to campus. Cal’s mind went to work filling in the blanks. Perhaps she was a world-renowned expert on the Oregon Trail. No wait, perhaps she was an expert in the prohibition-era Mafia or Jane Austen novels. Perhaps she was a chemist. A great chemist! And she was an inspiration to other female chemists because female scientists are rare, especially in the hard sciences. She probably would have interesting stories about being a female in a male-dominated profession. Cal knew she was probably none of these things, but it excited him to imagine that she might be one of those things. He could be ten feet away from a world-famous chemist!

In a moment of indulgence, Cal closed his eyes and soaked in the inspiration that seemed to radiate into his soul from every direction. Coffee! Books! Reading for work! Tickling his mind with idle thoughts of impressive-sounding titles and imagining being in the presence of world-renowned scientists! He was in heaven. Cal daydreamed this was how he would feel every day for the next five years, even though he knew it was just that, a silly daydream. He knew there was hard work and long days ahead. But indulging in these fantasies, if only for a moment, gave him an energy and yearning for the long days and drudgery of graduate school. And that’s how he spent his morning.

Although the rain lightened, Cal still needed an umbrella as he marched back to his apartment. He hung his wet umbrella and his damp shoes in the shower. He sat on his pile of blankets in the middle of the mostly-empty apartment and ate a box of crackers as he continued reading Violent Media. It was important he knew this book inside-and-out, front-to-back, cover-to-cover, upside-down and downside-up, because Dr. Jack Harris was going to be his advisor over the next few years and he had a meeting with him tomorrow. Cal desperately wanted to make a good first impression. He watched all the videos of Dr. Harris he could find online, followed his social media accounts, and, now, was reading his newly-released book.

The summer rain created a peaceful backdrop of white noise and Cal didn’t know a soul in Bridgeport who could possibly interrupt his reading. He had a new book, hours of uninterrupted silence, and he was giddy with the limitless daydreams of graduate school that were unencumbered by reality. He was free to think and dream.  

When his mind would wander from fatigue, Cal snapped it back to the task at hand. “Focus!” Cal would tell himself. Sometimes he would jokingly yell it in a funny voice to break the silence of his apartment. “Focus!” And he’d laugh at how he chose to use his freedom. Then he would remind himself that he was now in graduate school and that he should not indulge in such immature thoughts.

There was a chapter about how listening to songs with violent lyrics made people angry, a chapter about how reading about getting rejected on social media increased aggressive thoughts, and a chapter about how violent movies made people behave aggressively. The chapters were imbued with real-life instances of violence, like murders and school shootings, and then described the science of how violent media was implicated in each of these tragedies. The one-two combination of anecdotes pulling on your emotions and science pushing on your logic made for a persuasive narrative. The book ended on a beautifully optimistic note of how reducing media violence could contribute to a better and more peaceful world. Just like society should strive to not expose future generations to hazardous toxins in our environments, we should not pollute their minds with violence through media that is so pervasive in our 21st century social environments. If the awesome power of media was harnessed and focused in a positive direction, it could be used to produce so much good in this world. This optimism warmed Calvin’s heart like the first sip of a hot cup of coffee on a cold winter’s day. The way that Dr. Harris had ended a book about violence on such an upbeat note was a masterful weaving together of pessimism and optimism, like a maestro who had full command of the range of an orchestra’s sounds that could be woven together into a beautiful concerto.

Although it was still cloudy and grey outside–like he watched his day unfold in a black-and-white movie–his daydreams were in technicolor. This was the best day of Cal’s life so far. Cal felt the weight of the book in his hands as he read the final pages and nodded in satisfaction. He looked at Dr. Harris’ picture on the book jacket one last time before slowly closing it and laying his head down. He couldn’t believe that he would actually get to meet Dr. Harris tomorrow.

Improving my writing through reading: Simons (2014)

Linda Skitka recently shared a great exercise for helping students improve their writing.*

[Image: Skitka tweet]

This “reverse engineering” exercise sounds helpful. I can feel when writing is executed well and when it is not; however, I have never tried to verbalize why the writing is executed well. Thus, before asking my students to complete this exercise, I thought I would do a test run. Also, rather than outlining the paper like Skitka recommended in her tweet, I merely tried to identify what features of the paper worked well.

Below is my first attempt at reverse engineering the writing of an academic article. The purpose of this exercise is not to engage with the arguments of an article per se. Rather, the purpose is to identify features of the writing that I believe worked well, to abstract a lesson from those features, and to put those lessons in my writing toolbox. In short, the purpose of this exercise is to improve my writing voice by honing my reading ear.

You can think of this post as me trying to “think out loud” while I am reading an article.

Selecting an article

I can certainly bring to mind several articles that I feel are written poorly. And I can certainly bring to mind a handful of articles that I feel are written well. An article that is on my “well-written” list is Simons (2014). I assign this article in my undergraduate social psychology lab courses both because it is relevant to the course objectives and because the writing is crisp and economical. For these reasons, I thought this was a suitable article for a test run of this exercise.

Here are my thoughts about why I think Simons (2014) is well written.

Strong start and strong finish

Simons (2014) starts and finishes strong. Here is the first paragraph.


The article could have started with the second sentence: “The idea that direct replication…”. But notice how many little pieces of information are actually in the second sentence: There is a claim that “direct replication undergirds science,” there is support for that argument, and there are many concepts such as “robust,” “competent,” and “statistical power” that a reader must parse. If the article started with the second sentence, it would take the reader a moment to gain her bearings and understand the direction she is headed.

However, Simons (2014) starts with the brief and strong declaration that “Reproducibility is the cornerstone of science.” This first sentence boldly grabs the reader’s attention and says “we are starting right here!” It is a firm and unambiguous beginning.

Here is how Simons (2014) ends the introductory section.


Without even reading the content of this passage, it is visually obvious there are three main points being made. And having each point on a separate line implies these are important points; so important that the author does not want them to be lost in the jumble of a “normal” paragraph. The format of this paragraph screams out “hey, there is some important information right here!”

If you look at the full text of the article, you also will notice there is a separate section devoted to each one of these points. This combination of three bullet points that preview three corresponding sections provides a strong logical structure to the entire manuscript. To see why this works well, imagine if there were a fourth bullet point that did not have an accompanying section later in the article. The manuscript would appear incomplete or sloppy to the reader.

Finally, here is how Simons (2014) ends the article.


This article ends with a declaration that is as brief and strong as the first sentence: “Direct replication is the only way to make sure our theories are accounting for signal and not noise.” This last sentence strongly and concisely summarizes the arguments that were made in the article. Also notice the use of “our” when describing theories. This subtly helps the author end the article on a positive note. This is not an author lecturing a group of “others,” but this is a person who is communicating information to a group of readers that includes himself.

Strategic use of levity

Simons (2014) includes this passage when discussing “hidden moderators.”


Understanding the hidden moderator argument is not important here. The important point is that Simons (2014) uses rather absurd examples such as phases of the moon and how much corn participants ate as a child. There is a bit of playfulness in this passage that is rare in typically dry academic articles. Authors risk appearing unserious if they overuse such absurd examples. However, this is the only place in the manuscript this sort of levity is employed, which makes it feel as if it was strategically placed rather than intellectually shallow filler.

The use of absurd examples also is a calculated choice. If plausible examples were used, no matter how explicitly they were labeled with disclaimers of being “just examples,” there would be some readers who infer those were actual examples that were referring to actual arguments made by actual people. The reader would then expend mental effort engaging with these plausible examples, which would detract from the general point that is being made. The use of absurd examples allows the reader to easily dispense with the specifics of these examples and to keep their full mental effort on the heart of the general argument.

Simons (2014) also had a choice here to completely omit the highlighted text. It would have been grammatically correct to put a period after the word “infinite” and no intellectual information would have been lost. However, providing concrete examples, even absurd ones such as phases of the moon, helps the reader understand the abstract concept of “an infinite number of possible moderators.”

Address the arguments head-on

The final notable feature of Simons (2014) is that he does not dance around arguments that are counter to his thesis. Nor does he set up easily-defeated straw men. He fairly articulates a challenge to his central thesis and addresses it head-on. Here is a paragraph where this is nicely executed.


Notice that Simons (2014) points out areas of agreement (e.g., “Cesario is right…”) and uses the entire first half of the paragraph describing an argument in a way that is fair, free of mockery, and is not a caricature. Then, once this argument is laid out, the paragraph pivots on the word “But…”. Not only does the second half of the paragraph directly address the first half, the word “But” provides a clear logical relationship between the two halves: The first half is an argument and the second half is a counter-argument. It is critical that the points of the argument are laid out fairly and that the counter-argument directly addresses those points because it allows the reader to feel as if the take-home message is due to an actual engagement with the ideas and not due to an unfair and one-sided framing of the arguments.


Why is Simons (2014) well written? My opinion is that the writing is plain, it has a strong start and finish, it uses helpful and strategically-placed examples, and it addresses the main arguments and counter-arguments head-on and logically.

The other conclusion from this exercise is that it was fun to introspect as I read a well-written article and, based on my experience, my students will now be doing this exercise this semester.


* This post is about trying to improve writing skills. Any comments about how my writing is less than exemplary are merely pointing out information I already know. I just hope that something (anything) I say is helpful to one other person.



Scientific Theories and Improvised Explosive Devices

Dr. Harris was a skillful instructor. He explained abstract concepts in a way that made his students feel as if they were part of an intelligent conversation rather than pupil-shaped furniture in the room being lectured to. His most impassioned lecture was on testing scientific theories: To the students in the WSU Social Psychology program, this lecture was known as “The Roadside Bomb Lecture”.

Dr. Harris would deliver this lecture on the first day of his social cognition course each fall semester. All the social psychology graduate students, regardless of whether they were currently enrolled in the course or not, showed up to hear the renowned Dr. Harris preach about the gospel of falsificationism and risky predictions. As Cal would soon learn, attending this lecture was one of the many unwritten rituals that existed within the tribe of WSU Social Psychology graduate students.

Room 153 of the WSU Psychology Building looked like a small movie theater. The rows of seating gradually descended to a stage in front that ran all the way from the left wall to the right. A WSU logo was projected onto the screen that took up the entire front wall. The room gave the instructor full control over the dimming of the lights and the loudness of the audio system that fully surrounded the students. With instructors who had a knack for flair and showmanship, students could have a full-sensory experience. And, for this reason, Dr. Harris chose this room for his classes.

Dr. Harris stoically stood behind the podium that sat off to the side of the stage and watched students enter through the door in the back of the room. He was well aware of how unusual it was that students who had already sat through this course were returning to see this lecture one more time. But his awareness of how unusual this was fed Dr. Harris’ already-inflated ego.  

The instant the clock turned 10 AM, Dr. Harris pressed a button on the remote he held in his right hand. A title slide projected onto the screen: “Theory Testing In Social Psychology”, which indicated that it was time to begin. He broke the silence with an authoritative voice. “Scientific theories allow one to make predictions about the world: If this theory is true, then we ought to expect some particular observation. If gravity pulls objects towards the center of the Earth, then we ought to expect things to fall.” Dr. Harris slowly raised a pen that was clutched in his fist up to his eye level, and then opened his hand and let the pen fall to the floor. “Gravity,” then he paused for dramatic effect, his arm fully extended, and his fingers spread wide, “works every time,” he said with a chuckle. The class laughed along with him to acknowledge the joke.

“We use our theories to make predictions. We then gather some observations and compare those observations to our predictions. Sometimes these predictions we get from our theories are correct and sometimes these predictions are incorrect. Sometimes it is a hit and sometimes it is a miss. Researchers really learn something when our predictions are incorrect.” Dr. Harris emphasized the last word to ensure his students didn’t mishear him. “Getting it wrong lets us know that our theory needs some work; that our ideas are not as good as we once thought. Counter-intuitively then, the goal of a good study is to put your theories in grave danger of making incorrect predictions. We want our theories to be put at risk of being wrong, we want to expose their weakest parts, to make them vulnerable, because that is when we have the greatest potential to learn. Did our theory make a correct prediction despite dire odds? Or is there some aspect of our theory that needs tweaking? If we really want to make progress, we need to design studies where our theories really ‘stick their neck out there’ or, as Popper says, to make ‘risky predictions’. The theories that survive several attempts at being falsified, the ones that have survived several risky predictions, the ones that have proved their mettle time and again, are the ones that are most useful. The more situations, and the more dire those situations, that these theories have survived, the more useful the theories.”

This was all abstract philosophy of science. It mapped onto the Popper and Lakatos that all of the students would eventually read as part of their course assignments. This is the point at which most instructors would try to find an example from the field of social psychology to “bring it back” to the course or they would stop and move onto a different topic altogether. But the real genius of Dr. Harris was in his storytelling; his ability to use a well-placed and well-timed anecdote. To hover over a topic long enough for students to understand, but also not so long that it felt tedious and repetitive.

Dr. Harris also had a knack for intuiting how well the students understood a concept. Students may think they understand a concept, but they also hadn’t fully developed their skills in distinguishing true understanding from the ability to parrot back the information. Dr. Harris seemed to know when students really and truly understood a concept, even better than the students themselves did. Even the most precise words pass through the filtering process of individuals’ subjective interpretations. And this subjectivity creates a situation where each student is looking at the same concept through their own idiosyncratic lens. And when you use one abstract concept to elucidate another abstract concept, which is used to elucidate another abstract concept, the idiosyncratic understanding of the students compounds itself until it is easy to be in a situation where nobody is sure whether they are talking about the same thing anymore. The students’ ideas drift off into the land of academia that is commonly divorced from real world phenomena all without their awareness of being adrift. But Dr. Harris could cut through the fuzziness of students’ understanding. He always had a story that peeled away the layers of abstractness and painted a concrete mental image. An idea so clear and crisp that students felt they could reach out and touch it. He communicated the most academic sounding concepts in a way that was intuitive and relatable. He would tell a story that was on the surface not about social psychology to help students relate these concepts to social psychology.

And so the real lecture began.

“Who knows where the term ‘Shock-and-Awe’ comes from?” Dr. Harris asked the class. A handful of hands went into the air. Then Dr. Harris paced back-and-forth across the stage at the front of the classroom. He explained that the term comes from the initial air bombing campaign of Operation Iraqi Freedom. He gave a brief history of the prelude to Operation Iraqi Freedom, which ended with him animatedly pointing to the ceiling as if he were pointing to an actual airplane and throwing his arms into the air to mime the buildings exploding.

“OK, who knows what an Improvised Explosive Device is?” No hands went up. After a minute of silence, Dr. Harris joked that he “felt old” and that the students should “ask your parents”. The students laughed again to acknowledge the joke.

Dr. Harris again slowly paced back-and-forth across the stage in the front of the room. He clasped his hands behind his back and looked down at his feet as he talked in his slow and authoritative voice. “In 2003, a coalition of countries that was primarily led by the U.S. military invaded Iraq. After the Shock-and-Awe bombing, the U.S.-led coalition rolled into Iraq with overwhelmingly superior military might. They came into Iraq from Kuwait, which is just south of Iraq, and it was a race to Baghdad. They had the best tanks, the most impervious armored vehicles, the fastest helicopters, etc. This was the most technologically sophisticated military in the world at the time. The coalition military forces were the best-trained too.

The coalition military forces expected to overcome the Iraqi defenses, there was no doubt about that. However, the surprising part was that there was so little resistance during this traditional military campaign. There was some resistance, but much, much less than what was expected. For the most part, the Iraqi army folded quickly. I talked to one of the soldiers who was in this initial invasion–he was a tank driver who later came back to school to use up his G.I. Bill–and he had a story where they rolled their tank up to a bunker and found Iraqi military uniforms and cups of coffee that were still warm. Think about that. The Iraqi soldiers saw the coalition military forces coming, knew there was no chance in hell they were going to get out of this alive, and left like five minutes before the Americans got there. The Iraqi soldiers took off their uniforms, grabbed their AK-47s, and just receded into the civilian population. Some of these Iraqi soldiers went home to their families and some would go on to fight as insurgents.

So after the coalition military forces overtook Baghdad, they pretty much had free rein in Iraq. They set up a bunch of bases and started going about the business of rebuilding the country. These bases the coalition military forces set up were surrounded by the infantry and artillery. They built up 20-foot dirt berms around the perimeter. They had drones circling overhead. They were pretty much impenetrable.

Everything was going just as planned.

So you have the most technologically-advanced military fighting against a bunch of rag-tag insurgents who had AK-47s but no real military and no uniforms. It should have been no problem for the coalition military forces to move around Iraq and go about the business of rebuilding the country, right?” Dr. Harris paused for dramatic effect. Unseen by the students in the class, Dr. Harris rolled his thumb across the remote he was holding behind his back. The lights dimmed. Then a video of a convoy of U.S. Army vehicles driving down a road in the middle of a desert was projected onto the screen. Boom! An explosion on the side of the road threw up a cloud of dust that completely filled the screen, and the video stopped abruptly. The lights slowly came back on as smoothly as they were dimmed. Then, in a solemn voice, Dr. Harris said “Two Americans died in that explosion”.

This lecture was undoubtedly entertaining. The students were witnessing a master at his craft.

“What does this have to do with theory testing?” The classroom was quiet. The students who were hearing this lecture for the first time eagerly waited for Dr. Harris to answer his own question. This was the part that the other students had returned to hear. “Here’s how this is relevant to theory testing. The coalition military forces trained their soldiers with a particular theory in mind. They had a theory about how the war was going to proceed. So they trained their soldiers and armored their vehicles for that theoretical war. The coalition military had their combat troops who were the ‘front line’ troops and their combat support troops who were behind the front lines. The combat troops trained heavily for situations where they would exchange fire with the enemy and the combat support troops did not. The combat troops had armored vehicles and the combat support troops did not. So you build up these bases, surround them by combat troops, and put the combat support troops inside. This made sense for the theoretical war where there is a ‘front line’ and a ‘behind the front line’.

The Iraqi military knew they could not compete with the coalition military in this theoretical war, so their soldiers abandoned these traditional tactics. They became guerilla fighters; they became insurgents. Then, once they were insurgents, they knew they could not overtake the coalition military bases in Iraq. These insurgents, who were hell-bent on fighting at all costs mind you, had to use irregular, non-traditional, and unconventional ways of fighting.

So what did these insurgents do? They pretty much left the coalition military forces’ bases alone. They didn’t bother the tanks that guarded the front gates of these bases. They let the coalition military forces fly their planes in-and-out of the country without much ado. Instead, they watched and studied; they poked and prodded. What blind spots were the coalition military forces overlooking? What vulnerabilities were left unchecked?

The insurgents found that, although the coalition military forces’ bases were well-defended, the convoys, especially convoys of combat support troops, were not well-defended. These convoys would leave the bases as if they were merely driving from point A to point B. From one well-defended base to another well-defended base. But when the troops moved from one base to another, that is, when they were not hidden behind the perimeter of the well-defended bases, they were exposed and vulnerable. Thousands of troops would step out from behind the berms and the tanks, and they would drive their under-armored vehicles through the Iraqi countryside.

To make matters worse, the Iraqi countryside was filled with trash that these convoys would just drive right past. A truck driver’s attention was only so big, so the drivers’ focus was to be vigilant in the towns and merely keep it on the road in between towns. The insurgents found what they were looking for: Convoys of unarmored vehicles would mindlessly drive right past, or nearly right over, trash on the side of the road day in and day out.

So the insurgents started using these makeshift bombs called Improvised Explosive Devices. They buried these bombs in the shallow sand, in a piece of trash, or in roadkill on the side of the road. The roads with these bombs were indistinguishable from any other stretch of road in the whole country. The coalition military forces would drive their vehicles right up to these explosive devices and the unarmored underbelly of these vehicles would be exposed. Boom! It was extremely bothersome and dangerous for these convoys. It’s not like you can stop a whole convoy every time you come across a piece of trash. And if there was a bomb that went off, there may or may not be an insurgent nearby. All you could do was stop the convoy and try to minimize the damage.”

The entire class was nodding along and hanging onto every word Dr. Harris spoke. Cal was always a diligent note-taker, but this lecture was too entertaining; he didn’t even try to write down any notes, he just set his pen down and enjoyed the show.

“At this point you might wonder why these attacks were not anticipated. Why weren’t the coalition military forces trained for this sort of fighting? And why were so many vehicles left unarmored? The coalition military forces spent billions of dollars on their military training and equipment. They hired the best contractors who hired the best engineers. The coalition military forces educated their generals at the best universities in the world. And yet, these mistakes seem so obvious in hindsight. Why? How could so many smart people get it so wrong?” Dr. Harris paused for rhetorical effect. “The insurgents found these weaknesses because of their mindset,” Dr. Harris said as he pointed to the side of his head. “They would have never found these weaknesses if they weren’t thinking like insurgents. They would have never found these weaknesses if they weren’t desperate. If they didn’t have that primal mindset of a predator chasing the prey, a mindset that you cannot simulate”. The class nodded along in silence.

“The Iraqi insurgents were only able to identify these weak spots in these convoys because they were ruthless and desperate to find something, anything they could exploit. They desperately tried to find a chink in the armor of a seemingly solid defense. This was not an academic exercise for the insurgents of what they speculated was a possible weakness; they were not trying to win a defense contract by showing how tough the armor was, they were trying to win a war by showing that there was at least one flaw in the overall armor. Finding a weakness…one weakness…was the difference between winning and losing. If you chose to fight, one weakness, one flawed assumption, was the difference between life and death. And when these insurgents found a vulnerability, they mercilessly exploited it to full effect. They wreaked havoc.” A long pause fell over the class as this information settled in.

“So what did the coalition military forces do in response? They modified their theory of how the war was going to be fought. They addressed their once-hidden assumptions that the insurgents had so violently exploited. They re-armored all their vehicles and re-trained all their soldiers for the new realities of unconventional war. They tweaked their convoy protocols to address the threat of IEDs. Their updated and tweaked convoy protocols were provisionally considered “the new standard”. However, at some point the coalition military had to go beyond training or assuming their armor was fixed. At some point they had to send another convoy out the gates and get feedback from these pestering insurgents. Which of these convoys that were following the new protocols were safe? What was the new weakness in the armor that the insurgents had found? And they would incorporate that feedback into ways they could improve. This led to more re-armoring in the newly-found weak spots. It led to re-revised convoy protocols. Etc.

And this went on and on. The insurgents would desperately try to find new weaknesses and new ways to exploit these weaknesses. The coalition military would adjust. Then new weaknesses would be found and new adjustments would be made. It was iterative. The coalition military and the insurgents went back-and-forth, conjecture-and-refutation. In other words, these insurgents made the coalition military better, stronger, and safer because they kept pointing out the flaws in the coalition military’s thinking. It made the armor stronger and more smartly placed. The desperation and ruthlessness of the insurgents found weak spots in the coalition military that the smartest generals couldn’t anticipate. These insurgents made the coalition military pay severely for leaving these vulnerabilities exposed. And these insurgents wouldn’t let the coalition military get complacent. There was a constant stress and tentativeness that would never let the coalition military feel like their convoys were perfected. There was always some as-yet-undiscovered vulnerability. But remember, this evolution in the coalition military’s strategy was only possible because of the insurgents’ mindset. There is no substitute for the harshness and ruthlessness of war to force you to evolve. There is no substitute for the vigilance of soldiers fighting under different banners trying to kill one another to bring these hidden assumptions into awareness.

This is the mindset that you need to have when testing scientific theories. Ruthless. Tireless. Merciless. There is no compassion when it comes to trying to expose flaws in our theories. Find the weakness in an idea and blow it up with whatever tools you have available. If you don’t have a tool, make one.” Dr. Harris made a hand gesture like a bomb exploding as he said this last sentence.

“You are all going to learn to think like scientists. Testing theories like a scientist involves two things. First, you need to patiently study our theories for the weakness and then maximally exploit it where it will do the most damage. What is the weak spot? What is the hidden assumption? Where is the place that others have overlooked? And how can you maximally exploit that assumption? What tools, what weapons, do you have access to? If you don’t have a weapon, make one. What materials do you have to make a weapon?

Second, it’s more than trying to think of theories’ weaknesses. It’s more than trying to address these weaknesses. When we are talking about ideas, and not people’s lives, you want to maximally expose your theories to scrutiny. If you really want to test your vehicle’s armor, you don’t just look at your vehicles and say ‘it looks safe to me’ and you don’t want to drive it through where the enemy isn’t. If you really, really want to know whether your vehicle’s armor works, you want to drive it right through where your enemy is strongest. If you really want to know where your armor is weakest, you want to let the most ruthless assholes desperately trying to find a way, any way, to kill your drivers with whatever weapons they have. As a scientist, you want to say ‘here are my methods, here are my data…do your worst’.  If you do that, you will learn fast. Scientists who fail to do that are cowardly. They are depriving themselves and others of the full knowledge of whether their claims can hold up to ruthless scrutiny. It would be like a shipbuilder who builds a ship, but then is too cowardly to actually put it into the water.”

Dr. Harris scanned the room as if he looked into each student’s eyes. “This is key,” Dr. Harris slowed his voice to convey the seriousness of his upcoming admonition, “I am talking about ruthlessly criticising a theory, an idea. I am not telling anybody to criticise the individual who proposed the theory. All too often we hear stories of brash young researchers who want to make a name for themselves. Individuals who want to break away from the crowd and make a splash. They want to win Twitter for the day. They don’t do it by proposing a clever idea. They don’t do it by contributing something positive to the field. They try to make a name for themselves by tearing down a high-profile target. This is why we know the names of Lee Harvey Oswald and John Wilkes Booth. Because these individuals made their name by tearing down people. It is a short-term gain and a long-term loss. Do not do it. Don’t be those people.”

The message was clear: Be an insurgent for others’ ideas, not the humans behind the idea.


Which behavior is more aggressive?

First, some mental stretching

If you compared two aggressive behaviors, would you be able to tell which one was more aggressive?

Let’s test your intuition. Consider these two scenarios.

(a) Person B insults Person A. Person A punches Person B in the face.

(b) Person 2 insults Person 1. Person 1 punches Person 2 in the face.

Which behavior was more aggressive, Person A’s behavior or Person 1’s behavior? Most people would say these are equivalently aggressive behaviors based on the information that is provided. Let’s try a different scenario; same game, different behaviors.

(a) As part of an experiment that is ostensibly about how tactile experiences affect cognitive performance, Person A assigns Person B to hold their hand in ice water for 45 seconds.

(b) As part of an experiment that is ostensibly about how tactile experiences affect cognitive performance, Person 1 assigns Person 2 to hold their hand in ice water for 45 seconds.

Which behavior was more aggressive, Person A’s behavior or Person 1’s behavior? Again, your intuition would probably say these are equivalently aggressive behaviors based off the information that is provided. I mean, 45 seconds is the same as 45 seconds, right? Let’s try a final example.

(a) As part of an experiment that is ostensibly about how tactile experiences affect cognitive performance, Person A assigns Person B to hold their hand in ice water for 45 seconds.

(b) As part of an experiment that is ostensibly about how tactile experiences affect cognitive performance, Person 1 assigns Person 2 to hold their hand in ice water for 30 seconds.

Which behavior was more aggressive, Person A’s behavior or Person 1’s behavior? Your intuition would probably say that Person A’s behavior is more aggressive than Person 1’s. After all, 45 seconds is greater than 30 seconds.

OK, now that your intuition is warmed up, let’s poke and prod these ideas a little bit.

Aggression is the combination of several necessary things

What is aggression? There must be several factors present for aggression to occur. Aggression is commonly defined as a “behavior that is done with the intent to harm another individual who wants to avoid receiving the harm” (Baron & Richardson, 1994). Thus, for aggression to occur, there needs to be a behavior that, if successfully executed, would cause harm. This behavior also must have been done with intent (e.g., it is not an accident) and with the belief the recipient wanted to avoid the behavior.

Each of these features is necessary for aggression to occur; if one of the features is not present, then there is no aggression. For example, there must be a behavior (i.e., it is insufficient to merely desire to harm another individual). Further, a behavior can cause harm and not be aggressive if it is unintentional (e.g., accidentally dropping a hammer on somebody’s foot). And even an intentional behavior can cause harm and not be aggressive if it is believed the recipient does not want to avoid the behavior (e.g., two adults who engage in BDSM can intentionally cause tissue damage to one another as part of consensual sexual activities).

The amount of aggression is not defined based on the extremity of the consequences of the behavior

So what specifically does it mean when determining which of two aggressive behaviors was more aggressive? Does more aggression correspond to a more harmful behavior? Does it merely correspond to more intention to cause harm regardless of the actual harm? Both? As far as I can tell, this is an unanswered question (at least in the social psychology literature on aggression that I am familiar with).

Go back to the scenarios in the warm-up exercise. Suppose you intuited that having both individuals hold their hand in ice water for 45 seconds was equally aggressive and you intuited that having an individual hold their hand in ice water for 45 seconds is more aggressive than having an individual hold their hand in ice water for 30 seconds. If this indeed was your intuition, then it seems like your intuition is that more actual harm (defined here as how long one has to hold their hand in ice water) corresponds to more aggression.

However, it is not difficult to see how this “more actual harm = more aggression” correspondence can break down. Imagine a bar. You know, a place where everybody knows your name. Suppose Norm is really pissed at Sam. Norm throws an empty beer stein at Sam and strikes him in the head. Sam’s head hurts, but is otherwise OK. Now suppose that Woody is only slightly peeved at Cliff. Woody gives Cliff a firm and assertive push. Cliff stumbles backwards, accidentally falls, strikes his head against the bar, and ends up dying before the ambulance arrives. Woody, who only wanted to give a little push, is horrified that he killed Cliff.

Clearly dying is more harm than a headache, thus, Cliff has clearly been more severely harmed than Sam. However, is it really the case that an assertive push is more aggressive than throwing a heavy glass beer stein at an individual’s head? Probably not. Or at least that’s not what your intuition might say. It’s just that in this scenario the less intuitively aggressive behavior (i.e., the push) resulted in a more harmful consequence than the more intuitively aggressive behavior (i.e., throwing the beer stein).

The previous two paragraphs highlight an important concept to grasp. When comparing two aggressive behaviors, it is not the actual amount of harm, but it is the intended amount of harm, that determines which behavior was more aggressive. This would mean that Norm throwing the beer stein at Sam is more aggressive than Woody pushing Cliff even though Woody’s behavior resulted in more actual harm. Why? Because Norm intended to harm Sam more than Woody intended to harm Cliff.

So Woody’s behavior caused more harm than Norm’s even though Norm’s behavior was more aggressive than Woody’s. But Woody’s behavior only caused more harm because of all of the unintended stuff that happened after the behavior (i.e., Cliff stumbled and fatally hit his head on the bar). We cannot judge which of two behaviors is more aggressive based off the unforeseen and unintended things that happen after an intended behavior is executed. For example, what if Sam takes an aspirin for his headache, but the aspirin is actually a poison that slowly and painfully kills Sam over the course of several miserable days. Would Norm’s behavior now become equally as aggressive as Woody’s because Sam and Cliff both ended up dying? After all, Norm’s behavior caused Sam to take the aspirin/poison.

What does this mean for determining which behavior is more aggressive? 

Because we cannot evaluate the aggressiveness of a behavior based on the actual consequences, the best way to determine which of two behaviors is more aggressive is to compare the level of harm that was intended by the aggressor. There will be a series of events that unfolds once these intentions are manifested as actual behaviors, but only the consequences that are foreseen and intended by the aggressor can be used to evaluate the aggressiveness of the behaviors.

So Norm’s behavior is more aggressive than Woody’s because Norm intended to cause more harm than Woody. Full stop. For the purposes of determining who was more aggressive, it does not matter what unforeseen and unintended consequences follow. These are not relevant for evaluating the aggressiveness of the behavior.

As a mental exercise, you could evaluate Norm’s and Woody’s behaviors at the conclusion of their intended behaviors to determine who behaved more aggressively. For Norm, you could look at the amount of harm that was caused at the moment the intentionally-thrown beer stein hit Sam’s head. For Woody, you could look at the amount of harm that occurred at the point of the intentional push. If those are the behaviors these two intended, then that is what their aggressiveness ought to be evaluated on. Everything that happens after their intended behavior is executed (e.g., Cliff falls and hits his head; Sam ingests poison) does not matter for evaluating who was more aggressive.

Now let’s return to the final scenario of the warm-up exercise. Here it is so you don’t have to scroll back up.

(a) As part of an experiment that is ostensibly about how tactile experiences affect cognitive performance, Person A assigns Person B to hold their hand in ice water for 45 seconds.

(b) As part of an experiment that is ostensibly about how tactile experiences affect cognitive performance, Person 1 assigns Person 2 to hold their hand in ice water for 30 seconds.

I believe most people would intuit that Person A was behaving more aggressively than Person 1 because 45 seconds is longer than 30 seconds. This would mean that we believe that Person A intended to cause more harm than Person 1. But aggression researchers must be careful not to “affirm the consequent”. That would go something like this.

If Person A intends to cause more harm, then Person A will assign a longer time;

Person A assigns a longer time;

Therefore, Person A intended to cause more harm.

We only can assume that Person A intended to cause more harm than Person 1 if we also assume that their intentions map on directly to their behavior (e.g., there is a linear relationship between their intention and the amount of time selected in the cold water task). Further, when comparing the amount of aggression between individuals, it is necessary to assume that individuals’ intentions are similarly manifested into behaviors. However, it’s possible, for example, that Person A has a high tolerance for pain and assumes that holding your hand in ice water for 45 seconds is not that bad. It’s also possible that Person 1 has a low tolerance for pain and thinks that holding your hand in ice water for 30 seconds would be excruciating. In this case, Person 1’s 30 seconds would be intended to be more harmful than Person A’s 45 seconds.

What should aggression researchers do?

Option 1: Describe the results in a way that does not refer to participants’ intentions. For example, you could merely say “Person A assigned their recipient to hold their hand in ice water for a longer amount of time than Person 1”. Do not say “Person A was more aggressive than Person 1” because that is making a statement about their intentions.

Option 2: If you want to make claims about aggression, then you must explicitly state your assumptions. For example, you could say “For the ice water task, we are assuming that the amount of time assigned will directly correspond to participants’ intended harm to the recipient.” If you accept this assumption, then you could conclude that the behaviors that cause more actual harm are more aggressive. However, people are not required to accept your assumptions.

Option 3: You can measure aggression within-participants. It may not hold that two separate individuals’ responses directly correspond to their intentions. That is, what one person intends when they, for example, assign another person to hold their hand in ice water for 30 seconds may not be the same thing as another person intends when they also choose 30 seconds. In other words, their intentions can differ even if their behaviors are identical. However, imagine comparing two behaviors by the same individual. It seems way more plausible that the more harmful behavior for that individual is more aggressive than the less harmful behavior for that individual.

Is this effect smaller than the SESOI? Evaluating the hostile priming effect in the Srull & Wyer (1979) RRR

I was recently involved with a Registered Replication Report (RRR) of Srull & Wyer (1979). In this RRR, several independent labs collected data to test the “hostile priming effect”: An effect where exposing participants to stimuli related to the construct of hostility causes participants to subsequently judge ambiguous information as being more hostile. The results from each lab were combined into a random-effects meta-analysis. The result is a high-quality estimate of the hostile priming effect using a particular operationalization of the procedure.  

In both the original study and the RRR, participants completed a sentence descrambling task where they saw groups of 4 words (break, hand, his, nose) and had to identify 3 words that created a grammatically-correct phrase (break his nose). Participants in the RRR were randomly assigned to one of two conditions: A condition where 20% of the descrambled sentences described “hostile” behaviors or a condition where 80% of the descrambled sentences described “hostile” behaviors. All participants then (a) read a vignette about a man named Ronald who acted in an ambiguously-hostile manner, (b) reported their judgments of Ronald’s hostility, and (c) reported their judgments of the hostility of a series of unrelated behaviors.

Thus, the RRR had one between-participants condition (i.e., the 20% hostile sentences vs. the 80% hostile sentences) and two outcome variables (i.e., hostility ratings of Ronald and hostility ratings of the behaviors). We expected to observe more hostile ratings from those who were in the 80% hostile condition than those who were in the 20% hostile condition.

The full report of the RRR can be found here.

I want to discuss the result of the meta-analysis for ratings of Ronald’s hostility. On a 0-10 scale, we observed an overall difference of 0.08 points, a pretty small effect. However, because the RRR had so much statistical power, the 95% confidence interval for this effect was 0.004 to 0.16, which excludes zero and was in the predicted direction. This works out to a standardized mean difference of d = 0.06, 95% CI[0.01, 0.12]. What should we make of this effect? Does this result corroborate the “hostile priming effect”? Or is this effect too small to be meaningful?
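
The standardized interval can be reproduced from the meta-analytic estimate and its standard error. Here is a minimal sketch in base R. It assumes the values ES = 0.0621 and se = 0.0283 that appear in the equivalence-test code at the end of this post, and it assumes a simple normal-theory (Wald-type) interval, which may not be exactly how the RRR software built its interval.

```r
# Meta-analytic estimate and standard error
# (values taken from the equivalence-test code at the end of this post)
es <- 0.0621
se <- 0.0283

# Wald-type 95% confidence interval, assuming a normal sampling distribution
z_crit <- qnorm(0.975)
ci_95  <- es + c(-1, 1) * z_crit * se

# Rounded to two decimals for comparison with the reported interval
round(ci_95, 2)
```

Under those assumptions the rounded interval matches the reported 95% CI[0.01, 0.12].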

Here are my thoughts on this effect and my efforts to determine whether it is meaningful. However, just because I was an author on this manuscript should not bestow my opinions with any special privilege. I completely expect people to disagree with me.

First, the argument in favor of the detection of the hostile priming effect

Some people will point to a meta-analytic effect of d = 0.06, 95% CI[0.01, 0.12] and argue this ought to be interpreted as a successful demonstration of the hostile priming effect. The logic of this argument is simple: Because participants were randomly assigned to groups, a nil effect (i.e., an effect of 0.00) is a theoretically meaningful point of comparison. And because the 95% confidence interval does not include zero, one could claim the observed effect is “significantly” different from a nil effect. In other words, the observed effect is significantly greater than zero in the predicted direction.

To some, the magnitude of the effect does not matter. It only matters that an effect was detected and was in the predicted direction.

Arguments against the detection of the hostile priming effect

Without arguing about the magnitude of the effect, one can make at least two arguments against the idea that we detected the hostile priming effect. Essentially, these arguments are based on the idea that you can make different decisions about how you construct the confidence intervals, which would affect whether they include zero or not.

First, one could point out that there were two outcome variables and two meta-analyses. If you want to maintain an overall Type 1 error rate of 5%, one ought to adjust for the fact that we conducted two hypothesis tests. In this case, each adjusted confidence interval would be wider than the unadjusted 95% confidence interval. This would make the adjusted 95% confidence interval for the ratings of Ronald’s hostility contain zero, which, by the same logic as described in the previous section, would be interpreted as an effect that is “not significantly” different than zero.  

Second, you could argue that a 95% confidence interval is too lenient. Because of the resources that were invested in this study, perhaps we ought to adopt a more stringent criterion for detecting an effect such as a 99% confidence interval. Adopting, for example, a 99% confidence interval would make the interval wider and would then include zero.  
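
Both arguments can be checked with a few lines of base R. This is only a sketch: it assumes the ES = 0.0621 and se = 0.0283 values from the code at the end of this post, a normal-theory interval, and a Bonferroni adjustment for the two tests (the adjustment method is my assumption; other corrections exist). The helper names `wald_ci` and `includes_zero` are hypothetical.

```r
es <- 0.0621
se <- 0.0283

# Hypothetical helper: normal-theory CI at a given confidence level
wald_ci <- function(es, se, level) {
  z_crit <- qnorm(1 - (1 - level) / 2)
  es + c(-1, 1) * z_crit * se
}

# Argument 1: Bonferroni adjustment for two tests (familywise alpha = .05)
ci_bonferroni <- wald_ci(es, se, level = 1 - 0.05 / 2)

# Argument 2: a more stringent 99% interval
ci_99 <- wald_ci(es, se, level = 0.99)

# Hypothetical helper: does the interval contain zero?
includes_zero <- function(ci) ci[1] < 0 && ci[2] > 0

c(bonferroni = includes_zero(ci_bonferroni),
  ci99       = includes_zero(ci_99))
```

Under these assumptions both wider intervals contain zero, although the Bonferroni-adjusted lower bound falls only just below it.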

It is important to keep in mind that decisions on how to construct confidence intervals should be made a priori. In the RRR, we planned to construct 95% confidence intervals separately for each of the outcome variables. Sticking to our a priori data analysis plan, the 95% confidence interval for the ratings of Ronald’s hostility excludes zero. For this reason, I don’t believe these arguments are very persuasive.

Is the observed effect too small to be meaningful?

Let’s assume that we accept that a hostile priming effect was detected. So what? A separate way to evaluate the effect for Ronald’s hostility is to ask: Is the detected effect meaningful? To answer this question we need to establish what we mean by “meaningful”. In other words, we need to establish what is the Smallest Effect Size of Interest (SESOI).

Once a SESOI is established, one can create a range of effects that would be considered smaller than what is “of interest.” Then we can test whether our observed effect is smaller than what would be “of interest” by conducting two one-sided significance tests against the upper- and lower-bounds of the SESOI region using the TOSTER package (see Lakens, Scheel, & Isager, 2018). If the observed result is significantly smaller than the upper-bound of this range and is significantly larger than the lower-bound of this range, then one can conclude the effect is smaller than the SESOI. Equivalently, one can construct a 90% confidence interval and see whether the 90% confidence interval falls completely between the lower and upper bounds of the SESOI.
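
For a meta-analytic estimate, the two one-sided tests reduce to two z-tests. The sketch below writes them out in base R so the arithmetic is visible. `tost_z` is a hypothetical helper of mine, not a function from the TOSTER package. The ES and se values come from the code at the end of this post, and the +/- 0.30 bounds are only an illustrative choice here.

```r
# Two one-sided z-tests for an effect size with a known standard error.
# tost_z is a hypothetical helper, not part of the TOSTER package.
tost_z <- function(es, se, low, high) {
  z_low  <- (es - low) / se    # tests H0: effect <= lower bound
  z_high <- (es - high) / se   # tests H0: effect >= upper bound
  list(
    z_low  = z_low,
    p_low  = pnorm(z_low, lower.tail = FALSE),
    z_high = z_high,
    p_high = pnorm(z_high, lower.tail = TRUE)
  )
}

res <- tost_z(es = 0.0621, se = 0.0283, low = -0.30, high = 0.30)

# The effect is declared smaller than the SESOI only if BOTH tests reject
equivalent <- max(res$p_low, res$p_high) < 0.05

round(unlist(res), 3)
equivalent
```

Requiring both one-sided tests to reject at α = .05 is what makes the procedure equivalent to checking that the 90% confidence interval sits entirely inside the bounds.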

Here are 6 ways that I created the SESOI for the ratings of Ronald’s hostility. [Disclaimer: I constructed these SESOI after knowing the results. Ideally, these decisions should be made prior to knowing the results. This would be a good time to think about what SESOI you would specify before reading what comes next].

1) What is the SESOI based on theory? The first way to determine the SESOI is to look to theory for a guide. As far as I can tell, the theories that are used to predict priming effects merely make directional predictions (e.g., Participants in Group A will have a higher/lower value on the outcome variable than participants in Group B). I cannot see anything in these theories that would allow one to say, for example, effects smaller than a d of X would be inconsistent with the theory. Please let me know if anybody has a theoretically-based SESOI for priming effects.

2) What is the SESOI implied by the original study? A second way to determine the SESOI is to look at what effect was detectable in the original study. Srull and Wyer (1979) included 8 participants per cell in their study. Notably, the original study included several other factors and seemed to be primarily interested in the interactions among these factors, whereas the RRR was interested in the difference between two cells. Fair enough. Nevertheless, we could infer the SESOI based on what effect would have produced a significant result given the sample that was included in the original study.

To determine what effects would not have been significant in the original study, we can estimate what effect would correspond to 50% power. An effect smaller than this would not have been significant, an effect of exactly this magnitude would have produced p = .05, and an effect larger than this would have been significant in the original study. With n = 8 participants/cell, a one-tailed α = .05, and 1 – β = .50, the original authors would have needed an effect of d = +/- 0.86 to find a p-value < α. The effect from the RRR is significantly greater than d = -0.86 (z = 32.58, p < .001) and is significantly less than d = +0.86 (z = -28.19, p < .001).
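
One way to approximate this 50%-power effect size is with `power.t.test()` from base R's stats package. This is a sketch under my own assumptions: a simple two-cell, independent-groups t-test with SD fixed at 1 so that the result is in Cohen's d units. I am not certain which software was used for the number reported above.

```r
# Smallest standardized effect that reaches 50% power with 8 participants
# per cell in a one-tailed, two-sample t-test at alpha = .05.
# sd = 1 puts delta on the Cohen's d scale.
crit <- power.t.test(n = 8, sd = 1, sig.level = 0.05, power = 0.50,
                     type = "two.sample", alternative = "one.sided")

d_50 <- crit$delta
d_50
```

The result should land close to the 0.86 used above. Small discrepancies are possible because programs differ in how they handle the noncentral t distribution.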

3) What is the SESOI based on prior research? A third way to determine the SESOI is to look at the previous literature. In 2004, DeCoster and Claypool conducted a meta-analysis on priming effects with an impression formation outcome variable (an interesting side note: the effect size computed for Srull & Wyer [1979] was d > 5 and was deemed a statistical outlier in this meta-analysis). The meta-analysis concluded there is a hostile priming effect of about a third of a standard deviation, d = 0.35, 95% CI[0.30, 0.41] (more interesting side notes: This meta-analysis did not account for publication bias and also includes several studies that were authored by Stapel and were later retracted for fraud. Due to these two factors, it seems likely that this effect size is upwardly biased). Nevertheless, we can at least point to a number to create an SESOI and know where it came from. Perugini, Gallucci, and Costantini (2014) suggest using the lower limits of a previous meta-analysis to be conservative.

The lower limit of the 95% CI for the DeCoster and Claypool meta-analysis is d = 0.30. The effect from the RRR is significantly greater than d = -0.30 (z = 12.8, p < .001) and is significantly less than d = +0.30 (z = -8.41, p < .001).

4) What is the SESOI based on my subjective opinion? A fourth way to determine the SESOI is to merely ask yourself “what is the smallest effect size that I think would be meaningful?” To me, in the context of an impression formation task using a 0-10 scale, I would put my estimate to be somewhere around one-quarter of one point on the rating scale. In other words, I would consider a mean difference of 0.25 points to be the minimally-interesting difference. Of course, people can disagree with me on this.

The standard deviation for ratings of Ronald’s hostility was 1.44 units, which means that 0.25 units is an effect of d = 0.25/1.44 ≈ 0.17. The effect from the RRR is significantly greater than d = -0.17 (z = 8.20, p < .001) and is significantly less than d = +0.17 (z = -3.81, p < .001).

5) What is the SESOI that represents the amount of resources that others are likely to invest in their future studies? A fifth way to determine the SESOI is to ask “how large of an effect would be needed to be routinely detectable by future researchers?” The answer to this question comes from determining the resources that future researchers would be likely to invest in detecting this effect. For me, I think that researchers would be willing to invest 1,000 participants into a study to try to detect the hostile priming effect. Effects that require more than 1,000 participants would likely be deemed too expensive to routinely study. That is based on my gut and people are free to disagree.

If researchers were willing to collect n = 500 participants/cell, then they would be able to detect an effect as small as d = 0.16 with the minimum recommended level of statistical power (a one-tailed α = .05, because of the directional prediction, and 1 – β = .80). The effect from the RRR is significantly greater than d = -0.16 (z = 7.85, p < .001) and is significantly smaller than d = +0.16 (z = -3.46, p < .001).

6) What is the SESOI based on an arbitrarily small effect size? Finally, we can determine the SESOI by using an arbitrary convention like Cohen’s suggestion that a d = 0.20 represents a small effect. Or, more stringently, we could follow the suggestion of Maxwell, Lau, and Howard (2015) to consider a d = +/- 0.10 to be trivially small.

The effect from the RRR is significantly greater than d = -0.10 (z = 5.73, p < .001) and is NOT significantly less than d = +0.10 (z = -1.34, p = 0.09).

A Summary of the SESOI analyses

Let’s put it all together into one visualization. Look at the figure below. The blue diamond on the bottom represents the meta-analytic effect of d = 0.06 for the hostile priming effect. The vertical blue dashed lines represent the 90% confidence interval for the hostile priming effect. Notice that the 90% confidence interval just excludes zero.

The horizontal red lines represent the “ranges of equivalence” that I specified above. Each of the horizontal red lines is centered around zero. If a red line extends past both vertical dashed blue lines, then we would conclude that the observed effect is smaller than that SESOI.

equivalence testing figure

Consistent with the analyses in the previous section, we can see the horizontal red lines extend past the 90% confidence intervals except for the arbitrarily small effect size of d +/- 0.10. Thus, by most standards, we would consider the observed effect to be smaller than the SESOI.
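
The same comparison that the figure makes visually can be done numerically. The sketch below assumes a normal-theory 90% interval built from the ES and se values in the code at the end of this post. The labels in `bounds` are my own shorthand for the five SESOI values used above.

```r
es <- 0.0621
se <- 0.0283

# 90% CI: two one-sided tests at alpha = .05 correspond to a 90% interval
ci_90 <- es + c(-1, 1) * qnorm(0.95) * se

# Equivalence bounds (as +/- d) from the SESOI approaches discussed above
bounds <- c(original_50_power = 0.86,
            meta_lower_limit  = 0.30,
            subjective        = 0.17,
            resources         = 0.16,
            arbitrary_small   = 0.10)

# TRUE when the whole 90% CI lies inside (-bound, +bound)
inside <- sapply(bounds, function(b) ci_90[1] > -b && ci_90[2] < b)

round(ci_90, 3)
inside
```

Only the +/- 0.10 range fails to contain the whole interval, because the upper limit of the 90% interval sits slightly above 0.10.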

So What Do We Conclude?

For one of the two outcome variables in the RRR, we detected a hostile priming effect in the predicted direction. Further, this detected effect is not significantly smaller than an arbitrarily small effect of d = 0.10 (but then again, our study was not designed to have high power to reject such a small SESOI).

However, when we construct the SESOI in any other way, this detected effect is significantly smaller than the SESOI. It would take several thousand participants to routinely detect a hostile priming effect of this magnitude, which likely makes it too resource-intensive to be part of an ongoing program of research.
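
To put a rough number on “several thousand,” here is a sketch using `power.t.test()` from base R. It assumes a simple two-cell, independent-groups t-test, a one-tailed α = .05, 80% power, and SD fixed at 1 so that effects are in Cohen's d units. A real design would differ in its details.

```r
# Per-cell sample size needed to detect d = 0.06 with 80% power,
# one-tailed alpha = .05 (sd = 1 puts delta on the Cohen's d scale)
needed <- power.t.test(delta = 0.06, sd = 1, sig.level = 0.05, power = 0.80,
                       type = "two.sample", alternative = "one.sided")
n_per_cell <- ceiling(needed$n)

# For comparison: the smallest effect detectable with 500 per cell
# under the same alpha and power
detectable <- power.t.test(n = 500, sd = 1, sig.level = 0.05, power = 0.80,
                           type = "two.sample", alternative = "one.sided")
d_500 <- detectable$delta

n_per_cell
round(d_500, 2)
```

Under these assumptions the required sample is in the thousands per cell, several times the 500 per cell that I guessed researchers would be willing to invest.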

But the question that we really want answered is “what does this effect mean for theory?” Unfortunately (and frustratingly), the theories that predict such priming effects are too vague to determine whether an observed effect of d = 0.06 is corroborating or not, which means that intelligent people will still disagree on how to interpret this effect.


Code for equivalence tests and figure:

# here is the code for conducting the equivalence tests for the Srull & Wyer RRR
# code written by Randy McCarthy
# contact him at with any question

# implied by the original study

TOSTER::TOSTmeta(ES=0.0621, se=0.0283, low_eqbound_d=-0.86, high_eqbound_d=0.86, alpha=0.05)

# from the lower limit (LL) of the confidence interval in DeCoster and Claypool (2004)

TOSTER::TOSTmeta(ES=0.0621, se=0.0283, low_eqbound_d=-0.3, high_eqbound_d=0.3, alpha=0.05)

# from my subjective judgment

TOSTER::TOSTmeta(ES=0.0621, se=0.0283, low_eqbound_d=-0.17, high_eqbound_d=0.17, alpha=0.05)

# from amount of resources likely to be invested

TOSTER::TOSTmeta(ES=0.0621, se=0.0283, low_eqbound_d=-0.16, high_eqbound_d=0.16, alpha=0.05)

# from an arbitrarily small effect

TOSTER::TOSTmeta(ES=0.0621, se=0.0283, low_eqbound_d=-0.10, high_eqbound_d=0.10, alpha=0.05)


# plotting the equivalence tests

library(ggplot2)

equivRanges <- ggplot() +
  xlim(-1.5, 1.5) +
  # meta-analytic point estimate
  geom_point(aes(x = 0.06, y = 0.05),
             color = "blue4",
             size = 5,
             shape = "diamond") +
  scale_y_continuous(name = "", limits = c(0, 1), breaks = NULL) +
  # reference line at zero
  geom_vline(aes(xintercept = 0),
             color = "black",
             size = 1) +
  # limits of the 90% confidence interval
  geom_vline(aes(xintercept = c(0.01, 0.11)),
             color = "blue4",
             size = 1,
             linetype = "dashed") +
  # one horizontal segment per range of equivalence
  geom_segment(aes(y    = c(0.9, 0.7, 0.5, 0.3, 0.1),
                   yend = c(0.9, 0.7, 0.5, 0.3, 0.1),
                   x    = c(-0.86, -0.30, -0.17, -0.16, -0.10),
                   xend = c(0.86, 0.30, 0.17, 0.16, 0.10)),
               color = "red",
               size = 1.5) +
  # label for each range of equivalence
  geom_label(aes(y = c(0.95, 0.75, 0.55, 0.35, 0.15),
                 x = 0,
                 label = c("50% Power of Original Study",
                           "LL of CI From Previous Meta-Analysis",
                           "Randy's Subjective Opinion",
                           "Economic Argument",
                           "Arbitrarily Small Effect")),
             size = 4,
             nudge_x = -0.5) +
  ggtitle("Equivalence Testing") +
  xlab("Standardized Mean Difference of 'Hostile Priming Effect'")

"Open Science" is risky

Open science practices are “risky”. Not in the sense that they are potentially dangerous, but in the sense that they make it easier for you to be wrong. You know, theoretically “risky”.

Theoretical progress is made by examining the logical implication of a theory, deducing a prediction from the theory, making observations, and then comparing the actual observations to the predicted observations. One way to infer theoretical progress is the extent to which our predicted observations get closer and closer to our actual observations.

Actual observations that are consistent with the predicted observations are considered corroborating. In this case, we tentatively and temporarily maintain the theory. Actual observations that are inconsistent with the predicted observations are considered falsifying of the theory. In this case, we should modify or abandon the theory. The new theory can then be submitted to the same process. Much like long division will iteratively home in on the quotient, many iterations of this conjecture-and-refutation process will slowly increase the ability of the theory to make accurate predictions.

One key aspect of the predictive impressiveness of a theory is the class of observations that are considered “falsifiers”. That is, the predictive impressiveness of a theory comes from how many observations are forbidden by the theory. Predictions that have lots of potentially falsifying observations are considered “risky”.

As an intuitive example, suppose I have a theory to predict where a ball will land on a roulette wheel. I could predict the color of the pocket where the ball would land (Rouge ou Noir) or that the ball would land on an even/odd number (Pair ou Impair). In this bet, a successful prediction forbids about 50% of the possible outcomes. Other bets are riskier though. The riskiest bet on a single spin is to predict the ball will land on a single number. In this bet, a successful prediction forbids 37/38 possible outcomes. The payoffs from these different bets reflect the riskiness of the predictions. The riskier bet (e.g., predicting the ball will land on black 15) pays off more than the less risky bet (e.g., predicting the ball will merely land on a black pocket). Correspondingly, a theory that correctly predicts the riskier bets is considered to be more predictively impressive than a theory that correctly predicts the less risky bets.
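The arithmetic behind the roulette example is small enough to write out. This sketch assumes an American wheel with 38 pockets (18 red, 18 black, 2 green), which matches the 37/38 figure above, and uses the standard casino payouts of 1 to 1 for a color bet and 35 to 1 for a single number. The column names are just labels for this illustration.

```r
# American roulette wheel: 38 pockets (18 red, 18 black, 2 green)
pockets <- 38

bets <- data.frame(
  bet     = c("color (red/black)", "single number"),
  winning = c(18, 1),   # pockets that would corroborate the prediction
  payout  = c(1, 35)    # standard payout odds (x to 1)
)

# Outcomes "forbidden" by each prediction, as a count and as a proportion
bets$forbidden      <- pockets - bets$winning
bets$prop_forbidden <- bets$forbidden / pockets

print(bets)
```

The bet that forbids more outcomes is also the bet that pays more, which is the sense in which riskier predictions are more impressive when they succeed.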

Our scientific theories are much the same. A theory that makes a vague prediction (e.g., Group A will have slower reaction times than Group B) will have less predictive impressiveness than a theory that makes more specific predictions (e.g., Group A will respond 350-500 ms slower to stimuli than Group B). So we can increase the predictive impressiveness of our scientific theories by having the predicted observations be more precise.

However, unlike predicting the outcomes of a roulette wheel, the predictive impressiveness of scientific theories is not exclusively evaluated on the precision of the predicted observations. Predictive impressiveness also comes from characteristics of the process. That is, our scientific theories not only predict outcomes, but also attempt to explain why those outcomes occur. Even if the predictive outcomes are the same, an observation can become riskier if we constrain possible reasons for why the outcome occurred.

Suppose a researcher has a theory that drinking coffee increases alertness. A study may randomly assign participants to be in the coffee group or the no coffee group. And the outcome variable may be how quickly participants respond to stimuli on a screen as a proxy for alertness. Even if the predicted outcome is the same (i.e., the coffee group will respond faster than the no coffee group), the prediction can be riskier by ruling out other possible reasons for an observation. That is, all else being equal, a study that demonstrates that the coffee group responds faster than the no coffee group will be more impressive if there are certain characteristics of the methods such as double-blinding participants and experimenters, giving the no coffee group decaf as a placebo, etc. The reason these methodological characteristics increase the predictive impressiveness of the theory is that they rule out other plausible explanations for the observations. For example, if the predicted result only occurred for studies that were not double-blind, then the observations are likely due to demand effects and not due to coffee, which would be damning for your original theory. In short, we add these methodological characteristics in order to constrain alternative plausible explanations for our observations, which increases the class of observations that would be considered falsifying.

The same logic applies to “open science” practices such as pre-registration and the open sharing of data and stimuli. These methodological characteristics cannot turn a bad study into a good study, but these features make it easier for others to find errors in your data, errors in your choice of statistical analyses, weaknesses in your chosen stimuli, etc. In other words, these practices make it easier for you to be wrong because you have provided would-be critics with all of the information they need to root out errors in your claims. Open science practices say to the world “Prove me wrong. And to help you, I am going to try to make it as easy as possible for you to find a mistake that I made.”

All else being equal, studies whose claims are both consistent with a theory and whose methods have been maximally exposed to daylight are stronger than studies whose claims are merely consistent with a theory.


Multi-Site Collaborations Provide Robust Tests of Theories

According to Popper (1959) “We can say of a theory, provided it is falsifiable, that it rules out, or prohibits, not merely one occurrence, but always at least one event” (p. 70). I argue that, all else being equal, multi-site collaborations more robustly test theories than studies done at a single site at a single time by a single researcher because the data from a multi-site collaboration more robustly represent the theoretically falsifying event.

Let’s break down the key concepts of this argument.

What is a multi-site collaboration?

A multi-site collaboration is a study that involves a team of researchers at several locations who each test the same hypothesis. Often these collaborations use the same data collection procedures and the same stimuli. Their individual results are then pooled together, often in a meta-analysis, regardless of the results from any of the individual labs.*

Thus, the features necessary to test the hypotheses are the same across all labs. But there are inevitably some lab-to-lab differences in the specifics of the samples, the physical setting of the lab, the precise time the data are collected, etc.
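The pooling step described above can be sketched with a simple fixed-effect, inverse-variance meta-analysis. The lab-level effect sizes and standard errors below are invented for illustration; real collaborations often use random-effects models and dedicated packages instead.

```r
# Hypothetical standardized effects (d) and standard errors from four labs;
# these numbers are made up for illustration only
lab_d  <- c(0.10, -0.02, 0.08, 0.05)
lab_se <- c(0.10, 0.12, 0.09, 0.11)

# Inverse-variance weights: more precise labs count for more
w <- 1 / lab_se^2

# Every lab is included, regardless of how its own result came out
pooled_d  <- sum(w * lab_d) / sum(w)
pooled_se <- sqrt(1 / sum(w))

print(round(c(pooled_d = pooled_d, pooled_se = pooled_se), 3))
```

Because every lab contributes no matter what it found, the pooled standard error is smaller than that of any single lab in the set.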

Good exemplars of multi-site collaborations are the ManyLabs projects (see here or here) or Registered Replication Reports (see here or here).

Occurrences vs. Events

The next key concept is the distinction between occurrences and events. In the first sentence I said that a scientific theory must forbid at least one event. Popper considered a specific instance of a researcher deducing a hypothesis, operationalizing the theoretically-necessary features, and making an observation to be an occurrence. Each occurrence includes the features of a study that are deduced from the theory. And each occurrence takes place in the presence of a unique and idiosyncratic combination of other factors such as the specific time and specific location of a study. An event, on the other hand, represents the class of all possible occurrences that are equally deducible from the theory (an event = {occurrence_1, occurrence_2, occurrence_3, …, occurrence_k}).

Thus, occurrences are confounded with the idiosyncratic combination of other factors at a specific time and specific location, whereas events transcend those factors. Events represent only what can be logically deduced from a theory; occurrences also contain the infinite other factors that are inevitably present when an event is instantiated. Thus, the more robustly we can create events, the more robustly we can test our theories.

An example

Suppose I have a theory that “listening to a song with violent lyrics increases the accessibility of aggressive cognitions”. This is a legitimate scientific theory because it allows you to deduce which events are consistent with the theory and which events are inconsistent with the theory. Namely, those who listen to songs with violent lyrics should have an increase in aggressive thoughts and should not have a similar level or a decrease in aggressive thoughts.

Suppose Researcher A conducts a study. This study will include the necessary features to test a hypothesis that was deduced from a theory. For example, Researcher A may hypothesize that listening to Johnny Cash’s Folsom Prison Blues (a song with violent lyrics) would cause participants to complete more word stems (e.g., KI _ _) with aggressive words (e.g., KILL) than non-aggressive words (e.g., KISS; a measure of the accessibility of aggressive cognitions). The results from this study would be an occurrence. Thus, in addition to the deduced theoretically-necessary features to test a hypothesis, this single occurrence is confounded with an idiosyncratic combination of theoretically-irrelevant factors. For example, the observations in this single study occur in the presence of participants’ interaction with the experimenter, what the 3rd participant ate for breakfast yesterday, the ambient temperature of the room, the position of the stars when the last participant completed the study, etc., etc., etc.

Now suppose Researcher B also conducts a study. This researcher also deduces the features that would be theoretically necessary to test the hypothesis. Suppose this researcher follows Researcher A’s approach and uses Johnny Cash’s Folsom Prison Blues as the song with violent lyrics and also uses the word-fragment completion task as the measure of aggressive thoughts. The results from this study also would be an occurrence. Thus, this study includes the features of the study that were deduced from a theory and occurs in the presence of an idiosyncratic combination of theoretically-irrelevant variables. Further, the idiosyncratic combination of theoretically-irrelevant variables is different for Researcher A and Researcher B. That is, the observations made by Researcher B will likely occur in the presence of different interactions with the experimenter, a different breakfast by the 3rd participant, a different ambient temperature of the room, a different position of the stars when the last participant completed the study, etc., etc., etc.

Because the combination of theoretically-irrelevant factors differs for each occurrence, the occurrence made by Researcher A will not be equivalent to the occurrence made by Researcher B in all possible ways. This non-equivalence is what people often refer to when they say “there is no such thing as an exact replication”: Two studies always differ in some aspects (such people often point to the inarguable presence of differences between occurrences and imply those occurrences do not belong to the same event class). However, and critically, each of the occurrences in this example is equally deducible from the theory. So each of these occurrences belongs to the same event class, which means they are equally useful for potentially falsifying the theory.

In fact, because a single occurrence is confounded by the combination of theoretically-relevant and theoretically-irrelevant factors that are present when a single observation is made, any individual occurrence is ambiguous: Was the observation due to the theoretically-necessary variables? Or was the observation due to a freaky alignment of other factors that will never be recreated?

With a single study at a single site, we can assume that an occurrence was due to the theoretically-necessary variables and we can assume that it was not due to a freaky alignment of other factors. It is up to individuals as to whether or not they want to accept those assumptions. To empirically test whether an event is consistent or inconsistent with a theory, we need observations from several occurrences. That is, we need several observations that maintain the deduced theoretically-necessary features, but differ in the theoretically-irrelevant features that confound each individual observation, in order to disentangle the former from the latter.

Putting it all together

Let’s go back to our example. The observation made by Researcher A is an occurrence. The observation made by Researcher B is an occurrence. Because these occurrences were equally deducible from the theory, these occurrences belong to the same event. It is necessary to observe several occurrences to disentangle the effects due to the theoretically-deduced factors from the theoretically-irrelevant factors.

Multi-site collaborations involve several researchers who each make observations across a range of occurrences. That is, multi-site collaborations involve observations being made across a range of idiosyncratic combinations of theoretically-irrelevant factors. Collectively, these individual occurrences better approximate the class of events that are used to test theories than any individual occurrence. Thus, all else being equal, multi-site collaborations provide more robust tests of our theories than a single study done at a single location at a single time.
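A toy simulation can make this concrete. It assumes each occurrence equals a true effect plus a site-specific deviation produced by theoretically-irrelevant factors, and it ignores within-site sampling error for simplicity. All of the numbers and names are hypothetical and chosen only to illustrate the logic.

```r
set.seed(1)

true_effect <- 0.20   # hypothetical effect of the theoretically-necessary features
site_sd     <- 0.15   # hypothetical spread caused by theoretically-irrelevant factors
n_sites     <- 10     # labs in the hypothetical collaboration
n_sims      <- 2000   # number of simulated research projects

# One occurrence = true effect + that site's idiosyncratic deviation
simulate_occurrences <- function(k) {
  true_effect + rnorm(k, mean = 0, sd = site_sd)
}

# A single-site study observes one occurrence; a multi-site collaboration
# averages over several occurrences from the same event class
single_site <- replicate(n_sims, simulate_occurrences(1))
multi_site  <- replicate(n_sims, mean(simulate_occurrences(n_sites)))

# Spread of each kind of estimate around the true effect
print(c(sd_single = sd(single_site), sd_multi = sd(multi_site)))
```

In this sketch the multi-site average varies less from project to project than a single occurrence does, because the idiosyncratic deviations tend to cancel when they are averaged.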

I argue that Researcher A and Researcher B should agree on what study is logically deduced from their theory, each collect data following the same agreed-upon protocol (i.e., each make an occurrence within the same event class), combine their data into a common analysis regardless of how their individual data come out, and plan to grab a beer together at the next conference.**

For this, and for many other reasons, I hope that multi-site collaborations become commonplace in psychological science.

Where to begin? 

Do you want to get involved in some multi-site collaborations? Here are some places to begin.

StudySwap: an online platform where you can find other collaborators.
The Psychological Science Accelerator: a network of labs who have committed to devoting some of their research resources to multi-site collaborations.
Registered Replication Reports: multi-site collaborations of replications of previously-published research.

*I believe the lack of inclusion bias in the meta-analyses from multi-site collaborations is probably the greatest methodological strength of these studies. However, this post is focusing on a different benefit of multi-site collaborations.

**This last part is a crucial feature of a successful multi-site collaboration.