Fairness and Bias

A closer look at BookCorpus, the text dataset that helps train large language models for Google, OpenAI, Amazon, and others

BookCorpus has helped train at least thirty influential language models (including Google’s BERT, OpenAI’s GPT, and Amazon’s Bort), according to HuggingFace.

But what exactly is inside BookCorpus?

This is the research question that Nicholas Vincent and I ask in a new working paper that attempts to address some of the…

New research on Twitter’s timeline curation algorithm sheds light on how it shapes what we’re exposed to.

How does Twitter’s algorithm change what users see in their timelines? In a new research study from the Computational Journalism Lab, we present evidence of several shifts that result from Twitter’s timeline algorithm. Specifically, compared to the old-fashioned chronological timeline, Twitter’s algorithm:

  • ↘️ Showed fewer external links,
  • ✨ Elevated lots…

Following The Markup’s example, I split this blog into the main findings and this “show your work” piece.

This piece summarizes the technical details from my forthcoming paper auditing Twitter’s timeline curation algorithm. …

Might it be time to create an “FDA for algorithms”?

In the United States, there is currently no federal institution that protects the public from harmful algorithms.

We can buy eggs, get a vaccine, and drive on highways knowing there are systems in place to protect our safety: the USDA checks our eggs for salmonella, the FDA checks vaccines for…

Literally just an easy recipe for basic cinnamon granola, with pretty pictures.

Why does it always take so long to scroll to the actual recipe? I do not have a cute story other than I have been experimenting with granola recipes for two years, from the New York Times to random blogs to classic books like “Joy of Cooking” and “How to…

Posting to Facebook feels like trying to entertain a UFC stadium, while posting to Medium feels like an open mic.

As the pandemic swept across the world and we all started spending more time on Facebook and other apps, I decided to stop lurking all the time and start participating more. …

There are currently thousands of propaganda websites masquerading as local news websites across the United States, as the New York Times reported in October 2020 and the Columbia Journalism Review reported in August 2020.

The network of websites spells disaster for the news ecosystem on a number of levels, especially…

The event illustrates how TikTok’s algorithms can make mass political communication more accessible, but it is still no democratic utopia.

Over the summer, I crunched the numbers on about 80,000 TikTok videos pertaining to the prank on Trump’s re-election rally in Tulsa. My main interest was understanding how TikTok’s algorithms may have played a role in promoting the prank. …

Applying an important lesson from Dr. Ruha Benjamin’s book, “Race After Technology” — there may be a difficult truth beneath the glitch.

If you’ve seen The Matrix, you likely remember the déjà vu scene, in which Neo notices a black cat walk by twice:

Even watching the animated GIF can induce some disturbing chills. And that sense of disturbance…

Breaking down a “data visceralization” with principles from Data Feminism, a book by Catherine D’Ignazio and Lauren Klein.

As articulated by authors Catherine D’Ignazio and Lauren Klein, Data Feminism is “a way of thinking about data, both their uses and their limits, that is informed by direct experience, by a commitment to action, and by intersectional feminist thought.” It has seven core principles:

  1. Examine power
  2. Challenge power
  3. Elevate…

Jack Bandy

PhD student studying AI, ethics, and media. Trying to share things I learn in plain English. 🐦 @jackbandy
