But what exactly is inside BookCorpus?
This is the research question that Nicholas Vincent and I ask in a new working paper, which attempts to address some of the “documentation debt” in machine learning research. Dr. Emily M. Bender, Dr. Timnit Gebru, et al. discuss this concept in their “Stochastic Parrots” paper.
While many researchers have used BookCorpus since it was first introduced, documentation remains sparse. The original paper that introduced the dataset described it as…
How does Twitter’s algorithm change what users see in their timelines? In a new research study from the Computational Journalism Lab, we present evidence of several shifts that result from Twitter’s timeline algorithm. Specifically, compared to the old-fashioned chronological timeline, Twitter’s algorithm:
This piece summarizes the technical details from my forthcoming paper auditing Twitter’s timeline curation algorithm. The main findings are in this blog post and the full details are in the research paper; here, I summarize the following:
Sock-puppet auditing involves emulating…
In the United States, there is currently no federal institution that protects the public from harmful algorithms.
We can buy eggs, get a vaccine, and drive on highways knowing there are systems in place to protect our safety: the USDA checks our eggs for salmonella, the FDA checks vaccines for safety and effectiveness, and the Federal Highway Administration sets design standards that keep highway curves safe at high speeds.
Why does it always take so long to scroll to the actual recipe? I do not have a cute story, except that I have spent two years experimenting with granola recipes from sources ranging from the New York Times and random blogs to classic books like “Joy of Cooking” and “How to Cook Everything.” I landed on a synthesized recipe that provides a simple and delicious “base” granola. Here it is:
As the pandemic swept across the world and we all started spending more time on Facebook and other apps, I decided to stop lurking all the time and start participating more. The widespread resonance of the term “doomscrolling” made me wonder: why do we spend so much time scrolling through these feeds if they make us miserable?
There are currently thousands of propaganda websites masquerading as local news websites across the United States, as the New York Times reported in October 2020 and the Columbia Journalism Review reported in August 2020.
The network of websites spells disaster for the news ecosystem on a number of levels, especially if the sites receive a lot of attention. As Renée DiResta articulated in this WIRED piece, there is an important distinction between “free speech” and “free reach.” Free speech entails Brian Timpone’s ability to write and publish “propaganda ordered up by dozens of think tanks, political operatives, corporate executives and…
Over the summer, I crunched the numbers on about 80,000 TikTok videos pertaining to the prank on Trump’s re-election rally in Tulsa. My main interest was understanding how TikTok’s algorithms may have played a role in promoting the prank. This post summarizes findings from my workshop research paper, which was presented at the RecSys 2020 workshop on responsible recommendation.
If you’ve seen The Matrix, you likely remember the déjà vu scene, in which Neo notices a black cat walk by twice:
Even watching the animated GIF can induce some disturbing chills. And that sense of disturbance is no coincidence: as Trinity quickly explains to Neo, this minor “glitch” involving the black cat is actually an important sign. It indicates that the agents of the Matrix have changed something in the program, rearranging the reality that Neo, Trinity, Morpheus, and others must face.
As articulated by authors Catherine D’Ignazio and Lauren Klein, Data Feminism is “a way of thinking about data, both their uses and their limits, that is informed by direct experience, by a commitment to action, and by intersectional feminist thought.” It has seven core principles:
In this post, I will illustrate some principles from Data Feminism by breaking down this unemployment chart recently published by ProPublica.
To apply the first two core principles from Data Feminism (examine power and challenge power)…
PhD student studying AI, ethics, and media. Trying to share things I learn in plain English. 🐦 @jackbandy