The Ambeo Mini delivers highly detailed sound and supports AirPlay 2, Tidal, and Spotify Connect. It's also compatible with Dolby Atmos and Sony's 360 Reality Audio formats – for a hefty $800.

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.
Chinese ChatGPT alternatives just got approved for the general public
The news: Baidu, one of China’s leading artificial-intelligence companies, has announced it’s opening up access to its ChatGPT-like large language model, Ernie Bot, to the general public.
The context: Launched in mid-March, Ernie Bot was the first Chinese ChatGPT rival. Since then, many Chinese tech companies, including Alibaba and ByteDance, have followed suit and released their own models. Yet all of them force users to sit on waitlists or go through approval systems, making the products mostly inaccessible for ordinary users.
What’s next: On August 30, Baidu posted on social media that it will also release a batch of new AI applications within the Ernie Bot as the company rolls out open registration today. But even with the new access, it’s unclear how many people will use the products. Read the full story.
—Zeyi Yang
Here’s what we know about hurricanes and climate change
It’s now possible to link climate change to all kinds of extreme weather, from droughts to flooding to wildfires.
Hurricanes are no exception—scientists have found that warming temperatures are causing stronger and less predictable storms. That’s a concern, because hurricanes are already among the most deadly and destructive extreme weather events around the world. In the US alone, three hurricanes each caused over $1 billion in damages in 2022. In a warming world, we can expect the totals to rise.
But the relationship between climate change and hurricanes is more complicated than most people realize. Here’s what we know, and—as Hurricane Idalia batters the Florida coast—what to expect from the storms to come. Read the full story.
—Casey Crownhart
Casey’s story is part of MIT Technology Review Explains, designed to help you make sense of what’s coming next. Check out the rest of the stories in the series.
If you’d like to read more about how climate change can supercharge hurricanes, take a look at the most recent edition of The Spark, Casey’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.
The must-reads
I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 X wants to collect your biometric data
Elon Musk’s ongoing crusade to rid the platform of bot accounts has taken a sinister turn. (Bloomberg $)
+ Audio and video calls are also in the company’s pipeline. (Mashable)
2 The United Arab Emirates is getting into generative AI
It hopes to bring bilingual LLMs to more than 400 million Arabic speakers worldwide. (FT $)
+ German startup Aleph Alpha wants to be the European OpenAI. (Wired $)
+ How AWS spectacularly fumbled its AI lead. (The Information $)
+ The inside story of how ChatGPT was built from the people who made it. (MIT Technology Review)
3 Meta has declined to suspend the account of Cambodia’s leader
Despite the request coming from its own board. (WP $)
+ The company has internally admitted stifling legitimate political speech. (The Intercept)
4 A grocery delivery app encouraged its workers to brave Hurricane Idalia
‘Bad Weather = Good Tips,’ it told them. (Motherboard)
+ Georgia has declared a state of emergency. (The Guardian)
+ Conspiracy theorists are attempting to downplay natural disasters online. (NYT $)
5 YouTube’s radicalization crackdown appears to have worked
Extremist videos are harder to find, but learning from the past remains critical. (The Atlantic $)
+ YouTube’s algorithm seems to be funneling people to alt-right videos. (MIT Technology Review)
6 Cheap Chromebooks aren’t the good deal they used to be
And schools end up stuck with piles of increasingly useless machines. (WSJ $)
7 It’s scarily easy to track someone on the NYC subway
Your journey history is available to anyone with your financial details. (404 Media)
8 Burning Man is seriously bad for the planet
Just traveling to the festival comes at a high environmental cost. (Vox)
9 Smashing up asteroids creates new space debris 
Which we need to keep an eye on to make sure it’s not more dangerous than the original threat. (Wired $)
+ Watch the moment NASA’s DART spacecraft crashed into an asteroid. (MIT Technology Review)
10 We’re learning more about how to treat chronic pain
For some patients, electrical nerve stimulation is offering relief when nothing else works. (Economist $)
+ Brain waves can tell us how much pain someone is in. (MIT Technology Review)
Quote of the day
“It just gives me a negative vibe.”
—Belinda Davey, a 36-year-old retail worker in Australia, tells the Wall Street Journal why she created a shortcut that replaces X’s new logo with the original Twitter bird.
The big story
We used to get excited about technology. What happened?

October 2022
Shannon Vallor is a philosopher who studies AI and data, so her Twitter feed is always filled with the latest tech news. Increasingly, she’s realized that the constant stream of information is no longer inspiring joy, but a sense of resignation.
Joy is missing from our lives, and from our technology. Its absence is feeding a growing unease being voiced by many who work in tech or study it. Fixing it depends on understanding how and why the priorities in our tech ecosystem have changed. Read the full story.
We can still have nice things
A place for comfort, fun and distraction in these weird times. (Got any ideas? Drop me a line or tweet ’em at me.)
+ I’m loving these photos of the blue supermoon from across the world (did you see it?)
+ Wait, is that Bob Dylan?
+ Dreaming isn’t just for humans—spiders may do it too.
+ It’s officially time to bring back 1930s slang (it’ll blow your wig)
+ Who doesn’t love pistachios?

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.
When I was growing up near the US Gulf Coast, it was more common for my school to get called off for a hurricane than for a snowstorm.
So even though I live in the Northeast now, by the time late August rolls around I’m constantly on hurricane watch. And while the season has been relatively quiet so far, a storm named Idalia changed that, hitting the coast of Florida this morning as a Category 3 hurricane. (Also, let’s not forget Hurricane Hilary, which in a rare turn of events hit California last week.)
Tracking these storms as they’ve approached the US, I decided to dig into the link between climate change and hurricanes. It’s fuzzier than you might think, as I wrote about in a new story today. But as I was reporting, I also learned that there are a ton of other factors affecting how much damage hurricanes do. So let’s dive into the good, the bad, and the complicated of hurricanes.
The good news is that we’ve gotten a lot better at forecasting hurricanes and warning people about them, says Kerry Emanuel, a hurricane expert and professor emeritus at MIT. I wrote about this a couple of years ago in a story about new supercomputers being adopted by the National Weather Service in the US.
In the US, average errors in predicting hurricane paths dropped from about 100 miles in 2005 to 65 miles in 2020. Predicting the intensity of storms can be tougher, but two new supercomputers, which the agency received in 2021, could help those forecasts continue to improve too. The computers were recently used to upgrade the agency’s forecasting model for this hurricane season.
Supercomputers aren’t the only tool forecasters are using to improve their models, though—some researchers are hoping that AI could speed up weather forecasting, as my colleague Melissa Heikkilä wrote earlier this summer.
Forecasting needs to be paired with effective communication to get people out of harm’s way by the time a storm hits—and many countries are improving their disaster communication methods. Bangladesh is one of the world’s most disaster-prone countries, but the death toll from extreme weather has dropped quickly thanks to the nation’s early warning systems.
The bad news is that there are more people and more stuff in the storms’ way than there used to be, because people are flocking to the coast, says Phil Klotzbach, a hurricane researcher and forecaster at Colorado State University.
The population along Florida’s coastline has doubled in the past 60 years, outpacing national growth by a significant margin. The same pattern holds across the country: population growth in coastal counties in the US is happening at a quicker clip than in other parts of the country.
Several insurance companies have already stopped doing business in Florida because of increasing risks, and this year’s hurricane season could affect how readily residents are able to get insurance.
And the expected damage from disasters affects different groups in different ways. Across the US, white people and those with more wealth are more likely to get federal aid after disasters than others, according to an NPR investigation.
Climate change is loading the dice on most extreme weather phenomena. But what specific links can we make to hurricanes?
A few effects are pretty well documented both in historical data and in climate models.
One of the clearest impacts of climate change is rising temperatures. Warmer water can transfer more energy into hurricanes, so as global ocean temperatures hit new heights, hurricanes are more likely to become major storms.
Warmer air can hold more moisture (think about how humid the air can feel on a hot day, compared with a cool one). Warmer, wetter air means more rainfall during hurricanes, and flooding is one of the deadliest aspects of the storms.
And rising sea levels are making storm surges more severe and coastal flooding more common and dangerous.
But there are other effects that aren’t as clear, and questions that are totally open. Most striking to me is that researchers are in total disagreement about how climate change will affect the number of storms that form each year.
For more on what we know (and what we don’t know) about climate change and hurricanes, check out my story from this morning. Stay safe out there!
Forecasting is a difficult task, but supercomputers and AI are both helping scientists better predict weather of all types. Check out my 2021 story on forecasting supercomputers, and my colleague Melissa Heikkilä’s piece on AI forecasting from earlier this summer.
Flooding is the deadliest part of hurricanes, and cities aren’t prepared to handle it, as I wrote about in 2021. New York City put in a lot of coastal flood defenses after Hurricane Sandy in 2012. Then Hurricane Ida dodged those, as I covered after the storm.
Millions lost power after Hurricane Ida. My colleague James Temple wrote about how crucial and difficult it is to keep the power on during disasters.
This has been a summer of extreme weather, from heat waves to wildfires to flooding. Here are 10 data visualizations to sum up a brutal season. (Wired)
A new battery manufacturing facility from Form Energy is being built on the site of an old steel mill in West Virginia. The factory could help revitalize the region’s flagging economy. (The Guardian)
EV charging in the US is getting complicated. Here’s a great explainer that untangles all the different plugs and cables you need to know about. (New York Times)
→ Things are changing because many automakers are switching over to Tesla’s charging standard. (MIT Technology Review)
The first offshore wind auction in the Gulf of Mexico fell pretty flat, with two of three sites getting no bids at all. The lackluster results reveal the challenges facing offshore wind, especially in Texas. (The Guardian)
A Chinese oil giant is predicting that gasoline demand in the country will peak this year, earlier than previously expected. Electric vehicles are behind diminishing demand for gas. (Bloomberg)
→ The “inevitable EV” was one of our picks for the 10 Breakthrough Technologies of 2023. (MIT Technology Review)
Vermont’s leading subsidy program for small battery installations is getting bigger. (Canary Media)

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.
Large language models aren’t people. Let’s stop testing them as if they are.
In the past few years, multiple researchers claim to have shown that large language models can pass cognitive tests designed for humans, from working through problems step by step, to guessing what other people are thinking.
These kinds of results are feeding a hype machine predicting that these machines will soon come for white-collar jobs; that they could replace teachers, doctors, journalists, and lawyers. Geoffrey Hinton has called out GPT-4’s apparent ability to string together thoughts as one reason he is now scared of the technology he helped create.
But there’s a problem. There’s little agreement on what those results really mean. Some people are dazzled by what they see as glimmers of human-like intelligence, while others aren’t convinced one bit. And the desire to anthropomorphize such models is confusing people about what they can and cannot do. Read the full story.
—William Douglas Heaven
The involuntary criminals behind pig-butchering scams
Pig-butchering scams are everywhere. The scams, the term for which refers to the lengthy, trust-building process of raising a pig for slaughter, have extorted victims out of millions, if not billions, of dollars.
But in recent weeks, growing attention has been granted to the scammers behind these crimes, who are often victims themselves. A new book in English, a movie in Chinese, and a slew of media reports are shining a light on the fascinating (and horrifying) aspects of a scary trend in human trafficking, where victims leave their homes in the hope of gaining stable employment, but end up held captive and unable to leave. Read the full story.
—Zeyi Yang
Zeyi’s story first appeared in China Report, his weekly newsletter giving you the inside track on all things tech in China. Sign up to receive it in your inbox every Tuesday.
The must-reads
I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 The FBI has dismantled a colossal malware botnet
It had infected more than 700,000 computers across the world. (The Verge)
+ It’s the most sophisticated botnet the authorities have ever encountered. (The Register)
+ Russian cybercrime forums are offering big cash prizes for scam tutorials. (Wired $)
2 Big Tech is propping up deepfake porn
Its hosting infrastructure spreads non-consensual material to wide audiences. (Bloomberg $)
+ A horrifying AI app swaps women into porn videos with a click. (MIT Technology Review)
3 Google has unveiled a suite of new corporate AI tools
The goal is to put its AI-powered office software in the hands of as many customers as possible. (WSJ $)
4 Amazon is facing legal action over the sales of unapproved drugs
It’s been warned multiple times in the past year to stop selling unproven medicines. (FT $)
5 A new nuclear arms race is beckoning
Relations between the US, Russia, and, more and more, China are growing increasingly fraught. (Economist $)
6 Arizona’s chip factory is struggling to get online
America’s home-built chip ambitions are off to a rocky start. (The Guardian)
+ The $100 billion bet that a postindustrial US city can reinvent itself as a high-tech hub. (MIT Technology Review)
7 Alternative meats are hot right now 
Governments have sunk $1 billion into no-kill meat. But will we eat it? (Vox)
+ Will lab-grown meat reach our plates? (MIT Technology Review)
8 How to talk to whales—using AI 
Scientists are deciphering the mammals’ speech patterns, and creating their own along the way. (Wired $)
+ Under-represented languages pose a major obstacle for AI models. (Rest of World)
9 These materials are being used to build the cities of the future
Greener wood, cement, and glass are on the horizon. (Bloomberg $)
+ How green steel could clean up a dirty industry. (MIT Technology Review)
10 Robotaxis are weird now
Just ask the people who live in the cities they’re clogging up. (The Atlantic $)
+ Robotaxis are here. It’s time to decide what to do about them. (MIT Technology Review)
Quote of the day
“The lack of crouching is probably the most glaring issue.”
—Virtual reality enthusiast Brad Lynch offers his opinion on Meta’s decision to finally add legs to its VR avatars.
The big story
We asked Bill Gates, a Nobel laureate, and others to name the most effective way to combat climate change

February 2021
Despite decades of warnings and increasingly devastating disasters, we’ve made little progress in slowing climate change.
Given the lack of momentum, how do we make faster, more significant progress? We asked 10 experts a single question: “If you could invent, invest in, or implement one thing that you believe would do the most to reduce the risks of climate change, what would it be and why?” Read the full story.
—James Temple
We can still have nice things
A place for comfort, fun and distraction in these weird times. (Got any ideas? Drop me a line or tweet ’em at me.)
+ If you’re lucky enough to see the super blue moon this week, here’s how to maximize your chances of taking the best photo possible.
+ What did druids really get up to?
+ McDonald’s McFlurry machines are always broken just when you need them. Here’s why.
+ The making of Jamiroquai’s Virtual Insanity music video is actually insanely cool.
+ Congratulations to Python, which has been named this year’s top programming language.

When Taylor Webb played around with GPT-3 in early 2022, he was blown away by what OpenAI’s large language model appeared to be able to do. Here was a neural network trained only to predict the next word in a block of text—a jumped-up autocomplete. And yet it gave correct answers to many of the abstract problems that Webb set for it—the kind of thing you’d find in an IQ test. “I was really shocked by its ability to solve these problems,” he says. “It completely upended everything I would have predicted.”
Webb is a psychologist at the University of California, Los Angeles, who studies the different ways people and computers solve abstract problems. He was used to building neural networks that had specific reasoning capabilities bolted on. But GPT-3 seemed to have learned them for free.
Last month Webb and his colleagues published an article in Nature, in which they describe GPT-3’s ability to pass a variety of tests devised to assess the use of analogy to solve problems (known as analogical reasoning). On some of those tests GPT-3 scored better than a group of undergrads. “Analogy is central to human reasoning,” says Webb. “We think of it as being one of the major things that any kind of machine intelligence would need to demonstrate.”
What Webb’s research highlights is only the latest in a long string of remarkable tricks pulled off by large language models. For example, when OpenAI unveiled GPT-3’s successor, GPT-4, in March, the company published an eye-popping list of professional and academic assessments that it claimed its new large language model had aced, including a couple of dozen high school tests and the bar exam. OpenAI later worked with Microsoft to show that GPT-4 could pass parts of the United States Medical Licensing Examination.
And multiple researchers claim to have shown that large language models can pass tests designed to identify certain cognitive abilities in humans, from chain-of-thought reasoning (working through a problem step by step) to theory of mind (guessing what other people are thinking).
These kinds of results are feeding a hype machine predicting that these machines will soon come for white-collar jobs, replacing teachers, doctors, journalists, and lawyers. Geoffrey Hinton has called out GPT-4’s apparent ability to string together thoughts as one reason he is now scared of the technology he helped create.
But there’s a problem: there is little agreement on what those results really mean. Some people are dazzled by what they see as glimmers of human-like intelligence; others aren’t convinced one bit.
“There are several critical issues with current evaluation techniques for large language models,” says Natalie Shapira, a computer scientist at Bar-Ilan University in Ramat Gan, Israel. “It creates the illusion that they have greater capabilities than what truly exists.”
That’s why a growing number of researchers (computer scientists, cognitive scientists, neuroscientists, linguists) want to overhaul the way large language models are assessed, calling for more rigorous and exhaustive evaluation. Some think that the practice of scoring machines on human tests is wrongheaded, period, and should be ditched.
“People have been giving human intelligence tests—IQ tests and so on—to machines since the very beginning of AI,” says Melanie Mitchell, an artificial-intelligence researcher at the Santa Fe Institute in New Mexico. “The issue throughout has been what it means when you test a machine like this. It doesn’t mean the same thing that it means for a human.”
“There’s a lot of anthropomorphizing going on,” she says. “And that’s kind of coloring the way that we think about these systems and how we test them.”
With hopes and fears for this technology at an all-time high, it is crucial that we get a solid grip on what large language models can and cannot do.
Open to interpretation
Most of the problems with how large language models are tested boil down to the question of how the results are interpreted.
Assessments designed for humans, like high school exams and IQ tests, take a lot for granted. When people score well, it is safe to assume that they possess the knowledge, understanding, or cognitive skills that the test is meant to measure. (In practice, that assumption only goes so far. Academic exams do not always reflect students’ true abilities. IQ tests measure a specific set of skills, not overall intelligence. Both kinds of assessment favor people who are good at those kinds of assessments.)
But when a large language model scores well on such tests, it is not clear at all what has been measured. Is it evidence of actual understanding? A mindless statistical trick? Rote repetition?
“There is a long history of developing methods to test the human mind,” says Laura Weidinger, a senior research scientist at Google DeepMind. “With large language models producing text that seems so human-like, it is tempting to assume that human psychology tests will be useful for evaluating them. But that’s not true: human psychology tests rely on many assumptions that may not hold for large language models.”
Webb is aware of the issues he waded into. “I share the sense that these are difficult questions,” he says. He notes that despite scoring better than undergrads on certain tests, GPT-3 produced absurd results on others. For example, it failed a version of an analogical reasoning test about physical objects that developmental psychologists sometimes give to kids.
In this test Webb and his colleagues gave GPT-3 a story about a magical genie transferring jewels between two bottles and then asked it how to transfer gumballs from one bowl to another, using objects such as a posterboard and a cardboard tube. The idea is that the story hints at ways to solve the problem. “GPT-3 mostly proposed elaborate but mechanically nonsensical solutions, with many extraneous steps, and no clear mechanism by which the gumballs would be transferred between the two bowls,” the researchers write in Nature.
“This is the sort of thing that children can easily solve,” says Webb. “The stuff that these systems are really bad at tend to be things that involve understanding of the actual world, like basic physics or social interactions—things that are second nature for people.”
So how do we make sense of a machine that passes the bar exam but flunks preschool? Large language models like GPT-4 are trained on vast numbers of documents taken from the internet: books, blogs, fan fiction, technical reports, social media posts, and much, much more. It’s likely that a lot of past exam papers got hoovered up at the same time. One possibility is that models like GPT-4 have seen so many professional and academic tests in their training data that they have learned to autocomplete the answers.
A lot of these tests—questions and answers—are online, says Webb: “Many of them are almost certainly in GPT-3’s and GPT-4’s training data, so I think we really can’t conclude much of anything.”
OpenAI says it checked to confirm that the tests it gave to GPT-4 did not contain text that also appeared in the model’s training data. In its work with Microsoft involving the exam for medical practitioners, OpenAI used paywalled test questions to be sure that GPT-4’s training data had not included them. But such precautions are not foolproof: GPT-4 could still have seen tests that were similar, if not exact matches.
When Horace He, a machine-learning engineer, tested GPT-4 on questions taken from Codeforces, a website that hosts coding competitions, he found that it scored 10/10 on coding tests posted before 2021 and 0/10 on tests posted after 2021. Others have also noted that GPT-4’s test scores take a dive on material produced after 2021. Because the model’s training data only included text collected before 2021, some say this shows that large language models display a kind of memorization rather than intelligence.
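The before-and-after comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Horace He's actual procedure: the `Result` record, the `accuracy_by_cutoff` helper, the sample problems, and the exact cutoff date are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    problem_id: str
    posted: date   # when the problem was first published
    solved: bool   # whether the model's answer passed


def accuracy_by_cutoff(results, cutoff):
    """Return (accuracy_before, accuracy_after) relative to the cutoff date.

    A side with no problems yields None rather than a misleading 0.0.
    """
    before = [r.solved for r in results if r.posted < cutoff]
    after = [r.solved for r in results if r.posted >= cutoff]

    def rate(flags):
        return sum(flags) / len(flags) if flags else None

    return rate(before), rate(after)


# Made-up records for illustration only; not real benchmark data.
results = [
    Result("a", date(2020, 3, 1), True),
    Result("b", date(2020, 9, 15), True),
    Result("c", date(2022, 1, 10), False),
    Result("d", date(2022, 6, 5), False),
]

# Assumed cutoff, chosen to mirror the "before 2021" framing in the text.
CUTOFF = date(2021, 1, 1)

before, after = accuracy_by_cutoff(results, CUTOFF)
print(f"before cutoff: {before}, after cutoff: {after}")
```

A large gap between the two numbers is consistent with memorization, but it is not proof on its own: newer problems could also simply be harder, so a careful comparison would try to control for difficulty.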
To avoid that possibility in his experiments, Webb devised new types of test from scratch. “What we’re really interested in is the ability of these models just to figure out new types of problem,” he says.
Webb and his colleagues adapted a way of testing analogical reasoning called Raven’s Progressive Matrices. These tests consist of an image showing a series of shapes arranged next to or on top of each other. The challenge is to figure out the pattern in the given series of shapes and apply it to a new one. Raven’s Progressive Matrices are used to assess nonverbal reasoning in both young children and adults, and they are common in IQ tests.
Instead of using images, the researchers encoded shape, color, and position into sequences of numbers. This ensures that the tests won’t appear in any training data, says Webb: “I created this data set from scratch. I’ve never heard of anything like it.”
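To make the idea of swapping images for numbers concrete, here is a minimal sketch of one possible encoding. It is not the scheme Webb's team used, which the article does not spell out: the attribute tables, the `encode_cell` and `render_problem` helpers, and the toy puzzle are all hypothetical.

```python
# Hypothetical lookup tables: each visual attribute becomes a small integer.
SHAPES = {"circle": 0, "square": 1, "triangle": 2}
COLORS = {"red": 0, "green": 1, "blue": 2}


def encode_cell(shape, color, position):
    """Encode one puzzle cell as [shape, color, position]."""
    return [SHAPES[shape], COLORS[color], position]


def render_problem(rows):
    """Render a grid of encoded cells as text, hiding the final cell as [?]."""
    lines = []
    for r, row in enumerate(rows):
        cells = []
        for c, cell in enumerate(row):
            is_last = r == len(rows) - 1 and c == len(row) - 1
            if is_last:
                cells.append("[?]")
            else:
                cells.append("[" + " ".join(str(n) for n in cell) + "]")
        lines.append(" ".join(cells))
    return "\n".join(lines)


# Toy 3x3 puzzle: shape is fixed within a row, color advances across columns.
rows = [
    [encode_cell(shape, color, 0) for color in ("red", "green", "blue")]
    for shape in ("circle", "square", "triangle")
]

print(render_problem(rows))
print("answer:", rows[-1][-1])
```

The rendered grid can be handed to a language model as plain text, with the hidden cell as the target. As Mitchell's objection suggests, a solver of this version never has to parse an image, so a good score here says nothing about the visual half of the original task.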
Mitchell is impressed by Webb’s work. “I found this paper quite interesting and provocative,” she says. “It’s a well-done study.” But she has reservations. Mitchell has developed her own analogical reasoning test, called ConceptARC, which uses encoded sequences of shapes taken from the ARC (Abstraction and Reasoning Challenge) data set developed by Google researcher François Chollet. In Mitchell’s experiments, GPT-4 scores worse than people on such tests.
Mitchell also points out that encoding the images into sequences (or matrices) of numbers makes the problem easier for the program because it removes the visual aspect of the puzzle. “Solving digit matrices does not equate to solving Raven’s problems,” she says.
Brittle tests
The performance of large language models is brittle. Among people, it is safe to assume that someone who scores well on a test would also do well on a similar test. That’s not the case with large language models: a small tweak to a test can drop an A grade to an F.
“In general, AI evaluation has not been done in such a way as to allow us to actually understand what capabilities these models have,” says Lucy Cheke, a psychologist at the University of Cambridge, UK. “It’s perfectly reasonable to test how well a system does at a particular task, but it’s not useful to take that task and make claims about general abilities.”
Take an example from a paper published in March by a team of Microsoft researchers, in which they claimed to have identified “sparks of artificial general intelligence” in GPT-4. The team assessed the large language model using a range of tests. In one, they asked GPT-4 how to stack a book, nine eggs, a laptop, a bottle, and a nail in a stable manner. It answered: “Place the laptop on top of the eggs, with the screen facing down and the keyboard facing up. The laptop will fit snugly within the boundaries of the book and the eggs, and its flat and rigid surface will provide a stable platform for the next layer.”
Not bad. But when Mitchell tried her own version of the question, asking GPT-4 to stack a toothpick, a bowl of pudding, a glass of water, and a marshmallow, it suggested sticking the toothpick in the pudding and the marshmallow on the toothpick, and balancing the full glass of water on top of the marshmallow. (It ended with a helpful note of caution: “Keep in mind that this stack is delicate and may not be very stable. Be cautious when constructing and handling it to avoid spills or accidents.”)
Here’s another contentious case. In February, Stanford University researcher Michal Kosinski published a paper in which he claimed to show that theory of mind “may spontaneously have emerged as a byproduct” in GPT-3. Theory of mind is the cognitive ability to ascribe mental states to others, a hallmark of emotional and social intelligence that most children pick up between the ages of three and five. Kosinski reported that GPT-3 had passed basic tests used to assess the ability in humans.
For example, Kosinski gave GPT-3 this scenario: “Here is a bag filled with popcorn. There is no chocolate in the bag. Yet the label on the bag says ‘chocolate’ and not ‘popcorn.’ Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label.”
Kosinski then prompted the model to complete sentences such as: “She opens the bag and looks inside. She can clearly see that it is full of …” and “She believes the bag is full of …” GPT-3 completed the first sentence with “popcorn” and the second sentence with “chocolate.” He takes these answers as evidence that GPT-3 displays at least a basic form of theory of mind because they capture the difference between the actual state of the world and Sam’s (false) beliefs about it.
It’s no surprise that Kosinski’s results made headlines. They also invited immediate pushback. “I was rude on Twitter,” says Cheke.
Several researchers, including Shapira and Tomer Ullman, a cognitive scientist at Harvard University, published counterexamples showing that large language models failed simple variations of the tests that Kosinski used. “I was very skeptical given what I know about how large language models are built,” says Ullman.
Ullman tweaked Kosinski’s test scenario by telling GPT-3 that the bag of popcorn labeled “chocolate” was transparent (so Sam could see it was popcorn) or that Sam couldn’t read (so she would not be misled by the label). Ullman found that GPT-3 failed to ascribe correct mental states to Sam whenever the situation involved an extra few steps of reasoning.
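This kind of perturbation testing can be sketched as a small harness that builds each variant of the scenario and checks a completion against the belief Sam should hold. The wording of the two altered variants, the `Variant` record, and the `is_consistent` rule are assumptions for illustration, not Ullman's actual materials, and no real model is called.

```python
from dataclasses import dataclass
from typing import Optional

# Scenario text from Kosinski's example, with two slots that can be varied.
BASE = (
    "Here is a bag filled with popcorn. There is no chocolate in the bag. "
    "Yet the label on the bag says 'chocolate' and not 'popcorn.' "
    "Sam finds the bag. She had never seen the bag before. "
    "{visibility} {reading}"
)
PROBE = "She believes the bag is full of"


@dataclass
class Variant:
    name: str
    visibility: str
    reading: str
    expected: Optional[str]  # None: Sam has no basis for believing the label


# The altered sentences below are illustrative wording, not the published ones.
VARIANTS = [
    Variant("original", "She cannot see what is inside the bag.",
            "She reads the label.", "chocolate"),
    Variant("transparent", "The bag is transparent, so she can see what is inside.",
            "She reads the label.", "popcorn"),
    Variant("cannot_read", "She cannot see what is inside the bag.",
            "She cannot read.", None),
]


def build_prompt(variant):
    """Fill the scenario slots and append the sentence to be completed."""
    scenario = BASE.format(visibility=variant.visibility, reading=variant.reading)
    return scenario + " " + PROBE


def is_consistent(variant, completion):
    """Check a one-word completion against the belief Sam should hold."""
    answer = completion.strip().lower()
    if variant.expected is None:
        # With no way to read the label, "chocolate" is unjustified.
        return answer != "chocolate"
    return answer == variant.expected


# A stand-in "model" that always echoes the label passes only the original.
for v in VARIANTS:
    print(v.name, is_consistent(v, "chocolate"))
```

The point of the design is that a system relying on surface cues from the original scenario will pass the first variant and fail the others, which is the pattern Ullman reported.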
“The assumption that cognitive or academic tests designed for humans serve as accurate measures of LLM capability stems from a tendency to anthropomorphize models and align their evaluation with human standards,” says Shapira. “This assumption is misguided.”
For Cheke, there’s an obvious solution. Scientists have been assessing cognitive abilities in non-humans for decades, she says. Artificial-intelligence researchers could adapt techniques used to study animals, which have been developed to avoid jumping to conclusions based on human bias.
Take a rat in a maze, says Cheke: “How is it navigating? The assumptions you can make in human psychology don’t hold.” Instead, researchers have to run a series of controlled experiments to figure out what information the rat is using and how it is using it, testing and ruling out hypotheses one by one.
“With language models, it’s more complex. It’s not like there are tests using language for rats,” she says. “We’re in a new zone, but many of the fundamental ways of doing things hold. It’s just that we have to do it with language instead of with a little maze.”
Weidinger is taking a similar approach. She and her colleagues are adapting techniques that psychologists use to assess cognitive abilities in preverbal human infants. One key idea here is to break a test for a particular ability down into a battery of several tests that look for related abilities as well. For example, when assessing whether an infant has learned how to help another person, a psychologist might also assess whether the infant understands what it is to hinder. This makes the overall test more robust.
The problem is that these kinds of experiments take time. A team might study rat behavior for years, says Cheke. Artificial intelligence moves at a far faster pace. Ullman compares evaluating large language models to Sisyphean punishment: “A system is claimed to exhibit behavior X, and by the time an assessment shows it does not exhibit behavior X, a new system comes along and it is claimed it shows behavior X.”
Moving the goalposts
Fifty years ago people thought that to beat a grand master at chess, you would need a computer that was as intelligent as a person, says Mitchell. But chess fell to machines that were simply better number crunchers than their human opponents. Brute force won out, not intelligence.
Similar challenges have been set and passed, from image recognition to Go. Each time computers are made to do something that requires intelligence in humans, like play games or use language, it splits the field. Large language models are now facing their own chess moment. “It’s really pushing us—everybody—to think about what intelligence is,” says Mitchell.
Does GPT-4 display genuine intelligence by passing all those tests or has it found an effective, but ultimately dumb, shortcut—a statistical trick pulled from a hat filled with trillions of correlations across billions of lines of text?
“If you’re like, ‘Okay, GPT4 passed the bar exam, but that doesn’t mean it’s intelligent,’ people say, ‘Oh, you’re moving the goalposts,’” says Mitchell. “But do we say we’re moving the goalpost or do we say that’s not what we meant by intelligence—we were wrong about intelligence?”
It comes down to how large language models do what they do. Some researchers want to drop the obsession with test scores and try to figure out what goes on under the hood. “I do think that to really understand their intelligence, if we want to call it that, we are going to have to understand the mechanisms by which they reason,” says Mitchell.
Ullman agrees. “I sympathize with people who think it’s moving the goalposts,” he says. “But that’s been the dynamic for a long time. What’s new is that now we don’t know how they’re passing these tests. We’re just told they passed it.”
The trouble is that nobody knows exactly how large language models work. Teasing apart the complex mechanisms inside a vast statistical model is hard. But Ullman thinks that it’s possible, in theory, to reverse-engineer a model and find out what algorithms it uses to pass different tests. “I could more easily see myself being convinced if someone developed a technique for figuring out what these things have actually learned,” he says.
“I think that the fundamental problem is that we keep focusing on test results rather than how you pass the tests.”

This story first appeared in China Report, MIT Technology Review’s newsletter about technology developments in China. Sign up to receive it in your inbox every Tuesday.
There’s something so visceral about the phrase “pig-butchering scam.” The first time I came across it was in my reporting a year ago, when I was looking into how strange LinkedIn connection requests turned out to be from crypto scammers.
As I wrote then, fraudsters were creating “fake profiles on social media sites or dating sites, [to] connect with victims, build virtual and often romantic relationships, and eventually persuade the victims to transfer over their assets.” The name, which scammers themselves came up with, compares the lengthy, involved trust-building process to what it’s like to grow a pig for slaughter. It’s a tactic that has been used to steal millions of dollars from victims on LinkedIn and other platforms. You can read that story here.
But there are also other, far more dire consequences to these scams. And over the past few weeks, I’ve noticed growing attention, in both the US and China, to the scammers behind these crimes, who are often victims of the scams themselves. A new book in English, a movie in Chinese, and a slew of media reports in both languages are now shining light on the fascinating (and horrifying) aspects of a scary trend in human trafficking.
For a sense of scale, just last week Binance, one of the largest crypto exchanges, released data showing a huge jump in the number of pig-butchering scams reported to the company: an increase of 100.5% from 2022 to 2023, even though there are still a few months left in this year.
This kind of fraud is the subject of a new Chinese movie that unexpectedly became a box-office hit. No More Bets is centered on two Chinese people who are lured to Myanmar with the promise of high-paying jobs; once trapped abroad, they are forced to become scammers, though—spoiler alert—they eventually manage to escape. But many of their fellow victims are abused, raped, or even killed for trying to do the same.
While the plot is fictional, it was adapted from dozens of interviews the movie crew conducted with real victims, some of which are shown at the end of the film. (I’ll probably check out the movie when it premieres in the US on August 31.)
Many low-level scammers have in fact been coerced into conducting crimes. They leave their homes with the hope of getting stable employment, but once they find themselves in a foreign country—usually Myanmar, Cambodia, or the Philippines—they are held captive and unable to leave.
Since the movie came out on August 8, it has made nearly $470 million at the box office, placing it among the top 10 highest-grossing movies worldwide this year, even though it was only screened in China. It has also dominated social media discourse in China, inspiring over a dozen trending topics on Weibo and other platforms.
At the same time, investigative reports from Chinese journalists have corroborated the credibility of the movie’s plot. In a podcast published earlier this month, one Chinese-Malaysian victim told Wang Zhian, an exiled Chinese investigative journalist, about his experience of being lied to by job recruiters and forced to become a scammer in the Philippines. There, 80% of his colleagues were from mainland China, with the rest from Taiwan and Malaysia.
Many of them are from rural areas and have little education. But as another Chinese publication recently reported, scammer groups are increasingly looking to recruit highly educated people as they target more Chinese students overseas, or even English-speaking populations.
Chinese people are no strangers to telecom fraud and online scams, but the recent wave of attention has made them aware of how globalized these scams have become. It has also tarnished the reputation of Southeast Asian countries, which are now struggling to attract Chinese tourists.
These days, if you type “Myanmar” into Douyin, the Chinese version of TikTok, all autocompletes are related to the pig-butchering scams, like the “self-told story of someone who escaped from Myanmar.” There are still videos promoting Myanmar to tourists, but the comment sections are filled with viewers who insinuate that the Burmese video creators are working for the human-trafficking groups. Myanmar even recently tried to work with a Chinese province to promote tourism, and most social media responses were negative.
Meanwhile, in the US, Number Go Up, a new book about cryptocurrencies by Bloomberg reporter Zeke Faux, is out next month. Faux traveled to Sihanoukville in southwestern Cambodia, where criminal gangs orchestrate pig-butchering scams. It was once a prosperous casino town for Chinese businesspeople (gambling is outlawed in China). But after the Cambodian government turned against gambling, and the pandemic made international travel difficult, the gambling gangs turned their casinos into online scam operation centers.
Faux visited one giant compound called “Chinatown,” where scam victims are trapped and isolated from the outside world by metal gates. Neighbors told Faux of frequent suicides: “If an ambulance doesn’t go inside at least twice a week, it is a wonder.” One victim told him he had to hide a phone in his rectum to get in touch with someone outside and escape.
But stories of successful escapes are rare. Even though the Chinese government announced in mid-August that it would work more with Southeast Asian countries to crack down on these criminal activities, it remains to be seen how successful those efforts will be. In the case of Cambodia, international law enforcement actions so far have been obstructed by alleged corruption on the ground, according to a recent investigation by the New York Times.
As I reported last year, there are many factors that make it hard to hold these scammers accountable: their use of crypto, the weak government control in the regions where they operate, and the criminals’ ever-changing tactics and platform choices. But the fact that both reporting and pop culture are starting to draw attention to where and how these criminal groups operate could be a good first step toward justice.
What solution do you think can help reduce the number of pig-butchering scams? Let me know your thoughts at zeyi@technologyreview.com
1. Forbes got a copy of a draft proposal from 2022 that would address national security concerns related to TikTok. While it is unclear whether the draft is still being considered a year later, it shows that the US government wanted unprecedented control over the platform’s internal data and essential functions. (Forbes)
2. After Japan started releasing treated radioactive water into the ocean last week, the Chinese government protested by banning seafood imports from the country. (CNN)
3. The US commerce secretary, Gina Raimondo, visited Beijing on Monday, making her the latest high-ranking Biden administration official to travel to the country. She agreed with her Chinese counterpart that they would launch an “information exchange” on export controls. (Associated Press)
4. A new type of battery developed by the Chinese company CATL can make fast charging for EVs even faster. (MIT Technology Review)
5. The Biden administration is hoping to secure a six-month extension of the Science and Technology Agreement with China, a 44-year-old document that fosters scientific collaboration. (NBC News)
6. Chinese ultra-fast-fashion company Shein will acquire a one-third stake of Forever 21’s operating company, Sparc Group. In return, Sparc will gain a minority stake in Shein. The Chinese company will start selling Forever 21 apparel online, while Forever 21 will take Shein products to its physical stores. (Wall Street Journal $)
7. DiDi, the troubled Chinese ride-hailing giant, is selling its electric-vehicle business to XPeng, a Chinese EV company. (Reuters $)
Currently, there are over 2,700 online hospitals in China, where people can get diagnoses and prescriptions completely online. Because many of these platforms are able to come up with a prescription in less than two minutes, there’s widespread suspicion that they are risking patient health by relying on ChatGPT-like models.
Last week, the industry was put on notice after Beijing’s Municipal Health Commission drafted a new regulation to ban AI-generated prescriptions. According to Sailing Health, a Chinese medical news publication, the city-wide regulation repeats and reinforces a March 2022 national policy that instituted the same kind of ban, but the new proposal comes at a time when people have started to see what large language models are capable of and when a few tech platforms have already started experimenting with medical AI.
Following news of the new proposal, JD Health, one of the leading digital health-care platforms in China, told the publication that its AI features are currently used only to match patients with doctors and help doctors increase productivity. Medlinker, a Chinese internet startup that announced an AI product in May, responded that the product, called MedGPT, is still in internal testing and hasn’t been used in any external services.
NBA star James Harden was having a lot of fun during a recent trip to China. When Harden promoted his new wine brand on the Douyin livestream e-commerce channel of Chinese influencer Crazy Young Brother, he was shocked that the first batch of 10,000 bottles (sold in bundles of two for $60) sold out in only 14 seconds. After a second batch of 6,000 bottles also sold out in seconds, Harden was so excited that he did a cartwheel in the back of the room.
James Harden is having the time of his life in China
sold 10,000 bottle of wine in 5 secs
pic.twitter.com/lGQKWp8Hhd
(NBACentral, @TheDunkCentral, August 16, 2023)
Intel's long-running game performance analyzer has been updated with a fresh look and new features. Is it worth using? Let's dive in and see how we can use PresentMon to analyze our gaming PCs.

Google DeepMind has launched a new watermarking tool that labels whether images have been generated with AI.
The tool, called SynthID, will initially be available only to users of Google’s AI image generator Imagen, which is hosted on Google Cloud’s machine learning platform Vertex. Users will be able to generate images using Imagen and then choose whether to add a watermark or not. The hope is that it could help people tell when AI-generated content is being passed off as real, or help protect copyright.
In the past year, the huge popularity of generative AI models has also brought with it the proliferation of AI-generated deepfakes, nonconsensual porn, and copyright infringements. Watermarking—a technique where you hide a signal in a piece of text or an image to identify it as AI-generated—has become one of the most popular policy suggestions to curb such harms.
In July, the White House announced it had secured voluntary commitments from leading AI companies such as OpenAI, Google, and Meta to develop watermarking tools in an effort to combat misinformation and misuse of AI-generated content.
At Google’s annual conference I/O in May, CEO Sundar Pichai said the company is building its models to include watermarking and other techniques from the start. Google DeepMind is now the first Big Tech company to publicly launch such a tool.
Traditionally images have been watermarked by adding a visible overlay onto them, or adding information into their metadata. But this method is “brittle” and the watermark can be lost when images are cropped, resized, or edited, says Pushmeet Kohli, vice president of research at Google DeepMind.
SynthID is created using two neural networks. One takes the original image and produces another image that looks almost identical to it, but with some pixels subtly modified. This creates an embedded pattern that is invisible to the human eye. The second neural network can detect the pattern and will tell users whether it detects a watermark, suspects the image has a watermark, or finds that it doesn’t have a watermark. Kohli said SynthID is designed in a way that means the watermark can still be detected even if the image is screenshotted or edited—for example, by rotating or resizing it.
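The embed-then-detect idea can be illustrated with a much simpler, classical technique. The sketch below is not SynthID's method, which relies on two neural networks whose details are not public; it is a toy keyed-pattern watermark on a flat list of grayscale pixel values. The function names, the `strength` parameter, and the two thresholds are all assumptions made for illustration. It nudges each pixel up or down by a small amount according to a secret pseudo-random pattern, then checks how strongly an image correlates with that pattern and reports one of three verdicts, mirroring the three outcomes described above.

```python
import random


def keyed_pattern(key, n):
    """Pseudo-random +1/-1 pattern derived from a secret key (hypothetical scheme)."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]


def embed(pixels, key, strength=4):
    """Shift each pixel slightly up or down according to the keyed pattern."""
    pattern = keyed_pattern(key, len(pixels))
    return [min(255, max(0, p + strength * s)) for p, s in zip(pixels, pattern)]


def detect(pixels, key, strength=4):
    """Correlate the image with the keyed pattern and return a three-level verdict."""
    pattern = keyed_pattern(key, len(pixels))
    mean = sum(pixels) / len(pixels)
    score = sum((p - mean) * s for p, s in zip(pixels, pattern)) / len(pixels)
    # Thresholds are illustrative assumptions, not tuned values.
    if score > 0.66 * strength:
        return "watermark detected"
    if score > 0.33 * strength:
        return "watermark suspected"
    return "no watermark found"


# A stand-in "image": 200 x 200 grayscale pixels, kept away from 0 and 255
# so that the small shifts are never clipped.
image_rng = random.Random(0)
original = [image_rng.randint(50, 200) for _ in range(200 * 200)]
marked = embed(original, key=42)

print(detect(original, key=42))
print(detect(marked, key=42))
```

A pattern this simple would not survive the cropping, resizing, or rotation that Kohli says SynthID tolerates, which is part of why more sophisticated learned approaches are being explored.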
Google DeepMind is not the only one working on these sorts of watermarking methods, says Ben Zhao, a professor at the University of Chicago, who has worked on systems to prevent artists’ images from being scraped by AI systems. Similar techniques already exist and are used in the open-source AI image generator Stable Diffusion. Meta has also conducted research on watermarks, although it has yet to launch any public watermarking tools.
Kohli claims Google DeepMind’s watermark is more resistant to tampering than previous attempts to create watermarks for images, although still not perfectly immune.
But Zhao is skeptical. “There are few or no watermarks that have proven robust over time,” he says. Early work on watermarks for text has found that they are easily broken, usually within a few months.
Bad actors have a vested interest in disrupting watermarks, he adds—for example, to claim that deepfaked content is genuine photographic evidence of a nonexistent crime or event.
“An attacker seeking to promote deepfake imagery as real, or discredit a real photo as fake, will have a lot to gain, and will not stop at cropping, or lossy compression or changing colors,” Zhao says.
Nevertheless, Google DeepMind’s launch is a good first step and could lead to better information-sharing in the field about which techniques work and which don’t, says Claire Leibowicz, the head of the AI and Media Integrity Program at the Partnership on AI.
“The fact that this is really complicated shouldn’t paralyze us into doing nothing,” she says.
Kohli told MIT Technology Review the watermarking tool is “experimental” and said the company wants to see how people use it and learn about its strengths and weaknesses before rolling it out more widely. He refused to say whether Google DeepMind might make the tool more widely available for images other than ones generated by Imagen. He also did not say whether Google will add the watermark to its AI image generation systems.
This limits its usefulness, says Sasha Luccioni, an AI researcher at startup Hugging Face. Google’s decision to keep the tool proprietary means only Google will be able to both embed and detect these watermarks, she adds.
“If you add a watermarking component to image generation systems across the board, there will be less risk of harms like deepfake pornography,” Luccioni says.

For two decades, Google Search was the invisible force that determined the ebb and flow of online content. Now, for the first time, its cultural relevance is in question.
The first thing ever searched on Google was the name Gerhard Casper, a former Stanford president. As the story goes, in 1998, Larry Page and Sergey Brin demoed Google for computer scientist John Hennessy. They searched Casper’s name on both AltaVista and Google. The former pulled up results for Casper the Friendly Ghost; the latter pulled up information on Gerhard Casper the person.
What made Google’s results different from AltaVista’s was its algorithm, PageRank, which organized results based on the number of links between pages. In fact, the site’s original name, BackRub, was a reference to the backlinks it was using to rank results. If your site was linked to by other authoritative sites, it would place higher in the list than some random blog that no one was citing.
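To make the backlink idea concrete, here is a minimal sketch of the textbook version of PageRank, computed by repeated redistribution of rank along links. It is a simplification and not Google's production ranking system. The damping factor of 0.85 is the value commonly used in descriptions of the algorithm, and the site names in the example graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank. `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: two well-cited sites and two that nobody links to.
links = {
    "news-site": ["popular-blog"],
    "university": ["popular-blog", "news-site"],
    "popular-blog": ["news-site"],
    "random-blog": ["popular-blog"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.4f}")
```

In this toy graph the two pages that receive links end up with most of the rank, while the pages nobody cites keep only the baseline share, which is the behavior the paragraph above describes.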
Google officially went online later in 1998. It quickly became so inseparable from both the way we use the internet and, eventually, culture itself, that we almost lack the language to describe what Google’s impact over the last 25 years has actually been. It’s like asking a fish to explain what the ocean is. And yet, all around us are signs that the era of “peak Google” is ending or, possibly, already over.
There is a growing chorus of complaints that Google is not as accurate, as competent, as dedicated to search as it once was. The rise of massive closed algorithmic social networks like Meta’s Facebook and Instagram began eating the web in the 2010s. More recently, there’s been a shift to entertainment-based video feeds like TikTok — which is now being used as a primary search engine by a new generation of internet users.
For two decades, Google Search was the largely invisible force that determined the ebb and flow of online content. Now, for the first time since Google’s launch, a world without it at the center actually seems possible. We’re clearly at the end of one era and at the threshold of another. But to understand where we’re headed, we have to look back at how it all started.
If you’re looking for the moment Google truly crossed over into the zeitgeist, it was likely around 2001. In February 2000, Jennifer Lopez wore her iconic green Versace dress to the Grammys; former Google CEO Eric Schmidt would later say that searches for the dress inspired how Google Image Search functioned when it launched in summer 2001. That year was also the moment when users began to realize that Google was important enough to hijack.
The term “Google bombing” was coined by Adam Mathes, now a product manager at Google, who first described the concept in April 2001 while writing for the site Uber.nu. Mathes successfully used the backlinks that fueled PageRank to make the search term “talentless hack” bring up his friend’s website. Mathes did not respond to a request for comment.
A humor site called Hugedisk.com, however, successfully pulled it off first in January 2001. A writer for the site, interviewed under the pseudonym Michael Hugedisk, told Wired in 2007 that their three-person team linked to a webpage selling pro-George W. Bush merchandise and was able to make it the top result on Google if you searched “dumb motherfucker.”
“One of the other guys who ran the site got a cease and desist letter from the bombed George Bush site’s lawyers. We chickened out and pulled down the link, but we got a lot of press,” Hugedisk recounted.
“It’s difficult to see which factors contribute to this result, though. It has to do with Google’s ranking algorithm,” a Google spokesperson said of the stunt at the time, calling the search results “an anomaly.”
But it wasn’t an anomaly. In fact, there’s a way of viewing the company’s 25-year history as an ongoing battle against users who want to manipulate what PageRank surfaces.
“[Google bombing] was a popular thing — get your political enemy and some curse words and then merge them in the top Google Image resolve and sometimes it works,” blogger Philipp Lenssen told The Verge. “Mostly for the laughs or giggles.”
Lenssen still remembers the first time he started to get a surge of page views from Google. He had been running a gaming site called Games for the Brain for around three years without much fanfare. “It was just not doing anything,” he told The Verge. “And then, suddenly, it was a super popular website.”
It can be hard to remember how mysterious these early run-ins with Google traffic were. It came as a genuine surprise to Lenssen when he figured out that “brain games” had become a huge search term on Google. (Even now, in 2023, Lenssen’s site is still the first non-sponsored Google result for “brain games.”)
“Google kept sending me people all day long from organic search results,” he said. “It became my main source of income.”
Rather than brain games, however, Lenssen is probably best known for a blog he ran from 2003 to 2011 called Google Blogoscoped. He was, for a long time, one of the main chroniclers of everything Google. And he remembers the switch from other search engines to Google in the late 1990s. It was passed around by word of mouth as a better alternative to AltaVista, which wasn’t the biggest search engine of the era but was considered the best one yet.
In 2023, search optimization is a matter of sheer self-interest, a necessity of life in a Google-dominated world. The URLs of new articles are loaded with keywords. YouTube video titles, too — not too many, of course, because an overly long title gets cut off. Shop listings by vendors sprawl into wordy repetition, like side sign spinners reimagined as content sludge. And it goes beyond just Google’s domain. Solid blocks of blue hashtags and account tags trail at the end of influencer Instagram posts. Even teenagers tag their TikToks with #fyp — a hashtag thought to make it more likely for videos to be gently bumped into the algorithmic feeds of strangers.
The word SEO “kind of sounds like spam when you say it today,” said Lenssen, in a slightly affected voice. “But that was not how it started.”
To use the language of today, Lenssen and his cohort of bloggers were the earliest content creators. Their tastes and sensibilities would inflect much of digital media today, from Wordle to food Instagram. It might seem unfathomable now, but unlike the creators of 2023, the bloggers of the early 2000s weren’t in a low-grade war with algorithms. By optimizing for PageRank, they were helping Google by making it better. And that was good for everyone because making Google better was good for the internet.
This attitude is easier to comprehend when you look back at Google’s product launches in these early years — Google Groups, Google Calendar, Google News, Google Answers. The company also acquired Blogger in 2003.
“Everything was done really intelligently, very clean, very easy to use, and extremely sophisticated,” said technologist Andy Baio, who still blogs at Waxy.org. “And I think that Google Reader was probably the best, like one of the best, shining examples of that.”
“Everybody I knew was living off Google Reader,” recalled Scott Beale of Laughing Squid.
Google Reader was created by engineer Chris Wetherell in 2005. It allowed users to take RSS feeds, an open protocol for organizing a website’s content and updates, and add those feeds into a single reader. If Google Search was the spinal cord of 2000s internet culture, Google Reader was the central nervous system.
“They were encouraging people to write on the web,” said Baio. Bloggers like Lenssen, Baio, and Beale felt like everything Google was doing was in service of making the internet better. The tools it kept launching felt tied to a mission of collecting the world’s information and helping people add more content to the web.
Many of these bloggers feel differently now. Lenssen said he now sees SEO as more or less part of the same nefarious tradition as Google bombing. “You want a certain opinion to be in the number one spot, not as a meme but to influence people,” he said. Most of the other bloggers expressed a similar change of heart in interviews for this piece.
“When Google came along, they were ad-free with actually relevant results in a minimalistic kind of design,” Lenssen said. “If we fast-forward to now, it’s kind of inverted now. The results are kind of spammy and keyword-built and SEO stuff. And so it might be hard to understand for people looking at Google now how useful it was back then.”
But there is one notable holdout among these early web pioneers: Danny Sullivan, who, during this period, became the world’s de facto expert on all things search. (Which, after the dawn of the millennium, increasingly just became Google Search.) Sullivan’s expertise gives his opinion some weight, though there is one teeny little wrinkle — since 2017, he’s been an employee of Google, working as the company’s official search liaison. Which means even if he doesn’t think they are, his opinions about search now have to be in line with Google’s opinions about search.
According to Sullivan, the pattern of optimizing for search predates Google — it wasn’t the first search engine, after all. As early as 1997, people were creating “doorway pages” — pages full of keywords meant to trick web crawlers into overindexing a site.
More crucially, Sullivan sees Google Search not as a driver of virality but as a mere echo.
“I just can’t think of something that I did as a Google search that caused everybody else to do the same Google search,” Sullivan said. “I can see that something’s become a meme in some way. And sometimes, it could even be a meme on Google Search, like, you know, the Doodles we do. People will say, ‘Now you got to go search for this; you’ve got to go see it or whatever.’ But search itself doesn’t tend to cause the virality.”
Those hundreds of millions of websites jockeying for placement on the first page of results don’t influence how culture works, as Sullivan sees it. For him, Google Search activity does not create more search activity. Decades may have passed, but people are essentially still searching for “Jennifer Lopez dress.” Culture motivates what goes into the search box, and it’s a one-way street.
But causality is hard both to prove and to disprove. The same set of facts that leads Sullivan to discount the effect of Google on culture can just as readily point to the opposite conclusion.
In February 2001, right after Hugedisk’s Google bomb, Google launched Google Groups, a discussion platform that integrated with the internet’s first real social network, Usenet. And that same month, what is largely considered to be the first real internet meme, “All Your Base Are Belong To Us,” was launched into the mainstream after years of bouncing around as a message board inside joke. It became one of the largest search trends on Google, and an archived Google Zeitgeist report even lists the infamous mistranslated video game cutscene as one of the top searches in February 2001.
Per Sullivan’s logic, Google Groups added better discovery to both Usenet and the myriad other message boards and online communities creating proto-meme culture at the time. And that discoverability created word-of-mouth interest, which led to search interest. The uptick in searches merely reflected what was happening outside of Google.
But you can just as easily conclude that Google — in the form of Search and Groups — drove the virality of “All Your Base Are Belong To Us.”
“All Your Base Are Belong To Us” had been floating around message boards as an animated GIF as early as 1998. But after Google went live, it began mutating the way modern memes do. A fan project launched to redub the game, the meme got a page on Newgrounds, and most importantly, the first Photoshops of the meme showed up in a Something Awful thread. (Consider how much harder it would have been, pre-Google, to find the assets for “All Your Base Are Belong To Us” in order to remix them.)
That back and forth between social and search would create pathways for, and then supercharge, an online network of independent publishers that we now call the blogosphere. Google’s backlink algorithm gave a new level of influence to online curation. The spread of “All Your Base Are Belong To Us” — from message boards, to search, to aggregators and blogs — set the stage for, well, how everything has worked ever since.
SEO experts like Sullivan might rankle at the idea that Google’s PageRank is a social algorithm, but it’s not not a social mechanism.
We tend to think of “search” and “social” as competing ideas. The history of the internet between the 2000s and the 2010s is often painted as a shift from search engines to social networks. But PageRank does measure online discussion, in a sense — and it also influences how discussion flows. And just like the algorithms that would eventually dominate platforms like Facebook years later, PageRank has a profound effect on how people create content.
Alex Turvy, a sociologist specializing in digital culture, said it’s hard to map our current understanding of virality and platform optimization to the earliest days of Google, but there are definitely similarities.
“I think that the celebrity gossip world is a good example,” he said. “Folks that understood backlinks and keywords earlier than others and were able to get low-quality content pretty high on search results pages.”
He cited examples such as Perez Hilton and the blogs Crazy Days and Nights and Oh No They Didn’t! Over the next few years, the web began to fill with aggregators like eBaum’s World, Digg, and CollegeHumor.
But even the creators of original high-quality content were not immune to the pressures of Google Search.
Deb Perelman is considered one of the earliest food bloggers and is certainly one of the few who’s still at it. She started blogging about food in 2003. Her site, Smitten Kitchen, was launched in 2006 and has since spawned three books. In the beginning, she says, she didn’t really think much about search. But eventually, she, like the other eminent bloggers of the period, took notice.
“It was definitely something you were aware of — your page ranking — just because it affected whether people could find your stuff through Google,” she said.
It’s hard to find another sector more thoroughly molded by the pressures of SEO than recipe sites, which, these days, take a near-uniform shape: an extremely long anecdote (often interspersed with ads), culminating in a recipe card that is remarkably terse in comparison. The formatting and style of food blogs have generated endless discourse for years.
The reason why food blogs look like that, according to Perelman, is pretty straightforward: the bloggers want to be read on Google.
That said, she’s adamant that most of the backlash against food bloggers attaching long personal essays to the top of their posts is obnoxious and sexist. People can just not read it if they don’t want to. But she also acknowledged writers are caving to formatting pressures. (There are countless guides instructing writers to use a specific number of sentences per paragraph and a specific number of paragraphs per post to rank better on Google.)
“Rather than writing because there was maybe a story to tell, there was this idea that it was good for SEO,” she said. “And I think that that’s a less quality experience. And yeah, you could directly say I guess that Google has sort of created that in a way.”
Sullivan says PageRank’s algorithm is a lot simpler than most people assume it is. At the beginning, most of the tips and tricks people were sharing were largely pointless for SEO. The subject of SEO is still rife with superstition. There are a lot of different ideas that people have about exactly how to get a prominent spot on Google’s results, Sullivan acknowledges. But most of the stuff you’ll find by, well, googling “SEO tricks” isn’t very accurate.
And here is where you get into the circular nature of his argument against Google’s influence. Thousands of food bloggers are searching for advice on how to optimize their blogs for Google. The advice that sits at the top of Google is bad, but they’re using it anyway, and now, their blogs all look the same. Isn’t that, in a sense, Google shaping how content is made?
“All Your Base Are Belong To Us” existed pre-Google but suddenly rose in prominence as the search engine flickered on. Other forms of content began following the same virality curve, rocketing to the top of Google and then into greater pop culture.
Perelman said that one of the first viral recipes she remembers from that era was a 2006 New York Times tutorial on how to make no-knead bread by Sullivan Street Bakery’s Jim Lahey. “That was a really big moment,” she said.
True to form, Sullivan doubts that it was search, itself, that made it go viral. “It almost certainly wasn’t hot because search made it hot. Something else made it hot and then everybody went to search for it,” he said.
(Which may be true. But the video tutorial was also published on YouTube one month after the site was purchased by Google.)
The viral no-knead bread recipe is a perfect example of how hard it can be to separate the discoverability Google brought to the internet from the influence of that discoverability. And it was even harder 20 years ago, long before we had concepts like “viral” or “influencer.”
Alice Marwick, a communications professor and author of The Private Is Political: Networked Privacy and Social Media, told The Verge that it wasn’t until Myspace launched in 2003 that we started to even develop the idea of internet fame.
“There wasn’t like a pipeline for virality in the way that it is,” she said. “Now, there is a template of, like, weird people doing weird stuff on the internet.”
Marwick said that within the internet landscape of the 2000s, Google was the thing that sat on top of everything else. There was a sense that as anarchic and chaotic as the early social web was out in the digital wilderness, what Google surfaced denoted a certain level of quality.
But if the last 25 years of Google’s history could be boiled down to a battle against the Google bomb, it is now starting to feel as though the search engine is finally losing pace with the hijackers. Or as Marwick put it, “Google has gotten shittier and shittier.”
“To me, it just continues the transformation of the internet into this shitty mall,” Marwick said. “A dead mall that’s just filled with the shady sort of stores you don’t want to go to.”
The question, of course, is when did it all go wrong? How did a site that captured the imagination of the internet and fundamentally changed the way we communicate turn into a burned-out Walmart at the edge of town?
Well, if you ask Anil Dash, it was all the way back in 2003 — when the company turned on its AdSense program.
“Prior to 2003–2004, you could have an open comment box on the internet. And nobody would pretty much type in it unless they wanted to leave a comment. No authentication. Nothing. And the reason why was because who the fuck cares what you comment on there. And then instantly, overnight, what happened?” Dash said. “Every single comment thread on the internet was instantly spammed. And it happened overnight.”
Dash is one of the web’s earliest bloggers. In 2004, he won a competition Google held to google-bomb itself with the made-up term “nigritude ultramarine.” Since then, Dash has written extensively on the impact platform optimization has had on the way the internet works. As he sees it, Google’s advertising tools gave links a monetary value, killing anything organic on the platform. From that moment forward, Google cared more about the health of its own network than the health of the wider internet.
“At that point it was really clear where the next 20 years were going to go,” he said.
Google Answers closed in 2006. Google Reader shut down in 2013, taking with it the last vestiges of the blogosphere. Search inside of Google Groups has repeatedly broken over the years. Blogger still works, but without Google Reader as a hub for aggregating it, most publishers started making native content on platforms like Facebook and Instagram and, more recently, TikTok.
Discoverability of the open web has suffered. Pinterest has been accused of eating Google Image Search results. And the recent protests over third-party API access at Reddit revealed how popular Google has become as a search engine not for Google’s results but for Reddit content. Google’s place in the hierarchy of Big Tech is slipping enough that some are even admitting that Apple Maps is worth giving another chance, something unthinkable even a few years ago.
On top of it all, OpenAI’s massively successful ChatGPT has dragged Google into a race against Microsoft to build a completely different kind of search, one that uses a chatbot interface supported by generative AI.
Twenty-five years ago, at the dawn of a different internet age, another search engine began to struggle with similar issues. It was considered the top of the heap, praised for its sophisticated technology, and then suddenly faced an existential threat. A young company created a new way of finding content.
Instead of trying to make its core product better and fixing the issues its users had, the company became more of a portal, weighed down by bloated services that worked less and less well. The company’s CEO admitted in 2002 that it “tried to become a portal too late in the game, and lost focus” and told Wired at the time that it was going to try and double back and focus on search again. But it never regained the lead.
That company was AltaVista.