How to Spot AI Generated Content

Timothy Carter
January 17, 2023

Well, the robot takeover is finally here.

Today’s robots aren’t just building cars or cooking pasta; they’re having full-blown conversations and writing articles (not unlike this one).

That’s right. Thanks to some major breakthroughs in the world of artificial intelligence (AI), we now have sophisticated tools capable of generating human-like text.

But there are also AI content detection tools.

Some of you aren’t surprised by this. After all, AI-written articles have been published in mainstream media sources for many years now. You’ve probably read one of them without even realizing it.

The difference is the level of sophistication present. Previously, AI content generation tools were fundamentally limited to only producing articles on easily digestible topics, like stock reports or sports updates.

But these days, machine generated content is everywhere and covers everything.

And it’s practically indistinguishable from human-written content…

Or is it?

Let’s find out.

What Is AI-Generated Content?

AI-generated content is any text, message, article, or other type of content produced by a machine learning algorithm. Typically, a user enters a prompt, guiding the AI to write about a certain topic, asking it a question, or directing it to cover some specific event.

In response to the prompt, the AI comes to life and produces something readable, understandable, and hopefully, effective.

AI content creation has also been touted for its ability to scale content velocity for some of the biggest websites online.

OpenAI’s latest project, ChatGPT, is an example of this. In the organization’s own words, “We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.”

How does it work?

The ChatGPT language model uses both supervised learning and reinforcement learning, though it uses reinforcement learning more heavily, relying on human feedback to fine-tune itself. Basically, it observes and attempts to mimic examples of human language across a practically infinite number of contexts; then, it “interacts” with humans, who can guide it to more acceptable and desirable language outputs. With millions of tiny feedback loops helping the AI model “understand” language, it’s only a matter of time before it masters the use of language.

As we’ll see, this is not “true” mastery.

The AI in this context has no cerebral understanding of the subject matter, though it might appear that way to an outsider. The AI is not truly thinking about what it says, nor is it performing any advanced cognitive functionality in processing the topic.

Instead, the AI is simply observing and mimicking patterns that it sees replicated all over the web and in the prompts and responses of users it interacts with.

After a few billion examples, it becomes trivially easy for the AI to mimic conventional English sentence structures, using nouns, verbs, and adjectives completely appropriately.

After a few billion more examples, and some context-specific information, it can write up a short couple of paragraphs about why and how the Industrial Revolution happened.

Of course, ChatGPT is just one example of an AI generated content application. In fact, we’re poised to see an explosion of AI content generation tools in the next few years as entrepreneurs and disruptive innovators race to see who can come up with the most profitable application for this new technology.

We’ll likely see tools specifically geared for individual use cases, like generating news stories within a specific category, writing for SEO, writing college essays, and even generating business emails.

What a time to be a human writer. Or reader.

Why AI Generated Content Is a Problem

Leaving aside some tongue-in-cheek jokes I could make about my own job insecurity, it’s fair to say that AI generated content has the potential to be problematic, and in more than one area.

Consider this small selection of possibilities.

Academic misconduct. The world of academics is already freaking out about the possibility of students using AI to generate essays, responses for homework assignments, and more. If it’s impossible, or even difficult to tell the difference between an essay written by a student and one generated by a machine, how can we be sure we’re grading and rewarding students appropriately? Can you now get a degree in a field like English just because you know how to use ChatGPT somewhat effectively?
Content spam. Content spam is another potential problem. For years, the search engine optimization (SEO) industry has relied heavily on the work of human writers. Writing onsite content, offsite content, and building backlinks establishes the authority of a website and allows it to rank higher in search engines. And even with competent humans doing the writing, the web has been overloaded with aggressive content production. Everywhere you look, there are fluffy articles and promotional pieces providing minimal information but serving a purpose for SEO. The problem is only going to get worse when marketers can generate entire articles in seconds.
Inaccuracies and fake news. ChatGPT is specifically developed with safeguards to prevent it from being influenced by bias or reporting inaccurate information. But how reliable are these safeguards going to be? And could they conceivably apply to all AI content generation tools? In any case, inaccuracies and fake news are a legitimate concern.

AI Content Detection Tools

GPT-2 Output Detector from HuggingFace

GPT-2 Output Detector Demo by HuggingFace is open source, free and requires no registration to gain immediate access.

Because access is unrestricted, the server is often overloaded, and at times the tool doesn’t work at all.

When it is available, it’s very high quality and probably the best of the tools outlined here.

GPT Writer AI Content Detecting Tool

GPT – Writer is a great, dependable AI detecting tool that can be accessed for free.

Although it proved to be slightly less reliable than the GPT-2 Output Detector in my experience, it was still able to successfully identify that my content piece wasn’t crafted by a human being.

Content at Scale

AI Detector – Content at Scale is a dependable tool that has been of great help to our writer teams. Not only does it rate your content, but it also provides a human-likeness score for each piece so that you can aim for higher percentages!

AI Content Detector

Although AI Content Detector is a free tool, it’s limited to only 200 words and I found this unreliable for comparison against the other tools.

Originality.AI

Originality.AI is a premium paid service that charges $0.01 per credit. It checks for plagiarism as well as detecting AI-generated content.

Compared with the free tools above, it is a standout choice for detecting AI-generated content. It offers convenient API access and the ability to save scan results, both of which help keep it dependable even when other options are unavailable.

How to Spot AI Generated Content: The High Level

So how can you spot AI generated content?

What makes it different from human written text?

We’ll start with the high-level approach.

You can attempt to detect AI generated content using tools or a manual approach. With the tool-based approach, you’ll need a specific application that’s been designed and programmed to identify and measure potential signals that a piece of content has been written by a machine. In the manual approach, you’ll use your own due diligence and common sense to do the work.

In both contexts, your success will depend on your ability to detect patterns. Remember, AI content generation tools may seem as creative and thoughtful as human beings, but their approach is extremely mathematical and based on existing patterns. Accordingly, the content they produce, when scrutinized, reveals the patterns they studied.

Noticing a single quirk or hallmark of AI generated content isn’t enough to definitively prove that a piece of content was written by a machine. But if you start noticing multiple hallmarks, and those signals are consistent across the entire piece, you can conclude that the piece was probably written by AI – or at least, that it was written by an incompetent human author.
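One way to picture the "multiple hallmarks" rule is a simple tally. The sketch below is illustrative only: the signal names and the threshold of three are assumptions made for this example, not values taken from any real detection tool.

```python
def assess(signals, min_hits=3):
    """Turn a set of yes/no observations into a cautious verdict.

    `signals` maps a hallmark name to True if you noticed it in the piece.
    `min_hits` is a hypothetical threshold; tune it to your own tolerance.
    """
    hits = [name for name, fired in signals.items() if fired]
    if len(hits) >= min_hits:
        return "probably machine-written"
    if hits:
        return "inconclusive"
    return "no machine hallmarks found"


# Hypothetical observations about one article.
observations = {
    "repetitive vocabulary": True,
    "uniform sentence lengths": True,
    "no slang or idioms": True,
    "no typos": False,
}
print(assess(observations))
```

A single hallmark only ever produces "inconclusive" here, which mirrors the point above: one quirk proves nothing, but several consistent ones start to add up.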

AI Generated Content: Specific Tactics

Now let’s dig into more details.

Sure, you can use a tool or your own good judgment for identifying and detecting AI generated content. But what are you and these machines looking for, specifically?

These are the tactics you can use to discern the difference between content written by a machine and content written by a human:

Look for repetitious vocabulary.

AI writing tools base all of their output on patterns and averages across millions of different entries. They want to follow the most common, average rules they can, so they typically focus only on the most common words in the English language. In any piece of writing, whether it’s generated by an AI or a human, you’ll find specific words repeated over and over, like “the,” “and,” or “but.”

But in AI generated content, the repetition is much more apparent, and it applies to higher level vocabulary words as well. In a review of a restaurant written by a human, you might see words like “delicious,” “tasty,” “delectable,” “delightful,” “scrumptious,” “palatable,” or even “orgasmic.” An AI may only use 1-2 of these. The more colorful and diverse these descriptive words are, the more likely the content will have been written by a human. The more static and repetitive the vocabulary is, the more likely the content will have been written by an AI.
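If you want a rough number for vocabulary repetition, one common measure is the type-token ratio: distinct words divided by total words. The sketch below is a minimal illustration of that idea, not the method any particular detector uses, and the two sample sentences are made up for the example.

```python
import re


def type_token_ratio(text):
    """Return distinct words / total words, or 0.0 for empty text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# Made-up restaurant review snippets of similar length.
varied = "The soup was delicious, the bread delightful, and the dessert scrumptious."
repetitive = "The soup was good, the bread was good, and the dessert was good."

print(round(type_token_ratio(varied), 2))
print(round(type_token_ratio(repetitive), 2))
```

The ratio drops naturally as a text gets longer, so it only makes sense to compare passages of similar length.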

Flag rare and very specific words.

Similarly, you can rule out the possibility that a piece of content was written by AI if you can find an ample selection of rare or very specific words. Most AI generation machines aren’t going to take a risk by using a word they only encountered once or twice in their millions of crawled documents. They’re going to stick to only the most commonly used words in the English language unless it’s absolutely necessary to deviate.

It would be pretentious of me to describe my business as bespoke or myself as erudite, and my kakorrhaphiophobia holds me back from doing so. You might argue these words are perfectly cromulent. But in any case, you’ll never see a paragraph of text like this in an AI-written article.
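Flagging rare words can be sketched as a lookup against a list of common words. In the example below, `COMMON_WORDS` is a tiny hypothetical stand-in; a real check would need a proper word-frequency list, which is assumed rather than supplied here.

```python
import re

# Hypothetical stand-in for a real list of common English words.
COMMON_WORDS = {
    "you", "might", "argue", "these", "words", "are", "perfectly",
    "or", "describe", "my", "business", "as",
}


def rare_words(text, common=COMMON_WORDS):
    """Return the sorted, de-duplicated words that are not in `common`."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted({w for w in words if w not in common})


sample = "You might argue these words are perfectly cromulent, or describe my business as bespoke."
print(rare_words(sample))  # ['bespoke', 'cromulent']
```

The more words a passage leaves in this list, the stronger the hint that a human wrote it, at least by the reasoning above.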

Pay attention to phrasing.

Edward Tian, a 22-year-old senior at Princeton University, built an app that detects whether a passage of text was written by AI. One of its primary evaluative criteria is “burstiness.”

Simply put, burstiness is variation in sentence structure, and it is a signal used to detect AI content.

When human beings write something, they tend to use a very diverse mix of sentence lengths and patterns. There are short sentences. There are long sentences. There are sentences in between the two. As a demonstration of this, you can look at this very paragraph; the shortest sentence has only 4 words, while the longest has 24. You’re probably not going to find this diversity in content written by an AI.

Instead, the sentences tend to be similar and repetitive, following a blocky and (appropriately) robotic pattern.
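A simple proxy for burstiness is how much sentence lengths vary. The sketch below uses the coefficient of variation (standard deviation divided by mean) of words per sentence. That formula is an assumption chosen for illustration; it is not the formula used by the tool mentioned above, and the sample texts are invented.

```python
import re
import statistics


def sentence_lengths(text):
    """Split on ., ! and ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Coefficient of variation of sentence lengths (0.0 if under 2 sentences)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


mixed = ("There are short sentences. There are long sentences that wander "
         "on for quite a while before they finally stop. Done.")
uniform = "The food was very good. The staff was very kind. The room was very clean."

print(sentence_lengths(mixed))    # [4, 15, 1]
print(sentence_lengths(uniform))  # [5, 5, 5]
print(burstiness(mixed) > burstiness(uniform))  # True
```

The naive split treats every period as a sentence end, so abbreviations and decimals will throw it off; a real implementation would want a proper sentence tokenizer.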

Evaluate fluidity of language.

Fluidity of language is a concept that’s difficult to describe because it’s somewhat subjective. But most of us can tell the difference between a native speaker of English and someone who’s learning it for the first time, even if they speak cleanly and without any discernible accent. Why? We’ve spent our entire lives speaking, listening, reading, and writing in this language, so we’re intimately familiar with it. We understand the power of language and how it’s best used, so we’re capable of tapping into its power casually. AI tools can identify patterns in language and repeat those patterns, but because they don’t understand the meaning behind those patterns, they’re currently not able to accurately replicate fluidity.

How can you evaluate this and tell the difference between a piece written by an AI and one written by a human? Try to imagine the piece of content being read aloud by a human being. Does the person reading it seem comfortable, warm, and relatable? Or does something seem “off” about the way they’re speaking? Obviously, written text is flatter than conversational text, and some of us are naturally a bit robotic. But in combination with some of these other telltale signs, a lack of fluidity can be an indication of machine origin.

Consider the complexity.

Does the piece of content make you think? Does it challenge any of your existing notions? Does it introduce any concepts that are hard to understand? If any of these are true, the piece was probably written by a human.

Currently, content generation machines are excellent at repeating facts and reassembling pieces of text found throughout the web. But they’re terrible at coming up with novel ideas. AI content generators have practically no ability to challenge the status quo, deviate from mainstream opinion, question major assumptions, or think creatively. Only humans can do this.

It’s easiest to observe this difference when you’re reading a piece on a topic you truly understand, or a subject in which you’re considered an expert. You can almost immediately tell the difference between a true master of the subject material and someone regurgitating basic facts from textbooks. The more complex a piece is, the more likely it came from a human.

Scout for slang, idioms, and metaphors.

For now, it’s an exclusively human quality to be able to use language very casually and illustratively. Our current AI content generation tools either aren’t sophisticated enough or aren’t willing to take the risk to use slang, idioms, or metaphors.

I’m not going to embarrass myself by using Generation Z slang as an example. But think back to my example about how a human or machine reviewer would approach describing food at a restaurant; this is an illustrative example, and a simple one, but it’s probably still too complex to appear in the body of a piece of content written by an AI.

Count the typos.

Ironically, when people see typos and mistakes, they’re more likely to think the content was AI-generated. That’s because we have this strange bias of assuming that humans are better than machines in every way. But in fact, the opposite is true.

AI algorithms are functionally perfect at replicating text, so if you find a spelling error, or a gross misuse of a vocabulary word, you can almost guarantee that it was written by a human.

In some ways, this is the most reliable signal that can tell you whether a piece of content was written by a human. Just as calculators never make numerical errors, AI content generators never make painfully simple typos.

I’d imagine that because of this, the next generation of AI content generation engines is going to include features that allow you to control imperfections; with the click of a button, you can guarantee that every article produced by your AI content generator includes at least one spelling mistake to artificially increase its authenticity.

We live in strange and ironic times.
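The typo check from this section can be roughed out with Python's standard `difflib` module: a word that is missing from a vocabulary but closely resembles one of its entries is probably a misspelling rather than a rare word. The `VOCAB` list and the 0.8 similarity cutoff below are assumptions for illustration only; a real check would use a full dictionary.

```python
import difflib
import re

# Hypothetical mini-dictionary; a real check would load a full word list.
VOCAB = ["the", "restaurant", "was", "definitely", "worth", "visiting", "again"]


def likely_typos(text, vocab=VOCAB, cutoff=0.8):
    """Map each unknown word to the vocabulary word it most resembles."""
    found = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in vocab:
            continue
        match = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
        if match:
            found[word] = match[0]
    return found


print(likely_typos("The restaraunt was definately worth visiting again"))
# {'restaraunt': 'restaurant', 'definately': 'definitely'}
```

By the argument above, a non-empty result is a point in favor of human authorship.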

If you’re looking for a bottom-line summary, it’s this: AI generated content is robotically repetitive, inartistic, and incapable of making simple errors. Slang, diverse vocabulary words, good metaphors, diverse sentence structures, complex ideas, and typos are all sure signs you’re reading something written by a real person.

How Important Is This?

In the film Blade Runner (and tons of similarly themed works like Westworld), one of the central themes is discerning what counts as personhood. If a replicant (an artificial person in the Blade Runner world) looks like a human, talks like a human, thinks like a human, and even feels like a human – is it really so important to label it as nonhuman?

I agree with the notion that if an AI can produce content that’s functionally identical to content produced by humans, it should be treated the same. It’s just as valuable and it’s just as illustrative. So realistically, if you struggle to immediately tell the difference between these two types of generated content, there’s no reason for you to jump through hoops or play Sherlock Holmes to solve the mystery of who wrote each piece of content you read from here on out.

But at the same time, I think it’s important to publicize and internalize an article like this, and for two main reasons:

It’s easy to tell the difference if you know what to look for. In Blade Runner, it’s exceedingly difficult to tell the difference between a person and a replicant. But this difficulty isn’t matched by modern comparisons of human-generated and AI-generated text. In fact, as a professional communicator with many years of experience, it’s trivially easy for me to point out AI-originated material. That’s not a brag; it’s an illustration of just how rudimentary these seemingly sophisticated tools currently are.

Think of it this way: if you’re relatively new to playing chess, you probably wouldn’t be able to tell the difference between a rudimentary AI hacked together by an experimenting teenager and Deep Blue, the landmark IBM supercomputer that beat grandmaster Garry Kasparov. But Garry Kasparov would have no trouble trouncing the rudimentary AI.

This is important because good chess players should strive to tell the difference between a lazy AI and Deep Blue. And good readers should strive to tell the difference between ChatGPT and an AI that surpasses the abilities of our best human writers (though, to be fair, ChatGPT is much closer to Deep Blue than the lazy AI in our example).

AI content generation tools have a place. I’ve spent a fair amount of time in this article disparaging the utility and performance of AI generated content, but the reality is, these AI tools do have a place. They could be incredibly helpful for teaching people, providing help, and enabling the development of new technologies nobody’s yet dreamed of. In the future, they may be able to match or exceed the artistry and illustrative prowess of Tolstoy or Shakespeare.

But we’re only going to push them to that next level if we’re critical and attentive to the tools we currently have. Pointing out the shortcomings of AI content generation is going to motivate developers of these tools to make up for those shortcomings in the future.

We can already see evidence of this. ChatGPT is described as being capable of “challeng[ing] incorrect premises, and reject[ing] inappropriate requests.” And I can’t help but wonder if these elements were introduced because of the disastrous failure of Tay, a Twitter-based AI chatbot Microsoft rolled out, or similar debuting technologies. Tay, for the record, was trained by trolls to become absurdly racist and offensive – in less than 24 hours, no less.

It’s our job as supporters of innovation to point out the flaws and weaknesses of current technologies so we can strive to develop something even better. Something that could truly change the world.

Did you notice what I did in that previous section?

An AI isn’t going to generate Blade Runner references in its writing to illustrate a point.

It’s also not going to make sardonic comments like that. Or use the word sardonic.

I’m all human, baby.

And while there are certainly some fascinating applications for AI writing both now and in the future, if you want to make the biggest impact with your content marketing and SEO strategy, you need human writers to do the heavy lifting.

Human writers can be experts, thought leaders, and persuasive, artful communicators.

And for now, an AI writer can’t match that.

If you need help improving your SEO, creating content that truly engages readers, or executing other digital marketing strategies with human experts, you’ve come to the write place (pun intended: yet another AI-impossible task). Contact us for a free consultation today!

Author

Timothy Carter

Chief Revenue Officer at SEO Company

Industry veteran Timothy Carter is SEO.co’s Chief Revenue Officer. Tim leads all revenue for the company and oversees all customer-facing teams for SEO (search engine optimization) services - including sales, marketing & customer success. He has spent more than 20 years in the world of SEO & Digital Marketing, assisting in everything from SEO for lawyers to complex technical SEO for Fortune 500 clients like Wiley, Box.com, Qualtrics and HP.

Tim holds expertise in building and scaling sales operations, helping companies increase revenue efficiency and drive growth from websites and sales teams.

When he's not working, Tim enjoys playing a few rounds of disc golf, running, and spending time with his wife and family on the beach...preferably in Hawaii.

Over the years he's written for publications like Forbes, Entrepreneur, Marketing Land, Search Engine Journal, ReadWrite and other highly respected online publications. Connect with Tim on Linkedin & Twitter.
