Why is it so hard to get AI right?


The internet in 2024: a three-act play

Act 1: Google strikes a deal to use Reddit posts as training data
Act 2: Google swaps in AI for its usual search results
Act 3: Google tells you to eat rocks and put glue in your pizza sauce

Why is it so hard to get AI right?

Hey Siri, what's a HIPAA violation?

I had this conversation with the HR leader of a major hospital system last year, and I haven't stopped thinking about it since.

It was a few months after ChatGPT was broadly released, and she shared how enthusiastically everyone in her organization had embraced the tool. The hospital had convened a task force to look at AI, and they were busy thinking up powerful use cases to accelerate patient care.

Except that the doctors had already put an unsanctioned use case into motion, and it was enough to give the CISO a coronary: they were dropping confidential patient notes into ChatGPT to speed up their case summaries. The hospital's AI firewall went up faster than you can say HIPAA violation.

Why is it so hard to get AI right?

Sparkle questions

You know those little AI sparkle questions that LinkedIn adds below most posts now? Sometimes they’re relevant, mostly they’re kind of weird, and occasionally they’re real clunkers?

I saw a post someone made about surviving cancer, and the sparkle question underneath it was, “How can cancer add value to our lives?” This is all kinds of yikes, so I did the obvious thing: I wrote a post about it. Unfortunately, LinkedIn then appended another dicey sparkle question to my post about the original post.

Uh oh. Why is it so hard to get AI right?

Finding terrible AI examples is easy!

There are so many transformative, wonderful AI use cases and apps. But it's just not that hard to find these terrible examples, and so many of them feel like unforced errors: popular and trusted apps silently updating their privacy policies so they can use customer input as training data; people dumping sensitive information into free public websites; developers racing to ship AI features without safeguards for accuracy, privacy, or bias.

It's fair to say that I'm not a typical user; I spent a decade building proprietary technology to handle AI sensitivities at Textio, and it is very, very hard. After I saw the problematic LinkedIn questions, I started searching on different topics to see if I could figure out what would trigger those little questions.

I discovered quickly that if I searched for terms like guns or Israel, there were no sparkle questions. In other words, I could clearly see that LinkedIn has implemented some "catch" topics where the system intercepts the query and doesn't take the risk of showing inappropriate questions. But, at least until my post, cancer wasn't one of them.
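
To make that concrete, here's a minimal sketch of what this kind of keyword interception might look like. The topic list, function name, and stand-in model call are all my own illustration, not LinkedIn's actual code:

# A naive blocklist filter, sketched for illustration only.
# Posts that touch a blocked topic get no AI question at all;
# everything else goes straight to the generative model.
BLOCKED_TOPICS = {"guns", "israel"}  # hypothetical list; note "cancer" is absent

def sparkle_question(post_text):
    words = set(post_text.lower().split())
    if words & BLOCKED_TOPICS:
        return None  # intercepted: too risky, show nothing
    return "AI-generated question about: " + post_text  # stand-in for the real model call

sparkle_question("thoughts on guns in America")  # returns None
sparkle_question("I survived cancer this year")  # generates a question anyway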

In just 30 minutes of rooting around, I found dozens of other topics that occasionally triggered highly insensitive sparkle questions: anorexia, abuse, adoption, layoffs, and immigration, to name just a few. Finding terrible examples was easy.

So why is it so hard to get AI right?

LinkedIn is not alone in this approach; this kind of query filtering is what everyone does. Remember in 2023 when conservatives blasted OpenAI for the way ChatGPT talked differently about Joe Biden and Donald Trump? OpenAI "fixed" that the fastest way they could think of: by intercepting political queries before processing them and tossing up the equivalent of I can't do that, Dave.

Unfortunately, this never quite works, because you just can't write rules to catch all the ways that future queries might be problematic. I've written extensively about the racist ways in which ChatGPT describes alumni of Harvard University and Howard University differently, and about the stereotypical "roses are red" poems that the system writes for people of different backgrounds. It's easy to trip over the problematic biases in most AI just by wording queries a little differently than app developers planned for.

The problem can't be solved by manually intercepting people's queries. The issue lies in the underlying generative engine and data set, so app developers who try to intercept queries end up playing whack-a-mole with mortifying examples forever. After all, nearly any topic can be sensitive or not depending on context.
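
To see why the patching never converges, extend the toy blocklist from the sketch above. Adding a keyword just trades one failure mode for another, because keywords can't see context (again, my own example, nobody's production code):

BLOCKED_TOPICS = {"guns", "israel", "cancer"}  # "cancer" added after the incident

def is_blocked(post_text):
    return bool(set(post_text.lower().split()) & BLOCKED_TOPICS)

is_blocked("what my Cancer horoscope says about work")  # True: a harmless astrology post gets blocked
is_blocked("losing my dad to leukemia last spring")     # False: a genuinely sensitive post slips through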

The LinkedIn situation is better than average: a concerned exec saw my post and shared it with the product team, who quickly and publicly took responsibility and removed the problematic sparkle question you can see in my original screenshot.

Like most teams implementing AI, the LinkedIn team would like to get it right. And like most teams implementing AI, the approach they're using to fix the offending features means that they'll never run out of problematic examples.

The LinkedIn case and the HIPAA violation case and the Google case are fundamentally similar. The problem isn't exactly with the technology. As with most technologies, the issue is with human judgment around its implementation and usage.

What do you think?

Thanks for reading!

Kieran
