TVNewsCheck’s Michael Depp talks with Joe Murphy of Deep Brain AI, a technology company that’s creating digital copies of TV news anchors for outlets in China and South Korea, about how the tech works, the ethical issues around it and the likelihood that we’ll be seeing digital anchors on U.S. screens.
Imagine if a network or TV station could create an AI-based digital copy of its main anchor, allowing them to do a little pinch-hitting for parts of the job.
It’s already happening in South Korea and China, where South Korean company Deep Brain AI is working with four different networks on digital copy anchors reading news briefs.
In this Talking TV conversation, Joe Murphy, business development manager for Deep Brain AI, talks about the implementation there, how the underlying technology works and how its clients address potential ethical concerns around using digital copies. He also discusses whether U.S. broadcasters might get into the game.
Michael Depp: Deep Brain AI is a company that uses artificial intelligence to create digital twins of real people or completely new digital people. They’ve been working with broadcasters in both South Korea and China to create virtual versions of anchors there for automated news updates throughout the day.
I’m Michael Depp, editor of TVNewsCheck, and this is Talking TV, the podcast that brings you smart conversations about the business of broadcasting. Coming up, a conversation with Joe Murphy, business development manager of Deep Brain AI. The advent of this technology and its application for news opens up a raft of technical and ethical questions, and I’ll be asking some of them in just a moment.
Welcome, Joe Murphy, to Talking TV.
Joe Murphy: Hi, Michael. Thank you for having me. I’m excited to be here.
Glad for you to be here. Joe, first, obvious question: Why would any legitimate news organization ever do this, creating a virtual copy of one of their anchors in order to go on air?
Yeah, to me, that has an obvious answer, because as I've been talking to news agencies in North America, they're all faced with the same challenge. Every year they're asked to do more with less, or they're not getting budget increases, but they need more content. And creating a digital twin or a virtual human of their franchise face or lead anchor seems like a slam dunk, because you can now create more content, lower costs, create custom content faster and then get it out in channels that maybe weren't accessible before. So, it really is all about more, faster, better.
I mentioned [this technology in] South Korea and China. Where specifically are these things being used so far?
Deep Brain itself is headquartered in Seoul, Korea, and I'm actually part of a team of business development managers that are bringing this technology to North America. We have a head start in Asia with this technology. We have four networks, two in Korea and two in China, that have worked with us to create a digital twin of their lead anchor. In Korea, it is MBN and Arirang. And then in China it's BTV and CCTV. All four of these networks are broadcasting AI anchors created with technology from Deep Brain AI.
And they’re each using a single anchor at each network?
Yeah, at this time, they've each chosen to take their franchise face or lead anchor and create a digital twin of that person. And we do see interest from other anchors within their organizations, but right now it's sticking pretty much with the franchise face.
Are these pilot projects, or how long have they been in play?
For the greater part of last year, 2021, those anchors have been on Korean TV, and then in late 2021 they started in China.
Now, as I understand it, you’re not trying to dupe viewers here. These virtual anchors are being labeled as such?
Yes. We're not trying to dupe people and we're not trying to replace people. Those are the two questions I get the most. I'll say that when the AI anchors are used, the news station puts up a symbol that says AI anchor, so people know: it looks like the lead anchor, it sounds like the lead anchor, but what's being presented right now is actually the AI version of that anchor presenting the news.
And how is that presented? Is it something on the bottom of the screen in the chyron?
Yes. Typically, it's something on the bottom of the screen in the chyron. I did provide some footage for you folks, and you'll see the actual English letters AI followed by some Korean characters indicating this is the AI anchor, and that's fairly prominent on the screen during the presentation.
Having that kind of labeling, is that an ethical necessity as far as your company is concerned?
It's a recommendation from our company, but at the end of the day, it's a decision by the network and how they want to interact with their audience. I imagine it is a negotiation between the network and the talent, but it's really outside of our scope. That kind of thing happens behind closed doors. We're very happy to see the ethical and responsible way these are being used. But again, it's not really our place to tell people how to do it.
Well, these are two very different markets already. South Korea is a democratic society. In China, it's state-controlled media; it's very much controlled by the party. So were there different kinds of conversations? You say those conversations were entirely internal to those organizations, or were they back and forth with you at all?
The conversations between the network and the talent were pretty much behind closed doors. We're not privy to that information. I can say that from an implementation point of view, there were differences in the infrastructure that was used, in the balance of how much is cloud and how much is on premise. And not to get too technical here…
It’s OK, you can get technical.
So, I would say in the China market, they wanted a lot more done on premise. You can imagine everything is controlled at a central location, whereas in the Korean implementations more was done in the cloud.
Just to come back to the ethics here of deploying … is there an ethical rulebook in place, and if so, who's writing it? Are you, as a company? Are you in discussion with other [media] branches? You're thinking about moving this into the States. Who's laying out the ethical guidelines?
I would say it's a fast-growing industry, with virtual humans and media on a collision course, and we're kind of learning as we go. There are ethical considerations, there are security considerations. But really, at the end of the day, we view this as another tool for creating content. It's a new tool and there are new questions about it, but it really is just a tool for creating new video content. Just as a video editor is a tool you might use, or an audio editor, AI is another tool. And I think you see AI being applied in multiple spots throughout the video production process.
Absolutely it is. I mean, AI and machine learning are huge parts of workflows now. But this is a whole different category; it lives in its own space. You're talking about replicating a person and presenting, or fabricating, the person. It's a whole different construct than the other applications, which are, I think, much more rote and not really controversial, other than the concerns people have about potential job elimination.
I want to get into the fact that this gets into some dangerous crossover territory with deepfake videos, which we’ve all seen, and which are widely employed in disinformation campaigns across the internet and social media. So, if this kind of technology widens in its legitimate use among news organizations, what kind of an opening do you think that creates for the further proliferation of deepfakes?
That’s a great question. I think I want to take a step back and say what we’re doing is very different than what deepfake technology is. The work we’re doing is complete video synthesis. So, we take a real person, we do a video shoot, and that video shoot is our training data to create an AI model of that person. They’ve opted in the entire way. Then when that model is created, it is tied to security within the cloud. And typically, portrait rights or face rights are extended to that model.
So, the station is legally OK’d to use that model for the intended purposes that are all contracted out. So, pure video synthesis, legal checks every step of the way. Making sure everybody is opted in and on board is what we’re working on at Deep Brain.
A deepfake starts with real video, so you need to shoot a live person and then paste another live person's face on top of the video you shot. So, already at the first step, we're different. In video synthesis technology, there is no shoot needed. We do a one-day shoot for the video training session, but after that, all the video that's generated is completely AI generated. There's no need to shoot.
Is that video that’s generated watermarked in some sort of way that you can authenticate it?
Yes, we can show through metadata that it came from our solution. And there are also checks and balances we can put in, even something as simple as text filters that, if a network wants them, can limit what that AI model can and cannot say.
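To make that concrete, here is a minimal sketch of what a pre-synthesis text filter and provenance metadata could look like. This is not Deep Brain AI's actual API; the function names, blocklist and metadata fields are hypothetical illustrations of the kinds of checks Murphy describes.

```python
# Hypothetical sketch: a pre-synthesis text filter plus provenance metadata
# of the sort described in the interview. Names and fields are illustrative.

import hashlib
import json
from datetime import datetime, timezone

BLOCKED_PHRASES = {"unverified claim", "breaking: unconfirmed"}  # example network blocklist

def passes_text_filter(script: str) -> bool:
    """Reject a script if it contains any phrase the network has blocked."""
    lowered = script.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)

def provenance_metadata(script: str, model_id: str) -> dict:
    """Metadata that could be attached to a rendered clip to show it came
    from the synthesis pipeline (the 'came from our solution' check)."""
    return {
        "model_id": model_id,
        "script_sha256": hashlib.sha256(script.encode()).hexdigest(),
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "source": "ai-video-synthesis",
    }

if __name__ == "__main__":
    script = "Here is the latest update on tonight's top story."
    if passes_text_filter(script):
        print(json.dumps(provenance_metadata(script, "anchor-twin-001"), indent=2))
    else:
        print("Script rejected by network text filter.")
```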
Let's get a little more into the weeds of how this comes together, technically. So, you mentioned the person, the talent, comes and sits in a studio and you have them record. They read out any number of sentences while they're being filmed and audio recorded?
Typically, we will prepare a script and that script will contain between 500 and 1,000 sentences or utterances. What we’re really trying to do with those sentences and utterances is learn how they move their mouths with all the different sounds and all the different words and the transitions from one word to the next, the pauses in between. So, using that script that we prepare, that’s the training data for our deep learning models.
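As an illustration of that training setup, here is a minimal sketch of how such a script might be organized as training data, pairing each utterance with its recorded take. The file names and manifest format are hypothetical, not Deep Brain AI's actual pipeline.

```python
# Hypothetical sketch: organizing the 500-1,000 scripted utterances as a
# training manifest, one recorded video take per sentence.

import csv

def build_training_manifest(sentences: list[str], out_path: str) -> None:
    """Write a CSV pairing each scripted utterance with its recorded take."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["utterance_id", "text", "video_take"])
        for i, text in enumerate(sentences, start=1):
            writer.writerow([f"utt_{i:04d}", text, f"takes/utt_{i:04d}.mp4"])

if __name__ == "__main__":
    script = [
        "Good evening, and thank you for joining us.",
        "We begin tonight with breaking news from the capital.",
        # ...in practice, 500 to 1,000 such sentences covering varied sounds,
        # word transitions and pauses, per the interview.
    ]
    build_training_manifest(script, "training_manifest.csv")
```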
And how do you do that? How are the cameras set up on the person’s face?
Typically, the talent is in front of a green screen. We shoot from about one to two meters away, a head-on shot, and we try to get head-to-foot, top-to-bottom, full coverage. We also have the option of shooting at angles. We have clients who want to switch angles during the presentation of the media, so we can do that as well. But in the most basic case, it's a straight head-on, full-body shot in front of a green screen.
So, they’re wearing one set of clothes, presumably during this shoot? Can you change their clothes like paper dolls in different iterations when they go on the air?
Yeah. So, typically when we do the shoot for the training day we will go through multiple outfits and multiple hairstyles. On our roadmap is the ability to change hairstyle and outfit without actually having to reshoot.
As you deploy this, does this twin use machine learning to kind of improve on its verisimilitude? Or is the thing that you get out of that session what you have going forward?
It's the latter. The model we create out of that session is just an engine. It's not continuously learning. It's an engine that takes text in and exports video out. And the video it exports is where the deep learning was applied: how does this person speak? How do they move their mouth? How do they blink? When do they breathe? All of that is learned behavior from the shoot that goes into the model.
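As a rough illustration of that text-in, video-out shape, here is a minimal sketch in Python. It is not Deep Brain AI's real SDK; the class and method names are hypothetical stand-ins, and the renderer just writes a placeholder file so the example runs.

```python
# Hypothetical sketch of the "text in, video out" workflow: a fixed, trained
# model acts as an engine that renders from a script, with no further
# learning at inference time.

from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the model does not change after training
class AnchorTwinModel:
    model_id: str
    trained_on: str  # e.g. the one-day studio shoot used as training data

    def synthesize(self, script: str, output_path: str) -> str:
        """Placeholder renderer: takes text in and writes a stand-in file
        where a real system would export video."""
        with open(output_path, "w") as f:
            f.write(f"[rendered by {self.model_id}]\n{script}\n")
        return output_path

if __name__ == "__main__":
    twin = AnchorTwinModel(model_id="anchor-twin-001", trained_on="2021-studio-shoot")
    clip = twin.synthesize("Good evening, here is a quick update on tonight's stories.",
                           "update_clip.txt")
    print(f"Wrote {clip}")
```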
From that session of recording to when it can spit out a digital twin, how long is that process?
That’s a great question. It’s actually about three weeks of machine time.
The other side of this is your company also creates these digital people whole cloth, and you’ve actually made one of them for us at TVNewsCheck. Why don’t we take a quick look at that right now?
Great.
That is something else. What goes into making this wholly constructed person?
We start with a lot of the same deep learning processes. It's just that, for the video going in, we use a different AI algorithm to construct a face for that person. So, we will start with a frame of a real person, but then take a face that's completely synthetic and mesh those two together during the deep learning process.
You had a woman standing in the studio? You were shooting her and superimposing a different face onto her body?
It’s not really just one person. It’s an estimation of a lot of different people.
How many different kinds of avatars, if we can call them that, do you have? Is it an infinite number of different types of people, different genders, ages, races, etc.?
Yes. This is a little bit off topic, but we just did an NFT drop of 5,000 virtual humans in China, and it was a very successful launch for us. And now we have another 5,000 ready to go, and it really is nearly infinite in the amount of variations and virtual humans that we can create.
Hold on. What do you get when you buy an NFT of a virtual person? What is the product?
It was tied to what was the equivalent of Valentine’s Day in China. And you get the portrait of the person, and they were calling it their virtual boyfriend or virtual girlfriend. Now each one of those models can be linked to our software platform, which is called AI Studios. And if you choose to link it to AI Studios, you can sign up and you can create videos with that virtual person that you’ve just purchased through this NFT drop.
I don’t even … I’m processing this. It’s like Blade Runner to some extent.
Yeah, this was more of a fun experiment and the NFT market is very exciting for us, but it’s probably a little outside the scope of what we’re doing with news and media.
So back to that. Do you have any U.S. broadcasters who are kicking the tires here?
Yes. So, all the big names in the U.S. are kicking the tires right now. I think the U.S. in general is a little more cautious, and they're kind of seeing how this plays out. But it's getting rapid adoption throughout Asia, and in our mind, it's coming very soon to the U.S. I can't share too many details on that, but it will be here soon.
Well, for those with whom you're talking about this, what are they considering for a potential implementation? The same thing as we see in Korea?
Yeah, I would say the primary use case is these short little segments throughout the day, where the talent is busy working on a story or out in the field but needs to get some updates to the audience. So every hour or so, the producers in the studio can create these clips and present these updates: here's what we're working on for tonight's show, or here's the latest breaking news that we'll talk more about this evening. Those little cutovers and segments, supplementing the content feed for the franchise face, are where we're seeing this used.
And so, in terms of where this is going to iterate next year: we looked at this example that you created for us, and there's sort of a bizarre, nonhuman kind of reset that the woman does between her sentences. There's a bit of an unnaturalness to it. How are you smoothing out the edges there?
Very observant of you. That is a demo model that we use. The actual models we create for broadcast media have all those little things you see smoothed out; there's a tuning process we go through to get a model ready for broadcast. So there is a bit of a performance difference, but once we go through that additional layer of tuning, that's when you get to a side by side where it's very difficult to determine which is the AI and which is the real person.
Well, only one's mother can tell, or perhaps not even that. This is certainly something else. I'm very interested in feedback from the audience. If you have thoughts about the ethics, the technical side of implementing technology like this, or what the implications could be for local and national U.S. media, I'd love to hear it. So, please do give us feedback.
That’s all the time we have, so we’ve got to leave it there. Thanks to Joe Murphy of Deep Brain AI for being here today. Thank you, Joe.
All right. Thank you, Michael.