More privacy, better models: Discussing the future of zkML with Alex Pruden
May 11, 2023

Some of us remember a time before internet encryption became standardized. Back then, online commerce was all but impossible because no one wanted to share credit card information that could be snooped on and stolen. It was only the advent of the Secure Sockets Layer (SSL), an encryption protocol developed and promoted by Netscape, that protected our personal information and made online shopping widely accepted.

Machine learning is, right now, a similar black box. Your data goes in, but what happens after that? Zero knowledge can enhance AI/ML in the same way that SSL made the web more usable: it lets us feel comfortable putting private, highly personal information into these models and receiving personalized outputs, without worrying about revealing that data to a third party.

We launched the zkML Initiative to encourage people to build the foundation of the new future of machine learning, one that’s more secure, private, and personal. As we prepare to welcome applicants into our Season One class, we spoke with Alex Pruden, CEO of Aleo, about his hope for zkML and the future of machine learning as a whole.

Why is zkML important for the future of machine learning? 

Machine learning is an amazing, revolutionary technology, and zero knowledge makes it even better from a human perspective. Through approaches such as federated learning, we can train models on aggregated personal data without revealing any individual’s own data, while still proving that the data is valid. The model ends up being more accurate because people would be willing to share more data in that situation. This means you can train models on better-quality datasets, which then yield more personalized and useful outputs.
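To make the federated learning idea concrete, here is a minimal sketch of federated averaging in plain Python. It is an illustration under stated assumptions, not Aleo's implementation: the linear model, the learning rate, and the function names (`local_update`, `federated_average`, `train_federated`) are all hypothetical, and the zero-knowledge proof that each update was computed honestly is deliberately left out. The point is only the data flow: each participant trains on data that never leaves their device, and the server sees model weights rather than raw records.

```python
from typing import List, Tuple

# Each client holds (features, target) pairs that never leave the client.
Dataset = List[Tuple[List[float], float]]


def local_update(weights: List[float], data: Dataset,
                 lr: float = 0.1, epochs: int = 50) -> List[float]:
    """Run gradient descent on one client's private data (linear model, MSE loss)."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def federated_average(updates: List[Tuple[List[float], int]]) -> List[float]:
    """Combine client weights, weighting each client by its number of samples."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[j] * n for w, n in updates) / total for j in range(dim)]


def train_federated(clients: List[Dataset], dim: int, rounds: int = 20) -> List[float]:
    """The server only ever sees weights and sample counts, not raw data."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [(local_update(global_w, data), len(data)) for data in clients]
        global_w = federated_average(updates)
    return global_w


# Two clients whose private data both follow y = 1 + 2*x (first feature is a bias term).
client_a = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)]
client_b = [([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0), ([1.0, -1.0], -1.0)]
model = train_federated([client_a, client_b], dim=2)
```

In a zkML setting, each client would additionally attach a proof that its update was computed correctly from valid data, which is the part that plain federated learning cannot provide on its own.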

In your opinion, what’s the most exciting use case for zkML?

Healthcare data is a fascinating, exciting use case because there is so much regulation and personal sensitivity (understandably) around sharing our personal healthcare data. Prior attempts at solving this problem have been met with limited success. But by using zero-knowledge cryptography, we can imagine a system where a user can prove certain facts about themselves without revealing the underlying data. Users could also run a model on their own data, and provide the output to a federated learning model that aggregates the results of everyone's individual outputs.

If you could build anything for the zkML initiative, what would you build and why would it matter?

I think building a fitness app using zkML would be really cool. You could train a model using a bunch of people’s training regimes, and then prove using zero knowledge that you’ve achieved a certain benchmark without revealing how you did it.

But for the first season of the zkML initiative, the most important thing is the building blocks. I would love to see connections between Leo and the most popular Python machine learning libraries, such as scikit-learn or TensorFlow. This would put zero knowledge in data scientists’ toolkits without requiring them to change their regular workflow.
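One building block such a bridge would likely need is a way to turn floating-point model parameters into integers, since zero-knowledge circuits operate on integer-like values rather than floats. The sketch below shows fixed-point inference for a small linear model in plain Python. It is an assumption about what a bridge might do, not a description of any existing tool: `SCALE`, `to_fixed`, and `fixed_linear_predict` are hypothetical names, and a real transpiler would emit circuit code rather than run Python.

```python
from typing import List

# Hypothetical scaling factor: 16 fractional bits of precision.
SCALE = 2 ** 16


def to_fixed(x: float) -> int:
    """Encode a float as a scaled integer."""
    return round(x * SCALE)


def fixed_linear_predict(weights_fx: List[int], bias_fx: int,
                         features_fx: List[int]) -> int:
    """Integer-only dot product; the result is scaled by SCALE * SCALE."""
    acc = bias_fx * SCALE  # lift the bias to the same scale as the products
    for w, x in zip(weights_fx, features_fx):
        acc += w * x
    return acc


def from_fixed_product(value: int) -> float:
    """Decode a value that carries two factors of SCALE."""
    return value / (SCALE * SCALE)


# A model trained in floating point (for example, exported from a Python library).
weights = [0.5, -1.25]
bias = 2.0
features = [4.0, 2.0]

prediction_fx = fixed_linear_predict(
    [to_fixed(w) for w in weights], to_fixed(bias), [to_fixed(x) for x in features]
)
prediction = from_fixed_product(prediction_fx)
```

Only integer additions and multiplications appear inside `fixed_linear_predict`, which is the kind of arithmetic that can be expressed and proven in a circuit. The trade-off is precision: values that are not exact multiples of `1 / SCALE` get rounded when encoded.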

Why does this initiative need to happen? What happens if it doesn’t?

There’s a huge arms race with AI right now, and I don’t think AI companies necessarily have people’s best interests in mind. I think it's important that consumers of large language models understand the costs and actively advocate for a system that protects our data.

If it doesn't happen, I think we risk no longer owning ourselves in a digital sense. Instead, you’re owned by a company. They don’t own you as a human, but for all intents and purposes, the online version of “you” would be owned — and potentially controlled — by someone else.

What’s your hope for this technology in the future?

My hope is that it gets more usable, it gets more performant, and people invest time and energy in making it better. I think this is early days, and just like the early days of encryption on the web, it might not be as user-friendly as we’d hope. But as people invested more time and energy in coming up with better techniques and better hardware, things got easier and easier.

Interested in creating the foundations of zkML? Aleo's open-source zkML transpiler bridges Python — one of the most popular programming languages for machine learning developers — and zero-knowledge cryptography.