Data replication model, and fast takedown of nasty servers #50

joelverhagen · 2025-05-20T00:34:16Z

Pre-submission Checklist

I have checked that this question would not be more appropriate as an issue in a specific repository
I have searched existing discussions and documentation for answers

Question Category

Your Question

Hey folks! I am on the NuGet.org and VS Marketplace team. I'm really excited to see this project and read about your new registry. I have a bunch of thoughts swirling in my mind, but I'll start with just one for now.

It looks like your registry is going to be mainly a "data upstream", and end users will interact directly with consumers of your registry, not with the registry itself (except perhaps server authors who publish to you). Apologies if I misunderstand that.

In the event of an unsafe/malicious/illegal MCP entry in your registry, I presume you will delete it on the backend, and subsequent calls to GET /v0/servers will then omit the deleted server, signaling to the "middle layer" that a delete occurred.

It seems like the "middle layer" needs to poll pretty frequently to reduce the "time-to-mitigate" to the smallest time possible.
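To make the polling concern concrete, here is a minimal sketch of how a downstream might infer deletions by diffing two full snapshots of the server list. The `id` field and both function names are my assumptions for illustration, not anything taken from the registry's actual schema.

```python
from typing import Iterable


def removed_server_ids(previous: Iterable[dict], current: Iterable[dict]) -> set[str]:
    """Return ids present in the previous snapshot but missing from the current one."""
    # "id" is a hypothetical field name; substitute whatever uniquely keys an entry.
    previous_ids = {entry["id"] for entry in previous}
    current_ids = {entry["id"] for entry in current}
    return previous_ids - current_ids


def worst_case_exposure_seconds(poll_interval: float, propagation_delay: float = 0.0) -> float:
    """Upper bound on how long a deleted entry can stay visible downstream.

    A delete that lands just after a poll is only noticed at the next poll,
    plus however long the downstream takes to act on it.
    """
    return poll_interval + propagation_delay


before = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
after = [{"id": "a"}, {"id": "c"}]
print(removed_server_ids(before, after))
print(worst_case_exposure_seconds(poll_interval=300, propagation_delay=60))
```

One caveat with this approach: the diff is only meaningful if both snapshots are complete. If the list is paginated and a fetch stops partway, every entry on the missing pages would look deleted.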

It seems like being a "data upstream" puts you in a different position than other public registries w.r.t. your downstreams. It's like you and your downstreams are collectively responsible for keeping the ecosystem clean, instead of that responsibility resting solely on you, the central registry.

Am I understanding this right? Or do you envision takedowns happening in some other way?

The reason I ask is that the trustworthiness of a public registry is one of the primary "value adds" it can provide to a new or even established ecosystem. I hope that this data replication model does not yield problems when server deletions (rather than adds/updates) need to happen fast but perhaps don't, due to something outside of your control (a slow downstream).

As an aside, I wonder if you plan on providing any transparency on the deletions that have occurred, such as an event log that can be followed without polling the entire server list each time.
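As a rough illustration of what following such an event log could look like on the consumer side, here is a sketch that applies events past a saved cursor to a local mirror. Everything in it is hypothetical: the `RegistryEvent` shape, the `upsert`/`delete` kinds, and the idea of a monotonically increasing sequence number are assumptions, since no such API has been designed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEvent:
    sequence: int   # hypothetical: monotonically increasing position in the log
    kind: str       # hypothetical: "upsert" or "delete"
    server_id: str


def apply_events(mirror: set[str], events: list[RegistryEvent], cursor: int) -> int:
    """Apply events newer than `cursor` to the local mirror and return the new cursor."""
    for event in sorted(events, key=lambda e: e.sequence):
        if event.sequence <= cursor:
            continue  # already applied, so replaying a batch is harmless
        if event.kind == "delete":
            mirror.discard(event.server_id)
        elif event.kind == "upsert":
            mirror.add(event.server_id)
        cursor = event.sequence
    return cursor


mirror = {"a", "b"}
batch = [RegistryEvent(1, "upsert", "c"), RegistryEvent(2, "delete", "a")]
cursor = apply_events(mirror, batch, cursor=0)
print(sorted(mirror), cursor)
```

With something like this, a consumer only fetches events after its cursor instead of re-reading the whole server list, and a delete shows up as an explicit record rather than an absence.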

tadasant (Maintainer) · 2025-05-21T16:20:58Z

Thanks for the question! It's a good topic.

> It looks like your registry is going to be mainly a "data upstream", and end users will interact directly with consumers of your registry, not with the registry itself (except perhaps server authors who publish to you). Apologies if I misunderstand that.

This is correct.

> In the event of an unsafe/malicious/illegal MCP entry in your registry, I presume you will delete it on the backend, and subsequent calls to GET /v0/servers will then omit the deleted server, signaling to the "middle layer" that a delete occurred.

Our thinking (perhaps naive / insufficient - very open to feedback and poking holes) is that because we are not hosting source code, the responsibility for pulling/deleting malicious code actually falls to npm/pypi/GHCR/etc.

To that end, we do need some feature that removes entries from our registry once their packages have been pulled from npm/pypi, but the gap is not a direct security risk: if someone tries to pull the source code from that other registry in the meantime, it will already be gone. In this way, our "unsafe/malicious/illegal" package management guarantee is basically equivalent to what npm/pypi/etc collectively provide.
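A minimal sketch of what that cleanup check might look like, assuming each registry entry records which upstream it points at. The `registry` and `package` field names and the injected `fetch_status` callable are hypothetical. The two metadata URLs are the public npm and PyPI endpoints, which answer 404 for a name that does not exist.

```python
from typing import Callable
from urllib.parse import quote

# Public metadata endpoints; both answer 404 for a name that does not exist.
_METADATA_URLS = {
    "npm": "https://registry.npmjs.org/{name}",
    "pypi": "https://pypi.org/pypi/{name}/json",
}


def metadata_url(registry: str, package: str) -> str:
    """Build the upstream metadata URL for a package reference."""
    # Keep "@" unescaped so scoped npm names become "@scope%2Fname".
    return _METADATA_URLS[registry].format(name=quote(package, safe="@"))


def stale_entries(entries: list[dict], fetch_status: Callable[[str], int]) -> list[dict]:
    """Return entries whose upstream package now answers 404.

    `fetch_status` takes a URL and returns an HTTP status code. It is injected
    so the check can run against a fake in tests or a real HTTP client in a job.
    """
    return [
        e for e in entries
        if fetch_status(metadata_url(e["registry"], e["package"])) == 404
    ]


entries = [
    {"registry": "npm", "package": "@example/still-there"},
    {"registry": "pypi", "package": "was-pulled"},
]
fake_statuses = {
    metadata_url("npm", "@example/still-there"): 200,
    metadata_url("pypi", "was-pulled"): 404,
}
print(stale_entries(entries, fake_statuses.__getitem__))
```

Treating a 404 as "pulled" is itself an assumption. An upstream that swaps a malicious package for a placeholder instead of deleting it would still answer 200, so a check like this would miss it.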

That said... I can think of a few vectors we might need to think more on:

- Remote servers. If a server just has a remote URL, and it's proven to be malicious, we don't have a third party around to manage that for us.
- Even if it's not a security risk, we do need to think about the bad UX of potentially referencing deleted packages.

So, we probably do need some notion of being able to manage this centrally even if npm/pypi/etc help with part of the issue. Because we have no long-term operational resourcing in place for this project, it would be good to figure out a way to do this in an automated or community-driven way. For example, perhaps there is a way for community members to report these problems, or some vendor the project could lean on to help. Open to suggestions on how to power this mechanism.

> As an aside, I wonder if you plan on providing any transparency on the deletions that have occurred, such as an event log that can be followed without polling the entire server list each time.

No design has been suggested for this yet, but I think it'd be a reasonable addition to the roadmap.
