chore: add tag code #380
6c41886 to e99e05f
Mind elaborating how useful Is the idea to store this information in routing systems like As noted in comment on your exploration document, I'm skeptical
Thanks for raising these questions @lidel. I will keep the discussion here as well for https://github.com/vasco-santos/provider-hinted-uri/pull/1/files#r2105422391 to make it easier. This is not as critical as supporting a multiaddr as a provider, but in my opinion there is value in this optional add-on. I will try to make that clear in this answer. Let's start with the broader improvement this intends to bring:
The main goal of having clients understanding a
Building on this direction, where we try to create a fast lane for clients to retrieve some trustless content, we MAY end up in situations where a provider has the content but serves it over a protocol the client cannot speak. Suppose a provider only talks Graphsync: a client that does not speak Graphsync gains nothing, and the fast lane actually means more requests and latency, becoming a slow lane. Moreover, the client may have a protocol preference, and if there are multiple providers with different protocols, knowing them up front makes it easier to prioritize right away. With that in mind, I think having a way to express the retrieval protocol can definitely avoid extra probing/lookups. I totally agree that it is helpful to be more granular than just
This is a fair question. I know this is the current behaviour of clients, and that there are probes in the spec that allow the client to make sure the host behind such a multiaddr can really serve this content. However, we are again facing a pattern where extra requests are required. If we are trying to create a routing system encoded in the URI that reduces indirection layers and hops to the bare minimum, we want it to work not only for HTTP but also for other protocols, like the ones used today via libp2p. Given that we are trying to accommodate the more general case, does it make sense to find an encoding pattern that we can use across the board?
I thought about libp2p identify and using that name, but we already have protocols that go outside of the libp2p route (HTTP, as described above), so I assume having a separate dictionary that is a superset of libp2p is critical? That feels like better ergonomics to me than having two different ways of expressing this. Thus Also to be extra clear, we CAN and probably SHOULD encode the version; the exploration table is a simple example.
For now the general idea with this exploration work is to give content publishers the power to encode this in URIs, so that smart clients like
I mean just the trustless gateway spec in its current shape, but opening the door to improvements. Though, I am still trying to figure out what would fit better here, and maybe even separating by CAR and BLOCK support could actually be a good call? @lidel can you see more merit in being able to encode the protocols that a host accepts for retrieving some content?
Thank you for the clarification. Yes, I do see merit in removing the need for libp2p identify and/or extra HTTP probing. Some fresh thoughts below + would like to hear from @rvagg and @aschmahmann as my perspective may also be too biased towards positives (ad-hoc webseeds, being able to publish protocol hints that allow clients to skip libp2p identify roundtrips). On making this
If I understand correctly we'd have PeerInfo objects like:
The first address is not routable and needs special handling. Why not add a new field instead? It has been requested previously.
Protocols are already stored in the peer store so there would be no additional storage at rest, though DHT messages would get bigger.
My understanding is BOTH require special handling, but IIRC if you add a new "protocol" field to a DHT message it will be ignored by all existing DHT servers until they update to software that can understand it, and read and persist the value in the peerstore. This means waiting 6-12-24 months until enough DHT servers update and are able to persist and gossip this new "protocols" field. Note that Amino DHT server update rate is worse than clients (hit To simplify conversation here, the choice the IPFS ecosystem has is:
That is to say, if we are not going to pivot this PR towards cc @guillaumemichel @aschmahmann @robin for visibility
Aside/Process: IMO this discussion should really have happened in the multiaddr or libp2p/specs repo rather than here in the multicodec repo. Ideally the multicodec repo is just for handling registration which would occur AFTER there's more consensus / alignment between the people who would use the code. That will hopefully allow the registration process to be a little less opinionated and instead objective with questions like:
Overall this proposal seems to me like one that should be handled at the multiaddr and/or libp2p layer with feedback from those users. Fundamentally it's about putting more data into a multiaddr to save on round-trips and as a result you're likely to run into a bunch of the associated paper cuts from multiaddr being pretty neglected over the last several years. Some common examples around saving round-trips include:
A couple of the nearby problems it also touches are:
There's already been some questionable yolo-ing here in the last couple years such as with FWIW I'm not trying to bring up the thorniness (and ancientness) of some of the issues above to scare anyone off; it's more of an indication that if this is a multiaddr-related spec then it could be a good idea to try moving a little bit forward towards solving these long-ignored issues.
If we are to change DHT implementations to be upgradable/composable, we need to ship more changes anyway, so I wouldn't count a potential future Composable DHT as beneficiary of the While I understand that a
I would lean more toward (A), because clients (retrievers) could signal to the server which extra information/tags they are expecting, without forcing extra tags upon all retrievers. The adoption delay is currently long because content routing systems have been neglected over the last years. Investing into improving the content routing systems could make such protocol upgrades quick to ship.
I like PS: note that |
022ebfc to 3a6092a
Thanks for all the feedback. Followed up updated To make it clear, the main intention for this is usage in smart clients like
@lidel this is totally right and something I did not explicitly add because I did not feel this was the space to agree on what the tag/retrieval names would be. But yes, I totally agree we MUST include versions, and it would be great to avoid keeping more dictionaries if possible
@darobin yes, but that can be an "application" level problem. For instance, this can be a purely application-level param for some hash, or protocol hints that are part of a spec for smart content-addressable clients. In other words, one can parse the multiaddr, get all the tags, and see the ones it cares about.
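To illustrate the "parse the multiaddr, get all the tags" step, here is a minimal sketch that works on the string form only. It assumes the `/tag/<value>` syntax proposed in this PR, that tags appear as a trailing suffix of the address, and that a tag value contains no `/`. The function name `split_tags`, the IP address, and the port are hypothetical placeholders; a real implementation would walk the multiaddr with the protocol table instead of splitting strings.

```python
def split_tags(addr):
    """Split a string-form multiaddr into (address without tags, tag values).

    Assumption: tags are trailing `/tag/<value>` pairs and values contain no `/`.
    How names such as /ipfs/bitswap/1.2.0 would be encoded is not handled here.
    """
    parts = addr.strip("/").split("/")
    tags = []
    # Peel `/tag/<value>` pairs off the end of the address.
    while len(parts) >= 2 and parts[-2] == "tag":
        tags.append(parts[-1])
        parts = parts[:-2]
    tags.reverse()  # restore the order in which the tags were written
    return "/" + "/".join(parts), tags


# Placeholder address (documentation IP range) and placeholder peer ID.
base, tags = split_tags("/ip4/192.0.2.1/tcp/4001/p2p/qmfoo/tag/bitswap/tag/http")

# An application keeps only the tags it cares about and ignores the rest.
understood = {"http"}
relevant = [t for t in tags if t in understood]
print(base, tags, relevant)
```

Unknown tags are simply ignored here, which matches the idea that their meaning is an application-level concern.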
This PR adds a retrieval code as draft to hint what kind of retrieval protocols one host may run, e.g. /ip4/.../p2p/qmfoo/retrieval/bitswap/retrieval/http
as discussed with @achingbrain
This PR adds a tag code for application-purpose context, such as passing hints about the kinds of retrieval protocols a host may run, e.g. /ip4/.../p2p/qmfoo/tag/bitswap/tag/http
This PR does not intend to decide on the tag names and considers that out of scope here, whether they end up as bitswap, /ipfs/bitswap/1.2.0, etc.; we should leave that to another discussion. It would be great to avoid the need to encode them, though.
This is particularly interesting in the context of ipfs/specs#504, given that it lets clients fetch content in a non-interactive way, relying on provider hints and (optionally) tags for them.
This enables us to be able to prioritize dialling to hosts that we know how to communicate with and avoid unnecessary dials.
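As a rough illustration of that prioritization, the sketch below ranks providers by the client's protocol preference. Everything in it is hypothetical: the `ProviderHint` shape, the `rank_providers` function, and the example addresses are not part of this PR or of any existing library. It also makes two policy assumptions: providers whose hints share no protocol with the client are skipped, and providers with no hints at all are tried last, since tags are optional.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderHint:
    """A provider address plus the protocols it is hinted to speak (hypothetical shape)."""
    address: str
    protocols: list = field(default_factory=list)


def rank_providers(providers, preference):
    """Order providers by the client's most preferred protocol among their hints."""
    rank = {name: i for i, name in enumerate(preference)}
    usable = []
    for p in providers:
        if not p.protocols:
            # No hints: protocol unknown, so keep it but try it last.
            usable.append((len(preference), p))
            continue
        scores = [rank[proto] for proto in p.protocols if proto in rank]
        if scores:
            usable.append((min(scores), p))
        # Hinted but no shared protocol: skip to avoid a pointless dial.
    # sorted() is stable, so equally ranked providers keep their input order.
    return [p for _, p in sorted(usable, key=lambda pair: pair[0])]


providers = [
    ProviderHint("/dns4/a.example/tcp/443", ["graphsync"]),
    ProviderHint("/dns4/b.example/tcp/443", ["bitswap"]),
    ProviderHint("/dns4/c.example/tcp/443", ["bitswap", "http"]),
]
ranked = rank_providers(providers, preference=["http", "bitswap"])
print([p.address for p in ranked])
# prints ['/dns4/c.example/tcp/443', '/dns4/b.example/tcp/443']
```

The Graphsync-only provider is never dialled, and the provider offering the client's first choice (HTTP here) is tried before the Bitswap-only one.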
More context: https://github.com/vasco-santos/provider-hinted-uri/blob/main/EXPLORATION.md