
Size of the typst/packages repository #2024

gasche opened this issue Mar 17, 2025 · 7 comments

@gasche
Contributor

gasche commented Mar 17, 2025

The typst/packages repository takes 1.9 GiB on my machine with an up-to-date clone. As Typst gets more popular, its size will grow faster than linearly, and there is a risk that working with the package repository becomes painful in practice: at some point, people with low bandwidth will have trouble cloning the repository to contribute their own package.

The size of the repository is currently roughly:

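  • 500 MiB of git metadata
  • 1.4 GiB of package data

(in particular, doing a shallow clone will not help much: it trims the history, but most of the size is the package data of the current checkout, which still has to be downloaded)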

On my current checkout of the repository, there are

  • 585 packages in total
  • 343 packages that each take less than 1 MiB of disk space; they consume 117 MiB in total
  • 219 that take between 1 MiB and 10 MiB; they consume 724 MiB in total
  • 23 that take more than 10 MiB; they consume 567 MiB in total
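The counts above can be reproduced with a short script. Below is a minimal Python sketch (not the script used for the numbers above), assuming the repository layout `packages/preview/<name>/<version>/` and counting all versions of a package together. The function names are hypothetical, and it sums file sizes rather than disk blocks, so results may differ slightly from `du`.

```python
import os
from pathlib import Path

MIB = 1024 * 1024
BUCKETS = ("< 1 MiB", "1-10 MiB", "> 10 MiB")


def dir_size(path: Path) -> int:
    """Total size in bytes of the regular files under `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file = Path(root) / name
            if not file.is_symlink():
                total += file.stat().st_size
    return total


def bucket_of(size: int) -> str:
    """Pick the size bucket used in the list above."""
    if size < MIB:
        return BUCKETS[0]
    if size <= 10 * MIB:
        return BUCKETS[1]
    return BUCKETS[2]


def bucket_packages(namespace_dir: Path) -> dict[str, tuple[int, int]]:
    """Map each bucket to (package count, total bytes); all versions of a package count together."""
    result = {bucket: (0, 0) for bucket in BUCKETS}
    for package in sorted(p for p in namespace_dir.iterdir() if p.is_dir()):
        size = dir_size(package)
        count, total = result[bucket_of(size)]
        result[bucket_of(size)] = (count + 1, total + size)
    return result


if __name__ == "__main__":
    root = Path("packages/preview")  # assumed layout: packages/preview/<name>/<version>/
    if root.is_dir():
        for bucket, (count, total) in bucket_packages(root).items():
            print(f"{bucket}: {count} packages, {total / MIB:.0f} MiB")
```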

In the short term, the following could work:

  • in the packaging guidelines, encourage people to keep packages small, below 1 MiB (templates may need different recommendations)
  • replace identical asset files with symbolic links, to avoid duplicating assets across versions

Replacing identical asset files with symbolic links can be done by package authors if they are told how to do it, or by repository maintainers after the fact. (Git already deduplicates identical content in its internal storage, so it is not strictly necessary to do this at submission time.) A quick experiment suggests that doing this on the current repository would shrink the package data from 1.4 GiB to 947 MiB, which is a sizeable win.
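A minimal Python sketch of the deduplication step, assuming it is run on one package directory at a time (e.g. `packages/preview/<name>/`) so that links only point between files of the same package. `dedupe_package` is a hypothetical helper that defaults to a dry run reporting the bytes that would be saved; it also assumes that Typst's package bundling preserves symbolic links, which is not verified here.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe_package(package_dir: Path, dry_run: bool = True) -> int:
    """Replace byte-identical files inside one package directory with relative symlinks.

    The first copy in sorted path order stays a regular file; later copies become
    symlinks to it. Returns the number of bytes saved (or that would be saved).
    """
    files = sorted(p for p in package_dir.rglob("*") if p.is_file() and not p.is_symlink())
    first_copy: dict[str, Path] = {}
    saved = 0
    for path in files:
        original = first_copy.setdefault(file_digest(path), path)
        if original == path:
            continue  # first time we see this content: keep it as a regular file
        saved += path.stat().st_size
        if not dry_run:
            target = os.path.relpath(original, start=path.parent)
            path.unlink()
            path.symlink_to(target)
    return saved


if __name__ == "__main__":
    root = Path("packages/preview")  # assumed repository layout
    if root.is_dir():
        total = sum(dedupe_package(p) for p in root.iterdir() if p.is_dir())
        print(f"dry run: would save {total / 2**20:.0f} MiB")
```

Keeping the links relative means they stay valid wherever the repository is cloned.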

In the long term, I think the repository maintainers should consider git-lfs or other options. The end goal would be that package authors do not need to download all other packages to submit theirs.

@elegaanz
Member

We are well aware of this issue. Even though a shallow clone wouldn't help, a sparse checkout should keep the repository smaller on your disk. As I said in #2007, I plan to rework the packaging guidelines at some point, and explanations on how to do that (as well as general tips on how to keep a package small) will definitely be part of it.
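For contributors, a sparse checkout can be combined with Git's partial-clone filter so that only the files of the checked-out directories are downloaded. The Python sketch below just wraps the corresponding `git` commands; the helper names and the package path are hypothetical, and for a pull request you would clone your own fork rather than the upstream URL. The actual savings on typst/packages are not measured here.

```python
import subprocess

UPSTREAM = "https://github.com/typst/packages"


def sparse_clone_commands(dest: str, package_path: str, url: str = UPSTREAM) -> list[list[str]]:
    """Git commands for a partial, sparse clone that checks out a single package directory."""
    return [
        # --filter=blob:none: download commits and trees now, file contents only when needed.
        # --sparse: initially check out only the files at the repository root.
        ["git", "clone", "--filter=blob:none", "--sparse", url, dest],
        # Restrict the working tree to this directory (in cone mode, root-level files stay too).
        ["git", "-C", dest, "sparse-checkout", "set", package_path],
    ]


def run(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run(sparse_clone_commands("typst-packages", "packages/preview/my-package"))` would leave only the root files and that one package directory in the working tree.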

In the long term, we will probably move away from Git and GitHub for storing and reviewing packages, and use a custom solution instead.

@gasche
Contributor Author

gasche commented Mar 17, 2025

@elegaanz wrote: "I plan to rework the packaging guidelines at some point, and explanations on how to do that (as well as general tips on how to keep a package small) will definitely be part of it."

Would you be able to suggest a way to do this incrementally, so that we can get this useful documentation out right now without conflicting with your medium-term plans for the README? It's a shame to find out how to generate thumbnails or clone the repository efficiently only later on, after having already done both in a suboptimal way, because the information is hard to find.

For example, an immediate plan would be to create a doc/ directory and move the content of the current README into separate files there (with links from the main README). I would suggest the following topics:

  • submitting-packages.md: the process of how to submit a new package
  • packaging-guidelines.md: general packaging rules and recommendations
  • package-content.md: recommended content of a package (README, thumbnails... including the recommended filesystem layout)
  • typst.toml.md: how to write typst.toml
  • local-packages.md: local packages / how to test packages locally

This is mostly for package authors. The main README should work as a landing page for package users by briefly mentioning how to use packages, and how to browse the set of existing packages. (The documentation on local packages can also be useful there.)

Would you be willing to review a PR that splits up the existing guidelines in this way?

@elegaanz
Member

Yes, that would be extremely helpful. There are a few other things I have in mind for improving the docs, but this can be an incremental process; splitting everything as you suggested would be a good first step.

@avonmoll
Contributor

avonmoll commented May 9, 2025

I am not very knowledgeable in this area, but I have long wondered whether Typst would consider moving away from a monolithic package repository toward a more distributed approach based on a registry. One example is the Julia programming language: every package lives in its own repository (on any publicly accessible host) and is then registered in an official registry (e.g., the General registry: https://github.com/JuliaRegistries/General).
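To illustrate the idea: a registry of this kind only stores metadata, namely where each package's source lives and a content hash per released version, so contributors would clone a small index rather than every package. Julia's General registry records, among other things, the repository URL and a git tree hash for each version; the Python sketch below is a simplified, hypothetical schema that uses a SHA-256 of the release archive instead.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """Hypothetical per-package record: where the source lives, plus one hash per version."""
    repo: str
    versions: dict[str, str] = field(default_factory=dict)  # version -> sha256 hex digest


class Registry:
    """A metadata-only index: it never stores package contents itself."""

    def __init__(self) -> None:
        self.entries: dict[str, RegistryEntry] = {}

    def register(self, name: str, repo: str, version: str, archive: bytes) -> None:
        entry = self.entries.setdefault(name, RegistryEntry(repo))
        if entry.repo != repo:
            raise ValueError(f"{name} is already registered from {entry.repo}")
        if version in entry.versions:
            raise ValueError(f"{name} {version} is already registered")  # releases are immutable
        entry.versions[version] = hashlib.sha256(archive).hexdigest()

    def verify(self, name: str, version: str, archive: bytes) -> bool:
        """Check an archive downloaded from the package's own repo against the pinned hash."""
        entry = self.entries.get(name)
        if entry is None or version not in entry.versions:
            return False
        return entry.versions[version] == hashlib.sha256(archive).hexdigest()
```

Pinning a hash per version is what lets the registry trust content it does not host: a changed archive on the author's side no longer verifies.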

@jonaspleyer
Contributor

Idea: Registries for Typst packages

Would it be worth exploring whether we can define some sort of "registry" (or find a more suitable name), as is done in the Rust ecosystem with Cargo and crates.io? I see several benefits:

  1. Universities or companies could host their own packages, which would not need to be public if they prefer.
  2. Namespaces, which are currently not supported (there is only preview), could be replaced by registries.
  3. The import syntax could be extended so that packages from other registries are easy to use:

```typst
// Use a shared registry identifier
#import "@registry-identifier/my-package:0.1.1"
// Preview registry, which will be replaced in the future
#import "@preview/my-package:0.1.1"
// Use a direct URL
#import "@{https://my-registry.org}/my-package:0.1.1"
```

  4. Reduced maintenance for the Typst company, as registries manage themselves
  5. Avoid clashes
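A sketch of how such import specs could be parsed and resolved, in Python. The grammar, the `KNOWN_REGISTRIES` table, and the URL assigned to `preview` are all assumptions for illustration, not Typst's actual resolver. The table doubles as an allowlist, which is one way to restrict which registry identifiers are trusted.

```python
import re
from dataclasses import dataclass

# Hypothetical: identifiers shipped with the default Typst distribution.
# The base URL for "preview" is an assumption used only for illustration.
KNOWN_REGISTRIES = {
    "preview": "https://packages.typst.org/preview",
}

# @<identifier>/<name>:<version>  or  @{<url>}/<name>:<version>
SPEC = re.compile(
    r"^@(?:\{(?P<url>[^}]+)\}|(?P<ident>[a-z0-9-]+))"
    r"/(?P<name>[a-z0-9-]+):(?P<version>\d+\.\d+\.\d+)$"
)


@dataclass(frozen=True)
class PackageSpec:
    registry_url: str
    name: str
    version: tuple[int, int, int]


def parse_spec(spec: str) -> PackageSpec:
    """Parse an import string and resolve its registry to a base URL."""
    m = SPEC.match(spec)
    if m is None:
        raise ValueError(f"malformed package spec: {spec!r}")
    if m["url"] is not None:
        base = m["url"]  # direct URL form: no allowlist lookup
    else:
        ident = m["ident"]
        if ident not in KNOWN_REGISTRIES:
            raise ValueError(f"unknown registry identifier: {ident!r}")
        base = KNOWN_REGISTRIES[ident]
    major, minor, patch = (int(x) for x in m["version"].split("."))
    return PackageSpec(base, m["name"], (major, minor, patch))
```

Because an identifier only resolves through the table, pointing an existing identifier at a different backend would not change any import string in users' documents.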

I also see some caveats, and it would be reasonable not to create new registries lightly.

  1. Possible fragmentation of the ecosystem
  2. Guidelines/naming conventions which are directed by the Typst company would somehow need to be enforced within these registries.
  3. Which registries are "trustworthy" and can be included as a "registry-identifier" so that the import syntax above works flawlessly? How do we guarantee this for forward compatibility? A similar problem already exists, though at the package level.

@gasche
Contributor Author

gasche commented May 26, 2025

Personally, I think this is somewhat illusory, especially the idea that "registries manage themselves". In practice, sizeable ecosystems have people who pour a lot of work into the repository (crates.io has dedicated volunteers, and so does opam-repository, the OCaml community's repository that I am more familiar with), and this also requires a lot of tooling and architecture.

My understanding from a distance is that Typst is both an open-source tool and a company, and that the people trying to make a living working on it currently want something more integrated than a repository of packages hosted elsewhere. (For example, maybe they want to evolve the repository format alongside their cloud frontend, and for this, being able to update previous package versions is very convenient.) I don't have a strong opinion on which approach is better, but I think it's their choice to make, based on concrete needs.

If we want to stick to a repository that hosts package content (in addition to metadata) while reducing the amount people have to download to contribute a package, there are plenty of technical solutions around. If people want to change the social organization of the package repository, this could and should be discussed separately.

@jonaspleyer
Contributor

@gasche Thanks for the insights. I was in no way under the impression that this is the solution to the problem; I was simply offering a perspective. I do not really care what the actual solution ends up being. What counts for me is that I can contribute my packages with a low barrier (which the current solution offers, imho).

I think one problem that needs to be addressed in the future is namespaces (i.e., what to offer besides @preview/...). Furthermore, the Typst team is already moving toward supporting whole organizations (https://typst.app/pricing/), which might want to host proprietary Typst-based packages for internal use. So self-hosting something that can distribute Typst packages might not be too far-fetched. That said, implementing a new registry should not be done lightly, and ultimately the Typst team has the final say on which registry identifiers are included in the default Typst distribution. Finally, if the Typst team decides at some point that particular namespaces (e.g., for particular institutions) cost them too much effort, such a namespace could be taken over by a full-fledged registry without users ever noticing, since the import syntax @identifier/... stays identical.

Now that I think about it, this would be closer to the pypi.org package index, and maybe the name "registry" was misleading from the start.
