Skip to content

[Feature]: Support resolve the modules with hard-links #5912

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
colinaaa opened this issue Mar 11, 2024 · 17 comments
Closed

[Feature]: Support resolve the modules with hard-links #5912

colinaaa opened this issue Mar 11, 2024 · 17 comments
Labels
feat New feature or request

Comments

@colinaaa
Copy link
Collaborator

What problem does this feature solve?

pnpm creates hard links from the global store to the project's node_modules folders.

For example, imagine you have the following directory structure: packages/a and packages/b have the same version of lodash as a dependency. But they are different symlink that points to different files.

.
├── node_modules
│   ├── a -> ../packages/a
├── package.json
├── packages
│   ├── a
│   │   ├── index.ts
│   │   ├── node_modules
│   │   │   ├── b -> ../../b
│   │   │   └── lodash -> ../../../node_modules/.pnpm/[email protected]/node_modules/lodash
│   │   └── package.json
│   └── b
│       ├── index.ts
│       ├── node_modules
│       │   └── lodash -> .pnpm/[email protected]/node_modules/lodash
│       ├── package.json
│       ├── pnpm-lock.yaml
│       └── pnpm-workspace.yaml
├── pnpm-lock.yaml
└── pnpm-workspace.yaml

Since two lodash/lodash.js points to different files, both rspack(or oxc internally) and webpack(or enhanced-resolve internally) treat the two files as unrelated. So at build time, two copies of lodash code will be packaged in the output bundle, even if their content is the same.

However, it can be found that in the file system, the two lodash.js point exactly to the same inode node. Which means they are the same file in physical storage.

$ ls -i packages/a/node_modules/lodash/lodash.js
2249211 packages/a/node_modules/lodash/lodash.js

$ ls -i packages/b/node_modules/lodash/lodash.js
2249211 packages/b/node_modules/lodash/lodash.js

To solve the problem, we created a webpack plugin to cache the request:

import fs from 'node:fs'

import type { ResolveRequest, Resolver } from 'enhanced-resolve'

class InodeWebpackPlugin {
  static #source = 'resolved'

  apply(resolver: Resolver) {
    resolver
      .getHook(InodeWebpackPlugin.#source)
      .tapAsync('INodeCachePlugin', (request, _, callback) => {
        if (!request.path) {
          return callback()
        }

        try {
          const { ino } = fs.statSync(request.path)

          const cachedRequest = this.#cache.get(ino)
          if (cachedRequest) {
            // Note that the query may change for the same path
            // with a different query.
            Object.assign(request, { ...cachedRequest, query: request.query })
            return callback()
          }

          this.#cache.set(ino, request)
        } catch {
          // explicitly ignore error
        }

        return callback()
      })
  }

  #cache = new Map<number, ResolveRequest>()
}

And it works as expected in webpack, the duplicated modules have been eliminated.

But when we are trying to migrate to rspack, we found that resolve.plugins is not supported(#3890).

So it would be great if rspack or oxc supports dealing with hard links like this by default(I would like to file an issue to enhanced-resolve, too).

What does the proposed API of configuration look like?

It would be great if this is enabled by default(just like symlinks options of enhanced-resolve).

It is acceptable if this is an experimental configuration or a sub-configuration in resolve. E.g.:

export default {
  resolve: {
    hardlinks: true,
  },
  experimental: {
    resolve: { hardlinks: true }
  },
}
@colinaaa colinaaa added feat New feature or request pending triage The issue/PR is currently untouched. labels Mar 11, 2024
@LingyuCoder
Copy link
Contributor

Relate pnpm hardlinks

Pnpm use symlinks from package node_modules to workspace node_modules, and use hardlinks from workspace node_modules files to global files.

Currently resolver can only deal with symlinks and get the node_modules of workspace. So this issue should occur when
there are multiple workspaces.

cc @Boshen

@Boshen
Copy link
Contributor

Boshen commented Mar 11, 2024

This isn't a bundler problem. Your codebase is no longer a monorepo, it's just different packages put inside the same codebase, and then you enabled compilation from source.

I don't have an exact solution, but you can try:

  • resolve alias that module
  • hoist the dependency to the root

@LingyuCoder LingyuCoder removed the pending triage The issue/PR is currently untouched. label Mar 11, 2024
@xc2
Copy link
Collaborator

xc2 commented Mar 11, 2024

there's no source or target role around paths sharing the same inode, thus i believe it's impossible to determine the resolved path without a rule.

in your plugin, the resolved path is determined by the first request. i think it is unstable.

@colinaaa
Copy link
Collaborator Author

This isn't a bundler problem. Your codebase is no longer a monorepo, it's just different packages put inside the same codebase, and then you enabled compilation from source.

I don't have an exact solution, but you can try:

  • resolve alias that module
  • hoist the dependency to the root

I agree that having such a huge monorepo with a bunch of pnpm workspaces inside is not a good way. But this helps us to collaborate between different teams efficiently. And our build tools based on webpack and esbuild works just fine with the plugin.

I don't think resolve.alias is a good solution since it will break the version constraint of the packages, it is likely to use a different version of a package with resolve.alias and cause bugs. It also requires all developers to be familiar with resolve and the dependencies(and the dependencies of dependencies) and write them down to the webpack.config.js, which is just impossible. This is not how you manage dependencies in such a huge project.

Not to mention hoist the dependency to the root. We would not use such a monorepo if we could merge the workspaces into one.

@colinaaa
Copy link
Collaborator Author

there's no source or target role around paths sharing the same inode, thus i believe it's impossible to determine the resolved path without a rule.

I'm not sure what do you mean by "there's no source or target role around paths sharing the same inode". Could you please explain that to me in detail? Thanks!

in your plugin, the resolved path is determined by the first request. i think it is unstable.

I agree that it is not 100% stable, but it is correct(at least from our experience).

@xc2
Copy link
Collaborator

xc2 commented Mar 11, 2024

I'm not sure what do you mean by "there's no source or target role around paths sharing the same inode". Could you please explain that to me in detail? Thanks!

as you know, a symbolic link represents a link file linking to another file, when a resolver resolves a request that points to a symbolic link with option like preferRealPath it gives the linked file's path.

while a "hard link" just represents a path to an inode node. in your case, there's two "hard links" linked to a same inode node. the two paths are of equal status thus a resolver cannot say "who it prefer".

so i do think, other than just "resolving paths with same inode number to one path", we need an explicit option to tell "which path the resolver should choose when resolving paths with same inode number to one path"

@xc2
Copy link
Collaborator

xc2 commented Mar 11, 2024

"use the first coming result" might not be a good idea as it will change.

it might cause problem when there're different loaders/plugins logic against the two paths sharing the same inode number. though it's rarely common in node_modules scene.

@xc2
Copy link
Collaborator

xc2 commented Mar 11, 2024

Another concern I have is that the term "inode" is closely associated with filesystems, so I'm not sure if it will function properly in filesystems without inode feature, such NTFS (though it has a similiar feature).

just to address some concerns, I know nothing about filesystem(_;).

also not intended to reject any possible feature

@Boshen
Copy link
Contributor

Boshen commented Mar 12, 2024

@colinaaa Can you provide a github test repository that replicates this project structure?

@colinaaa
Copy link
Collaborator Author

@Boshen Check this out! https://github.com/colinaaa/rspack-hardlink-repro

@colinaaa
Copy link
Collaborator Author

"use the first coming result" might not be a good idea as it will change.

it might cause problem when there're different loaders/plugins logic against the two paths sharing the same inode number. though it's rarely common in node_modules scene.

I agree that using the first coming result is not an ideal way. But we have to return results for each request, and there is no way to know if there are requests to come in the future. So the only way to make it work is to return the first coming result.

@LingyuCoder
Copy link
Contributor

In my opinion, "use the first coming result" or inode check of hardlink are both business-specific logic that can be ensured by the business itself. They are not suitable to implement in OXC.

But in Rspack, the before_resolve hook cannot get the resolved path, and the after_resolve hook would generate full request, which leading to a substantial increase in transmission costs. Rspack does indeed need a low-cost way to modify the resolve result.

@Boshen
Copy link
Contributor

Boshen commented Mar 12, 2024

@Boshen Check this out! https://github.com/colinaaa/rspack-hardlink-repro

rspack-hardlink-repro  main ❯ ls -lh ./packages/a/node_modules/lodash
lrwxr-xr-x  1 bytedance  staff    62B Mar 12 16:09 ./packages/a/node_modules/lodash -> ../../../node_modules/.pnpm/[email protected]/node_modules/lodash

rspack-hardlink-repro  main ❯ ls -lh ./packages/b/node_modules/lodash
lrwxr-xr-x  1 bytedance  staff    62B Mar 12 16:09 ./packages/b/node_modules/lodash -> ../../../node_modules/.pnpm/[email protected]/node_modules/lodash

rspack-hardlink-repro  main ❯ ls -i ./packages/a/node_modules/lodash
162517384 ./packages/a/node_modules/lodash

rspack-hardlink-repro  main ❯  ls -i ./packages/b/node_modules/lodash
162517383 ./packages/b/node_modules/lodash

Is this replicated correctly?

@colinaaa
Copy link
Collaborator Author

rspack-hardlink-repro on  main is 📦 v1.0.0 via  v18.18.0
❯ /bin/ls -lh ./packages/a/node_modules/lodash
lrwxr-xr-x@ 1 bytedance  staff    62B Mar 12 14:43 ./packages/a/node_modules/lodash -> ../../../node_modules/.pnpm/[email protected]/node_modules/lodash

rspack-hardlink-repro on  main is 📦 v1.0.0 via  v18.18.0
❯ /bin/ls -lh ./packages/b/node_modules/lodash
lrwxr-xr-x  1 bytedance  staff    40B Mar 12 15:04 ./packages/b/node_modules/lodash -> .pnpm/[email protected]/node_modules/lodash

rspack-hardlink-repro on  main is 📦 v1.0.0 via  v18.18.0
❯ /bin/ls -i ./packages/b/node_modules/lodash/lodash.js
797938 ./packages/b/node_modules/lodash/lodash.js

rspack-hardlink-repro on  main is 📦 v1.0.0 via  v18.18.0
❮ /bin/ls -i ./packages/a/node_modules/lodash/lodash.js
797938 ./packages/a/node_modules/lodash/lodash.js

Did you run the pnpm --prefix packages/b install?

@Boshen
Copy link
Contributor

Boshen commented Mar 12, 2024

I can't get the same inode 🤔

image

@colinaaa
Copy link
Collaborator Author

@Boshen Sorry! Should run pnpm --prefix packages/b install before pnpm install :) README.md updated

image

@Boshen
Copy link
Contributor

Boshen commented Mar 29, 2024

Done in #5924

@Boshen Boshen closed this as completed Mar 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants