Skip to content

[indexstore-db] Declare Codable conformance, to autosynthesize implementation. #248

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

moshe-foreflight
Copy link

@moshe-foreflight moshe-foreflight commented May 27, 2025

This PR adds Codable support to the SymbolOccurrence and Symbol models, in addition to types they depend on. (SymbolRole, etc)

Why?

  • I found a need for this while trying to work with actors to parallelize index access.
  • Clients code depending on indexstore-db would need to manually implement conformance, because conformances in extensions are not synthesized.

Risks

  • Additions to these models would need to support Codable as well. (Compiler will catch this.)
  • Changes may cause unexpected results when deserializing. (Mitigated with unit tests)

See Also

@moshe-foreflight moshe-foreflight changed the title [indexstore-db] Declare Codable conformance, to autosynthesize implementation. [indexstore-db][#231] Declare Codable conformance, to autosynthesize implementation. May 27, 2025
@moshe-foreflight moshe-foreflight changed the title [indexstore-db][#231] Declare Codable conformance, to autosynthesize implementation. [indexstore-db]Declare Codable conformance, to autosynthesize implementation. May 27, 2025
@moshe-foreflight
Copy link
Author

moshe-foreflight commented May 27, 2025

I've added some tests to validate the input and output but an argument could be made that the tests merely confirm the behavior of the system framework. Looking for input from reviewers, please.

@ahoppen
Copy link
Member

ahoppen commented May 28, 2025

Thanks for opening the PR, @moshe-foreflight.

I don’t quite understand why we need a Codable conformance for these types. You said that you need it to work with these types across actor instances but wouldn’t that mean that Sendable is the protocol you are looking for? I’m not sure what you’re doing on your end but with the Codable conformance you are likely serializing the data to JSON just to deserialize it in the same process, which seems like a waste of compute resources. If Codable is indeed the protocol you need, could you elaborate a little more what your use case is?

@moshe-foreflight
Copy link
Author

moshe-foreflight commented May 28, 2025

I'm trying to optimize for performance across multiple runs of a process by splitting the IndexStoreDB into a separate XPC process. (I'm looking to make something easier to ship to teammates than clangd or an LSP server.)

You said that you need it to work with these types across actor instances but wouldn’t that mean that Sendable is the protocol you are looking for?

Quite possibly - it's likely I'm "holding it wrong."

you are likely serializing the data to JSON just to deserialize it in the same process,

I was trying to spin up an XPC service so I could keep the index alive in a lightweight process, and for reasons I can't recall, JSON seemed like a better option for my implementation.

If Codable is indeed the protocol you need, could you elaborate a little more what your use case is?

  • Sending over XPC services, as mention above.
  • Directly dumping symbols and occurrences as part of larger JSON output.

Can you think of a reason not to support this from the perspective of the library vendor? Just because we can, doesn't mean we should, but will adding Codable here worsen IndexStoreDB?

Thanks again for your time and patience!

@ahoppen
Copy link
Member

ahoppen commented May 30, 2025

I'm trying to optimize for performance across multiple runs of a process by splitting the IndexStoreDB into a separate XPC process.

What are you trying to gain by keeping the index alive? Are you expecting to quit and re-launch clients of your index service multiple times per second or something like that? Because when closing the index, it should be saved (look for a saved directory in the directory that contains your indexstore-db) and re-opening the index again should just be re-loading that saved index, which should be almost instant.

Quite possibly - it's likely I'm "holding it wrong."

I didn’t realize that you were talking about separate processes. In that case, Sendable won’t be sufficient.

Can you think of a reason not to support this from the perspective of the library vendor? Just because we can, doesn't mean we should, but will adding Codable here worsen IndexStoreDB?

I am a still a little hesitant, in particular until I understand that this is the correct solution for a real problem. Here are a couple reasons why:

  • If we add the ability to dump the symbols as JSON, there might be expectations that the JSON dump stays backwards-compatible and this could significantly hinder the evolution of indexstore-db.
  • The index can be quite large and dumping all of it as JSON can quickly produce dumps that are too large to handle. JSON might not be the correct dump format anymore. At that point, you would likely be better of just iterating over the entire index store itself, without using indexstore-db (ie. dlopen libIndexStore.dylib and then use these functions to iterate over the index https://github.com/swiftlang/indexstore-db/blob/main/Sources/IndexStoreDB_Index/include/IndexStoreDB_Index/indexstore_functions.h)
  • Adding Codable conformances will prevent us from adding any kind of circular references to these types. To be clear, we don’t have any plans to do this, but I think it’s not completely unreasonable that Symbol became a reference type and SymbolOccurrence contained a pointer to Symbol and then Symbol contained a member that referred to all of its references. This wouldn’t be representable in JSON.
  • In many cases, I would expect that not all the information of these types is actually needed, in which case serializing all of it would add unnecessary overhead. In these cases, it might be more beneficial to do as much processing as possible inside the process that has loaded the index and then just return the aggregated data. FWIW, this is effectively what SourceKit-LSP does to serve its LSP requests.

@moshe-foreflight
Copy link
Author

moshe-foreflight commented Jun 5, 2025

As a concrete example, I'm trying to do some code analysis to find unit tests and dependencies between symbols.

As an intermediate step, I was trying to build something to query the index store. (Then I realized I can use lldb against any instance of the IndexStoreDB and just run expressions.)

What are you trying to gain by keeping the index alive? Are you expecting to quit and re-launch clients of your index service multiple times per second or something like that?

A few reasons to keep the index alive:

  • At least while developing, I want to be able to iterate quickly. (XPC requests against a living service seems faster because it avoids loading even a saved index.)
  • When my application crashes, the saved directory doesn't appear to be created. By keeping the index outside the main process, we don't need it.

Because when closing the index, it should be saved (look for a saved directory in the directory that contains your indexstore-db) and re-opening the index again should just be re-loading that saved index, which should be almost instant.

Now that you mentioned it, I see it, thanks!

As I stated above, if/when my process crashes, the saved directory doesn't appear.

  1. I've noticed the presence of a sibling directory named ...-dead-saved. Is it safe to rename that to saved and use it?
  2. Is there a way to manually close the connection or trigger a save from the Swift API?
  3. Also, I haven't seen the format of the mdbPath documented anywhere. I'm using the following:
import PathKit
// ...
Path(
            components: [
                self.derivedDataPath,
                self.derivedDataSubfolderForProject,
                "Index.noindex",
                "UniDB",
                "\(basename).xcindex"
            ])

Is there a benefit to adding specificity to this path? (v13/saved) The above formula works, but if directly accessing saved would be faster or something, it may also be helpful.

I am a still a little hesitant, in particular until I understand that this is the correct solution for a real problem. Here are a couple reasons why:

This is all reasonable. It seems like the right thing to do is close this PR.

(I'd like to make SymbolOccurrence conform to Sendable. ` Are you open to that change?)

Thanks!

@moshe-foreflight moshe-foreflight changed the title [indexstore-db]Declare Codable conformance, to autosynthesize implementation. [indexstore-db] Declare Codable conformance, to autosynthesize implementation. Jun 5, 2025
@ahoppen
Copy link
Member

ahoppen commented Jun 5, 2025

  • At least while developing, I want to be able to iterate quickly. (XPC requests against a living service seems faster because it avoids loading even a saved index.)

Out of curiosity: How long does it take to load the saved index? I would expect it to take significantly less than a second, especially if you use a small-ish project for local development testing.

  • When my application crashes, the saved directory doesn't appear to be created. By keeping the index outside the main process, we don't need it.

I think I remember there being a bug where we don’t recover the indexstore-db after a crash. If I remember correctly, what we’re supposed to do is, if there is no saved directory, check if we have one that matches a pid, check if that process is still alive, and if not, page that indexstore-db in. We should just fix this issue. If you would be interested to look at it, that would be amazing!

(I'd like to make SymbolOccurrence conform to Sendable. ` Are you open to that change?)

Yes, I think that makes sense. If you open a PR, could you also make the related types like Symbol etc. as Sendable?

@bnbarham
Copy link
Contributor

bnbarham commented Jun 5, 2025

I think I remember there being a bug where we don’t recover the indexstore-db after a crash

This is unfortunately intentional, as it's possible it could be in a corrupted state after a crash. I'd expect creating the database for a small indexstore to be fairly fast regardless though.

@moshe-foreflight
Copy link
Author

moshe-foreflight commented Jun 6, 2025

Out of curiosity: How long does it take to load the saved index? I would expect it to take significantly less than a second,

  • A saved index loads in under a second - usually. (I've seen it take just over a second, but that may have been when I was running in a debugger or alongside a resource-intensive video-call.)
  • A fresh load takes anywhere between 30 and 50 seconds
Loading Saved Index Loading Unsaved Index
Screenshot 2025-06-06 at 9 55 57 AM Screenshot 2025-06-06 at 9 56 52 AM

especially if you use a small-ish project for local development testing.

When you say "small-ish" what do you have in mind? We have 470162 symbols in 48644 units without filtering system symbols like NSObject.

FWIW, according to cloc we have 1448685 lines of code
Language files blank comment code
Objective-C 4840 204461 91610 798253
Swift 5875 121709 91897 582341
C++ 358 10591 8008 53676
C 28 2784 2292 14415
SUM 11101 339545 193807 1448685

If you would be interested to look at it, that would be amazing!

If you have a code-pointer, if Ben agrees (see his followup below) and my manager approves, I'm happy to look.

Yes, I think that makes sense. If you open a PR, could you also make the related types like Symbol etc. as Sendable?

Sure. Some of those already are Sendable but for the missing ones, I can do that.

@moshe-foreflight
Copy link
Author

@ahoppen I've opened #251 to add sendability. (Also addresses #250.)

@ahoppen
Copy link
Member

ahoppen commented Jun 10, 2025

If you have a code-pointer, if Ben agrees (see his followup below) and my manager approves, I'm happy to look.

Sorry, I missed something as @bnbarham pointed out. Let’s not change it then.

And thanks for opening the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants