Skip to content

Monomorphized generics #44

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
nikic opened this issue Jan 22, 2020 · 2 comments
Open

Monomorphized generics #44

nikic opened this issue Jan 22, 2020 · 2 comments

Comments

@nikic
Copy link
Collaborator

nikic commented Jan 22, 2020

Because quite a few people suggested this and @jasny started implementation work on it, I want to discuss this possible alternative implementation approach.

First, to make it clear what I'm talking about:

With reified generics, there is a single class entry representating a parameterized type, and each class reference stores a pointer to the class entry, as well as a list of type arguments. If you have Pair<A, B> there will be one class entry for Pair, which is referenced by Pair<int, int>, Pair<int, string> etc.

With monomorphized generics, we instead generate a new class entry for each combination of type arguments. If you have Pair<A, B>, then Pair<int, int> and Pair<int, string> will generate two separate class entries starting from a common template for Pair.

In languages like C++, one major advantage of monomorphization is that code can be aggressively optimized by specializing it to specific types. While the same is theoretically possible in PHP as well, the optimization potential is many orders of magnitude smaller. The major disadvantage of monomorphization is that it increases code size. Generating an excessively large amount of code with templates, resulting in long compile-times and large binaries is a well known problem in C++.

For PHP, I believe the main appeal of monomorphizing is twofold: First, it may be faster, because generic types can be fully resolved and do not need to be looked up. Second, it seems to simplify the implementation. The reason is that we can keep everything using zend_class_entry*.

I think that things are not quite as simple as that though. In particular, it is not possible to arrange those generic class entries into an accurate inheritance hierarchy as soon as variant type parameters come into play:

class Traversable<out T> { ... }

class A {}
class B extends A {}

// Traversable<B> instanceof Traversable<A> must be true.

Here, the monomorphized class entry for Traversable<B> must be a child of Traversable<A>. More problematically, Traversable<X> must be a subtype of Traversable<X|anything_else_here>, and there is a potentially infinite number of such types. Of course, this also forms a lattice, not a simple chain. As such, subtype checking (which is involved everywhere from instanceof, over type checks to inheritance checks) will not be able to work on a simple "class entries only" model anyway.

The other consideration is memory usage, and caching. Copying the entire class including opcodes for each type combination would skyrocket memory usage. Even if we don't copy opcodes (which means that references to generic parameters in them are not early bound), just copying all the method etc entries would also need a significant amount of memory.

Additionally, it is not clear how those monomorphized class entries would be cached. Creating them for each type combination and resolving all types in them should have some pretty hefty overhead. However, as the monomorphized class entry would depend on runtime state (such as the parent class), it would not be possible to place it into opcache SHM. This would only be possible as part of preloading, where having these kinds of dependencies is legal. But we can't make preloading a requirement to make this at least somewhat efficient.

@lisachenko
Copy link

... the monomorphized class entry for Traversable<B> must be a child of Traversable<A>. More problematically, Traversable<X> must be a subtype of Traversable<X|anything_else_here>, and there is a potentially infinite number of such types.

We can follow Java way https://docs.oracle.com/javase/tutorial/java/generics/inheritance.html

Given two concrete types A and B (for example, Number and Integer), MyClass<A> has no relationship to MyClass<B>, regardless of whether or not A and B are related. The common parent of MyClass<A> and MyClass<B> is object.

изображение

We can subtype a generic class or interface by extending or implementing it. The relationship between the type parameters of one class or interface and the type parameters of another are determined by the extends and implements clauses.

Using the Collections classes as an example, ArrayList<E> implements List<E>, and List<E> extends Collection<E>. So ArrayList<String> is a subtype of List<String>, which is a subtype of Collection<String>. So long as you do not vary the type argument, the subtyping relationship is preserved between the types.

изображение

Now imagine we want to define our own list interface, PayloadList, that associates an optional value of generic type P with each element. Its declaration might look like:

interface PayloadList<E,P> extends List<E> {
  void setPayload(int index, P val);
  ...
}

The following parameterizations of PayloadList are subtypes of List<String>:

  • PayloadList<String,String>
  • PayloadList<String,Integer>
  • PayloadList<String,Exception>

изображение

@nikic
Copy link
Collaborator Author

nikic commented Jan 23, 2020

@lisachenko I think that having variant type parameters is a pretty important feature of generics in PHP, especially for the Traversable case I mentioned. Assuming that we define

class Traversable<out T = mixed> { ... }

with T being a covariant parameter, we then have that Traversable<int> instanceof Traversable<mixed> instanceof Traversable. This means that you can use a Traversable<int> wherever a Traversable (which is same as Traversable<mixed>) is expected. This makes it much easier to make code that uses generics and code that doesn't to interoperate. Without the covariant type parameter, Traversable<int> and Traversable would be completely unrelated classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants