Description
Our split function currently has the following signature:
split (sep: seq<'Collection>) (source: 'Collection) : seq<'Collection>
This is very convenient for sequences/streams, since its implementation supports lazy evaluation.
However, the result is not pattern-match ready, as an array or a list would be.
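To illustrate the pattern-matching point, here is a minimal sketch using only F# core functions. The name describe and the sample data are hypothetical; the seq value merely stands in for something split might return. List patterns cannot be applied to a seq directly, so the caller has to convert first:

```fsharp
// Hypothetical stand-in for a result of split: a lazy sequence of pieces.
let parts : seq<string> = seq { yield "key"; yield "value" }

// A seq has no list-style patterns, so it is converted to a list first.
let describe (pieces: seq<string>) : string =
    match List.ofSeq pieces with
    | [k; v] -> sprintf "%s=%s" k v
    | _ -> "unexpected shape"

let described = describe parts
```

With a list or an array as the result type, the List.ofSeq step would not be needed.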
It's possible to generalize the function a bit more by making it bigeneric. By generalizing the seq type, we get something like:
split (sep: 'GenericCollection<'Collection>) (source: 'Collection) : 'GenericCollection<'Collection>
This is still type-inference friendly, in the sense that the first parameter and the result are of the same type, although a bit less than before, where the type of source could be inferred from sep.
The problem I see with this approach is that there's really no inherent relationship between the separators and the result. Let's take some use cases to show this:
- Lazy sources: they have to return a seq, but the separator doesn't need to be lazy; it won't actually be lazily evaluated. It's rather the source that has to be lazy. So the requirement for this use case is something like:
split (sep: 'GenericCollection1<'Collection1>) (source: 'LazyCollection) : 'LazyCollectionGeneric<'Collection2>
So, we're tempted to define it like:
split (sep: 'Collection<'CollectionGeneric>) (source: 'CollectionGeneric) : 'CollectionGeneric<'Collection>
But requiring a generic collection as a source won't work with lazy non-generic types, like bytestreams.
This takes us to another case.
- Strings: the result type doesn't need to be lazy. Again, this points us to a relationship between the source and the enclosing type of the result, but the problem is that they will have different generic requirements.
- Sets: we can specify the separators in a set. There's no problem with this; on the contrary, it really makes sense, as there is no point in having the same separator twice. But this doesn't imply that the enclosing result type has to be a set. Having a set as the enclosing result type could be desirable in a scenario where we don't want duplicated results, but that's not implied by the enclosing type of the separator.
- NonEmptyList: we can specify the separators in a NEL. This makes sense, as the separator list is not expected to be empty and the result will always contain at least one element (the original sequence).
Finally, it's worth noting that a true bigeneric approach would require some sort of rules. At the moment we don't have rules over Collection (though I think we can consider all types that support an ofSeq - toSeq roundtrip), but we'll need some abstraction for the enclosing type, otherwise we won't be able to code the default.
Conclusions:
The way I see it there are two options:
- Although there is no relationship between the container of the separators and the container of the results, we can make them the same. There would be no conflicts, and this way we don't need to introduce a third generic parameter.
- A simpler approach would be to divide split into two functions: a lazy one and a strict one. In this case, apart from coming up with different names, we'll need to choose a fixed type for the strict version (array could be efficient for large structures with random access, but lists are a bit more pattern-match friendly).
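As a rough sketch of the second option, and only under simplifying assumptions, the pair of functions could look like the following. The names splitLazy and splitStrict are hypothetical, the code is specialized to strings rather than generic collections, empty separators are ignored, and list is picked arbitrarily as the fixed strict type. It is not the library's implementation, just an illustration of the shape of the two signatures:

```fsharp
open System

/// Lazy variant (hypothetical name): pieces are produced on demand.
let splitLazy (seps: seq<string>) (source: string) : seq<string> =
    // Assumption: empty separators are dropped, since they can never advance the scan.
    let seps = seps |> Seq.filter (fun s -> s <> "") |> Seq.toArray
    // State: Some start = keep scanning from index 'start'; None = finished.
    Some 0
    |> Seq.unfold (fun state ->
        match state with
        | None -> None
        | Some start ->
            // Earliest occurrence of any separator at or after 'start'.
            let hit =
                seps
                |> Array.choose (fun s ->
                    match source.IndexOf(s, start, StringComparison.Ordinal) with
                    | -1 -> None
                    | i -> Some (i, s.Length))
                |> Array.sortBy fst
                |> Array.tryHead
            match hit with
            | Some (i, len) -> Some (source.Substring(start, i - start), Some (i + len))
            | None -> Some (source.Substring start, None))

/// Strict variant (hypothetical name): fixed list result, ready for pattern matching.
let splitStrict (seps: seq<string>) (source: string) : string list =
    splitLazy seps source |> Seq.toList

// Usage: the strict result can be matched directly.
let pairText =
    match splitStrict [","; ";"] "key,value" with
    | [k; v] -> sprintf "%s -> %s" k v
    | _ -> "no match"
```

Here the strict version is simply built on top of the lazy one, so the only real design decisions left are the two names and the fixed result type.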