Make split function bigeneric #102

Our ``split`` function currently has the following signature:

```
split (sep: seq<'Collection>) (source: 'Collection) : seq<'Collection>
```
This is very convenient for sequences/streams, since its implementation supports lazy evaluation.

However, the result is not pattern-match ready, as it would be with an array or a list.
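To make this concrete: F# list patterns only apply to lists, so a ``seq`` result has to be materialized before it can be matched. A minimal sketch, where ``describe`` is a hypothetical consumer of a ``split`` result rather than anything in the library:

```fsharp
// Hypothetical consumer of a split result. A seq<string> cannot be matched
// against list patterns directly, so it is first materialized with List.ofSeq.
let describe (parts: seq<string>) : string =
    match List.ofSeq parts with
    | [] -> "empty"
    | [ single ] -> sprintf "one part: %s" single
    | [ key; value ] -> sprintf "%s = %s" key value
    | _ -> "more than two parts"
```

With a list (or an array, using ``[| a; b |]`` patterns) as the result type, the ``List.ofSeq`` copy would not be needed.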

It's possible to generalize the function further by making it bigeneric. Generalizing the `seq` type, we get something like:

```
split (sep: 'GenericCollection<'Collection>) (source: 'Collection) : 'GenericCollection<'Collection>
```

This is still type-inference friendly, in the sense that the first parameter and the return type are the same type, though a bit less so than before, where the ``source`` could be inferred from the ``sep``.

The problem I see with this approach is that there is no inherent relationship between the separators and the result. Let's look at some use cases that show this:

 - Lazy sources: they have to return a ``seq``, but the separator doesn't need to be lazy; in fact it won't be lazily evaluated. It's the source that has to be lazy. So the requirement for this use case is something like:

```
split (sep: 'GenericCollection1<'Collection1>) (source: 'LazyCollection) : 'LazyCollectionGeneric<'Collection2>
```
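A minimal F# sketch of this lazy case, under some assumptions: separators are given as lists of elements, the source is any ``seq<'T>``, and ``splitLazy`` is a hypothetical name, not the current ``split``. It illustrates that the separators can come in any container (they are read eagerly, once), while only the source is consumed lazily, through a lookahead buffer no longer than the longest separator:

```fsharp
/// Hypothetical lazy split: separators may come in any container and are read
/// eagerly once; the source is pulled on demand through a small lookahead buffer.
let splitLazy (seps: seq<'T list>) (source: seq<'T>) : seq<'T list> =
    // Empty separators would match at every position, so they are ignored.
    let seps = seps |> Seq.filter (not << List.isEmpty) |> List.ofSeq
    // The buffer never needs to hold more than the longest separator (at least 1).
    let lookahead = seps |> List.fold (fun m s -> max m s.Length) 1
    seq {
        use e = source.GetEnumerator()
        let window = ResizeArray<'T>()   // elements read but not yet classified
        let current = ResizeArray<'T>()  // chunk being accumulated
        let exhausted = ref false
        // Top up the buffer from the source without reading further than needed.
        let fill () =
            while not exhausted.Value && window.Count < lookahead do
                if e.MoveNext() then window.Add(e.Current) else exhausted.Value <- true
        // True when the buffer starts with separator s.
        let startsWith (s: 'T list) =
            s.Length <= window.Count
            && (s |> List.indexed |> List.forall (fun (i, x) -> window.[i] = x))
        fill ()
        while window.Count > 0 do
            match seps |> List.tryFind startsWith with
            | Some s ->
                // Separator found: emit the finished chunk and drop the separator.
                yield List.ofSeq current
                current.Clear()
                window.RemoveRange(0, s.Length)
            | None ->
                // Not a separator: move one element into the current chunk.
                current.Add(window.[0])
                window.RemoveAt(0)
            fill ()
        // The last chunk is always emitted, so the result is never empty.
        yield List.ofSeq current
    }
```

Whether the separators come in a list, an array or a ``Set``, the result type stays ``seq<'T list>``. Because each chunk is produced as soon as its separator is seen, something like ``splitLazy [ [0] ] (Seq.initInfinite (fun i -> i % 3)) |> Seq.take 3`` terminates.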

So we're tempted to define it as ``split (sep: 'Collection<'CollectionGeneric>) (source: 'CollectionGeneric) : 'CollectionGeneric<'Collection>``

But requiring a generic collection as the source won't work with lazy non-generic types, like byte streams.

This takes us to another case.

 - Strings: the result type doesn't need to be lazy. Again, this points to a relationship between the source and the enclosing type of the result, but the problem is that they have different generic requirements: a ``string`` source is not a generic type at all.

 - Sets: we can specify the separators in a set. There's no problem with this; on the contrary, it makes a lot of sense, as there's no point in having the same separator twice. But this doesn't imply that the enclosing result type has to be a set. A set as the enclosing result type could be desirable in a scenario where we don't want duplicated results, but that isn't implied by the enclosing type of the separators.

 - NonEmptyList: we can specify the separators in a NEL. This makes sense, as the separator list is not expected to be empty, and the result will always contain at least one element (if no separator is found, that element is the original sequence).
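For the string and non-empty cases a strict result is enough. The sketch below assumes a minimal, hypothetical ``NonEmptyList`` record (a stand-in, not any library's type) and a hypothetical ``splitString`` function built on .NET's ``String.Split(string[], StringSplitOptions)``. With ``StringSplitOptions.None`` the returned array always has at least one element, which is what makes a non-empty result type safe here:

```fsharp
open System

/// Hypothetical minimal non-empty list: a head plus a possibly empty tail.
type NonEmptyList<'T> = { Head: 'T; Tail: 'T list }

let toList (nel: NonEmptyList<'T>) : 'T list = nel.Head :: nel.Tail

/// Hypothetical strict split for strings. The separators come in a NonEmptyList
/// (at least one separator) and the result is non-empty as well, because
/// String.Split with StringSplitOptions.None never returns an empty array.
let splitString (seps: NonEmptyList<string>) (source: string) : NonEmptyList<string> =
    let parts = source.Split(Array.ofList (toList seps), StringSplitOptions.None)
    match List.ofArray parts with
    | head :: tail -> { Head = head; Tail = tail }
    | [] -> failwith "unreachable: String.Split returned no elements"
```

Here the separators and the result happen to share the same enclosing type, which suits the NEL case, but nothing in the implementation requires it.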

Finally, it's worth noting that a true bigeneric approach would require some sort of rules. At the moment we don't have rules for Collection (though I think we can consider all types that support an ``ofSeq``/``toSeq`` roundtrip), but we'll also need some abstraction for the enclosing type; otherwise we won't be able to code the default implementation.
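One way to picture these rules, as a sketch only: write the default once against lists, and pass ``toSeq``/``ofSeq`` for the element collection plus an ``ofParts`` builder that stands in for the missing abstraction over the enclosing type. ``splitDefault``, ``isPrefixOf`` and the parameter names are hypothetical, and passing the functions by hand is an assumption made for illustration, not whatever mechanism the library would actually use to resolve them:

```fsharp
/// True when `prefix` is a prefix of `xs`.
let rec isPrefixOf prefix xs =
    match prefix, xs with
    | [], _ -> true
    | p :: ps, x :: rest when p = x -> isPrefixOf ps rest
    | _ -> false

/// Hypothetical default split. The element collection 'C only needs a
/// toSeq/ofSeq roundtrip; the enclosing result type 'R is built by ofParts.
let splitDefault (toSeq: 'C -> seq<'T>) (ofSeq: seq<'T> -> 'C) (ofParts: seq<'C> -> 'R)
                 (seps: seq<'C>) (source: 'C) : 'R =
    // Work on lists internally, ignoring empty separators.
    let seps =
        seps |> Seq.map (toSeq >> List.ofSeq) |> Seq.filter (not << List.isEmpty) |> List.ofSeq
    // current is built in reverse; acc holds finished parts in reverse order.
    let rec go current rest acc =
        match rest with
        | [] -> List.rev (List.rev current :: acc)
        | x :: tail ->
            match seps |> List.tryFind (fun s -> isPrefixOf s rest) with
            | Some s -> go [] (List.skip (List.length s) rest) (List.rev current :: acc)
            | None -> go (x :: current) tail acc
    go [] (List.ofSeq (toSeq source)) []
    |> List.map (fun part -> ofSeq (Seq.ofList part))
    |> Seq.ofList
    |> ofParts

// Usage: a string split into a list of strings.
let words =
    splitDefault (fun (s: string) -> s :> seq<char>) (fun cs -> System.String(Array.ofSeq cs)) List.ofSeq [ " " ] "a b c"
// words = ["a"; "b"; "c"]

// Usage: int arrays split into an array of arrays, or into a set to drop duplicate parts.
let chunks = splitDefault Array.toSeq Array.ofSeq Array.ofSeq [ [| 0 |] ] [| 1; 0; 2; 3 |]
// chunks = [| [| 1 |]; [| 2; 3 |] |]
let uniqueParts = splitDefault Array.toSeq Array.ofSeq Set.ofSeq [ [| 0 |] ] [| 1; 0; 1 |]
// uniqueParts contains a single element, [| 1 |]
```

The last usage is the set scenario from above: the enclosing result type is chosen independently of the separators' container.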


Conclusions:

The way I see it, there are two options:

 - Although there is no relationship between the container of the separators and the container of the results, we can make them the same: there would be no conflicts, and this way we don't need to introduce a third generic parameter.

 - A simpler approach would be to split ``split`` into two functions: a lazy one and a strict one. In this case, apart from coming up with different names, we'll need to choose a fixed type for the strict version (arrays could be efficient for large structures with random access, but lists are a bit more pattern-match friendly).
