Skip to content

Add Code.Fragment.lines/1 #14493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 28, 2025
Merged

Add Code.Fragment.lines/1 #14493

merged 4 commits into from
May 28, 2025

Conversation

josevalim
Copy link
Member

Inspired by String#lines in Ruby and string.splitlines(True) in Python.

defp lines(<<?\n, rest::binary>>, acc),
do: [<<acc::binary, ?\n>> | lines(rest, <<>>)]

defp lines(<<char, rest::binary>>, acc),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we handle /r as well? the python version does:

"asd\radf".splitlines(True)
['asd\r', 'adf']

Side note: LSP document synchronisation treats /r as legit newlines as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Ruby does one not and I am skeptical about doing so because it is clearly not a newline on terminals:

$ iex
Erlang/OTP 27 [erts-15.1.2] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1] [jit]

Interactive Elixir (1.19.0-dev) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> IO.puts "foo\rbar"
bar
:ok

So treating it as a newline looks very Windows centric. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps support the Unicode Line Separator? That would seem consistent with String.split/1 which uses the Unicode definition of whitespace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So treating it as a newline looks very Windows centric. :)

Wasn't it classic MacOS?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps support the Unicode Line Separator? That would seem consistent with String.split/1 which uses the Unicode definition of whitespace.

python does support it

"asd
adf".splitlines(True)
['asd\u2028', 'adf']

Copy link
Contributor

@lukaszsamson lukaszsamson May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python has it's own rules https://docs.python.org/3/library/stdtypes.html#str.splitlines basing on https://peps.python.org/pep-0278/ and https://peps.python.org/pep-3116/

And they quite recently added \v and \f to list of line boundaries (in 3.2)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then you have Erlang's interpretation of newlines, which only considers \r\n and \n as well:

~/OSS/elixir[jv-string-lines *%]$ cat example | elixir -e "IO.stream() |> Enum.each(&IO.inspect/1)"
"LINE\n"
"LINE\rLINE\n"
<<76, 73, 78, 69, 12, 76, 73, 78, 69, 11, 76, 73, 78, 69, 194, 133, 76, 73, 78,
  69, 226, 128, 168, 76, 73, 78, 69, 226, 128, 169, 76, 73, 78, 69>>

So I am thinking the best is to find a new home for this function. Perhaps the Code module indeed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still use Textmate often and it shows something very similar to Zed.

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's even more conservative as it only shows \n but I assume there is a configuration somewhere for it to read it Windows style.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback. I have renamed it to Code.lines/1, allowing us to mirror what the Elixir compiler does.

@josevalim josevalim changed the title Add String.lines/1 Add Code.lines/1 May 17, 2025
Comment on lines 331 to 332
considered, namely `\r\n` and `\n`. If you would like the retrieve
lines without their line endings, use `String.split(string, ["\r\n", "\n"])`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like the disclaimer about ending will narrow the use case for this quite a bit?
Perhaps we could add a trim: true option later to remove the ending, in which case it would make it more generally useful.
Which kind of use cases do we have in mind for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use case is to fetch a range from a source file but preserving its original source code. I can add trim: true but trim typically means trimming more (spaces, tabs, etc). It is also something folks can do by doing another pass on the data.

@josevalim
Copy link
Member Author

I am thinking about moving this to Code.Fragment instead... as Code is quite bloated already and it is usually about evaluating code (not treating Code as text).

@josevalim josevalim changed the title Add Code.lines/1 Add Code.Fragment.lines/1 May 28, 2025
@josevalim josevalim merged commit 7f1fe47 into main May 28, 2025
24 checks passed
@josevalim josevalim deleted the jv-string-lines branch May 28, 2025 09:04
@josevalim
Copy link
Member Author

💚 💙 💜 💛 ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants