Added OriginalMarkdown Property #2737

joejoe87577 · 2019-01-10T08:18:03Z

Added logic for OriginalMarkdown in different block types

PR Type

What kind of change does this PR introduce?
Feature

What is the current behavior?

As described in #2729

What is the new behavior?

Added new property OriginalMarkdown to MarkdownBlock and added logic to extract the original markdown from the parsed string in all block types.

PR Checklist

Please check if your PR fulfills the following requirements:

Tested code with current supported SDKs
Sample in sample app has been added / updated (for bug fixes / features)
- Icon has been created (if new sample) following the Thumbnail Style Guide and templates
Pull Request has been submitted to the documentation repository instructions. Link:
Tests for the changes have been added (for bug fixes / features) (if applicable)
Header has been added to all new source files (run build/UpdateHeaders.bat)
Contains NO breaking changes

Other information

Added logic for OriginalMarkdown in different block types

dnfclas · 2019-01-10T08:18:14Z

All CLA requirements met.

dnfclas · 2019-01-10T08:18:14Z

Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

❌ joejoe87577 sign now
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

nmetulev · 2019-01-14T23:29:02Z

Looks good to me. The only question/concern I have is the additional memory used by the new property. Is there a way to use a reference to a substring instead, using span maybe?

joejoe87577 · 2019-01-15T08:06:11Z

@nmetulev I've not yet used Span<T> so far. Please correct me if I'm wrong here.
My (very simple) understanding is, that Span<T> is a new implementation to iterate, sort or splice arrays with a lower memory footprint than traditional methods..

Your idea would be to store the resulting string as char[] made with Span<T> from the parsed markdown or to substring the parsed markdown with a Span<T> and store it as a String?

windowstoolkitbot · 2019-01-30T00:00:04Z

This PR seems inactive. @joejoe87577 Do you need help to complete this issue?

joejoe87577 · 2019-01-30T07:01:24Z

I'm just waiting for another review as this pr needs two reviews.

nmetulev · 2019-02-01T17:49:33Z

Sorry for the late response @joejoe87577. I just realized that Span is not available in .NET Standard 2.0 so we probably can't use it. However, I'm still wondering if there is a way to avoid having to keep a copy of the string in each element and instead generating it as needed. For example, what if we keep a reference as the range in the original string only and generate the string in ToString()?

michael-hawker · 2019-02-01T17:58:21Z

We should also add some Unit Tests to our Markdown Unit Test project to ensure that the right results are returned as expected.

windowstoolkitbot · 2019-02-16T23:58:38Z

This PR seems inactive. @joejoe87577 Do you need help to complete this issue?

kbrons

The code works well, the only small issue is that the OriginalMarkdown property isn't filled on List children, but only on the list itself.

michael-hawker · 2020-03-17T18:03:49Z

@Sergio0694 what are your thoughts on Span and perf here and how we could optimize this?

Sergio0694 · 2020-03-17T18:38:20Z

@michael-hawker I'll have a look at the specific package and this PR and get back to you!

Also @nmetulev about Span<T> not being available on .NET Standard 2.0, as of now they're already included in the base Microsoft.Toolkit package, as I've added a reference to System.Memory in #3131. So we can definitely use them in other dependent packages as well now.

In fact, I was just looking around the markdown parser in this package the other day, and was thinking that it could definitely use some Span<T> love - right now there are a lot of substrings being created that are not necessary - a simple no-allocation slice would suffice in most cases.

For instance, most inline parsers have a line like this:

https://github.com/windows-toolkit/WindowsCommunityToolkit/blob/8d4fb2d43d6b3cfe2fea7c05c487b93ece710861/Microsoft.Toolkit.Parsers/Markdown/Inlines/BoldItalicTextInline.cs#L58

Which could basically always be replaced by a Span<T>, reducing both the execution time as well as the amount of objects the GC has to track. Granted, those are all short lived objects ending up in generation-0, but still, it'd be a nice optimization I think.

Sergio0694 · 2020-03-17T20:52:10Z

After taking a look at this PR I think we could just remove the overhead caused by allocations when setting the new OriginalMarkdown property by calling Substring, by exposing that property as a ReadOnlyMemory<char> (wrapping the area of interest) instead of a string (the substring). Using a Span<T> (or ReadOnlySpan<T>) won't work in this case, since those are ref struct types, so they can't be stored as fields in a class (that'd cause them to end up in the managed heap).

The idea would be, instead of using markdown.Substring(start, length), we'd just use markdown.AsMemory(start, length). That'd require no allocations, and users would be able to get the actual string from that if they need it, by just calling ReadOnlySpan<T>.ToString() on that property.

This is possible since the type has a special handling when the wrapped object is string, returning the actual substring the user expects, see here). If for some reason we'd instead want to expose that property as a string, we could still save the unnecessary allocations when users don't need them by storing the ReadOnlyMemory<char> instance internally and leveraging lazy loading like so:

/// <summary>
/// Gets or sets the <see cref="ReadOnlyMemory{T}"/> for <see cref="OriginalMarkdown"/>.
/// </summary>
internal ReadOnlyMemory<char> OriginalMarkdownMemory { get; set; }

private string _OriginalMarkdown;

/// <summary>
/// Gets the original Markdown the element was parsed from.
/// </summary>
public string OriginalMarkdown => _OriginalMarkdown ??= OriginalMarkdownMemory.ToString();

Note: the OriginalMarkdown getter is not thread safe in this case, but assuming that'd likely be used on the UI thread anyway it shouldn't matter, plus even in a race condition there wouldn't be crashes, the worst that could happen would just be the allocation of an unnecessary copy of the target substring, which would just be discarded. The user wouldn't notice any differences. This approach though keeps things simpler by not requiring additional synchronization primitives.

This is more of an API-design preference issue, the main point is to avoid allocations for users that don't actually need this new OriginalMarkdown property. There is technically still a minor memory overhead even with this approach, since each inline object would be 12 bytes bigger (16 bytes bigger with the lazy loaded string property) on x86 (16/24 bytes on x64), but assuming the input strings would generally be much larger, this should be negligible anyway.

michael-hawker · 2020-03-20T21:59:51Z

@Sergio0694 thanks for all your input here! I know this kind of expands the scope of this original PR for this feature, but I believe if I understand correctly your comments, but if we more broadly apply these techniques to provide the OriginalMarkdown property as a core piece of each parser, that we can improve the performance of the parsers themselves as we wouldn't be copying the strings with the substring calls everywhere, correct? However, this seems like a larger task/feature, eh?

michael-hawker · 2020-04-02T16:40:34Z

Moving out of 6.1 release as think refactoring this would be a lot of work and may be superseded by issue #3200 if we just end up adopting a more robust parser. Hopefully a library like Markdig supports this type of feature already? @MihaZupan

MihaZupan · 2020-04-02T17:10:05Z

Markdig exposes a SourceSpan on every AST element, which is just a Range (currently a user would have to call Substring themselves).
We could expose a helper method returning ReadOnlyMemory<char> for that AST element, leaving the need for a string allocation up to the user.

If this Toolkit exposes a wrapper over the AST, it is free to expose a lazily-allocated (as @Sergio0694 suggested) substring if the user requests it.

I would hold off on exposing more API until a potential migration has been prototyped.

michael-hawker · 2020-07-21T22:56:39Z

@joejoe87577 thanks for taking an initial look at his. I think we should just roll this effort into #3200, I've added your original issue as a requirement there. Are you ok with us closing this PR?

joejoe87577 · 2020-07-22T08:57:45Z

@michael-hawker yeah you can close this PR. Thanks.

Added OriginalMarkdown Property

5ccd6df

Added logic for OriginalMarkdown in different block types

Merge branch 'master' into dev

f2485a2

Bmartin2013 added parsers 🌱 improvements ✨ reviewers wanted labels Jan 14, 2019

Merge branch 'master' into dev

1a4bae5

nmetulev approved these changes Jan 14, 2019

View reviewed changes

Merge branch 'master' into dev

97363cb

Merge branch 'master' into dev

0897bdc

michael-hawker added this to the 6.0 milestone Mar 5, 2019

michael-hawker and others added 2 commits March 5, 2019 12:14

Merge branch 'master' into dev

75e7e93

Merge branch 'master' into dev

0899c6d

kbrons reviewed Mar 26, 2019

View reviewed changes

kbrons and others added 3 commits March 28, 2019 13:32

Merge branch 'master' into dev

f31887a

Merge branch 'master' into dev

0c328d7

Merge branch 'master' into dev

ac076bd

michael-hawker modified the milestones: 6.0, 6.1 Oct 29, 2019

Merge branch 'master' into dev

511972d

michael-hawker modified the milestones: 6.1, 7.0 Apr 2, 2020

Sergio0694 closed this Jul 23, 2020

Added OriginalMarkdown Property #2737

Added OriginalMarkdown Property #2737

Uh oh!

Conversation

joejoe87577 commented Jan 10, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Type

What is the current behavior?

What is the new behavior?

PR Checklist

Other information

Uh oh!

dnfclas commented Jan 10, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dnfclas commented Jan 10, 2019

Uh oh!

nmetulev commented Jan 14, 2019

Uh oh!

joejoe87577 commented Jan 15, 2019

Uh oh!

windowstoolkitbot commented Jan 30, 2019

Uh oh!

joejoe87577 commented Jan 30, 2019

Uh oh!

nmetulev commented Feb 1, 2019

Uh oh!

michael-hawker commented Feb 1, 2019

Uh oh!

windowstoolkitbot commented Feb 16, 2019

Uh oh!

kbrons left a comment

Choose a reason for hiding this comment

Uh oh!

michael-hawker commented Mar 17, 2020

Uh oh!

Sergio0694 commented Mar 17, 2020

Uh oh!

Sergio0694 commented Mar 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

michael-hawker commented Mar 20, 2020

Uh oh!

michael-hawker commented Apr 2, 2020

Uh oh!

MihaZupan commented Apr 2, 2020

Uh oh!

michael-hawker commented Jul 21, 2020

Uh oh!

joejoe87577 commented Jul 22, 2020

Uh oh!

Uh oh!

joejoe87577 commented Jan 10, 2019 •

edited

Loading

dnfclas commented Jan 10, 2019 •

edited

Loading

Sergio0694 commented Mar 17, 2020 •

edited

Loading