Skip to content
Snippets Groups Projects

Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects

Merged Robert Speicher requested to merge rs-redactor-filter into master

Related: !1014 (merged)

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • @rspeicher Approach looks good to me :)

  • Robert Speicher Added 140 commits:

    Added 140 commits:

    • da489b43...ce30eb2b - 138 commits from branch master
    • 4e43ebc3 - Remove all permission checking from Reference filters
    • e3b6c920 - WIP - Add RedactorFilter
  • @DouweM Pushed an update.

    I had one TODO that I'd appreciate your help with.

  • Note that we'll need the redactor filter everywhere where markdown is rendered, not just where the rest of the processed output has been cached, e.g. Emails, events etc.

  • Robert Speicher Added 134 commits:

    Added 134 commits:

    • e3b6c920...20491498 - 132 commits from branch master
    • 75c4f2d9 - Remove all permission checking from Reference filters
    • e94cdb71 - WIP - Add RedactorFilter
  • Robert Speicher Added 268 commits:

    Added 268 commits:

    • e94cdb71...afb2e6f4 - 266 commits from branch master
    • 454d227b - Remove all permission checking from Reference filters
    • 7f753005 - Add RedactorFilter
  • Robert Speicher Added 6 commits:

    Added 6 commits:

    • 1ec68ea9 - Default user_can_reference? to true when no attributes match
    • 8db4628b - Rescue from RecordNotFound in RedactorFilter
    • 81fc63be - Remove current_user context from markdown and gfm helpers
    • 86f706b1 - Fix GitlabMarkdownHelper spec
    • ca8d2253 - Update MarkdownFeature support class
    • 5794d65a - Add post_process method to Gitlab::Markdown
  • Robert Speicher Added 5 commits:

    Added 5 commits:

    • 21cacb36 - Add RedactorFilter specs for invalid Group and Project references
    • e5d89c10 - Remove unnecessary current_user context from filter specs
    • 4bd92e68 - Return early from markdown and gfm when text is empty
    • 3b690891 - Basic support for an Atom-specific rendering pipeline
    • 48837850 - Remove last remaining reference_only_path usages
  • @DouweM Ready for your review tomorrow. I'm not loving how it's currently handling the pipeline option.

  • Douwe Maan
    Douwe Maan @DouweM started a thread on the diff
  • 50 context.reverse_merge!(
    50 51 path: @path,
    52 pipeline: :default,
    51 53 project: @project,
    52 54 project_wiki: @project_wiki,
    53 55 ref: @ref
    54 56 )
    55 57
    56 Gitlab::Markdown.render(text, context)
    58 html = Gitlab::Markdown.render(text, context)
    59 Gitlab::Markdown.post_process(html, pipeline: context[:pipeline], user: current_user)
    57 60 end
    58 61
    59 62 # TODO (rspeicher): Remove all usages of this helper and just call `markdown`
    60 63 # with a custom pipeline depending on the content being rendered
    61 64 def gfm(text, options = {})
    • The difference between markdown and gfm is currently quite unclear. You'd think gfm would to regular markdown + GitLab-specific markdown, while markdown would just do regular markdown, but that's not what they actually do.

    • I think gfm will be removed once we change all current usages of it to markdown with appropriate pipeline: :whatever arguments. It's only there now because that custom pipeline behavior hasn't been implemented yet.

  • Douwe Maan
    Douwe Maan @DouweM started a thread on the diff
  • 119 }
    105 120
    106 text.html_safe
    121 @pipeline.to_html(html, context).html_safe
    107 122 end
    108 123
    109 124 private
    110 125
    111 def self.renderer
    112 @markdown ||= begin
    113 renderer = Redcarpet::Render::HTML.new
    114 Redcarpet::Markdown.new(renderer, redcarpet_options)
    126 # Check if a pipeline enables the `only_path` context option
    127 #
    128 # Returns Boolean
    129 def self.only_path_pipeline?(pipeline)
  • Ughhh, just realized this design will break ReferenceExtractor because it depends on each reference filter taking a :current_user context option which is now gone. And RedactorFilter can't modify the results hash.

    Edited by Robert Speicher
  • @rspeicher Oh, that's just great. Idea: move away from push_result in the reference filter itself, re-add reference-type and reference-id HTML attributes, make the pipeline [UserReferenceFilter, RedactorFilter, ReferenceGathererFilter], where ReferenceGathererFilter will gather all remaining a.gfms that weren't deleted by the RedactorFilter and put them in the :references array.

    Complexity :cry:

  • I was slightly mistaken. RedactorFilter can modify the result hash, if it's part of the same pipeline as the filter that added to it (IssueReferenceFilter, for example).

    Now my problems are (so far):

    1. How do we remove referenced users that were added to the results via an unpermitted group reference?
    2. From ProjectA, UserA references UserB, but UserB doesn't have access to ProjectA. How do we identify and remove that user?
    1. Would be way easier with my proposed ReferenceGathererFilter, since we would still have the original group reference rather than an array of users. Arguable, this setup is cleaner than messing with the results array as well.
    2. Not a concern of ReferenceExtractor, this is handled by Participable#participants.
  • So, in theory:

    • A Group reference would have data-reference-type="group" data-reference-id="1" attributes.
    • A User reference would have data-reference-type="user" data-reference-id="4".
    • Everything else (let's use Issue as an example) would have data-reference-type="issue" data-reference-id="2" data-project-id="3", where data-project-id is used to prevent an additional database lookup?
    • What does an all reference have? data-reference-type="team" data-reference-id="[project id]"?

    Then RedactorFilter does the database lookups and permission checks, and then ReferenceGathererFilter sweeps up the remaining a.gfm links into the result hash?

    Edited by Robert Speicher
  • @rspeicher:

    What does an all reference have? data-reference-type="team" data-reference-id="[project id]"?

    Doesn't even need the data-reference-id. The point is that the ReferenceGathererFilter will know what data-reference-type corresponds to what lookup and what result hash.

    Edit: I don't like how ReferenceGathererFilter will have to know about looking up all types of references this way, like users/groups/all :/ Perhaps the a.gfm could have data-reference-filter-type="user" for UserReferenceFilter, and that class would be delegated to to return all users that that a.gfm references?

    Note that I don't have this completely figured out in my head yet, I think the best idea would just to start playing around with it and see how far you get.

    Edited by Douwe Maan
  • mentioned in issue #2054 (closed)

  • @DouweM I haven't gotten anywhere with this. Would you mind taking a swing?

  • @rspeicher I hope to have a little bit of time this weekend.

  • mentioned in issue #1697 (closed)

  • Douwe Maan Added 1 commit:

    Added 1 commit:

    • 12626f02 - Return strings where expected
  • @rspeicher cracks knuckles Let's see what I can do.

  • Douwe Maan Added 1 commit:

    Added 1 commit:

    • f42cfa9e - Refactor reference gathering to use a dedicated filter
  • @rspeicher I've got something working, based on my last few comments, that's relatively clean. Please review.

  • Douwe Maan Added 1 commit:

    Added 1 commit:

    • 72afcbcd - Always allow references to the current project
  • Robert Speicher
    Robert Speicher @rspeicher started a thread on commit f42cfa9e
  • 1 require 'gitlab/markdown'
    2 require 'html/pipeline/filter'
    3
    4 module Gitlab
    5 module Markdown
    6 # HTML filter that removes references to records that the current user does
    7 # not have permission to view.
    8 #
    9 # Expected to be run in its own post-processing pipeline.
    10 #
  • 19 19 escape_once(body)
    20 20 end
    21 21
    22 gfm_body = Gitlab::Markdown.gfm(escaped_body, project: @project, current_user: current_user)
    22 user = current_user if defined?(current_user)
  • 15 15 # Results:
    16 16 # :references - A Hash of references that were found and replaced.
    17 17 class ReferenceFilter < HTML::Pipeline::Filter
    18 def initialize(*args)
    19 super
    18 def self.user_can_reference?(user, node, context)
    19 if node.has_attribute?('data-project')
  • Robert Speicher Added 1 commit:

    Added 1 commit:

    • e0f9d3a5 - Update ReferenceFilter docs
  • Douwe Maan Added 2 commits:

    Added 2 commits:

    • e6419cec - Update inline doc
    • 9d066bc9 - Raise error when a ReferenceFilter doesn't implement referenced_by
  • Douwe Maan Added 1 commit:

    Added 1 commit:

    • 5dd77358 - Pass project to RedactorFilter
  • Douwe Maan mentioned in merge request !1578 (merged)

    mentioned in merge request !1578 (merged)

  • Douwe Maan Added 1 commit:

    Added 1 commit:

    • 251e1366 - Efficiently load multiple references of one type.
  • Douwe Maan Added 1 commit:

    Added 1 commit:

    • 93fcddd7 - Allow ReferenceExtractor to efficiently load references from multiple texts at once
  • Douwe Maan Added 1 commit:

    Added 1 commit:

  • Douwe Maan Added 1 commit:

    Added 1 commit:

    • d6fb96b9 - Have Issue#participants load all users mentioned in notes using a single query
  • @rspeicher Please merge !1568 (merged) before this one.

  • Note that the last couple of commits improve reference extraction performance, but aren't directly related to RedactorFilter.

  • Douwe Maan Added 1 commit:

    Added 1 commit:

    • d313a768 - Explicitly only parse references by specified filter
  • Douwe Maan Title changed from WIP - Add RedactorFilter to [WIP] Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects

    Title changed from WIP - Add RedactorFilter to [WIP] Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects

  • 37 37
    38 38 # Be aware that this method makes a lot of sql queries.
    39 39 # Save result into variable if you are going to reuse it inside same request
    40 def participants(current_user = self.author, project = self.project)
    40 def participants(current_user = self.author, project = self.project, load_lazy_references: true)
    41 41 participants = self.class.participant_attrs.flat_map do |attr|
    42 42 meth = method(attr)
    43
    44 43 value =
    45 if meth.arity == 1 || meth.arity == -1
    46 meth.call(current_user)
    44 if attr == :mentioned_users
    45 meth.call(current_user, load_lazy_references: false)
    • I think it makes more sense to define a custom method (e.g. mentioned_users_lazy or something like that) and have that take care of lazy loading. Right now the Participable concern has extra knowledge regarding mentioned_users when the concern itself doesn't really do anything with that. I added something similar for notes in db63c369.

    • @yorickpeterse Hmm, I see where you're coming from. mentioned_users still has to be special case though, since it takes the current_user so it only returns groups the current user can see.

      Also, Participable needs to know about lazy references because it needs to load them when it's called directly, but not when it's called recursively.

    • @yorickpeterse I changed something about this, but I'm not 100% happy about it: 4a5b7718

  • Douwe Maan Added 1 commit:

    Added 1 commit:

  • Douwe Maan Title changed from [WIP] Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects to Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects

    Title changed from [WIP] Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects to Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects

  • Douwe Maan Added 1 commit:

    Added 1 commit:

    • 4a5b7718 - Participable doesn't need to know about Mentionable
  • mentioned in merge request !1578 (merged)

  • Douwe Maan Added 3 commits:

    Added 3 commits:

    • ed41333a - Use Gitlab::Markdown.render with :pipeline option rather than different methods
    • da9746e5 - Enable caching of Gitlab::Markdown rendered result
    • b093cb35 - Use Gitlab::Markdown for Asciidoc and ReferenceExtractor pipelines
  • Douwe Maan Added 184 commits:

    Added 184 commits:

  • Douwe Maan Added 1 commit:

    Added 1 commit:

    • 925dc592 - Cache rendered contents of issues, MRs and notes
  • @rspeicher I implemented more powerful pipelines and caching.

  • @yorickpeterse Please have a look at the caching and other performance tweaks.

  • Douwe Maan Added 1 commit:

    Added 1 commit:

    • 528d2823 - Remove unused Gitlab::Markdown#cached? method
  • The new classes and methods need more inline documentation and tests. Fortunately the output of Gitlab::Markdown is thoroughly tested, so we should be OK.

  • Cross-posting Slack discussion:

    rspeicher [3:54 PM] I dunno about these changes! All of the get_ class methods and all_context_transformers seem super confusing.

    douwe [3:54 PM] Hehe, yeah I agree

    douwe [3:55 PM] I think some documentation would make it more obvious how they’re supposed to be ​_used_​, but the naming is currently very confusing

    douwe [3:56 PM] It probably makes more sense to implement these pipelines using something else than a single Hash

    douwe [3:56 PM] This was easy to implement and experiment with though

    rspeicher [3:56 PM] I just think it's weird that we're accounting for all of these different types in the case statements

    douwe [3:57 PM] I don’t know, that’s just parsing a flexible syntax

    rspeicher [3:57 PM] But does it need to be that flexible? This is all internal stuff, isn't it?

    douwe [3:58 PM] True

    douwe [4:00 PM] I will probably look at making that simpler tomorrow, or you can have a crack at it. My idea is to have a Pipeline class that can refer to other Pipelines and have a method to transform the context.

    rspeicher [4:00 PM] Yeah, is there a reason we aren't doing like FullPipeline, ReferenceExtractorPipeline, PostProcessPipeline, etc?

    rspeicher [4:01 PM] as separate classes

    douwe [4:01 PM] The reason is that I’ve been hacking at this for hours and am too tired to make good design decisions like that :stuck_out_tongue:

    rspeicher [4:01 PM] Hehe, fair enough

    rspeicher [4:04 PM] Do you think there's nailed-down stuff in that MR that's ready to go, and we might be better splitting experimental stuff into a separate one?

    douwe [4:05 PM] Yeah I think so

    rspeicher [4:05 PM] It's getting on the huge side

    douwe [4:05 PM] The ReferenceGatherer stuff is pretty solid. Could probably be tweaked as well, but that’s ready for merge as far as I’m concerned.

    douwe [4:05 PM] Please do https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/1568 first though GitLab Enterprise Edition

    rspeicher [4:05 PM] Ok

    douwe [4:05 PM] I’ll split it up between stable and experimental stuff tomorrow

    rspeicher [4:06 PM] Sounds good

  • The new classes and methods need more inline documentation and tests. Fortunately the output of Gitlab::Markdown is thoroughly tested, so we should be OK.

    It's thoroughly tested with one specific case: A user that has permission to view everything that's referenced. We'll need to be sure that the "redacting" cases are accounted for.

  • @rspeicher We have a redactor_filter_spec which tests project-based and user-based access.

  • @rspeicher We have a redactor_filter_spec which tests project-based and user-based access.

    Ok, I thought we were just talking about the feature spec.

  • Douwe Maan Added 17 commits:

    Added 17 commits:

  • @rspeicher Ah, you're right. Just knowing the RedactorFilter works on its own isn't enough.

  • Douwe Maan Title changed from Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects to [WIP] Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects

    Title changed from Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects to [WIP] Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects

  • Douwe Maan Added 1 commit:

    Added 1 commit:

    • 95f0440a - Merge branch 'master' into rs-redactor-filter
  • 39 39 # Save result into variable if you are going to reuse it inside same request
    40 40 def participants(current_user = self.author, project = self.project, load_lazy_references: true)
    41 41 participants = self.class.participant_attrs.flat_map do |attr|
    42 meth = method(attr)
    43 42 value =
    44 if attr == :mentioned_users
    45 meth.call(current_user, load_lazy_references: false)
    43 if attr.respond_to?(:call)
    44 instance_exec(current_user, &attr)
    46 45 else
    47 meth.call
    46 send(attr)
    48 47 end
    • One option here would be to use something like the following:

      def participants(project = self.project, load_lazy_references: true, *args)
        participants = self.class.participant_attrs.flat_map do |attr|
          method = method(attr)
      
          if method.arity == 0
            method.call
          else
            method.call(*args)
          end
        end
      end

      The various models including the Participable concern would then have to define a method mentioned_users(current_user). This has the benefit of being able to pass multiple arguments when needed. This would also remove the need for using any Procs instead of symbols as the participant attribute names.

      Edited by yorickpeterse-staging
    • That's basically what we used before:

      self.class.participant_attrs.flat_map do |attr|
        meth = method(attr)
      
        value =
          if meth.arity == 1 || meth.arity == -1
            meth.call(current_user)
          else
            meth.call
          end
      
        participants_for(value, current_user)
      end

      I'm not sure we want or need to pass arbitrary arguments, since we don't know what the methods look like that the model including Participable sets as participant. The Proc approach moves the responsibility for calling the method the way it is supposed to the the model itself.

      Edited by Douwe Maan
  • Douwe Maan mentioned in merge request !1602 (merged)

    mentioned in merge request !1602 (merged)

  • I returned this branch to its roots and moved all the experimental pipeline and caching stuff to !1602 (merged).

  • Douwe Maan Title changed from [WIP] Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects to Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects

    Title changed from [WIP] Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects to Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects

  • mentioned in issue #2969 (closed)

  • Douwe Maan Added 80 commits:

    Added 80 commits:

  • @DouweM If you catch the builds being green before I do go ahead and merge. Bonus points if you pick it into 8-1-stable for me too. :heart:

    I'll check it periodically.

  • Douwe Maan Status changed to merged

    Status changed to merged

  • @rspeicher Done and done!

  • Douwe Maan mentioned in commit d80a4d3c

    mentioned in commit d80a4d3c

  • Douwe Maan mentioned in commit 5ad3a274

    mentioned in commit 5ad3a274

  • Robert Speicher mentioned in commit bcd89a58

    mentioned in commit bcd89a58

  • Please register or sign in to reply
    Loading