Separate rendering of Markdown reference links from redacting those the user doesn't have access to and extracting referenced objects
Related: !1014 (merged)
Merge request reports
Activity
mentioned in issue #2054 (closed)
@rspeicher thanks for working on this :) I'll check it out tonight or tomorrow
@rspeicher Approach looks good to me :)
Added 140 commits:
- da489b43...ce30eb2b - 138 commits from branch
master
- 4e43ebc3 - Remove all permission checking from Reference filters
- e3b6c920 - WIP - Add RedactorFilter
- da489b43...ce30eb2b - 138 commits from branch
@DouweM Pushed an update.
I had oneTODO
that I'd appreciate your help with.Added 134 commits:
- e3b6c920...20491498 - 132 commits from branch
master
- 75c4f2d9 - Remove all permission checking from Reference filters
- e94cdb71 - WIP - Add RedactorFilter
- e3b6c920...20491498 - 132 commits from branch
Added 6 commits:
-
1ec68ea9 - Default
user_can_reference?
to true when no attributes match -
8db4628b - Rescue from
RecordNotFound
in RedactorFilter -
81fc63be - Remove
current_user
context from markdown and gfm helpers - 86f706b1 - Fix GitlabMarkdownHelper spec
- ca8d2253 - Update MarkdownFeature support class
- 5794d65a - Add post_process method to Gitlab::Markdown
Toggle commit list-
1ec68ea9 - Default
Added 5 commits:
- 21cacb36 - Add RedactorFilter specs for invalid Group and Project references
- e5d89c10 - Remove unnecessary current_user context from filter specs
- 4bd92e68 - Return early from markdown and gfm when text is empty
- 3b690891 - Basic support for an Atom-specific rendering pipeline
-
48837850 - Remove last remaining
reference_only_path
usages
Toggle commit list@DouweM Ready for your review tomorrow. I'm not loving how it's currently handling the
pipeline
option.50 context.reverse_merge!( 50 51 path: @path, 52 pipeline: :default, 51 53 project: @project, 52 54 project_wiki: @project_wiki, 53 55 ref: @ref 54 56 ) 55 57 56 Gitlab::Markdown.render(text, context) 58 html = Gitlab::Markdown.render(text, context) 59 Gitlab::Markdown.post_process(html, pipeline: context[:pipeline], user: current_user) 57 60 end 58 61 59 62 # TODO (rspeicher): Remove all usages of this helper and just call `markdown` 60 63 # with a custom pipeline depending on the content being rendered 61 64 def gfm(text, options = {}) 119 } 105 120 106 text.html_safe 121 @pipeline.to_html(html, context).html_safe 107 122 end 108 123 109 124 private 110 125 111 def self.renderer 112 @markdown ||= begin 113 renderer = Redcarpet::Render::HTML.new 114 Redcarpet::Markdown.new(renderer, redcarpet_options) 126 # Check if a pipeline enables the `only_path` context option 127 # 128 # Returns Boolean 129 def self.only_path_pipeline?(pipeline) Yes, that's a better place for it once I nail down the API.
Edited by Robert Speicher
Ughhh, just realized this design will break
ReferenceExtractor
because it depends on each reference filter taking a:current_user
context option which is now gone. And RedactorFilter can't modify the results hash.Edited by Robert Speicher@rspeicher Oh, that's just great. Idea: move away from
push_result
in the reference filter itself, re-addreference-type
andreference-id
HTML attributes, make the pipeline[UserReferenceFilter, RedactorFilter, ReferenceGathererFilter]
, whereReferenceGathererFilter
will gather all remaininga.gfm
s that weren't deleted by theRedactorFilter
and put them in the:references
array.Complexity
I was slightly mistaken.
RedactorFilter
can modify the result hash, if it's part of the same pipeline as the filter that added to it (IssueReferenceFilter
, for example).Now my problems are (so far):
- How do we remove referenced users that were added to the
result
s via an unpermitted group reference? - From ProjectA, UserA references UserB, but UserB doesn't have access to ProjectA. How do we identify and remove that user?
- How do we remove referenced users that were added to the
- Would be way easier with my proposed
ReferenceGathererFilter
, since we would still have the original group reference rather than an array of users. Arguable, this setup is cleaner than messing with the results array as well. - Not a concern of
ReferenceExtractor
, this is handled byParticipable#participants
.
- Would be way easier with my proposed
So, in theory:
- A Group reference would have
data-reference-type="group" data-reference-id="1"
attributes. - A User reference would have
data-reference-type="user" data-reference-id="4"
. - Everything else (let's use Issue as an example) would have
data-reference-type="issue" data-reference-id="2" data-project-id="3"
, wheredata-project-id
is used to prevent an additional database lookup? - What does an
all
reference have?data-reference-type="team" data-reference-id="[project id]"
?
Then
RedactorFilter
does the database lookups and permission checks, and thenReferenceGathererFilter
sweeps up the remaininga.gfm
links into the result hash?Edited by Robert Speicher- A Group reference would have
What does an all reference have?
data-reference-type="team"
data-reference-id="[project id]"
?Doesn't even need the
data-reference-id
. The point is that theReferenceGathererFilter
will know whatdata-reference-type
corresponds to what lookup and what result hash.Edit: I don't like how
ReferenceGathererFilter
will have to know about looking up all types of references this way, like users/groups/all :/ Perhaps thea.gfm
could havedata-reference-filter-type="user"
forUserReferenceFilter
, and that class would be delegated to to return all users that thata.gfm
references?Note that I don't have this completely figured out in my head yet, I think the best idea would just to start playing around with it and see how far you get.
Edited by Douwe Maanmentioned in issue #2054 (closed)
@DouweM I haven't gotten anywhere with this. Would you mind taking a swing?
@rspeicher I hope to have a little bit of time this weekend.
@DouweM Ping.
mentioned in issue #1697 (closed)
Added 1 commit:
- 12626f02 - Return strings where expected
@rspeicher cracks knuckles Let's see what I can do.
Added 1 commit:
- f42cfa9e - Refactor reference gathering to use a dedicated filter
@rspeicher I've got something working, based on my last few comments, that's relatively clean. Please review.
Added 1 commit:
- 72afcbcd - Always allow references to the current project
1 require 'gitlab/markdown' 2 require 'html/pipeline/filter' 3 4 module Gitlab 5 module Markdown 6 # HTML filter that removes references to records that the current user does 7 # not have permission to view. 8 # 9 # Expected to be run in its own post-processing pipeline. 10 # 19 19 escape_once(body) 20 20 end 21 21 22 gfm_body = Gitlab::Markdown.gfm(escaped_body, project: @project, current_user: current_user) 22 user = current_user if defined?(current_user) 15 15 # Results: 16 16 # :references - A Hash of references that were found and replaced. 17 17 class ReferenceFilter < HTML::Pipeline::Filter 18 def initialize(*args) 19 super 18 def self.user_can_reference?(user, node, context) 19 if node.has_attribute?('data-project') Added 1 commit:
- e0f9d3a5 - Update ReferenceFilter docs
@rspeicher Ping
Added 1 commit:
- 5dd77358 - Pass project to RedactorFilter
mentioned in merge request !1578 (merged)
Added 1 commit:
- 251e1366 - Efficiently load multiple references of one type.
Added 1 commit:
- 93fcddd7 - Allow ReferenceExtractor to efficiently load references from multiple texts at once
Added 1 commit:
- cd2583a3 - Code cleanup
Added 1 commit:
- d6fb96b9 - Have Issue#participants load all users mentioned in notes using a single query
@rspeicher Please merge !1568 (merged) before this one.
Added 1 commit:
- d313a768 - Explicitly only parse references by specified filter
37 37 38 38 # Be aware that this method makes a lot of sql queries. 39 39 # Save result into variable if you are going to reuse it inside same request 40 def participants(current_user = self.author, project = self.project) 40 def participants(current_user = self.author, project = self.project, load_lazy_references: true) 41 41 participants = self.class.participant_attrs.flat_map do |attr| 42 42 meth = method(attr) 43 44 43 value = 45 if meth.arity == 1 || meth.arity == -1 46 meth.call(current_user) 44 if attr == :mentioned_users 45 meth.call(current_user, load_lazy_references: false) I think it makes more sense to define a custom method (e.g.
mentioned_users_lazy
or something like that) and have that take care of lazy loading. Right now the Participable concern has extra knowledge regardingmentioned_users
when the concern itself doesn't really do anything with that. I added something similar for notes in db63c369.@yorickpeterse Hmm, I see where you're coming from.
mentioned_users
still has to be special case though, since it takes thecurrent_user
so it only returns groups the current user can see.Also,
Participable
needs to know about lazy references because it needs to load them when it's called directly, but not when it's called recursively.@yorickpeterse I changed something about this, but I'm not 100% happy about it: 4a5b7718
Added 1 commit:
- 61d8f961 - Fix specs
Added 1 commit:
- 4a5b7718 - Participable doesn't need to know about Mentionable
mentioned in merge request !1578 (merged)
Added 184 commits:
-
b093cb35...0b9273a2 - 183 commits from branch
master
- 539de0dd - Merge branch 'master' into rs-redactor-filter
-
b093cb35...0b9273a2 - 183 commits from branch
Added 1 commit:
- 925dc592 - Cache rendered contents of issues, MRs and notes
@rspeicher I implemented more powerful pipelines and caching.
@yorickpeterse Please have a look at the caching and other performance tweaks.
Added 1 commit:
- 528d2823 - Remove unused Gitlab::Markdown#cached? method
Cross-posting Slack discussion:
rspeicher [3:54 PM] I dunno about these changes! All of the
get_
class methods andall_context_transformers
seem super confusing.douwe [3:54 PM] Hehe, yeah I agree
douwe [3:55 PM] I think some documentation would make it more obvious how they’re supposed to be _used_, but the naming is currently very confusing
douwe [3:56 PM] It probably makes more sense to implement these pipelines using something else than a single Hash
douwe [3:56 PM] This was easy to implement and experiment with though
rspeicher [3:56 PM] I just think it's weird that we're accounting for all of these different types in the
case
statementsdouwe [3:57 PM] I don’t know, that’s just parsing a flexible syntax
rspeicher [3:57 PM] But does it need to be that flexible? This is all internal stuff, isn't it?
douwe [3:58 PM] True
douwe [4:00 PM] I will probably look at making that simpler tomorrow, or you can have a crack at it. My idea is to have a Pipeline class that can refer to other Pipelines and have a method to transform the context.
rspeicher [4:00 PM] Yeah, is there a reason we aren't doing like
FullPipeline
,ReferenceExtractorPipeline
,PostProcessPipeline
, etc?rspeicher [4:01 PM] as separate classes
douwe [4:01 PM] The reason is that I’ve been hacking at this for hours and am too tired to make good design decisions like that
rspeicher [4:01 PM] Hehe, fair enough
rspeicher [4:04 PM] Do you think there's nailed-down stuff in that MR that's ready to go, and we might be better splitting experimental stuff into a separate one?
douwe [4:05 PM] Yeah I think so
rspeicher [4:05 PM] It's getting on the huge side
douwe [4:05 PM] The ReferenceGatherer stuff is pretty solid. Could probably be tweaked as well, but that’s ready for merge as far as I’m concerned.
douwe [4:05 PM] Please do https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/1568 first though GitLab Enterprise Edition
rspeicher [4:05 PM] Ok
douwe [4:05 PM] I’ll split it up between stable and experimental stuff tomorrow
rspeicher [4:06 PM] Sounds good
The new classes and methods need more inline documentation and tests. Fortunately the output of Gitlab::Markdown is thoroughly tested, so we should be OK.
It's thoroughly tested with one specific case: A user that has permission to view everything that's referenced. We'll need to be sure that the "redacting" cases are accounted for.
@rspeicher We have a
redactor_filter_spec
which tests project-based and user-based access.@rspeicher We have a redactor_filter_spec which tests project-based and user-based access.
Ok, I thought we were just talking about the feature spec.
Added 17 commits:
-
528d2823...123669a5 - 16 commits from branch
master
- ef44138c - Merge branch 'master' into rs-redactor-filter
-
528d2823...123669a5 - 16 commits from branch
@rspeicher Ah, you're right. Just knowing the
RedactorFilter
works on its own isn't enough.Added 1 commit:
- 95f0440a - Merge branch 'master' into rs-redactor-filter
39 39 # Save result into variable if you are going to reuse it inside same request 40 40 def participants(current_user = self.author, project = self.project, load_lazy_references: true) 41 41 participants = self.class.participant_attrs.flat_map do |attr| 42 meth = method(attr) 43 42 value = 44 if attr == :mentioned_users 45 meth.call(current_user, load_lazy_references: false) 43 if attr.respond_to?(:call) 44 instance_exec(current_user, &attr) 46 45 else 47 meth.call 46 send(attr) 48 47 end One option here would be to use something like the following:
def participants(project = self.project, load_lazy_references: true, *args) participants = self.class.participant_attrs.flat_map do |attr| method = method(attr) if method.arity == 0 method.call else method.call(*args) end end end
The various models including the
Participable
concern would then have to define a methodmentioned_users(current_user)
. This has the benefit of being able to pass multiple arguments when needed. This would also remove the need for using any Procs instead of symbols as the participant attribute names.Edited by yorickpeterse-stagingThat's basically what we used before:
self.class.participant_attrs.flat_map do |attr| meth = method(attr) value = if meth.arity == 1 || meth.arity == -1 meth.call(current_user) else meth.call end participants_for(value, current_user) end
I'm not sure we want or need to pass arbitrary arguments, since we don't know what the methods look like that the model including
Participable
sets asparticipant
. The Proc approach moves the responsibility for calling the method the way it is supposed to the the model itself.Edited by Douwe Maan
mentioned in merge request !1602 (merged)
I returned this branch to its roots and moved all the experimental pipeline and caching stuff to !1602 (merged).
mentioned in issue #2969 (closed)
Added 80 commits:
-
95f0440a...bd3689e9 - 79 commits from branch
master
- 34148d15 - Merge branch 'master' into rs-redactor-filter
-
95f0440a...bd3689e9 - 79 commits from branch
@DouweM If you catch the builds being green before I do go ahead and merge. Bonus points if you pick it into
8-1-stable
for me too.I'll check it periodically.
@rspeicher Done and done!
mentioned in commit d80a4d3c
mentioned in commit 5ad3a274
mentioned in commit bcd89a58