Audit logging is a critical security feature for customers and is required by many regulatory bodies. Enterprises—especially in regulated industries that need to show accurate logs of data and application access—may hesitate to adopt software that lacks it.
This is the kind of feature that adds credibility to the Enterprise version.
We are going to improve our audit events in each release.
Proposal
Audit events will be recorded in the database, not in logs (see comment).
@regisF A log file can quickly grow in size and requires non-trivial log parsing to visualise or permanently store the data. The fastest/most natural solution I can think of off the top of my head is to (a rough sketch follows below):
Track all events for a request in memory (local to the request)
At the end of a request, schedule this data using Sidekiq
Have a Sidekiq worker process this data, inserting it, maybe adding extra data in the process, etc
Flush the data once scheduled
Having Sidekiq perform the storing of the data is better than doing this in a request, because scheduling a job in Sidekiq is much faster than inserting (potentially) hundreds of rows into a database.
This is similar to how we track performance data using GitLab Performance Monitoring. Everything we measure is basically stored in an array which is then sent to InfluxDB at the end of a request.
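To make the flow above concrete, here is a minimal sketch of the buffering and scheduling. `AuditEventBuffer` and `AuditEventFlushWorker` are hypothetical names, and it assumes an `AuditEvent` ActiveRecord model exists:

```ruby
require 'sidekiq'

# Hypothetical per-request buffer; not existing GitLab code.
module AuditEventBuffer
  def self.events
    Thread.current[:audit_events] ||= []
  end

  def self.push(event)
    events << event
  end

  def self.flush!
    return if events.empty?

    # One Sidekiq job per request, carrying all of the request's events at once.
    # Note that Sidekiq serialises arguments as JSON, so hash keys arrive as strings.
    AuditEventFlushWorker.perform_async(events)
  ensure
    Thread.current[:audit_events] = nil
  end
end

class AuditEventFlushWorker
  include Sidekiq::Worker

  def perform(events)
    # Persist each event; extra data could be added here, and a bulk insert
    # could replace the per-row INSERTs if needed.
    events.each { |attributes| AuditEvent.create!(attributes) }
  end
end

# Somewhere around the request (e.g. Rack middleware or an around_action):
#   AuditEventBuffer.push(author_id: user.id, action: 'project.destroy', entity_id: project.id)
#   ...
#   AuditEventBuffer.flush!
```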
@yorickpeterse I've heard about the risk of losing jobs with Sidekiq. Losing an audit event is definitely not something that we want in this case. Is this a potential issue?
Jobs can be lost if the process dies immediately (e.g. the kernel crashes, the server explodes, etc.) or is SIGKILL'd by another process/user.
Other solutions suffer from similar (and other) problems. Log files may get rotated because they're too big, a disk could crash without being recoverable, etc. Inserting into a database can also lead to data loss if the database crashes during the write, etc.
Handling the loss of jobs is something we have been looking into, but we don't have a solution yet. Sidekiq Enterprise has a reliable job fetching system, but it's not FOSS so we can't use/ship it.
When using log files we need to:
figure out where previous versions are (so we don't suddenly lose data due to a rotation)
store the data in some kind of format and then parse it upon reading
efficiently read/parse the file so we don't load GBs of data into memory
ensure that multiple processes/threads can store data in the same log file without overwriting each other's data. This can be done using synchronisation, but this is very slow
configure all of this in Omnibus, document it, make sure it's backed up, etc, etc
figure out a way to migrate data in these files should we ever change the format in a way that's not backwards compatible
When using a database we need to:
Store the data in the database, using the tools we already have in place
Get the data from the database, using the tools we already have
Make sure we remove data older than N months so the table doesn't grow forever
In other words, when using the database we save ourselves a lot of coding trouble.
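As a rough illustration of the pruning point above, removing old data could be a scheduled worker that deletes rows in batches. The worker name and retention setting below are hypothetical, and it assumes a Rails environment with an `AuditEvent` model:

```ruby
require 'sidekiq'

# Hypothetical retention job: remove audit events older than a configurable
# number of months so the table doesn't grow forever.
class PruneOldAuditEventsWorker
  include Sidekiq::Worker

  RETENTION_MONTHS = 12 # could come from application settings instead

  def perform
    cutoff = RETENTION_MONTHS.months.ago

    # Delete in batches to avoid long-running deletes on a large table.
    loop do
      ids = AuditEvent.where('created_at < ?', cutoff).limit(10_000).pluck(:id)
      break if ids.empty?

      AuditEvent.where(id: ids).delete_all
    end
  end
end
```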
I'm for expanding the current audit events for this purpose. That is the original proposal in the issue, too. Then we can easily show these events to users in the UI, and we can also add some export functionality to satisfy global auditing requirements.
@regisF Currently, log_audit_events is a synchronous event - it is not sent to Sidekiq so there's no concern of a lost event.
I've added the Platform label, @mydigitalself. I think it relates to the platform. I'm not sure we'll have bandwidth for this in 8.17, but if we don't, we should plan it for the future.
We're going to try to target this for 9.3 (May 22nd 2017). We can't confirm quite yet whether it will make that date, but we will certainly try, as this is an important addition to GitLab EE.
There are two use cases to consider. One is an administrator trying to understand what has happened. The other is audit compliance. For audit compliance, you CANNOT (purely) store the logs in the "system". By this I mean that if the audit data is stored in GitLab's own database, it is controlled internally by GitLab and its integrity can be disputed. For audit compliance, you MUST send the data to an external source; a log file is sometimes acceptable, but what is more acceptable is being able to forward the logs to an external system such as Splunk or Logstash.
I would also suggest that if we are logging to a file that we save the data in a structured format (e.g. JSON) so that log forwarding tools can search/query this data easily.
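For illustration, a structured line could be as simple as one JSON object per line ("JSON Lines"), which tools like Splunk, Logstash and fluentd can ingest without custom parsing. This is only a sketch; the field names and log path are made up:

```ruby
require 'json'
require 'logger'
require 'time'

# Illustrative only: a logger that emits one JSON object per line.
audit_logger = Logger.new('/var/log/gitlab/audit_json.log')
audit_logger.formatter = ->(_severity, _time, _progname, payload) { "#{payload.to_json}\n" }

audit_logger.info(
  time: Time.now.utc.iso8601,
  author_id: 42,
  entity_type: 'Project',
  entity_id: 7,
  action: 'member.added',
  details: { member: 'alice', access_level: 'Developer' }
)
```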
@mydigitalself So we have to decide between storing the audit events in a log vs. the database? I think @yorickpeterse's comment linked from the description makes sense (https://gitlab.com/gitlab-org/gitlab-ee/issues/579#note_20021796) and storing them in the DB is the easiest thing to do. I thought that most customers would like to see us storing more events, not necessarily storing them outside the DB. For Geo DR, having it in the database would also be much better.
You talk about needing it externally because of audit compliance; I have a few questions:
Was this expressed by many customers?
Can we get there by just streaming the DB or snapshots of it to an external system?
Can we get there by instrumenting the Rails model for audit events, using PostgreSQL triggers, or something like that?
As a sideline to the above: I feel we have to make the right decision for both systems separately. If we feel audit logs (Mike mentioned the audit requirements to me as well) need to go in this direction, whereas we need db-level logging for disaster recovery, we should not force either into the other. That would result in a product/feature that is hard to iterate on.
That said, if we can make both work together, that would kill two birds with one stone.
I've personally only spoken to one so far, who isn't an existing customer yet, but they immediately said that for any type of banking-related regulatory requirement you cannot store audit data in the same data store the system resides in, which makes complete sense. So you will need log forwarding regardless in the banking space. @teemo also mentioned customer requirements to send the logs to their central auditing system: https://gitlab.com/gitlab-org/gitlab-ee/issues/579#note_23164332. Sending to syslog makes it pretty simple to forward to Splunk/Logstash, as it's an established mechanism.
You could, but it's additional effort when you can already use standard log tools.
To me, using syslog seems like an established, boring solution, at least for the first phase, so I'm a bit puzzled as to why we would add complexity to the approach. This audit logging is not about DR, Geo, or auditing every single action (with its content) that takes place in the system; it's about providing compliance around permissions, access and top-level events. By trying to do more, we add unnecessary requirements and complicate shipping something we can iterate on in the future.
@mydigitalself Currently we don't use syslog directly, instead we either log to STDERR or to a log file.
I think there are two different feature requests here that are being mixed together:
Being able to store a trail of operations for auditing purposes
Being able to replicate this log elsewhere
Because different users may have different requirements for feature 2, I would propose tackling feature 1 first (by storing data in the database). Replicating database data to external sources is easy, and it's much easier to visualize. Operating on log files, on the other hand (e.g. when displaying the log for a user), is a total nightmare.
For exporting the audit log I would start with something as simple as just providing an API that extracts the data from the database. This allows customers to build custom solutions (if they so desire). Based on commonly requested integrations we can then see what makes sense to support out of the box.
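As a sketch of what such an API could look like (the route, parameters and response are hypothetical, and authentication and pagination are omitted):

```ruby
require 'grape'

# Hypothetical endpoint, not an existing GitLab API: lets customers pull audit
# events and feed them into whatever external system they use.
class AuditEventsAPI < Grape::API
  format :json

  resource :audit_events do
    # GET /audit_events?created_after=2017-01-01&per_page=100
    params do
      optional :created_after, type: DateTime
      optional :per_page, type: Integer, default: 100, values: 1..1000
    end
    get do
      events = AuditEvent.order(id: :asc)
      events = events.where('created_at > ?', params[:created_after]) if params[:created_after]
      events.limit(params[:per_page])
    end
  end
end
```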
Either way, I strongly recommend against using log files as the primary source. Visualising this data, filtering it, etc, is a total pain.
@mydigitalself to clarify: this doesn't prevent us from storing the audit data in our own database, it's fine as long as it's also replicated elsewhere, right?
The arguments by @yorickpeterse sound pretty strongly in favor of a database approach, but that assumes it's relatively easy to export our audit log from a database into something that can be used by the applications typically used by enterprises. Is there some way we can validate that? Are there examples of similar approaches, or @yorickpeterse, how would such a thing look: a db-based audit log that forwards to a logging solution?
We should solve the problems of our enterprise customers with the audit log. If their problem is having auditing information in a central logging service that expects X, we should make sure X is something we can easily have as an output.
For getting the data out I can see two options:
An API that customers can use themselves to build whatever exporter they want
Some form of recurring background job that e.g. every hour takes the log data of the past hour, then formats this and sends it somewhere. For example, this job could take the logs and write them to a plain text log file. The format to use, when to export it, etc, could be adjusted in the admin panel or some config file (depending on what works best).
How we do option 2 depends a bit on what makes the most sense for our customers; I unfortunately can't really judge that.
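A rough sketch of option 2, assuming a Rails environment, an `AuditEvent` model and an hourly schedule (e.g. via sidekiq-cron); the file path and line format below are placeholders:

```ruby
require 'sidekiq'

# Hypothetical recurring export job: take the last hour of audit events, format
# them, and append them to a plain text log file that external tools can tail.
class AuditEventExportWorker
  include Sidekiq::Worker

  EXPORT_PATH = '/var/log/gitlab/audit_export.log'.freeze # illustrative default

  def perform
    since = 1.hour.ago

    File.open(EXPORT_PATH, 'a') do |file|
      AuditEvent.where('created_at >= ?', since).find_each do |event|
        file.puts("#{event.created_at.iso8601} author=#{event.author_id} " \
                  "entity=#{event.entity_type}/#{event.entity_id} details=#{event.details.to_json}")
      end
    end
  end
end
```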
@JobV @yorickpeterse my concern with the database approach quoted below is that there seem to be a lot of moving parts, with the potential for data loss.
- Track all events for a request in memory (local to the request)
- At the end of a request, schedule this data using Sidekiq
- Have a Sidekiq worker process this data, inserting it, maybe adding extra data in the process, etc
- Flush the data once scheduled
Given this is audit data, the number 1 requirement for it should be reliability - i.e. that the risk of losing data is negligible. If it's possible for the queue to be tampered with, or deleted, or if the server falls over before the data has been put onto the queue or any other imaginable situation where the integrity of the data can be called into question, then you are failing the primary requirement. Searchability, filtering, etc... are secondary requirements and are also functions that can be performed by other systems such as Kibana, Logstash, etc...
One approach could be to use fluentd, which has countless input and output plugins to send the logs off to almost anywhere.
my concern with the database approach as below is that there seem to be a lot of moving parts, with the potential for data loss to occur.
This is no different when using a log file. A write may fail, a log file may be removed, a disk may explode, a process may be killed while writing the log data. Effectively the number of possible problems is similar. We don't have to use Sidekiq for the DB approach, it was merely a suggestion. An alternative would be to insert all audit logs using a single INSERT (this would be absolutely crucial; we can't use multiple INSERTs for this as that is terrible performance-wise) at the end of a request. So here the setup would be:
Track audit events in memory during a request
At the end of a request, flush these events to the database using a single INSERT statement
Separately we can then flush this data to external services periodically
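A minimal sketch of that single-INSERT flush, assuming an `audit_events` table with the illustrative columns below:

```ruby
# Flush buffered events with one multi-row INSERT at the end of a request,
# instead of going through Sidekiq. Column names are illustrative.
def flush_audit_events(events)
  return if events.empty?

  connection = ActiveRecord::Base.connection

  rows = events.map do |event|
    values = [
      connection.quote(event[:author_id]),
      connection.quote(event[:entity_type]),
      connection.quote(event[:entity_id]),
      connection.quote(event[:details].to_json),
      connection.quote(Time.now.utc)
    ]

    "(#{values.join(', ')})"
  end

  connection.execute(<<~SQL)
    INSERT INTO audit_events (author_id, entity_type, entity_id, details, created_at)
    VALUES #{rows.join(', ')}
  SQL
end
```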
Searchability, filtering, etc... are secondary requirements and are also functions that can be performed by other systems such as Kibana, Logstash, etc
We shouldn't require customers to install Kibana and related tools just to look at their audit logs. If GitLab wants to provide a solid auditing experience we should provide ways of viewing the data, and not just ways to write the data.
This is no different when using a log file. A write may fail, a.... etc
Of course, I appreciate this can all happen, it just felt to me that we were adding more layers of complexity when a simple solution can suffice.
We shouldn't require customers to install Kibana and related tools just to look at their audit logs. If GitLab wants to provide a solid auditing experience we should provide ways of viewing the data, and not just ways to write the data.
@yorickpeterse I'm not saying that customers have to install Kibana to view their logs; they can have a simple log viewer in GitLab, the same way we have the other logs available in the admin interface. If there's a need for more sophisticated visualisation or searching, then those tools could be appropriate, rather than re-inventing the wheel.
I'll leave it entirely up to the technology team to make the decision here, I think I've expressed my opinion and if you believe that using the db for storage will be as reliable and easy to stream into external systems, then I'm fine with that.
I would like someone to evaluate fluentd, it may be a good way of pushing BOTH to a db and an external source with minimal effort.
Modify AuditEventService, which currently writes directly to the DB, to track all events for a request in memory (local to the request)
At the end of a request, schedule this data using Sidekiq in something like an AuditEventFlusherWorker
Have the Sidekiq worker process this data, iterating over an array of FLUSHERS (subclasses of something like Gitlab::BaseFlusher) and passing each the data to handle as they see fit (see the sketch after this list)
Implement a Gitlab::DatabaseFlusher which writes the events to the DB just like AuditEventService does currently
Make sure the DB table doesn't grow indefinitely by implementing a (configurable?) limit of number of months of data
Implement a Gitlab::LogFileFlusher which writes the events to application.log in a human-readable format
Replace all Gitlab::AppLogger calls with AuditEventService calls. This will likely involve changing the data model of AuditEvent and AuditEventService
Make sure "Project > Settings > Audit Events" (EE) and "Group > Settings > Audit Events" (EE) can show all types of events
Rename "Profile Settings > Audit Log" to "Authentication Log" and only show authentication AuditEvents for the current user (This replaces https://gitlab.com/gitlab-org/gitlab-ce/issues/30827, since it's not as easy as simply renaming that page.)
Add "Admin Area > Monitoring > Audit Events" to display this audit event data for the entire instance. "Admin Area > Logs > application.log" will continue to show the raw log.
For 9.4
Implement the first step from the above description, to the extent that these are not already implemented:
Authentication log (currently in GitLab CE for individual users)
Activity log (to show dashboard)
Geo log (consensus is to make this a separate thing, that makes sense)
Are we deprecating any of the above? I don't hear anyone talking about the DB audit log. But I assume we can't live without it. Should the DB and File audit log contain exactly the same types of events to make our lives simpler?
@DouweM Would we need Sidekiq, or can we somehow insert the data in-request in the most efficient way possible? Using Sidekiq we'd end up with between 20 000 and 30 000 jobs per minute (more or less equal to the number of Rails requests), permanently occupying workers in the process. I'd prefer something like the following:
We start a request
We perform a bunch of events, the audit events are stored in memory
At the end of the request we insert these events into the DB using a single INSERT
If file logging (or something else) is enabled we also write the data to the file as this is a fairly quick operation
We only use Sidekiq if we need to ship audit data to an external service, if we need this at all
@sytses Note that "DB audit log" and "Authentication log" are actually the same thing, and that we already have a "File audit log" in application.log. My proposal unifies those.
I'd like to add another use case along with what's mentioned above. I'd like to know the last time a user performed an action against any piece of the overall GitLab platform. My current problem is I don't necessarily know who is active or not. I'm fairly certain I have users that have logged in 2, 3 or even 4 years ago and have not logged in since. However they could potentially still log in as their identity attribute still maps to a valid entry in the authentication source. I also have to worry about service accounts (not people) that don't ever log into the WebUI but do git push/pulls all the time.
I would consider the following actions to count towards this last action date:
Sign into WebUI
Git push/pull via ssh (if non-public repo)
Git push/pull via https (if non-public repo)
Access API endpoints (via password or token)
Upload Docker image to GitLab Registry
Mattermost … ?? … I don’t actually use mattermost, I’m assuming it’s the same set of users? So however people access this ;)
I think this goes along with the above auditing requests? I wouldn't need it in real time, generating a report of sorts would be acceptable.
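Assuming each of the actions above ends up as an audit event with an `author_id` and a `created_at` timestamp, a "last activity" report could be a simple aggregate; the model and cutoff below are hypothetical:

```ruby
# Last recorded activity per user, based on a hypothetical AuditEvent model.
last_activity = AuditEvent.group(:author_id).maximum(:created_at)

stale_cutoff = 2.years.ago

# Users whose most recent recorded action is older than the cutoff.
stale_user_ids = last_activity
  .select { |_user_id, last_seen| last_seen < stale_cutoff }
  .keys

puts "#{stale_user_ids.size} users with no recorded activity since #{stale_cutoff.to_date}"
```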
@jameslopez Ahh thanks, didn't realize that endpoint was available. Is this data different from the "last sign-in at" date in the WebUI for a user in the admin section?