How to measure how actively users cohorts are using GitLab
Dev: https://dev.gitlab.org/gitlab/gitlabhq/issues/2413
Sytse
dzaporozhets is it possible to count people that have not logged in through the web interface but have pushed or cloned via http or ssh?
Dmitriy
No. We need to implement some
last_git_contact_at:timestamp
field for User model and set it during interaction. Only after this we will be able to select users based on last time they pushed
Sytse
I think this will be good to do. People I talked to never use the web interface. If we show activity numbers from only web users please will assume they have less users. Schedule last_git_contact_at:timestamp for 7.13 or 14?
Sytse
This is needed to get accurate numbers in statistics issue https://dev.gitlab.org/gitlab/gitlab-ee/issues/312
Dmitriy
As I understand we want this to count
active users
asweb active users
+git active users
, right?
Also in issue description I saw
cloned via http or ssh
. I think we should count only write operations. Read operations can be public (by anonymous users or CI) which makes no sense to count as user activity
I checked
gitlab-shell
code. I was wrong - we can check read operations for both http and ssh
Sytse
Lets track clones as well, some people might only clone and push to another system, in this case we want people see that they are active so they are not blocked.
Sytse
First step might be a query to show which users have signed up to GitLab each month. After that we can start showing activity of each of these cohorts. From our customer:
Thanks to the wonders of Stack Overflow, here's a query that produces a sum of users with their "created_at" timestamp on each of the past 365 days:
WITH dates_table AS (
SELECT created_at::date AS date_column FROM users
)
SELECT series_table.date, COUNT(dates_table.date_column), SUM(COUNT(dates_table.date_column)) OVER (ORDER BY series_table.date desc) FROM (
SELECT (last_date - b.offs) AS date
FROM (
SELECT GENERATE_SERIES(0, last_date - first_date, 1) AS offs, last_date from (
SELECT MAX(date_column) AS last_date, (MAX(date_column) - '1 year'::interval)::date AS first_date FROM dates_table
) AS a
) AS b
) AS series_table
LEFT OUTER JOIN dates_table
ON (series_table.date = dates_table.date_column)
GROUP BY series_table.date
ORDER BY series_table.date desc
date | count | sum
------------+-------+------
2015-06-26 | 22 | 22
2015-06-25 | 26 | 48
2015-06-24 | 39 | 87
2015-06-23 | 31 | 118
2015-06-22 | 31 | 149
2015-06-21 | 0 | 149
SNIP