If we add more data to this, customer success can better help customers.
None of the data should be sensitive (project names, etc.) and everything should be transparent (visible in the JSON string shown to the user).
I propose we send the total number of items for:
- Comments
- Groups
- Users
- Projects
- Issues
- Labels
- CI jobs or builds
- Snippets
- Milestones
- Todos
- Pushes
- Merge requests
- Environments
- Triggers
- Deploy keys
- Pages
- Project services
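To make the "transparent JSON string" idea concrete, here is a minimal sketch of what the payload could look like. This is an assumption, not the actual implementation: the key names, the `usage_ping_payload` helper, and the version string are hypothetical, and each count would really come from a simple query like `Issue.count`; here the counts are stubbed so the structure is visible.

```ruby
require 'json'

# Hypothetical list of totals to report; each would map to a simple
# `Model.count` query in the real application.
KEYS = %i[comments groups users projects issues labels ci_builds snippets
          milestones todos pushes merge_requests environments triggers
          deploy_keys pages project_services].freeze

# Build the JSON string that would be shown to the user before sending.
# Missing counts default to 0 in this sketch.
def usage_ping_payload(counts)
  {
    version: '8.13.0', # hypothetical instance version
    counts: KEYS.each_with_object({}) { |k, h| h[k] = counts.fetch(k, 0) }
  }.to_json
end

puts usage_ping_payload(issues: 120, labels: 34, users: 10, projects: 5)
```

Because the payload is a plain JSON string, the instance admin can inspect exactly what leaves their server, which supports the transparency goal above.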
We should ensure the above are all simple DB queries; if something is complex, we shouldn't send it.
Our version.gitlab.com should do the calculations, such as 'how many comments were made in the last 30 days?' and 'did people suddenly stop using something?'
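A sketch of how such a server-side calculation could work, under the assumption that each ping only carries cumulative totals: version.gitlab.com would derive 'comments made in the last 30 days' by diffing the newest total against the stored ping closest to 30 days ago. The `comments_last_30_days` helper and the ping record shape are hypothetical.

```ruby
require 'time'

# pings: array of hashes like { recorded_at: Time, comments: Integer },
# one per received ping, with cumulative comment totals.
def comments_last_30_days(pings, now: Time.now)
  sorted = pings.sort_by { |p| p[:recorded_at] }
  latest = sorted.last
  cutoff = now - (30 * 24 * 60 * 60)
  # Most recent ping at or before the 30-day cutoff; fall back to the
  # oldest ping if the instance is newer than 30 days.
  baseline = sorted.reverse.find { |p| p[:recorded_at] <= cutoff } || sorted.first
  latest[:comments] - baseline[:comments]
end
```

This keeps the instance-side work to a simple count while the trend analysis (including 'did people suddenly stop using something?') happens on our side, where we can compare consecutive pings.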
Can we have the ping data for CE as well? This will help us understand what our subscribers are using, both EE and CE, making us more data-driven. We can then make decisions based on data: what to invest resources in building more of, and how to better market certain features or products to increase adoption.
I propose that I write a blog post about the CE usage ping and why we want it and then we build it. I think the community reaction will tell us whether it's a good idea.
We'll make it opt-out and make it clear in the upgrade guide.
@JobV I think that makes a lot of sense, meaning the blog post from you and making it clear that sending us data is optional. I would suggest we think about the messaging to educate users on why we are tracking usage and how we will use the data. Examples: to improve the product, understand which features are being used, improve our marketing around features, and help identify potential new features and products.
What about GitLab.com? Since we are hosting this, we should be able to collect usage data as well, so we can better understand this user group and perhaps compare usage when a user moves from .com to CE or EE.
At the moment it seems that the usage ping is linked to a license. Is that mandatory? That won't work for CE.
Simple data we can collect (missing from @sytses list):
- the number of issues
- the number of labels
Other point: I get that we shouldn't send heavily computed data. However, here is my take on this. Let's say we decide to improve labels. Apart from feedback we read here and there, we don't know for sure the real usage of labels in the wild. If we wanted to change label management, we would first need answers to questions like these:
- How many labels are there per project?
- How many labels are associated to an issue on average?
- Do merge requests have labels? If so, how many on average?
- Do people use priority labels?
- ...
It's not about measuring everything, because we can easily get lost in metrics and statistics and waste a huge amount of time on them. But simply knowing the total number of labels for an account won't come close to helping us understand how labels are used.
So it's great to have stats about product adoption. But we definitely need more data than this.
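The kind of aggregate described above could be sketched as follows. This is only an illustration of the shape of the calculation, not GitLab's implementation: in Rails it would be a `GROUP BY` over the join table between issues and labels, and the `average_labels_per_issue` helper is a hypothetical name operating on sample per-issue counts.

```ruby
# labels_per_issue: one integer per issue, giving that issue's label count
# (in the real app this list would come from a GROUP BY aggregate query).
def average_labels_per_issue(labels_per_issue)
  return 0.0 if labels_per_issue.empty?
  labels_per_issue.sum.to_f / labels_per_issue.size
end
```

Even this simple average requires scanning a join table rather than a single `COUNT(*)`, which is why such questions are heavier than the plain totals proposed above.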
@regisF I agree with your comments, but let's start small. Can you make a concrete proposal of something we can ship in 8.13? Make sure to include necessary changes to version.gitlab.com.
We should do the blog post about the CE check only after we've launched the extended EE check. This will make it easier for everyone to reason about.
This check will help us see the activity on GitLab.com.
We should check our terms and privacy policy and write about this in clear language (link to the documentation).
Add the number of issues and the number of labels => great point, I added it to the description
Things like 'How many labels are there per project?' => interesting but too computationally intensive; it might cause problems. Maybe in a later iteration.
@regisF @JobV now that we have begun this for EE, shall we begin collecting data on CE users to help us better understand which features CE users are using, and the growth and adoption happening in those accounts? With this data we can create and push content to help drive usage of certain features (e.g. CI or Issue Boards) so customers see more value in the product. We can also use this data to identify who may find value in EE and push content teaching them what is possible with EE that they may not be aware of.