
Highest Throughput and Most Time consuming transaction scripts

Closed Paco Guzman requested to merge app-transactions into master
1 unresolved thread

Most time consuming

This will list the most time-consuming transactions by total duration over the last 24 hours

Screen_Shot_2016-08-16_at_14.05.59
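
For context, a rough sketch of what this query might look like, following the style of the throughput script shown in the diff below; the measurement and field names are taken from that script, so treat this as an illustration rather than necessarily the exact script added in this MR:

rows = CLIENT.query(<<-SQL
SELECT sum("duration")
FROM "rails_transactions"
WHERE time > NOW() - 24h
GROUP BY "action";
SQL
)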

Throughput

This will list the actions with the highest throughput in the last 24 hours

Screen_Shot_2016-08-16_at_14.06.28

Activity

# This script calculates the actions with highest throughput in the past 24
# hours.
require_relative '../config'

# Number of hours to look back; override with e.g. INTERVAL=48.
HOURS = (ENV.fetch('INTERVAL') { 24 }).to_i

rows = CLIENT.query(<<-SQL
SELECT count("duration")
FROM "rails_transactions"
WHERE time > NOW() - #{HOURS}h
GROUP BY "action";
SQL
)
  • Highest Throughput

    Screen_Shot_2016-07-18_at_19.56.14

    Most time consuming transactions

    Screen_Shot_2016-07-18_at_19.58.05

  • @pacoguzman which one is the most expensive on average? I want to disregard the fact that builds have 5M calls. Which one is "slow" with a low number of calls but still representative?

  • I think the quick answer is the ones that are in the second image but not in the first one: Projects::CommitController#show, Projects::RefsController#logs_tree, ProjectsController#show, Projects::BlobController#show. But let's refine the scripts.

  • Paco Guzman Added 1 commit:

    • 92454a96 - Highest Throughput and Most Time consuming transaction scripts
  • mentioned in merge request influxdb-management!6 (merged)

  • @yorickpeterse I decided to use the existing downsampled data for these scripts; can you take a look?

  • @pcarranza how do you want the info about endpoints that are slow but still representative in the total number of requests (which probably means users actually request those endpoints)?

  • I'm good with whatever at this point, just an email, a doc, a txt file. Whatever helps us know what we need to monitor.

    Later on it would be interesting to have a living table that shows which ones we need to work on, as that list will change as we fix them.

  • Paco Guzman Added 1 commit:

    • 4847251d - Highest Throughput and Most Time consuming transaction scripts
  • @pacoguzman it should be representative, but we should also worry about the P99 and not about the sum of all the requests. What I mean by that is that I want to know the slowest endpoints, not the endpoints that summed up the most time because there are millions of really fast calls.

    Does that make sense?

  • @pcarranza what about this measure: https://en.wikipedia.org/wiki/Apdex? I guess we can count requests under 2s as satisfied and set the tolerance threshold at 4s. These numbers are something we need to adjust as we improve the app, but as a starting point they should be good. The idea is to move the thresholds to get a worse Apdex, then work to move it back to 1.0, move the thresholds again, and keep iterating. Does it make sense?
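
    For reference, the Apdex score is (satisfied + tolerating / 2) / total samples. Below is a minimal Ruby sketch using the thresholds proposed here (satisfied under 2s, tolerating up to 4s); the helper and the hard-coded thresholds are illustrative only and not part of this MR.

    # Illustrative Apdex calculation (hypothetical helper, not in this MR).
    # Thresholds follow the comment above: satisfied <= 2s, tolerating <= 4s.
    SATISFIED_S  = 2.0
    TOLERATING_S = 4.0

    def apdex(durations)
      return nil if durations.empty?

      satisfied  = durations.count { |d| d <= SATISFIED_S }
      tolerating = durations.count { |d| d > SATISFIED_S && d <= TOLERATING_S }

      (satisfied + tolerating / 2.0) / durations.length
    end

    apdex([0.4, 1.2, 3.1, 6.0]) # => (2 + 1 / 2.0) / 4 = 0.625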

  • It's worth mentioning that over time I kinda want to drop this repository altogether. That is, all this information should be in Grafana so everybody can view it without having to set up credentials and whatnot.

    Apdex is quite interesting and I have used it in the past somewhat, though not extensively. We could start with trying to calculate it in InfluxDB for some of the controllers/API endpoints.

  • @yorickpeterse how far are we from starting to get this data? can we merge and improve later?

    I'm good with dropping the repo later on.

  • @pacoguzman that's exactly what I want.

    Eventually we need to have monitoring in place to trigger an alarm whenever a request goes over the threshold of the SLO.

    But for now what we need is to bring order into the chaos.

  • @pcarranza In theory we should have the data, as the Apdex depends on the number of samples and a threshold (which we have to define ourselves). Using this we should be able to, for example, calculate the Apdex per controller for a certain time interval (see the sketch below). For the Apdex I don't think we need to measure everything per minute, though it's of course still possible.

    Edited by yorickpeterse-staging
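
    For illustration, a hypothetical sketch of how the per-controller counts could be pulled out of InfluxDB to feed an Apdex calculation. It assumes the same CLIENT and rails_transactions measurement used by the scripts in this repository and that "duration" is stored in milliseconds; none of this is part of the MR.

    THRESHOLD_MS = 2_000 # assumed "satisfied" threshold (2s), in milliseconds

    # Requests at or under the threshold, counted per action.
    satisfied = CLIENT.query(<<-SQL
    SELECT count("duration")
    FROM "rails_transactions"
    WHERE time > NOW() - 24h AND "duration" <= #{THRESHOLD_MS}
    GROUP BY "action";
    SQL
    )

    # All requests, counted per action.
    totals = CLIENT.query(<<-SQL
    SELECT count("duration")
    FROM "rails_transactions"
    WHERE time > NOW() - 24h
    GROUP BY "action";
    SQL
    )

    # Apdex per action = (satisfied + tolerating / 2) / total, combining the
    # two result sets by their "action" tag (tolerating count omitted here
    # for brevity).
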
  • @pacoguzman So can we close this MR in favour of getting Apdex calculations in InfluxDB?

  • I guess so, if we only want to order endpoints based on user-perceived performance like Apdex. I've been trying to figure out how to get the data into Grafana or InfluxDB, but I wasn't able to. For the moment I've made a script to get the data from the last hour, where we fetch all the transactions -> https://gitlab.com/gitlab-org/gitlab-ce/issues/19273#note_14143191.

    Do you have an idea on how to do this?

  • @yorickpeterse a general index will not tell me what page we need to work on. I need a list of pages sorted by the p99 request time.

  • @pcarranza Ah right. For that we just need to add a query that aggregates mean/p95/p99 timings per day (or another time frame larger than 1 minute), then we can sort the data in Grafana (InfluxDB can only sort by timestamp :/).
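
    For illustration, such an aggregation might look like the sketch below, following the style of the scripts in this repository (the measurement and field names are assumptions; the real queries mentioned in the next comment were set up directly in InfluxDB rather than in this repo).

    rows = CLIENT.query(<<-SQL
    SELECT mean("duration"),
           percentile("duration", 95),
           percentile("duration", 99)
    FROM "rails_transactions"
    WHERE time > NOW() - 7d
    GROUP BY time(1d), "action";
    SQL
    )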

  • @pcarranza These queries have been set up and group data per day. I'll check back tomorrow and set up a Grafana dashboard once the data is there.

  • To clarify, the queries I set up group by action and not the request URI. Grouping by URIs will blow up the database I'm afraid.

  • I'm good with this as it is, thanks @yorickpeterse

  • I'll close this MR in favour of the Grafana data mentioned above.

  • yorickpeterse-staging Status changed to closed

