Since the deployment of RC2 the CPU usage on the secondaries has increased from around 20% to around 70%:
Looking at the tuple statistics one particular secondary shows this pattern:
The big increase here is the number of sequentially scanned rows in the users table. This pattern occurs on both secondaries. This increase in load suggests we're doing something bad, something that might even affect availability if we don't solve it.
SELECTCOUNT(*)FROM "users"WHERE ("users"."state" IN (?))AND ("users"."ghost" = ? OR "users"."ghost" IS NULL)AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL)AND (( id IN (SELECT "members"."user_id" FROM "members" WHERE "members"."source_type" = ? AND "members"."type" IN (?) AND "members"."source_id" = ? AND "members"."source_type" = ? AND "members"."type" IN (?) AND "members"."requested_at" IS NULL AND (access_level > ?) ORDER BY "members"."id" DESC)OR id IN ( SELECT "members"."user_id" FROM "members" WHERE "members"."source_type" = ? AND "members"."type" IN (?) AND "members"."source_id" = ? AND "members"."source_type" = ? AND "members"."type" IN (?) AND "members"."requested_at" IS NULL AND (access_level > ?) ORDER BY "members"."id" DESC) )AND id NOT IN (SELECT "approvals"."user_id" FROM "approvals" WHERE "approvals"."merge_request_id" = ?)) AND ("users"."id" != ?)
gitlabhq_production=# select * from ( select mean_time, rows, substring(query, 0, 250) as query from pg_stat_statements where query like '%users%' order by mean_time desc limit 50 ) sub order by rows desc; mean_time | rows | query ------------------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1125.91002621289 | 1118304 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1668.21841030477 | 213874 | SELECT "members".* FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? AND "members"."type" IN (?) AND "members"."source_id" = ? A 1641.20059076275 | 213777 | SELECT "members".* FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? AND "members"."type" IN (?) AND "members"."source_id" = ? A 1163.71512753037 | 136837 | SELECT "events"."id" AS t0_r0, "events"."target_type" AS t0_r1, "events"."target_id" AS t0_r2, "events"."title" AS t0_r3, "events"."data" AS t0_r4, "events"."project_id" AS t0_r5, "events"."created_at" AS t0_r6, "events"."updated_at" AS t0_r7, "eve 1172.46931805999 | 110781 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1733.1016822313 | 108585 | SELECT "members".* FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? AND "members"."type" IN (?) AND "members"."source_id" = ? A 1216.58198725337 | 96496 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1223.87329185551 | 67297 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1235.07720831485 | 40578 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1643.07963777846 | 22076 | SELECT "members".* FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE "members"."type" IN (?) AND "members"."source_type" = ? AND (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? 1243.26599628695 | 21007 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1104.08754346128 | 19201 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1250.64314818873 | 15514 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1266.02006133582 | 15032 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 3421.84427172475 | 12524 | SELECT "members".* FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? AND "members"."type" IN (?) AND "members"."source_id" = ? A 1278.60777490743 | 11613 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1544.52363027426 | 9184 | SELECT "members".* FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? AND "members"."type" IN (?) AND "members"."source_id" = ? A 1464.79283376759 | 9164 | SELECT "members".* FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? AND "members"."type" IN (?) AND "members"."source_id" = ? A 1460.70497417355 | 8884 | SELECT "members".* FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? AND "members"."type" IN (?) AND "members"."source_id" = ? A 1140.44012473845 | 6213 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1172.25900634972 | 6142 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1300.67616672338 | 5878 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1158.00330891089 | 5555 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 3555.50751683168 | 4887 | SELECT "members".* FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? AND "members"."type" IN (?) AND "members"."source_id" = ? A 1306.09562228686 | 4469 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 3448.2079981685 | 4132 | SELECT "members".* FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? AND "members"."type" IN (?) AND "members"."source_id" = ? A 1658.52125747127 | 3359 | SELECT "members".* FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE "members"."type" IN (?) AND "members"."source_type" = ? AND (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? 1631.6443364366 | 3115 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1339.79410297559 | 2991 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1191.24742893218 | 2772 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1160.38510748003 | 2754 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 2503.664 | 2268 | SELECT COUNT(DISTINCT "members"."id") FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? AND "members"."type" IN (?) AND "members" 1293.70191710758 | 2268 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1666.32593220339 | 2106 | SELECT "members".* FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE "members"."type" IN (?) AND "members"."source_type" = ? AND (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? 1281.44918970503 | 1729 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1184.48465263158 | 1691 | SELECT "events"."id" AS t0_r0, "events"."target_type" AS t0_r1, "events"."target_id" AS t0_r2, "events"."title" AS t0_r3, "events"."data" AS t0_r4, "events"."project_id" AS t0_r5, "events"."created_at" AS t0_r6, "events"."updated_at" AS t0_r7, "eve 1298.49675909993 | 1511 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1284.28642240216 | 1482 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1322.97739352306 | 1019 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 2507.08417803661 | 601 | SELECT COUNT(DISTINCT "members"."id") FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? AND "members"."type" IN (?) AND "members" 1501.58764822134 | 506 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 2655.51296435644 | 505 | SELECT COUNT(DISTINCT "members"."id") FROM "members" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE (members.id IN (SELECT "members"."id" FROM "members" WHERE "members"."source_type" = ? AND "members"."type" IN (?) AND "members" 1425.90066752577 | 388 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 5401.00402312139 | 229 | SELECT "members".* FROM "members" INNER JOIN "users" "users_members" ON "users_members"."id" = "members"."user_id" LEFT OUTER JOIN "users" ON "members"."user_id" = "users"."id" WHERE (members.id IN (SELECT "members"."id" FROM "members" WHERE "membe 1520.21793567251 | 171 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1199.07496875 | 64 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1296.00611111111 | 18 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1571.09708333333 | 12 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 1681.71557142857 | 7 | SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN (?)) AND ("users"."ghost" = ? OR "users"."ghost" IS NULL) AND ("users"."support_bot" = ? OR "users"."support_bot" IS NULL) AND ((id IN (SELECT "members"."user_id" FROM "members" WHERE "members". 2043.993 | 0 | explain analyze select count(*) from users where users.state in ('active') and (ghost is false or ghost is null) and (support_bot is false or support_bot is null);(50 rows)
Doing a simple count with these filters produces:
gitlabhq_production=# explain analyze select count(*) from users where users.state in ('active') and (ghost is false or ghost is null) and (support_bot is false or support_bot is null); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=187336.88..187336.89 rows=1 width=8) (actual time=2042.179..2042.179 rows=1 loops=1) -> Seq Scan on users (cost=0.00..183090.86 rows=1698408 width=0) (actual time=0.026..1678.871 rows=1198416 loops=1) Filter: (((ghost IS FALSE) OR (ghost IS NULL)) AND ((support_bot IS FALSE) OR (support_bot IS NULL)) AND ((state)::text = 'active'::text)) Rows Removed by Filter: 2429 Planning time: 1.278 ms Execution time: 2042.229 ms
As per @sfrost and some further poking, it seems these support_bot/ghost columns have been around prior to 9.2 RC2, so they can't be the cause of the massive increase in sequence scans. This means we don't have to tackle that particular problem right now.
While not an exact measure as the query structures change and therefore the set of queryids change, there definitely is an indication here that the changes rolled out make a great deal more use of this query which does not perform very well.
It seems we're plucking IDs into memory here, despite that being on a blacklist. Further, there's an ORDER BY in a sub-query which is unnecessary (thanks to Rails' default_scope).
It seems this query is used to get the number of merge request approvers for a given merge request, in what is probably the most inefficient way of getting the data.
SELECTCOUNT(*)FROM "users"WHERE ("users"."state" IN ('active'))AND ("users"."ghost" = 'f' OR "users"."ghost" IS NULL)AND ("users"."support_bot" = 'f' OR "users"."support_bot" IS NULL)AND exists (select 1 from project_authorizationswhere users.id = project_authorizations.user_id and project_id = 1576764 and access_level > 20)and not exists (select 1 from approvalswhere approvals.merge_request_id = 3490631 and users.id = approvals.user_id)and users.id <> 180822;
@sfrost Such a query was basically my train of thought too.
@DouweM So basically instead of using members/explicit user IDs we query project_authorizations to get all the people that have access to a project, then reduce that set to those that are not already approvers.
@DouweM So basically instead of using members/explicit user IDs we query project_authorizations to get all the people that have access to a project, then reduce that set to those that are not already approvers.
@fatihacet Let us know if it's doable to disable the Merge button until receiving a proper response from approvals endpoint, then updating the widget afterwards.
That way we could get rid of MergeRequestEntity#approved data.
@oswaldo That's great, but is anybody looking into that approvers method in particular? While we don't use it as often any more, we do still use it and the code itself can become a problem again in the future.