Commit 96017e24 authored by Pablo Carranza
Add storage and networking documentation with our latest agreements

# GitLab Infrastructure Design
* [Storage](design/storage.md)
* [Networking](design/networking.md)
# Networking
# Storage
## CephFS in hardware
This is our general plan and the reasoning behind it:
* We are moving forward with the CephFS cluster in hardware.
* Our general architecture for each storage node (a rough capacity estimate follows this list):
    * 2 sockets with 12-core processors
    * 24 drives
        * 22 for Ceph storage (OSDs)
        * 2 for the OS in a RAID pair
    * An NVMe drive on the PCIe bus for the Ceph journal (as fast as it gets)
    * A 40Gb NIC for general networking
    * A 1Gb NIC for management
* To reduce the blast radius we will have 4 independent clusters, accessed through the sharding feature (see the shard-assignment illustration after this list).
    * If one goes down we still have 3 left.
    * This is particularly useful to reduce load and improve recovery times.
    * This can also be useful for migrating CephFS data to BlueStore once it is stable and available.
* As a backup for Git repositories we will use the GitLab Geo feature, pushing to a secondary node hosted at Amazon with an EFS drive (we don't care if it's slow).
    * This makes Amazon Direct Connect a critical feature for our colo, as we will need high bandwidth to it.
    * We will start backfilling this Amazon instance as soon as we finish draining CephFS, so that when we are done we can start moving from Amazon to the colo.
* To prevent a total loss in the case of another MDS meltdown, we will create snapshots periodically (hourly, daily, whatever makes sense) so we can recover (see the snapshot-rotation sketch after this list).
* We will push forward with the Geo feature to use object storage, in which case we will use RADOS as the object store to simplify our installation.
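
To put rough numbers on the node layout above, here is a minimal capacity sketch. The drive size, node count per cluster, and replication factor are assumed values for illustration only, not figures from this plan.

```python
# Rough capacity estimate for one storage node and one of the four clusters.
# Drive size, nodes per cluster, and replication factor are assumed values,
# not figures taken from this design document.

OSD_DRIVES_PER_NODE = 22       # 24 drives minus 2 reserved for the OS RAID
ASSUMED_DRIVE_TB = 8           # assumption: 8 TB per data drive
ASSUMED_NODES_PER_CLUSTER = 6  # assumption: nodes in each of the 4 clusters
REPLICATION_FACTOR = 3         # assumption: default Ceph replicated pool size

raw_per_node_tb = OSD_DRIVES_PER_NODE * ASSUMED_DRIVE_TB
raw_per_cluster_tb = raw_per_node_tb * ASSUMED_NODES_PER_CLUSTER
usable_per_cluster_tb = raw_per_cluster_tb / REPLICATION_FACTOR

print(f"raw per node:       {raw_per_node_tb} TB")
print(f"raw per cluster:    {raw_per_cluster_tb} TB")
print(f"usable per cluster: {usable_per_cluster_tb:.0f} TB")
```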
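
The shard-assignment illustration below is only a hypothetical sketch of how repositories could be spread across the four clusters; GitLab's actual repository-storage selection works from its configured storage list and may differ, and the shard names here are made up.

```python
import hashlib

# Hypothetical illustration: map a repository path to one of four independent
# CephFS clusters. This is NOT GitLab's real selection logic; it only shows
# the idea of spreading repositories (and load) across the four shards.

SHARDS = ["nfs-file01", "nfs-file02", "nfs-file03", "nfs-file04"]  # assumed names


def shard_for(repo_path: str) -> str:
    """Deterministically pick a shard for a repository path."""
    digest = hashlib.sha1(repo_path.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


if __name__ == "__main__":
    for repo in ("gitlab-org/gitlab-ce.git", "gitlab-com/www-gitlab-com.git"):
        print(repo, "->", shard_for(repo))
```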
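
CephFS takes a snapshot when a directory is created under the special `.snap` directory and drops it when that directory is removed. Below is a minimal rotation sketch, assuming the filesystem is mounted at `/mnt/cephfs` (an assumed path), snapshots are enabled on the cluster, and we keep the last 24 hourly snapshots (also an assumption).

```python
import os
import time

# Minimal CephFS snapshot rotation sketch. Creating a directory under the
# special ".snap" directory takes a snapshot; removing it drops the snapshot.
# The mount point and retention count below are assumptions for illustration.

MOUNT_POINT = "/mnt/cephfs"                 # assumption: where CephFS is mounted
SNAP_DIR = os.path.join(MOUNT_POINT, ".snap")
KEEP = 24                                   # assumption: keep 24 hourly snapshots


def take_snapshot() -> str:
    """Create a timestamped hourly snapshot and return its name."""
    name = time.strftime("hourly-%Y%m%d-%H%M")
    os.mkdir(os.path.join(SNAP_DIR, name))   # creating the dir takes the snapshot
    return name


def prune_snapshots() -> None:
    """Remove hourly snapshots beyond the retention window."""
    snaps = sorted(s for s in os.listdir(SNAP_DIR) if s.startswith("hourly-"))
    for old in snaps[:-KEEP]:
        os.rmdir(os.path.join(SNAP_DIR, old))  # removing the dir drops the snapshot


if __name__ == "__main__":
    take_snapshot()
    prune_snapshots()
```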
## CephFS in the cloud
**Don't do it.**
* Latencies will kill you.
* Random hosts going down at any time will double your workload.
* Network-attached storage, as premium as it is, is shared.
* CephFS will lock when it can't write to the journal.
The good side:
* CephFS survives things locking up and injected latencies remarkably well.