File duplication in storage
Hi
I'm trying to move all my documents to Mayan and am using the API for batch upload. In this batch I uploaded 440 PDFs using a simple, hacky script called via xargs -n 1 -P 1:
#!/bin/bash
token=$(cat token)
curl -H "Authorization: Token $token" -F "document_type=1" -F "label=$1" -F "language=deu" -F "file=@$1/combine.pdf" -o "$1/upres" http://docker.lan:8000/api/documents/documents/
When I now run fdupes on the document_storage folder, it reports a lot of duplicate files:
[...]
./d28cdff6-bb85-48af-a99a-956ec81453ab
./2b118360-98bf-481a-8241-4cdf7bcfad9a
./shared-file-1e6589af58f2485dae7877acb2e63b93
./09cf47d7-ad9f-4745-96b0-ca190c1aa14c
./204b52ea-410f-46a3-bea0-9d5d92cb1baa
./9bdd528d-3618-4e5b-9c36-f154124266d6
./295c320b-6524-4b81-9a43-3cc12daf8ab9
./ed2a646e-4c50-4522-bf6d-a30904d89c52
./0368d393-ee8d-4220-9bd5-3eda501419e3
./4ec64dc7-4b30-47ca-b08d-d7d703b4b671
./0b17384b-4dd0-4518-912a-6105a08d04c2
./031fdb08-f6f5-4334-bfae-8c4cd411013d
./shared-file-8f469182f5ed4591a495e8a364ae2a28
./c5ec9ae6-876e-4d7c-ad8b-4bbeab0aaba4
./bb5119d5-aeed-401b-be14-ef089c255427
./e74c4057-a87f-4bf1-a384-bb5a3b425c21
./8a20b654-f989-40cd-8d0c-cdf65bd8a6f3
./f2d045e5-8bba-49a1-9b18-b347821a6090
./8e212f60-1731-42f2-baec-5efe331af6f8
./006485cc-e384-4c8d-9a6a-c7e824637fe4
./5190b055-1a92-454d-97db-a7eda67adeb7
[...]
613 duplicate files (in 222 sets), occupying 4551.3 megabytes
So I now have about 2 GB of actual data and 4.5 GB of wasted space! If I add a sleep 5 after the curl call, the issue is less severe but still present. What's going on here?
Also, I noticed that the API is very sensitive to even the slightest activity on the server. If I touch anything in the Mayan GUI, I get failed uploads. Of course I can handle those failures in my script, but it seems odd for it to be so sensitive.
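For reference, handling the failures in the script could look roughly like the sketch below. This is untested against Mayan: the function names retry and upload are made up for illustration, and I'm assuming that curl's --fail exit status is a good enough signal that an upload failed. I also don't know whether repeating a failed upload could itself leave extra files behind in document_storage.

```shell
#!/bin/bash
# Sketch only: retry a command a few times with a pause between attempts.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    local max_attempts=$1 delay=$2 attempt=1
    shift 2
    # Re-run the command until it succeeds or the attempt limit is reached.
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# upload DIR: the same request as in my script above, wrapped in a function.
# --fail makes curl exit non-zero on HTTP errors (status >= 400), so that
# retry can notice a failed upload (assumption: that is how failures show up).
upload() {
    local dir=$1
    curl --fail --silent --show-error \
        -H "Authorization: Token $(cat token)" \
        -F "document_type=1" -F "label=$dir" -F "language=deu" \
        -F "file=@$dir/combine.pdf" \
        -o "$dir/upres" \
        http://docker.lan:8000/api/documents/documents/
}
```

The script body would then just be retry 3 10 upload "$1", still called via xargs -n 1 -P 1 as before.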
Using a fresh container with Mayan 2.7.3
Regards, Flo