
After waiting long hours to download a few SRAs with fastq-dump, I was happy to learn that fasterq-dump would finally get fastq files onto my machine faster.

I had also run into a lot of trouble with temporary/cache files clogging my disk and making fastq downloads fail, a problem I expected the added -t option of fasterq-dump to solve. From the SRA toolkit's online wiki:

The location of the temporary directory can be changed too:
$ fasterq-dump SRR000001 -O /mnt/big_hdd -t /tmp/scratch
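
Accordingly, I pointed both the output and the temporary directory at my home partition, which had plenty of free space. Roughly like this (output paths hypothetical):

$ fasterq-dump SRR000001 -O ~/reads -t ~/scratch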

Mysteriously, however, my downloads kept crashing with a storage error, even though the home partition had hundreds of gigabytes to spare:

2019-11-15T10:33:45 fasterq-dump.2.10.0 err: storage exhausted while writing file within file system module - system bad file descriptor error fd='4'   

Watching disk usage during the download of an ~8 GB fastq file showed puzzling results:

After 1 minute of running fasterq-dump:

$ df -h
Filesystem                                     Size  Used Avail Use% Mounted on
/dev/sda2                                       30G   23G  5.7G  80% /
/dev/sda4                                      3.3T  2.7T  617G  82% /home
$ ncdu /
  12.3GiB [##########] /usr
   1.3GiB [#         ] /lib
   1.3GiB [#         ] /opt
   1.3GiB [#         ] /var
 199.4MiB [          ] /boot
  41.7MiB [          ] /root

After ~10 minutes of running fasterq-dump:

$ df -h
Filesystem                                     Size  Used Avail Use% Mounted on
/dev/sda2                                       30G   28G  666M  98% /
/dev/sda4                                      3.3T  2.7T  605G  82% /home
$ ncdu /
  12.3GiB [##########] /usr
   1.3GiB [#         ] /lib
   1.3GiB [#         ] /opt
   1.3GiB [#         ] /var
 199.4MiB [          ] /boot
  41.7MiB [          ] /root
  24.5MiB [          ] /tmp

Disk usage growing while no visible file grew could only be explained by files that had been deleted but were still held open, so I listed open-but-deleted files and indeed found an interesting candidate (the only one, actually):

$ lsof +L1
COMMAND     PID     USER   FD   TYPE DEVICE   SIZE/OFF NLINK    NODE NAME
fasterq-d 17854 schamori    7u   REG    8,2 8002551155     0 1179402 /var/tmp/.sra.cache (deleted)
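
(As an aside, this deleted-but-still-open effect is easy to reproduce. A minimal sketch in bash, using a hypothetical file /tmp/ghost:

$ exec 7> /tmp/ghost                   # open fd 7 for writing
$ rm /tmp/ghost                        # unlink: the name is gone, the inode is not
$ dd if=/dev/zero bs=1M count=100 >&7  # write 100 MB into the now-invisible file
$ df -h /tmp                           # usage grows, yet du/ncdu see nothing
$ lsof +L1                             # the shell shows up holding the deleted file
$ exec 7>&-                            # closing the fd finally frees the space

The +L1 flag restricts lsof to files with a link count below one, i.e. exactly such unlinked-but-open inodes.)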

Back to the candidate: fasterq-dump was still writing to a slowly growing, already unlinked sparse file in /var/tmp/, intended to reach ~8 GB. The lsof columns also tell us where to dig deeper: DEVICE 8,2 is /dev/sda2, and NODE 1179402 is the inode number, which debugfs can inspect directly:

$ sudo debugfs -R 'stat <1179402>' /dev/sda2
Inode: 1179402   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 4259928218    Version: 0x00000000:00000001
User: 30925   Group:   100   Size: 8002551155
File ACL: 0    Directory ACL: 0
Links: 0   Blockcount: 468104
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5dce8179:e2d27d90 -- Fri Nov 15 11:44:09 2019
 atime: 0x5dce8179:e4bac58c -- Fri Nov 15 11:44:09 2019
 mtime: 0x5dce8179:e2d27d90 -- Fri Nov 15 11:44:09 2019
crtime: 0x5dce80fb:af533c54 -- Fri Nov 15 11:42:03 2019
dtime: 0x0011fa28 -- Wed Jan 14 16:15:52 1970
Size of extra inode fields: 28
EXTENTS:
(ETB0):4752581, (0-127):999424-999551, (132-291):999556-999715, (444-507):999868-999931, (632-695):1000056-1000119, (800-863):1000224-1000287, (1000-1063):1000424-1000487, (1196-1259):1000620-1000683, (1352-2047):1000776-1001471, (2048-4095):1294336-1296383 

After several minutes:

$ sudo debugfs -R 'stat <1179402>' /dev/sda2
Inode: 1179402   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 4259928218    Version: 0x00000000:00000001
User: 30925   Group:   100   Size: 8002551155
File ACL: 0    Directory ACL: 0
Links: 0   Blockcount: 468104
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5dce8179:e2d27d90 -- Fri Nov 15 11:44:09 2019
 atime: 0x5dce8179:e4bac58c -- Fri Nov 15 11:44:09 2019
 mtime: 0x5dce8179:e2d27d90 -- Fri Nov 15 11:44:09 2019
crtime: 0x5dce80fb:af533c54 -- Fri Nov 15 11:42:03 2019
dtime: 0x0011fa28 -- Wed Jan 14 16:15:52 1970
Size of extra inode fields: 28
EXTENTS:
(ETB0):4752581, (0-127):999424-999551, (132-291):999556-999715, (444-507):999868-999931, (632-695):1000056-1000119, (800-863):1000224-1000287, (1000-1063):1000424-1000487, (1196-1259):1000620-1000683, (1352-2047):1000776-1001471, (2048-4095):1294336-1296383, (4096-6087):1724416-1726407, (157628-157695):1003452-1003519, (157696-159743):1296384-1298431, (159744-161791):1726464-1728511, (161792-162363):1744896-1745467, (313908-315391):1008180-1009663, (315392-317439):1421312-1423359, (317440-319091):1728512-1730163, (470184-471039):1010856-1011711, (471040-473087):1423360-1425407, (473088-474663):1730560-1732135, (626468-626687):1036068-1036287, (626688-628735):1413120-1415167, (628736-630783):1732608-1734655, (630784-631075):1746944-1747235, (782752-784383):1038752-1040383, (784384-786335):1734656-1736607, (939032-940031):1416216-1417215, (940032-942079):1425408-1427455, (942080-943607):1753088-1754615, (1098492-1099775):1434364-1435647, (1099776-1101823):1427456-1429503, (1101824-1103355):1757184-1758715, (1255196-1255423):1437468-1437695, (1255424-1257471):1417216-1419263, (1257472-1259519):1429504-1431551, (1259520-1261083):1765376-1766939, (1412712-1413119):1300072-1300479, (1413120-1415167):1419264-1421311, (1415168-1417215):1763328-1765375, (1417216-1417575):1767424-1767783, (1589844-1591295):1309268-1310719, (1591296-1593343):1431552-1433599, (1593344-1595391):1740800-1742847, (1595392-1595603):1794048-1794259, (1772048-1773567):1286672-1288191, (1773568-1775615):1742848-1744895, (1775616-1776655):1773568-1774607, (1953732-1953747):1290180-1290195
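
Note the numbers: Size stayed at the full 8002551155 bytes in both snapshots, while the EXTENTS list kept gaining entries, meaning blocks were only being allocated as data actually arrived. In the first snapshot, a Blockcount of 468104 512-byte blocks means only ~230 MB of the nominal 8 GB were backed by real disk blocks. That is the hallmark of a sparse file, which is easy to demonstrate yourself (hypothetical file name):

$ truncate -s 8G sparse.bin   # apparent size: 8 GB
$ ls -lh sparse.bin           # reports 8.0G
$ du -h sparse.bin            # reports 0: no blocks allocated until written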

A short Google search led me to the general caching mechanisms of the SRA toolkit (I should have read the docs earlier) and how to quickly manipulate them.
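
For reference, the toolkit's cache behaviour is controlled through vdb-config. A sketch of the knobs as I found them described online (configuration node names may vary between toolkit versions, so verify against vdb-config -i on your system):

$ vdb-config -i                                               # interactive mode: relocate or disable the cache
$ vdb-config -s /repository/user/main/public/root=$HOME/ncbi  # move the cache to the big partition
$ vdb-config -s /repository/user/cache-disabled=true          # or switch user-repository caching off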

So thank you NCBI/SRA. You’ve made me learn a lot about the Linux filesystem.
