Dictionary efficiency issue (#79) · Issues · OBITools / OBITools3


Huge dictionaries are handled inefficiently. These typically hold merged information, such as merged taxids in reference databases with hundreds of thousands of taxids, or merged samples in datasets with thousands of samples. They produce large files that are not mapping-friendly and occupy a lot of disk space.

A partial solution already exists: dictionaries stored as character strings. However, the C API to parse them is not implemented, so they are rarely, if ever, used. Finishing this implementation would be the quickest fix, but a better solution could eventually be developed (e.g. hash tables implemented so as to make the most of the mapping behaviour).
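A minimal sketch of what the missing C-side lookup could look like. The actual serialization format OBITools3 uses for string-stored dictionaries is not described in this issue, so the `key:value;key:value` layout and the function name `str_dict_get_int` are hypothetical, chosen only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical serialized format, assumed for illustration only:
 *   "key1:value1;key2:value2;..."  (keys contain neither ':' nor ';')
 * This is NOT the documented OBITools3 format. */

/* Looks up `key` in the serialized dictionary `dict`.
 * On success stores the integer value in *value and returns 1.
 * Returns 0 if the key is absent or its entry is malformed. */
static int str_dict_get_int(const char *dict, const char *key, long *value)
{
    size_t key_len = strlen(key);
    const char *entry = dict;

    while (entry != NULL && *entry != '\0') {
        const char *sep = strchr(entry, ':');
        if (sep == NULL)
            return 0; /* malformed: entry without ':' */

        /* Compare the key part of this entry with the requested key */
        if ((size_t)(sep - entry) == key_len &&
            strncmp(entry, key, key_len) == 0) {
            char *end;
            *value = strtol(sep + 1, &end, 10);
            /* Valid only if digits were read and the value ends the entry */
            return end != sep + 1 && (*end == ';' || *end == '\0');
        }

        /* Advance to the next entry, skipping the ';' separator */
        entry = strchr(sep, ';');
        if (entry != NULL)
            entry++;
    }
    return 0;
}
```

For example, with a merged-taxid string such as `"9606:12;10090:3;7227:40"`, looking up `"10090"` yields 3, while looking up `"9598"` returns 0. Each lookup is a linear scan of the whole string, which is one reason a hash-table layout could eventually perform better on very large dictionaries.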
