Closed
Open
Opened Apr 23, 2020 by Celine Mercier (@mercier, Owner)

Dictionary efficiency issue

Handling of huge dictionaries (typically merged information, such as merged taxids in reference databases with hundreds of thousands of taxids, or merged samples in datasets with thousands of samples) is inefficient: it creates big files that are not mapping-friendly and that occupy a lot of disk space.

There is already a half-implemented solution in the form of dictionaries stored as character strings, but the API to parse them in C is not implemented, so it is rarely or never used. Finishing it would be the quickest fix, but eventually a better solution could be developed (e.g. using hash tables implemented in a way that makes the most of the mapping behaviour).
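
To make the string-based option more concrete, here is a minimal sketch of what a C lookup over a dictionary stored as a character string could look like. Everything in it is an assumption for illustration: the `key:value;key:value` layout and the `strdict_get` name are hypothetical and are not the format or API used by OBITools3. The lookup scans the string in place and returns a pointer into it, so it would also work on a buffer that is mapped rather than copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical serialized form (NOT the actual OBITools3 format):
 *   "key:value;key:value;..." as one NUL-terminated string.
 * Keys and values are assumed not to contain ':' or ';'.
 */

/*
 * Looks up `key` in the serialized dictionary `dict` without copying.
 * On success, stores a pointer to the first character of the value in
 * *value, its length in *value_len, and returns 1. Returns 0 if the key
 * is absent. The value is not NUL-terminated; use *value_len.
 */
int strdict_get(const char *dict, const char *key,
                const char **value, size_t *value_len)
{
    assert(dict != NULL && key != NULL);
    assert(value != NULL && value_len != NULL);

    size_t key_len = strlen(key);
    const char *entry = dict;

    while (*entry != '\0') {
        /* The entry ends at the next ';' or at the end of the string. */
        const char *end = strchr(entry, ';');
        if (end == NULL)
            end = entry + strlen(entry);

        /* Split the entry at its ':'; entries without one are skipped. */
        const char *sep = memchr(entry, ':', (size_t)(end - entry));
        if (sep != NULL
            && (size_t)(sep - entry) == key_len
            && memcmp(entry, key, key_len) == 0) {
            *value = sep + 1;
            *value_len = (size_t)(end - sep - 1);
            return 1;
        }

        /* Move past the ';' separator, or stop at the terminating NUL. */
        entry = (*end == ';') ? end + 1 : end;
    }
    return 0;
}
```

With a dictionary such as `"9606:12;10090:3;7227:40"`, calling `strdict_get(dict, "10090", &v, &n)` would return 1 and leave `v` pointing at `3` with `n == 1`. Each lookup is a linear scan of the whole string, which is why a hash-based index would still be the better long-term option for dictionaries with hundreds of thousands of keys.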

Reference: obitools/obitools3#79