This working group will be responsible for developing standardized data field names that all software tools can understand and interpret, allowing interoperability between pipelines from different developers. Close collaboration with the Minimal Standards Working Group is expected.
The formats working group is focused on developing standard file formats and schemas to represent annotated antibody and T cell receptor sequences and any downstream data representations. The proliferation of tools for processing raw AIRR data makes it increasingly difficult to compare results between tools and to build modular data pipelines. We have been developing a CSV-like file format for representing annotated reads and clones, with the goal of having it implemented in multiple common AIRR pipelines (e.g., Immcantation).
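A minimal Python sketch of why shared field names matter for such a file, assuming a tab-delimited layout (the text above only says "CSV-like") and a hypothetical set of required column names loosely modeled on AIRR-style naming. It is not the working group's actual format or field list, and the example record is made up. One tool writes records under a fixed header; another reads them back and rejects a file whose header lacks a required column.

```python
import csv
import io

# Hypothetical required columns for illustration only; not an official field list.
REQUIRED_FIELDS = ["sequence_id", "v_call", "j_call", "junction_aa"]


def write_records(records, handle):
    """Write annotated reads as tab-delimited text under a fixed header row."""
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_FIELDS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)


def read_records(handle):
    """Read records back, rejecting input whose header lacks a required field."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []  # None for an empty file
    missing = [name for name in REQUIRED_FIELDS if name not in header]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return list(reader)


if __name__ == "__main__":
    buffer = io.StringIO()
    # Made-up example record, not real data.
    write_records([{"sequence_id": "read1", "v_call": "IGHV1-2*02",
                    "j_call": "IGHJ4*02", "junction_aa": "CARDYW"}], buffer)
    buffer.seek(0)
    rows = read_records(buffer)
    print(rows[0]["v_call"])  # IGHV1-2*02
```

Because the reader depends only on agreed column names, any tool that emits the same header can feed it, which is the kind of modular pipeline the paragraph describes.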
- Multiple WGs are designing implementation standards and could use technical input on data representation.
- Coordination with AIRR Working Groups to specify data models, e.g.:
  - Common Repo defining minimal APIs for repositories and REST resources
  - MinStd choosing ontologies for their fields
  - Germline defining new germlines and annotations
- Ensure all AIRR groups are working in mutually compatible ways (in terms of data)
- Ensure we have liaisons on all other relevant working groups
- Work on representation of provenance of data sets
Co-leaders: Uri Laserson & Scott Christley
Members: Aaron Rosenfeld, Anna Fowler, Ahmad Chan, Brian Corrie, Bojan Zimonja, Chaim Schramm, Corey Watson, Daniel Gadala-Maria, Duncan Ralph, Felix Breden, Jason Vander Heiden, Jerome Jaglale, Jessica Finn, Nishanth Marthandan, Richard Bruskiewich, Scott Christley, Steve Kleinstein, Susanna Marquez
Visit the AIRR Standards Documentation.