iGEM Wageningen 2021

The BioParts Wikibase

During the development of Cattlelyst we ran into a problem when redesigning metabolic pathways: “How do we efficiently and non-redundantly search the iGEM registry for parts of interest?” Since every iGEM team is likely to face this problem, we set out to create a tool that alleviates it. In our project we wanted to query the biological function of a Biobrick part using standard web identifiers, such as Enzyme Commission (EC) numbers and KEGG Orthology (KO) identifiers. With such identifiers our software, the iGEM PIPE, can search Biobricks by enzyme function, which it uses in its computations. For these reasons the wikibase project was started.

Figure 1: Provenance of information used by the iGEM PIPE. The figure shows the connection of the BioParts wikibase to both the PIPE and the iGEM Registry of Standard Biological Parts. It also shows that the SynBioHub is connected to this registry, but is neither connected to the wikibase nor used by the iGEM PIPE.

Web Ontologies

As the development of the iGEM PIPE shows, machines play an increasingly important part in the scientific community and in pushing research forward, so it is important that sources and databases facilitate this trend by being accessible to them. Accessibility is one of the FAIR principles of data management, which stand for Findability, Accessibility, Interoperability and Reusability of data [1]. These conventions are the minimal standards for good data management and sharing. While computers do not understand human-readable text, they can search for relevant information using resource ontologies. Databases should therefore represent knowledge in both a computer- and human-readable fashion. To facilitate computer-readability, a clear set of properties that link entities together has to be defined. Languages such as the Web Ontology Language (OWL) allow computers to make these connections [2]. An example of such a linked web is Wikidata,

“…a free and open knowledge base that can be read and edited by both humans and machines.” [3]

Currently the iGEM Registry of Standard Biological Parts does not comply with the FAIR principles [1] and cannot be traversed by a machine. The format in which it is written is not computer-readable, so machines cannot search its collection of Biobricks and biological information. An online database that does allow querying of the iGEM registry is the SynBioHub [4], which, in accordance with FAIR principles, annotates many of the Biobricks originating from the iGEM registry. It was designed and created by James Alastair McLaughlin et al. in 2018 [4]. This repository addresses many of the problems of the iGEM registry, such as the provenance of parts, but focuses predominantly on capturing genetic design information such as plasmid design (Figure 2). The Hub is an improvement on the iGEM registry, as it allows computer-readable queries to be used. However, the information we require for the iGEM PIPE is not contained in the Hub, so we created the BioParts wikibase to incorporate standardized function identifiers and complement our pipeline. A comparison between the iGEM registry, SynBioHub and BioParts wikibase can be seen in Figure 2.

Figure 2: Comparison between online databases of the iGEM registry. The iGEM registry falls short on many FAIR principles, which the SynBioHub resolves to a certain extent. While the SynBioHub provides numerous standardized identifiers and facilitates machine querying, it lacks the function identifiers that are used in the iGEM PIPE. The BioParts wikibase addresses this problem and has standardized function identifiers for numerous biological parts.

The BioParts wikibase

To create a database that meets our needs, the Biobrick parts have been re-uploaded to a wikibase instance, named the BioParts wikibase. During this iGEM project we created a platform that allows for the storage and uniform annotation of Biobricks. This allows students and iGEM enthusiasts to comprehensively search for biological functions, and it facilitates and encourages research using these biological parts. The platform performs several computations on the Biobricks while uploading them to the wikibase instance: it annotates the biological parts with functions and restriction sites, and it links Biobricks that are composed of other biological parts to those parts. The BioParts wikibase is a semantic database that helps navigate the iGEM registry in a computer-readable manner, with a focus on enzymatic function.
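The restriction-site annotation mentioned above can be illustrated with a minimal sketch. It is not the code used to build the wikibase: the choice of the four BioBrick RFC10 enzymes and the function name `find_restriction_sites` are assumptions made for illustration. Because all four recognition sequences are palindromic, scanning one strand is enough.

```python
# Recognition sequences of the four BioBrick RFC10 enzymes. This enzyme set is
# an assumption; the text does not say which enzymes the wikibase annotates.
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
}


def find_restriction_sites(sequence):
    """Return {enzyme: [0-based start positions]} for every site found."""
    seq = sequence.upper()
    hits = {}
    for enzyme, site in RESTRICTION_SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            # Continue one base further so overlapping sites are not missed.
            start = seq.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


# Made-up sequence containing two EcoRI sites and one PstI site.
example = "atgGAATTCaaaCTGCAGtttGAATTC"
print(find_restriction_sites(example))
```

A part whose result is empty contains none of these sites and would, under this assumption, be compatible with RFC10 assembly.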

The wikibase and the iGEM PIPE

The search for biological functions by a machine is demonstrated in the iGEM PIPE, where a biological function is sought during the gap-filling procedure. The iGEM PIPE searches for standardized biological function IDs, such as EC numbers. During the upload of the iGEM registry to the wikibase, the Biobricks were annotated with a conserved and standardized ID, the EC number. As such, the pipeline can also search the biological parts that originate from the iGEM registry. This means that the pipeline can suggest Biobricks; it automatically gives parts originating from the registry a higher selection score and is therefore more likely to suggest them for the design approach. These parts are more attainable, due to the Material Transfer Agreement (MTA) of the iGEM Foundation, which is a crucial aspect for many iGEM teams. This is an example where FAIR principles are considered in designing an approach, both in data sharing (the wikibase) and in sample sharing (the MTA).
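How a registry bonus can shift a ranking is sketched below. This is only an illustration of the idea, not the scoring used by the iGEM PIPE: the `Candidate` fields, the `REGISTRY_BONUS` weight and the base scores are all hypothetical.

```python
from dataclasses import dataclass

REGISTRY_BONUS = 1.0  # hypothetical weight, not taken from the iGEM PIPE


@dataclass
class Candidate:
    name: str
    ec_number: str
    from_registry: bool  # True if the part is a Biobrick found in the wikibase
    base_score: float    # hypothetical score from all other criteria


def rank_candidates(candidates, wanted_ec):
    """Keep candidates with the wanted EC number, highest total score first."""
    matching = [c for c in candidates if c.ec_number == wanted_ec]
    return sorted(
        matching,
        key=lambda c: c.base_score + (REGISTRY_BONUS if c.from_registry else 0.0),
        reverse=True,
    )


candidates = [
    Candidate("external enzyme", "1.2.1.22", False, 2.5),
    Candidate("BBa_K892013", "1.2.1.22", True, 2.0),
    Candidate("unrelated part", "1.1.1.1", True, 3.0),
]
for candidate in rank_candidates(candidates, "1.2.1.22"):
    print(candidate.name)
```

With these made-up numbers the Biobrick overtakes the external enzyme despite its lower base score, which mirrors the preference for parts that are attainable through the registry.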

Using the wikibase

Biological data and information on the parts can be obtained in several ways. For instance, the search bar that is provided on any MediaWiki-based site can be used, but computer-generated queries can be used as well. The SPARQL Protocol and RDF Query Language (SPARQL) is a commonly used and standardized language for querying semantic databases. This query language can retrieve information from databases such as the BioParts wikibase in an orderly fashion, and queries can be generated by both machines and humans. Here the benefit of using an ontology language such as OWL really shows. With this query method both the provenance (the origin of the biological part) and other relevant information can be retrieved without taking up valuable time of the scientist. For an extensive guide on how to use SPARQL, the W3C website is a good source.

A simple example is finding Biobricks with a certain EC number. In the example query and its output in Table 1, the qualifier property P10 (EC number, accessed with the pq: prefix) is matched on the alignment statement (P47, accessed with the p: prefix) of each item. This selects the 8 Biobricks within the wikibase that carry the EC number “1.2.1.22” (lactaldehyde dehydrogenase).

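# Find Biobricks whose alignment statement carries EC number 1.2.1.22.
# PREFIX declarations for p:, pq: and rdfs: are omitted here.
SELECT ?item ?label WHERE {
  ?item p:P47 ?alignment .          # alignment statement of the item
  ?alignment pq:P10 "1.2.1.22" .    # EC number qualifier on that statement
  ?item rdfs:label ?label .         # label of the item (the Biobrick code)
}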


Table 1: An example query on the BioParts wikibase, in which SPARQL is used to find Biobricks with EC number “1.2.1.22”. Multiple Biobricks are returned, and for each both the link to the item in the wikibase and the corresponding label are shown. The query makes use of the p, pq and rdfs prefixes, whose declarations are omitted here. Consult this website for more extensive information on querying wikibases and using SPARQL.
Item link to wikibase                             Biobrick code
http://bioparts.wiki.opencura.com/entity/Q8971    BBa_K892013
http://bioparts.wiki.opencura.com/entity/Q15885   BBa_K936019
http://bioparts.wiki.opencura.com/entity/Q18610   BBa_S05043
http://bioparts.wiki.opencura.com/entity/Q18660   BBa_S05095
http://bioparts.wiki.opencura.com/entity/Q22811   BBa_S05044
http://bioparts.wiki.opencura.com/entity/Q24249   BBa_K936016
http://bioparts.wiki.opencura.com/entity/Q25132   BBa_K2216000
http://bioparts.wiki.opencura.com/entity/Q26942   BBa_K936017
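A machine-generated version of this query could look like the sketch below. It builds the query text for any EC number and reads (item, label) pairs from a response in the standard W3C SPARQL JSON results format. The function names are hypothetical, and the network call is left out because the address of the query endpoint is not given here. The sketch also assumes that the query service predefines the p:, pq: and rdfs: prefixes. The sample response is hand-written from the first row of Table 1.

```python
import json

# Same query as above, with a placeholder for the EC number.
QUERY_TEMPLATE = """SELECT ?item ?label WHERE {{
  ?item p:P47 ?alignment .
  ?alignment pq:P10 "{ec}" .
  ?item rdfs:label ?label .
}}"""


def build_ec_query(ec_number):
    """Fill a fully numeric, four-level EC number into the query template."""
    parts = ec_number.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a full EC number: {ec_number!r}")
    return QUERY_TEMPLATE.format(ec=ec_number)


def parse_bindings(response_text):
    """Extract (item URI, label) pairs from a SPARQL JSON results document."""
    bindings = json.loads(response_text)["results"]["bindings"]
    return [(b["item"]["value"], b["label"]["value"]) for b in bindings]


# Hand-written sample response, not real output from the wikibase.
sample_response = json.dumps({
    "head": {"vars": ["item", "label"]},
    "results": {"bindings": [{
        "item": {"type": "uri",
                 "value": "http://bioparts.wiki.opencura.com/entity/Q8971"},
        "label": {"type": "literal", "value": "BBa_K892013"},
    }]},
})

print(build_ec_query("1.2.1.22"))
print(parse_bindings(sample_response))
```

Validating the EC number before inserting it keeps malformed input out of the query text, which matters once queries are generated by a pipeline instead of typed by hand.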

Adding to the wikibase

The wikibase as it stands now only scratches the surface of what is possible, and we, Cattlelyst, wholeheartedly invite all fellow and future iGEM students to participate in extending the semantic web. This community-based project will allow iGEMmers and researchers to quickly and non-redundantly search all biological information related to Biobricks. It is very important that as much information as possible is added to the database; in particular, data from actual experiments should be uploaded in accordance with FAIR principles. We suggest looking at our GitHub repository, which contains the entire code used for generating the wikibase as well as a more extensive manual on how to contribute.

  • References
    1. Jacobsen A et al. 2020. “FAIR Principles: Interpretations and Implementation Considerations.” Data Intelligence 2(1-2):10–29.
    2. Hawke S, Herman I, Archer P, Prud’hommeaux E. W3C. Published 2013. Accessed October 13, 2020. https://www.w3.org/
    3. MediaWiki contributors. How to contribute. MediaWiki. Published 2020. https://www.mediawiki.org/w/index.php?title=How_to_contribute&oldid=4166757
    4. McLaughlin JA et al. 2018. “SynBioHub: A Standards-Enabled Design Repository for Synthetic Biology.” ACS Synthetic Biology 7(2):682–88.
About Cattlelyst

Cattlelyst is the name of the iGEM 2021 WUR team. Our name is a mix of 1) our loyal furry friends, cattle, and 2) catalyst, which is something that increases the rate of a reaction. We are developing “the something” that converts the detrimental gaseous emissions of cattle, hence our name Cattlelyst.