Biotechnology Awards 2022

GHP / Biotechnology Awards 2022, 9 Jan 2022
Bridging the gap between Data Collection and Data Analysis

It’s no secret that the life sciences are both blessed and challenged by the exponential growth in the volume of biological research data. Science is creating data at machine scale, yet in many cases it is still managing and leveraging that data at human scale. By some measures, scientists spend 50% to 60% of their time finding and organizing data for analysis. And that figure doesn’t account for the fact that at least part of the reproducibility challenge stems from insufficient assay metadata or other data provenance issues.
“Researchers have become habituated to the struggle,” says Emerson Huitt, Founder and CEO of Snthesis. “It’s difficult for many to imagine answering their questions quickly and gaining insight into the research pipeline that can yield organisational impacts.”
Before founding Snthesis in 2018, Huitt spent over a dozen years building one-off custom software to address this challenge for biological research firms. Those custom solutions generally got the job done, but they were inflexible, difficult to evolve, and prohibitively expensive for all but the larger players in industry or academia. Huitt knew there was a need for a broadly available, source-agnostic data integration and harmonization platform that used Semantic Web technology and natural language processing to build broad datasets that could be searched far more comprehensively. Instead of searching for a certain type of data in a certain place, or for a particular word in a spreadsheet, a researcher could ask a question such as “Which samples that Kathy tested last year gave us positive results in a particular medium?” After a couple of years of work with a handful of private clients, Snthesis recently made the Snthesis Bio® Platform commercially available.
The platform is source-agnostic: it ingests data from spreadsheets, electronic lab notebooks (ELNs), LIMS, sequencers, EHRs, and even public datasets such as those from Refine.Bio and NCBI. Even if the data sources include thousands of spreadsheets collected by many people over a period of years, the combined results can be searched easily with the platform’s graphical query tools. Huitt notes, “We process the data and analyze the shapes of the data to define categories for the system to recognize. These categories may include the origin of samples and who collected them. We work with our clients to identify and extract the things that are important to them.” The result is a clean, harmonized, and living collection of all the research data within the organisation, all of it searchable, shareable, and reusable.
A typical Snthesis Bio® implementation involves ingesting years of archival research data (including thousands of old spreadsheets), integrating with ELN or LIMS tools, and then working with the client to set up processes that ingest new data as it is created. “When implementing this solution, an organisation doesn’t need to have its data properly structured,” says Joe Insinga, Snthesis’s Chief Growth Officer. “The organisation just needs a vision of how it wants its data to be structured.” Nor does the organisation have to develop that vision alone. “We have a workshop on day one,” Insinga says. “We discuss what companies want the results to look like and what vernacular or data classifications they want the system to use.” He adds, “Different teams often have different names for the same things, and researchers often have an inconsistent approach to labeling spreadsheet columns from week to week or project to project.”
The Snthesis Bio® Platform delivers detailed data rather than conclusions or analytical results. Snthesis focuses on cleaning, integrating, and harmonizing the data so that it is ready for whichever analysis tools its clients choose to use. Towards the end of 2021, Snthesis spun out a portion of its Semantic Comprehension Engine (the “intelligence” behind the Platform) to create Snthesis Merge®, a tool for harmonizing and extending public and private datasets. “There is a tremendous amount of valuable data within some public databases, such as Refine.Bio or NCBI,” Huitt says, “but once you pull a particular dataset, you may be dealing with the impossible task of cleaning that public data so it can be put to good use.” In one recent example, a dataset pulled from Refine.Bio ran to over 7,000 columns, with one fundamental data point, the name of the disease, scattered across more than 40 of those columns amid thousands of rows. Once the Merge® tool had been configured with what to look for, it pulled the gene expression results within minutes and matched them with ICD-10 disease codes for a cohort in the UK Biobank database. Let’s repeat that: within minutes of being configured, the Snthesis Merge® tool matched UK Biobank patient records with the relevant gene expression data. And it can do the same for nearly any public or private dataset.
Durham, NC-based Snthesis has been revenue-funded since its inception. While venture capital or an equity partner may be in its future, the company is focused for now on adding a few commercial customers and continuing to add new features to both the Bio® platform and the Merge® tool.
Twitter: @SnthesisInc
Inquiries: Joe Insinga, CGO
Snthesis, Inc