From fragmented data to intelligent decision making
The volume of data that goes into the creation of a new drug or therapy can be measured in hundreds of terabytes spread across any number of sources or instruments.
Unfortunately, too many pharmaceutical companies are working in silos with fragmented data, transforming it manually, which leads to delays, errors, rework and lost information. This disconnection makes it harder to bring information together, impeding good decision making.
How can pharma companies better aggregate high-quality data, streamline collaboration, minimize data loss and reduce operational overhead?
And how can artificial intelligence (AI) help unlock the full potential of these companies?
The hidden costs of fragmented data
In many pharma companies today, research and development (R&D) workflows remain fragmented across instruments, applications and teams. Experimental intent, material lineage and decision rationale are rarely captured in a structured, reusable form as work advances from research to development and production. This disconnect increases scale-up risk, slows industrialization, limits reproducibility and reduces the return on AI investments, resulting in:
- More time spent on mundane tasks or searching for information rather than fostering innovation
- Lost knowledge at critical handoffs and teams reconstructing or reinterpreting earlier findings
- Manual changes to information, leading to version drift and transcription errors
- Information scattered across local spreadsheets, undermining reliability
- Multiple clinical research organizations (CROs) providing data in various formats
- Slow quality control (QC) reports that can delay projects and hide the source of an issue
As work moves between teams and systems, critical scientific context is often lost. Without that knowledge and intent, pharma companies reduce their ability to scale innovation effectively beyond individual projects or small teams.
How not to lose $500,000 every day
Each day that a clinical trial is delayed due to poor data access can represent up to $500,000 in lost revenue.[1] On top of that, the pharmaceutical industry alone spends over $150 million annually just to support data transfer between systems and organizations, an incredible amount of waste that could be devoted to science instead.[2]
To reduce this hidden tax on pharmaceutical drug discovery and development, organizations must move beyond digitized records and toward structured workflows that unite scientific reasoning with coordinated development and production. This requires:
- Unified multimodal data foundations
- Orchestrated laboratory environments
- Lifecycle continuity
These attributes preserve process knowledge and experimental context from discovery through commercial production. When scientific work is structured and connected, insight no longer stalls at the lab bench. Instead, it carries forward into development, scale-up and manufacturing.
By aligning scientific reasoning with reproducible, structured scientific work within a continuous digital thread across the R&D lifecycle, organizations can create scalable products and outcomes, accelerating time-to-market while ensuring quality, traceability and operational confidence.
When companies support innovation with structured workflows, they realize the value of AI, accelerate tech transfer and reliably transform promising discoveries into reproducible therapies.
Supporting innovation by empowering human-in-the-loop science
What does it mean to keep humans in the loop? It means people continue to provide feedback to an automated or AI system, improving its performance over time and maintaining its accuracy and reliability. An always-on pipeline, rather than a collection of ad hoc, siloed projects, keeps scientists, team members and leaders in the loop. Here’s how it works:
Edge-based agents facilitate automated data acquisition
These agents, deployed on instrument computers, continuously monitor for new files, performing packaging, checksum verification and secure upload to a designated cloud service, thereby eliminating manual file transfer processes.
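As a rough illustration, the core of such an agent can be sketched in a few lines of Python. This is a minimal sketch, not any platform's actual implementation: the watch folder, bucket name and polling interval are hypothetical, and a production agent would also handle retries, partial writes and packaging.

```python
"""Minimal edge-agent sketch: watch an instrument folder, checksum new
files and upload them with their checksum as object metadata."""
import hashlib
import time
from pathlib import Path

import boto3  # assumes an S3-compatible destination (illustrative)

WATCH_DIR = Path("C:/instrument_output")  # hypothetical instrument folder
BUCKET = "raw-lab-data"                   # hypothetical bucket name
s3 = boto3.client("s3")
seen: set[Path] = set()

def sha256(path: Path) -> str:
    """Checksum used to verify integrity after upload."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

while True:  # a service wrapper would replace this bare loop
    for path in WATCH_DIR.glob("*"):
        if path.is_file() and path not in seen:
            # Store the checksum with the object so downstream services
            # can verify the file before parsing it.
            s3.upload_file(str(path), BUCKET, path.name,
                           ExtraArgs={"Metadata": {"sha256": sha256(path)}})
            seen.add(path)
    time.sleep(30)  # poll every 30 seconds
```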
Streamlined data parsing
The parsing engine intelligently recognizes hundreds of instruments and common formats, such as TXT, CSV, TIF and others. It extracts critical values, transforming them into structured, searchable metadata that seamlessly accompanies the original file, ensuring data is always ready for use.
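A minimal sketch of the idea in Python follows. The dispatch table, column name and extracted fields are illustrative assumptions; a real parsing engine registers hundreds of instrument-specific formats.

```python
"""Sketch of a format-aware parser: route each file by extension and
emit structured metadata that travels with the original file."""
import csv
from pathlib import Path

def parse_csv(path: Path) -> dict:
    """Pull critical values out of a tabular results file."""
    with path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    return {
        "row_count": len(rows),
        "columns": list(rows[0].keys()) if rows else [],
        # 'sample_id' is an assumed column name for illustration
        "sample_ids": sorted({r.get("sample_id", "") for r in rows} - {""}),
    }

PARSERS = {".csv": parse_csv}  # a real engine registers many formats

def extract_metadata(path: Path) -> dict:
    """Structured, searchable metadata accompanying the raw file."""
    meta = {"source_file": path.name, "format": path.suffix.lstrip(".")}
    parser = PARSERS.get(path.suffix.lower())
    if parser:
        meta.update(parser(path))
    return meta

# meta = extract_metadata(Path("hplc_run_042.csv"))  # hypothetical file
```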
Intuitive modeling for scientific data
Data easily integrates into a unified model that directly mirrors the team’s scientific understanding of plates, samples, constructs, batches, runs and results. Leverage out-of-the-box applications for rapid deployment, or configure them to align with specific naming conventions, entities and quality rules.
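The sketch below shows what such a model might look like as plain Python dataclasses. The entities and fields (plates, samples, results, well addresses) are illustrative, not any platform's actual schema.

```python
"""Sketch of a scientific data model that mirrors how teams think:
plates hold samples in wells; results stay linked to their raw file."""
from dataclasses import dataclass, field

@dataclass
class Sample:
    sample_id: str
    batch_id: str  # links the sample back to its material batch

@dataclass
class Plate:
    plate_barcode: str
    wells: dict[str, Sample] = field(default_factory=dict)  # "A1" -> Sample

@dataclass
class Result:
    plate_barcode: str
    well: str
    assay: str
    value: float
    source_file: str  # lineage: every value points at its raw file

# A result remains traceable to plate, well, sample, batch and file:
plate = Plate("PLT-0001", {"A1": Sample("S-123", "BATCH-7")})
result = Result("PLT-0001", "A1", "ELISA", 0.82, "reader_20240101.csv")
```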
Empowering roles with personalized insights
Deliver customized experiences based on roles. For example: biologists gain specific insights, analytical chemists access relevant data and program leads get a high-level view. Live reports ensure full traceability to raw data, guaranteeing audit readiness.
Downstream integration for enhanced decision-making
Ensure the capture-parse-model pipeline is reliable before connecting to workflow orchestration, analytics and AI features. A consistent data foundation improves decision-making capabilities.
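One hedged way to read "reliable before connecting" is as a smoke test over the pipeline: verify that required context survives every stage before orchestration, analytics or AI consume the output. The stage bodies below are trivial stand-ins for the components sketched above.

```python
"""Sketch of a capture -> parse smoke test: confirm required context
survives every stage before wiring in orchestration, analytics or AI."""
import hashlib
from pathlib import Path

def capture(path: Path) -> dict:
    """Stage 1 stand-in: checksum the raw file for integrity checks."""
    return {"source_file": path.name,
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest()}

def parse(record: dict) -> dict:
    """Stage 2 stand-in: attach extracted metadata."""
    suffix = Path(record["source_file"]).suffix.lstrip(".")
    return {**record, "format": suffix}

def smoke_test(sample: Path) -> dict:
    record = parse(capture(sample))
    # Fail fast if context was lost; only a reliable foundation
    # should feed downstream decision-making tools.
    for key in ("source_file", "sha256", "format"):
        assert record.get(key), f"pipeline lost required field: {key}"
    return record
```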
This unified multimodal context transforms fragmented scientific work into a structured, connected foundation for scalable innovation. By capturing and modeling relationships across entities, samples, assays and instruments within a single contextual framework, organizations can preserve scientific intent and continuity from discovery through development, keeping humans in the loop throughout the entirety of the project.
Revolutionizing pharma lab operations
Pharma companies need to be able to run thousands of instruments and load more than one hundred million files that total hundreds of terabytes. Regardless of the size of the operation, whether there are 10 instruments or 2,000, the criteria for success don’t change:
- Automated data ingestion: Seamless, zero-touch transfer of data from instruments into a centralized, secure environment enhances precision, quality and compliance.
- Intelligent data contextualization: Automatic extraction of metadata for every file makes data easily searchable and understandable.
- Scientifically aligned data views: Relational data representations accurately reflect scientific processes rather than being limited by IT constraints.
- Effortless reporting and analytics: One-click, no-code reports that update with changes in customer, project or run save significant time, improve communication and minimize errors.
- Comprehensive data lineage: Full traceability from any chart or table back to its original source file ensures transparency and auditability (see the lineage sketch after this list).
- Automated QC: Explicit, automated quality control processes with built-in audit trails replace manual and inconsistent methods.
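To make the comprehensive data lineage criterion concrete, the sketch below shows one minimal way to record and query lineage, assuming every reported value is keyed by an identifier and stamped with its source file and checksum at ingestion. Names and IDs are illustrative.

```python
"""Sketch of data lineage: every reported value carries a pointer to
its original instrument file, so any chart cell can be traced back."""
from dataclasses import dataclass

@dataclass(frozen=True)
class LineageRecord:
    value_id: str      # identifier shown in the report or chart
    source_file: str   # original instrument file
    sha256: str        # checksum captured at ingestion

LINEAGE: dict[str, LineageRecord] = {}

def record(value_id: str, source_file: str, sha256: str) -> None:
    """Written once at ingestion, never edited afterwards (audit trail)."""
    LINEAGE[value_id] = LineageRecord(value_id, source_file, sha256)

def trace(value_id: str) -> LineageRecord:
    """Answers the auditor's question: where did this number come from?"""
    return LINEAGE[value_id]

record("plate7/A1/OD450", "reader_20240101.csv", "ab3f9c")  # toy checksum
print(trace("plate7/A1/OD450").source_file)  # -> reader_20240101.csv
```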
Aggregating data from multiple CROs
Most organizations partner with multiple CROs, each of which likely sends data in a different way. Once those files are inside the network, someone usually aggregates and integrates the information into a single spreadsheet by hand.
The better approach considerably reduces the time spent stitching and reworking data, so scientists work on science, not spreadsheets. This can be achieved by:
- Standardizing data ingestion: Establish a unified entry point for all incoming files, whether from email, SharePoint or S3, and normalize them through a single ingestion service.
- Harmonizing data formats: Map diverse CRO outputs into a consistent internal data model during the parsing stage, enabling project teams to compare data uniformly (see the sketch after this list).
- Enforcing upfront quality control: Immediately reject or flag non-compliant files to ensure that any necessary rework occurs before reports are generated.
- Creating dynamic, attributable reports: Generate client-ready reports that allow for instantaneous context switching with a single click, while maintaining full attribution back to the original raw files.
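Here is a minimal sketch of the harmonization and upfront QC steps, assuming two hypothetical CROs that deliver CSV files with different column names. The mappings and required fields are illustrative, not a real partner's format.

```python
"""Sketch of CRO harmonization: map each partner's column names onto
one internal schema and flag non-compliant files before reporting."""
import csv
from pathlib import Path

COLUMN_MAPS = {  # per-CRO mapping onto the internal model (illustrative)
    "cro_alpha": {"Sample Ref": "sample_id", "Result (uM)": "value_um"},
    "cro_beta": {"SAMPLE_ID": "sample_id", "CONC_UM": "value_um"},
}
REQUIRED = {"sample_id", "value_um"}  # fields every row must carry

def harmonize(path: Path, cro: str) -> list[dict]:
    """Normalize one CRO file into the internal model, or raise."""
    mapping = COLUMN_MAPS[cro]
    with path.open(newline="") as f:
        rows = [{mapping.get(col, col): val for col, val in row.items()}
                for row in csv.DictReader(f)]
    # Upfront QC: reject non-compliant rows so rework happens
    # before any report is generated.
    for i, row in enumerate(rows, start=1):
        missing = REQUIRED - {k for k, v in row.items() if v}
        if missing:
            raise ValueError(f"{path.name}, row {i}: missing {missing}")
    return rows
```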
Accelerating science with automated quality control
Manual QC is inherently biased and time-consuming. By contrast, automated QC makes quality criteria explicit and testable, and with the integration of AI, it can achieve new levels of precision and foresight.
- Automated peak detection for chromatograms: Define retention ranges and thresholds to automatically flag the presence or absence of target compounds in HPLC or similar data (a minimal rule-based version is sketched after this list). This allows scientists to focus on outliers rather than manually scrutinizing every plot. With AI, models can learn complex peak patterns, identify subtle anomalies that rule-based thresholds might miss and even predict potential issues before they become critical, leading to more robust, adaptive quality control.
- Proactive operational error detection: Immediately highlight issues such as missing barcodes, incorrect sample IDs or liquid handler malfunctions as they occur. This creates an error queue, enabling technicians to address and rectify problems before they escalate and impact downstream processes. AI can analyze historical data to predict common failure points or anticipate equipment malfunctions, enabling proactive maintenance and minimizing downtime.
- Plate-centric joins: Integrate liquid handler instructions directly with plate reader results, allowing for immediate problem identification within the proper context, facilitating rapid resolution and closing the feedback loop efficiently. Leveraging AI, systems can automatically correlate vast amounts of experimental data, identify intricate relationships between experimental parameters and outcomes and even suggest adjustments to experimental protocols that improve success rates.
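As promised above, here is a minimal rule-based version of chromatogram peak QC. The compound, retention window and height threshold are illustrative placeholders; real methods are calibrated per assay.

```python
"""Sketch of rule-based peak QC: flag whether a target compound's peak
appears within its retention window above a minimum height."""

RULES = {  # compound -> ((window in minutes), min peak height)
    "caffeine": ((4.2, 4.6), 15000.0),  # illustrative method values
}

def qc_peaks(peaks: list[tuple[float, float]], compound: str) -> str:
    """peaks: (retention_time_min, height) pairs from the instrument."""
    (lo, hi), min_height = RULES[compound]
    hits = [h for t, h in peaks if lo <= t <= hi and h >= min_height]
    # Only flagged runs reach a scientist's review queue.
    return "pass" if hits else "flag: target peak missing or too small"

print(qc_peaks([(4.35, 18200.0), (6.10, 900.0)], "caffeine"))  # -> pass
```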
The shift from manual to automated QC is both cultural and technical. QC becomes a system of shared, enforced rules rather than a reliance on individual memory or extensive training for new hires. Integrating AI turns these systems into continuous learning partners that refine their understanding of quality over time and adapt to new scientific challenges.
How dashboards can truly support R&D decision-making
Dashboards bring structured experimental data, material lineage and analytical context together into coherent, lifecycle-aware views. They are critical when confirming high-stakes R&D decisions.
Rather than relying on disconnected reports or manual aggregation, decision makers gain real-time visibility into experimental outcomes and process status, comparability across studies and the downstream implications of scientific choices.
- Curate experiences for each role
- Collapse reporting time
- Use AI to summarize, match and predict on top of structured, lineage-rich data
By grounding dashboards and portfolio views in a unified scientific foundation, pharma companies can make confident go/no-go prioritization and scale-up decisions with full traceability and context.
The result is not just clearer visualization; it’s also faster, higher-integrity decision making that reduces risk, protects investment and aligns discovery with development and commercial impact.
How Luma redefines the way science gets done
Dotmatics is now part of Siemens
Siemens’ integrated material research and design solution helps you transform raw material data into actionable insights, accelerating your discovery-to-development pipeline and ensuring your materials meet the demands of tomorrow’s products.
When you can seamlessly connect experimental data, simulation results and material properties across the entire R&D lifecycle, you’ll foster an environment that promotes data-driven decisions and significantly reduces time-to-market for novel materials.
Integrating the vast research data in Siemens’ AI solutions with the Luma platform reimagines R&D as a lab-in-a-loop, a connected, adaptive ecosystem where:
- Data flows seamlessly between wet and dry labs
- AI guides decisions in real time
- Researchers accelerate discoveries without compromising accuracy or compliance
Luma is an AI-native, cloud-first Scientific Intelligence Platform that unites every corner of R&D, from molecules to materials, into one unified workspace. Scientists can design workflows, build apps and trace every decision without touching a line of code, breaking down data silos and speeding up discovery.
Luma rapidly converts complex, unstructured instrument data into clean, AI‑ready insights. Through automated ingestion, scientific data modeling and advanced data management, it helps laboratories seamlessly integrate and realize the full value of their instrument data. Luma manages edge capture, high‑volume data parsing and metadata generation, while delivering the modeling layer, intuitive user experiences and live, no‑code reporting that align with your entities, controlled vocabularies and quality standards.
FAQs about data integration and human-in-the-loop science
1. What are the main challenges pharmaceutical companies face due to fragmented data?
Fragmented data in pharmaceutical companies leads to significant issues such as delays, errors, rework and lost information. It also increases scale-up risk, slows industrialization, limits reproducibility and reduces the return on AI investments. This ultimately hinders effective decision making and innovation, costing companies valuable time and resources.
2. How can AI help pharmaceutical companies overcome data fragmentation and improve R&D?
AI can help by enabling automated data acquisition through edge-based agents, streamlining data parsing to create structured, searchable metadata and providing intuitive modeling for scientific data. It also empowers roles with personalized insights, enhances decision making through downstream integration and revolutionizes lab operations with automated quality control and comprehensive data lineage.
3. What is “human-in-the-loop science” in the context of pharmaceutical R&D?
“Human-in-the-loop science” means that people continuously provide feedback to automated or AI systems to improve their performance, accuracy and reliability over time. This approach ensures that scientists, team members and leaders remain actively involved in the R&D process, fostering a continuous learning and adaptive ecosystem rather than managing isolated projects.
4. How does Luma address the challenges of data fragmentation in pharmaceutical R&D?
The Luma platform unifies all aspects of R&D into a single workspace. It facilitates seamless data flow between wet and dry labs, uses AI to guide real-time decisions and accelerates discoveries while ensuring compliance. Luma automates data ingestion, scientific data modeling and advanced data management, converting complex instrument data into clean, AI-ready insights.
[2] Getting Value out of your R&D Data: Outcome-Driven Insights with Luma Lab Connect


