A robust and scalable data ingestion system for The Ocean Cleanup

The Ocean Cleanup, a non-profit organization developing advanced technologies to rid the oceans of plastic, has partnered with Xomnia to build its data ingestion infrastructure. To do this, Xomnia built a scalable Azure-based data streaming platform, moving it from prototype to production in under three months.

This infrastructure will help The Ocean Cleanup’s data engineering team centrally ingest and store data collected from various locations across the world. This will eliminate challenges such as fragmentation in the IT landscape, unclear data lineage and scalability issues. Ultimately, it will enable The Ocean Cleanup’s teams to work better and faster.

The data ingestion platform does exactly what we need it to do. It is easy to maintain, requires little effort to add new data sources and scales along with our operations. These guys know what they are doing and are just as motivated to deliver quality as we are.
Maarten van Berkel, Project Manager & Data Architect at The Ocean Cleanup

Case

The Ocean Cleanup designs and develops advanced technologies to rid the oceans of plastic. It does this by cleaning up pollution already present in the oceans and by intercepting plastic on its way to the ocean via rivers. Its ocean technology is moved by ocean forces to passively catch and retain plastic. The Interceptor™, its solution for river debris, is a 100% solar-powered device that autonomously halts and extracts plastic from rivers before it reaches the ocean.

The Ocean Cleanup believes that to develop the most effective cleanup technologies, it’s essential to truly understand the problem. That is why they have been conducting extensive research using data ranging from low-tech sources (visual counting from bridges) to high-tech sources (automated camera monitoring). Ingesting data from these varied sources, located all over the world, is no easy task for the data engineering team. Over time, this has led to some internal challenges: fragmentation in the IT landscape, unclear data lineage and scalability issues.

The Ocean Cleanup needed a data ingestion framework that could scale with the ever-increasing list of sources and that standardized the way data was handled by the rest of the organization.

Solution

Together with The Ocean Cleanup’s data engineering team, we discussed the current architecture and saw how the new ingestion framework would fit in. During these brainstorming sessions, we set out to design a system that would not only work for the current processes, but also for future use cases. This resulted in the following design goals:

  • Standardized: the initial flow of data should be the same for every source
  • Flexible: data should be accepted in a wide variety of formats (files, binary and JSON)
  • Transparent: there should be a clear trail of the origins of the data
  • Open: applications should be able to access the sources they are authorized for
  • Maintainable: the framework should use tools and libraries already in use and well known to the team

With these design goals in mind, Xomnia's data engineers started working on the ingestion framework. Drawing on our expertise and experience, we moved from an initial prototype to a productionized system in just three months. During this period, Xomnia collaborated closely with The Ocean Cleanup to ensure seamless integration into the existing Azure cloud environment.

The framework provides an API to which data providers can push their payloads and metadata. These are ingested and stored in raw form before a message is posted on a messaging bus. This all happens independently of the data source and does not require updating the framework when new sources are added. Once the data is on the messaging bus, applications (e.g. image recognition algorithms, business logic, cleansing & parsing) apply further processing in a streaming manner.
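
To make the flow above more concrete, here is a minimal Python sketch. It is an illustration only and not the code running at The Ocean Cleanup: the names RawStore, MessageBus and ingest, the event fields, and the two example sources are all hypothetical, and the in-memory classes merely stand in for a blob container and a message bus topic so the example runs without any cloud account.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RawStore:
    """In-memory stand-in for a blob container (hypothetical)."""
    blobs: dict = field(default_factory=dict)

    def put(self, key: str, payload: bytes) -> None:
        self.blobs[key] = payload


@dataclass
class MessageBus:
    """In-memory stand-in for a message bus topic (hypothetical)."""
    messages: list = field(default_factory=list)

    def publish(self, message: str) -> None:
        self.messages.append(message)


def ingest(source: str, payload: bytes, metadata: dict,
           store: RawStore, bus: MessageBus) -> dict:
    """Store the payload untouched, then announce it on the bus.

    The same code path is used for every source; nothing here
    inspects or parses the payload itself.
    """
    ingestion_id = str(uuid.uuid4())
    blob_key = f"raw/{source}/{ingestion_id}"
    store.put(blob_key, payload)  # raw bytes, stored as received

    # The event carries lineage information and a pointer to the raw data.
    event = {
        "ingestion_id": ingestion_id,
        "source": source,
        "blob_key": blob_key,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "received_at": datetime.now(timezone.utc).isoformat(),
        "metadata": metadata,
    }
    bus.publish(json.dumps(event))
    return event


store, bus = RawStore(), MessageBus()

# Two very different (made-up) sources go through the same function.
ingest("bridge-count", b'{"items": 12}', {"format": "json"}, store, bus)
ingest("river-camera", b"\x89PNG...", {"format": "binary"}, store, bus)

# A downstream application reads events and fetches the raw payload.
for raw_message in bus.messages:
    event = json.loads(raw_message)
    payload = store.blobs[event["blob_key"]]
    print(event["source"], event["blob_key"], len(payload), "bytes")
```

In this sketch, adding a new source means calling ingest with a new source name; the function itself does not change, which mirrors the source-independence described above. Publishing a pointer to the stored payload rather than the payload itself is an assumption made here to keep messages small; the actual message contents are not described in this case study.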

The framework uses the following services: Azure Kubernetes Service (AKS), Azure Service Bus and Azure Blob Storage.

Impact

The framework is already live and fully operational in the production environment, and is in use by some data providers. It enables The Ocean Cleanup to keep focusing on what matters most: designing and developing advanced cleanup technology.

Xomnia has helped overcome the various challenges posed by the wide variety of data sources that The Ocean Cleanup uses. It has built a strong, flexible foundation: a framework that moves with the dynamic environment and requires little to no change when new data sources are added. It scales with the volume of incoming data and allows data to be processed in a streaming manner.