OVHCloud knows Novagen and its commitment to innovation well, so when OVHCloud invited Novagen to be among the early testers of its new Data Processing product, built on top of Apache Spark as a service, Novagen felt “honoured and eager to test it.”
When selecting a new technology for its data activities, Novagen wants the following characteristics to be fully addressed:
- Ability to foster Innovation & Creativity,
- Functionalities, additional value, easy to use,
- Efficiency, Cost-effectiveness,
- Intrinsic performance,
- Adaptive architectures that allow the infrastructure to be adjusted to customer needs,
- Standards and Governance,
- Customers adopt cloud or multi-cloud strategies. Relying on standards limits the effort needed to deploy on different targets and preserves reversibility,
- Compliance:
- Most companies hold sensitive data and must know which regulatory rules their cloud provider has to follow.
As a data consultancy company, Novagen is an extensive Apache Spark user!
Apache Spark is the Swiss Army knife of data processing:
- Works on extremely large volumes of data,
- Addresses both Data Engineering and Data Science,
- Processes data at rest as well as streaming data,
- De facto standard for data workloads on-premises and in cloud
- Built-in APIs for Python, Scala, Java and R
Novagen has progressively developed software assets on top of Apache Spark to address recurring challenges:
- ETL processing in Data Lake environments,
- Quality KPIs on top of Data Lake sources,
- Machine Learning algorithms for Natural Language Processing, Time Series predictions…
First step: select the best Novagen use case
Novagen first considered the following characteristics of OVHCloud Data Processing:
⇒ Jobs start after a few seconds (vs minutes to launch a cluster)
⇒ Ability to adjust the power dedicated to different Spark jobs: from low power (1 driver and 1 executor with 4 cores and 8 GB of memory) up to high-scale processing (potentially hundreds of cores and GB of memory)
⇒ A full Compute/Storage separation aligned with standard cloud architectures, including S3 APIs to access data stored in the Object Storage layer.
⇒ Job execution and monitoring through a Command Line Interface and an API
These characteristics led Novagen to choose its Quality Assessment Process as an ideal use case, since it requires both interactivity and adjustable power: delivering quality KPIs through Spark processes.
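To make the idea of quality KPIs more concrete, here is a minimal sketch of two common indicators: completeness per column and uniqueness of a key. The article does not detail which KPIs the Novagen quality software computes, so the indicators, the function name `quality_kpis` and the sample records are all assumptions. It is written in plain Python to stay self-contained; on Spark, the same counts would typically be expressed as aggregations over a DataFrame.

```python
from collections import Counter


def quality_kpis(rows, key, required):
    """Compute illustrative data-quality KPIs over a list of dict records.

    Hypothetical helper, not Novagen's actual quality software.
    """
    total = len(rows)
    if total == 0:
        return {"row_count": 0, "completeness": {}, "key_uniqueness": None}

    # Completeness: share of rows where the column is neither missing nor empty.
    completeness = {
        col: sum(1 for r in rows if r.get(col) not in (None, "")) / total
        for col in required
    }

    # Key uniqueness: number of distinct key values divided by number of rows.
    key_counts = Counter(r.get(key) for r in rows)
    key_uniqueness = len(key_counts) / total

    return {
        "row_count": total,
        "completeness": completeness,
        "key_uniqueness": key_uniqueness,
    }


# Made-up sample records, for illustration only.
sample = [
    {"id": 1, "country": "FR", "amount": 10.0},
    {"id": 2, "country": "", "amount": 5.5},
    {"id": 2, "country": "DE", "amount": None},
    {"id": 4, "country": "ES", "amount": 7.0},
]

kpis = quality_kpis(sample, key="id", required=["country", "amount"])
print(kpis)
```

Because each KPI is a simple count over the data, this kind of job is a natural fit for a service where power can be raised or lowered per run.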
Second step: OVHCloud Data Processing at work
The corresponding command generated by the quality software is :
The command is quite similar to a usual spark-submit, except for the jar path: the binary must be in an Object Storage bucket, which is accessed through a Swift URL specification. (NB: this command could also have been created with a call to the OVHCloud Data Processing API.)
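As an illustration of that similarity, the sketch below assembles a standard spark-submit argument list with the low-power sizing mentioned earlier (1 executor with 4 cores and 8 GB of memory). It uses only options of the regular spark-submit tool, not the exact syntax of the OVHCloud Data Processing CLI, which may differ. The jar URL, the main class name and the driver sizing are hypothetical placeholders.

```python
import shlex


def build_submit_command(
    jar_url,
    main_class,
    driver_cores=1,          # assumption: driver sizing is not given in the article
    driver_memory="8G",      # assumption
    executor_cores=4,
    executor_memory="8G",
    num_executors=1,
    app_args=(),
):
    """Return a spark-submit style argument list (illustrative only)."""
    return [
        "spark-submit",
        "--class", main_class,
        "--driver-cores", str(driver_cores),
        "--driver-memory", driver_memory,
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
        "--num-executors", str(num_executors),
        jar_url,
        *app_args,
    ]


cmd = build_submit_command(
    # Hypothetical Swift URL and class name, not taken from the article.
    jar_url="swift://example-container/quality-job.jar",
    main_class="com.example.quality.QualityJob",
)

# Print the command as a single shell-safe line.
print(shlex.join(cmd))
```

Keeping the sizing in function parameters makes it easy to rerun the same job with more executors or memory when a dataset grows.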
From this point on, Novagen can fine-tune its portfolio of processes and experiment with allocating different amounts of power, with few limitations (other than the quotas of your Public Cloud project).
Finally, for tuning and post-mortem job analysis, one can take advantage of the saved log files. It is worth noting that Data Processing offers a real-time display of job logs, which is very convenient, as well as complementary supervision through Grafana dashboards.
This is a first yet significant test of Data Processing. So far it has proved an excellent match for the Novagen quality process use case, and it made it possible to validate several crucial points when it comes to testing a data solution.
“This is the beginning of this product, and we will have a close look at the upcoming functionalities. OVHCloud team unveiled part of its roadmap, and it looks really promising.”
Novagen : We are Data innovators !!!
⇒ As a consultancy company, we build complete and innovative data strategies for our demanding customers:
- Top Fortune Bank, Regulatory, Retail, Fashion, Transportation,
- BI at extreme scale, Data Lake creation and management, Business innovation with Data Science
⇒ With our Data Lab, we are continuously improving our technology portfolio:
- Selecting, assessing, benchmarking solutions,
- Developing ‘boosters’: ready-to-deploy or customized data assets.
We periodically communicate about our innovations. For instance:
- How to leverage Kubernetes to empower your multi-cloud strategy,
- Apache Kylin, Apache Druid: technologies for ultra-high-scale Business Intelligence,
- Data science automation, from notebooks to operational Machine Learning Models