Introduction

Today, data scientists and modern data engineers use a variety of open- and closed-source Python and R packages to derive relevant insights from data drawn from many sources. This ecosystem is fueled by computational libraries such as NumPy, pandas and scikit-learn, by a wealth of visualization libraries such as Bokeh, and by interactive notebooks such as Jupyter.

However, these packages do not support data engineers or data scientists in workflow automation, insight distribution, collaboration, and packaging. Instead, the Python data and analytics scene focuses on ad hoc analysis that answers single, specific business questions. Broader business applications that address more than one business question are usually left to other teams, such as backend developers, frontend developers, and system administrators.

core4os fills this gap. It enables the community to integrate the processing chains that create such insights into a fault-tolerant distributed system, automating the whole process without having to think about the underlying software or hardware.

The core4os framework takes care of everything essential to using and operating such a distributed system, from central logging and configuration to deployment, while scaling to hundreds of servers.

Note

For brevity, and in line with the package name of core4os, this documentation uses the terms core4os and core4 interchangeably to refer to the same platform: core4os.

Continue reading with

  • a high-level feature summary of core4os
  • concrete examples of when and how to use core4os
  • further details on why to use core4os, from the
    1. data engineering perspective,
    2. data science perspective, and
    3. business user and application perspective
Further information on Python in data and analytics: