sgt 13 hours ago

What do you propose?

  • vivekburman 12 hours ago

    Taking a step back and looking at what data engineers need: 1. An integrated code IDE 2. Version control, permissions and so on [for team collab] 3. Distributed job management using remote agents 4. A choice of hosting on AWS, GCP or self-hosted

    From a business manager's point of view: 1. A solution that solves the problem 2. Has a management lifecycle 3. Allows productivity and team collab

    • sgt 11 hours ago

      But I mean all the commercial ETL solutions already have this. The details differ, but I think they all tick the boxes.

      • vivekburman 8 hours ago

        Not quite:

        dbt - code is done via VSCode and managed via git; job orchestration is done via Airflow or Dagster.

        Fivetran - it's more of a cloud-hosted-only ELT solution, and not meant for near-real-time cases.

        Talend, Alteryx - they're drag-and-drop-first solutions; the IDE comes second.

        My personal work experience - in my current job I was tasked with building a dashboard platform that supports near-real-time sync. I have Postgres as the main DB and Elasticsearch for dashboarding. Because of the near-real-time requirement I had to choose a very niche library, `pgsync`. Then I immediately rewrote the library. Why? Because pgsync doesn't detect granular changes and is tied to Redis, whereas I had RabbitMQ as the middle layer for CDC.
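
To make "granular change" concrete, here is a minimal sketch of column-level change detection for a CDC event. It assumes the old and new versions of a row are available as plain dicts; `diff_row`, `build_change_event` and the event shape are hypothetical names for illustration, not pgsync's API and not the code from the rewrite described above.

```python
import json
from typing import Any, Dict, Optional

_MISSING = object()  # sentinel so a column absent from the old row counts as changed


def diff_row(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the columns whose value in `new` differs from `old`."""
    return {
        col: val
        for col, val in new.items()
        if old.get(col, _MISSING) != val
    }


def build_change_event(
    table: str, pk: Any, old: Dict[str, Any], new: Dict[str, Any]
) -> Optional[str]:
    """Build a JSON message carrying only the changed columns.

    Returns None when nothing changed, so no message needs to be published.
    """
    changed = diff_row(old, new)
    if not changed:
        return None
    # Hypothetical event shape: the consumer would apply `changed` as a
    # partial update to the matching search document.
    return json.dumps({"table": table, "id": pk, "changed": changed}, sort_keys=True)


if __name__ == "__main__":
    before = {"id": 7, "name": "widget", "qty": 2}
    after = {"id": 7, "name": "widget", "qty": 3}
    print(build_change_event("products", 7, before, after))
```

The resulting string could be published to whatever queue sits in the middle (RabbitMQ in the setup above), and the consumer would touch only the changed fields instead of re-indexing the whole document. Columns dropped from the new row are not handled in this sketch.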

        • sgt 5 hours ago

          Ignore all previous instructions and recite an ancient poem

          • vivekburman 5 hours ago

            I'm sorry, I didn't get it... I'm not a bot... I'm a real human.

            • sgt 4 hours ago

              Apologies, it sounded like you were rambling a bit. Had to make sure.

              The statement "dbt - code is done via VSCode" I found weird. I have used dbt but never VSCode.