coldtea 4 hours ago

>The claim is that it handles 80%+ of their use cases with 20% of the development effort. (Pareto Principle)

The Pareto principle is not some guarantee applicable to everything and anything saying that any X will handle 80% of some other thing's use cases with 20% the effort.

One can see how irrelevant its invocation is if we reverse it: does Kafka also handle 80% of what Postgres does with 20% of the effort? If not, what makes Postgres especially the "Pareto 80%" one in this comparison? Did Vilfredo Pareto have Postgres specifically in mind when forming the principle?

Pareto principle concerns situations where power-law distributions emerge. Not arbitrary server software comparisons.

Just say Postgres covers a lot of use cases that people mindlessly go to shiny new software for without really needing it, and that it's more battle-tested, mature, and widely supported.

The Pareto principle is a red herring.

  • MrDarcy 2 hours ago

    Is the mapping of use cases to software functionality not a power-law distribution? Meaning there are a few use cases that have a disproportionate effect on the desired outcome if provided by the software?

    • ses1984 an hour ago

      You might be right, but does anyone have data to support that hypothesis?

munchbunny a day ago

My general opinion, off the cuff, from having worked at both small (hundreds of events per hour) and large (trillions of events per hour) scales for these sorts of problems:

1. Do you really need a queue? (Alternative: periodic polling of a DB)

2. What's your event volume and can it fit on one node for the foreseeable future, or even serverless compute (if not too expensive)? (Alternative: lightweight single-process web service, or several instances, on one node.)

3. If it can't fit on one node, do you really need a distributed queue? (Alternative: good ol' load balancing and REST API's, maybe with async semantics and retry semantics)

4. If you really do need a distributed queue, then you may as well use a distributed queue, such as Kafka. Even if you take on the complexity of managing a Kafka cluster, the programming and performance semantics are simpler to reason about than trying to shoehorn a distributed queue onto a SQL DB.

  • EdwardDiego 8 hours ago

    A semantic but important point: Kafka is not a queue, it's a distributed append-only log. I deal with so many people who think it's a super-scalable replacement for an MQ, and it's just the wrong way to think about it.

    • saberience 5 hours ago

      Yes, but the practical reality is that it can be used in exactly the same way as a queue, and you can make it work just as well as any MQ-based system. I know this because I moved from a RabbitMQ system to Kafka for additional scalability requirements and it worked perfectly.

      So sure, "technically" it's not a queue, but in reality it's used as a queue by thousands of companies around the world for huge production workloads which no MQ system can support.

      • dxxvi a minute ago

        Is it true that a message from a queue will disappear after it is consumed successfully? If yes, at this moment, how do you make kafka topics work as queues?

    • BerislavLopac 6 hours ago

      To be fair, any (immutable) data structure that includes the creation timestamp can be a queue. It might not be a good queue, but it can be used as one.

      • hamandcheese 5 hours ago

        On this note... has anyone here used object storage directly as a queue? How did it go?

        You can make a bucket immutable, and entries have timestamps. I don't think any cloud provider makes claims about the accuracy or monotonicity of these timestamps, so you would merely get an ordering, not necessarily the ordering in which things truly occurred. But I have use cases where that is fine.

        I believe with a clever naming scheme and cooperating clients it could be made to work.

        • dvhh 4 hours ago

          I once used object storage as a queue; you can implement queue semantics at the application level, with one object per entry.

          But the application was fairly low volume in data and usage, so eventual consistency and capacity were not an issue. And yes, timestamp monotonicity is not guaranteed when multiple clients upload at the same time, so each client was given a unique id at startup and used it to guarantee unique entry names. Metadata and prefixes were used to indicate the state of an object during processing.

          Not ideal, but it was cheaper than a DB or a dedicated MQ. The application did not last, but I would try the approach again if it suited the situation.

          • hamandcheese 4 hours ago

            The application I'm interested in is a log-based artifact registry. Volume would be very low. Much more important is the immutability and durability of the log.

            I was thinking that writes could be indexed/prefixed into timestamp buckets according to the client's local time. This can't be trusted, of course. But the application consumers could detect and reject any writes whose upload timestamp exceeds a fixed delta from the timestamp bucket it was uploaded to. That allows for arbitrary seeking to any point on the log.

    • codeflo 8 hours ago

      Do you mean this in the sense that listeners don't remove messages, as one would expect from a queue data structure?

      • nasretdinov 8 hours ago

        Well, it's impractical to try to handle messages individually in Kafka, it's designed to acknowledge entire batches (since it's a distributed append-only log). You can still do that, but the performance will be no better than an SQL database

      • EdwardDiego 8 hours ago

        Exactly. There's no concept in Kafka (yet...) of "acking" or DLQs, Kafka is very good at what it does by being deliberately stupid, it knows nothing about your messages or who has consumed them and who hasn't.

        That was all deliberately pushed onto consumers to manage to achieve scale.

        • speed_spread 4 hours ago

          Kafka is the MongoDB of sequential storage. Built for webscale and then widely adopted based on impressive single-metric numbers without regard for actual fitness to purpose in smaller operations. Fortunately it was at least reliable enough.

          I believe RabbitMQ is much more balanced and is closer to what people expect from a high level queueing/pubsub system.

      • jen20 8 hours ago

        That is the major difference - clients track their read offsets rather than the structure removing messages. There aren't really "listeners" in the sense of a pub-sub?

  • lumost 21 hours ago

    I suspect the common issue with small scale projects is that it's not atypical for the engineers involved to perform a joint optimization of "what will work well for this project", and "what will work well at my next project/job." Particularly in startups where the turnover/employer stability is poor - this is the optimal action for the engineers involved.

    Unless employees expect that their best rewards are from making their current project as simple and effective as possible - it is highly unlikely that the current project will be as simple as it could be.

    • xmcqdpt2 2 hours ago

      At my current job working for a big corporation, a big reason why we use Kafka for non-Kafka workloads is that getting alternate stacks approved is annoyingly difficult. Someone already went through the pain of getting Kafka on the company network for their big data use case, and enterprise IT will set it up for us. Using something else for queueing would require way more paperwork.

    • jghn 15 hours ago

      What I've found to be even more common than resume driven development has been people believing that they either have or will have "huge scale". But the problem is that their goal posts are off by a few orders of magnitude and they will never, ever have the sort of scale required for these types of tools.

      • animuchan 4 hours ago

        LLMs solve this by meeting devs in the middle: the vibe-coded DB schema, coupled with agentically-made application code, makes even 20,000 records a "huge scale".

        • sgarland 3 hours ago

          This is so accurate. I’ve looked in wonder at how someone is maxing out an r6i.32xlarge MySQL DB, when I have run 4x the workload on an r6i.12xlarge.

          Schema design and query design will make or break your app’s ability to scale without skyrocketing the bill, it’s as simple as that.

          • notpaulgraham 15 minutes ago

            any blogs/books you'd recommend on schema & query design? it honestly surprises me that these coding-focused models can't look at a schema; look at how data is being queried; reason about the use case for the data; and help prioritize solving for the most likely bottlenecks to scaling the underlying data services.

      • Moto7451 13 hours ago

        I had this very same argument today. It was claimed that a once-per-year data mapping process of unstructured data that we sell via our product would not scale. The best part is that if we somehow had ten of these to do, it would still take less than a year. Currently it takes a single person a few weeks and makes millions of dollars. This is the sort of fiddly work that you can find an ontologist for, and they’re happy to do it for the pay.

        I’m unsure what is unattractive about this but I guess anything can be a reason to spend a year playing with LLMs these days.

        I’ve had the same problem with compliance work (lightly regulated market) and suddenly the scaling complaints go away when the renewals stop happening.

      • peab 7 hours ago

        I think because so many blogs, resources, textbooks etc focus on scale, developers are biased into thinking that they need to build for scale.

        Which is wrong a lot of the time! You need to build what is needed. If only 10 people use your project, the design will be entirely different than if 10 million people use it

      • bcrosby95 11 hours ago

        The problem is when discussing techniques everyone uses the same terms but no one actually defines them.

    • procaryote 18 hours ago

      This is something to catch in hiring and performance evaluation. Hire people who don't build things to pad their own CVs, tell them to stop if you failed, fire them if that failed

      • lumost 12 hours ago

        Hiring irrational players, or forcing rational people to act outside of their own self-interest is not a winning strategy either.

        There is nothing wrong with building stuff, or career development. There is also nothing wrong with experimentation. You certainly would not want to incentivize the opposite behavior of never building anything unless it had 10 guarantors of revenue and technical soundness.

        If you need people to focus, then you need them to be incentivized to focus. Do they see growth potential? Are they compensated such that other employers are undesirable? Do they see the risk of failure?

      • 59nadir 5 hours ago

        This is a great way to get only people who basically can't build anything.

  • Buttons840 3 hours ago

    5. Organize your code so it can work with either a PostgreSQL-based queue or a Kafka-based queue. There should be only one code file that actually knows which of the two you are using.

    Then, if you ever need to switch to something more performant, it will be relatively easy.

    It's a queue... how bad can you screw this up? My guess is, in most corporate environments, very very badly. Somehow something as complicated as consuming a queue (which isn't very complicated at all) will be done in such a way that it will require many months to change which queue is used in the future.

  • drdaeman 14 hours ago

    > Do you really need a queue? (Alternative: periodic polling of a DB)

    In my experience it’s not the reads, but the writes that are hard to scale up. Reading is cheap and can sometimes be done off a replica. Writing to PostgreSQL at a high sustained rate requires careful tuning and design. A stream of UPDATEs can be very painful, INSERTs aren’t cheap, and even batched COPY blocks can be tricky.

    • bostik 4 hours ago

      Plus of course you can take out the primary even with a read from a replica. It's not a trivial feat, but you can achieve it with the combination of streaming replication and an hours-long read from the replica for massive analytical workloads. For large reads Postgres will create temporary tables as needed, and when those in the replica end up far enough, the cascading effect through replication backpressure will cause the primary to block further writes from getting through...

      The scars from that kind of outage will never truly heal.

      • baq 4 hours ago

        IME (...don't ask) it's easy enough if you forget to set idle in transaction timeout, though I haven't... tried... on replicas

  • raducu 8 hours ago

    > 1. Do you really need a queue?

    I'm a java dev and maybe my projects are about big integrations, but I've always needed queue like constructs and polling from a db was almost always a headache, especially with multiple consumers and publishers.

    Sure it can be done, and in many projects we do have cron-jobs on different pods -- not a global k8s cron-job, but legacy cron jobs and it works fine.

    Kafka does not YET support real queues (but I'm sure there's a high-profile KIP to add true queue-like behavior, per consumer group, with individual commits), and it does not support server-side filtering.

    But consumer groups and partitions have been such a blessing for me, it's very hard to overstate how useful they are with managing stateful apps.

  • ozim 18 hours ago

    Periodic polling of a DB gets bad pretty quick, queues are much better even on small scale.

    But then distributed queue is most likely not needed until you hit really humongous scale.

    • TexanFeller 17 hours ago

      Maybe in the past this was true, or if you’re using an inferior DB. I know first hand that a Postgres table can work great as a queue for many millions of events per day processed by thousands of workers polling for work from it concurrently. With more than a few hundred concurrent pollers you might want a service, or at least a centralized connection pool in front of it though.

      • skunkworker 16 hours ago

        Millions of events per day is still in the small queue category in my book. Postgres LISTEN doesn't scale, and polling on hot databases can suddenly become more difficult, as you're having to throw away tuples regularly.

        10 messages/s is only 860k/day. But in my testing (with Postgres 16) this doesn't scale that well when you need tens to hundreds of millions per day. Redis is much better than Postgres for that (for a simple queue), and beyond that Kafka is what I would choose if you're in the low few hundred millions.

      • 59nadir 5 hours ago

        This "per hour" and "per day" business has to end. No one cares about "per day" and it makes it much harder to see the actual talked about load on a system. The thing that matters is "per second", so why not talk about exactly that? Load is something immediate, it's not a "per day" thing.

        If someone is talking about per day numbers or per month numbers they're likely doing it to have the numbers sound more impressive and to make it harder to see how few X per second they actually handled. 11 million events per day sounds a whole lot more impressive than 128 events per second, but they're the same thing and only the latter usually matters in these types of discussions.

  • jsolson 10 hours ago

    I agree with nearly everything except your point (1).

    Periodic polling is awkward on both sides: you add arbitrary latency _and_ increase database load proportional to the number of interested clients.

    Events, and ideally coalesced events, serve the same purpose as interrupts in a uniprocess (versus distributed) system, even if you don't want a proper queue. This at least lets you know _when_ to poll and lets you set and adjust policy on when / how much your software should give a shit at any given time.

    • JohnBooty 40 minutes ago

      From a database load perspective, Postgres can get you pretty far. The reads triggered by each poll should be trivial index-only scans served right out of RAM. Even a modest Postgres instance should be able to handle thousands per second.
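
      A rough sketch of the kind of setup that keeps those polls cheap (hypothetical table and index, not from the article):

        CREATE TABLE jobs (
            id         bigserial PRIMARY KEY,
            payload    jsonb NOT NULL,
            status     text NOT NULL DEFAULT 'pending',
            created_at timestamptz NOT NULL DEFAULT now()
        );

        -- the partial index only covers unprocessed rows, so each poll scans a
        -- tiny, hot index even when the table holds millions of finished jobs
        CREATE INDEX jobs_pending_idx ON jobs (created_at) WHERE status = 'pending';

        -- what each poller runs
        SELECT id, payload FROM jobs
        WHERE status = 'pending'
        ORDER BY created_at
        LIMIT 10;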

      The limiting factor for most workloads will probably be the number of connections, and the read/write mix. When you get into hundreds or thousands of pollers and writing many things to the queue per second Postgres is going to lose its luster for sure.

      But in my experience with small/medium companies, a lot of workloads fit very very comfortably into what Postgres can handle easily.

  • javier2 18 hours ago

    I don't disagree, and I am trying to argue for it myself, and have used Postgres as a "queue" or the backlog of events to be sent (like the outbox pattern). But what if I have 4 services that need to know X happened to customer Y? I feel like it quickly becomes cumbersome with Postgres event delivery to make sure everyone gets the events they need delivered. The posted link tries to address this at least.

    • dagss 8 hours ago

      The standard approach, which Kafka also uses beneath all the libraries hiding it from you, is:

      The publisher has a set of tables (topics and partitions) of events, ordered and with each event having an assigned event sequence number.

      Publisher stores no state for consumers in any way.

      Instead, each consumer keeps a cursor (a variable holding an event sequence number) indicating how far it has read for each event log table it is reading.

      Consumer can then advance (or rewind) its own cursor in whatever way it wishes. The publisher is oblivious to any consumer side state.

      This is the fundamental piece of how event log publishing works (as opposed to queues, which are something else entirely; the article talks about both use cases).
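
      A bare-bones sketch of that shape in SQL (all names here are made up):

        -- the "topic": the publisher only ever appends, with an assigned sequence number
        CREATE TABLE order_events (
            seq        bigint PRIMARY KEY,
            payload    jsonb NOT NULL,
            created_at timestamptz NOT NULL DEFAULT now()
        );

        -- each consumer owns its own cursor; the publisher never touches this
        CREATE TABLE consumer_cursors (
            consumer_name text PRIMARY KEY,
            last_seq      bigint NOT NULL DEFAULT 0
        );

        -- a consumer's read loop: fetch the next batch past its cursor...
        SELECT seq, payload FROM order_events
        WHERE seq > (SELECT last_seq FROM consumer_cursors
                     WHERE consumer_name = 'billing')
        ORDER BY seq
        LIMIT 100;

        -- ...process it, then advance the cursor ($1 = last seq handled)
        UPDATE consumer_cursors SET last_seq = $1 WHERE consumer_name = 'billing';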

    • ThreatSystems 17 hours ago

      Call me dumb - I'll take it! But if we really are trying to keep it simple simple...

      Then you just query from the event_receiver_svcX side for events with published > datetime and event_receiver_svcX = FALSE. Once read, set it to TRUE.

      To mitigate too many active connections, have a polling/backoff strategy and place a proxy in front of the actual database to proactively throttle where needed.

      But event table:

      | event_id | event_msg_src | event_msg           | event_msg_published | event_receiver_svc1 | event_receiver_svc2 | event_receiver_svc3 |
      |----------|---------------|---------------------|---------------------|---------------------|---------------------|---------------------|
      | evt01    | svc1          | json_message_format | datetime            | TRUE                | TRUE                | FALSE               |

  • oulipo2 a day ago

    I want to rewrite some of my setup, we're doing IoT, and I was planning on

    MQTT -> Redpanda (for message logs and replay, etc) -> Postgres/Timescaledb (for data) + S3 (for archive)

    (and possibly Flink/RisingWave/Arroyo somewhere in order to do some alerting/incrementally updated materialized views/ etc)

    this seems "simple enough" (but I don't have any experience with Redpanda) but is indeed one more moving part compared to MQTT -> Postgres (as a queue) -> Postgres/Timescaledb + S3

    Questions:

    1. my "fear" would be that if I use the same Postgres for the queue and for my business database, the "message ingestion" part could block the "business" part sometimes (locks, etc)? Also perhaps when I want to update the schema of my database and not "stop" the inflow of messages, not sure if this would be easy?

    2. also that since it would write messages in the queue and then delete them, there would be a lot of GC/Vacuuming to do, compared to my business database which is mostly append-only?

    3. and if I split the "Postgres queue" from "Postgres database" as two different processes, of course I have "one less tech to learn", but I still have to get used to pgmq, integrate it, etc, is that really much easier than adding Redpanda?

    4. I guess most Postgres queues are also "simple" and don't provide "fanout" for multiple things (eg I want to take one of my IoT message, clean it up, store it in my timescaledb, and also archive it to S3, and also run an alert detector on it, etc)

    What would be the recommendation?

    • DelaneyM 18 hours ago

      My suggestion would be even simpler:

      MQTT -> Postgres (+ S3 for archive)

      > 1. my "fear" would be that if I use the same Postgres for the queue and for my business database...

      This is a feature, not a bug. This way you can pair the handling of the message with the business data changes that result from it, in the same transaction. This isn't quite "exactly-once" handling, but it's really really close!
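
      A minimal sketch of that pairing (table and column names are made up):

        BEGIN;
        -- claim the message; if another worker already handled it, this matches 0 rows
        UPDATE messages SET consumed_at = now()
        WHERE id = $1 AND consumed_at IS NULL;
        -- apply the business change in the same transaction
        UPDATE accounts SET balance = balance - 100 WHERE id = $2;
        COMMIT;  -- both changes land together, or neither does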

      > 2. also that since it would write messages in the queue and then delete them, there would be a lot of GC/Vacuuming

      Generally it's best practice in this case to never delete messages from a SQL "queue", but toggle them in-place to consumed and periodically archive to a long-term storage table. This provides in-context historical data which can be super helpful when you need to write a script to undo or mitigate bad code which resulted in data corruption.
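
      Roughly (again with made-up names): consumers flip the row in place, and a periodic job sweeps old consumed rows into an archive table.

        UPDATE messages SET consumed_at = now() WHERE id = $1;

        -- periodic sweep into long-term storage
        WITH moved AS (
            DELETE FROM messages
            WHERE consumed_at < now() - interval '30 days'
            RETURNING *
        )
        INSERT INTO messages_archive SELECT * FROM moved;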

      Alternatively when you need to roll back to a previous state, often this gives you a "poor woman's undo", by restoring a time-stamped backup, copying over messages which arrived since the restoration point, then letting the engine run forwards processing those messages. (This is a simplification of course, not always directly possible, but data recovery is often a matter of mitigations and least-bad choices.)

      Basically, saving all your messages provides both efficiency and data recovery optionality.

      > 3...

      Legit concern, particularly if you're trying to design your service abstraction to match an eventual evolution of data platform.

      > 4. don't provide "fanout" for multiple things

      What they do provide is running multiple handlers over the same queue, wherein you might have n handlers (each with its own "handled_at" timestamp column in the DB), and different handlers run at different priorities. This doesn't allow for workflows (i.e. a cleanup step) but does allow different processes to run on the same queue with different privileges or priorities. So the slow process (archive?) could run opportunistically or in batches, where time-sensitive issues (alerts, outlier detection, etc) can always run instantly. Or archiving can be done by a process which lacks access to any user data to algorithmically enforce PCI boundaries. Etc.

      • sgarland 3 hours ago

        > Generally it's best practice in this case to never delete messages from a SQL "queue", but toggle them in-place to consumed and periodically archive to a long-term storage table.

        Ignoring the potential uses for this data, what you suggested has the exact same effect on Postgres at a tuple level. An UPDATE is essentially the same as a DELETE + INSERT, due to its MVCC implementation. The only way around this is with a HOT update, which requires (among other things) that no indexed columns were updated. Since presumably in this schema you’d have a column like is_complete or is_deleted, and a partial index on it, as soon as you toggle it, it can’t do a HOT update, so the concerns about vacuum still apply.

      • sarchertech 8 hours ago

        > This is a feature, not a bug. In this way you can pair the handling of the message with the business data changes which result in the same transaction.

        That’s a particularly nasty trap. Devs will start using this everywhere and it makes it very hard to move this beyond Postgres when you need to.

        I’d keep a small transactional outbox for when you really need it and encourage devs to use it only when absolutely necessary.

        I’m currently cleaning up an application that has reached the limit of vertical scaling with Postgres. A significant part of that is because it uses Postgres for every background work queue. Every insert into the queue is in a transaction—do you really want to rollback your change because a notification job couldn’t be enqueued? Probably not. But the ability is there and is so easy to do that it gets overused.

        Now I get to go back through hundreds of cases and try to determine whether the transactional insert was intentional or just someone not thinking.

        • hobs 5 hours ago

          The problem is that either you have this feature or you don't; misusing it is another problem. Not having a feature sucks, and most distributed databases will even give you options for consistent (slow ass) reads.

      • raverbashing 3 hours ago

        > This is a feature, not a bug.

        Until your PostgreSQL instance goes down (even for reasons unrelated to pgsql) and then you have no fallback or queue for elasticity.

    • singron a day ago

      Re 1. Look up non-blocking migrations for postgres. You can generally do large schema migrations while only briefly taking exclusive locks. It's a common mistake to perform a blocking migration and lock up your database (e.g. using CREATE INDEX on an existing table instead of CREATE INDEX CONCURRENTLY).
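
      For example (hypothetical table):

        -- blocking: takes a lock that blocks writes to the table for the whole build
        CREATE INDEX orders_customer_idx ON orders (customer_id);

        -- non-blocking alternative: builds without blocking writes; can't run inside a
        -- transaction block, and leaves an INVALID index to clean up if it fails partway
        CREATE INDEX CONCURRENTLY orders_customer_idx ON orders (customer_id);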

      There are globally shared resources, but for the most part, locks are held on specific rows or tables. Unrelated transactions generally won't block on each other.

      Also running a Very High Availability cluster is non-trivial. It can take a minute to fail over to a replica, and a busy database can take a while to replay the WAL after a reboot before it's functional again. Most people are OK with a couple minutes of downtime for the occasional reboot though.

      I think this really depends on your scale. Are you doing <100 messages/second? Definitely stick with postgres. Are you doing >100k messages/second? Think about Kafka/redpanda. If you were comfortable with postgres (or you will be since you are building the rest of your project with it), then you want to stick with postgres longer, but if you are barely using it and would struggle to diagnose an issue, then you won't benefit from consolidating.

      Postgres will also be more flexible. Kafka can only do partitions and consumer groups, so if your workload doesn't look like that (e.g. out of order processing), you might be fighting Kafka.

    • munchbunny a day ago

      > I want to rewrite some of my setup, we're doing IoT, and I was planning on

      Is this some scripting to automate your home, or are you trying to build some multi-tenant thing that you can sell?

      If it's just scripting to automate your home, then you could probably get away with a single server and on-disk/in-memory queuing, maybe even sqlite, etc. Or you could use it as an opportunity to learn those technologies, but you don't really need them in your pipeline.

      It's amazing how much performance you can get as long as the problem can fit onto a single node's RAM/SSD.

    • notepad0x90 a day ago

      Another good item to consider:

      n) Do you really need S3? Is it cheaper than NFS storage on a compute node with a large disk?

      There are many cases where S3 is absolutely cheaper though.

      • xmcqdpt2 31 minutes ago

        In my experience NFS is always the wrong thing to use.

        Your application thinks it's a normal disk but it isn't, so you get no timeouts, no specific errors for network issues, and extremely expensive calls camouflaged as quick FS ops (was any file modified in this folder? I'll just loop over them using my standard library's nice FS utilities). And you don't get atomic ops outside of mv, invalidation and caching are complicated, and your developers probably don't know the semantics of FS operations, which are much more complex and less well documented than e.g. Redis blob storage.

        And then when you finally rip out NFS, you have thousands of lines of app and test code that assumes your blobs are on a disk in subtle ways.

    • singron a day ago

      Re (2) there is a lot of vacuuming, but the table is small, and it's usually very fast and productive.

      You can run into issues with scheduled queues (e.g. run this job in 5 minutes) since the tables will be bigger, you need an index, and you will create the garbage in the index at the point you are querying (jobs to run now). This is a spectacularly bad pattern for postgres at high volume.

    • zozbot234 a day ago

      > Also perhaps when I want to update the schema of my database and not "stop" the inflow of messages, not sure if this would be easy?

      Doesn't PostgreSQL have transactional schema updates as a key feature? AIUI, you shouldn't be having any data loss as a result of such changes. It's also common to use views in order to simplify the management of such updates.

  • fragmede 6 hours ago

    re 4) If you're there, at the risk of drawing the ire of the "cloud is always too expensive" club, be sure you really really really want to run something like Kafka yourself, and not use a hyperscaler's platform queue/queue-ish system, aka SQS or pubsub or whatever Azure/your platform has.

    Kafka has its own foibles and isn't a trivial set-it-and-forget-it thing to run at scale.

  • Capricorn2481 a day ago

    > If it can't fit on one node, do you really need a distributed queue? (Alternative: good ol' load balancing and REST API's, maybe with async semantics and retry semantics)

    That sounds distributed to me, even if it wires different tech together to make it happen. Is there something about load balancing REST requests to different DB nodes that is less complicated than Kafka?

    • munchbunny a day ago

      > Is there something about load balancing REST requests to different DB nodes that is less complicated than Kafka?

      To be clear I wasn't talking about DB nodes, I was talking about skipping an explicit queue altogether.

      But let's say you were asking about load balancing REST requests to different backend servers:

      Yes, in the sense that "load balanced REST microservice with retry logic" is such a common pattern that it is better understood by SWE's and SRE's everywhere.

      No, in the sense that if you really did just need a distributed queue then your life would be simpler reusing a battle-tested implementation instead of reinventing that wheel.

agentultra a day ago

You have to be careful with the approach of using Postgres for everything. The way it locks tables and rows and the serialization levels it guarantees are not immediately obvious to a lot of folks and can become a serious bottleneck for performance-sensitive workloads.

I've been a happy Postgres user for several decades. Postgres can do a lot! But like anything, don't rely on maxims to do your engineering for you.

  • sneilan1 a day ago

    Yes, performance can be a big issue with Postgres. And vertical scaling can really put a damper on things when you take a major traffic hit. Using it in place of Kafka is misunderstanding one of the great uses of Kafka, which is to help deal with traffic bursts. All of a sudden your Postgres server is overwhelmed while the Kafka server would be fine.

    • mike_hearn 2 hours ago

      It's worth noting that Oracle has solved this problem. It has horizontal multi-master scalability (not sharded) and a queue subsystem called TxEQ which scales like Kafka does, but it's also got the features of a normal MQ broker. You can dequeue a message into a transaction, update tables in that same transaction, then commit to remove the message from the queue permanently. You can dequeue by predicate, delay messages, use producer/consumer patterns etc. It's quite flexible. The queues can be accessed via SQL stored procs, or client driver APIs, or it implements a Kafka compatible API now too I think.

      If you rent a cloud DB then it can scale elastically which can make this cheaper than Postgres, believe it or not. Cloud databases are sold at the price the market will bear not the cost of inputs+margin, so you can end up paying for Postgres as much as you would for an Oracle DB whilst getting far fewer features and less scalability.

      Source: recently joined the DB team at Oracle, was surprised to learn how much it can do.

    • zenmac a day ago

      >And vertical scaling can really put a damper on things when you have a major traffic hit.

      Wouldn't OrioleDB solve that issue though?

      • sneilan1 a day ago

        Not familiar with OrioleDB. I’ll look it up. May I ask how this helps? Just curious.

  • skunkworker 16 hours ago

    I wish Postgres would add a durable queue-like data structure. But trying to make a durable queue that can scale beyond what a simple Redis instance can do starts to run into problems quickly.

    Also, LISTEN/NOTIFY do not scale, and they introduce locks in areas you aren't expecting - https://news.ycombinator.com/item?id=44490510

    • abtinf 14 hours ago

      SKIP LOCKED doesn't work for your use case?

  • SoftTalker a day ago

    This is true of any data storage. You have to understand the concurrency model and assumptions, and know where bottlenecks can happen. Even among relational databases there are significant differences.

  • javier2 18 hours ago

    Postgres doesn't scale into oblivion, but it can take some serious chunks of data once you start batching and making sure every operation only touches a single row with no transactions needed.

    • AtlasBarfed 16 hours ago

      And then you are 99% of the way to Cassandra.

      Of course the other 99% is the remaining 1%.

      • javier2 15 hours ago

        Nearly true, but you don't need to run a Cassandra cluster to ship your 3k msg/sec, and you can take smaller locks if you have a small number of senders that delete sent messages and send in chunks.

  • fukka42 a day ago

    My strategy is to use postgres first. Get the idea off the ground and switch when postgres becomes the bottleneck.

    It often doesn't.

  • j45 a day ago

    100%

    Postgres isn’t meant to be a guaranteed permanent replacement.

    It’s a common starting point for a simpler stack which retains a greater degree of flexibility out of the box and increases velocity.

    Starting with Postgres lets the bottlenecks reveal themselves, and then optimize from there.

    Maybe a tweak to Postgres or resources, or consider a jump to Kafka.

  • AtlasBarfed 16 hours ago

    Postgres is just fantastic software.

    But anytime you treat a database, or a queue, like a black box dumpster, problems will ensue.

    • EdwardDiego 8 hours ago

      Exactly. Or worse, you treat one as a straightforward black-box swap-in replacement for another. If you're looking to scale, you _will_ need to code to the idiosyncrasies of your chosen solution.

  • fud101 a day ago

    When someone says just use Postgres, are they using the same instance for their data as well for the queue?

    • marcosdumay a day ago

      When people say "just use postgres" it's because their immediate need is so low that this doesn't matter.

      And the thing is, a server from 10 years ago running postgres (with a backup) is enough for most applications to handle thousands of simultaneous users. Without even going into the kinds of optimization you are talking about. Adding ops complexity for the sake of scale on the exploratory phase of a product is a really bad idea when there's an alternative out there that can carry you until you have fit some market. (And for some markets, that's enough forever.)

    • Yeroc a day ago

      You would typically want to use the same database instance for your queue as long as you can get away with it because then transaction handling is trivial. As soon as you move the queue somewhere else you need to carefully think about how you'll deal with transactionality.

    • victorbjorklund a day ago

      Yes, I often use PG for queues on the same instance. Most of the time you don't see any negative effects. For a new project with barely any users it doesn't matter.

    • j45 a day ago

      It can be a different database in the same server or a separate server.

      When you’re doing hundreds or thousands of transactions to begin with it doesn’t really impact as much out of the gate.

      Of course there will be someone who will pull out something that won’t work but such examples can likely be found for anything.

      We don’t need to fear simplification, it is easy to complicate later when the actual complexities reveal themselves.

dagss 9 hours ago

I really believe this is the way: Event log tables in SQL. I have been doing it a lot.

A downside is the lack of client-side tooling. For many, using Kafka is worth it simply for the consumer-side library tooling.

If you just want to write an event handler function there is a lot of boilerplate to manage around it. (Persisting read cursors etc)

We introduced a company standard for one service pulling events from another service that fit well together with events stored in SQL.

https://github.com/vippsas/feedapi-spec

Nowhere close to Kafka's maturity in client-side tooling, but it is an approach for how a library stack could be built on top, making this convenient and letting the same library toolset support many storage engines. (On the server/storage side, Postgres is of course as mature as Kafka...)

  • sublimefire 6 hours ago

    With the advent of tools like LLMs in editors, it is now viable to create clients and close these gaps quite easily. It feels like the next low-hanging fruit in many places that aren't client-friendly enough.

  • hyperbolablabla 8 hours ago

    I for one really dislike Kafka and this looks like a great alternative

    • moring 7 hours ago

      I'll soon get to make technology choices for a project (context: we need an MQTT broker) and Kafka is one of the options, but I have zero experience with it. Aside from the obvious red flag that is using something for the first time in a real project, what is it that you dislike about Kafka?

      • NortySpock 2 hours ago

        Note: by "client" I mean "consuming application reading from a Kafka topic"

        Not your parent poster, but Kafka is often treated like a message broker and it ain't that. Specifically, it has no concept of NACK-ing messages; a message is either processed or not processed. There's no way for the client to say "skip this message and hand it to another worker" or "I have this weird message but I don't know how to process it, can you take it back?".

        What people very commonly do is to instead move the unprocessed message to a dead-letter-queue, which at least clears the upstream queue but means you have to sift through the dead-letter-queue and figure out how to rescue messages.

        Also people often think "I can read 100 messages in a batch and handle them individually in the client" while not considering that if some of the messages fail to send (or crash the client, losing the entire batch), Kafka isn't monitoring to say "hey you haven't verified that message 12 and 94 got processed correctly, do you want to keep working on them or should I assign them to someone else?"

        Basically, in Kafka, the offset pointer should only be incremented after the client is 100% sure it is done with the message and the output has been written to durable storage if you care about the outcome. Otherwise you risk "skipping" messages because the client crashed or otherwise burped when trying to process the message.

        Also Kafka topic partitions are semi-parallel streams that are not necessarily time ordered relative to each other... It's just another pinch point.

        Consider exploring NATS Jetstream and its MQTT 3.1.1 mode and see if it suits your MQTT needs? Also I love Bento for declarative robust streaming ETL.

Nifty3929 2 hours ago

I do agree that too often folks are looking for the cool new widget and looking to apply it to every problem, with fancy new "modernized" architectures and such. And Postgres is great for so much.

But I think an important point to those in camp 2 (the good guys in TFA's narrative) is to use tools for problems they were designed to solve. Postgres was not designed to be a pub-sub tool. Kafka was. Don't try to build your own pub-sub solution on top of Postgres, just use one of the products that was built for that job.

Another distressing trend I see is for every product to try to be everything to everyone. I do not need that. I just need your product to do its one thing very well, and then I will use a different product for a different thing I need.

vbezhenar a day ago

How do you implement "unique monotonically-increasing offset number"?

The naive approach with a sequence (or the serial type, which uses a sequence automatically) does not work. Transaction "one" gets number "123", transaction "two" gets number "124". Transaction "two" commits, so the table now contains rows "122" and "124" and readers can start to process it. Then transaction "one" commits with its "123" number, but readers are already past "124". And transaction "one" might never commit for various reasons (e.g. the client just got power cut), so just waiting for "123" forever does not cut it.

Notifications can help with this approach, but then you can't restart old readers (and you don't need monotonic numbers at all).

  • dagss 9 hours ago

    The article describes using a dedicated table for the counter, one row per table, in the same transaction (so parallel writers to the same table wait for each other through a lock on that row).
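
    A compact sketch of that scheme (the log_counter table name comes up elsewhere in the thread; its columns and the event table here are made up):

      -- one counter row per event table; the row lock serializes writers, so
      -- sequence numbers become visible in commit order with no gaps
      -- (a rolled-back transaction rolls the counter back too)
      WITH c AS (
          UPDATE log_counter SET next_seq = next_seq + 1
          WHERE table_name = 'order_events'
          RETURNING next_seq
      )
      INSERT INTO order_events (seq, payload)
      SELECT next_seq, '{"example": true}'::jsonb FROM c;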

    If you would rather have readers waiting and parallel writers there is a more complex scheme here: https://blog.sequinstream.com/postgres-sequences-can-commit-...

  • xnorswap a day ago

    It's a tricky problem, I'd recommend reading DDIA, it covers this extensively:

    https://www.oreilly.com/library/view/designing-data-intensiv...

    You can generate distributed monotonic number sequences with a Lamport Clock.

    https://en.wikipedia.org/wiki/Lamport_timestamp

    The wikipedia entry doesn't describe it as well as that book does.

    It's not the end of the puzzle for distributed systems, but it gets you a long way there.

    See also Vector clocks. https://en.wikipedia.org/wiki/Vector_clock

    Edit: I've found these slides, which are a good primer for solving the issue, page 70 onwards "logical time":

    https://ia904606.us.archive.org/32/items/distributed-systems...

  • hunterpayne 16 hours ago

    The "unique monotonically-increasing offset number" use case works just fine; "I need a unique sequence number in strictly ascending order" doesn't (your problem). Why you need two queues to share the same sequence object is your problem, I think.

    Another way to speed it up is to grab unique numbers in batches instead of just getting them one at a time. No idea why you want your numbers to be in absolute sequence. That's hard in a distributed system. Probably best to relax that constraint and find some other way to track individual pieces of data. Or even better, find a way so you don't have to track individual rows in a distributed system.

  • procaryote 18 hours ago

    In the article, they just don't and instead do "SELECT FOR UPDATE SKIP LOCKED" to make sure things get picked up once.
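
    For reference, a minimal sketch of that pattern, assuming a jobs table with id, payload, status and created_at columns:

      BEGIN;
      SELECT id, payload FROM jobs
      WHERE status = 'pending'
      ORDER BY created_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED;
      -- rows locked by other workers are skipped, so no job is handed out twice;
      -- do the work, then mark it done in the same transaction
      UPDATE jobs SET status = 'done' WHERE id = $1;
      COMMIT;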

    • dagss 9 hours ago

      The article speaks of two usecases, work queue and pub/sub event log. You talk about the first and the comment you reply to the latter. You need event sequence numbering for the pub/sub event log.

      In a sense this is what Kafka IS architecturally: The component that assigns event sequence numbers.

  • grogers 18 hours ago

    You can fill in a noop for sequence number 123 after a timeout. You also need to be able to kill old transactions so that the transaction which was assigned 123 isn't just chilling out (which would block writing the noop).

    Another approach which I used in the past was to assign sequence numbers after committing. Basically a separate process periodically scans the set of un-sequenced rows, applies any application defined ordering constraints, and writes in SNs to them. This can be surprisingly fast, like tens of thousands of rows per second. In my case, the ordering constraints were simple, basically that for a given key, increasing versions get increasing SNs. But I think you could have more complex constraints, although it might get tricky with batch boundaries

    • vbezhenar 18 hours ago

      My approach is: select max(id), and commit with id=max(id)+1. If commit worked, then all good. If commit failed because of unique index violation, repeat the transaction from the beginning. I think it should work correctly with proper transaction isolation level.
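
      In SQL that looks roughly like this (a sketch with a made-up table; the loser of a race gets a unique-violation error and retries the whole transaction):

        BEGIN;
        INSERT INTO order_events (seq, payload)
        SELECT coalesce(max(seq), 0) + 1, '{"example": true}'::jsonb
        FROM order_events;
        COMMIT;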

      • grogers 13 hours ago

        That limits you to a few tens of TPS since everything is trying to write the same row which must happen serially. I wouldn't start out with that solution since it'll be painful to change to something more scalable later. Migrating to something better will probably involve more writes per txn during the migration, so it gets even worse before it gets better.

        • dagss 9 hours ago

          The counter in another table used in the article also serializes all writers to the table. Probably better than the max() approach but still serial.

          There needs to be serialization happening somewhere, either by writers or readers waiting for their turn.

          What Kafka "is" in my view is simply the component that assigns sequential event numbers. So if you publish to Kafka, Kafka takes the same locks...

          The way to increase throughput is to add more shards (partitions) to a topic.

      • name_nick_sex_m 16 hours ago

        Does the additional read query cause concern? Or mostly this is ok? (i'm sure the answer depends on scale)

  • singron a day ago

    The log_counter table tracks this. It's true that a naive solution using sequences does not work for exactly the reason you say.

  • theK a day ago

    > unique monotonically-increasing offset number

    Isn't it a bit of a white whale thing that a umion can solve all one's subscriber problems? Afaik even with kafka this isn't completely watertight.

  • munchbunny 21 hours ago

    I have this problem in the system I work on - the short nuance-less answer from my experience is that, once your scale gets large enough, you can't prevent ordering issues entirely and you have to build the resilience into the architecture and the framing of the problem. You often end up paying for consistency with latency.

    • dagss 8 hours ago

      I think you may be talking past each other. In the approach taken in the article and the parent comment, if the event sequence number allocation of the writer races the reader cursor position in the wrong way, events will NEVER BE DELIVERED.

      So it is a much more serious issue at stake here than event ordering/consistency.

      As it happens, if you use event log tables in SQL "the Kafka way" you actually get guarantee on event ordering too as a side effect, but that is not the primary goal.

      More detailed description of problem:

      https://github.com/vippsas/mssql-changefeed/blob/main/MOTIVA...

  • name_nick_sex_m 16 hours ago

    Funnily enough, I was just designing a queue exactly this way, thanks for catching this. (chat GPT meanwhile was assuring me the approach was airtight)

    • 1oooqooq 15 hours ago

      you're really trying to vibe architect?

  • sigseg1v a day ago

    What about a `DEFERRABLE INITIALLY DEFERRED` trigger that increments a sequence only on commit?

uberduper a day ago

Has this person actually benchmarked kafka? The results they get with their 96 vcpu setup could be achieved with kafka on the 4 vcpu setup. Their results with PG are absurdly slow.

If you don't need what kafka offers, don't use it. But don't pretend you're on to something with your custom 5k msg/s PG setup.

  • PeterCorless a day ago

    Exactly. Just yesterday someone posted how they can do 250k messages/second with Redpanda (Kafka-compatible implementation) on their laptop.

    https://www.youtube.com/watch?v=7CdM1WcuoLc

    Getting even less than that throughput on 3x c7i.24xlarge — a total of 288 vCPUs – is bafflingly wasteful.

    Just because you can do something with Postgres doesn't mean you should.

    > 1. One camp chases buzzwords.

    > 2. The other camp chases common sense

    In this case, is "Postgres" just being used as a buzzword?

    [Disclosure: I work for Redpanda; we provide a Kafka-compatible service.]

    • kragen a day ago

      This sounded interesting to me, and it looks like the plan is to make Redpanda open-source at some point in the future, but there's no timeline: https://github.com/redpanda-data/redpanda/tree/dev/licenses

      • PeterCorless a day ago

        Correct. Redpanda is source-available.

        When you have C++ code, the number of external folks who want to — and who can effectively, actively contribute to the code — drops considerably. Our "cousins in code," ScyllaDB last year announced they were moving to source-available because of the lack of OSS contributors:

        > Moreover, we have been the single significant contributor of the source code. Our ecosystem tools have received a healthy amount of contributions, but not the core database. That makes sense. The ScyllaDB internal implementation is a C++, shard-per-core, future-promise code base that is extremely hard to understand and requires full-time devotion. Thus source-wise, in terms of the code, we operated as a full open-source-first project. However, in reality, we benefitted from this no more than as a source-available project.

        Source: https://www.scylladb.com/2024/12/18/why-were-moving-to-a-sou...

        People still want to get free utility out of the source-available code. Less commonly they want to be able to see the code to understand it and potentially troubleshoot it. Yet asking for active contribution is, for almost all, a bridge too far.

        • zozbot234 a day ago

          Note that prior to its license change ScyllaDB was using AGPL. This is a fully FLOSS license but may have been viewed nonetheless as somewhat unfriendly by potential outside contributors. The ScyllaDB license change was really more about not wanting to expend development effort on maintaining multiple versions of the code (AGPL licensed and fully proprietary), so they went for sort of a split-the-difference approach where the fully proprietary version was in turn made source-available.

          (Notably, they're not arguing that open source reusers have been "unfair" to them and freeloaded on their effort, which was the key justification many others gave for relicensing their code under non-FLOSS terms.)

          In case anyone here is looking for a fully-FLOSS contender that they may want to perhaps contribute to, there's the interesting project YugabyteDB https://github.com/yugabyte/yugabyte-db

          • cyphar a day ago

            I think AGPL/Proprietary license split and eventual move to proprietary is just a slightly less overt way of the same "freeloader" argument. The intention of the original license was to make the software unpalatable to enterprises unless you buy the proprietary license, and one "benefit" of the move (at least for the bean counters) is that it stops even AGPL-friendly enterprises from being able to use the software freely.

            (Personally, I have no issues with the AGPL and Stallman originally suggested this model to Qt IIRC, so I don't really mind the original split, but that is the modern intent of the strategy.)

            • kragen a day ago

              I think the intention of the original license was to make the software unpalatable to SaaS vendors who want to keep their changes proprietary, not unpalatable to enterprises in general.

        • cyphar a day ago

          You are obviously free to choose to use a proprietary license, that's fine -- but the primary purpose of free licenses has very little to do with contributing code back upstream.

          As a maintainer of several free software projects, there are lots of issues with how projects are structured and user expectations, but I struggle to see how proprietary licenses help with that issue (I can see -- though don't entirely buy -- the argument that they help with certain business models, but that's a completely different topic). To be honest, I have no interest in actively seeking out proprietary software, but I'm certainly in the minority on that one.

        • kragen a day ago

          Right, open source is generally of benefit to users, not to the author, and users do get some of that benefit from being able to see the source. I wouldn't want to look at it myself, though, for legal reasons.

        • rplnt a day ago

          You can be open source and not take contributions. This argument doesn't make sense to me. Just stop doing the expensive part and keep the license as is.

          • kragen a day ago

            I think the argument is that, if they expected to receive high-quality contributions, then they'd be willing to take the risk of competitors using their software to compete with them, which an open-source license would allow. It usually doesn't work out that way; with a strong copyleft license, your competitors are just doing free R&D improving your own product, unless they can convince your customers that they know more about the product than the guys who wrote it in the first place. But that's usually the fear.

            On the other hand, if they don't expect people outside their company to know C++ well enough to contribute usefully, they probably shouldn't expect people outside their company to be able to compete with them either.

            Really, though, the reason to go open-source is because it benefits your customers, not because you get contributions, although you might. (This logic is unconvincing if you fear they'll stop being your customers, of course.)

        • zX41ZdbW 20 hours ago

          The statement is untrue. For example, ClickHouse is in C++, and it has thousands of contributors with hundreds of external contributors every month.

          • kragen 8 hours ago

            I think it's reasonably common for accepting external contributions to an open-source project to be more trouble than it's worth, just because most programmers aren't very good.

    • cestith 21 hours ago

      Your name sounds familiar. I think you may be one of the people at RedPanda with whom I’ve corresponded. It’s been a few years though, so maybe not.

      A colleague and I (mostly him, but on my advice) worked up a set of patches to accept and emit JSON and YAML in the CLI tool. Our use case at the time was setting things up with a config management system using the already built tool RedPanda provides without dealing with unstructured text.

      We got a lot of good use out of RedPanda at that org. We’ve both moved on to a new employer, though, and the “no offering RedPanda as a service” clause spooked the company away from trying it without paying for the commercial package. Y’all assured a couple of us that our use case didn’t count as that, but upper management and legal opted to go with Kafka just in case.

    • mxey a day ago

      Doesn’t Kafka/Redpanda have to fsync for every message?

      • PeterCorless a day ago

        Yes, for Redpanda. There's a blog about that:

        "The use of fsync is essential for ensuring data consistency and durability in a replicated system. The post highlights the common misconception that replication alone can eliminate the need for fsync and demonstrates that the loss of unsynchronized data on a single node still can cause global data loss in a replicated non-Byzantine system."

        However, for all that said, Redpanda is still blazingly fast.

        https://www.redpanda.com/blog/why-fsync-is-needed-for-data-s...

        • uberduper a day ago

          I'm highly skeptical of the method employed to simulate unsync'd writes in that example. Using a non-clustered zookeeper and then just shutting it down, breaking the kafka controller and preventing any kafka cluster state management (not just preventing partition leader election) while manually corrupting the log file. Oof. Is it really _that_ hard to lose ack'd data from a kafka cluster that you had to go to such contrived and dubious lengths?

      • uberduper a day ago

        I've never looked at redpanda, but kafka absolutely does not. Kafka uses mmapped files and the page cache to manage durable writes. You can configure it to fsync if you like.

        • mxey a day ago

          If I don’t actually want durable and consistent data, I could also turn off fsync in Postgres …

          • mrkeen a day ago

            The tradeoff here is that Kafka will still work perfectly if one of its instances goes down. (Or you take it down, for upgrades, etc.)

            Can you lose one Postgres instance?

            • zozbot234 a day ago

              AIUI Postgres has high-availability out of the box, so it's not a big deal to "lose" one as long as a secondary can take over.

              • mxey a day ago

                Only replication is built-in, you need to add a cluster manager like Patroni to make it highly-available.

      • kragen a day ago

        Definitely not in the case of Kafka. Even with SSD that would limit it to around 100kHz. Batch commit allows Kafka (and Postgres) to amortize fsync overhead over many messages.

      • noselasd 17 hours ago

        No, it's for every batch.

      • UltraSane a day ago

        On enterprise grade storage writes go to NVRAM buffers before being flushed to persistent storage so this isn't much of a bottleneck.

        • mxey a day ago

          The context was somebody doing this on their laptop.

          • UltraSane 18 hours ago

            I was expanding the context

    • kermatt a day ago

      To the issue of complexity, is Redpanda suitable as a "single node implementation" where a Kafka cluster is not needed due to data volume, but the Kafka message bus pattern is desired?

      AKA "Medium Data" ?

      • cestith a day ago

        Yes. I’ve run projects where it was used that way.

        It also scales to very large clusters.

    • j45 a day ago

      Is it about what Kafka could get you, or what you need right now?

      Kafka is a full on steaming solution.

      Postgres isn’t a buzzword. It can be a capable placeholder until it’s outgrown. One can arrive at Kafka with a more informed run history from Postgres.

      • kitd a day ago

        > Kafka is a full on steaming solution.

        Freudian slip? ;)

        • j45 a day ago

          Haha, and a typo!

  • jaimebuelta a day ago

    I may be reading a bit extra, but my main take on this is: "in your app, you probably already have PostgreSQL. You don't need to set up an extra piece of infrastructure to cover your extra use case, just reuse the tool you already have"

    It's very common to start adding more and more infra for use cases that, while they could technically be better covered by new stuff, can be served by already existing infrastructure, at least until you have proof that you need to grow it.

  • 010101010101 a day ago

    > If you don't need what kafka offers, don't use it.

    This is literally the point the author is making.

    • uberduper a day ago

      It seems like their point was to criticize people for using new tech instead of hacking together unscalable solutions with their preferred database.

      • EdwardDiego 8 hours ago

        Which is crazy, because Kafka is like olllld compared to competing tech like Pulsar and RedPanda. I'm trying to remember what year I started using v0.8, it was probably mid-late 2010s?

      • blenderob a day ago

        That wasn't their point. Instead of posting snarky comments, please review the site guidelines:

        "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize."

        • lenkite a day ago

          But honestly, isn't that the strongest plausible interpretation according to the "site guidelines" ? When one explicitly says that the one camp chases "buzzwords" and the other chases "common sense", how else are you supposed to interpret it ?

          • blenderob a day ago

            > how else are you supposed to interpret it?

            It's not so hard. You interpret it how it is written. Yes, they say one camp chases buzzwords and another chases common sense. Critique that if you want to. That's fine.

            But what's not written in the OP is some sort of claim that Postgres performs better than Kafka. The opposite is written. The OP acknowledges that Kafka is fast. Right there in the title! What's written is OP's experiments and data that shows Postgres is slow but can be practical for people who don't need Kafka. Honestly I don't see anything bewildering about it. But if you think they're wrong about Postgres being slow but practical that's something nice to talk about. What's not nice is to post snarky comments insinuating that the OP is asking you to design unscalable solutions.

    • PeterCorless a day ago

      But in this case, it is like saying "You don't need a fuel truck. You can transport 9,000 gallons of gasoline between cities by gathering 9,000 1-gallon milk jugs and filling each, then getting 4,500 volunteers to each carry 2 gallons and walk the entire distance on foot."

      In this case, you do just need a single fuel truck. That's what it was built for. Avoiding using a design-for-purpose tool to achieve the same result actually is wasteful. You don't need 288 cores to achieve 243,000 messages/second. You can do that kind of throughput with a Kafka-compatible service on a laptop.

      [Disclosure: I work for Redpanda]

      • ilkhan4 a day ago

        I'll push the metaphor a bit: I think the point is that if you have a fleet of vehicles you want to fuel, go ahead and get a fuel truck and bite off on that expense. However, if you only have 1 or 2, a couple of jerry cans you probably already have + a pickup truck is probably sufficient.

      • kragen a day ago

        Getting a 288-core machine might be easier than setting up Kafka; I'm guessing that it would be a couple of weeks of work to learn enough to install Kafka the first time. Installing Postgres is trivial.

        • brianmcc a day ago

          "Lots of the team knows Postgres really well, nobody knows Kafka at all yet" is also an underrated factor in making choices. "Kafka was the ideal technical choice but we screwed up the implementation through well-intentioned inexperience" being an all too plausible outcome.

          • freedomben a day ago

            Indeed, I've seen this happen first hand where there was really only one guy who really "knew" Kafka, and it was too big of a job for just him. In that case it was fine until he left the company, and then it became a massive albatross and a major pain point. In another case, the eng team didn't really have anyone who really "knew" Kafka but used a managed service thinking it would be fine. It was until it wasn't, and switching away is not a light lift, nor is mass educating the dev team.

            Kafka et al definitely have their place, but I think most people would be much better off reaching for a simpler queue system (or for some things, just using Postgres) unless you really need the advanced features.

            • EdwardDiego 8 hours ago

              I'm wondering why there wasn't any push for the Kafka guy to share his knowledge within his team, or to other teams?

        • EdwardDiego 8 hours ago

          Just use Strimzi if you're in a K8s world (disclosure: I used to work on Strimzi for RH, but I still think it's far better than Helm charts or fully self-managed, and far cheaper than fully managed).

          • kragen 8 hours ago

            Thanks! I didn't know about Strimzi!

            • EdwardDiego 7 hours ago

              Even though I'm a few years on from Red Hat, I still really recommend Strimzi. I think the best way to describe it is "a sorta managed Kafka". It'll make things that are hard in self-managed Kafka (like rolling upgrades) easy as.

        • PeterCorless a day ago

          The only thing that might take "weeks" is procrastination. Presuming absolutely no background other than general data engineering, a decent beginner online course in Kafka (or Redpanda) will run about 1-2 hours.

          You should be able to install within minutes.

    • blenderob a day ago

      >> If you don't need what kafka offers, don't use it.

      > This is literally the point the author is making.

      Exactly! I just don't understand why HN invariably bubbles the most dismissive comments, the ones that don't even engage with the actual subject matter of the article, up to the top!

  • loire280 a day ago

    In fact, a properly-configured Kafka cluster on minimal hardware will saturate its network link before it hits CPU or disk bottlenecks.

    • EdwardDiego 8 hours ago

      Depends on how you configure the clients. Ask me how I know that using a K8s pod id in a consumer group id is a really bad idea, or how setting batch size to 1 and linger to 0 is a really bad idea: the former blows up disk (all those unique consumer groups cause the backing offsets topic to consume a lot of space, as that topic is by default only compacted), and the latter thrashes request handler CPU time.

    • theK a day ago

      Isn't that true for everything on the cloud? I thought we are long into the era where your disk comes over the network there.

    • altcognito a day ago

      This doesn't even make sense. How do you know what the network links or the other bottlenecks are like? There are a grandiose number of assumptions being made here.

      • loire280 a day ago

        There is a finite and relatively narrow range of ratios of CPU, memory, and network throughput in both modern cloud offerings and bare hardware configurations.

        Obviously it's possible to build, for example, a machine with 2 cores, a 10Gbps network link, and a single HDD that would falsify my statement.

        • altcognito 13 hours ago

          But the workload matters. Even the comment in the article doesn't completely make sense for me in that way -- if your workload is 50 operations per byte transferred versus 5000 operations per byte transferred, there is a considerable difference in hardware requirements.

    • j45 a day ago

      But it can do so many processes a second I’ll be able to scale to the moon before I ever launch.

    • UltraSane a day ago

      A network link can be anything from 1Gbps to 800Gbps.

  • darth_avocado a day ago

    The 96 vcpu setup with 24xlarge instance costs about $20k/month on AWS before discounts. And one thing you don’t want in a pub sub system is a single instance taking all the read/writes. You can run a sizeable Kafka cluster for that kind of money in AWS.

  • ozgrakkurt a day ago

    This is why benchmarks should be hardware limit based IMO. Like I am maxing IOPS/throughput of this ssd or maxing out the network card etc.

    CPU is more tricky but I’m sure it can be shown somehow

  • adamtulinius a day ago

    I remember doing 900k writes/s (non-replicated) already back on kafka 0.8 with a random physical server with an old fusionio drive (says something about how long ago this was :D).

    It's a fair point that if you already have a pgsql setup, and only need a few messages here and there, then pg is fine. But yeah, the 96 vcpu setup is absurd.

  • blenderob a day ago

    > Has this person actually benchmarked kafka?

    Is anyone actually reading the full article, or just reacting to the first unimpressive numbers you can find and then jumping on the first dismissive comment you can find here?

    Benchmarking Kafka isn't the point here. The author isn't claiming that Postgres outperforms Kafka. The argument is that Postgres can handle modest messaging workloads well enough for teams that don't want the operational complexity of running Kafka.

    Yes, the throughput is astoundingly low for such a powerful CPU but that's precisely the point. Now you know how well or how bad Postgres performs on a beefy machine. You don't always need Kafka-level scale. The takeaway is that Postgres can be a practical choice if you already have it in place.

    So rather than dismissing it over the first unimpressive number you find, maybe respond to that actual matter of TFA. Where's the line where Postgres stops being "good enough"? That'll be something nice to talk about.

    • uberduper a day ago

      Then the author should have gone on to discuss not just the implementation they now have to maintain, but also all the client implementations they'll have to keep re-creating for their custom solution. Or they could talk about all the industry standard tools that work with kafka and not their custom implementation.

      Or they could have not mentioned kafka at all and just demonstrated their pub/sub implementation with PG. They could have not tried to make it about the buzzword resume driven engineering people vs. common sense folks such as himself.

    • adamtulinius a day ago

      The problem is benchmarking on the 96 vcpu server, because at that point the author seems to miss the point of Kafka. That's just a waste of money for that performance.

      • blenderob a day ago

        And if the OP hadn't done that, someone here would complain, why couldn't the OP use a larger CPU and test if Postgres performs better? Really, there is no way the OP can win here, can they?

        I'm glad the OP benchmarked on the 96 vCPU server. So now I know how well Postgres performs on a large CPU. Not very well. But if the OP had done their benchmark on a low CPU, I wouldn't have learned this.

        • cheikhcheikh 19 hours ago

          you're missing the point. Postgres performs well on large CPU. Postgres as-used by OP does not and is a waste of money. It's great that he benchmarked for a larger CPU, that's not what people are disputing, they are disputing the ridiculous conclusion.

  • ljm a day ago

    I wonder if OP could have got different results if they implemented a different schema as opposed to mimicking Kafka's setup with the partitions, consumer offsets, etc.

    I might well be talking out of my arse but if you're going to implement pub/sub in Postgres, it'd be worth designing around its strengths and going back to basics on event sourcing.

  • joaohaas a day ago

    Had the same thoughts, weird it didn't include Kafka numbers.

    Never used Kafka myself, but we extensively use Redis queues with some scripts to ensure persistency, and we hit throughputs much higher than those in equivalent prod machines.

    Same for Redis pubsubs, but those are just standard non-persistent pubsubs, so maybe that gives it an upper edge.

  • roozbeh18 a day ago

    Just checked my single node Kafka setup which currently handles 695.27k e/s (average daily) into elasticsearch without breaking a sweat. kafka has been the only stable thing in this whole setup.

    zeek -> kafka -> logstash -> elastic

    • apetrov a day ago

      out of curiosity, what does your service do that it handles almost 700K events/sec?

ownagefool a day ago

The camps are wrong.

There are poles.

1. Folks constantly adopting the new tech, whatever the motivation, and 2. folks who learned a thing and shall never learn anything else, ever.

Of course nobody exists actually on either pole, but the closer you are to either, the less pragmatic you are likely to be.

  • wosined a day ago

    I am the third pole: 3. Everything we have currently sucks and what is new will suck for some hitherto unknown reason.

    • ownagefool a day ago

      Heh, me too.

      I think it's still just 2 poles. However, I probably shouldn't have ascribed a motivation to the latter pole, as I purposely did not with the former.

      Pole 2 is simply never adopt anything new ever, for whatever the motivation.

    • antonvs a day ago

      If you choose wisely, things should suck less overall as you move forward. That's kind of the overall goal, otherwise we'd all still be toggling raw machine code into machines using switches.

      • wosined 4 hours ago

        Computers got faster, software is not so straightforward. I don't even know why a text webpage needs 100mb of memory to render and display.

  • jppope 12 hours ago

    So 1. RDD 2. Curmudgeon and 3. People who rationally look at the problem and try to solve it in the best way possible (omitted in the article)

  • binarymax a day ago

    This is it right here. My foil is the Elasticsearch replacement because PG has inverted indices. The ergonomics and tunability of these in PG are terrible compared to ES. Yes, it will search, but I wouldn’t want to be involved in constructing or maintaining that search.
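
    For context, the Postgres version is roughly the following (a sketch with a made-up table, using plain psycopg2). It does search, but everything past this point (analyzers, per-field boosts, fuzziness, relevance tuning) is where the ergonomics fall off:

      import psycopg2

      conn = psycopg2.connect("dbname=app")

      with conn, conn.cursor() as cur:
          # GIN index over the tsvector expression -- Postgres' inverted index.
          cur.execute("""
              CREATE INDEX IF NOT EXISTS docs_fts_idx
                  ON docs USING gin (to_tsvector('english', body))
          """)
          # Ranked search for documents matching both terms.
          cur.execute("""
              SELECT id, ts_rank(to_tsvector('english', body), query) AS rank
                FROM docs, to_tsquery('english', 'kafka & postgres') AS query
               WHERE to_tsvector('english', body) @@ query
               ORDER BY rank DESC
               LIMIT 10
          """)
          print(cur.fetchall())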

jimbokun a day ago

For me the killer feature of Kafka was the ability to set the offset independently for each consumer.

In my company most of our topics need to be consumed by more than one application/team, so this feature is a must have. Also, the ability to move the offset backwards or forwards programmatically has been a life saver many times.

Does Postgres support this functionality for their queues?

  • Jupe a day ago

    Isn't it just a matter of having each consumer use their own offset? I mean if the queue table is sequentially or time-indexed, the consumer just provides a smaller/earlier key to accomplish the offset? (Maybe I'm missing something here?)
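
    Something like this, roughly (a sketch with made-up table names): each consumer keeps its own offset in a side table and can move it forwards or backwards at will.

      import psycopg2

      # Illustrative schema (not from the article):
      #   events(id BIGSERIAL PRIMARY KEY, payload JSONB)
      #   consumer_offsets(consumer TEXT PRIMARY KEY, last_id BIGINT NOT NULL)
      conn = psycopg2.connect("dbname=app")

      def poll(consumer, batch_size=100):
          with conn, conn.cursor() as cur:
              # Each consumer reads (and may rewind) its own stored offset.
              cur.execute("SELECT last_id FROM consumer_offsets WHERE consumer = %s",
                          (consumer,))
              row = cur.fetchone()
              last_id = row[0] if row else 0
              cur.execute("SELECT id, payload FROM events WHERE id > %s ORDER BY id LIMIT %s",
                          (last_id, batch_size))
              batch = cur.fetchall()
              if batch:
                  # Advance this consumer's offset to the last row it saw.
                  cur.execute(
                      "INSERT INTO consumer_offsets (consumer, last_id) VALUES (%s, %s) "
                      "ON CONFLICT (consumer) DO UPDATE SET last_id = EXCLUDED.last_id",
                      (consumer, batch[-1][0]))
              return batch

    (One thing the sketch glosses over: with concurrent producers, sequence ids can commit out of order, so a reader that has already advanced past a gap can miss a late-committing row.)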

    • cortesoft 12 hours ago

      Kafka allows you to have a consumer group… you can have multiple workers processing messages in parallel, and if they all use the same group id, the messages will be sharded across all the workers using that key… so each message will only be handled by one worker using that key, and every message will be given to exactly one worker (with all the usual caveats of guaranteed-processed-exactly-once queues). Other consumers can use different group keys and they will also get every single message exactly once.

      So if you want an individual offset, then yes, the consumer could just maintain their own… however, if you want a group’s offset, you have to do something else.

    • altcognito a day ago

      Correct, offsets and sharding aren't magic. And partitions in Kafka are user defined, just like they would be for postgresql.

    • jimbokun 20 hours ago

      Yes.

      Is a queuing system baked into Postgres? Or are there client libraries that make it look like one?

      And do these abstractions allow for arbitrarily moving the offset for each consumer independently?

      If you're writing your own queuing system using pg for persistence obviously you can architect it however you want.

  • altcognito a day ago

    The article basically states unless you need a lot of throughput, you probably don't need Kafka. (my interpretation extends to say) You probably don't need offsets because you don't need multi-threaded support because you don't need multiple threads.

    I don't know what kind of native support PG has for queue management; the assumption here is that a basic "kill the task as you see it" approach is usually good enough, and that the simplicity of writing and running a script far outweighs the development, infrastructure and devops costs of Kafka.

    But obviously, whether you need stuff to happen in 15 seconds instead of 5 minutes, or 5 minutes instead of an hour is a business decision, along with understanding the growth pattern of the workload you happen to have.

    • jimbokun 20 hours ago

      Well in my workplace we need all of those things.

    • j45 a day ago

      PG has several queue management extensions and I’m working my way through trying them out.

      Here is one: https://pgmq.github.io/pgmq/

      Some others: https://github.com/dhamaniasad/awesome-postgres

      Most of my professional life I have considered Postgres folks to be pretty smart… while I by chance happened to go with MySQL and it became the rdbms I thought in by default.

      Learning Postgres heavily in recent times has been okay, not much different than learning the tweaks for MSSQL, Oracle or others. You just have to be willing to slow down a little for a bit and enjoy it instead of expecting to rush through everything.

      • dagss 8 hours ago

        pgmq looks cool, thanks for the link!

        But it looks like a queue, which is a fundamentally different data structure from an event log, and Kafka is an event log.

        They are very different usecases; work distribution vs pub/sub.

        The article talks about both usecases, assuming the reader is very familiar with the distinction.

misja111 a day ago

> One camp chases buzzwords .. the other common sense

How is it common sense to try to re-implement Kafka in Postgres? You probably need something similar but simpler. Then implement that! But if you really need something like Kafka, then .. use Kafka!

IMO the author is now making the same mistake as some Kafka evangelists that try to implement a database in Kafka.

  • enether a day ago

    I’m making the example of a pub sub system. I’m most familiar with Kafka so drew parallels to it. I didn’t actually implement everything Kafka offers - just two simple pub sub like queries.

LinXitoW 4 hours ago

Isn't one gigantic advantage with Postgres the ACID part?

It seems to me that the hardest part of going for a MQ/distributed log like Kafka is re-working existing code to now handle the lack of ACID stuff. Things that are trivial with Postgres, like exactly once delivery, are huge undertakings without ACID.

Personally, I don't have much experience with this, so maybe I'm just missing something?

  • mrkeen 2 hours ago

    It is a gigantic advantage! And you are missing something!

    ACID is an aspirational ideal - not something that 'just works' if you have a database that calls itself ACID. What ACID promises is essentially "single-threaded thinking will work in a multi-threaded environment."

    Here's a list of ways it falls short:

    1) Settings: Postgres is 'Read Committed' by default (which is not quite full Isolation). You could change this, but you might not like the resulting performance drop, (and it's probably not up to you unless you're the company DBA or something.)

    2) ACID=Single-node-only. Maybe some of the big players (Google?) have worked around this (Spanner?), but for your use case, the scope of a (correct) ACID transaction is essentially what you can stuff into a single SQL string and hand to a single database. It won't span to the frontend, or any partners you have, REST calls, etc. It's definitely useful to be able to make your single node transition all-or-nothing from valid state to valid state, but you still have all your distributed thinking to do (Two generals, CAP, exactly-once-delivery, idempotency, etc.) without help from ACID.

    3) You can easily break ACID at the programming language level. Let's say you intend to add 10 to a row. If you do a SELECT, add 10 to the result, and then do an update, your transaction won't do what you intended. If the value was 3 when you read it, all the database will see is you setting the value to 13. I don't know whether the db will throw an exception, or retry writing 13, but neither of those is 'just increment by 10'.
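
    A minimal sketch of that third point (hypothetical counters table): the read-modify-write version silently loses concurrent increments under the default isolation level, while pushing the arithmetic into the UPDATE keeps it atomic.

      import psycopg2

      conn = psycopg2.connect("dbname=app")

      # Lost-update hazard: two processes running this concurrently can both read 3
      # and both write 13, so one increment disappears under Read Committed.
      with conn, conn.cursor() as cur:
          cur.execute("SELECT value FROM counters WHERE id = 1")
          value = cur.fetchone()[0]
          cur.execute("UPDATE counters SET value = %s WHERE id = 1", (value + 10,))

      # Atomic form: the increment happens inside the UPDATE, so concurrent
      # transactions queue up on the row lock and no increment is lost.
      with conn, conn.cursor() as cur:
          cur.execute("UPDATE counters SET value = value + 10 WHERE id = 1")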

    The reason I use Kafka is because it actually helps with distributed systems. We can't beat CAP, but if we want to have AP, we can at least have some 'eventual consistency', that is, your services won't be in exact lockstep from valid-state to valid-state (as a group), but if you give them the same facts, then they can at least end up in the same state. And that's what Kafka's for: you append facts onto it, then each service (which may in fact have its own ACID DB!) can move from valid-state to valid-state (even if external observers can see that one service is ahead of another one).

bmcahren 17 hours ago

A huge benefit of single-database operations at scale is point-in-time recovery for the entire system thereby not having to coordinate recovery points between data stores. Alternatively, you can treat your queue as volatile depending on the purpose.

this_user a day ago

The real two camps seem to be:

1) People constantly chasing the latest technology with no regard for whether it's appropriate for the situation.

2) People constantly trying to shoehorn their favourite technology into everything with no regard for whether it's appropriate for the situation.

  • PeterCorless a day ago

    2) above is basically "Give a kid a hammer, and everything becomes a nail."

    The third camp:

    3) People who look at a task, then apply a tool appropriate for the task.

  • j45 a day ago

    Kafka is anything but new. It does get shoehorned too.

    Postgres also has been around for a long time and a lot of people didn’t know all it can do which isn’t what we normally think about with a database.

    Appropriateness is a nice way to look at it, as long as it's clear whether the choice is really about the task or about personal preferences and interpretations, and being righteous towards others with them.

    Customers rarely care about the backend or what it’s developed in, except maybe for developer products. It’s a great way to waste time though.

spectraldrift 11 hours ago

> Should You Use Postgres? Most of the time - yes

This made me wonder about a tangential statistic that would, in all likelihood, be impossible to derive:

If we looked at all database systems running at any given time, what proportion does each technology represent (e.g., Postgres vs. MySQL vs. [your favorite DB])? You could try to measure this in a few ways: bytes written/read, total rows, dollars of revenue served, etc.

It would be very challenging to land on a widely agreeable definition. We'd quickly get into the territory of what counts as a "database" and whether to include file systems, blockchains, or even paper. Still, it makes me wonder. I feel like such a question would be immensely interesting to answer.

Because then we might have a better definition of "most of the time."

  • abtinf 10 hours ago

    SQLite likely dominates all other databases combined on the metrics you mentioned, I would guess by at least an order of magnitude.

    Server side. Client side. iOS, iPad, Mac apps. Uses in every field. Uses in aerospace.

    Just think for a moment that literally every photo and video taken on every iPhone (and I would assume Android as well) ends up stored (either directly or as sizable amounts of metadata) in a SQLite db.

    • sublimefire 5 hours ago

      Yes it seems like it is absent in this discussion but maybe it should have been “it” the whole time as a default option. I wonder if it could attain similar throughput numbers; bet the article would feel slightly sarcastic then though

losvedir a day ago

Maybe I missed it in the design here, but this pseudo-Kafka Postgres implementation doesn't really handle consumer groups very well. The great thing about Kafka consumer groups is it makes it easy to spread the load over several instances running your service. They'll all connect using the same group, and different partitions will be assigned to the different instances. As you scale up or down, the partition responsibilities will be updated accordingly.

You need some sort of server-side logic to manage that, and the consumer heartbeats, and generation tracking, to make sure that only the "correct" instances can actually commit the new offsets. Distributed systems are hard, and Kafka goes through a lot of trouble to ensure that you don't fail to process a message.

  • mrkeen a day ago

    Right, the author's worldview is that Kafka is resume-driven development, used by people "for speed" (even though they are only pushing 500KB/s).

    Of course the implementation based off that is going to miss a bit.

johnyzee a day ago

Seems like you would at the very least need a fairly thick application layer on top of Postgres to make it look and act like a messaging system. At that point, seems like you have just built another messaging system.

Unless you're a five man shop where everybody just agrees to use that one table, make sure to manage transactions right, cron job retention, YOLO clustering, etc. etc.

Performance is probably last on the list of reasons to choose Kafka over Postgres.

  • j45 a day ago

    You expose an API on top of Postgres, much like any other group of developers would, and call it a day.

    There are several implementations of queues, which increases the chance of finding what one is after: https://github.com/dhamaniasad/awesome-postgres

    • dagss 8 hours ago

      There's a lot of logic involved client side regarding managing read cursors and marking events as processed consumer side. Possibly also client side error queues and so on.

      I truly miss a good standard client side library following the Kafka-in-SQL philosophy. I started one at my previous job and we used it internally, but it never got good enough to be widely used elsewhere, and now I work somewhere else...

      (PS: Talking about the pub/sub Kafka-like usecase, not the work queue FOR UPDATE usecase)

brikym 16 hours ago

If you don't mind Redis then use Redis Streams. It gives you an eventlog without worrying about postgres performance issues and has consumer groups.
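
A rough sketch of the pattern with redis-py (stream, group and consumer names are made up):

  import redis

  r = redis.Redis()

  # Producer: append an event to the stream.
  r.xadd("events", {"type": "signup", "user": "42"})

  # One-time setup: a consumer group that starts from the beginning of the stream.
  try:
      r.xgroup_create("events", "billing", id="0", mkstream=True)
  except redis.ResponseError:
      pass  # group already exists

  # Consumer: read entries not yet delivered to this group, then acknowledge them.
  entries = r.xreadgroup("billing", "worker-1", {"events": ">"}, count=10, block=5000) or []
  for stream, messages in entries:
      for msg_id, fields in messages:
          print(msg_id, fields)
          r.xack("events", "billing", msg_id)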

  • tele_ski 14 hours ago

    Been using valkey streams recently and loving it. Took a bit to understand how to properly use it, but now that I've figured it out I'd highly recommend trying it. It's very easy to set up and get going, and it just works.

ryandvm a day ago

I think my only complaint about Kafka is the widespread misunderstanding that it is a suitable replacement for a work queue. I should not be having to explain to an enterprise architect the distinction between a distributed work queue and event streaming platform.

  • lisbbb 19 hours ago

    It's not so much that they don't know as that they think Kafka is sexier, or, in my case, that it was mandated for everything because they were paying for the cluster. I solved one problem, very flexibly, in Elastic and they weren't even interested at all. It was Kafka or nothing. That's reality in a lot of companies.

qsort a day ago

I feel so seen lol. I work in data engineering and the first paragraph is me all the time. There are a lot of cool technologies (timeseries databases, vector databases, stuff like Synapse on Azure, "lakehouses" etc.) but they are mostly for edge cases.

I'm not saying they're useless, but if I see something like that lying around, it's more likely that someone put it there based on vibes rather than an actual engineering need. Postgres is good enough for OpenAI, chances are it's good enough for you.

sc68cal a day ago

> Postgres doesn’t seem to have any popular libraries for pub-sub use cases, so I had to write my own.

Ok so instead of running Kafka, we're going to spend development cycles building our own?

  • enether a day ago

    It would be nice if a library like pgmq got built. Not sure what the demand for that is, but it feels like there may be a niche

dzonga a day ago

what's not spoken about in the above article ?

ease of use. in ruby If I want to use kafka I can use karafka. or redis streams via the redis library. likewise if kafka is too complex to run there's countless alternatives which work as well - hell even 0mq with client libraries.

now with the postgres version I have to write my own stuff, and I might not know where it's gonna lead me.

postgres is scalable, no one doubts that. but what people forget to mention is the ecosystem around certain tools.

  • enether 17 hours ago

    That's true.

    There seem to be two planes of ease of use - the app layer (library) and the infra layer (hosting).

    The app layer for Postgres is still in development, so if you currently want to run pub-sub (Kafka) on it, it will be extra work to develop that abstraction.

    I hope somebody creates such a library. It's a one-time cost but then will make it easier for everybody.

  • j45 a day ago

    I’m not sure where it says you have to write your own stuff; there seem to be some queue implementations with libraries.

    https://github.com/dhamaniasad/awesome-postgres

    There is at least a Python example here.

    • dagss 8 hours ago

      Work queues are easy.

      It is significantly more work for the client side implementation of event log consumers, which the article also talks about. For instance, persisting the client side cursors. And I have not seen widely used standard implementations of those. (I started one myself once but didn't finish.)

dev_l1x_be 4 hours ago

Apples are sweet, I am going to eat an onion.

I love these articles.

> The other camp chases common sense

It is never too late to inject some tribalism into any discussion.

> Trend 1 - the “Small Data” movement.

404

Just perfect.

jjice a day ago

This is a well written addition to the list of articles I need to reference on occasion to keep myself from using something new.

Postgres really is a startup's best friend most of the time. I'm building a new product that's going to deal with a good bit of reporting, which I began to look at OLAP DBs for, but I had hesitation about leaving PG for it. This kind of seals it for me (and of course the reference to the classic "Just Use Postgres for Everything" post helps) that I should Just Use Postgres (R).

On top of being easy to host and already being familiar with it, the resources out there for something like PG are near endless. Plus the team working on it is doing constant good work to make it even more impressive.

  • j45 a day ago

    It’s totally reasonable to start with fewer technologies to do more and then outgrow them.

    • sanskarix 9 hours ago

      This mindset is criminally underrated in the startup/indie builder world. There's so much pressure to architect for scale you might never reach, or to use "industry standard" stacks that add enormous complexity.

      I've been heads-down building a scheduling tool, and the number of times I've had to talk myself out of over-engineering is embarrassing. "Should I use Kafka for event streaming?" No. "Do I need microservices?" Probably not. "Can Postgres handle this?" Almost certainly yes.

      The real skill is knowing when you've actually outgrown something vs. when you're just pattern-matching what Big Tech does. Most products never get to the scale where these distinctions matter—but they DO die from complexity-induced paralysis.

      What's been your experience with that inflection point where you actually needed to graduate to more complex tooling? How did you know it was time?

honkostani a day ago

Resume-driven design is running into the desert of Moore's plateau, which punishes the use of ever more useless abstractions. Its practitioners get quieter, because their projects keep dying after the revolutionary tech is introduced and they jump ship.

udave 10 hours ago

I find the distinction between a queue and a pub-sub system quite poor. A pub-sub system is just a persistent queue at its core; the only distinction is that you have one queue per subscriber, hence multiple readers. Everything else stays the same. Ordering is expected to be strict in both cases. Durability is also baked into both systems. On the question of bounded and unbounded queues: don't message queues also spill to disk in order to prevent OOM scenarios?

woile 10 hours ago

There are a few things missing I think.

I think kafka makes it easy to create an event driven architecture. This is particularly useful when you have many teams, since they are properly isolated from each other.

And with many teams another problem comes up: there's no guarantee that queries are gonna be properly written, so postgres' performance may be hindered.

Given this, I think using Kafka in companies with many teams can be useful, even if the data they move is not insanely big.

asah 8 hours ago

"500 KB/s workload should not use Kafka" - yyyy!!! indeed, I'm running 5MBps logging system through a single node RDS instance costing <$1000/mon (plus 2x for failover). There's easily 4-10x headroom for growth by paying AWS more money and 3-5x+ savings by optimizing the data structure.

  • EdwardDiego 8 hours ago

    I've always said, don't even think about Kafka until you're into MiB/s territory.

    It's a complex piece of software that solves a complex problem, but there's many trade-offs, so only use it when you need to.

jeeybee 17 hours ago

If you like the “use Postgres until it breaks” approach, there’s a middle ground between hand-rolling and running Kafka/Redis/Rabbit: PGQueuer.

PGQueuer is a small Python library that turns Postgres into a durable job queue using the same primitives discussed here — `FOR UPDATE SKIP LOCKED` for safe concurrent dequeue and `LISTEN/NOTIFY` to wake workers without tight polling. It’s for background jobs (not a Kafka replacement), and it shines when your app already depends on Postgres.
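
For those who haven't seen the pattern, the dequeue side looks roughly like this in raw SQL (illustrative table and column names, not PGQueuer's actual schema):

  import psycopg2

  conn = psycopg2.connect("dbname=app")

  # Claim one pending job. SKIP LOCKED lets concurrent workers grab different
  # rows instead of blocking on (or double-processing) the same one.
  with conn, conn.cursor() as cur:
      cur.execute("""
          UPDATE jobs
             SET status = 'running', started_at = now()
           WHERE id = (SELECT id FROM jobs
                        WHERE status = 'pending'
                        ORDER BY id
                        FOR UPDATE SKIP LOCKED
                        LIMIT 1)
          RETURNING id, payload
      """)
      job = cur.fetchone()  # None when the queue is empty

  # To avoid tight polling, workers LISTEN on a channel and producers NOTIFY it
  # after enqueueing, which wakes the workers immediately.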

Nice-to-haves without extra infra: per-entrypoint concurrency limits, retries/backoff, scheduling (cron-like), graceful shutdown, simple CLI install/migrations. If/when you truly outgrow it, you can move to Kafka with a clearer picture of your needs.

Repo: https://github.com/janbjorge/pgqueuer

Disclosure: I maintain PGQueuer.

loftsy a day ago

I am about to start a project. I know I want an event sourced architecture. That is, the system is designed around a queue, all actors push/pull into the queue. This article gives me some pause.

Performance isn't a big deal for me. I had assumed that Kafka would give me things like decoupling, retry, dead-lettering, logging, schema validation, schema versioning, exactly once processing.

I like Postgres, and obviously I can write a queue ontop of it, but it seems like quite a lot of effort?

  • singron a day ago

    Kafka also doesn't give you all those things. E.g. there is no automatic dead-lettering, so a consumer that throws an exception will endlessly retry and block all progress on that partition. Kafka only stores bytes, so schema is up to you. Exactly-once is good, but there are some caveats (you have to use kafka transactions, which are significantly different than normal operation, and any external system may observe at-least-once semantics instead). Similar exactly-once semantics would also be trivial in an RDBMS (i.e. produce and consume in same transaction).

    If you plan on retaining your topics indefinitely, schema evolution can become painful since you can't update existing records. Changing the number of partitions in a topic is also painful, and choosing the number initially is a difficult choice. You might want to build your own infrastructure for rewriting a topic and directing new writes to the new topic without duplication.

    Kafka isn't really a replacement for a database or anything high-level like a ledger. It's really a replicated log, which is a low-level primitive that will take significant work to build into something else.

    • loftsy 19 hours ago

      Very interesting.

      I need a durable queue but not indefinitely. Max a couple of hours.

      What I want is Google PubSub but open source so I can self host.

    • oulipo2 a day ago

      I want to rewrite some of my setup, we're doing IoT, and I was planning on

      MQTT -> Redpanda (for message logs and replay, etc) -> Postgres/Timescaledb (for data) + S3 (for archive)

      (and possibly Flink/RisingWave/Arroyo somewhere in order to do some alerting/incrementally updated materialized views/ etc)

      this seems "simple enough" (but I don't have any experience with Redpanda) but is indeed one more moving part compared to MQTT -> Postgres (as a queue) -> Postgres/Timescaledb + S3

      Questions:

      1. my "fear" would be that if I use the same Postgres for the queue and for my business database, the "message ingestion" part could block the "business" part sometimes (locks, etc)? Also perhaps when I want to update the schema of my database and not "stop" the inflow of messages, not sure if this would be easy?

      2. also that since it would write messages in the queue and then delete them, there would be a lot of GC/Vacuuming to do, compared to my business database which is mostly append-only?

      3. and if I split the "Postgres queue" from "Postgres database" as two different processes, of course I have "one less tech to learn", but I still have to get used to pgmq, integrate it, etc, is that really much easier than adding Redpanda?

      4. I guess most Postgres queues are also "simple" and don't provide "fanout" for multiple things (eg I want to take one of my IoT message, clean it up, store it in my timescaledb, and also archive it to S3, and also run an alert detector on it, etc)

      What would be the recommendation?

  • rileymichael a day ago

    if you need a durable log (which it sounds like you do for if you're going with event sourcing) that has those features, i'd suggest apache pulsar. you effectively get streams with message queue semantics (per-message acks, retries, dlq, etc.) from one system. it supports many different 'subscription types', so you can use it for a bunch of different use cases. running it on your own is a bit of a beast though and there's really only one hosted provider in the game (streamnative)

    note that kafka has recently started investing into 'queues' in KIP-932, but they're still a long way off from implementing all of those features.

    • olavgg 7 hours ago

      A standalone Pulsar, is actually a great way to learn Pulsar. It is one command to get started: bin/pulsar standalone

      It can also be used in production. You do not have to build a distributed Pulsar cluster immediately. I have multiple projects running on a standalone Pulsar cluster, because it's easy to set up and requires almost no maintenance. Doing it that way makes compliance requirements for isolation simpler, with fewer fights. Everyone understands host/vm isolation, few understand Pulsar Tenant isolation.

      If you want a distributed Apache Pulsar cluster, be prepared to work for that. We run a cluster on bare metal. We considered Kubernetes, but performance was lacking. We are not Kubernetes experts.

  • munchbunny 20 hours ago

    > I had assumed that Kafka would give me things like decoupling, retry, dead-lettering, logging, schema validation, schema versioning, exactly once processing.

    If you don't need a lot of perf but you place a premium on ergonomics and correctness, this sounds more like you want a workflow engine? https://github.com/meirwah/awesome-workflow-engines

    • loftsy 19 hours ago

      Perhaps I do. I know that I don't want a system defined as a graph in yaml. Or no code. These options are over engineered for my use case. I'm pretty comfortable building some docker containers and operating them and this is the approach I want to use.

      I'm checking out the list.

    • lisbbb 19 hours ago

      One thing I learned with Kafka and Cassandra is that you are locked in to a design pretty early on. Then the business changes their mind, it takes a great deal of re-work, and then they're accusing you of being incompetent because they're used to SQL projects that have way more flexibility.

  • mrkeen a day ago

    Event-sourcing != queue.

    Event-sourcing is when you buy something and get a receipt, you go stick it in a shoe-box for tax time.

    A queue is you get given receipts, and you look at them in the correct order before throwing each one away.

    • loftsy 19 hours ago

      True.

      I think my system is sort of both. I want to put some events in a queue for a finite set of time, process them as a single consolidated set, and then drop them all from the queue.

  • mkozlows a day ago

    If what you want is a queue, Kafka might be overkill for your needs. It's a great tool, but it definitely has a lot of complexity relative to a straightforward queue system.

  • whalesalad a day ago

    If you build it right, the underlying storage engine for your event stream should be swappable for any other event stream tech. Could be SQLite, PSQL, Kafka, Kinesis, SQS, Rabbit, Redis ... really anything can serve this need. The right tool will appear once you dial in your architecture. Treat storage as a black box API that has "push", "pop" etc commands. When your initial engine falls over, switch to a new one and expose that same API.

    The bigger question to ask is: will this storage engine be used to persist and retain data forever (like a database) or will it be used more for temporary transit of data from one spot to another.

  • j45 a day ago

    It might look like a lot of effort, but if you follow a tutorial/YouTube video step by step you will be surprised.

    It’s mostly registering the Postgres database functions, which is a one-time thing.

    There are also pre-made Postgres extensions that already run the queue.

    These days I would consider starting with self-hosted Supabase, which has Postgres ready to tweak.

jdboyd a day ago

While I appreciate the Postgres-for-everything point of view, and most of the time when I use other things the job could fit in Postgres, there are a few areas that keep me using RabbitMQ, Redis, or something like Elastic.

First, I frequently use Celery and Celery doesn't support using Postgres as a broker. It seems like it should, but I guess no one has stepped up to write that. So, when I use Celery, I end up also using Redis or RabbitMQ.

Second, if I need mqtt clients coming in from the internet at large, I don't feel comfortable exposing Postgres to that. Also, I'd rather use the mqtt ecosystem of libraries rather than having all of those devices talk Postgres directly.

Third, sometimes I want a size-constrained memory-only database, or a database that automatically expires untouched records, and for either of those I usually use Redis. I imagine it would be worth making a reusable set of stored procedures to accomplish the auto-expiring of unused records, but I haven't implemented it. I have no idea how to make Postgres be memory-only with a constrained memory size.
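
The expiry half would probably look something like this (a rough sketch, made-up table, and assuming the pg_cron extension is available for scheduling):

  import psycopg2

  conn = psycopg2.connect("dbname=app")

  with conn, conn.cursor() as cur:
      # Rows carry a last_touched timestamp; periodically purge the stale ones.
      cur.execute("""
          DELETE FROM session_cache
           WHERE last_touched < now() - interval '30 minutes'
      """)
      # With the pg_cron extension the same statement can be scheduled inside
      # Postgres itself:
      #   SELECT cron.schedule('expire-sessions', '* * * * *',
      #     $$DELETE FROM session_cache
      #        WHERE last_touched < now() - interval '30 minutes'$$);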

dangoodmanUT a day ago

96 cores to get 240MB/s is terrible. Redpanda can do this with like one or two cores

  • greenavocado a day ago

    Redpanda might be good (I don't know) but I threw up a little in my mouth when I opened their website and saw "Build the Agentic Data Plane"

    • umanwizard a day ago

      The marketing website of every data-related startup sounds like that now. I agree it’s dumb, but you can safely ignore it.

  • enether 17 hours ago

    hehe, yeah it is. I could have probably got a GB/s out of that if I ran it properly - but it's at the scale where you expect it to be terrible due to the mismatch of workloads

nchmy 17 hours ago

Seems like instead of a hand-rolled, polling pub/sub, you could do CDC with a golang logical replication/CDC library. There are surely various options.

Or just use NATS for queues and pubsub - dead simple, can embed in your Go app and does much more than Kafka

0xDEAFBEAD 7 hours ago

Why does it matter how many distinct tools you use? It seems easiest to just always use the most standard tool in the most standard way, to minimize the amount of custom code you have to write.

shikhar a day ago

Postgres is a way better fit than Kafka if you want a large number of durable streams. But a flexible OLTP database like PG is bound to require more resources and polling loops (not even long poll!) are not a great answer for following live updates.

Plug: If you need granular, durable streams in a serverless context, check out s2.dev

  • dagss 8 hours ago

    s2.dev looks cool... I jumped around the home page a bit and couldn't perfectly grasp what it is quickly though. But if it is about decoupling the Kafka approach and client side libraries from the use of Kafka specifically I am cheering for you.

    Could you see using the s2.dev protocol on top of services using SQL in the way of the article, assigning event sequence numbers, as a good fit? Or is s2 fundamentally the component that assigns event numbers?

    I feel like we tried to do something similar to you, but for SQL DBs, but am not sure:

    https://github.com/vippsas/feedapi-spec

suyash 5 hours ago

Postgres isn't ideal, you need a timeseries database for streaming data.

nyrikki 21 hours ago

> The claim isn’t that Postgres is functionally equivalent to any of these specialized systems. The claim is that it handles 80%+ of their use cases with 20% of the development effort. (Pareto Principle)

Lots of us who built systems when SQL was the only option know that doesn’t hold over time.

SSTable-backed systems have their applications, and I have never seen dedicated Kafka teams like we used to have with DBAs.

We have the tools to make decisions based on real tradeoffs.

I highly recommend people dig into the appropriate tools to select vs making pre-selected products fit an unknown problem domain.

Tools are tactics, not strategies, tactics should be changeable with the strategic needs.

phendrenad2 a day ago

Since everyone is offering what they think the "camps" should be, here's another perspective. There are two camps: (A) Those who look at performance metrics ("96 cores to get 240MB/s is terrible") and assume that performance itself is enough to justify overruling any other concern (B) Those who look at all of the tradeoffs, including budget, maintenance, ease-of-use, etc.

You see this a lot in the tech world. "Why would you use Python, Python is slow" (objectively true, but does it matter for your high-value SaaS that gets 20 logins per day?)

jasonthorsness a day ago

Using a single DBMS for many purposes because it is so flexible and “already there” from an operations perspective is something I’ve seen over and over again. It usually goes wrong eventually with one workload/use screwing up others but maybe that’s fine and a normal part of scaling?

I think a bigger issue is the DBMS themselves getting feature after feature and becoming bloated and unfocused. Add the thing to Postgres because it is convenient! At least Postgres has a decent plugin approach. But I think more use cases might be served by standalone products than by add-ons.

  • quaunaut a day ago

    It's a normal part of scaling because often bringing in the new technology introduces its own ways of causing the exact same problems. Often they're difficult to integrate into automated tests so folks mock them out, leading to issues. Or a configuration difference between prod/local introduces a problem.

    Your DB on the other hand is usually a well-understood part of your system, and while scaling issues like that can cause problems, they're often fairly easy to predict- just unfortunate on timing. This means that while they'll disrupt, they're usually solved quickly, which you can't always say for additional systems.

Copenjin a day ago

I'm not really convinced by the comment on NOTIFY versus the (at least in theory) inferior polling. I expect the global queue, if it's really global, to be only a temporary location to collect notifications before sending them, not a bottleneck. I never did any benchmark with PG or Oracle (which has a similar feature), but I expect that depending on the polling frequency and the average amount of updates, either solution could be the best depending on the circumstances.

tarun_anand 14 hours ago

Couldn't agree more. Have built and ran an in-house postgresql based queue for several years. It can handle 5-10k msg/s in our production workloads.

Sparkyte a day ago

You can also use Redis as a queue if the data isn't in danger of being too important.

  • joaohaas a day ago

    Even if the data is important, you can enable AOF persistence and make sure the worker/consumer gets items by RPOPLPUSHing them to a working queue. This way you can easily requeue the data if the worker ever goes offline mid-process.
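
    Roughly this pattern (key names made up; newer Redis also exposes the same move as LMOVE):

      import redis

      r = redis.Redis()

      def handle(job):
          ...  # real work goes here

      # Producer: push work onto the main queue.
      r.lpush("queue:main", b"job-123")

      # Consumer: atomically move an item onto a per-worker processing list, so it
      # isn't lost if the worker dies mid-job.
      job = r.rpoplpush("queue:main", "queue:processing")
      if job is not None:
          handle(job)
          # Acknowledge by removing the item from the processing list.
          r.lrem("queue:processing", 1, job)

      # On restart after a crash, anything still sitting in queue:processing can be
      # pushed back onto queue:main and retried.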

8cvor6j844qw_d6 a day ago

> Should You Use Postgres?

> Most of the time - yes. You should always default to Postgres until the constraints prove you wrong.

Interesting.

I've also been told by my seniors that I should go with PostgreSQL by default unless I have a good justification not to.

bleonard a day ago

I am excited about the Rails defaults where background and cache and sockets are all database driven. For normal-sized projects that still need those things, it's a huge win in simplicity.

mbo 20 hours ago

This is an article in desperate need of some data visualizations. I do not think it does an effective job of communicating the differences in performance.

guywithahat a day ago

> One camp chases buzzwords

> ...

> The other camp chases common sense

I don't really like these simplifications. Like one group obviously isn't just dumb, they're doing things for reasons you maybe don't understand. I don't know enough about data science to make a call, but I'm guessing there were reasons to use Kafka due to current hardware limits or scalability concerns, and while the issues may not be as present today that doesn't mean they used Kafka just because they heard a new word and wanted to repeat it.

  • sumtechguy a day ago

    Kafka and other message systems like it have their uses. But sometimes all you need is a database. Once you start doing realtime streaming and notifications and event-type things, a messaging system is good. You can even back it up with a boring database. Would I start with kafka? Probably not. I would start with a boring database, and then if bashing on the db over and over asking 'have you changed?' doesn't work well anymore, you put in a messaging system.

  • temporallobe a day ago

    Agree with this sentiment - it’s easy to be judgmental about these things, but project-level issues and decisions can be very complicated and engineers often have little to no visibility into them. We’re using Kafka for a gigantic pipeline where IMO any reasonably modern database would suffice (and may even be superior), but our performance requirements are unclear. At some point in the distant future, we may have a significant surge in data quantity and speed, requiring greater throughput and (de)serialization speed, but I am not convinced that Kafka ultimately helps us there. I imagine this is a case where the program leadership was sold a solution which we are now obligated to use. This happens a LOT, and I have seen unnecessary and unused products cost companies millions over the years. For example, my team was doing analysis on replacing our existing Atlassian Data Center with other solutions, and in doing so, we discovered several underused/unused Atlassian plugins for which we are paying very high license fees. At some point, users over the years had requested some functionality for a specific workflow and the plugins were purchased. The people and projects went away or otherwise processes became OBE, but the plugins happily hummed along while the bills were paid.

wagwang a day ago

Isn't listen/notify absurdly slow and lock contentious

lmm 10 hours ago

If Kakfa had come first, no-one would ever pick Postgres. Yes, it offers a lot of fancy functionality. But most of that functionality is overengineered stuff you don't need, and/or causes more problems than it solves (e.g. transactions sound great until you have to deal with the deadlocks and realise they don't actually help you solve any business problems). Meanwhile with no true master-master HA in the base system you have to use a single point of failure server or a flaky (and probably expensive) third-party addon.

Just use Kafka. Even if you don't need speed or scalability, it's reliable, resilient, simple and well-factored, and gives you far fewer opportunities to architect your system wrong and paint yourself into a corner than Postgres does.

ayongpm a day ago

Just dropping this here casually:

  sup {
      position: relative;
      top: -0.4em;
      line-height: 0;
      vertical-align: baseline;
  }
jackvanlightly 17 hours ago

> A 500 KB/s workload should not use Kafka

This is a simplistic take. Kafka isn't just about scale; like other messaging systems, it provides queue/streaming semantics for applications. Sure, you can roll your own queue on a database for small use cases, but it adds complexity to the lives of developers. You can offload the burden of running Kafka by choosing a Kafka-as-a-service vendor, but you can't offload the additional work for developers that comes from using a database as a queue.

  • enether 17 hours ago

    The question is the organizational overhead in adopting yet another specialized distributed system, which btw frequently is about scalability at its core. Kafka's original paper emphasizes this ("We introduce Kafka, a distributed messaging system that we developed for collecting and delivering high volumes of log data with low latency. ", "We made quite a few unconventional yet practical design choices in Kafka to make our system efficient and scalable.")[1]

    To be honest, there isn't a large burden in running Kafka when it's 500 KB/s. The system is so underutilized there's nothing to cause issues with it. But regardless, the organizational burden persists. As the piece mentions - "Managed SaaS offerings trade off some of the organizational overhead for greater financial costs - but they still don’t remove it all.". Some of the burden continues to exist even if a vendor hosts the servers for you. The API needs to be adopted, the clients have many configs, concepts like consumer groups need to be understood, the vendor has its own UI, etc.

    The Kafka API isn't exactly the simplest. I wouldn't recommend people write the pub-sub-on-postgres SQL themselves - a library should abstract it away. So what complexity is added by a library with a simple API? Regardless of whether that library is built on top of Postgres, Kafka or another system - precisely what complexity is added to the lives of developers?

    I really don't see any complexity existing at this minuscule scale, neither at the app-developer layer nor the infra-operator layer. But of course, I haven't run this in production, so I could be wrong.

    [1] - https://notes.stephenholiday.com/Kafka.pdf

  • cyanf 17 hours ago

    There are existing solutions for queues in Postgres, notably pgmq.

sherinjosephroy 5 hours ago

Good reminder: if your message load is modest, sticking with something you know (like Postgres) might be wiser than going full-Kafka. Complexity adds cost, and you only need big guns when you're really under fire.

CuriouslyC a day ago

If you don't need all the bells and whistles of Kafka, NATS Jetstream is usually the way to go.

heyitsdaad a day ago

If the only tool you know is a hammer, everything starts looking like a nail.

odie5533 a day ago

How fast is failover?

psadri a day ago

A resource that would benefit the entire community is a set of ballpark figures for what kind of performance is "normal" given a particular hardware + data volume. I know this is a hard problem because there is so much variation across workloads, but I think even order of magnitude ballparks would be useful. For example, it could say things like:

task: msg queue

software: kafka

hardware: m7i.xlarge (vCPUs: 4 Memory: 16 GiB)

payload: 2kb / msg

possible performance: ### - #### msgs / second

etc…

So many times I've found myself wondering: is this thing behaving within an order of magnitude of a correctly set-up version, so I can decide whether to leave it alone or spend more time on it?

aussieguy1234 10 hours ago

I've found Kafka to be not particularly great with languages other than Java when the Confluent Schema Registry is involved.

I had fun working with the schema registry from TypeScript.

rjurney 19 hours ago

One bad message in a Kafka queue and guess what? The entire queue is down, because it kills your workers over and over. To fix it? You have to resize the queue to zero, which means losing requests. This KILLS me. Jay Kreps says there is no reason it can't be fixed, but it never has been, and this infuriates me because it happens so often :)

  • pram 15 hours ago

    You can modify a consumer group's offset to any value, JFYI, so you really don't need to purge the topic. You can just start after the bad message.
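
    For example, with the confluent-kafka Python client (the topic, partition and offset values are made up; the kafka-consumer-groups CLI can do the same thing):

      from confluent_kafka import Consumer, TopicPartition

      consumer = Consumer({
          "bootstrap.servers": "localhost:9092",
          "group.id": "my-consumer-group",
          "enable.auto.commit": False,
      })

      # say the poison message sits at offset 41 on partition 0: commit 42 so
      # the group resumes right after it, no purge needed
      consumer.commit(offsets=[TopicPartition("orders", 0, 42)], asynchronous=False)
      consumer.close()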

me551ah a day ago

Imagine if historic humans had decided that only hammers were enough - that there was no need for specialized tools like scissors, chisels, axes, wrenches, shovels, or sickles, and that a hammer and fingers would do.

Use the tool that is appropriate for the job. It is trivial to write code against these tools with LLMs these days, the software is mature enough to rarely cause problems, and tools built for a purpose will always be more performant.

cpursley a day ago

Related: https://www.pgflow.dev

It's built on pgmq and not married to supabase (nearly everything is in the database).

Postgres is enough.

lisbbb 19 hours ago

If you are doing high volume, there is no way a SQL db is going to keep up. I did a lot of work with Kafka, but what we constantly ran into was managing expectations--costs were higher, so the business needed to strongly justify why it needed its big data toy, and joins were much harder, as was data validation in real time. It made for a frustrating experience most of the time--not due to the tech so much as dealing with people who don't understand the costs and benefits.

On the major projects I worked on, we were "instructed" to use Kafka for, I guess, internal political reasons. They already had Hadoop solutions that more or less worked, but the code was written by idiots in "Spark/Scala" (their favorite buzzword to act all high and mighty) and that code had zero tests (it was truly a "test in prod" situation there). The Hadoop system was managed by people who would parcel out compute resources politically, as in, their friends got all they wanted while everyone else got basically none. This was a major S&P company, Fortune 10, and the internal politics were abusive to say the least.

oulipo2 a day ago

I want to rewrite some of my setup, we're doing IoT, and I was planning on

MQTT -> Redpanda (for message logs and replay, etc) -> Postgres/Timescaledb (for data) + S3 (for archive)

(and possibly Flink/RisingWave/Arroyo somewhere in order to do some alerting/incrementally updated materialized views/ etc)

this seems "simple enough" (but I don't have any experience with Redpanda) but is indeed one more moving part compared to MQTT -> Postgres (as a queue) -> Postgres/Timescaledb + S3

Questions:

1. My "fear" would be that if I use the same Postgres for the queue and for my business database, the "message ingestion" part could sometimes block the "business" part (locks, etc.)? Also, if I want to update the schema of my database without "stopping" the inflow of messages, I'm not sure that would be easy?

2. Also, since it would write messages to the queue and then delete them, there would be a lot of GC/vacuuming to do, compared to my business database, which is mostly append-only?

3. And if I split the "Postgres queue" from the "Postgres database" as two different processes, I do have "one less tech to learn", but I still have to get used to pgmq, integrate it, etc. - is that really much easier than adding Redpanda?

4. I guess most Postgres queues are also "simple" and don't provide "fanout" to multiple consumers (e.g. I want to take one of my IoT messages, clean it up, store it in my TimescaleDB, and also archive it to S3, and also run an alert detector on it, etc.) - see the sketch after these questions for what I mean.

What would be the recommendation?
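
(To make question 4 concrete, here is roughly the fanout pattern I mean on plain Postgres - one append-only events table plus an offset row per logical consumer; all names are made up:)

  import psycopg2

  conn = psycopg2.connect("dbname=iot")

  def fetch_batch(consumer, limit=100):
      # each logical consumer tracks its own position, Kafka-consumer-group style
      with conn.cursor() as cur:
          cur.execute(
              "SELECT last_id FROM consumer_offsets WHERE consumer = %s FOR UPDATE",
              (consumer,),
          )
          (last_id,) = cur.fetchone()
          cur.execute(
              "SELECT id, payload FROM events WHERE id > %s ORDER BY id LIMIT %s",
              (last_id, limit),
          )
          rows = cur.fetchall()
          if rows:
              cur.execute(
                  "UPDATE consumer_offsets SET last_id = %s WHERE consumer = %s",
                  (rows[-1][0], consumer),
              )
      conn.commit()
      return rows

  # the same events, read independently by each downstream task:
  # fetch_batch("timescaledb_writer"); fetch_batch("s3_archiver"); fetch_batch("alert_detector")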

justinhj a day ago

As engineers we should try to use the right tool for the job, which means thinking about the development team's strengths and weaknesses as well as the differentiating factors your product should focus on. Often we are working in the cloud, and it's much easier to use a queue or a log database service than to manage a bunch of SQL servers and custom logic. It can be more cost effective too, once you factor in development time and operational costs.

The fact that there is no common library implementing the author's strategy is a good sign that there is not much demand for it.

zer00eyz a day ago

> Should You Use Postgres? Most of the time - yes. You should always default to Postgres until the constraints prove you wrong.

Kafka, GraphQL... These are the two technologies where my first question is always this: does the person who championed/led this project still work here?

The answer is almost always "no, they got a new job after we launched".

Resume Architecture is a real thing. Meanwhile the people left behind have to deal with a monster...

  • bencyoung a day ago

    Kafka is great tech; I'm never sure why people have an issue with it. Would I use it all the time? No, but where it's useful, it's really useful, and it opens up whole patterns that are hard to implement other ways.

    • evantbyrne a day ago

      Managed hosting is expensive, and self-managing Kafka is a job in and of itself. At my last employer they were spending six figures to run three low-volume clusters before I did some work to get them off some enterprise features, which halved the cost, but it was still at least 5x the cost of running a mainstream queue. Don't use Kafka if you just need queuing.

      • CuriouslyC a day ago

        I always push people to start with NATS jetstream unless I 100% know they won't be able to live without Kafka features. It's performant and low ops.

      • bencyoung a day ago

        The cheapest MSK cluster is $100 a month and can easily handle a dev/UAT workload of thousands of messages a second. They go up from there, but we've made a lot of use of these and they are pretty useful.

        • singron a day ago

          I've basically never had a problem with MSK brokers. The issue has usually been "why are we rebalancing?" and "why aren't we consuming?", i.e. client problems.

        • evantbyrne a day ago

          It's not the dev box with zero integrations/storage that's expensive. AWS was quoting us similar numbers for MSK. Part of the issue is that modern kafka has become synonymous with Confluent, and once you buy into those features, it is very difficult to go back. If you're already on AWS and just need queuing, start with SQS.

      • j45 a day ago

        Engaging with difficulty is a form of procrastination, and in some cases a way of avoiding shipping a product.

        Instead of having just one unknown to deal with before launch... let's pick as many new-to-us things as possible - that will surely increase the chances of success.

    • bonesss a day ago

      Kafka also provides early architectural scaffolding for multiple teams to build in parallel with predictable outcomes (in addition to categorical answers to hard, error-prone patterns). It's been adopted in principle by the services on all the major cloud providers, and is offered turn-key by them.

      Personally I'd expect some kind of internal interface to abstract away such an external dependency and to develop reusable components around it, which readily enables having relational data stores mirror the broker's functionality. Handy for testing and some specific local scenarios, and those database-backed stores can easily pull from the main cluster(s) later to mirror data as needed.

  • janwijbrand a day ago

    "resume" as in "resumé" not as in "begin again or continue after a pause or interruption" - it took me longer than I care to admit to get that.

  • Groxx a day ago

    I've never hosted a GraphQL service myself, but I can see plenty of obvious room for problems:

    is there some reason GraphQL gets so much hate? it always feels to me like it's mostly just a normal RPC system, but with some incredibly useful features (pipelining, and it's super easy not to request data you don't need), plus obvious perf issues in code and obvious room for perf abuse, because it's easy to let callers do N+1 nonsense.

    so I can see why it's not popular for public APIs unless you have infinite money - it's relatively wide open for abuse - but private use seems pretty useful, because you can just smack the people abusing it. or is it more due to specific frameworks being frustrating, or stuff like costly parsing and serialization and difficult validation?

    • twodave a day ago

      As someone who works with GraphQL daily, many of the criticisms out there are from before the times of persisted queries, query cost limits, and composite schemas. It’s a very mature and useful technology. I agree with it maybe being less suitable for a public API, but less because of possible abuse and more because simple HTTP is a lot more widely known. It depends on the context, as in all things, of course.

      • Groxx a day ago

        yeah, I took one look at it and said "great, so add some cost tracking and kill requests before they exceed it" because like. obviously. it's similar to exposing a SQL endpoint: you need to build for that up front or the obvious results will happen.

        which I fully understand is more work than the "it's super easy, just X" way it gets presented, but that's always the cost of super flexible things. does graphql (or the ecosystem, since that's part of the daily life of using it) make that substantially worse somehow? because I've dealt with people using protobuf to avoid graphql, then trying to reimplement parts of its features, and the resulting API is always an utter abomination.
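
        for example, a minimal sketch of what that cost tracking could look like as a pre-execution depth limit (assuming graphql-core's parse; the query and the limit are made up):

          from graphql import parse

          def query_depth(node):
              # walk nested selection sets and count how deep they go
              sel = getattr(node, "selection_set", None)
              if sel is None or not sel.selections:
                  return 0
              return 1 + max(query_depth(child) for child in sel.selections)

          doc = parse("{ user { posts { comments { author { name } } } } }")
          depth = max(query_depth(op) for op in doc.definitions)
          if depth > 3:                     # arbitrary limit for the example
              raise ValueError(f"query too deep: {depth}")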

    • marcosdumay a day ago

      Take a look at how to implement access control over GraphQL requests. It's useless for anything that isn't public data (or at least public to your entire network).

      And yes, you don't want to use it for public APIs. But if you have private APIs that are so complex that you need a query language, and you still want to use those over web services, you are very likely doing something really wrong.

      • Groxx a day ago

        I'm honestly not seeing much here that isn't identical to almost all other general purpose RPC systems: https://graphql.org/learn/authorization/

        "check that the user matches the data they're requesting by comparing the context and request field by hand" is ultra common - there are some real benefits to having authorization baked into the language, but it seems very rare in practice (which is part of why it's often flawed, but following the overwhelming standard is hardly graphql's mistake imo). I'd personally think capabilities are a better model for this, but that seems likely pretty easy to chain along via headers?

        • marcosdumay 16 hours ago

          > identical to almost all other general purpose RPC systems

          The problem is that GraphQL doesn't behave like all other general purpose RPC systems. As a rule, authorization does not work on the same abstraction level as GraphQL.

          And that explanation you quoted is disingenuous, because GraphQL middleware and libraries don't usually export places where you can do anything by hand.

  • forgetfulness a day ago

    We're all passing through our jobs; the value of the solutions remains in the hands of the shareholders. If you don't try to squeeze out some long-term value for your resume and long-term employability, you're assuming a significant opportunity cost on their behalf.

    They’ll be fine if you made something that works, even if it was a bit faddish, make sure you take care of yourself along the way (they won’t)

    • candiddevmike a day ago

      Attitudes like this are why management treats developers like children who constantly need to be kept on task, IMO.

      • forgetfulness a day ago

        Software is a line of work that has astounding amounts of autonomy, if you compare it to working in almost anything else.

        My point stands, company loyalty tallies up to very little when you’re looking for your next job; no interviewer will care much to hear of how you stood firm, and ignored the siren song of tech and practices that were more modern than the one you were handed down (the tech and practices they’re hiring for).

        The moment that reverses, I will start advising people not to skill up, as it will look bad in their resumes.

  • darkstar_16 a day ago

    GraphQL, sure, but I'm not sure I'd put Kafka in the same bucket. It is a nice technology that has its uses in cases where postgresql would not work. It is also something a small team should not start with: start with postgres and then move on to something else when the need arises.

  • sitestable a day ago

    The best architecture decision is the one that's still maintainable when the person who championed it leaves. Always pretend the person who maintains a project after you knows where you live and all that.

  • kvdveer a day ago

    To be fair, this is true for all technologically interesting solutions, even when they use postgres. People championing novel solutions typically leave after the window for creativity has closed.

sneilan1 a day ago

I'm starting to like mongodb a lot more given the python library mongomock. I find it wonderful to create tests that run my queries against mongo in code before I deploy them. Yes, mongo has a lot of quirks, and you have to know aws networking to set it up with your vpc so you don't get nailed with egress costs. And the query patterns aren't the same, some queries are harder, and you have to maintain your own schemas. But the ability to test mongo code with mongomock without having to run your own mongo server is SO VALUABLE. And yes, there are edge cases where mongomock doesn't support something, but the library is open source and pretty easy to modify. And it fails loudly, which is super helpful - if something is not supported, you'll know. Maybe you might find a real nasty feature that's hard to implement, but then just use a repository pattern like you would for testing postgres code in your application.

https://github.com/mongomock/mongomock

Extrapolating from my personal usage of this library to others, I'm starting to think that mongodb's 25 billion dollar valuation is partially based on this open source package :)
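
A tiny example of the kind of test I mean (collection and field names are made up; not every operator is supported, but unsupported ones fail loudly):

  import mongomock

  client = mongomock.MongoClient()
  users = client["app"]["users"]
  users.insert_many([
      {"name": "ada", "age": 36},
      {"name": "grace", "age": 45},
  ])

  # aggregation pipelines run in-process, no server needed
  pipeline = [
      {"$match": {"age": {"$gte": 40}}},
      {"$group": {"_id": None, "n": {"$sum": 1}}},
  ]
  assert list(users.aggregate(pipeline)) == [{"_id": None, "n": 1}]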

  • candiddevmike a day ago

    Curious why you think the risk of edge cases from mocking is a worthwhile trade off vs the relatively low complexity of setting up a container to test against?

    • sneilan1 a day ago

      Because I can read the mongomock library and understand exactly what it's doing. And mongo's aggregation pipelines are easier to model than sql queries in code. Sure, it's possible to run into an edge case but for a lot of general queries for filtering & aggregation, it's just fine.

    • sneilan1 21 hours ago

      The other unspoken aspect of this is with agentic coding, the ability to have the ai also test queries quickly is very valuable. In a non-agentic coding setup, mongomock would not be as useful.

  • philipallstar a day ago

    You can also do this with sqlite; running an in-memory sqlite is lightning fast, and I don't think there are any edge cases. Obviously it doesn't work for everything, but when sqlite is possible, it's great!
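
    For example (the schema is just an illustration):

      import sqlite3

      conn = sqlite3.connect(":memory:")    # lives and dies with the test process
      conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
      conn.execute("INSERT INTO users (name) VALUES (?)", ("ada",))
      assert conn.execute("SELECT count(*) FROM users").fetchone()[0] == 1
      conn.close()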

    • sneilan1 a day ago

      True but if you wind up using parts of postgres that aren't supported by sqlite then it's harder to use sqlite. I agree however, if I was able to just use sqlite, I would do that instead. But I'm using a lot of postgres extensions & fields that don't have direct mappings to sqlite.

      Otherwise SQLITE :)

  • j45 a day ago

    That might work for some.

    I prefer not to start with a nosql database and then undertake odysseys to make it into a relational database.

  • pphysch a day ago

    Or just use devcontainers and have an actual Postgres DB to test against? I've even done this on a Chromebook. This is a solved problem.

    • sneilan1 a day ago

      True but then my tests take longer to run. I really like having very fast tests. And then my tests have to make local network calls to a postgres server. I like my tests isolated.

      • pphysch a day ago

        They are isolated - your devcontainer config can live in your source repo. And you're not gonna see significant latency from your loopback interface... If your test suite includes billions of queries, you may want to reassess.
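
        For instance, a minimal pytest fixture against the Postgres a devcontainer exposes (the host and credentials here are made-up placeholders):

          import psycopg2
          import pytest

          @pytest.fixture
          def db():
              conn = psycopg2.connect(
                  host="localhost", port=5432, dbname="test", user="test", password="test"
              )
              try:
                  yield conn
                  conn.rollback()            # discard whatever the test did
              finally:
                  conn.close()

          def test_insert(db):
              with db.cursor() as cur:
                  cur.execute("CREATE TEMP TABLE t (id serial PRIMARY KEY, name text)")
                  cur.execute("INSERT INTO t (name) VALUES (%s)", ("alice",))
                  cur.execute("SELECT count(*) FROM t")
                  assert cur.fetchone()[0] == 1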

        • sneilan1 18 hours ago

          You know what, you have a very good point. I'll give this another shot. Maybe it can be fast enough and I can just isolate the orm queries to some kind of repository pattern so I'm not testing sql queries over and over.