Master the Art of Data Streamlining with Pub/Sub Pipelines

Rahul Kumar
Jun 13, 2024
2 min read

Updated: Sep 14, 2024

Data pipeline using pub/sub

Modern applications often need to hand events off for background processing without sacrificing scalability. Heavy or time-consuming jobs are generally better kept separate from the core application, and one way to achieve that is a pub/sub architecture.

The idea here is as follows:

Maintain a queue of messages
A publisher pushes messages to the queue under a topic namespace
A subscriber subscribes to one or more topics, and every subscriber to a topic receives each message published to that topic
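To make the idea concrete, here is a minimal in-memory sketch of the three points above. It is only a toy model, not how any real broker is implemented; the class name `InMemoryBroker` and the topic and subscription names are made up for illustration.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy broker: each subscription keeps its own queue of messages."""

    def __init__(self):
        # topic name -> {subscription name -> queue of pending messages}
        self._topics = defaultdict(dict)

    def subscribe(self, topic, subscription):
        # A subscription only sees messages published after it exists.
        self._topics[topic].setdefault(subscription, deque())

    def publish(self, topic, message):
        # Every subscription on the topic gets its own copy of the message.
        for queue in self._topics[topic].values():
            queue.append(message)

    def receive(self, topic, subscription):
        # Return the oldest pending message, or None if the queue is empty.
        queue = self._topics[topic][subscription]
        return queue.popleft() if queue else None


broker = InMemoryBroker()
broker.subscribe("jobs", "thumbnailer")
broker.subscribe("jobs", "auditor")
broker.publish("jobs", "resize image 42")

first = broker.receive("jobs", "thumbnailer")
second = broker.receive("jobs", "auditor")
print(first, "|", second)
```

Both subscribers receive the same message independently, which is the property that separates a topic from a plain queue.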

Popular services in this space include Apache Kafka, Google Cloud Pub/Sub, and AWS SQS/AppSync. In this article we will demonstrate the approach using Azure’s pub/sub service, Azure Service Bus.

2. Create a Service Bus namespace in the Standard pricing tier (the Basic tier supports only queues, so Standard or higher is needed to create topics)

3. After that, create a topic inside the Service Bus namespace

4. Now all that is left to do is create a subscription for the previously created topic

Optionally, we can set the message lock duration, that is, the maximum time a subscriber can hold a message before its lock expires, to 5 minutes (the maximum Azure allows)
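The topic and subscription can also be created from code instead of the portal. The sketch below assumes the `azure-servicebus` Python package and a connection string with Manage rights; the helper names `subscription_settings` and `provision` are hypothetical, and the namespace itself still has to exist already.

```python
from datetime import timedelta

MAX_LOCK_DURATION = timedelta(minutes=5)  # upper limit mentioned above


def subscription_settings(lock_minutes=5, max_delivery_count=10):
    """Build keyword arguments for the subscription, validating the lock."""
    lock_duration = timedelta(minutes=lock_minutes)
    if not timedelta(0) < lock_duration <= MAX_LOCK_DURATION:
        raise ValueError("lock duration must be above 0 and at most 5 minutes")
    return {"lock_duration": lock_duration, "max_delivery_count": max_delivery_count}


def provision(connection_string, topic_name, subscription_name):
    """Create the topic and its subscription (the namespace must already exist)."""
    # Imported lazily so the helper above works without the SDK installed.
    from azure.servicebus.management import ServiceBusAdministrationClient

    with ServiceBusAdministrationClient.from_connection_string(connection_string) as admin:
        admin.create_topic(topic_name)
        admin.create_subscription(topic_name, subscription_name, **subscription_settings())
```

Calling `provision(connection_string, "jobs", "worker")` would create a topic named `jobs` with one subscription named `worker`; both names are placeholders.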

5. Now we are all set to publish messages to the topic and have a subscriber consume them. To do that, we can use the Azure SDK

6. Obtain the access key from the portal to authenticate to the service
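In the portal the key is usually copied as part of a connection string. A small sketch of loading it from an environment variable rather than hardcoding it is shown below; the variable name `SERVICE_BUS_CONNECTION_STRING`, the helper names, and the example value are assumptions for illustration, and the key in the example is not a real secret.

```python
import os


def parse_connection_string(raw):
    """Split 'Key=Value;Key=Value' pairs into a dict and check required keys."""
    parts = {}
    for pair in raw.strip().strip(";").split(";"):
        key, _, value = pair.partition("=")  # values may contain '=' themselves
        parts[key] = value
    missing = {"Endpoint", "SharedAccessKeyName", "SharedAccessKey"} - parts.keys()
    if missing:
        raise ValueError(f"connection string is missing: {sorted(missing)}")
    return parts


def load_connection_string(env_var="SERVICE_BUS_CONNECTION_STRING"):
    """Read the connection string from the environment instead of the source code."""
    raw = os.environ[env_var]  # raises KeyError if the variable is not set
    parse_connection_string(raw)  # fail early on an obviously malformed value
    return raw


# Placeholder value only, shaped like a Service Bus connection string.
example = (
    "Endpoint=sb://my-namespace.servicebus.windows.net/;"
    "SharedAccessKeyName=RootManageSharedAccessKey;"
    "SharedAccessKey=abc123="
)
print(parse_connection_string(example)["Endpoint"])
```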

7. Create a publisher script which will publish messages to the topic
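A publisher along these lines might look like the sketch below. It assumes the `azure-servicebus` Python package; the JSON job schema and the function names `build_payload` and `publish` are hypothetical and not part of the original setup.

```python
import json


def build_payload(job_id, task, **params):
    """Serialise one background job as a JSON string (hypothetical schema)."""
    return json.dumps({"job_id": job_id, "task": task, "params": params})


def publish(connection_string, topic_name, payloads):
    """Send one Service Bus message per payload to the topic in a single call."""
    # Imported lazily so build_payload can be used without the SDK installed.
    from azure.servicebus import ServiceBusClient, ServiceBusMessage

    with ServiceBusClient.from_connection_string(connection_string) as client:
        with client.get_topic_sender(topic_name=topic_name) as sender:
            sender.send_messages([ServiceBusMessage(body) for body in payloads])


payload = build_payload("job-1", "resize_image", width=800)
print(payload)
```

With a real connection string, `publish(connection_string, "jobs", [payload])` would push the job onto a topic named `jobs` (a placeholder name).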

8. Create a subscriber that keeps listening to the topic and processes new messages as they arrive, so that long-running jobs are handled in the background
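A subscriber could be sketched as follows, again assuming the `azure-servicebus` Python package. The settlement policy in `run_job` (dead-letter malformed JSON, abandon on any other failure) and the 30-minute lock renewal window are illustrative choices, not requirements of the service.

```python
import json


def run_job(body, handler):
    """Run handler on one message body and report how to settle the message."""
    try:
        job = json.loads(body)
    except json.JSONDecodeError:
        return "dead_letter"  # malformed input will never succeed on retry
    try:
        handler(job)
    except Exception:
        return "abandon"  # release the lock so the message is redelivered
    return "complete"


def listen(connection_string, topic_name, subscription_name, handler):
    """Receive messages until interrupted, settling each one after processing."""
    # Imported lazily so run_job can be used without the SDK installed.
    from azure.servicebus import AutoLockRenewer, ServiceBusClient

    renewer = AutoLockRenewer()  # renews locks in the background for long jobs
    try:
        with ServiceBusClient.from_connection_string(connection_string) as client:
            receiver = client.get_subscription_receiver(
                topic_name=topic_name, subscription_name=subscription_name
            )
            with receiver:
                for message in receiver:  # blocks, yielding messages as they arrive
                    # Keep the lock alive for up to 30 minutes (assumed job length).
                    renewer.register(receiver, message, max_lock_renewal_duration=1800)
                    outcome = run_job(str(message), handler)
                    if outcome == "complete":
                        receiver.complete_message(message)
                    elif outcome == "abandon":
                        receiver.abandon_message(message)
                    else:
                        receiver.dead_letter_message(message, reason="invalid payload")
    finally:
        renewer.close()
```

Keeping the settle decision in `run_job` separate from the receive loop makes the processing logic easy to test without a live namespace.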

And voilà, you have successfully created a pub/sub-based data pipeline!

A message whose lock expires before the subscriber settles it is released back to the subscription for redelivery; a subscriber can also move a message to the dead-letter queue explicitly.

Delivery is retried up to the max delivery count configured on the subscription, after which the message is moved to the dead-letter queue.
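The retry behaviour can be illustrated with a small simulation. This is a simplified model of the semantics described above, not the Service Bus implementation; `deliver_with_retries` and `flaky` are hypothetical names.

```python
from collections import deque


def deliver_with_retries(messages, handler, max_delivery_count=10):
    """Simulate redelivery: failed messages return to the queue until the limit."""
    queue = deque((message, 0) for message in messages)  # (body, delivery count)
    completed, dead_letters = [], []
    while queue:
        message, count = queue.popleft()
        count += 1  # every delivery attempt increments the counter
        try:
            handler(message)
            completed.append(message)
        except Exception:
            if count >= max_delivery_count:
                dead_letters.append(message)  # limit reached: park the message
            else:
                queue.append((message, count))  # make it available again
    return completed, dead_letters


def flaky(message):
    # Hypothetical handler that can never process the "bad" message.
    if message == "bad":
        raise ValueError("cannot process")


done, dead = deliver_with_retries(["ok", "bad"], flaky, max_delivery_count=3)
print(done, dead)
```

Messages that end up dead-lettered are not lost; they stay in the dead-letter queue until something inspects or removes them.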

References:

https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-quickstart-topics-subscriptions-portal
