Rahul Kumar

Master the Art of Data Streamlining with Pub/Sub Pipelines

Updated: Sep 14, 2024

Data pipeline using pub/sub

Building a pipeline that sends events to be processed in the background, with scalability in mind, is a key requirement for applications today. To run heavy or time-consuming jobs, it is generally better to separate them from your core application, and one way to achieve this is to use a pub/sub architecture.


The idea here is as follows:

  • Maintain a queue of messages

  • A publisher pushes messages to the queue under a topic namespace

  • A subscriber subscribes to one or more topics, and every subscriber to a topic receives each message published to that topic
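The three points above can be sketched as a toy in-memory broker. This is only an illustration of the pattern, not how Service Bus is implemented; the class name `InMemoryBroker` and its methods are hypothetical.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Dict, Optional


class InMemoryBroker:
    """Toy broker (hypothetical): one queue per (topic, subscription) pair."""

    def __init__(self) -> None:
        # topic name -> subscription name -> pending messages
        self._queues: DefaultDict[str, Dict[str, Deque[str]]] = defaultdict(dict)

    def subscribe(self, topic: str, subscription: str) -> None:
        """Register a subscription; it only sees messages published afterwards."""
        self._queues[topic].setdefault(subscription, deque())

    def publish(self, topic: str, message: str) -> int:
        """Copy the message to every subscription and return the fan-out count."""
        subscriptions = self._queues[topic]
        for queue in subscriptions.values():
            queue.append(message)
        return len(subscriptions)

    def receive(self, topic: str, subscription: str) -> Optional[str]:
        """Pop the oldest pending message for a subscription, or None if empty."""
        queue = self._queues[topic][subscription]
        return queue.popleft() if queue else None
```

Each subscription keeps its own copy of a message, so a slow consumer on one subscription does not hold up the others.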


Popular services in this space include Apache Kafka, Google Cloud Pub/Sub, and AWS SQS/AppSync. In this article we will demonstrate the pattern using Azure's pub/sub service, Service Bus.


1. Log in to the Azure portal and search for Service Bus

Service bus namespace

2. Create a Service Bus namespace under the Standard pricing tier (this tier allows us to create topics; the Basic tier supports only queues)


3. After that, create a topic inside the Service Bus namespace

Topic

4. Now all that is left to do is create a subscription for the previously created topic

Subscription for the topic

Optionally, we can set the message lock duration, which is the maximum amount of time a subscriber can hold a message before the lock on it expires, to 5 minutes (the maximum Azure allows).


5. Now we are all set to publish messages to the topic and have a subscriber process (consume) them. To do that we can use the Azure SDK


6. Obtain the access key from the portal to authenticate to the service

Access key
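The portal exposes the access key as part of a connection string made of `Key=Value` pairs separated by semicolons. As a minimal sketch, the helper below reads that string from an environment variable and checks that the expected parts are present before it is handed to the SDK. The variable name `SERVICE_BUS_CONNECTION_STR` and both function names are assumptions made for this example.

```python
import os
from typing import Dict


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' pairs; values may themselves contain '='."""
    parts: Dict[str, str] = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key] = value
    return parts


def load_connection_string(env_var: str = "SERVICE_BUS_CONNECTION_STR") -> str:
    """Read the connection string from the environment and sanity-check it."""
    conn_str = os.environ.get(env_var, "")
    parts = parse_connection_string(conn_str)
    required = {"Endpoint", "SharedAccessKeyName", "SharedAccessKey"}
    missing = required - parts.keys()
    if missing:
        raise RuntimeError(f"{env_var} is missing: {sorted(missing)}")
    return conn_str
```

Keeping the key in the environment rather than in the script avoids committing it to source control.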

7. Create a publisher script which will publish messages to the topic
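A publisher along these lines might look as follows. This is a sketch that assumes the `azure-servicebus` Python package (version 7 or later) is installed; the JSON message schema and the helper name `build_payload` are hypothetical and not taken from the original post.

```python
import json
from typing import Any, Dict, Iterable


def build_payload(job_id: str, params: Dict[str, Any]) -> str:
    """Serialize a job description as JSON (hypothetical message schema)."""
    return json.dumps({"job_id": job_id, "params": params}, sort_keys=True)


def publish(conn_str: str, topic_name: str, payloads: Iterable[str]) -> None:
    """Send each payload to the topic as a separate Service Bus message."""
    # Imported lazily so build_payload() works without the SDK installed.
    from azure.servicebus import ServiceBusClient, ServiceBusMessage

    with ServiceBusClient.from_connection_string(conn_str) as client:
        with client.get_topic_sender(topic_name=topic_name) as sender:
            for body in payloads:
                sender.send_messages(ServiceBusMessage(body))
```

Calling `publish(conn_str, "my-topic", [build_payload("job-1", {"rows": 1000})])` would enqueue one job, where `conn_str` is the connection string from the portal and `"my-topic"` stands in for the topic created earlier.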



8. Create a subscriber that keeps listening to the topic for new messages and processes them as they arrive. This is where the long-running jobs are executed
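A subscriber could be sketched as below, again assuming the `azure-servicebus` package (version 7 or later). The function names `run_job`, `settle`, and `listen` are hypothetical, and `run_job` is only a stand-in for the real long-running work.

```python
import json
from typing import Any, Callable


def run_job(body: str) -> Any:
    """Placeholder for the long-running work; here it only parses the JSON body."""
    return json.loads(body)


def settle(receiver: Any, message: Any, handler: Callable[[str], Any]) -> bool:
    """Run the handler, then complete on success or abandon on failure."""
    try:
        handler(str(message))  # str() of a received message yields its body
    except Exception:
        receiver.abandon_message(message)  # release the lock so it is redelivered
        return False
    receiver.complete_message(message)  # remove it from the subscription
    return True


def listen(conn_str: str, topic_name: str, subscription_name: str) -> None:
    """Block on the subscription and settle each message as it arrives."""
    # Imported lazily so settle() can be exercised without the SDK installed.
    from azure.servicebus import ServiceBusClient

    with ServiceBusClient.from_connection_string(conn_str) as client:
        receiver = client.get_subscription_receiver(
            topic_name=topic_name, subscription_name=subscription_name
        )
        with receiver:
            for message in receiver:  # yields messages as they arrive
                settle(receiver, message, run_job)
```

Completing a message only after the handler returns means a crash mid-job leaves the message on the subscription to be retried once its lock expires.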



And voilà, you have successfully created a pub/sub-based data pipeline!


Messages that are not settled before their lock expires become available again and are redelivered.
Each message is tried up to the configured max delivery count, after which it is removed from the subscription and moved to the dead-letter queue.
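The retry behaviour can be illustrated with a small simulation that needs no Azure connection. It is a sketch of the idea, not of how Service Bus works internally; the function name `deliver_with_retries` is hypothetical, and the default of 10 mirrors the Service Bus default max delivery count.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def deliver_with_retries(
    messages: List[str],
    handler: Callable[[str], None],
    max_delivery_count: int = 10,
) -> Tuple[List[str], List[str]]:
    """Simulate redelivery and return (completed, dead_lettered) bodies."""
    pending: Deque[Tuple[str, int]] = deque((m, 0) for m in messages)
    completed: List[str] = []
    dead_lettered: List[str] = []
    while pending:
        body, delivery_count = pending.popleft()
        delivery_count += 1  # every delivery attempt is counted
        try:
            handler(body)
            completed.append(body)
        except Exception:
            if delivery_count >= max_delivery_count:
                dead_lettered.append(body)  # give up on this message
            else:
                pending.append((body, delivery_count))  # back in line for a retry
    return completed, dead_lettered
```

A message that always fails ends up in the dead-letter list after `max_delivery_count` attempts, while one that fails only transiently is eventually completed.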
