The Ultimate Guide to building an in-house chat platform from scratch

Photo by Daniel Korpai / Unsplash

This is a topic I've wanted to write about for a long time. I've built quite a few chat platforms over the years, and I've faced the build vs. buy debate many times. The problem is that most SaaS chat platforms charge exorbitant fees. Back when I first ran into this problem, SendBird was the market leader and their pricing was "contact us", which typically means $xxxK per year.

What makes up a chat platform?

  • A database for storing your messages
  • A realtime server for sending and receiving messages
  • A frontend UI for interacting with the chat platform

These are the basics. Now, in terms of features, you probably want some (or all) of the following:

  • Typing indicators
  • Presence indicators (user is online/offline)
  • Read receipts
  • Media messages (send voice notes/images/videos/files)
  • 1 on 1 and group chat
  • Push notifications
  • Threaded messaging
  • Message reactions
  • Moderation tools to review and censor/delete messages
  • Censoring tools to hide unwanted words like f***
  • Working on any network, handling of disconnections and reconnections
  • And more. This list is certainly not exhaustive.

The Stack

Realtime messaging servers

The core of a chat platform is the realtime messaging service. It might use WebSockets, polling, Comet, or any other protocol. It should fall back from sockets to polling when sockets are blocked, so your realtime messaging service needs to be adaptive. You have various options here, from SocketIO to SocketCluster, and there are tools like ejabberd as well.

Ejabberd

Ejabberd is a server built on the XMPP protocol that has all the features we listed above, and more, built in. It's open source and written in Erlang, which makes it highly scalable and fault tolerant. WhatsApp was built on it and, from what I've read, still runs a modified version of it.

If you can use Ejabberd, you should. But for most people this will not be an option, because you need to know Erlang should you want to modify anything, and you will need to follow the XMPP protocol, which will slow you down vs. keeping it light. XMPP was made for interoperability between services, which adds some cruft to the interfaces. And to interact with an XMPP service, you need an XMPP client.

SendBird / Applozic / PubNub / etc.

Third party commercial options save you a lot of time, in exchange for money. But if you need some functionality that doesn't exist with your chosen provider, you're SOL. PubNub had some cloud functions that you could use to modify their chat platform but it was dizzying in its complexity.

Applozic, which had been my choice thanks to its rich feature set and reasonable pricing, has shut down. This is another risk you face with commercial options, and Applozic wasn't a small player.

SocketIO

This is what I prefer using. SocketIO has libraries for all the major platforms (web/iOS/Android), and it is very widely used. It also provides the fallbacks, so it'll drop from sockets to polling if needed. There are plugins available that let you scale it horizontally. Let's zoom out for a second to see why that matters.

With one realtime server, when you are chatting with a friend and send a "typing" status to the server, the server will forward this message to your friend. Simple enough. What happens when there are two realtime servers?

Well, your message will be lost. You are connected to one server, your friend is connected to another. These servers aren't connected to each other. Typing messages are not stored in the DB, so the message never reaches your friend.

Luckily, SocketIO gives you what you need to connect multiple servers, so it doesn't matter how many servers you have or which users are connected to which server; it all works as you'd expect. This module is called socket.io-redis (newer releases are published as @socket.io/redis-adapter).
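
To make the two-server problem concrete, here is a minimal in-memory sketch in TypeScript. It is not the Socket.IO API: `InMemoryBus` is a hypothetical stand-in for the Redis pub/sub channel an adapter would use, and `ChatServerNode` is a made-up class representing one realtime server. The point is only to show why a shared channel lets a "typing" event cross from one server to another.

```typescript
// Hypothetical stand-in for Redis pub/sub: every subscriber sees every event.
type BusEvent = { room: string; payload: string; originServerId: string };

class InMemoryBus {
  private subscribers: Array<(e: BusEvent) => void> = [];

  subscribe(handler: (e: BusEvent) => void): void {
    this.subscribers.push(handler);
  }

  publish(e: BusEvent): void {
    for (const handler of this.subscribers) handler(e);
  }
}

// One realtime server. It only knows about clients connected to itself.
class ChatServerNode {
  // room name -> (client id -> inbox of delivered payloads)
  private rooms = new Map<string, Map<string, string[]>>();

  constructor(readonly id: string, private bus?: InMemoryBus) {
    // Deliver events that originated on other servers to our local clients.
    bus?.subscribe((e) => {
      if (e.originServerId !== this.id) this.deliverLocally(e.room, e.payload);
    });
  }

  // Registers a client in a room and returns its inbox so callers can inspect it.
  join(room: string, clientId: string): string[] {
    const inbox: string[] = [];
    if (!this.rooms.has(room)) this.rooms.set(room, new Map());
    this.rooms.get(room)!.set(clientId, inbox);
    return inbox;
  }

  send(room: string, senderId: string, payload: string): void {
    // Local clients get the event directly (the sender is skipped)...
    this.deliverLocally(room, payload, senderId);
    // ...and the bus carries it to every other server, if there is a bus.
    this.bus?.publish({ room, payload, originServerId: this.id });
  }

  private deliverLocally(room: string, payload: string, exceptClientId?: string): void {
    const members = this.rooms.get(room);
    if (!members) return;
    members.forEach((inbox, clientId) => {
      if (clientId !== exceptClientId) inbox.push(payload);
    });
  }
}
```

Two `ChatServerNode` instances created without a bus reproduce the lost message described above; giving both the same `InMemoryBus` makes the event arrive.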

For building an in-house chat platform, you can't go wrong with Socket.IO.

Databases

Any database should do, honestly. The only one I would recommend against is Firebase. I've tried to build chat apps with Firebase, and at times I saw latencies of 10+ seconds for message propagation. This is completely unacceptable for a chat platform. Also, good luck querying that database. If you want NoSQL, use Mongo. If you want SQL, use Postgres. If you are already using a database and you'd like to stick with it, you can do that.

Push Notifications

For push notifications, there are three kinds.

  • iOS
  • Android
  • Web

You will need to deal with FCM and APNs: FCM works for web and Android, and iOS requires APNs. You can use something like OneSignal to manage this for you, or a ready-made module such as node-pushnotifications (a cross-platform push service for Node.js, installed with `npm i node-pushnotifications`). If you can, just outsource this; it's generally quite cheap. You should be fine in either case.

Object Store

If you are going to allow file uploads, you will need an object store for them. There are a few things you need here:

  • S3 or a similar object store for the uploaded files
  • Thumbor or a similar thumbnailing service to generate thumbnails of uploaded images, which saves bandwidth and speeds up image loading
  • Elastic Transcoder or an FFmpeg-based service to transcode audio files if you want to support voice notes. You can transcode on device or on your servers, but you will probably need to transcode to both mp3 and ogg, so you'll likely be doing this server side

Clients

It goes without saying, but you'll need clients that can connect to your realtime messaging servers. Any frontend should be fine, as long as the platform you are using provides libraries for it. Keep in mind that these libraries manage reconnections, retries of failed message deliveries, and so on. You don't want to do this yourself. If you have an iOS app and found a cool service that has all the features you need but doesn't support iOS, you're better off using something else.

Features

Messaging

For the most basic messaging, you will send a message to the realtime messaging server when a user sends a message. The RMS (realtime messaging server) will store it to the DB, then fan it out to all other connected clients.

Typically, an RMS has a concept of "rooms". You can use these to create chat rooms for 1 on 1 and group messaging use cases. Then have users send messages to rooms, so each message is associated with the right chat.

When a client sends a message, it goes to a specific room, say "{userId1}-{userId2}" or "{groupchat123}". Your server receives the message and forwards it to all the connected clients in that room. If you forward it with socket.to(room).emit(...), SocketIO skips the original sender. That exclusion is per socket, though, not per user, so the same user connected from another tab or device still receives the message. Deduplicate by message id on the frontend, just in case.
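
Here is a small sketch of both ideas: a stable room name for 1 on 1 chats and frontend deduplication by message id. The names `directRoomName`, `ChatMessage` and `MessageList` are made up for illustration. Sorting the two ids is my own assumption (so both participants compute the same name regardless of who opens the chat), and it assumes the ids cannot be confused once joined with "-".

```typescript
// Deterministic room name for a 1 on 1 chat. Sorting the ids means
// directRoomName("a", "b") and directRoomName("b", "a") agree.
function directRoomName(userIdA: string, userIdB: string): string {
  return [userIdA, userIdB].sort().join("-");
}

interface ChatMessage {
  id: string; // unique per message, e.g. a UUID created by the sender (assumption)
  room: string;
  senderId: string;
  text: string;
}

// Client-side message store that ignores messages it has already seen.
class MessageList {
  private seenIds = new Set<string>();
  readonly messages: ChatMessage[] = [];

  // Returns true if the message was new and got appended.
  add(message: ChatMessage): boolean {
    if (this.seenIds.has(message.id)) return false;
    this.seenIds.add(message.id);
    this.messages.push(message);
    return true;
  }
}
```

With this in place, the client can call `add` for every incoming event (live delivery, reconnection replay, its own optimistic copy) and render only `messages`.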

Also keep in mind that if a client gets disconnected, any messages sent while they were offline are not delivered to them, so you will want them to request the messages they missed when they reconnect. You can either send the last 50, or have the client send the timestamp or id of the last message it has stored locally and return everything sent after that one.
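
The server-side selection for that catch-up can be sketched as below. This is only an illustration: `StoredMessage`, `messagesToReplay` and the per-room sequence number `seq` are assumptions of mine (a timestamp or message id with a defined order would play the same role), and the array stands in for a database query.

```typescript
interface StoredMessage {
  seq: number; // hypothetical per-room sequence number assigned by the server
  text: string;
}

// Decides what a reconnecting client should receive.
// `roomHistory` is assumed to be sorted by ascending `seq`.
function messagesToReplay(
  roomHistory: StoredMessage[],
  lastSeenSeq: number | undefined,
  fallbackCount = 50,
): StoredMessage[] {
  if (lastSeenSeq === undefined) {
    // The client has nothing stored locally: send the most recent messages.
    return roomHistory.slice(Math.max(0, roomHistory.length - fallbackCount));
  }
  // Otherwise send only what arrived after the client's last known message.
  return roomHistory.filter((m) => m.seq > lastSeenSeq);
}
```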

When a new client connects, they should receive the last X messages, and you need infinite scrolling to load older messages as well. Make sure the scroll position is preserved when older messages are added above the ones already on screen.
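
A minimal sketch of both halves, under my own assumptions: `olderPage` pages backwards using a hypothetical sequence number `seq` as the cursor, and `scrollTopAfterPrepend` computes the new scroll offset from the container's scroll height before and after older messages are inserted at the top.

```typescript
interface HistoryMessage {
  seq: number; // hypothetical per-room sequence number, ascending
  text: string;
}

// Returns up to `pageSize` messages older than `beforeSeq`, oldest first.
// `allMessages` is assumed to be sorted by ascending `seq`.
function olderPage(
  allMessages: HistoryMessage[],
  beforeSeq: number,
  pageSize: number,
): HistoryMessage[] {
  const older = allMessages.filter((m) => m.seq < beforeSeq);
  return older.slice(Math.max(0, older.length - pageSize));
}

// After prepending older messages the content grows by
// (nextScrollHeight - prevScrollHeight). Shifting scrollTop by the same
// amount keeps the message the user was looking at in the same place.
function scrollTopAfterPrepend(
  prevScrollTop: number,
  prevScrollHeight: number,
  nextScrollHeight: number,
): number {
  return prevScrollTop + (nextScrollHeight - prevScrollHeight);
}
```

In a browser, the three inputs would come from the scroll container's `scrollTop` and `scrollHeight`, read before and after rendering the older page.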

Typing Indicators

On keyup/keydown/keypress (pick your poison), with a debounce of a couple of seconds so that you send at most one event per interval, send a message to the connected room just saying "typing". Fan it out / forward it to other connected clients in the same room from the server. On the frontend, give this message a TTL of maybe 1 second more than your debounce interval, so the indicator disappears shortly after the other person stops typing.

This is a very basic approach; you can tweak it as per your requirements.
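
As a rough sketch of that scheme, the two classes below separate the sender side from the receiver side. Both names are hypothetical, and the sender side is written as a rate limit of one event per interval, which is my reading of the "debounce" described above. Time is passed in explicitly so the logic is easy to test; in an app you would pass `Date.now()`.

```typescript
// Sender side: decides whether a keystroke should trigger a "typing" event.
class TypingThrottle {
  private lastSentAt = -Infinity;

  constructor(private intervalMs: number) {}

  // Call on every keystroke; returns true when a "typing" event should be sent.
  shouldSend(nowMs: number): boolean {
    if (nowMs - this.lastSentAt < this.intervalMs) return false;
    this.lastSentAt = nowMs;
    return true;
  }
}

// Receiver side: remembers who is typing and lets entries expire after a TTL.
class TypingIndicator {
  private expiresAt = new Map<string, number>();

  constructor(private ttlMs: number) {}

  onTypingEvent(userId: string, nowMs: number): void {
    this.expiresAt.set(userId, nowMs + this.ttlMs);
  }

  // Users whose last "typing" event is still within the TTL.
  activeTypers(nowMs: number): string[] {
    const active: string[] = [];
    this.expiresAt.forEach((expiry, userId) => {
      if (expiry > nowMs) active.push(userId);
      else this.expiresAt.delete(userId);
    });
    return active;
  }
}
```

With a 2 second interval on the sender and a 3 second TTL on the receiver, the indicator stays visible while keystrokes continue and disappears about 3 seconds after the last event.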

Presence

For tracking presence, ping the server every 30 seconds or so and store the timestamp in a DB. Keep a buffer period of, say, 60 seconds. To handle clients that haven't pinged within those 60 seconds, you can either run a recurring cleanup job on the server that marks them offline, or skip the job and treat a client as online only if its last seen timestamp falls within the last 60 seconds.
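
The second option (deriving online/offline from the last seen timestamp) can be sketched like this. `PresenceTracker` is a hypothetical name, and the in-memory `Map` stands in for whatever database you store the timestamps in.

```typescript
class PresenceTracker {
  // user id -> timestamp (ms) of the most recent ping
  private lastSeen = new Map<string, number>();

  constructor(private onlineWindowMs = 60_000) {}

  // Called whenever a ping arrives from a client.
  recordPing(userId: string, nowMs: number): void {
    this.lastSeen.set(userId, nowMs);
  }

  // A user is online if their last ping falls within the window.
  isOnline(userId: string, nowMs: number): boolean {
    const seenAt = this.lastSeen.get(userId);
    return seenAt !== undefined && nowMs - seenAt <= this.onlineWindowMs;
  }
}
```

Because the window (60 seconds) is twice the ping interval (30 seconds), a single dropped ping does not flip a user to offline.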

Media Messages

Let users upload the kinds of files you want to support. If you want to support voice notes, which I find a pretty important feature, you should use WebRTC on the web and the native counterparts on other platforms. There are libraries that will let you transcode the voice files in the browser. This is important because WebRTC will create HUGE WAV files by default. This is one of the plugins you can use for in-browser transcoding:

GitHub - Kagami/vmsg: Library for creating voice messages

For images, either compress / resize them on device using similar plugins / packages, or use an image proxy like Thumbor that lets you compress and resize images on the fly. Put a CDN in front if you want the images to load quickly. You can show a thumbnail in the chat and open the full res version on click. For videos, just transcode on the server using FFmpeg or Elastic Transcoder, and serve the transcoded version to clients via a CDN. This adds a minor delay that depends on the uploaded file's size and duration, but unless you are able to do the transcoding on-device, I don't know of a better option.

Chat Rooms

Sooner or later, you're going to want chat rooms that allow multiple people to chat at once. Luckily this is a pretty standard feature in every RMS and commercial SDK. All your clients need to know the room names, and you need to put some kind of auth gate on the rooms, so not just anyone can join and listen in on messages.

You will want to give users a way to tell each other apart on the frontend, and some users might have "mod" rights.
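
One way to picture the auth gate and the "mod" role is a membership table that the server consults before letting a socket join a room. This is a sketch under my own assumptions: `RoomAccess` and `RoomRole` are hypothetical names, and the nested `Map` stands in for a table in your database.

```typescript
type RoomRole = "member" | "mod";

class RoomAccess {
  // room name -> (user id -> role in that room)
  private roles = new Map<string, Map<string, RoomRole>>();

  grant(room: string, userId: string, role: RoomRole = "member"): void {
    if (!this.roles.has(room)) this.roles.set(room, new Map());
    this.roles.get(room)!.set(userId, role);
  }

  // Gate to run before letting a socket join a room.
  canJoin(room: string, userId: string): boolean {
    return this.roles.get(room)?.has(userId) === true;
  }

  // Gate for moderator-only actions, such as deleting someone else's message.
  canModerate(room: string, userId: string): boolean {
    return this.roles.get(room)?.get(userId) === "mod";
  }
}
```

The user id passed to these checks should come from an authenticated session on the server, never from a value the client puts in the message payload.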

Moderation

You likely want to censor certain words, and generally be able to hop into any chat room and make sure everyone is being nice. For censoring, you can use several available plugins, or a dictionary of "bad words". This is a sample library:

GitHub - SpoonX/Censoring: Censor or highlight words and other patterns intelligently.

You probably want to track infractions by users so you can act against repeat offenders. You might want to store the original uncensored version of the message for internal purposes.
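
For the dictionary approach, a minimal sketch might look like the following. It does not use the Censoring library above; `censorMessage` and `ModerationResult` are hypothetical names, and the masking style (keep the first letter, star the rest) simply mirrors the f*** example earlier in this post. Real word lists need more care (plurals, deliberate misspellings, other languages).

```typescript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

interface ModerationResult {
  original: string;    // kept for internal review
  censored: string;    // what other users see
  infractions: number; // how many banned words were found
}

function censorMessage(text: string, bannedWords: string[]): ModerationResult {
  let infractions = 0;
  let censored = text;
  for (const word of bannedWords) {
    if (word.length === 0) continue; // ignore empty dictionary entries
    // Whole-word, case-insensitive match.
    const pattern = new RegExp("\\b" + escapeRegExp(word) + "\\b", "gi");
    censored = censored.replace(pattern, (match) => {
      infractions += 1;
      // Keep the first letter and mask the rest.
      return match.charAt(0) + "*".repeat(match.length - 1);
    });
  }
  return { original: text, censored, infractions };
}
```

The `infractions` count can be added to a per-user counter to spot repeat offenders, and `original` is the uncensored copy you may want to store for internal purposes.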

Apart from this, you want to create an admin UI so your moderators can access all chats when required, and edit/delete messages if needed.

Threading and Reactions

This is a matter of database design. For threading, you will want to attach the replies to the original message via a relationship, and then send the original thread start message with the replies, if you want to show the original message. You can also request all replies for a message if you want to show threads separately. In essence, a thread becomes another "chat room". But you don't need to create a separate SocketIO room for it.

For reactions, you can send reaction events to the chat room whenever someone reacts, and then when someone loads the room in the future, send all the reactions to the message along with the message itself.
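
A possible shape for that data, as a sketch: the field names (`parentId`, `reactions`) and the toggle behaviour (reacting twice with the same emoji removes the reaction) are my own assumptions, not a required design. In a SQL database the reactions would more likely live in their own table keyed by message id.

```typescript
interface ThreadedMessage {
  id: string;
  parentId: string | null; // null for top-level messages, else the thread's first message
  text: string;
  reactions: Record<string, string[]>; // emoji -> ids of users who reacted with it
}

// All replies attached to one thread-starting message.
function repliesTo(allMessages: ThreadedMessage[], rootId: string): ThreadedMessage[] {
  return allMessages.filter((m) => m.parentId === rootId);
}

// Adds the user's reaction, or removes it if they had already reacted with that emoji.
function toggleReaction(message: ThreadedMessage, emoji: string, userId: string): void {
  const users = message.reactions[emoji] ?? [];
  const alreadyReacted = users.indexOf(userId) !== -1;
  const next = alreadyReacted ? users.filter((u) => u !== userId) : [...users, userId];
  if (next.length === 0) delete message.reactions[emoji];
  else message.reactions[emoji] = next;
}
```

Since `reactions` travels with the message, a client loading the room later gets the current reaction state without replaying individual reaction events.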

Unreliable networks

SocketIO provides long-polling fallbacks, which should work under nearly any network conditions, though WebSocket is preferable when it is available. Use SocketIO and you shouldn't really have to worry about this. Try to keep your chat servers close to your users, so your messages are delivered quickly. There are comprehensive guides on these issues, both in the official SocketIO docs and from the community, so you're in good hands here.

Scaling

As mentioned above, you can use SocketIO adapters such as this one to scale horizontally:

Redis adapter | Socket.IO

I have supported thousands of concurrent clients on a single DigitalOcean droplet with 1GB of RAM. SocketIO is pretty performant.

Ending Thoughts

I've been meaning to write this post for a long time. At one point I had built so many chat platforms that I even wanted to create a SaaS chat platform. It's been a while though, so if there's something I missed, please reach out to me over email, and I'll be happy to update this post.