
Building a Scalable Follower Feed with Firestore
I have written several articles over the years about this subject, changed my thought process, saw other people's ideas, and changed my ideas again. Here I will cover everything you need to know in one article.
What is a Follower Feed?
A follower feed is NOT trivial in noSQL: it shows a list of the latest posts by the users you follow. While a regular feed can show the latest posts by all users, a follower feed shows only the posts by the users you follow.
Schema / Data Model
Whether you're using a GraphDB, noSQL, or SQL, the data model is still generally the same. You're going to have posts, users, and follows.
Posts
id
title
authorId
...
Users
id
username
...
Followers
follower_id
following_id
This translates to noSQL like so:
posts/{postId}
id
title
authorId
createdAt
...
users/{userId}
username
...
followers/{followerId}
following_id
...
Although, as you will see, this will not work from a querying perspective. You could also easily use subcollections instead of root collections for any of the collections, but the result is the same.
Queries
Here are the queries that show what we're trying to achieve:
SQL 1
SELECT * FROM Posts p WHERE authorId IN
(SELECT following FROM Followers WHERE follower = $UID)
ORDER BY createdAt DESC
SQL 2
SELECT * FROM Posts p
JOIN Followers f ON f.following = p.authorId
WHERE f.follower = $UID
ORDER BY createdAt DESC
GraphQL
query {
queryPost(
where: { authorId: { follower: { id: $UID } } },
order: { desc: createdAt }
) {
id
title
createdAt
...
}
}
So we really just end up with a many-to-many relationship like so:
Posts <- Followers -> Users
If we want to translate that to Firestore noSQL, we get something like this:
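// Query 1: find everyone $UID follows
const followersSnap = await db.collection('followers')
  .where('follower', '==', $UID)
  .get();
const following = followersSnap.docs.map((doc) => doc.data().following);
// Query 2: fetch their posts (an 'in' filter accepts at most 30 values)
const postsSnap = await db.collection('posts')
  .where('authorId', 'in', following).orderBy('createdAt', 'desc').get();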
But the in operator accepts at most 30 values, so we are limited to following 30 people, and we are really doing two queries on the frontend instead of a backend join.
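One way to work around the 30-value cap is to split the followed ids into chunks, run one in query per chunk, and merge the results on the client. The sketch below shows only that chunk-and-merge logic. The helper names chunk and mergeLatest, and the assumption that each post carries a numeric createdAt, are mine for illustration; the actual Firestore queries are left out.

```javascript
// Maximum number of values Firestore accepts in a single `in` filter.
const IN_LIMIT = 30;

// Split a list of followed user ids into groups small enough for one `in` query each.
function chunk(ids, size = IN_LIMIT) {
  const out = [];
  for (let i = 0; i < ids.length; i += size) {
    out.push(ids.slice(i, i + size));
  }
  return out;
}

// Merge the per-chunk query results into one feed, newest first.
// Assumes every post has a numeric `createdAt` (for example, epoch milliseconds).
function mergeLatest(resultsPerChunk, limit) {
  return resultsPerChunk
    .flat()
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, limit);
}
```

This removes the hard cap, but it is still the frontend join described above: following 300 people means 10 queries before anything can be sorted.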
Tag Feed
We also run into the problem of other desired feeds, like following tags. We may want one feed with the latest posts from tags we follow, or a feed with the latest posts from users we follow, or BOTH. Then you get into weighted queries. If a post has a tag and a user we follow, should it be more important? As we have seen with other social media, we may want to artificially promote or demote certain types of posts, or we may want to use AI to create more addictive feeds. These advanced feed types are beyond the scope of this post, and are not a good fit for simple aggregations.
Imperfect Versions
So, let's see what we can do if we are not worried about scaling to millions of users.
Version 1 - Frontend Nightmare
Do all query combining, indexing, etc on the frontend. This will cost you a lot and be slow. No thanks.
A sister version is to have a following array in users/{userID}. You can then grab just one document with all the users you follow, then grab their posts on the frontend. Better, but still over-reading.
Version 2 - Build Your Own Feed
This is one of my ideas. Basically, when each user logs in, they update their feed on the spot. They will save the last updated date, and populate their feed in the background. This makes sense to me in certain circumstances, but is still not the best.
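As a rough sketch of the refresh step, assuming the feed state is an object with a lastUpdated timestamp and a posts array, and that the posts by followed users have already been fetched: the name refreshFeed and these shapes are hypothetical, chosen only to make the idea concrete.

```javascript
// feedState: { lastUpdated: number, posts: Array<{ id, createdAt }> }
// candidatePosts: posts by followed users, fetched elsewhere (not shown here).
function refreshFeed(feedState, candidatePosts, now) {
  // Only posts created since the last refresh are new to this feed.
  const fresh = candidatePosts.filter((p) => p.createdAt > feedState.lastUpdated);
  // Combine new and existing entries, newest first.
  const posts = [...fresh, ...feedState.posts].sort(
    (a, b) => b.createdAt - a.createdAt
  );
  // Record when this refresh happened so the next login starts from here.
  return { lastUpdated: now, posts };
}
```

The saved date is what keeps this cheap: each login only has to look at posts newer than lastUpdated rather than rebuilding the whole feed.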
Version 3 - Fireship's Method
This method was quite complex, but is definitely worth understanding.
Basically he has this data model:
followers/{followerID}
recentPosts: [
...5 recent posts here
],
users: [
user_following_ids
]
posts/{postId}
...post content here
users/{userId}
...user content here
With this query:
const followedUsers = await db.collection('followers')
.where('users', 'array-contains', followerID)
.orderBy('lastPost', 'desc')
.limit(10)
.get();
You create a posts aggregation Firestore Trigger Function to aggregate the latest 5 posts in recentPosts.
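The core of such a trigger is small. Here is a minimal sketch of the array update only, not the original implementation: addRecentPost is a hypothetical helper, posts are assumed to have an id and a numeric createdAt, and reading and writing the followers document is left out.

```javascript
// How many posts to keep on the aggregated document (5 in the model above).
const MAX_RECENT = 5;

// Return a new recentPosts array that includes `post`, newest first, capped at `max`.
function addRecentPost(recentPosts, post, max = MAX_RECENT) {
  // Drop any older copy of the same post so an update does not create a duplicate.
  const others = recentPosts.filter((p) => p.id !== post.id);
  return [post, ...others]
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, max);
}
```

Filtering by id first means the same helper can serve both create and update events.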
This works great, but because it uses an array, there is a limit on the number of followers you can have, and there is also a limit on the number of posts you get on the frontend. You still need to sort all the latest posts. This is a great idea, but a hack nonetheless.
It is interesting to note here that he believes mass duplication is unscalable if you do the math, due to the cost of duplicating every post for someone with millions of followers. He is right and wrong here.
Version 4 - Albert's Version
This is the best hacked version I found on Stack Overflow. It basically says to store the posts like so:
users/{userId}
recentPosts: [
...1000 recent posts
],
recentPostsLastUpdatedAt: Date
posts/{postId}
...post content here
following/{followerId}
following: [
...users following
]
You aggregate up to 1000 posts on the user document in this version. Then it tells you to query the users you follow in batches of 10 (Firestore now allows up to 30 values per in clause):
query(usersRef,
where('userID', 'in', [FOLLOWEE_ID_1, FOLLOWEE_ID_2, …]),
where("recentPostsLastUpdatedAt", ">", LAST_QUERIED_AT)
)
Once you get all users, you have all user posts, which is potentially 1 million posts (1000 followed users with 1000 posts each) for the price of 1000 reads. I like the thinking here, but it is still not for me. Again, there is too much frontend sorting, and it is over-complicated when you're just starting out.
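The arithmetic behind that claim can be written out as a small estimate. The function name estimateAggregatedFeed is hypothetical, and the numbers are upper bounds that assume every followed user document is returned and holds a full array of recent posts.

```javascript
// Rough upper bounds for reading a feed from aggregated user documents.
function estimateAggregatedFeed(followeeCount, postsPerUserDoc = 1000, inLimit = 30) {
  return {
    // One query per batch of followed ids.
    queries: Math.ceil(followeeCount / inLimit),
    // At most one user document read per followed user.
    maxDocumentReads: followeeCount,
    // Each user document carries up to `postsPerUserDoc` posts.
    maxPostsAvailable: followeeCount * postsPerUserDoc,
  };
}

console.log(estimateAggregatedFeed(1000));
// → { queries: 34, maxDocumentReads: 1000, maxPostsAvailable: 1000000 }
```

The recentPostsLastUpdatedAt filter in the query above should lower the read count further, since users who have not posted since the last check are not returned at all.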
Version 5 - My Crazy Aggregation Version
So, I came up with a theoretical idea for a scalable version. It uses arrays to save money, but ultimately made no sense. Imagine using a 3 step aggregation to ultimately get a feed collection like this:
feed/{postID}
createdAt
followers: [
...first 1000 followers
]
This gave you a neat query like so:
db.collection('feed')
.where('followers', 'array-contains', userId)
.orderBy('createdAt', 'desc');
But ultimately, it was too unrealistic and unreliable. While arrays save money, they are limited and require more splitting.
Version 6 - Mass Aggregation Fan-out
The biggest problem with mass aggregation is the limits of Firebase Functions; it could time out. However, it can be solved.
Imagine creating an onWrite function for the posts collection. This could trigger a callable function, say populateFollowerFeed(). This function could look like this:
populateFollowerFeed({
data: change.after.data(),
startId: '0x12slsl2sls',
num: 20
});
Yes, you can call a function inside a function. This function would go through a follower collection (either subcollection, or a query within a root collection) to get all the user ids from followed users. It could add the created / updated posts to each user's feed collection.
This function would call itself again with the next startID, until there are no more follower ids. This prevents function timeouts. You should probably have it delete aggregated posts as well.
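To make the paging concrete, here is a minimal sketch that runs against an in-memory stand-in instead of Firestore. The createStore helper and its getFollowerPage and setFeedPost methods are hypothetical, not Firebase APIs, and the recursive call stands in for what would be a fresh function invocation in Cloud Functions. The data, startId, and num parameters follow the call shown above.

```javascript
// In-memory stand-in for Firestore so the sketch runs on its own (an assumption, not a real API).
function createStore(followersByAuthor) {
  return {
    feeds: {}, // userId -> { postId -> post }
    // Return up to `num` follower ids of `authorId` that sort after `startId`.
    getFollowerPage(authorId, startId, num) {
      const ids = [...(followersByAuthor[authorId] || [])].sort();
      const rest = startId == null ? ids : ids.filter((id) => id > startId);
      return rest.slice(0, num);
    },
    // Keyed by post id, so writing the same post twice is harmless (idempotent).
    setFeedPost(userId, post) {
      this.feeds[userId] = this.feeds[userId] || {};
      this.feeds[userId][post.id] = post;
    },
  };
}

// Copy one post into a page of follower feeds, then hand off to the next page.
function populateFollowerFeed({ data, startId = null, num = 20 }, store) {
  const page = store.getFollowerPage(data.authorId, startId, num);
  for (const followerId of page) {
    store.setFeedPost(followerId, data);
  }
  // An empty or short page means there are no more followers to process.
  if (page.length === 0 || page.length < num) return;
  // In Cloud Functions this would be a new invocation; here it is a plain call.
  populateFollowerFeed({ data, startId: page[page.length - 1], num }, store);
}
```

Because each feed entry is keyed by the post id, a retried or repeated page just overwrites the same entries, which is what keeps the fan-out idempotent.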
The beauty of this is that you could have another callable function for populateTagsFeed(). This could be important if you want to mix and match your posts by followed tags as well.
Yes, this gets expensive for writes. However, it is simple, idempotent, and cheap for small to mid-sized databases. I disagree that this is infeasible, as Firestore is specifically built for reads, not writes, and noSQL databases in general expect you to duplicate data at write time so that reads stay simple. If you have 1,000,000 users, the costs should be minor compared to your real needs.
The Firebase Way
Luckily, with the Firebase platform, you don't need all that. You could just use your Cloud Functions Trigger to offload the batching to another Google Cloud service like App Engine, Cloud Run, or Compute Engine, which have longer time limits or none at all.
Ideas from Twitter
Twitter does this fan-out method to Redis: see Design twitter timeline. For users with a very large number of followers, they actually grab the latest posts manually instead of doing a fan-out on everyone, creating a hybrid method. This could be 10 million or 100 million followers, as it is still pretty fast.
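A hybrid like this has two halves: a write-time decision about whether to fan out, and a read-time merge. The sketch below is my own illustration of that split, not Twitter's code. The function names are hypothetical, and the follower threshold is a parameter because the right cutoff is an assumption you would have to tune.

```javascript
// Write path: skip fan-out for authors at or above a follower threshold.
function shouldFanOut(followerCount, threshold) {
  return followerCount < threshold;
}

// Read path: combine the precomputed feed with posts pulled on demand from large accounts.
function buildHybridFeed(precomputedFeed, largeAccountPosts, limit) {
  const byId = new Map();
  for (const post of [...precomputedFeed, ...largeAccountPosts]) {
    byId.set(post.id, post); // de-duplicate in case a post appears in both sources
  }
  return [...byId.values()]
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, limit);
}
```

The trade is one extra read-time query per large account a user follows, in exchange for not writing millions of feed entries every time that account posts.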
Other Databases
Mass Aggregation may be the Firebase way. However, I suspect, considering they recommend Algolia for searching, that the Firebase team would recommend using an external database for your feeds.
However, keep in mind that Algolia and other noSQL databases made for searching cannot do the joins required for a simple follower feed.
My recommendation would be to use RedisGraph. I am a huge fan of it. No fan-out required, and the database itself is a cache. You would still need a posts Trigger to keep it up to date. You can find several cloud hosted versions. It is also scalable, although potentially expensive. However, the speed is probably worth it for you. Another option may be to use BigQuery with a Firebase Trigger.
Outside of these options, you may want to think about another database. However, unless you have millions of users, Firestore should work just fine for your use case with basic aggregation techniques.