I honestly still don’t know why Firestore has a Reference Type. You can’t do anything with it that you can’t do with a simple string that stores a document ID. I have rewritten this article several times from scratch and reworked the following code throughout the changes of the Firebase SDK since Firebase version 8. Firestore References have no actual use by default.
TL;DR#
Some of these custom functions can automatically inner join document references. Be aware that using them will slow down your front end, and your queries can become costly because every document reference triggers a read of another document. Still, advanced Firebase users, or anyone who wants to stay away from Cloud Functions, could find a use case for these functions in a minimum viable app. The code here could save time.
What is a Reference Type#
A reference type holds a document reference object. This means it stores the path to the document you are referencing. You cannot do inner joins with it, and searching by a reference is no more useful than searching by a stored ID field.
Get all posts created by a user#
If you store the author ID in the createdBy field, you can search for all posts using a specific user ID.
query(
  collection(db, 'posts'),
  where('createdBy', '==', userId)
);
However, like any other field, you can do the same with a reference type.
query(
  collection(db, 'posts'),
  where('authorDoc', '==', doc(db, 'users', userId))
);
The real use case is simply storing a full document reference, with the full document path, in your database. Nevertheless, querying is actually more complex, not simpler.
No Foreign Key Constraints#
If you type in the document path in your Firestore Reference field, the document does not have to exist. There is no constraint to keep this intact. This means documents can be deleted or updated, and keeping things in order is up to you. This is no surprise, as we are dealing with a schemaless NoSQL database after all.
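Because nothing enforces the link, code that follows a reference has to treat a missing target as a normal case. The sketch below is only an illustration: it uses an in-memory Map as a stand-in for Firestore, and fakeDb, resolveRef, and the sample paths are hypothetical names, not part of the Firebase SDK.

```typescript
// Hypothetical in-memory stand-in for a Firestore database, keyed by document path.
const fakeDb = new Map<string, Record<string, unknown>>([
  ['users/alice', { name: 'Alice' }],
  ['posts/p1', { title: 'Hello', authorDoc: 'users/alice' }],
  // 'users/bob' was deleted, but this post still points at it.
  ['posts/p2', { title: 'Orphan', authorDoc: 'users/bob' }],
]);

// Follow a stored path; returns undefined when the target no longer exists.
function resolveRef(path: string): Record<string, unknown> | undefined {
  return fakeDb.get(path);
}

const author1 = resolveRef(fakeDb.get('posts/p1')!.authorDoc as string);
const author2 = resolveRef(fakeDb.get('posts/p2')!.authorDoc as string);

console.log(author1); // the stored user object
console.log(author2); // undefined: nothing prevented the dangling reference
```

With the real SDK, the equivalent check is snapshot.exists(), which the snapToData helper later in this article already performs.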
Inner Joins#
What we really want is to use the Reference Type to handle inner joins like an SQL database does. This is not what you should generally do in Firestore, but I wanted to give you the option.
Inner Join Functions#
Let’s start with some helper functions I reworked and simplified from RxFire.
snapToData#
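import { DocumentData, DocumentSnapshot } from 'firebase/firestore';

export function snapToData<T = DocumentData>(
  snapshot: DocumentSnapshot<T>,
  options: {
    idField?: string,
  } = {}
): T | undefined {
  const data = snapshot.data();
  // missing documents (or non-object data) are returned untouched
  if (!snapshot.exists() || typeof data !== 'object' || data === null) {
    return data;
  }
  // copy the document ID onto the requested key
  if (options.idField) {
    (data[options.idField as keyof T] as string) = snapshot.id;
  }
  return data as T;
}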
This snapToData function could be reused in any Firebase app. We will want to get the Firestore data more often than the Firestore metadata. This will map the document ID to a key and return the data for our app to use.
docData#
This was modified and simplified from RxFire so that we can turn the Firebase onSnapshot function directly into an observable.
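import { DocumentReference, onSnapshot } from 'firebase/firestore';
import { Observable } from 'rxjs';
import { map } from 'rxjs/operators';

export function docData<T = DocumentData>(
  ref: DocumentReference<T>,
  options: {
    idField?: string
  } = {}
): Observable<T> {
  // onSnapshot returns an unsubscribe function, which doubles as the teardown
  return new Observable<DocumentSnapshot<T>>((subscriber) =>
    onSnapshot(ref, subscriber))
    .pipe(map((snap) => snapToData(snap, options)!));
}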
collectionData#
This is the RxFire collection observable version. Notice they both use the snapToData function in their code.
import { Query, QuerySnapshot } from 'firebase/firestore';

export function collectionData<T = DocumentData>(
  query: Query<T>,
  options: {
    idField?: string
  } = {}
): Observable<T[]> {
  // also emits when only snapshot metadata changes
  return new Observable<QuerySnapshot<T>>((subscriber) =>
    onSnapshot(query, { includeMetadataChanges: true }, subscriber))
    .pipe(map((arr) => arr.docs.map((snap) => snapToData(snap, options)!)));
}
getDocData#
My custom promise version mimics the observable version.
import { getDoc } from 'firebase/firestore';

export async function getDocData<T = DocumentData>(
  ref: DocumentReference<T>,
  options: {
    idField?: string
  } = {}
): Promise<T> {
  // one-time read instead of a realtime subscription
  const snap = await getDoc(ref);
  return snapToData(snap, options)!;
}
getDocsData#
Of course, I also had to create a version for collections and queries.
import { getDocs } from 'firebase/firestore';

export async function getDocsData<T = DocumentData>(
  query: Query<T>,
  options: {
    idField?: string
  } = {}
): Promise<T[]> {
  // one-time read of every document matching the query
  const querySnap = await getDocs(query);
  return querySnap.docs.map((snap) => snapToData(snap, options)!);
}
Querying Each Reference#
To query the document references, we first have to figure out which fields are document references. That is really what makes a document reference field better than a document ID string. If a field is a document reference, we query the child document and connect it.
getDocRefs#
export async function getDocRefs<T = DocumentData>(
  ref: DocumentReference<T>,
  options: {
    idField?: string,
    fields?: string[]
  } = {}
): Promise<T> {
  const doc = await getDocData(ref, options);
  // nothing to join if the document does not exist
  if (!doc) {
    return doc;
  }
  // find all document reference fields unless the caller listed them
  const fields = options.fields?.length
    ? options.fields
    : Object.keys(doc as object).filter(
        (k) => doc[k as keyof T] instanceof DocumentReference
      );
  // create a promise for each field and fetch them all at once
  const childData = await Promise.all(
    fields.map((field) =>
      getDocData(doc[field as keyof T] as DocumentReference)
    )
  );
  // replace each reference field with the fetched document data
  fields.forEach((field, i) => {
    (doc[field as keyof T] as DocumentData) = childData[i];
  });
  return doc;
}
In this function, we can set the fields for which nested document references we want to query (inner join) or find all of them using an instance of DocumentReference. Notice we are setting an array of promises and using Promise.all() to fetch them all at once. If we fetch them one at a time, it will be much slower. If you want better error handling, I suggest you use Promise.allSettled() instead.
The usage is just like getDoc, but without the metadata. All document references will automatically be fetched and replaced with the actual document data. So authorDoc would be replaced with the actual author document data.
const post = await getDocRefs(
  doc(db, 'posts', 'AhEld80Vf0FOn2t8MlZG')
);
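Returning to the Promise.allSettled() suggestion above, here is a minimal sketch of what tolerating a failed child fetch could look like. It does not use the Firebase SDK: fakeGetDocData is a hypothetical stand-in for getDocData, joinSettled is an illustrative name, and reference fields are represented as plain path strings.

```typescript
// Hypothetical stand-in for getDocData: resolves with data or rejects for a bad path.
async function fakeGetDocData(path: string): Promise<Record<string, unknown>> {
  if (path.startsWith('users/')) {
    return { path, name: path.split('/')[1] };
  }
  throw new Error(`permission denied: ${path}`);
}

// Replace each listed field with its fetched document, and leave the field
// untouched when that particular fetch fails.
async function joinSettled(
  doc: Record<string, unknown>,
  fields: string[]
): Promise<Record<string, unknown>> {
  const results = await Promise.allSettled(
    fields.map((field) => fakeGetDocData(doc[field] as string))
  );
  const joined = { ...doc };
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      joined[fields[i]] = result.value;
    }
  });
  return joined;
}
```

With Promise.all(), the rejected fetch would reject the whole join. Here the failed field simply keeps its original value, so the caller can decide what to do with it.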
getDocsRefs#
We have to have a query and collection version as well. This works identically, but also has to filter through all documents.
export async function getDocsRefs<T = DocumentData>(
  query: Query<T>,
  options: {
    idField?: string,
    fields?: string[]
  } = {}
): Promise<T[]> {
  const docs = await getDocsData(query, options);
  // nothing to join if the query returned no documents
  if (!docs.length) {
    return docs;
  }
  // find all document reference fields in the first doc
  const fields = options.fields?.length
    ? options.fields
    : Object.keys(docs[0] as object).filter(
        (k) => docs[0][k as keyof T] instanceof DocumentReference
      );
  // create a promise for every reference field of every document
  const promises = [];
  for (const doc of docs) {
    for (const field of fields) {
      promises.push(
        getDocData(
          doc[field as keyof T] as DocumentReference
        )
      );
    }
  }
  // fetch all promises; results keep the same order as the promises
  const childData = await Promise.all(promises);
  // replace each reference field with the fetched document data
  let i = 0;
  for (const doc of docs) {
    for (const field of fields) {
      (doc[field as keyof T] as DocumentData) = childData[i++];
    }
  }
  return docs;
}
The usage is what you would expect, and still very easy.
const posts = await getDocsRefs(
  query(collection(db, 'posts'), limit(2)), {
    idField: 'id'
  }
);
⚠️ Warning
This will, of course, read every document in your query PLUS every document referenced by every document in the query. An inner join does the same thing in SQL, but there you don’t have to worry about per-read pricing or query speed, because the join is done at the database level. That is currently impossible in Firestore.
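To make the warning concrete, here is a rough way to estimate the reads for one joined query. estimateReads is a hypothetical helper, and it assumes every parent document has every reference field populated and nothing is served from cache.

```typescript
// Estimated document reads for one joined query:
// one read per parent document, plus one read per reference field per parent.
function estimateReads(parentDocs: number, refFieldsPerDoc: number): number {
  return parentDocs + parentDocs * refFieldsPerDoc;
}

console.log(estimateReads(2, 1));  // 4
console.log(estimateReads(50, 3)); // 200
```

So the limit(2) example above with a single authorDoc field costs about 4 reads, while 50 posts with three reference fields each cost about 200.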
Realtime Reference Observables#
We also need the observable versions. This can be done with the help of RxJS using combineLatest. We subscribe to all documents at the same time, then populate the data using map back into the fields.
expandDocRefs#
Here is the single document version, which combines and joins all document references into their own subscriptions. Notice this uses docData.
import { combineLatest, of } from 'rxjs';
import { switchMap } from 'rxjs/operators';

export function expandDocRefs<T = DocumentData>(
  obs: Observable<T>,
  fields: string[] = []
): Observable<T> {
  return obs.pipe(
    switchMap((doc) => {
      if (!doc) {
        return of(doc);
      }
      // use the given fields, or find the document reference fields
      const refFields = fields.length
        ? fields
        : Object.keys(doc as object).filter(
            (k) => doc[k as keyof T] instanceof DocumentReference
          );
      // combineLatest([]) completes without emitting, so pass the doc through
      if (!refFields.length) {
        return of(doc);
      }
      // create an observable for each doc reference
      return combineLatest(
        refFields.map((f) =>
          docData<T>(doc[f as keyof T] as DocumentReference<T>)
        )
      ).pipe(
        // replace each field with its inner join
        map((streams) =>
          refFields.reduce(
            (prevFields, field, i) => ({
              ...prevFields,
              [field]: streams[i]
            }),
            doc
          )
        )
      );
    })
  );
}
The usage also needs docData from the parent document.
expandDocRefs(
  docData(doc(db, 'posts', 'YMOaDWjMnWbn2MsLYRNv'), {
    idField: 'id'
  })
);
expandDocsRefs#
We also need a collection query version to query all document references in a set of documents.
export function expandDocsRefs<T = DocumentData>(
  obs: Observable<T[]>,
  fields: string[] = []
): Observable<T[]> {
  return obs.pipe(
    switchMap((col) => {
      // if the query returned no documents
      if (!col.length) {
        return of(col);
      }
      // use the given fields, or find reference fields in the first doc
      const refFields = fields.length
        ? fields
        : Object.keys(col[0] as object).filter(
            (k) => col[0][k as keyof T] instanceof DocumentReference
          );
      // combineLatest([]) completes without emitting, so pass the docs through
      if (!refFields.length) {
        return of(col);
      }
      // go through each document
      return combineLatest(col.map((doc) =>
        // return a nested observable for each doc reference
        refFields.map((f) =>
          docData<T>(
            doc[f as keyof T] as DocumentReference<T>,
            { idField: 'id' }
          )
        )
      ).reduce((acc, val) => {
        // make one array instead of arrays of arrays
        return acc.concat(val);
      }))
        .pipe(
          map((streams) =>
            col.map((_doc, d) =>
              refFields.reduce((prevFields, field, f) => {
                // streams are ordered by document, then by field
                const fetchedData = streams[d * refFields.length + f];
                // keep the reference if the joined document is missing
                if (!fetchedData) {
                  return prevFields;
                }
                // replace field with inner join
                return {
                  ...prevFields,
                  [field]: fetchedData
                };
              }, _doc)
            )
          )
        );
    })
  );
}
This works the exact same way, but it has to loop through each document in the set of documents. Again, this can get expensive and slow down your query.
Should you use this?#
You should not use any of these functions that combine observables or promises except in special circumstances. They are usually slow and can cost you extra reads. If you fetch the same referenced document more than once, you are over-fetching. This is the same issue ORMs run into with the N+1 problem. When you join data this way, you can end up re-fetching documents you already have in memory. Even though you can optimize your queries, you should not use this method in a NoSQL database, especially in Cloud Firestore.
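One way to soften the over-fetching, if you use these functions anyway, is to fetch each distinct reference only once. The following is a minimal sketch under stated assumptions: fakeFetch and fetchUnique are hypothetical names, and documents are identified by plain path strings. With the real SDK, the cache key could be the path property of each DocumentReference.

```typescript
// Hypothetical fetcher that counts how many reads it performs.
let reads = 0;
async function fakeFetch(path: string): Promise<{ path: string }> {
  reads++;
  return { path };
}

// Fetch each distinct path once, then hand the shared result to every position
// in the input that asked for it.
async function fetchUnique(paths: string[]): Promise<{ path: string }[]> {
  const cache = new Map<string, Promise<{ path: string }>>();
  for (const path of paths) {
    if (!cache.has(path)) {
      cache.set(path, fakeFetch(path));
    }
  }
  return Promise.all(paths.map((path) => cache.get(path)!));
}
```

If ten posts share the same author, this turns ten author reads into one. It reduces the cost of the join but does not remove it.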
When is this okay?#
If you’re not good at creating Cloud Functions to aggregate your data, this could be an option for you. It could also be an option if you have a small result set in your query. This wouldn't be terrible if you’re querying a few documents with a few inner joins. This is best when you prefer the simplicity of reading versus dealing with complex aggregations, but it is not the optimized way to fetch data or subscribe to your data changes.
What should I do?#
Generally speaking, you should aggregate your data. This means, for example, you should copy a user document’s important fields onto each post document that the user created. If the user updates their information, it should populate each of their posts automatically. See Firebase Inner Joins.
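The idea of copying user fields onto posts can be sketched with plain objects. The User and Post shapes, createPost, and syncAuthor below are hypothetical names chosen for illustration. In a real app the fan-out step would typically run in a Cloud Function or a batched write rather than in memory.

```typescript
interface User { id: string; displayName: string; }
interface Post { id: string; title: string; author: { id: string; displayName: string }; }

// Copy the fields a post needs from the user at write time.
function createPost(id: string, title: string, user: User): Post {
  return { id, title, author: { id: user.id, displayName: user.displayName } };
}

// When the user changes, fan the update out to every post they created.
function syncAuthor(posts: Post[], user: User): Post[] {
  return posts.map((post) =>
    post.author.id === user.id
      ? { ...post, author: { id: user.id, displayName: user.displayName } }
      : post
  );
}
```

Reading a post now needs no second fetch for the author. The price is the extra write work whenever a user changes.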
These methods can be useful in certain circumstances, but you always need to be aware of your options.
See Gist for full code.