Keeping filesystem state in sync with a database whose records have one-to-one relationships with the files can be a tricky task. In this article, I will explain how I implemented my solution to this problem.
I have been developing a full-stack application called Mangatsu for storing, managing, and reading various image collections, such as manga. It consists of a backend server application written in Go with secure API access and user control, and a client application written in TypeScript and Next.js. In this article, I will focus on the backend server application.
The name "Mangatsu" is a play on the Japanese words "mangetsu" (満月, full moon) and "manga" (漫画, comic).
Mangatsu scans the directory paths provided by the user for manga, comics, doujinshi, and other collections (referred to as galleries from this point forward) and parses all the information it can from the filenames, included metadata files, and the pages themselves.
Mangatsu accepts two different directory structures:
As shown in the examples, numerous galleries are compressed in formats such as zip/cbz, rar/cbr, and 7zip. Extracting an archive every time a user opens a gallery in the client would add unnecessary server load and latency, so the extracted pages should be cached. While caching the result on the client side is an option (and is used to some extent alongside server caching!), it can be unreliable, doesn't prevent abuse from bad actors, and is impractical for the client (browser, phone app, etc.), especially with applications like Mangatsu where the cached data can be huge.
Implementing server-side file cache
... which is relatively simple.
First, this is how I initialize the cache directories.
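As a rough illustration, cache directory initialization could look like the sketch below. The type `cacheDirs`, the function `initCacheDirs`, and the `cache` and `thumbnails` subdirectory names are assumptions made for this example, not Mangatsu's actual identifiers.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cacheDirs holds the resolved cache locations.
// The field names and subdirectory names are hypothetical.
type cacheDirs struct {
	Galleries  string
	Thumbnails string
}

// initCacheDirs creates the cache directories under dataPath if they are missing.
func initCacheDirs(dataPath string) (cacheDirs, error) {
	dirs := cacheDirs{
		Galleries:  filepath.Join(dataPath, "cache"),
		Thumbnails: filepath.Join(dataPath, "thumbnails"),
	}
	for _, dir := range []string{dirs.Galleries, dirs.Thumbnails} {
		// MkdirAll does nothing and returns nil when the directory already exists.
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return cacheDirs{}, fmt.Errorf("creating %s: %w", dir, err)
		}
	}
	return dirs, nil
}
```

Because `os.MkdirAll` is idempotent, a function like this can safely run on every server start.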
Next, I implement functions to read from the cache and write to it. When writing to the cache, there are two validations before extracting the gallery:
Does the cache directory already exist?
If it does, are there any files (i.e. pages) residing there?
If there are, the existing files are returned as they are, without extracting again.
※ If someone or something has corrupted the files in the cache, the end user would be served the broken files here. I am considering writing additional validation for this case.
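The two validations can be sketched roughly as follows. This is a minimal example, not the project's real code: `cachedPages`, `readOrExtract`, and the `extract` callback (standing in for the actual archive extraction) are hypothetical names.

```go
package main

import (
	"errors"
	"io/fs"
	"os"
	"path/filepath"
)

// cachedPages lists the files directly under dir, in filename order.
// A missing directory is reported as "no pages" rather than as an error.
func cachedPages(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir) // entries come back sorted by filename
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil // validation 1: the cache directory does not exist
	}
	if err != nil {
		return nil, err
	}
	var pages []string
	for _, e := range entries {
		if !e.IsDir() {
			pages = append(pages, filepath.Join(dir, e.Name()))
		}
	}
	return pages, nil
}

// readOrExtract returns the cached pages when there are any. Otherwise it
// calls extract to populate dir and lists the result.
func readOrExtract(dir string, extract func(dst string) error) ([]string, error) {
	pages, err := cachedPages(dir)
	if err != nil {
		return nil, err
	}
	if len(pages) > 0 {
		return pages, nil // validation 2: files already reside in the cache
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return nil, err
	}
	if err := extract(dir); err != nil {
		return nil, err
	}
	return cachedPages(dir)
}
```

Note that an existing but empty directory is treated the same as a missing one, so a failed earlier extraction is retried on the next request.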
The next section will detail how to sync this file cache with the server and the database.
Implementing file locking
What happens if two or more requests try to access a gallery at the same time? Especially in the case of a non-cached gallery, this would cause many unnecessary writes, which in turn could corrupt the cached files themselves and, in the worst case, could be abused to crash the application or even the system.
To prevent this, I decided to use maps of mutexes. Essentially, I maintain a runtime map of all the cached galleries and their last access times, with mutexes to signal when a particular gallery cache is being written to.
The map is initialized and the existing cache is read into it by iterating over and verifying the cache directory:
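A minimal sketch of such an initialization is shown below. It assumes that each cached gallery lives in a subdirectory named after the gallery's UUID; the types `cacheEntry` and `galleryCache` and the functions `isGalleryDirName` and `loadGalleryCache` are hypothetical names for this example.

```go
package main

import (
	"os"
	"regexp"
	"sync"
	"time"
)

// cacheEntry tracks one cached gallery.
type cacheEntry struct {
	mu         sync.Mutex // held while the gallery's files are read or written
	accessedAt time.Time
}

// galleryCache is the runtime map of cached galleries.
type galleryCache struct {
	mu      sync.Mutex // guards the entries map itself
	path    string
	entries map[string]*cacheEntry
}

// uuidPattern matches a textual UUID such as 123e4567-e89b-12d3-a456-426614174000.
var uuidPattern = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// isGalleryDirName reports whether name looks like a gallery UUID.
func isGalleryDirName(name string) bool {
	return uuidPattern.MatchString(name)
}

// loadGalleryCache creates the cache directory if needed and registers every
// UUID-named subdirectory found in it, using the current time as access time.
func loadGalleryCache(path string) (*galleryCache, error) {
	if err := os.MkdirAll(path, 0o755); err != nil {
		return nil, err
	}
	dirEntries, err := os.ReadDir(path)
	if err != nil {
		return nil, err
	}
	c := &galleryCache{path: path, entries: make(map[string]*cacheEntry)}
	now := time.Now()
	for _, e := range dirEntries {
		if !e.IsDir() || !isGalleryDirName(e.Name()) {
			continue // ignore anything that is not a gallery cache directory
		}
		c.entries[e.Name()] = &cacheEntry{accessedAt: now}
	}
	return c, nil
}
```

Using the startup time as the initial access time is a design choice of this sketch: galleries cached before a restart get a full expiry period before they become eligible for pruning.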
Galleries are read with the following functions, and the logic flows as follows:
Check if the gallery exists in the runtime cache map
If not, a new entry will be created.
If it does, only the access time will be updated.
Lock the mutex to block any operations for the gallery.
If the gallery resides in the cache already, only the filepaths will be read and returned. If not, it will be extracted and copied.
Return filepaths and their count.
Release the mutex (notice the defer statement).
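The flow above could be sketched like this. The names are hypothetical, and the `load` callback stands in for the step that either reads the cached file paths or extracts the gallery.

```go
package main

import (
	"sync"
	"time"
)

// cacheEntry tracks one cached gallery.
type cacheEntry struct {
	mu         sync.Mutex // held while the gallery's files are read or written
	accessedAt time.Time
}

// galleryCache is the runtime map of cached galleries.
type galleryCache struct {
	mu      sync.Mutex // guards the entries map and the access times
	entries map[string]*cacheEntry
}

// touch returns the entry for uuid, creating it if it is missing, and
// refreshes its access time.
func (c *galleryCache) touch(uuid string) *cacheEntry {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[uuid]
	if !ok {
		e = &cacheEntry{}
		c.entries[uuid] = e
	}
	e.accessedAt = time.Now()
	return e
}

// read runs load while holding the gallery's own mutex, so only one request
// at a time can read or extract a given gallery. Requests for other
// galleries are not blocked.
func (c *galleryCache) read(uuid string, load func() ([]string, error)) ([]string, int, error) {
	e := c.touch(uuid)
	e.mu.Lock()
	defer e.mu.Unlock() // released when read returns, even on error

	files, err := load()
	if err != nil {
		return nil, 0, err
	}
	return files, len(files), nil
}
```

The map-level mutex is held only briefly inside `touch`, while the per-gallery mutex is held for the whole, possibly slow, extraction. This keeps a long extraction of one gallery from stalling requests for the others.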
Pruning expired cache entries according to their last access timestamps requires the use of mutexes as well. The PruneCache function runs every minute to clean up expired gallery caches.
The PruneCache function iterates through the entire runtime cache map, locks the mutex while checking the access timestamp, and if expired, removes the cache entry from the map and deletes the files.
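A simplified sketch of such a pruning pass follows. The names (`prune`, `pruneExpired`, `runPruner`, `add`, the `ttl` field) are assumptions for this example, and it assumes that entries are keyed by the name of their cache subdirectory. Real code should also validate each directory name before deleting anything.

```go
package main

import (
	"os"
	"path/filepath"
	"sync"
	"time"
)

type cacheEntry struct {
	mu         sync.Mutex // held while the gallery's files are read or written
	accessedAt time.Time
}

type galleryCache struct {
	mu      sync.Mutex // guards the entries map and the access times
	path    string
	ttl     time.Duration // how long an unused gallery stays cached
	entries map[string]*cacheEntry
}

// add registers a gallery as accessed just now.
func (c *galleryCache) add(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[id] = &cacheEntry{accessedAt: time.Now()}
}

// prune deletes every entry last accessed more than ttl before now and
// returns the removed IDs. Entries whose files cannot be deleted stay in
// the map and are retried on the next pass.
func (c *galleryCache) prune(now time.Time) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	var removed []string
	for id, e := range c.entries {
		e.mu.Lock() // wait for any in-progress read or extraction
		if now.Sub(e.accessedAt) > c.ttl {
			if err := os.RemoveAll(filepath.Join(c.path, id)); err == nil {
				delete(c.entries, id) // deleting during range is allowed in Go
				removed = append(removed, id)
			}
		}
		e.mu.Unlock()
	}
	return removed
}

// pruneExpired prunes relative to the current time.
func (c *galleryCache) pruneExpired() []string { return c.prune(time.Now()) }

// runPruner calls pruneExpired once per interval until stop is closed.
func (c *galleryCache) runPruner(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			c.pruneExpired()
		case <-stop:
			return
		}
	}
}
```

Starting it with `go cache.runPruner(time.Minute, stop)` would match the one-minute interval described above. One trade-off of this sketch is that the map-level mutex stays held while waiting for a busy gallery, so a long extraction can delay other requests during a pruning pass.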
To ensure safe file removal, I take care not to remove anything other than a cache directory whose name starts with a UUID. This is verified by checking that the path begins with a valid UUID before deleting.
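One way to express such a guard is sketched below. It is deliberately stricter than a prefix check: the directory name must be a complete UUID, which also rules out path separators and `..`. The function names and the error value are hypothetical.

```go
package main

import (
	"errors"
	"os"
	"path/filepath"
	"regexp"
)

// uuidPattern matches a complete textual UUID and nothing else.
var uuidPattern = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// errNotGalleryDir is returned for names that are not gallery UUIDs.
var errNotGalleryDir = errors.New("not a gallery cache directory")

// galleryCacheDir returns the cache directory for name, or an error if name
// is not a valid UUID. A name that passes cannot contain "/" or "..", so the
// result always stays directly below cacheRoot.
func galleryCacheDir(cacheRoot, name string) (string, error) {
	if !uuidPattern.MatchString(name) {
		return "", errNotGalleryDir
	}
	return filepath.Join(cacheRoot, name), nil
}

// removeGalleryCache deletes one gallery's cache directory after validation.
func removeGalleryCache(cacheRoot, name string) error {
	dir, err := galleryCacheDir(cacheRoot, name)
	if err != nil {
		return err
	}
	// RemoveAll returns nil if the directory does not exist.
	return os.RemoveAll(dir)
}
```

Rejecting an empty name matters in particular: joining the cache root with an empty string yields the root itself, and removing that would wipe the entire cache.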
Closing
In this article, we've explored how to implement a simple file caching mechanism with mutexes in Go for a server. By using runtime maps and mutexes, safe concurrent access to cached galleries can be ensured while efficiently managing the cache. This approach not only reduces server load but also prevents potential issues from simultaneous access and cache corruption.
As the project evolves, I will continue to update this article with new insights and improvements. Thank you for reading, and I hope you found this guide helpful for your own projects!