Remove Exif data for images hosted on Cloudflare R2

Back in May 2023, I wanted to reduce the number of VPS and root servers I had. Not only because of the monthly cost but also because everything was so cluttered. I used one of the servers to host image uploads. I uploaded screenshots simply using ShareX on Windows or Dropshare on macOS via SSH/SCP/SFTP.

I have written about my free movie poster hosting on Cloudflare R2. My usage on the movie poster bucket was quite low. As of now, I only have 505 MB of storage usage and a measly three-digit number of Class A/B operations. So there’s still a lot of headroom compared to the 10 GB of free storage + 1M class A and 10M class B operations per month. So why not set up another bucket for my image hosting?

Cloudflare R2 provides an S3-compatible API, so I can use any screenshot/upload client that also supports AWS S3. At that time, I had around 3 GB of screenshots/images uploaded on my server that I wanted to migrate. I planned to sort everything out later (spoiler: I never did). For the migration, I simply SSH’d into my server and uploaded everything via rclone. Afterward, I just had to connect a domain to the bucket, and the job was done.

I hadn’t touched this setup until last week.

Over the past few months, I started using the Dropshare client on my iPhone as well. However, I recently realized a downside: when I take photos, they contain Exif data that includes my GPS location. Dropshare does not remove this data, and perhaps I do not want to share my location with anyone who has a link to my uploads.

Cloudflare Images is one of their services to host images on steroids. They have extra features like filtering or resizing on the fly + caching everything. They provide an extra feature called Cloudflare Polish which is actually a performance feature by reducing file size without losing image quality but it also removes Exif data from images. Unfortunately, this service costs $6/month at a minimum and may scale up depending on usage. Not that $6 is a lot, but it adds up and I wanted to reduce costs overall.

I wanted to stick with R2 for now, so I adopted a similar approach to my movie poster hosting by using a Cloudflare Worker in front of the bucket. This worker could remove the Exif data on the fly. Since this process may take some time and computing power, I only need to execute it once and move the cleaned version to another “folder.”

Using external libraries is not possible in Cloudflare or at least it’s overly complicated so I needed a simple and pure JavaScript solution to strip the Exif data.

I found this solution on Stack Overflow which was basically all I need. It reads the image file as a buffer, looks for the Exif segment and simply removes it. So in the end I came up with this code for the worker:

let cache = caches.default
let image_path = 'cleaned/'

function cleanBuffer(arrayBuffer) {
  let dataView = new DataView(arrayBuffer)
  const exifMarker = 0xffe1
  let offset = 2

  while (offset < dataView.byteLength) {
      if (dataView.getUint16(offset) === exifMarker) {
          const segmentLength = dataView.getUint16(offset + 2, false) + 2
          arrayBuffer = removeSegment(arrayBuffer, offset, segmentLength)
          dataView = new DataView(arrayBuffer)
      } else {
          offset += 2 + dataView.getUint16(offset + 2, false)
      }
  }
  return arrayBuffer
}

function removeSegment(buffer, offset, length) {
  const modifiedBuffer = new Uint8Array(buffer.byteLength - length)
  modifiedBuffer.set(new Uint8Array(buffer.slice(0, offset)), 0)
  modifiedBuffer.set(new Uint8Array(buffer.slice(offset + length)), offset)
  return modifiedBuffer.buffer
}

export default {
  async fetch(request, env) {
    let url = new URL(request.url)
    let key = url.pathname.slice(1)

    if (request.method == 'GET') {
      let cached = await cache.match(request)
      if (cached !== undefined) {
        return cached
      }

      let object = await env.BUCKET.get(image_path+key)
      if (object === null) {
        let realobj = await env.BUCKET.get(key)
        if (realobj === null) {
          return new Response(null, {'status': 404})
        }

        let arrbuf = await realobj.arrayBuffer()
        let cleaned = cleanBuffer(arrbuf)
        await env.BUCKET.put(image_path + key, cleaned)
        object = await env.BUCKET.get(image_path + key)
        await env.BUCKET.delete(key, cleaned)
      }

      let headers = new Headers()
      object.writeHttpMetadata(headers)
      headers.set('etag', object.httpEtag)
      headers.set('cache-control', 'max-age=' + 60*60*24*30)
      let response = new Response(object.body, {headers})
      await cache.put(request, response.clone())
      return response
    }

   return new Response()
  }
}

This could be further improved by restricting the process to .jp(e)g files only, as there’s no Exif data in .png or .gif formats. The process follows these three steps:

  1. Check if the response to the request has already been cached. If so, return the cached response to save on R2 requests. More on caching here.
  2. If the response is not cached, verify if the file exists in the cleaned/ folder, indicating that it has been processed before. If found, return that object.
  3. If neither of the previous steps yields a result, check if the file exists at all. If it does, remove the Exif data, store it in the cleaned folder, cache it, and serve the object.

There are two downsides to this approach. Firstly, there might be a noticeable processing time during the initial load, especially for larger images. While this inconvenience isn’t significant, it’s something worth addressing in the future. Secondly, the workers always run before the cache, which limits the full utilization of Cloudflare’s cache to prevent requests to both the worker and the bucket. However, given the current low usage, this limitation isn’t a major concern.