This post only represents an approach to pull data from Facebook that appears to otherwise be unavailable to the end user. I am not a strong proponent of the “remove Facebook” tribe nor do I really care all that much about my data being shared. If you just want to skip to the photo downloading tool, click here.
Motivation
Facebook provided a way to download all of your user data as part of GDPR. Most of this data is utterly uninteresting to most people; it’s mainly just a history of your activity on the site. For me, I really only cared about one thing: the photos shared on Facebook. I immediately thought the export would contain every photo of me, and was excited to expand my offline photo collection with all of the photos taken of me that I had never had direct access to.
Sadly, when you download all of “your data”, this does not include tagged photos of yourself. Apparently, it’s not your data, since you’re not the primary owner. I had never cared enough before to download them one-at-a-time or write a script to download them, but now I just had to have them, and this little project was born.
Getting the photos
Ok, I thought this was going to be super easy. They must have a developer API that lets an average user make posts / view photos / whatever. They don’t. And it was nowhere near as easy as it should have been. I went through many approaches before I finally landed on something that works.
Plan 0 : GDPR Data Export
- This let me download all of the photos I uploaded to Facebook, and a ton of other stuff. A good start, but I was still missing more than 1000 tagged images. If this is good enough for you, just check out this page.
Plan 1 : Normal API
- Of course, this isn’t available. I took a long look at Facebook for Developers and found nothing resembling an end user API.
Plan 2 : Public scraper scripts
- There are plenty of publicly available tools to get all of the images on a single page. But, probably for performance, Facebook has no single page where you can view all of your photos in full resolution. The closest page I could find was a page that had thumbnails, and you have to infinite scroll to get them. Not good enough.
- There is no consistent URL structure you can follow, as they include the size of the photos in the URL, and this is unknown to you from the thumbnail view. So at one point I collected all of the photo IDs, but still couldn’t generate the pages I needed to visit.
Plan 3 : Headless scraping
- I have some experience using PhantomJS as an automation tool. It seemed like a great, easy way to scrape the photos off the page. However, visiting Facebook with such tools results in a “Browser Not Allowed” page.
Plan 4 : Selenium
- Finally, something that works. Using the thumbnail page I found earlier, I was able to use Selenium + ChromeDriver to write an automation that goes through the thumbnail page, clicks on every photo, downloads it, and saves it to an output directory. I’ve put the code up on my GitHub, if you want to try it for yourself.
Tough cookie that one.
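For a rough idea of what the Plan 4 approach can look like, here is a minimal Python sketch. It is not the code from my repository, just an illustration under a few assumptions: the two CSS selectors are hypothetical placeholders (Facebook's markup changes often), login is done by hand in the opened browser window, and instead of clicking each thumbnail it collects the thumbnail links and visits each photo page in turn. It also assumes the full-size image URL can be fetched directly without the browser's cookies, which may not always hold.

```python
import os
import time
import urllib.request
from urllib.parse import urlparse

# Hypothetical selectors: adjust these to whatever the page markup looks like.
THUMBNAIL_LINK_SELECTOR = "a[href*='/photo']"
FULL_IMAGE_SELECTOR = "img"


def filename_from_url(url, index):
    """Build a numbered local filename, keeping the URL's extension if any."""
    base = os.path.basename(urlparse(url).path)
    ext = os.path.splitext(base)[1] or ".jpg"
    return f"{index:05d}{ext}"


def scroll_to_bottom(driver, pause=2.0, max_rounds=200):
    """Scroll until the page height stops growing (handles infinite scroll)."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the next batch of thumbnails time to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height


def download_tagged_photos(thumbnail_page_url, output_dir):
    """Visit every photo linked from the thumbnail page and save the image."""
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    os.makedirs(output_dir, exist_ok=True)
    driver = webdriver.Chrome()  # a real, visible browser, not a headless one
    try:
        driver.get("https://www.facebook.com/")
        input("Log in in the browser window, then press Enter here... ")
        driver.get(thumbnail_page_url)
        scroll_to_bottom(driver)

        # Collect the unique photo page links behind the thumbnails.
        hrefs = []
        for link in driver.find_elements(By.CSS_SELECTOR, THUMBNAIL_LINK_SELECTOR):
            href = link.get_attribute("href")
            if href and href not in hrefs:
                hrefs.append(href)

        for index, href in enumerate(hrefs):
            driver.get(href)
            WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, FULL_IMAGE_SELECTOR))
            )
            # Assumption: the widest image on the photo page is the photo itself.
            images = driver.find_elements(By.CSS_SELECTOR, FULL_IMAGE_SELECTOR)
            largest = max(
                images, key=lambda img: int(img.get_attribute("naturalWidth") or 0)
            )
            src = largest.get_attribute("src")
            if not src:
                continue
            target = os.path.join(output_dir, filename_from_url(src, index))
            urllib.request.urlretrieve(src, target)
    finally:
        driver.quit()
```

To try it, call `download_tagged_photos(url_of_your_thumbnail_page, "output")` with the URL of the thumbnail page; expect to tweak the selectors and the scroll pause before it works reliably.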
A Cry For Help
Facebook, if you are listening, photos of me are certainly my data. Please make it easy to download them. Most people are not going to be able to write code to get those photos out of your system. Do it for legal reasons, before you get fined by the EU for the 15th time.