{"id":187,"date":"2019-08-03T21:48:53","date_gmt":"2019-08-03T21:48:53","guid":{"rendered":"http:\/\/jwonsever.com\/wp\/?p=187"},"modified":"2019-08-04T00:03:53","modified_gmt":"2019-08-04T00:03:53","slug":"getting-your-tagged-facebook-photos","status":"publish","type":"post","link":"https:\/\/jwonsever.com\/wp\/?p=187","title":{"rendered":"Getting your tagged Facebook photos"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">This post only represents an approach to pull data from Facebook that appears to otherwise be unavailable to the end user.\u00a0 I am not a strong proponent of the \u201cremove Facebook\u201d tribe nor do I really care all that much about my data being shared. If you just want to skip to the photo downloading tool, <\/span><a href=\"https:\/\/github.com\/Jwonsever\/fbPhotos\"><span style=\"font-weight: 400;\">click here.<\/span><\/a><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Motivation<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Facebook provided a way to download all of your user data as part of <a href=\"https:\/\/eugdpr.org\/\">GDPR<\/a>.\u00a0 Most of this data is utterly uninteresting to most people; it\u2019s mainly just a history of your activity on the site.\u00a0 For me, I really only cared about one thing: the photos shared on Facebook. I immediately thought it would contain every photo of me, and was excited to expand my offline photo collection with all of the photos taken of me that I had never had direct access to.\u00a0\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Sadly, when you download all of \u201cyour data\u201d, this does not include tagged photos of yourself.\u00a0 Apparently, it\u2019s not your data, since you\u2019re not the primary owner. 
I had never cared enough before to download them one at a time or write a script to download them, but now I just had to have them, and this little project was born.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Getting the photos<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Ok, I thought this was going to be super easy.\u00a0 They must have a developer API that lets an average user make posts \/ view photos \/ whatever.\u00a0 They don\u2019t. And it was not anywhere near as easy as it should have been. I went through many approaches before I finally landed on something that works.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Plan 0 : <a href=\"https:\/\/eugdpr.org\/\">GDPR<\/a> Data Export.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">This let me download all of the photos I uploaded to Facebook, and a ton of other stuff.\u00a0 A good start, but I was still missing more than 1,000 tagged images. If this is good enough for you, just check out <\/span><a href=\"https:\/\/www.facebook.com\/settings?tab=your_facebook_information\"><span style=\"font-weight: 400;\">this page.<\/span><\/a><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Plan 1 : Normal API<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Of course, this isn\u2019t available.\u00a0 I took a long look at <\/span><a href=\"https:\/\/developers.facebook.com\/products\"><span style=\"font-weight: 400;\">Facebook for Developers<\/span><\/a><span style=\"font-weight: 400;\"> and found nothing resembling an end-user API.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Plan 2 : Public scraper scripts.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">There are <\/span><a 
href=\"https:\/\/www.google.com\/search?q=download+all+images+on+the+page&amp;oq=download+all+images+on+the+page&amp;aqs=chrome..69i57.3334j0j7&amp;sourceid=chrome&amp;ie=UTF-8\"><span style=\"font-weight: 400;\">plenty of publicly available tools<\/span><\/a><span style=\"font-weight: 400;\"> to get all of the images on a single page.\u00a0 But, probably for performance, Facebook has no single page where you can view all of your photos in full resolution.\u00a0 The closest page I could find was a page that had thumbnails, and you have to infinite scroll to get them. Not good enough.<\/span><\/li>\n<\/ul>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">There is no consistent URL structure you can follow, as they include the size of the photos in the URL, and this is unknown to you from the thumbnail view.\u00a0 So at one point I collected all of the photo IDs, but still couldn\u2019t generate the pages I needed to visit.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Plan 3 : Headless scraping.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">I have some experience using <\/span><a href=\"https:\/\/phantomjs.org\/\"><span style=\"font-weight: 400;\">PhantomJS<\/span><\/a><span style=\"font-weight: 400;\"> as an automation tool.\u00a0 It seemed like a great, easy way to scrape the photos off the page.\u00a0 However, visiting Facebook with such tools results in a \u201cBrowser Not Allowed\u201d page.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Plan 4 : Selenium<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><b>Finally, something that works<\/b><span style=\"font-weight: 400;\">.\u00a0 Using the thumbnail page I found earlier, I was able to use <a href=\"https:\/\/www.seleniumhq.org\/\">Selenium<\/a> + <a href=\"https:\/\/chromedriver.chromium.org\/\">ChromeDriver<\/a> to write an automation that goes through the thumbnail page, clicks on every photo, downloads it, and 
saves it to an output directory.\u00a0 I\u2019ve put <\/span><a href=\"https:\/\/github.com\/Jwonsever\/fbPhotos\"><span style=\"font-weight: 400;\">the code<\/span><\/a><span style=\"font-weight: 400;\"> up on my GitHub, if you want to try it for yourself.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Tough cookie, that one.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">A Cry For Help<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Facebook, if you are listening, photos of me are certainly my data.\u00a0 Please make it easy to download them. Most people are not going to be able to write code to get those photos out of your system.\u00a0 Do it for legal reasons, before you get fined by the EU for the 15th time.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post only represents an approach to pull data from Facebook that appears to otherwise be unavailable to the end user.\u00a0 I am not a strong proponent of the \u201cremove Facebook\u201d tribe nor do I really care all that much about my data being shared. 
If you just want to skip to the photo downloading<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-187","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/jwonsever.com\/wp\/index.php?rest_route=\/wp\/v2\/posts\/187","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jwonsever.com\/wp\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jwonsever.com\/wp\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jwonsever.com\/wp\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jwonsever.com\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=187"}],"version-history":[{"count":5,"href":"https:\/\/jwonsever.com\/wp\/index.php?rest_route=\/wp\/v2\/posts\/187\/revisions"}],"predecessor-version":[{"id":192,"href":"https:\/\/jwonsever.com\/wp\/index.php?rest_route=\/wp\/v2\/posts\/187\/revisions\/192"}],"wp:attachment":[{"href":"https:\/\/jwonsever.com\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=187"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jwonsever.com\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=187"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jwonsever.com\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=187"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}