Friday, December 10, 2021

GIF address retrieved despite Tumblr's cleverly evasive HTML fu



Originally posted at httrack, in reply to someone asking the same question i had, after i found an answer (sort of).

i know one does not need to know why something works to see that it does in fact work, and since it does indeed in fact work, i thought it might help to share with the class…


PROBLEM:


a tumblr image address, such as this one, which i will use as an example:


https://64.media.tumblr.com/tumblr_lyxm3aeV5C1r4ghkoo1_400.gifv


is actually an html page obfuscating where the image is. in the html source, the image tag's src is:


https:\u002F\u002F44.media.tumblr.com\u002Ftumblr_lyxm3aeV5C1r4ghkoo1_540.gifv


which does not display as an image, soon refreshes to a feckless tumblr 404, & probably sets some punishing cookie on you.
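The `\u002F` sequences in that src are JSON/JavaScript unicode escapes for `/`, which is why the address looks mangled in the page source. A minimal Python sketch for turning the escaped string back into a plain URL, assuming the value is a JSON-style string literal (as it appears to be). Note that decoding only gives you a readable URL; per the above, that URL still leads to the html page, not the image.

```python
import json

def unescape_src(escaped: str) -> str:
    """Decode JSON-style \\uXXXX escapes (e.g. \\u002F -> '/') in a copied src value."""
    # Wrapping the raw text in double quotes makes it a JSON string literal,
    # and json.loads decodes the \u002F escapes back into slashes.
    return json.loads('"' + escaped + '"')

# raw string so Python itself does not interpret the backslashes
src = r"https:\u002F\u002F44.media.tumblr.com\u002Ftumblr_lyxm3aeV5C1r4ghkoo1_540.gifv"
print(unescape_src(src))
# -> https://44.media.tumblr.com/tumblr_lyxm3aeV5C1r4ghkoo1_540.gifv
```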




SOLUTION:


here are the steps by which i solved it:



objective: 

retrieve the actual address of (what i thought was) a gif from tumblr with this address:


https://64.media.tumblr.com/tumblr_lyxm3aeV5C1r4ghkoo1_400.gifv



method:


get the image resources used by the browser when displaying the isolated image



browser: 


Waterfox 2020.01 (64 bit), based on Firefox 78 ESR (see https://en.wikipedia.org/wiki/Waterfox#cite_note-22). (i want to say, 'waterfox, bitches' because awesome - do NOT be gaslit in this regard). 



tools: 


No media downloader found it. I used the old version of Chris Pederick's Web Developer toolbar > images > outline image paths. (this extension was the bomb diggity before Mozilla 'fixed' it, as in had it fixed, as one does with dogs and cats. if you cannot find it, get the older version from the internet archive. if you cannot install it, use about:config to set the IA URL root address (w/regex) as an officially trusted source for installing add-ons. if it still will not install, scroll down for old versions, find what you want, right-click the link and 'save as.' if it's still greyed out or you cannot get it, check the page source for the URL & download the file directly via curl/wget; remember to change the filename of the zip to '.xpi', then open it with the browser & install. that should work. you shouldn't have to manually unzip it, place it in your extensions folder & amend ///Library/Application%20Support/Waterfox/Profiles/<your-profile-hash>.default/compatibility.ini yourself to allow it to run.)
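An .xpi is a ZIP archive under a different extension, so the "change the filename to .xpi" step is just a rename. A minimal Python sketch, assuming the add-on archive is already downloaded; the function name and the example filename are hypothetical, and the zip check is only there to catch an html error page saved by mistake.

```python
import zipfile
from pathlib import Path

def zip_to_xpi(path: str) -> Path:
    """Rename a downloaded add-on archive so it ends in .xpi; returns the new path."""
    src = Path(path)
    # An .xpi is a plain ZIP archive, so refuse anything that isn't one
    # (e.g. an html "not found" page saved under a .zip name).
    if not zipfile.is_zipfile(src):
        raise ValueError(f"{src} is not a zip archive")
    # Path.rename returns the new Path (Python 3.8+).
    return src.rename(src.with_suffix(".xpi"))

# usage (hypothetical filename):
# zip_to_xpi("web-developer.zip")   # -> web-developer.xpi
```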


that's when i got the actual image url. Chris Pederick's Web Developer toolbar > images > display image info gave me the real file type (webp), and 'display image paths' gave me:


"https://64.media.tumblr.com/tumblr_lyxm3aeV5C1r4ghkoo1_400.gifv"


wait, what? the address is the same!


i right-clicked on it, copied the image address and pasted that into a new tab...


it was obviously the actual image.  


the same URL returned the html one way and the image the other, yet the two responses are quite different. so what is different?


is it the URL's suffix? is there a hidden .html suffix making it html? that can't be the whole story, since browsers parse html as html whether or not the suffix is appended.


here i get a little confused because when i got the image the url appeared to be the same as before.  
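One plausible explanation, which the steps here do not confirm, is HTTP content negotiation: the server may look at the request's Accept header and send either the html wrapper or the image from the very same URL. A browser navigating to a page asks for text/html first, while an img tag asks for image types. A minimal Python sketch that builds a request asking only for images; the function names and header values are assumptions, and whether tumblr actually honors them is untested here.

```python
import urllib.request

URL = "https://64.media.tumblr.com/tumblr_lyxm3aeV5C1r4ghkoo1_400.gifv"

def image_request(url: str) -> urllib.request.Request:
    """Build a GET request that advertises image types only (assumed header values)."""
    return urllib.request.Request(
        url,
        headers={
            # Ask for images rather than html, roughly what an <img> tag does.
            "Accept": "image/gif,image/*;q=0.8",
        },
    )

def fetch(url: str) -> tuple[str, bytes]:
    """Send the request and return (Content-Type, body). Needs network access."""
    with urllib.request.urlopen(image_request(url), timeout=30) as resp:
        return resp.headers.get("Content-Type", ""), resp.read()

# usage (network required; result not verified):
# ctype, body = fetch(URL)
# print(ctype, len(body))
```

If the guess is right, the Content-Type printed would be an image type rather than text/html; if not, the difference lies somewhere else (cookies, referer, etc.).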


so here's what i know, after tinkering until i could reproduce retrieving the actual gif (which ended up being a gif, not a webp, which is weird):


so, i downloaded the html of the page AS HTML, NOT MHT, with the extension DownThemAll! (also no longer available to firefox, AFAIK); local address: file:///Users/<user>/Downloads/tumblr_lyxm3aeV5C1r4ghkoo1_400.gifv.html


then i opened the file in waterfox, right-clicked the image, copied the image address and opened it in a new tab ('open image in new tab' was not available).
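Pulling the image address out of the saved page can also be done without the browser. A minimal sketch using Python's standard html.parser, assuming the saved .html contains an ordinary img tag with a src attribute; the class and function names are hypothetical and the sample markup is illustrative, not copied from tumblr's page.

```python
from html.parser import HTMLParser

class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)

def img_srcs(html_text: str) -> list[str]:
    parser = ImgSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.srcs

# usage with the saved file (path from above, <user> left as-is):
# with open("/Users/<user>/Downloads/tumblr_lyxm3aeV5C1r4ghkoo1_400.gifv.html", encoding="utf-8") as f:
#     print(img_srcs(f.read()))

sample = '<html><body><img src="https://64.media.tumblr.com/tumblr_lyxm3aeV5C1r4ghkoo1_400.gifv" alt=""></body></html>'
print(img_srcs(sample))
```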


the address was the same:

https://64.media.tumblr.com/tumblr_lyxm3aeV5C1r4ghkoo1_400.gifv

but this time it is the image, not code, and it can be saved. right-click and save; it names itself:


file:///Users/<user>/Downloads/tumblr_lyxm3aeV5C1r4ghkoo1_400.gif
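Since the toolbar reported webp while the saved file came out as a gif, the file's first bytes settle what it really is, whatever its name says. A minimal Python sketch using the standard signatures (GIF87a/GIF89a for GIF, "RIFF" then "WEBP" at offset 8 for WebP, the 8-byte PNG signature, FF D8 FF for JPEG); the function names are hypothetical.

```python
def sniff_image_type(data: bytes) -> str:
    """Identify an image format from its leading 'magic' bytes."""
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    if data[:3] == b"\xff\xd8\xff":
        return "jpeg"
    # Anything else, including a saved html wrapper page, ends up here.
    return "unknown"

def sniff_file(path: str) -> str:
    """Read only the first 12 bytes of a file and sniff them."""
    with open(path, "rb") as f:
        return sniff_image_type(f.read(12))

# usage (path from above, <user> left as-is):
# print(sniff_file("/Users/<user>/Downloads/tumblr_lyxm3aeV5C1r4ghkoo1_400.gif"))
```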





be seeing you.