6 Jul 2008

Adobe's announcement that Google and Yahoo! will be indexing Flash content at a much deeper level was met with all sorts of reactions last week, ranging from praise from the Flash and Flex communities to utter shock and horror from some HTML fundamentalists expressing fear that the end was nigh.

Well, it appears that Google has already started using this new indexing system, and some Flash developers are not happy with how much it is revealing about their applications.

Peter Elst, a prominent Flash developer (disclaimer: he is also a member of the Flash Pack on Pistach.io, of which I am a partner), just Twittered the following:

oh no, SWF indexing seems to do just as I feared -- already noticed Google was picking up my test Flash SEO swf but its now exposing URL's

He also posted his concerns on his blog, where he quotes Ryan Stewart from Adobe on exactly what gets indexed:

… it will move through the states of your application, get data from the server when your application normally would, and it will capture all of the text and data that you’ve got inside of your Flash-based application.

Peter goes on to state why this could be dangerous:

The concern I have here is that URL requests to the backend will get indexed; those URLs getting exposed in search queries, or spider bots hitting those URLs, could cause issues. It's not like HTML content, where the search engines can ignore form submit URLs; there is no such context in an HTTPService or URLRequest.
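
Peter's point about context is easy to make concrete. The sketch below is a toy link extractor, a hypothetical crawler policy and not a description of how Google's crawler actually works, that follows `<a href>` links but skips `<form action>` targets. It can draw that distinction only because HTML markup labels which URL is which; a URL string inside compiled ActionScript carries no such label.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Toy crawler policy: follow <a href> links, skip <form action> targets."""

    def __init__(self):
        super().__init__()
        self.follow = []   # URLs the toy crawler would fetch with GET
        self.skipped = []  # form submission targets, identified by the markup

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.follow.append(attributes["href"])
        elif tag == "form" and attributes.get("action"):
            # The markup says this URL receives submitted data, so leave it alone.
            self.skipped.append(attributes["action"])


page = """
<a href="about.html">About</a>
<form method="POST" action="myscript.php">
  <input type="text" name="myname"><input type="submit">
</form>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.follow)   # ['about.html']
print(collector.skipped)  # ['myscript.php']
```

A string such as `"myscript.php"` compiled into a SWF arrives with none of that surrounding markup, so a crawler has nothing comparable to base the decision on.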

Do you remember the damage Google Web Accelerator caused when it started deleting data by following badly coded links in web applications? The problem there was web developers using GET requests for non-idempotent operations such as deleting data. It remains to be seen how zealous this indexing is with regard to following data calls from Flash and Flex applications and what side effects, if any, this will have.
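
The standard defence is to keep state-changing operations off GET entirely, so that a crawler or prefetcher that blindly requests a URL cannot change anything. Here is a minimal sketch as a Python WSGI app. It assumes a hypothetical `?action=delete&id=N` endpoint and an in-memory `PHOTOS` dict standing in for a real database; reading the id from the query string even on POST is a simplification.

```python
from urllib.parse import parse_qs

# Hypothetical in-memory store standing in for a real database.
PHOTOS = {"1": "beach.jpg", "2": "party.jpg"}


def app(environ, start_response):
    """Minimal WSGI app: listing works over GET, deleting requires POST."""
    method = environ["REQUEST_METHOD"]
    query = parse_qs(environ.get("QUERY_STRING", ""))
    action = query.get("action", [""])[0]
    photo_id = query.get("id", [""])[0]

    if action == "delete":
        if method != "POST":
            # A crawler following a discovered URL must not be able to delete data.
            start_response("405 Method Not Allowed", [("Allow", "POST")])
            return [b"Use POST to delete.\n"]
        PHOTOS.pop(photo_id, None)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Deleted.\n"]

    # Anything else is a read-only listing, which is safe to serve over GET.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [", ".join(sorted(PHOTOS.values())).encode() + b"\n"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it on port 8000. A bot that finds `delete.php?id=1`-style URLs in a SWF and requests them then gets a 405 instead of deleting a photo.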

Peter also mentions that SWF indexing is not opt-in, but that's a fact of life with search engines in general, not something unique to Flash content. I am sure that this new indexer will follow the instructions in your robots.txt file.
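
To check what a robots.txt actually blocks, Python's standard `urllib.robotparser` can evaluate the rules offline. This sketch assumes a hypothetical site layout with SWFs under `/flash/` and server scripts under `/gateway/`. The original robots.txt convention matches by path prefix, so grouping files by directory keeps the rules simple; wildcard patterns such as `*.swf` are a per-crawler extension that `urllib.robotparser` does not interpret.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: SWFs live in /flash/, data endpoints in /gateway/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /flash/
Disallow: /gateway/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/index.html", "/flash/main.swf", "/gateway/delete.php?id=1"):
    url = "http://www.example.com" + path
    verdict = "crawl" if parser.can_fetch("Googlebot", url) else "blocked"
    print(path, "->", verdict)
```

Keep in mind that robots.txt is itself public, so its `Disallow` lines advertise the very paths they list, and it only restrains well-behaved crawlers. It keeps URLs out of search results; it does not secure them.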

Unfortunately, Flash developers have also been laboring under the false sense of security that, because Flash bytecode is compiled, they can put sensitive information inside their SWF files. I've been warning about the dangers of doing this for years now, but this latest development should hopefully help to educate Flash and Flex developers that anything they put in a SWF should be considered public information. Previously, the only security afforded by the bytecode representation of SWF content was security by obscurity, which we know is not security at all.

To quote Kristof, who posted a comment on Peter's post:

Argh! Google has actually started indexing my Flash Files and is revealing all the URL’s of the pictures in Flash. But also the url’s of the MP3’s I placed in Flash. I was hoping Flash would conceal it - because now, anyone can download our music without paying for it.

I don't know if Google is actually exposing URLs received from data calls or whether the URLs were hard-coded into the SWFs but, if the latter is true, then that information was always available to anyone with a decent SWF decompiler. It was just a little harder to get to.
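
To show how little the compiled format hides, here is a rough, `strings`-style sketch. A SWF starts with a three-byte signature (`FWS` for uncompressed, `CWS` for zlib-compressed), a version byte and a 32-bit file length; in a `CWS` file everything after those 8 bytes is a zlib stream. The sketch inflates that and lists URL-like string constants. The file-extension filter is an assumption for illustration; a real decompiler parses the tag structure instead of scanning raw bytes.

```python
import re
import zlib

# Printable-ASCII runs of 4+ bytes. SWF string constants are NUL-terminated,
# so each one usually surfaces as its own run.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# Assumed filter: absolute http(s) URLs, or paths ending in a few common
# media/data extensions. Widen it to suit the files you care about.
URL_LIKE = re.compile(r"https?://[^\s\"']+|[\w./-]+\.(?:mp3|flv|xml|php|swf)\b")


def swf_body(data: bytes) -> bytes:
    """Return everything after the 8-byte SWF header, inflating 'CWS' files."""
    signature = data[:3]
    if signature == b"FWS":  # uncompressed SWF
        return data[8:]
    if signature == b"CWS":  # zlib-compressed SWF (SWF 6 and later)
        return zlib.decompress(data[8:])
    # LZMA-compressed 'ZWS' files and non-SWF input are out of scope here.
    raise ValueError(f"unsupported SWF signature: {signature!r}")


def extract_urls(data: bytes) -> list:
    """List URL-like string constants found in the raw SWF body."""
    urls = []
    for run in PRINTABLE_RUN.findall(swf_body(data)):
        urls.extend(m.group(0) for m in URL_LIKE.finditer(run.decode("ascii")))
    return urls
```

Pointed at a SWF of your own, for example `extract_urls(open("main.swf", "rb").read())` with `main.swf` as a placeholder name, it lists hard-coded MP3 and XML URLs without any decompiler at all.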

I'd love to hear your thoughts. How is the new SWF indexing feature of Google and Yahoo! impacting your SWF applications? Leave a comment and let me know.

Google and Yahoo’s Flash indexing is revealing… too much?

  1. “I don’t know if Google is actually exposing URLs received from data calls or whether the URLs were hard-coded into the SWFs but, if the latter is true, then that information was always available to anyone with a decent SWF decompiler.”

    …or just about anyone with Firefox and the Firebug extension installed. In the Net tab you can see just about every file the current page requests, so thinking Flash can hide MP3 files (or any kind of data, really) on your server has always been a very, very bad idea.

    Or, if you’re really serious about seeing what requests are being made, you can always rely on tools like Charles or Fiddler to intercept any request to and from a (Flash) site.

    So Google isn’t really suddenly “exposing” all of your dirty little server side requests, it’s just making them easier to get to.

    If you’re really serious about concealing your data or your media, I guess you could use Flash Media Server, Red5 or another socket-based solution to stream your content to your site. But unless you’re using an encryption scheme that can’t be deciphered even by decompiling your SWF, I imagine even that can be cracked.

    So the moral is, I guess: don’t put anything online that you don’t want people to see or download.

    Gilles Vandenoostende
  2. “But also the url’s of the MP3’s I placed in Flash. I was hoping Flash would conceal it – because now, anyone can download our music without paying for it.”

    lol, ever heard of LiveHTTPHeaders?
    https://addons.mozilla.org/en-US/firefox/addon/3829

    Horst
  3. Hi Gilles,

    That’s the same point I’m making :) And I’ve been talking to anyone who would listen for the longest time about not putting sensitive information in their SWF files.

    A few years ago, when the Crazy Frog ringtone became widespread, I was looking at a ringtone site that was done in Flash, and they were basically hard-coding the URLs into their SWF files. Funny! Mind you, from what I understand they were doing literally millions in business.

    @Horst: I still can’t believe how little Flash developers sometimes understand about how the web works. I didn’t like LiveHTTPHeaders too much, but that’s probably because I’m really spoiled by ServiceCapture, Charles, and Firebug :)

    Aral
    And, just to clarify, my concern here is whether or not the indexer is going through URLs in SWF files and making Flash Remoting, loadVariables or similar calls to them.

    Again, even that will not affect a well-coded application.

    Not all applications, unfortunately, are well coded.

    Aral
  5. Flash indexing is LONG overdue. What took them so long I wonder?

    JT
    http://www.FireMe.To/udi

    James Jones
    Just to clarify what I’m talking about: search engines indexing your SWFs is one thing. I think it’s fair to say it has always been relatively easy to decompile files or just use any tool to look at the HTTP traffic.

    The important points to raise are:

    - Any URL your SWF files use that returns text gets crawled and indexed separately, and is linked to directly rather than in the context of the SWF file that uses it.

    Let’s say you have MyGreatApp.swf, which loads in assets.xml. Google picks up assets.xml and links to that XML file in the search results. What the user is presented with is the XML document, not your SWF or the page that embeds your SWF.

    I fail to see how this helps Flash SEO in the least.

    At least the way Google handles things, by indexing it separately, there is no advantage whatsoever; on the contrary, it just pollutes search results with files that are of no relevance to the user.

    - Google follows scripts. In one of my tests I referenced a PHP script in a URLRequest/URLLoader, and the search bot triggered that script, which then sent me an email. Even though I was using URLVariables, no values were sent, i.e. Google doesn’t use dummy data for input text fields and the like.

    OK, saying this will make Flash and Flex developers take care to make their applications secure is one thing, and I applaud that. But let’s not kid ourselves: there are likely hundreds of thousands of SWF files out there that are going to be affected by this.

    Peter Elst
    Really interesting post, Aral. I was thinking about this when I heard the news.

    As well as decompilers and other ‘sniffing’ methods (e.g. ‘netstat’ at the Windows command prompt) for seeing what’s going on in a SWF, there’s always the ‘Activity’ window in Safari, which exposes the URLs of linked media.

    I wonder how Google copes with externally (and securely) linked ‘loadMovie’ SWFs?

    Kosso
    Great article. I like this new feature. The content was open to anyone anyway. Has anyone posted guidelines for making Flash pages search-engine friendly? I mean, are there any means to control what the search engine will see and how the content will be indexed?

    I wonder what the results will look like, because it’s not possible to create a URL that will put the application in a specific, previously indexed state. Are we going to see direct links to resources used in Flash movie clips?

    Andrei
  9. Might be worth quoting Google on how they handle it:

    http://googlewebmastercentral.blogspot.com/2008/06/improved-flash-indexing.html

    “In addition to finding and indexing the textual content in Flash files, we’re also discovering URLs that appear in Flash files, and feeding them into our crawling pipeline”

    The problem here is that there is no form element context like you have in HTML which it can ignore, it just crawls any URL it finds.

    “We currently do not attach content from external resources that are loaded by your Flash files. If your Flash file loads an HTML file, an XML file, another SWF file, etc., Google will separately index that resource, but it will not yet be considered to be part of the content in your Flash file.”

    See my earlier point: assets.xml used in your application.swf gets crawled and indexed as assets.xml with *no reference* to your SWF application. That is insane IMHO. It doesn’t help anyone; it’s like Google acting as one big SWF decompiler and spitting stuff out without any context as to what it relates to.

    Adobe deciding to just turn this on without any warning or guidelines for content authors is what I am really concerned about here.

    Peter Elst
    The URLs used in your Flash app are always exposed anyway; you can’t hide the URL of a remoting call or of the MP3s being loaded. Even Safari has an Activity window by default, which will show you every URL call.

    As for the second part of the problem, Google arbitrarily following those URLs and remoting calls, the backend would need to be seriously poor for that to be an issue.

    Just my 2 cents.

    Alex
    Ah well, this information has always been available via a browser’s cache. For example, type “about:cache” into Firefox’s address bar (assuming you use Firefox) to see virtually all MP3s, FLVs, XML and anything else cached, regardless of whether they are hard-coded or not.

    Lawrence
  12. Anyone putting sensitive information in a file which is available to the public is an idiot, plain and simple. Peter Elst should know better; if he wanted security he should’ve used security, with HTTPS calls and maybe a POST along with the referrer check since any referrer can be spoofed.

    Flash content was never secure. The data stream is available and there are dozens of programs available for all the major OSes to capture those streams. Kristof needs to stop seeing the OH NOES BAD INTARWEBS PYE_RATS and start seeing OH WOW MOAR PEEPULS TO PROMOAT MY MUZIKS! Either that or implement proper security (which will nevertheless be bypassed).

    It’s technology. Nothing’s changed other than the form and format. Developments continue. Either program securely from the ground up or expect that your data will be visible within 10 years. Just above this comment Peter notes that Google does not currently attach external resource content. Why would anyone believe that they wouldn’t do so at some point in the future?

    You can block your swf files in robots.txt or let them be crawled so that they gain more visibility. If you have SOOPER SEEKRIT data like hidden URLs (hah!), encrypt and use only secure connections. But don’t believe for a second that in 20 years SWFs won’t be able to be decompiled and extracted from on the fly.

    ReallyEvilCanine
    I think a lot of people here are missing the point completely: it’s not about secret stuff inside SWFs getting exposed.

    “Peter Elst should know better; if he wanted security he should’ve used security, with HTTPS calls and maybe a POST along with the referrer check since any referrer can be spoofed.”

    I am not talking about ‘security’, I’m talking about URLs getting indexed and linked to from search engines.

    Look at the following in HTML (opening and closing tags obviously left out to avoid formatting issues):

    form method="POST" action="myscript.php"
    input type="text" name="myname"
    input type="submit"

    You will not find Google indexing “myscript.php” and showing that up in the search results.

    Compare this to the following in Flex:

    HTTPService url="myscript.php" method="POST"

    Here the “myscript.php” will show up and the search result won’t link to your SWF that uses the script but directly to that script itself.

    Now, tell me how that isn’t an issue. Google has no way of knowing whether something is a form or something else when data is loaded into a SWF.

    Trying to make out this helps Flash SEO in any way is complete nonsense.

    Peter Elst
  14. BOOHOO

    Anon
  15. ITT: people who don’t know how to use robots.txt

    Evil Al
  16. Idempotent doesn’t mean impotent, deleting data from a GET request is perfectly valid.

    from RFC 2616

    9.1.2 Idempotent Methods

    Methods can also have the property of “idempotence” in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request. The methods GET, HEAD, PUT and DELETE share this property.

    maht
  17. You can find everything you like in my name.

    phet
  18. It’s called encryption… look into it and you won’t have to worry about things.

    pile
  19. Using GETs for non-idempotent requests has been a crucial no-no since HTTP was invented. You’d be surprised how many professionals in this field think that it’s “no big deal” to have a GET that is exposed publicly and causes some permanent action.

    Isaac Z. Schlueter
  20. Maybe you’re unaware of Sothink’s Flash Decompiler, which has been around for years? Flash is far from bytecode: http://www.sothink.com/product/flashdecompiler/

    pmedic
  21. @maht, @Isaac Z. Schlueter

    While GET being idempotent doesn’t mean a request can’t delete data (deleting something twice is the same as deleting something once), the requirement for GET requests to be “safe” does imply that GETs should not destroy data.

    From rfc2616:
    9.1 Safe and Idempotent Methods
    9.1.1 Safe Methods
    [...]
    In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered “safe”. This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.

    pkqk
    I don’t think either Google or Yahoo will stop what they’re doing. Sure, some bad things may come about from it, but they were bad things to begin with. If music was protected only by the fact that its URL was accessible only through a Flash file, then it was never really protected to begin with. “Security through obscurity isn’t.”

    I’m positive there are plenty of Flash apps out there that already do the right thing and are cheering with joy at all the new visitors they are getting because their content is finally being indexed by major search engines. The rest may have a rude awakening about the security of their current model, but that will only improve the overall state of the web and reduce the ignorance about such things. And a quick fix through robots.txt still leaves the security hole open to anyone who can install a Firefox plug-in.

    Shane Conder
    Characterizing those who, for very good reasons, oppose building Flash websites as ‘fundamentalists’ doesn’t help your credibility.

    comment
  24. wah wah wah. Google’s job in the search arena is to expose hard to find webpages, and now, flash files. It’s just a fact of web development that you *do* need to explicitly conceal private data and URLs. Non-flash developers have been dealing with this for ages, and it’s just a matter that Flash developers need to catch up.

    James
  25. Use robots.txt and stop whining!

    dmoz
    Are badly coded Flash sites Google’s fault? Sorry if I sound elitist here, but exposing the URL of the gateway you get your data from really shouldn’t be a problem.

    bjorn
    I wonder what Google and Yahoo would index if you encrypt your .swf file.
    Would it still be possible for them to see links? Even if they are encrypted?
    Surely they cannot decrypt the code and place a correct link?!

    Well, I hope someone tries this out :)
    Maybe I’ll just do it :)

    Rackdoll
    I wonder how this Flash index information will be ranked in a search query. Meta tags were always more relevant than the text in an HTML page, depending on the search. I also wonder about its usefulness: how will this information actually be relevant to the general searches we make? I am a Flash developer, and I never had a problem with Google not displaying my whole content. Meta tags did the job quite well. Most people make generic searches. People who make a very specific search, such as for a long passage of text, know exactly what they are searching for and where to find it; they don’t really need to Google it, and only do so out of laziness instead of browsing through the website. Design and develop well, focus on the user and never mind the bollocks.

    Filipe Abreu
    robots.txt will solve all of your problems. You could also store the path in a variable, or take it further and break up the path and concatenate it, like they do in ad JavaScript. For example: var url:String = 'my' + 'script' + '.' + 'php';

    Compiled Flash is far from secure. There are many decompilers out there, and links to external MP3s can be found with a simple HTTP sniffer. If you are linking to a page, check for the existence of a session cookie via JavaScript before linking (a simple ExternalInterface call); search engines still can’t run JavaScript (not sure about shared objects).

    Scott
    Well said. I read the first two paragraphs and was all geared up to write a comment saying “this is happening because hiding ‘secret’ URLs in your Flash files is wrong!” (which everyone has known for ages; I wrote about it in 2002* and I’m not even a Flash guy!)…and then read on to see you making precisely that point. I should have known better. Sorry to doubt you, Aral :)

    sil

    * http://www.kryogenix.org/writings/tech/lame-flash-scores

    Stuart Langridge
    From the search engine optimization side of things, I can tell you that a lot of clients we work with are waiting with bated breath for this to be the answer to all their Flash-indexing problems.

    From what I’m reading (but have yet to see in search results) the landing page experience for a Google user will not be too hot. Getting dropped into a text-pull from inside a Flash animation doesn’t do anyone any good. I’d much rather see Google do this only for Flash that allows parameter passing, so that they could deliver you to the Flash, with the correct state.

    @Filipe: People do conduct generic (unbranded) searches, but meta tags rarely do the trick on their own. It’s not that meta tags are somehow lacking – it’s just that there’s a lot of competition out there for most keyphrases. If you don’t have indexed page copy, it’s highly unlikely that you’ll rank for a high-volume keyphrase. That’s why this could (potentially) be a big deal (if they get the landing page experience right.)

    Sherwood
    Funny thing, but it seems that Google got confused when crawling my Flash site.

    About 4 weeks ago I created a test website to see exactly how Google indexes a Flash site with content coming from different sources:
    http://seo.matheusgorino.com/
    Four weeks later, the only thing Google has indexed is the initial string of my preloader TextField, "0":
    http://www.google.com/search?q=site%3Aseo.matheusgorino.com

    I’m sure Googlebot has gone through my website a lot of times, because I get emailed every time it does. Also, it has correctly crawled another test version with alternative content that I created on the same day as the simple version:
    http://www.google.com/search?q=site%3Aseofull.matheusgorino.com

    Then today I accessed Google Webmaster Tools and saw that Google is making a mistake when accessing my main.swf. It is trying to access it at http://seo.matheusgorino.com/http://seo.matheusgorino.com/main.swf, which is obviously wrong.
    My test website has main_loader.swf embedded in the HTML, which preloads main.swf (my main application file) and then adds it to the stage.
    As you can see at http://matheusgorino.com/MainLoader.as, the only thing I’m doing is an innocent new URLRequest("main.swf"), nothing special.

    The Google Webmaster Tools report screen is at:
    http://matheusgorino.com/crawl.gif

    Weird.

    Matheus
  33. It a nice site collecting all info about shopping goods.
    I need this info because i want to buy some home ware goods.
    Thanks

    Harry
  34. hey.. my name is also aral… i have never actually know the meaning… if u do i would be grateful if u could mail it to me.. thanks… aral lobo from india aralalobo@gmail.com

    Aral Lobo
    After looking at the results of Google’s efforts on some of the sites I’ve developed over the last couple of months, I really think they are missing the boat here. I’ve begun to use robots.txt, as some of the others here have mentioned, to politely ask Google et al. not to spider my Flash files.

    I for one honestly don’t believe that Google or any of the search engines should even bother trying to index SWF files. The answer to searchability for Flash (or at least for sites made entirely of Flash) has been around for a while; it’s just poorly executed for the most part. That answer is to deliver all of the content that Flash receives simultaneously as HTML to the parent container. A combination of SWFObject and something like SWFAddress or StateManager can then be used to control the interaction between the Flash Player and the browser. Yes, it’s more work, but in the long run it’s the only method that truly works. I’ve done it successfully and currently have a client enjoying very good rankings with a fully indexed, 100% Flash, CMS-driven website.

    I guess in the end my suggestion to Google would be to engage the community before deciding what the best approach to indexing Flash might be. Don’t just pass off a special version of the Flash Player to the search engines and let the experimentation begin while forcing developers to play catch-up without being able to fully understand what’s going on.

    Cory Tomlinson
    The article and various comments were interesting. It is a shame that Google does not provide a choice of indexing or not indexing Flash content. The problem is not so much hiding secret content. The problem is that Google is providing direct access to the Flash movies, thus overriding any JavaScript, counters, page alignment for centering Flash movies, etc. As Google now provides direct access to SWFs, overriding the HTML, we will no longer need to waste time on HTML compliance, nor will Flash developers need Microsoft. I wonder to what extent Google can legally provide a direct link to what a programmer may want to hide, as he may want to force the user to enter through the home page first.
    The question is: is it right for Google to allow users to bypass a developer’s home page when the site contains only the index page and a Flash movie?
    Robert.

    Robert Farley
    For those who want to hide their SWFs, thus forcing users to enter through the index page, I have posted a response on my website at
    http://www.farley-webdesign.com/flash_indexing.html which reads as follows:
    FLASH INDEXING

    Following the recent indexing of Flash movies by Google, we now find that Google provides direct access to the SWFs even when the developer may want users to enter through a home page such as index.html.

    The advantage of entering through the index page is that it allows for centering, counters and other JavaScript. If the user goes directly to the SWF, the presentation may not be what we intended, since all the HTML and JavaScript is bypassed.

    To overcome this problem I have done the following:

    1 – index.html is indexed, providing Google with the keywords we choose, but marked nofollow so that the Google robot will not look any further than the index page.
    2 – Insert JavaScript to redirect the page to what is now main.html.
    3 – main.html carries the robots meta tag with noindex, nofollow.
    4 – Embed the SWF in this new main.html.
    5 – For existing websites, it is better to rename the SWF to mySwfNew.swf.
    6 – Create a new SWF with the old SWF’s name, containing only one line of ActionScript that redirects to main.html. That way, a user who finds a link to your old SWF in Google will be redirected to your new main.html.
    7 – All other SWFs should be embedded in HTML pages marked noindex and nofollow.

    This seems to work perfectly for those who do not want their SWFs indexed. I hope this helps others.
    Robert

    Robert Farley
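
Cory Tomlinson's approach in comment 35, serving the content the SWF displays as ordinary HTML in the page that embeds it, can be sketched server-side as below. It assumes SWFObject 2's `swfobject.embedSWF(swfUrl, elementId, width, height, version)` call, a hypothetical `ARTICLES` list standing in for whatever data source the SWF loads, and a hypothetical 800 by 600 movie; deep linking with SWFAddress is left out.

```python
from html import escape

# Hypothetical content: the same entries the SWF would load at runtime.
ARTICLES = [
    {"title": "Spring collection", "body": "New prints & colours for 2008."},
    {"title": "Contact", "body": "Visit our studio."},
]


def render_page(swf_url: str, articles: list) -> str:
    """Build an HTML page whose alternative content mirrors the SWF's content.

    Crawlers and visitors without Flash read the HTML inside #flashContent;
    SWFObject replaces that div with the movie when a suitable player exists.
    swf_url is assumed to be a trusted constant, so it is not escaped for JS.
    """
    sections = "\n".join(
        f"    <h2>{escape(a['title'])}</h2>\n    <p>{escape(a['body'])}</p>"
        for a in articles
    )
    return f"""<html>
<head>
  <script src="swfobject.js"></script>
  <script>
    swfobject.embedSWF("{swf_url}", "flashContent", "800", "600", "9.0.0");
  </script>
</head>
<body>
  <div id="flashContent">
{sections}
  </div>
</body>
</html>"""


print(render_page("main.swf", ARTICLES))
```

Because search engines index the HTML page rather than the raw SWF, a visitor arriving from a search result lands on the page that embeds the movie, which is the outcome Robert's redirect steps are working towards.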