Content sniffing and JSON

I've recently had the misfortune of having to work out how to fix an XSS flaw in an existing application. The flaw in question was related to the ability to ask for responses of a specific Content-Type when making requests to a JSON-based API. This opened up a really obvious, with hindsight, attack vector, and caused me to spend several days in the hateful world of browser content type sniffing.

Setting the scene

So, first off, why did the implementer (actually, err, I think it was also me) think that the api client needed the ability to specify the response content type? The api includes a method which takes a multipart form to allow file uploading. This allows a simple browser-based api client to perform file uploads using a form that posts to an iframe. The iframe can be hidden from the user, and the client JavaScript can read the returned json out of the iframe as long as it respects the same-origin policy. This is quite lame, but appears to provide the desired functionality. Unfortunately it doesn't work in IE and some Firefox versions: they trigger a file download dialog, as they don't recognise application/json as a content type they can render. This led to the "great" idea of adding the ability to ask for responses as another mime type, and, somewhat unfortunately, the choice of "text/html" as the favoured alternative mime type.

The big obvious problem: if an attacker can control the content of a response from the api, they can cause a victim's browser to perform a request which will have HTML in the response, and, using the mis-feature, a text/html response content type. The user's browser will happily parse the HTML, and it's trivial (with a little work to avoid escaping of quotes and slashes) to cause the execution of arbitrary JavaScript. Yay! XSS!

The failure of text/plain

So, fixing this is trivial, right? Why not use text/plain rather than text/html? Well it turns out that life isn't as simple as that. For historical reasons IE and Safari will content sniff when faced with text/plain.

You can turn this off with IE8+ using the "X-Content-Type-Options: nosniff" header, but this doesn't help you with IE7 users, of which there are still far too many. IE7 does special case text/plain at least, and only sniffs the first 256 bytes. Inserting 256 bytes of whitespace into the start of the response should, in theory, allow you to safely use text/plain with IE7.

For Safari (not sure what versions) you need to be careful about what's in the URL. Thanks to PATH_INFO it's probably possible to do your api call and sneak a .html onto the end of the path being requested. If that happens Safari will decide that it's probably html.

OK, that's horrible. If I'm to preserve the current semantics of the API then I'm going to need to use text/plain, send a nosniff X-Content-Type-Options, pad with 256 bytes of leading white-space, and make sure PATH_INFO is empty. That should deal with the handful of common browsers that I know are broken. This solution seems truly awful, and relies on me being correct about how content sniffing is triggered in certain browsers. It's definitely not a general solution.
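
For what it's worth, that combination might look something like the Python sketch below. It is only an illustration under assumptions: the function name is made up, and it assumes a CGI-style environ dict in which the api endpoint is entirely covered by the script name, so any non-empty PATH_INFO is an unexpected trailing path.

```python
# Leading whitespace intended to fill the 256 bytes IE7 sniffs for text/plain.
PAD = b" " * 256


def plain_text_response(environ, json_body):
    """Return (status, headers, body) for a padded text/plain reply.

    Hypothetical helper: refuses the request outright if PATH_INFO is set,
    so a trailing "/x.html" cannot be used to sway the browser's guess.
    """
    if environ.get("PATH_INFO", ""):
        return "404 Not Found", [("Content-Type", "text/plain; charset=utf-8")], b""
    body = PAD + json_body.encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ]
    return "200 OK", headers, body
```

Leading whitespace is legal JSON, so a client parsing the padded body should not need to strip it first.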

Making text/html safe

Fortunately the only clients using the alternative content type legitimately are internal applications. This allows something of a lame hack fix. If we still permit a text/html response, can we make it safe? Sure, yes, just send actual HTML. The JSON response can be html-entitied and stuck in the response. This won't be misinterpreted by the browser. The client then needs to read that encoded json, and parse it, which should be possible in a safe manner (I think.)
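
A minimal sketch of the idea in Python, assuming a made-up wrapper page with a single pre element (the element id and both function names are illustrative, not what the real application uses). The second function is a stand-in for the client side, which in a browser would read the element's text and hand it to a JSON parser.

```python
import html
import json

OPEN_TAG = '<pre id="payload">'  # hypothetical container element
CLOSE_TAG = "</pre>"


def wrap_json_as_html(payload):
    """Serialise payload to JSON and embed it, entity-encoded, in real HTML."""
    encoded = html.escape(json.dumps(payload), quote=True)
    return "<!DOCTYPE html><html><body>" + OPEN_TAG + encoded + CLOSE_TAG + "</body></html>"


def unwrap_json_from_html(page):
    """Stand-in for the client: extract the encoded text, decode it, parse it."""
    start = page.index(OPEN_TAG) + len(OPEN_TAG)
    # The encoded payload contains no "<", so the first close tag is the real one.
    end = page.index(CLOSE_TAG, start)
    return json.loads(html.unescape(page[start:end]))
```

Because every "<", ">", "&" and quote in the JSON is turned into an entity before it goes into the page, attacker-supplied markup ends up as inert text rather than as elements.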

As long as we don't permit any content types other than text/html (with the hack) and application/json (with the raw json), we should be OK.
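
One way to enforce that restriction is a fixed whitelist, sketched here in Python. The names and the "fall back to raw json" behaviour are assumptions for illustration; rejecting unknown types with an error would be an equally reasonable choice.

```python
# Hypothetical whitelist: the only response types the api agrees to emit,
# mapped to how the body must be produced for that type.
SAFE_FORMATS = {
    "application/json": "raw-json",
    "text/html": "entity-encoded-html",
}


def choose_format(requested):
    """Map a client-requested type to (content_type, mode).

    Anything not on the whitelist (including text/plain) falls back to
    raw application/json rather than being echoed back to the client.
    """
    key = (requested or "").split(";")[0].strip().lower()
    if key not in SAFE_FORMATS:
        key = "application/json"
    return key, SAFE_FORMATS[key]
```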

Ideally I'd replace the upload call with one that returns a simple, non-user-controlled, ID, which is safe as text/plain in any browser. Then there'd be a second, json returning, method which takes the ID and provides the actual information. The multipart form posting to an iframe would still work, and the second call wouldn't involve a file upload so doesn't need the iframe. Unfortunately I'm not able to make such a significant change to the API method.
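
Roughly, that two-call design could look like the following Python sketch. Everything here is hypothetical (the in-memory store, the function names, the fields recorded); the point is only that the first call returns nothing but a server-generated hex ID, and the user-influenced data comes back from the second call.

```python
import json
import secrets

# Hypothetical in-memory store standing in for wherever uploads are recorded.
_uploads = {}


def store_upload(filename, data):
    """First call (multipart form posted to the iframe): returns only an ID.

    The ID is generated server-side and consists solely of hex digits,
    so there is nothing in it for a browser to mis-sniff as HTML.
    """
    upload_id = secrets.token_hex(16)
    _uploads[upload_id] = {"filename": filename, "size": len(data)}
    return upload_id


def describe_upload(upload_id):
    """Second call (ordinary json request): returns the details, or None."""
    info = _uploads.get(upload_id)
    return None if info is None else json.dumps(info)
```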

What about application/json?

This got me thinking about this whole class of problems. Are there browsers out there that, when faced with an unknown mime-type, will try to sniff the content? Certainly it looks like this is permitted behaviour! If so then such a browser faced with application/json might well try to sniff the content type. We're back to our XSS attack problem again. The best answer I've seen for this is to use an attachment content disposition, which means the browser will download the file in the situations when content sniffing would be a problem. When the file is accessed indirectly (XMLHttpRequest, src attribute, etc) the browser ignores the content disposition, and will behave sanely. You should also use "X-Content-Type-Options: nosniff" everywhere, since there's absolutely no excuse for relying on content type sniffing!
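
As a small Python sketch, the headers for a raw json response following that advice might be built like this (the function name and the "response.json" filename are arbitrary choices for illustration):

```python
def json_headers(body):
    """Headers for a raw json reply that should never be rendered as a page."""
    return [
        ("Content-Type", "application/json; charset=utf-8"),
        # Direct navigation triggers a download instead of rendering or sniffing.
        ("Content-Disposition", 'attachment; filename="response.json"'),
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ]
```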


  • Don't use text/html unless what you're returning really is html and can be safely rendered by a browser, with no unintended side effects.
  • When using text/plain consider that some browsers will treat it as if it's text/html.
  • Use "X-Content-Type-Options: nosniff" everywhere. It's only understood by IE8+ though.
  • When using application/json, or any other content-type a browser might not understand, set a content disposition of attachment.
  • While I've not covered it here, always specify a charset on your content-type!
  • Read "The Tangled Web" by Michal Zalewski (ISBN 9781593273880)

This is yet another complex problem caused by people following the, hopefully now somewhat discredited, mantra that you should "be lenient in what you accept..." when writing Internet applications. Trying to guess what was really intended, in an attempt to be helpful in the face of common misconfiguration (e.g. text/plain being sent when text/html was intended), creates all sorts of difficult-to-predict and extremely undesirable side effects.

I'm indebted to the co-worker who mused that perhaps the ability to select content types was a terrible idea, leading to me spending a very hungover hour or so writing a PoC to show that he was indeed correct. Much of the voyage of discovery leading to the eventual fix should be credited to "The Tangled Web" by Michal Zalewski (ISBN 9781593273880), which repeatedly destroyed any hope I had for my early attempts to solve the problem.