Play by Play: JavaScript Security
by Troy Hunt and Aaron Powell
In this course, you’ll learn how to minimize the security risks that are present when working with Single-page Applications.
Course Overview
Hi everyone, I'm Troy Hunt, and I'm Aaron Powell, and welcome to our Play by Play on JavaScript security. I'm an independent security researcher and trainer, and I'm particularly interested in the impact of security on our web applications. And, of course, these days our web applications are increasingly dependent on JavaScript, so naturally I'm very interested in the impact of security on the way we manage our JavaScript-based applications. While Troy comes from that security researcher background, I'm very much a web developer, building single-page applications or server-side applications that have rich UIs that are built with JavaScript using, insert JavaScript framework that's the flavor of the month at the moment, React, Angular, etc., and looking at how best we can build those sorts of applications so that we are taking security best practices into consideration. The last thing we want to do is end up on someone's Twitter feed being named and shamed for having some poor security practices in our application. In this course, we're going to look at many different aspects of security as it relates to JavaScript, so, for example, how we manage tokens, which are an essential part of authentication in modern applications. We'll move on to session storage persistence, and how we actually remember identities between sessions, particularly when the browser unloads. We'll look at service workers, cross-origin resource sharing, and the Access-Control-Allow-Origin header. Moving on, we're going to spend a bunch of time looking at the challenges with third-party tool integrations, and we've got a great real-world example that happened recently, which really illustrates the challenge we have when we create dependencies on other parties. This is something that I think is very important, how we're depending on these external libraries. 
Now we're seeing more and more reliance on things that are coming from npm or other module providers, and we want to make sure that we have a good level of trust about those dependencies that we're taking, how we're making sure that if they do have vulnerabilities we're able to track those, view those reports inside of our build or deploy process, and get that feedback back to our team so that we can make decisions on whether or not they're the right thing to use. And we'll also move on to looking at the OWASP Top Ten, incorporating client and server-side validation, plus a little bit of anti-forgery tokens and cross-site request forgery just for good measure. If you are doing anything with JavaScript these days, and frankly, if you're doing anything with the web you are probably doing something with JavaScript, this is going to be a really useful course for you to get a good understanding of some of the security implications of JavaScript in web applications. I hope you enjoy watching it.
Managing Auth Tokens
General Introduction
Hi, I'm Troy Hunt, and I'm Aaron Powell, and today we're going to be doing a Play by Play on JavaScript security. Yeah, as you know, we're building more and more modern web applications, single-page applications, stuff like that, so we're probably writing a lot more JavaScript than we used to, and we should be doing that in a secure manner, otherwise, we're going to end up on someone's Twitter stream. Yes, we don't want that to happen. Now we did a Play by Play not so long ago about going beyond just the basics of Azure websites, that was great, but we're obviously going to focus very much on the client-side this time. When you and I were preparing this we found there were just so many different angles to it, so where should we start? What's a good starting point? Well, I think that probably the best starting point, if we're talking about doing something securely, is how you actually log someone into a website, and how we manage the information that we get back about that person and their identity. So obviously OAuth2 and OpenID Connect are probably the most common way if you're building a single-page application and you're going to be doing security. That gives you nice tokens that you can pass back and forth to your API, or that you send to the browser in different mechanisms, and that's stuff that you want to probably keep secure, beyond just sending it over HTTPS.
Introduction to Tokens
Alright, so I know you're going to do a lot of demos, and I'm going to ask you a lot of questions as we go along, and then probably a little bit vice-versa as we get further through. Can you show us what you mean by this implementation of OAuth tokens, and then we'll have a look at how we're doing things like storing these tokens. Yeah, so I've got a really simple demo here. It's actually using an open source identity server, and it's got some basic interactions between some APIs and how you would log in and stuff like that. So I'll just jump over to my browser, and I'm just going to log into this simple application that we've got. We'll type in username. So this is actually IdentityServer, the product IdentityServer? Yeah, so what we've done is we started off at port 5003, and we've bounced across to port 5000, which is our IdentityServer, the thing that's acting as our intermediary for doing our OAuth and OpenID Connect, and this could be like an Azure B2C, or it could be Google OAuth connections, and you can see here I can do external login if I wanted, but I'm just going to stick with the one that's built into IdentityServer. So what you're really emulating here is that you're logging onto a totally different service which is doing your authentication, and then we're going to hop back to the original one and obviously persist some sort of authentication state. Yeah, so we'll get back a bunch of tokens that represent my allowance to access the stuff that we've got secured. Okay, cool, so you can see that now we're logged in, you can see I've got a bit of information about the person that I've logged in with, that's just really basic stuff, but if I hit this Call API button, obviously I get back that stuff about that current user. So this is just some claims information, so like I said, it's a really simple demo that we've got here. 
but I've got a token back, right, so I probably want to do something with that token so that if I was to reload the page I don't lose everything. Now, for my sake as well, and I honestly have not seen this demo before now, when you hit Call API you're effectively emulating what would happen if I had a web application and I needed to call an API, which probably wouldn't normally return this stuff, you'd return a list of products or something like that. Yeah, you'd normally be pulling something a bit more useful than just a bunch of claims flags for what I can and can't do on the website, but yeah, it is doing something. So let's open up the dev tools and have a bit of a look at what it's doing. So we'll just come over to the Network tab and hit that Call API again. And you'll see that we've done a couple of requests because it's a cross-origin call. We've had to do an OPTIONS request just to make sure that we do have permission to access this, and that's just checking the headers that we're sending and things like that, but really the meat of it is going to be in this second request, which is actually going to get back the data. You can see there's the response that we got back that's on the screen. Now just to double-check these ports again. So we're on port 5003, which is effectively our web application. Yeah. And you have made a call to port 5001, which is sort of emulating, in this case, the authentication environment, which would be the separate website. Yeah, so 5001 is the backend for our system. So if we think of this in a single-page application, or maybe a microservices architecture, you've got APIs that are sitting on different servers, they might be subdomains, or something like that, but in this case I'm just using different ports because I only have one local host, which is my machine. And the really important part of this request, from a security standpoint at least, is this authorization header that we've got down here. 
It says Bearer, and then a whole bunch of garbled stuff, which is the token that represents my ability to access the API that we're calling. This is what I got back from my IdentityServer. So this is effectively bearer-token-based authorization. Yeah, exactly. So that token is going to be fairly useful. Like if I reload the page, you'll see that I'm still logged in. It's obviously done something to persist that. There's a bunch of ways that we can do that, though.
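The bearer pattern described here can be sketched in a few lines. This is an illustrative sketch, not the demo's actual code; the endpoint URL and token value are assumptions based on the ports discussed above.

```javascript
// Build the Authorization header the demo shows in the Network tab.
function authHeader(token) {
  return { Authorization: `Bearer ${token}` };
}

// Call a protected API with the token attached (URL is illustrative,
// matching the demo's port 5001 backend).
function callApi(token, url = 'http://localhost:5001/identity') {
  return fetch(url, { headers: authHeader(token) })
    .then(res => res.json());
}
```

Any HTTP client can send this header; the server's only job is to validate the token on each request, since bearer tokens carry no other proof of who is presenting them.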
Using the OIDC Client Library to Manage Tokens
Okay, so traditionally in, say, a normal web style app, JavaScript and everything aside, you would put that probably in an auth token, which would sit in a cookie, if you didn't have cookies then maybe it would fall back to the URL, which is terrible for all sorts of other reasons, but I think you're probably going to go somewhere with this, and it's not in a cookie, right? Yeah, so cookies are a great way for doing that because it gives you a bunch of stuff like your auto-expiry of that authorization token, because you set a cookie to expire, but it does have a bit of a downside, because if you want to do cookies nice and securely, for the things that the server cares about, you're going to turn them into HttpOnly cookies. And all of a sudden, the browser can't access that, so I can't use JavaScript to pull my API. Let's drill down on that. So when a cookie is flagged as HttpOnly it can't be accessed from client script, so if it was like an auth token in a classic sort of web app, load-the-page sort of thing, that's fine because your JavaScript doesn't need to access it, but I think what you're saying here is that when you're calling these async services and you actually need to send this token, you have to be able to access it, which means it can't be HttpOnly, and the risk there is that then if you had a cross-site scripting vulnerability on your website it could get the token, and now we're sort of working at cross purposes, right? Yeah, exactly, so yeah, because we want to do a JavaScript Ajax call whatever it might be, we're going to need to access it, so cookies are probably not going to be the right way to store this. But, yeah, we want to store it in some way that gives us some of the niceties to it that we get with cookies, like that they can be expired, essentially. Now, we've got a couple of different mechanisms that we can use to do this. 
Browsers have built-in storage, and this library that we're using here, the OIDC client library that is part of IdentityServer, actually gives us a way of doing that. Now obviously I've hit refresh and it's done something, so it hasn't just put it in an in-memory JavaScript object. It's persisted it. Yeah, and what it's done with this one is it's persisted it in a type of storage called Session Storage. So if I come over to the Application tab in the Chrome DevTools here, I have Session Storage, and I have one here for the port and the host that I'm on. And I can have a look, and we'll see here that we've got oidc.user, which is just like a prefix they've put in there, and I have a whole bunch of stuff that's stored inside of that, which is really hard to read on the screen because it gets truncated. But because this is available, I can interact with it. So if I was to come to the object that is managing this sessionStorage, getItem, and that was, if we can remember what it was here, we'll just copy that string. There we are. That's our token that we had, and that's what we can send back and forth.
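A sketch of reading that persisted user back out of Web Storage. The `oidc.user:<authority>:<client_id>` key format matches what the OIDC client library is showing in the DevTools here, but treat the exact format as an assumption for whichever library version you use. The storage object is a parameter so the logic works with either `sessionStorage` or `localStorage`.

```javascript
// Read the serialized user (including its access token) back out of a
// Web Storage object, using the key prefix seen in the Application tab.
function getStoredUser(storage, authority, clientId) {
  const raw = storage.getItem(`oidc.user:${authority}:${clientId}`);
  return raw ? JSON.parse(raw) : null;
}

// In the browser (authority and client id are illustrative):
// const user = getStoredUser(sessionStorage, 'http://localhost:5000', 'js');
// console.log(user && user.access_token);
```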
Session Storage Persistence
Okay, so tell us a little bit then about the persistence of Session Storage. I mean, how does it differ from a cookie in terms of how long it lasts for? I assume it does not get sent automatically on every request like a cookie. No, so with Session Storage, it is entirely owned by that particular browser window or tab for the lifetime of that. So as soon as I close that tab in an in-private session, which I'm running here, it will kill the Session Storage. If I was running not in private, it would be for the lifetime of that window, so until I close the last running instance of Chrome or Edge or Firefox. So it's a little bit like a cookie expiring at the end of a session? Yeah, it's a little bit like a cookie expiring, but we can't control the time that it might expire. That's probably the only downside that we get with this versus a cookie. So if you wanted to have, I mean, I know every time I log into my bank, I log into my bank, I do some stuff, I go and get a cup of coffee, I come back, I'm logged out already, because very often that cookie just expires after a very, very short period of inactivity. So you're saying here that there's no native construct to expire the Session Storage, but you would implement that in other ways. Yeah, you would implement that in other ways. Like I said, Session Storage will terminate once you've closed all browser tabs and windows, or you then have to build something really custom that sits over the top of that. You sort of timestamp against the thing that you've put in Session Storage, you check is that timestamp less than the current timestamp, stuff like that. So that's obviously a really nice way that you can persist something for that session. But if you close this tab, you have another problem. We mentioned before we're talking about HttpOnly cookies, and we said we flag HttpOnly cookies because we want to be, or we flag cookies as HttpOnly because we want to be resilient to cross-site scripting. 
If I can get cross-site scripting on your page, can I access Session Storage? Yeah, it is definitely a risk that you run with this kind of a storage model, is that this is just a global object that you've got access to, and any script that is running in the context of your browser is also able to access Session Storage, so it does have a bit of a security risk that you've opened yourself up to when you're doing that. So I think this is interesting, because we're going to keep coming back to these points where there's trade-offs. So on the upside, it doesn't get sent with every request like a cookie, on the downside, we can't protect it from client script like we can a cookie, but you kind of don't want to anyway, because you've got to access it from client script so that you can attach it as the bearer token and send it off to the API. Yeah. It's all tradeoffs. Yeah, so it's what's going to be the least risk for the kind of thing that you're doing. But, as I said before, the big downside is if you close this you're going to be fully logged out, so if I was to come back I'd have to go through the login again, and that might be valuable for your banking scenario.
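The timestamp idea Aaron describes, a custom expiry layered over storage that has no native TTL, can be sketched like this. The function names are made up for the example; the storage parameter is anything with `getItem`/`setItem`/`removeItem`, such as `window.sessionStorage` in the browser.

```javascript
// Store a value alongside its own expiry time, since Web Storage
// entries never expire on their own.
function setWithExpiry(storage, key, value, ttlMs, now = Date.now()) {
  storage.setItem(key, JSON.stringify({ value, expires: now + ttlMs }));
}

// Read a value back, treating anything past its expiry as gone,
// much like a short-lived banking cookie.
function getWithExpiry(storage, key, now = Date.now()) {
  const raw = storage.getItem(key);
  if (!raw) return null;
  const entry = JSON.parse(raw);
  if (now >= entry.expires) {
    storage.removeItem(key); // stale: clean it up and act logged out
    return null;
  }
  return entry.value;
}
```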
Remembering Identities Between Sessions
So let me ask you a question, so on a website where you login and it has a little box and it says check this to remember me, how do you remember yourself in a Session Storage sort of model? So, well you actually wouldn't use Session Storage for that, we would use a different storage mechanism, which is called localStorage. So localStorage is very similar to sessionStorage. It's actually the same API, so I can do getItem, setItem, etc., but the difference is that localStorage is there for, well, basically the lifetime of your browser, until someone goes in and uses those deep, dark settings that people never actually go to, where it's clear all my data that I've stored about a web page. So we can put stuff in there, and it will last for a really long period of time. So that's how you can do persistence across logging in, logging out, or, sorry, opening and closing browser windows, not really logging in and logging out. You kind of want to _____ at the end anyway. So that has persistence in a more long-term fashion, a little bit like a cookie, but a cookie would have that persistence by virtue of a long expiration date, but localStorage has no expiration date at all, it's up to you to programmatically decide when you're going to take that out of the storage. Yeah, exactly, and that's something that you've got to obviously think about, you want to make sure that you're not leaving someone permanently signed in when they've hit the sign-out button. Gotcha. Okay, cool. So I think that's neat. So we've got the two constructs, localStorage and, of course, sessionStorage before that. We've got the unload situation. Does that sort of cover us for the auth token bit? Yeah, I think that's probably a good wrap-up of how you can manage an auth token. Obviously, cookies are great if you don't need to access them client-side. If you do need to access them client-side, sessionStorage gives you that auto-expiry, but if you want it longer localStorage is your friend then.
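The "remember me" choice above can be sketched as picking the store at sign-in time: localStorage survives browser restarts, sessionStorage does not. The stores are passed in so the logic is testable; in the browser they would be `window.localStorage` and `window.sessionStorage`. The key name is made up for the example.

```javascript
// Persist the token in localStorage only when the user ticked
// "remember me"; otherwise use sessionStorage so it dies with the session.
function saveToken(token, rememberMe, { local, session }) {
  const store = rememberMe ? local : session;
  store.setItem('auth.token', token);
  return store;
}
```

Note that sign-out should clear the token from both stores, since localStorage will otherwise keep the user signed in indefinitely.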
Caching Strings and Service Workers
Defining Service Workers
Okay, so that was a good start. We have covered off auth tokens, we started with cookies, we've looked at sessionStorage, localStorage, let's move on and talk a bit about caching things in the browser and service workers. So where do you want to start there? Yeah, so service workers are obviously starting to get a lot more popular. They're appearing across all of the browsers these days, you know, Edge, Chrome, Firefox, iOS, and Android, they're all getting service workers so that we can do these sexy, new progressive web applications that everyone is talking about. But they do introduce a really interesting thing around how we manage data, because a bunch of what we're doing is we're storing data so that we can have these offline experiences. So to take one step back, do you want to define service workers? So put this in a context. What are we talking about here? Yeah, so a service worker is essentially something that's going to run in the background of a web application, and it will continue to run even when you don't have a browser open, so it's just continual background processing. It's also used so that you can do things like intercept network requests, and maybe proxy them, so if you're offline we can send back some data that you've previously cached, and that's where we can start finding some interesting challenges when we're looking at it from a security standpoint.
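Registering a service worker is a one-liner in page code; this sketch takes the navigator object as a parameter so the logic can be exercised outside a browser, and the file name mirrors the serviceworker.js file added in the next clip.

```javascript
// Register a service worker if the browser supports them; resolve null
// otherwise so callers can treat "no support" as a no-op.
function registerSW(nav, path = '/serviceworker.js') {
  if (!nav || !('serviceWorker' in nav)) {
    return Promise.resolve(null);
  }
  return nav.serviceWorker.register(path);
}

// In the browser: registerSW(window.navigator);
```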
Service Workers and Intercepts
Alright, so this leads us to the caching bit. Alright, cool, so what are you going to show us with service workers? Right, so I've taken a demo that we had in that first module around managing our tokens and things like that, and changed it so that we actually have a service worker running. So all I've done is I've added this serviceworker.js file to my application, and it's just basically going to start up a really simple example of how to use a service worker. So it's going to intercept the fetch event. So this is the event that's happening when you're doing a network call. So you might be calling out to an API, you might just be loading another resource which is inside of your website, an image or another HTML page, but that gives us the ability to catch that request and then do some logic with that. So you see here that I'm tracking anything that's going through to our API backend, which is that port 5001/identity, which is the API we're calling, and if we do that we're going to respond with data from cache and then update that data again so that we have the most recent version of the data in our cache for the next time we call it. So you see what we're doing here is we try and call it, we go, well, stopping you actually doing the network request I'm going to respond with what you've previously requested, and then I'm going to get the data back. Okay, so security, got a question for you. So you have added an EventListener which is looking for a request to localhost port 5001. I take it the browser security model will not let you add a service worker that listens to requests on some totally random other host name, right? You can actually intercept any request that's going to any other domain, you just get limited stuff that can be done with it. 
So let's say I wanted to intercept any request to Google resources, I want to intercept the Google Analytics tracking codes, and I just don't want to have to fetch them every single time, I want to serve them from cache. I can do that, but I can't touch anything about that. I can't change the way the request would have happened. So putting my evil hat on, because this is a fun bit, so given that you're the one writing the service worker in the scope of your app, but you're saying that there are certain things you can't intercept with other requests, how evil can you be? Oh, you can stop requests to some of the domains happening because you're intercepting anything that's happening over the network from your web application. So it gives you the ability to say hang on a second, I might be detecting something that looks a little bit suspect. I ought to just stop that. That's an interesting use case of service workers. Okay, so I want to drill down on this a little bit, because, again, my black hat bit is now starting to go this is kind of cool. When you say you can intercept requests that might go to another host, so say it's Facebook. If you're writing an app, and you're writing an app in this case on localhost port 5003, what can you do with requests to Facebook? So, I can only intercept requests to Facebook that have been made by my web application. So if I have an Ajax call that's happening from my web application, I can intercept that one, but I can't, if you're on facebook.com, I can't touch those directly. So that's what I was hoping to get to, because what we're saying is that the sandbox that we're playing in is really just those requests that are initiated from the scope of your site. It's not like there's another tab over here and it's open on Facebook and now you can get in there and start messing with the traffic. No, you can't play with anything else that's happening inside of your browser. 
It's only stuff that comes from your application as it is. Alright, okay, good, I'm glad you found that out. You define a scope when you define your service worker, so it could be a part of your application, so from a subfolder down, it could be from the whole domain, it could be on a subdomain, and stuff like that, so you can have different service workers on different subdomains and stuff like that. Okay, cool, but you're ultimately just messing with what is there on your own site anyway. Yeah, so the way that works is if we scroll down a little bit we have this caches global variable or global object that we can work with inside of the browser, and that's similar to the localStorage and sessionStorage in that it's always available. But I can open a cache there, based off a cache key that's defined up top as a constant, and then add and remove things from that, and here's that fromCache method that we're using up above. So we'll try and find it, if it finds it, it returns it, otherwise, it's going to return a failing promise. So, how does that work? Let's jump over to our browser, and we'll hit Login and go through again. Okay, cool. So we've logged in, and now we'll Call API, and it's actually going to immediately fail. You see here I've got a failing promise, no match. That's because I've previously not cached this data. Now if you remember from our service worker, it's going to immediately respond with what's in cache. It's not in cache yet, though. Okay, but there's nothing there. So it's going to return an error, but in the background it's also going to have fetched that. And to be clear, what's going to go in the cache is the response from that request, right, so that when you replay that later on it'll go, hey, we've already got it, here it is. Exactly. So if we come over to our Application tab, we'll see that we've got Cache down here, API-CACHE, and you'll see that that's on the domain and port that we are accessing. So host 5003. 
That's where our application is running, and that's our sandbox, so you can see that's the domain. Right, and I guess to the effect of the discussion earlier on, it's not like you're going to be able to go and access something from a completely different host. No, no, you're not. Then you can see here we have that API that we tried to call, and the response that came back. You can see that we know that it was an application/json response, you see the time that it was cached, and you can see the information here. We can even also see the headers that were sent in that request and the response that came back. Alright, so it's cached the entire thing. And in terms of that time that it was cached, is there any default expiration or? No, again, like the other storage objects it doesn't actually expire, it's kind of as long as your service worker is running and things like that. But based on the time, we can see how stale it is as well. Yeah, we can. We can definitely see how stale something is. And now when we Call API again, it's really fast because we've hit cache, and we've pulled it out, and we've shown that to the user, and then it's actually updated in the background, so that Time Cached will have changed because of the new request that's being done. Okay, so while we're here, I'm looking at these request headers, and of course all of this you're saying is being cached, that was part of what went into the cache storage we see here. The authorization header is there, so our bearer token there, so that's the one that it would have previously picked up probably from sessionStorage. Correct, yeah, so that's the one that we have that represents us. Okay, cool. Tell us about Access-Control-Allow-Origin.
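The "respond from cache, refresh in the background" behavior the demo shows can be sketched as a small function. This is an illustrative sketch, not the demo's serviceworker.js: the cache and fetch function are injected so the core logic can run outside a worker, and the cache only needs `match`/`put` like the browser Cache API.

```javascript
// Serve a cached response if one exists, while refreshing the cache
// from the network in the background; fall back to the network on a
// cache miss (the demo's first Call API click fails this lookup).
async function staleWhileRevalidate(request, cache, fetchFn) {
  const cached = await cache.match(request);
  const refresh = fetchFn(request).then(res => {
    cache.put(request, res.clone ? res.clone() : res); // keep cache fresh
    return res;
  });
  return cached || refresh;
}

// In serviceworker.js (browser only), wiring this up looks roughly like:
// self.addEventListener('fetch', event => {
//   if (event.request.url.includes(':5001/identity')) {
//     event.respondWith(
//       caches.open('API-CACHE').then(c =>
//         staleWhileRevalidate(event.request, c, fetch)));
//   }
// });
```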
CORS and Access-Control-Allow-Origin
So the Access-Control-Allow-Origin is something that we use for CORS to indicate where a request can be initiated from to access an API that's on another domain. CORS being Cross Origin Resource Sharing, so to try and avoid this situation where we might be able to scrape something off somewhere else under someone else's identity. Yeah. So this is saying that this request can be initiated from localhost 5003. Now the request has gone to 5001, 5001 said it's okay for it to come from 5003, but from nowhere else. Correct. So, let's just try one other angle on this, because I keep coming back to this XSS thing, and XSS is just such a vicious attack relating to JavaScript, because very often it's people running JavaScript on your site trying to access things. What can I as an attacker do, if I can get XSS on your site, what can I do with your cache storage? So, like session and localStorage, cache can be accessed by anything that's also running in your page context because it's a global object. So including an attacker script? Including an attacker script, yes. Now it does have a little bit more of a challenge because you've got to know, obviously, the name of that cache, so you need to know that this is called API-CACHE, but that's fairly minor. It's not really security by obscurity to a great degree. But you can discover that, right, if you use the app, because you look in your DevTools. Exactly, but probably the more important thing about how the cache API works, is that it's only available inside of a secure context, so that means anything that's running over HTTPS, or anything that's running on localhost. Localhost is also treated as a secure context by the browser. That's why I've got it here. Alright, cool. So I've got two interesting questions to follow up from that. So, number one is, and maybe it's just more of an observation as well, there's a bunch of features that browsers have been killing off when they don't load over HTTPS. 
So this is obviously one of those things, where look, if you want to use this feature you've got to operate securely, and I would imagine that that is simply because if you don't do this over HTTPS, you are exposing something that is potentially more sensitive and the browsers are trying to keep this protected. Yep, exactly. The second part of that is, if I'm an attacker and I can just get XSS in your HTTPS page it's still game over. It is, yeah, so it's still not a perfect solution, obviously we can't lock it down completely, but it does give you that little bit more security. So if someone was to inject something that was not running over HTTPS and, you know, result in mixed content, you're outside of the secure context, so you've got a little bit of a small safety net there. And I guess if I'm an attacker and I'm running script on your page, and I'm extracting it from cache storage, no one else sitting in the middle can see the data as I, as the attacker, get it out of your system. Yep, exactly, yeah. I think that's a minor, minor consolation, but okay, fair enough. Yeah.
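The allow-one-origin behavior described above, where port 5001 permits requests from port 5003 but nowhere else, comes down to how the server chooses the Access-Control-Allow-Origin value. A minimal sketch of that decision, with illustrative names, assuming an allow-list model:

```javascript
// Decide which CORS headers to send back: echo the requesting origin
// only if it is on the allow-list, otherwise send none, in which case
// the browser refuses to expose the response to the calling page.
function corsHeaders(requestOrigin, allowedOrigins) {
  if (!allowedOrigins.includes(requestOrigin)) {
    return {};
  }
  return {
    'Access-Control-Allow-Origin': requestOrigin,
    'Vary': 'Origin', // shared caches must key responses on the origin
  };
}
```

Echoing a single vetted origin, rather than sending a wildcard `*`, is what keeps other sites from reading the API under a logged-in user's identity.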
Third-party Library Vulnerabilities
Defining the Problem of Third-party Tool Integration
Let's change pace a little bit, and we had planned this out, and we were going to talk about very important, but rather run-of-the-mill sort of things, and then literally yesterday, something, I don't want to say something awesome, well, something very poignant in terms of demonstrating the point happened, and I think we should talk about that before we then go on and talk about exactly what we had planned. Yeah, okay. Alright, so, now I wrote a blog post yesterday, and this blog post is about the JavaScript Supply Chain Paradox: SRI, CSP, and Trust in Third Party Libraries, and I want to explain what has actually happened here for everyone else's sake, because I think this will become one of the watershed moments in the industry where we go, maybe we should do this just a little bit differently. So, what happened is, I got up early yesterday morning, and many things happened during the night while I was asleep in Australia. Same for you. We've both got this problem, right? We wake up and the rest of the world has done stuff. And what the rest of the world had done in this case is that thousands and thousands of websites around the world, particularly government websites, had suddenly had cryptominers run on them. So, as in like bitcoin and cryptocurrency sort of stuff? Yeah, they usually mine Monero coin, those particular ones, in fact there is a service that you can embed on your site called Coinhive, and this is like a way of monetizing customers. So you might go, I don't want to have ads, but I'm going to put this script on my page, and when someone comes to my site it's going to use up a whole bunch of their CPU cycles, it's going to mine Monero coin, it's going to send it to my wallet, and I get a fraction of a cent for every person that comes to my site. So like having ads on your website? Yes, except with other questionable sort of ethics, so we're not going to go into that here, let's just talk about the mechanics of it. 
So, what actually happened is we saw a whole bunch of sites looking like this. And this is a tweet from Scott Helme, he's another security researcher who does a lot of work around this area, and he put out a bunch of tweets about different sites that had been hit by this particular Coinhive cryptominer. Right. And I thought we'd pick one here that's come from Australia. This is from my local state government, Queensland Government. And what we're seeing here is their website, Scott's popped open the DevTools, and in the console here we see a reference to coinhive.com. And what's happened here is we literally have the Coinhive cryptomining script running not just on this site, but on the U.S. courts site, on the ICO website in the U.K., the Information Commissioner's Office in the U.K., and just thousands and thousands of sites around the world. Okay, yeah, that doesn't sound good. It doesn't sound good, right? So despite the news headlines around the world saying these sites had been hacked, it wasn't so much a vulnerability with the sites themselves, so no one had broken into the Queensland Government site, for example, but what we're seeing here is that all of these sites have these upstream dependencies where they are embedding a JavaScript library into their site off another service. Right, okay, so it wasn't necessarily them directly, but how would you prevent against that kind of thing? Okay, let's not get ahead of ourselves. I want to show you what it looked like, I know where you're going with this. Now, let's talk about why they were doing this, because many sites do this, they will say I am going to have an external dependency that I embed into my site. The external dependency was a service called browsealoud. Now browsealoud is a tool for accessibility. It's made by a company called Texthelp. 
Look, the way this is embedded is the way so many different libraries are embedded in websites, which is, in this case, that they create this service, it's a service that you sign up for, you have a piece of JavaScript that you put in your website, so it's literally like script source browsealoud.com, or whatever it is, and their script is loaded dynamically in real-time off their site and embedded in yours, and there's thousands and thousands of sites around the world that do this. So it's essentially like pulling something in from a CDN, which I'm sure is not uncommon to most people. Exactly, it's very, very much like that. And, in fact, we can see what their script looked like, because Scott managed to snap a pic of this whilst it was in its malicious state. So what actually happened is this whole browsealoud script looked like what we just see here. And if I scroll down a little bit, we can see that at the top we've got what looks to be a degree of obfuscation, so line 5 on this paste is a big run of obfuscated characters. Everything from line 7 down is the legitimate content. And what we actually worked out here is that this browsealoud script sitting on their service somewhere, or sitting on their server, as it may be, had been modified, and everything on line 5 had been added to it, not by them, but by some malicious external third party. Right, I mean, that definitely doesn't look like something that any sane developer has written. Well, once you de-obfuscate it, it becomes quite a bit clearer. So this is the de-obfuscated script, and what we're seeing here is that they're actually dynamically embedding this script with a reference back to coinhive. 
Coinhive gets embedded in the site, it then goes through and it has a key in here, which is effectively the wallet that we're going to mine the coins into, and then every single person who went to any one of those thousands of sites, many of them government, loaded this into their browser and started mining cryptocoin. Wow, that doesn't sound like a good thing to have happen. Do we know anything about how someone might have achieved something like this? Well, we're only, literally at the time of recording we're about a day and a half into this, so it's still really, really new news. At a guess, I would say that someone managed to gain access to whatever storage construct it is that has this file in it. Now, this file then gets distributed out via a CDN, so inevitably they got in there, they made a change, the cache flushed, it went through to all the edge nodes that serve it from the CDN, and now every single website embedding it has got a cryptominer. And this then leads us to the discussion about the pipeline, because effectively you're saying that you have this supply chain of code coming from external parties, it's automatically included into your site, good luck. Yeah, well, I suppose that, and that's kind of a real concern. There was an article that was going around late last year that someone wrote about how they had created essentially a malicious npm module that they had then managed to shoehorn into a couple of large, popular open source modules, and essentially was trying to scrape credit card details when people went to web pages. Now it turned out that this article that the person had written was actually just a thought experiment, this is something they could have done, but they were kind of illustrating a problem of just blindly taking in dependencies without really understanding what they are, you know, that supply chain problem that you mentioned there. 
So the post you just mentioned was actually really, really important, so I've just popped this open here in a browser window. And like you said, this was a thought experiment, so it's not something that they actually did, but I would really encourage people to go and read this because it's an epically good piece of writing. And what this guy speaks about here as well is he says look, it's not just like pulling something in from the supply chain. And in this case, he was talking about npm, so that's something that you would've included and then published yourself to your service as opposed to sort of dynamically including on page _____. Yeah, true. But what was really, really fascinating about this is he said look, I will do things like, I'm not actually going to run my malicious piece of script, which was a key logger that would get credit cards and everything, I'm not going to run it all the time. I'm going to run it like 10% of the time. And once it's run on a client I'm not going to run it again. And what he was talking about here is techniques to try and make it hard to reproduce. Now, when you compare that to what happened yesterday with the coinhive stuff, it was on every single site. Identifying this was absolutely dead simple. There was no attempt whatsoever, other than that basic bit of obfuscation which Scott deobfuscated immediately and no problems there, there was no attempt to try and hide it. It wasn't like, look, we're only going to execute it on certain occasions, we're not going to pop anything up in the DevTools, so frankly, I reckon we dodged a bullet. I reckon we got off very, very lightly from this, but it now makes us think about how we're going to stop this from happening in the future.
Protecting Yourself from Third-party Vulnerabilities
So that sounds like a really big problem. Question then, how can we go about trying to prevent this? It's a little bit tricky in a couple of ways, because what you're sort of saying is, I want to embed something from a third party, now I think we need to break this into two categories as well, because something like browsealoud is a service. So you pull in their script, there's a bunch of other things that then happen in the background as well. It embeds toolbars, it can do things like text to speech. It's very similar, as well, to something like Disqus. So I run Disqus, the commenting engine, on my blog. I have one line of script that says, just go and get me Disqus, and then they do a bunch of stuff in the background, so there's that use case. There's also the use case which I think we need to separate, which is what you mentioned before about CDNs. So, for example, on a site like Have I Been Pwned, I pull in jQuery from the Cloudflare CDN, and I reference it very, very explicitly. In fact, let me show you what it looks like. I've got Have I Been Pwned open here, and I use jQuery to do a bunch of very typical sort of things like orchestrate API calls and things like that. If we jump down and have a look at the source code, and I'm going to jump to the end of this page, we can see down here around line 364 I'm pulling in jQuery from Cloudflare, and there's a couple of things I want to point out here. So number one is I'm referencing an explicit version. So this is version 2.2.4 of jQuery, the minified version. Now here is where this becomes different to Disqus and browsealoud. This version will never change. It's always going to be exactly that. If they add features and they release an update later on, it's going to be a new version. They're not going to change the one that's there. That is very different to, say, pulling in Disqus. Let's talk about Disqus because most people are familiar with that. 
That is like here's a script tag, you go away and give me the service, and you can do whatever you want within that service. But, of course, the risk we have here is what we just saw with the coinhive and the browsealoud stuff, which is that anyone who controls the script can do whatever they want. Now we'll come back to that in a moment. The defense that I've now got here is you'll see that I've got an integrity attribute off the end of that, and it starts with sha384, and then we've got a great big hash after that. So this is SRI, Subresource Integrity, and what we're doing here is saying that the sha384 hash of that script, so jQuery version 2.2.4, minified version, it is exactly that hash. And when my browser goes and pulls that script down and it hashes it with sha384, if they don't match, the browser doesn't run it. Right, cool, so that probably wouldn't be ideal for something like Disqus where it's doing something a lot more complex. It's great in this case where you're talking about CDNs. Yeah, and here's the rub, because in this particular situation this works beautifully, this allows you to use a CDN and have confidence in the integrity of the file. It's a lot harder with something like Disqus, but we have another way of dealing with it, which is to use a Content Security Policy, or CSP, because a CSP then allows us to say things like, ah, okay, this website is allowed to embed a script from Disqus, and it can embed images from Disqus, and that is it. And then if someone manages to modify that script and put a request in to go and grab something from coinhive, the CSP hasn't actually whitelisted the coinhive address. Right, okay, so that helps us kind of save ourselves in both scenarios. Yeah, look, I mean we're not going to drill into SRI and CSP here, but that is the defense against this. 
And if you don't have SRI and CSP and you're pulling stuff in from other locations, you need to seriously reconsider that, because we have just seen the perfect storm demo of how it goes wrong.
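The CSP idea can be sketched concretely too. The function below builds the kind of policy string a server would send in a Content-Security-Policy response header; the Disqus and Cloudflare hostnames are illustrative placeholders, not necessarily the exact hosts those services use:

```javascript
// Sketch: build a Content-Security-Policy that only whitelists the
// third parties a page actually needs. Hostnames are illustrative.
function buildCsp() {
  const directives = {
    'default-src': ["'self'"],
    'script-src': ["'self'", 'https://a.disquscdn.com', 'https://cdnjs.cloudflare.com'],
    'img-src': ["'self'", 'https://a.disquscdn.com'],
  };
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(' ')}`)
    .join('; ');
}

console.log(buildCsp());
// Sent by the server as:
//   Content-Security-Policy: default-src 'self'; script-src ...
// A compromised script that later tries to pull something from
// coinhive.com is blocked, because that host isn't in script-src.
```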
NPM Package Manager Vulnerabilities
Now I know that you want to talk about some npm stuff, and some package stuff, and some dependency tracking things, so what are we going to look at on that side, because that kind of, it's a little bit tangential to this, although it's still about how do you trust external dependencies? Yeah, so obviously what you've been talking about there where you're pulling something in from a CDN is great for that kind of stuff where it's easy to grab from there, but more and more we're just doing an npm install and downloading several terabytes worth of JavaScript files just so we can have our Hello World application. But have you actually gone through all of the dependencies that you're pulling down? You might start with React, but then React is going to depend on stuff that depends on stuff, that depends on stuff. And have you gone that many levels deep to just know what they are? Going back to the article about credit card scraping, that was talking about how you could inject something, and just through the npm dependency chain people would have no idea these scripts even existed. Yeah, the paradox here, and I've just been having this argument on Twitter, people on Twitter love to argue, the paradox is that people are using things like npm, or pulling in something like browsealoud or Disqus, because the whole thing is already built, right, like this is the fast track to building your things, but what we're also saying is what happens when you can't trust those things. Yeah, exactly, not to say that you should go write everything yourself. Well, there's the paradox isn't it, because we keep telling people, particularly in the security industry, don't write things yourself. Go and use proven things that have been tried and tested, and in many cases audited as well. But then we're also saying that could actually be modified, and you could have a problem from that. And, you know, actually we don't have a good answer to this in some ways. No, there's no real good answer. 
It's security, you can't save yourself from every scenario.
Site Linting with Sonarwhal
But you can do little things that can help give you protection. I want to talk about a particular service and tool that I use on some projects, particularly when we've got those external dependencies that we're really needing to be concerned about, or at least be very security conscious with, and that's a tool called sonarwhal. So, it's an open source library. It is distributed via npm, so everything I said about just blindly installing from npm, ignore for the moment, because blindly install this from npm, or you can use the hosted version. Sonarwhal is a linter for websites. You're probably familiar with linters if you've been using something like ESLint to make sure your JavaScript is hitting certain code standards, or that you're programming in a consistent way across multiple members of your team. But this takes kind of a different view of the problem of not just the code that you've written, but the way the application you're building will run. So rather than looking at just the code, it looks at how you're hosting that application, the dependencies that you might be using inside your application, so stuff like that. So, I've got the sonarwhal website up here, and I can chuck a URL in here. So we'll use your haveibeenpwned. So this is effectively going to be dynamic analysis. So it's going to make requests against a live-running application, and come back and tell us a bunch of things about it? Correct. So I just hit Scan Now, and this will pop it into a queue that's going to be scanning, and that will take a few moments, so we'll come back to that scan when it's done. And instead I wanted to have a look at some of the rules that are available, particularly the rules that are important to what we're talking about here, which is the security-centric rules. So we've got a whole bunch of different ones here around how you can check the headers that are being sent back and forth or how you're specifying the URLs that you're depending on, your CDN URLs there. 
Remember from when you showed the source, you were using a protocol-relative URL, so you didn't have that http: or https: at the start. But the really interesting one that we can look at is this no-vulnerable-javascript-libraries. The way this one works is it uses a service from Snyk, s-n-y-k, which is a vulnerability tracking database across Node.js modules, npm, and then a bunch of other non-JavaScript programming languages. So if we open that up, we can have a look at what kinds of things it would be tracking. And we'll just scope this down to npm, but you can obviously see all the different sorts of languages and sources that we could be using. And we've got NuGet and things like that in there as well. And if we scroll down, you'll see that there's a lot of different things that it's tracking, and a lot of different types of issues that you could be detecting. Things you might not have even thought about. Let me check something here, because particularly when we say things like package managers, which could integrate on the server side, not necessarily exposed on the client side, but you've just run, effectively, a dynamic analysis test that just looks at what's exposed publicly, is this the model where you're actually going to have to run it against a code base as opposed to a live-running site in order to identify a lot of this? Yeah, so if you're doing things like bundling and minifying your external dependencies you're going to want to run it locally as well, and look at all the things that you're depending on inside your npm supply chain, basically, but it can also look at the stuff that you're referencing. So are you referencing something from a CDN that has a known vulnerability in it? So that's another good way to look at it. But I guess to go back to --- I'll give you an example. 
I mean let's say I built an ASP.NET app, I'm using NuGet, I've pulled in an HTML parser that's got a vulnerability, but there's no public visibility of the HTML parser, so this is only going to work if I can run against my code base and it can look at the packages that I'm using server side, right? Yeah, so sonar is only capable of understanding JavaScript. Okay right, so this won't pick up a lot of the server-side stuff. So it won't pick up most of your server-side stuff. It can be plugged in to look at things like Express, because there's vulnerabilities in Express that have been tracked and mitigated and resolved, but if you're looking at, say, NuGet packages, you're probably going to look at something that's going to bundle into maybe the way that you're doing an npm install, or maybe into your build process that looks at vulnerabilities relative to the packages that you're depending on. But we'll focus on the ones that are going to be served out to the public internet. And, yeah, as you can see there's lots of different things here. We can see that there's things that have got cross-site scripting vulnerabilities. Bootstrap had one in there for a while, and you can see the versions that were affected by that vulnerability, anything below 3.4, and then a few versions above that. And this also tracks when they were resolved, and the pull requests that might have fixed it and things like that. We should actually define that for a minute, because I think some people probably look at this and they go, how can a JavaScript library have an XSS vulnerability? But look, ultimately, JavaScript libraries are very often emitting content to the DOM, right? I mean they're going to have to take untrusted input, which could have potentially malicious content in it, and if they're not encoding that output correctly, then there is your vulnerability in a client-side library. Exactly, and that's, you know, someone like Bootstrap, that's what they do, they emit HTML. 
So something like Bootstrap, if you're using that inside your website, you probably want to know if it's got an XSS vulnerability within it.
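At its core, the check these vulnerability databases enable is a version comparison: is the version you're shipping inside a known-vulnerable range? Here's a toy sketch; the 3.4.0 fix version echoes the "anything below 3.4" Bootstrap advisory discussed above, but treat the numbers as illustrative, and note that real tools use full semver range parsing rather than this simplified comparison:

```javascript
// Sketch: the kind of check a tool like Snyk or sonarwhal performs,
// boiled down to "is the installed version below the fix version?"
function parseVersion(v) {
  return v.split('.').map(Number);
}

function lessThan(a, b) {
  const [x, y] = [parseVersion(a), parseVersion(b)];
  for (let i = 0; i < 3; i++) {
    if ((x[i] || 0) !== (y[i] || 0)) return (x[i] || 0) < (y[i] || 0);
  }
  return false; // versions are equal
}

// A known-vulnerable range: every version below the version that fixed it.
function isVulnerable(installed, fixedIn) {
  return lessThan(installed, fixedIn);
}

console.log(isVulnerable('3.3.7', '3.4.0')); // true  -> flag it
console.log(isVulnerable('3.4.1', '3.4.0')); // false -> patched
```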
Dealing with Typosquatting (And Sonarwhal Followup)
Another thing that's really common that people don't pick up on immediately is this notion of typosquatting. And you can see the first couple of malicious packages that are marked here for CoffeeScript. There's a lot of CoffeeScript there, but it's not all CoffeeScript is it? Exactly, if you actually look at the spelling of those, the first one is missing an e, the second one is missing an e but has a hyphen in it, the next one is missing an f, variations of that. So these are malicious packages masquerading as another package. So someone has created these packages, they have whacked them into npm, they're hoping that people will come along and grab them not realizing that there's a typo? Correct, yeah, that's exactly the idea, and now they could be injecting a bitcoin miner or they could be doing something a little bit less malicious, or maybe less financially beneficial to them. But here, that's the kind of thing where it's really easy to just miss a letter, and npm install hasn't failed, so you just carry on with your daily work. So, out of curiosity, I'm familiar with Retire.js, how does this sort of map to Retire.js? So it's very similar to Retire.js. Retire.js has a separate vulnerability tracking database that it works with, but it's also only specific for tracking vulnerabilities. Yeah. And it's, like this one it's community managed, so it's only getting updates as people are providing it. The reason that I was looking at sonar is that it also looks at a couple of other security vectors, and it also can do things across your HTML and your CSS and things like that. Now our report should be done, so let's just jump back and see how well this little website is actually tracking. So it looks like haveibeenpwned has completed its run, and it looks like it's fared reasonably well. Yeah, I saw 108 errors, and I was thinking something different to reasonably well, but _____. 
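The typo patterns being described, a missing e, an extra hyphen, are exactly what an edit-distance check catches. Here's a sketch of how you might flag suspicious names against a list of popular packages; the package list and the distance threshold of 2 are illustrative choices, not what npm or any registry actually uses:

```javascript
// Sketch: flag likely typosquats by edit distance to well-known names.
function editDistance(a, b) {
  // Classic Levenshtein distance via dynamic programming.
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

const wellKnown = ['coffeescript', 'react', 'lodash'];

// Flag names that are *almost* a popular package, but not exactly it.
function looksLikeTyposquat(name) {
  return wellKnown.some(pkg => pkg !== name && editDistance(name, pkg) <= 2);
}

console.log(looksLikeTyposquat('coffescript'));  // true: missing an e
console.log(looksLikeTyposquat('coffeescript')); // false: the real thing
```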
Oh, look, we've got some accessibility warnings, we've got some performance issues, but that's probably just my connectivity. We are in Australia after all, and I think the carrier pigeon might have died outside halfway through the report run. I think you're being very kind, but let's go on. So we'll jump down to Security because that's really the one that we're most interested in. As I said, we've got a couple of different things here like protocol-relative-urls. Actually, let's drill down on this, because we saw that on my Cloudflare embed script before. Why doesn't it like protocol-relative-urls? So, we'll just open it up, here's your Cloudflare and things like that. Now, if we jump over to the documentation, it'll give us a reason as to why it's saying that a protocol-relative-url might not be ideal. The reasoning that it's got is essentially that some really old browsers might not deal with them so well, older IEs being the ones that it's obviously talking about. But also, if you're using something that's on a CDN and the CDN is only ever going to serve it over HTTPS, but your page loaded it over HTTP, you actually end up with a redirect for that CDN resource. Now if you're on an HTTPS-only website, so it cannot be hit via HTTP, it's probably less of a problem. So it's the kind of thing that you'll look at and it gives you some information, but you make a decision about is that really relevant for your application. I think that's really important, because there are a bunch of online services that are very good at giving you a rating, you know, you got a B, or an F, or an A+, or whatever it may be, and having a bunch of findings, and I see people getting infatuated with I've got to get an A or an A+, so I've got to get them to 0, and you're going, yeah, but do you actually understand what the problem is, because what I find interesting with this, the support for protocol-relative-urls, like if you're on IE7 my site's not going to work. 
You have bigger issues than just _____ securely, so that's not a problem. I serve everything over HTTPS, and I've got an upgrade-insecure-requests CSP directive, which means that even if something embedded in there didn't try to load over HTTPS, the request would be upgraded, so I'm not going to have any redirects. I will admittedly say, there is actually no point in me having a protocol-relative-url because I'm never going to serve it over HTTP anyway. So I reckon I probably should change it to HTTPS, and I know that I've just said that in order to make that report look better, because it will not actually have any practical value whatsoever. But I reckon, like the broader picture out of stuff like this is that if you understand why it says that, you understand why you're doing it, you understand the ups and downs of it, I'm actually okay with whichever position you take, if it's a conscious decision. Yeah, and that's a very valid point. A site like this, or a service like this, doesn't understand the context of how your website is actually run, so you've got to take these things with a grain of salt about how it's relative to the problem that you're trying to solve. Having said that, by the time this goes to air, if you look at the source code at haveibeenpwned.com, it's probably going to have HTTPS's in there. Well, can you afford the _____ hit of that extra six characters? I know gzip is going to compress those extra characters. Okay, so what else did it find, because I am curious now. So, we'll just close off that one, and let's have a look at the no-vulnerable-javascript-libraries. So you've got 2 errors there. This is curious now. Okay, so what it said is that you're using a version of Bootstrap that has a medium vulnerability in it, and similarly jQuery. It's got a couple of low-to-medium vulnerabilities. 
Now, again, that's something that you take into consideration about what you're using that dependency for, and then this has got the information through to that particular vulnerability so you can understand more about it and make that decision is it really going to impact my site. If it's an XSS vulnerability, well maybe that's, I'm actually not using the feature on Bootstrap that's exposed to that vulnerability, you know, or whatever it might be. So, again, you can look at it and go, these are the things that it's telling me things that I should be aware about, but maybe they're not impactful for my use case. So I'm going to offline later on drill down on those, because I am actually kind of curious now. What I do know is that I'm running the latest and greatest version of Bootstrap and jQuery within that major version, within Bootstrap 3 and jQuery 2. To upgrade, I've got to go to a new major version that's got totally different implementation details as well, so I'm sort of in that space now where it's like there's a security thing, and of course we've got to see how bad they actually are, and the fact that it actually takes a bunch of work to move that forward. Yeah, and again, it's like all static analysis reporting, you need to look at it and go, what is the cost benefit of trying to get that A+ certification from my report. I'm curious, one more thing before we move on, strict-transport-security, why do I have four errors for that, because I know my HSTS. Well, it looks like we've got a couple of. That's not me. But this is still stuff that you're externally depending on. So what it's doing is it's obviously tracking through all of your dependencies and the libraries that you are also depending on and anything that they are pulling in. Now, yeah, this is stuff that you might not be doing. And we've been running the online version of sonarwhal and it's got a bunch of default configurations that it's going to hit. 
Now, you can actually install this as a node module and run this locally as well and tweak those settings. So you could say, well, don't tell me about the Google stuff, because I've got no control over the resources that google.com is serving me. And maybe the protocol-relative-urls are not something I'm interested in, so I'm going to turn off that rule. I think that's another one of those sort of practical things, because I'm seeing Google, and I'm seeing New Relic, and I'm seeing Google Analytics, and in each one of those cases there, and that last one is also New Relic, so two New Relic hosts, two Google hosts. In each one of those cases, I'm embedding this content over HTTPS from my website, like there's no way to man-in-the-middle that and intercept that request. I imagine each one of those providers still wanted to make the content available over HTTP in case someone doesn't want to request it over HTTPS, which frankly is kind of a little bit pointless now, because HTTPS, particularly with HTTP/2, is so fast and efficient anyway, but like you said, this is them, this is outside of my control. And I'm not just trying to be defensive, this is one of those areas where I think you've got to look at those reports, and again, sort of understand what they mean and consider are they actually realistic, are they practical, is this something I should worry about? Yeah, exactly. And like I said, this is also available as a node module, so you can plug this into a build pipeline so you can get that static analysis, and you can also tweak the settings to what are going to be right for your environment. Tell it to ignore domains that are outside of your core domain, or maybe there's a specific CDN that you're also managing, or I only want to look at dependencies that are coming in from there, I don't care about the ones that are coming in from Google. Cut the noise out. 
Exactly, because, like you said, they're things you can't control, they're things that probably Google wants to serve over HTTP and HTTPS. But you know I could have that just as a continuous part of my build and release cycle of, hey, maybe I've brought in a dependency on a new npm module that actually has a vulnerability in it that I didn't know about, or it has dependencies on dependencies with vulnerabilities.
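Running sonarwhal locally is driven by a config file (.sonarwhalrc). A sketch of what turning rules off might look like is below; the rule names are the ones mentioned in this discussion, but treat the overall shape and the connector name as illustrative, and check the sonarwhal documentation for the exact format:

```json
{
  "connector": { "name": "jsdom" },
  "rules": {
    "no-vulnerable-javascript-libraries": "error",
    "strict-transport-security": "error",
    "protocol-relative-urls": "off"
  }
}
```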
Working with the OWASP Top Ten
Alright, so there's one last thing I think we should refer to in this whole thing about dependencies, and it's the OWASP Top Ten, and I've got the OWASP Top Ten, and the Top Ten, of course, is the top ten web application security risks. I've got them open on my machine here, and as we're talking about this, I just remembered that we do have an item in the Top Ten that speaks specifically about this. And, in fact, if we scroll down, and this is the 2017 edition, it's the latest one at the time of recording, A9 Using Components with Known Vulnerabilities. And I'm going to skip down all the way to where that's described, and this would be something really, really worthwhile looking into for folks that maybe have not thought about this too much before. So the OWASP Top Ten is sort of the canonical resource for the ten most critical web application risks that you should be paying attention to. Yep. They recognize this, they've got it there in their list, and they talk about things like, they talk about Retire.js actually just here, which we mentioned before, they talk about dependency check tools, and they're saying you should have a process. And the thing that struck me yesterday with that whole coinhive situation and the external dependencies is that so many organizations just don't have a process for safely embedding things, and that was the issue with the coinhive stuff, and also maintaining their versions, as well, which sort of speaks to the bits that you were just talking about.
Client-side Validation and Controls
Incorporating Client and Server-side Validation
Okay, so we have done auth tokens, we have done service workers and cache, we have just gone through and done a whole bunch of dependencies and the things that go wrong there. I thought maybe the last thing to do is to talk a little bit about sort of client-side validation and client-side controls versus server-side controls, and I think we might end this in a kind of philosophical way, which I think is important. I thought I'd just start here. I shared a tweet only a few days ago at the time of recording, about something I saw that I thought was kind of hilarious, and inevitably many other people have thought it was kind of hilarious as well. And we won't actually name the site, because it is a little bit embarrassing when you actually look at it. This is email validation. You tell me, what is wrong with this picture? Well, they're allowing you to paste, so that's a good start. Okay, there is something that is not wrong, alright that's good. But it looks like they're trying to stop certain mail hosts being used as the email address when you provide the email for whatever purpose they need it for. Yes, yes, that was a very charitable way, I think, of putting it. So this is a company who has decided that they don't want people to sign up with email addresses on certain hosts, and in fact, what actually happens if I just move this away so we can show the message behind there, it says we accept only business email addresses, no free or ISP email addresses. So, in this particular case this company is trying to get you to sign up for something, and they inevitably then want to sell you something. And they're clearly trying to make sure that people just don't go and sign up and get their thing for free, like they want to be able to monetize you and have a good sense that you are someone who is going to be able to pay money. Now where do you think this goes wrong? And there are multiple answers here. 
Well, first off, if you work for Facebook you have a business email address, so their validation message is going to be really confusing. But I think it goes a bit further than that. We've got a regex here that we're running in the browser that's saying this is what you're allowed to do, but that tells us they're treating the browser as a trusted source of information. And to be clear, the regex is actually saying this is what you're not allowed to do, right? So, and I want to come back to your last point because I think that's kind of curious, but it's saying, you can't have Hotmail, Gmail, Google Mail, etc., etc. Some of these things I think are a bit of a call back to the past. No Juno. Does anyone still use Juno? I was looking at that, and AOL Mail. AOL? So you kind of need to ask. What's curious about this, though, is that the regex has got two really critical flaws. So one is that you'll notice that when we go through this list here one of these banned host names is mail. If you have mail anywhere in the name of your domain, you actually can't use this because there's a partial match. So you could have mail.mycompany.com, you can't use it, and that could be a legitimate company email address. So that's a little bit worrying. But the other thing is that this is a case-sensitive regex. Yeah. So if you did it like that and you had Mail@company.com, then you'd be straight in there. Yeah, if you want to shout at it, you're going to get in. Well, exactly, you just go all uppercase and it's job done. Now, the point that you made, and this kind of brings us on track to the purpose of this module, is that this is client-side validation. And there is nothing wrong with client-side validation. It responds very, very quickly, it saves you making requests, it can give you feedback as you enter information into a site before submitting it to the server. This particular example has no corresponding server-side validation. That, I can see that becoming a problem. 
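The two flaws just described can be sketched with a cut-down stand-in for the site's actual pattern (the real regex isn't reproduced here, just its two failure modes):

```javascript
// A cut-down stand-in for the kind of client-side blocklist regex the
// site used: case-sensitive, and matching hostnames only partially.
const banned = /(hotmail|gmail|googlemail|mail|aol|juno)\./;

// Flaw 1: partial matching. A legitimate business domain that happens
// to contain "mail" gets blocked.
console.log(banned.test('someone@mail.mycompany.com')); // true: wrongly blocked

// Flaw 2: case sensitivity. Email domains are case-insensitive, so
// going all uppercase sails straight past the pattern.
console.log(banned.test('someone@GMAIL.COM')); // false: wrongly allowed

// The address it was actually meant to block, for comparison.
console.log(banned.test('someone@gmail.com')); // true
```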
Right, so you with your black hat on again, what might you do here? Well, knowing that I can obviously circumvent this by using a capital M in mail means that I can sign up for something, but kind of extrapolating a bit from just this one particular use case of an email validation, if the server is not really validating appropriately, are they even expecting a valid email address? Where could we go from there? You could strip this. Because this is the thing. When I first saw this I thought, wow, this is really curious. I wonder if I went into the element inspector, right, and I just literally stripped this attribute out of the DOM, what would happen? And, of course, it went straight through. So I think the macro picture we're talking about here is that you need to have corresponding server-side validation for every piece of client-side validation. And we're talking about this in a JavaScript course because a lot of people are putting a lot of logic in JavaScript, increasingly we're pushing logic there. Yep, exactly. When we're talking about single-page applications in particular, there's a tendency that we write all of our code in JavaScript, and then that's just bundled, minified, and sent out to the browser. And, well, yeah, if we're making assumptions about what the server expects, but the server is no longer validating those assumptions, the server is not checking to see if it's not a Hotmail address, or if it was something that was less conspicuous than an email address, we could start changing the way the behavior works because the server just doesn't care. So I suspect that the philosophical part of this, which I mentioned before, is that when a lot of people are building apps they're like, I have created this app.
It has client bits and server bits, and they will all work together in this way, and they will never work together any different way, and they miss the fact that you can decompose these bits and separate the server-side requests from all the client-side validation. And they're not expecting people like us that are going to open up the DevTools and put a breakpoint in and go, oh, let's tweak some of these values.
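The missing piece in that example is a server-side counterpart: the same rule, enforced where the user can't strip it out of the DOM. A minimal sketch, with an illustrative domain list and function name rather than anything from the actual site:

```javascript
// Server-side counterpart to the client-side check: exact, normalized
// domain matching instead of a case-sensitive partial match.
const freeMailDomains = new Set(["hotmail.com", "gmail.com", "aol.com", "juno.com"]);

function serverSideEmailCheck(email) {
  // Normalizing defeats the all-uppercase bypass.
  const normalized = String(email).trim().toLowerCase();
  const domain = normalized.split("@")[1] || "";
  // Exact domain comparison, so mail.mycompany.com is no longer a false positive.
  return domain.length > 0 && !freeMailDomains.has(domain);
}
```

Running this on the server for every submission means the client-side version becomes what it should be: a fast usability nicety, not the security control.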
Why You Need Server-side Validation
So let's do something similar to that just to sort of illustrate the point. And I've got this website open, hackyourselffirst.troyhunt.com. This is a website that I've used on a heap of my Pluralsight courses before. You can go to this website and hack it and not go to jail, which is really useful. So it's got all sorts of vulnerabilities, SQL injection, cross-site scripting, all this sort of stuff, as well as an implementation of precisely what we just discussed. And I thought I'd show you what that looks like. So I'm logged in to this site. The site is designed for you to go and rate supercars. You can vote for the ones you like, and then it ranks them. Now, let's go ahead and drill down into this particular car, the GT-R, and we see a green message to the left of the screen. It says you've already voted for the GT-R, so you can't vote again. Now, let's go to the home page again. I'm going to pick another car, we'll pick, say, McLaren down here, there's one McLaren. I've already voted for that, too. I've got to pick one I haven't voted for. So I might go and grab the Pagani. Alright, so I can now vote for this car, but before I vote for it, I'm just going to grab this URL, open another tab, and paste it over there like that. So now I've got two tabs open, both on the same car. I'm going to go to the first tab, and I'm going to make a vote, and I'm going to call this, say, Vote 1, we'll give it a comment, Vote 1. We'll vote for that guy like that. Yep, fine. Scroll down to the bottom, Vote 1 is in there. Now, it says Thank you for voting, I can't vote again, right? Unless, and I know this sounds really fundamentally basic, but this is the sort of stuff that goes on in web apps, I go to this tab here, which is still showing the button. The vote button is still there. The vote button is still there, Vote 2. Vote, and the Vote button disappears. Scroll down to the bottom, Vote 2 is in there, but if I give this page a refresh, Vote 1 and Vote 2.
Ah, so it hasn't done that server-side validation. Yeah, exactly right. So what's happened here is we've security trimmed it. So we've said, alright, under this circumstance there should no longer be a Vote button because you've already voted for it. And you can imagine that, let's say, the less sophisticated developer building this just works to the design. Alright, the spec says vote once, can't vote again. I show this a lot in the workshops I run, and I talk about this particularly with testers. If a tester had a test script, and they literally just followed through the test script, they would go, this works fine, but they would miss this entirely. Now, there's another interesting thing that happens here, and I might show you this one too. If I jump over to the leaderboard, we'll go and grab something else I might not have voted on. Let's go to the Koenigsegg there. And then I open up my DevTools on the Network tab, and let's have a look at what happens when I vote on this one. So we'll give that a quick little comment up there, Nice, like so. Let's actually scroll down to where we can see the button. We'll vote. Watch that request go through, drill down, and now let's go to the bottom and actually have a look at what got sent in that post request. Now what might you as a slightly curious person do with that? Well, there's obviously a couple of IDs there. What would happen if I was to change the user ID? Can I vote on behalf of someone else? Yeah, so that's where your problem is, you can. And, again, I think people will look at this and they'll go, who would do this, but there are so many examples of this happening. So, again, someone who built this would put it together and go, well, this is the way I've constructed it. It will always work this way because this is the way the website is designed. And they miss the fact that we could always go and grab that request.
We could go to Postman or Fiddler, or we could wget or curl it and recreate this request, because what's actually happening here is it is taking this data. It's not actually looking at the auth cookie, which is up here, we've got this great big value here, a cryptographically signed value which can't be tampered with, which uniquely identifies the user. It is persisted via cookie though, which goes back to our earlier discussion. This is what we want to use to identify people. I mean, we could take this on the server side, grab that, tie it back to an identity and go, that's the person who voted. That is tamperproof. This is not tamperproof. So that's sort of a very traditional API call. Now, here's a good question for you. Is this potentially at risk of a cross-site request forgery attack, a CSRF attack? Well, yeah, you did mention that you could use curl to invoke this before. So, yeah, this could definitely be a cross-site request forgery attack vector, because it's not really doing any kind of validation to make sure that the page that created it is the one that's submitting it.
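The fix for both the double vote and the forged user ID is the same: the server derives the voter from the validated auth identity and enforces the one-vote rule itself. Here's a minimal in-memory sketch of that idea (names and storage are illustrative, not the actual hackyourselffirst code):

```javascript
// The voter comes from the server-validated session/auth cookie, never
// from a userId field in the request body that anyone can tamper with.
const votes = new Map(); // key: `${userId}:${carId}` — stand-in for a database table

function castVote(authenticatedUserId, carId, comment) {
  const key = `${authenticatedUserId}:${carId}`;
  if (votes.has(key)) {
    // Enforced server-side, so hiding the Vote button is just cosmetics.
    return { ok: false, error: "You have already voted for this car" };
  }
  votes.set(key, { comment, at: Date.now() });
  return { ok: true };
}
```

With this in place, a second vote from a stale tab, or a replayed curl request with a different user ID in the body, simply bounces off the server.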
Using Anti-forgery Tokens for API Calls
So, I think that this sort of leads us in multiple interesting directions. You know, we traditionally think of CSRF attacks happening on full page loads, particularly page post backs, because often a post is actually changing data. We think about it happening under verified identities, because you're trying to trick someone into making your request under an identity, and we have anti-forgery tokens. So it leads to an interesting discussion, which is, in a case like this, this here is an API. So this is JavaScript orchestrating a call to an API. Do we use anti-forgery tokens in API calls? We can. So there's definitely ways that we can approach that with big popular libraries like Angular. Because Angular has got some native constructs for appending an anti-forgery token. Yeah, they do, and essentially that just adds it to the request, but if we're talking about throwing it across domains, or you've got microservices and those are the APIs, then that starts becoming a lot more challenging, because the request is actually coming from another domain, it's not coming from the page that originated it, so you've got to be able to pass that token back and forth between multiple web servers to make sure that any anti-forgery token that you send is from the origin that was expected, and it is actually a valid one to be provided. So often when I ask this question, I say, do we need an anti-forgery token in an API, we also take a step back and we talk about what the risk is, and the risk is that auth tokens in cookies get automatically sent with requests. So if you can trick someone into making a request to an endpoint, and there's authentication persistence via an auth token in a cookie, then the cookie goes automatically. And then I say to people, do you need this in an API, and when I use the words cookie and API together, sometimes people kind of lose their minds. What is it, tell us about the REST thing and the philosophy there.
Well, I suppose because cookies can be used to try and give you a bit of state in your application, it helps you tie back to things that you've sent back and forth, persisted. HTTP is a stateless protocol, after all, but if we want some way to track it, you know, cookies are a good way to do that. But REST, and particularly if you're going die-hard pure REST, you are not getting any kind of state, so sending cookies doesn't even have any kind of value. Because when we start really expanding out and doing microservices, or even a serverless API architecture that's running on a server, because serverless runs on servers still, we have no state, we have no ability to persist state, because you might have come in to microservice 1 the first time around, but then it's running inside of a cluster of ten, and the next time you're coming to a different one, and a different one, and they've got no knowledge that each other exists, so we can't really store and persist data across them. So that's the answer I normally get, and apparently you call people like this, you call them a RESTafarian, right? And I think you used the word die-hard, like the die-hard REST people. It's like, if you use cookies it's not RESTful and you shouldn't do that because of state and all the rest of it. And I honestly think that's an argument I just never wanted to get into because people get so passionate about it. Exactly, particularly when you get those hardcore REST people arguing about whether you're using things correctly. It's like, I just want to be able to send some data, really. Apologies to any hardcore REST people. I like falling back to the position that whether you use cookies in any sort of API call or not is a philosophical thing. The basic mechanics of it are that if you do use cookies for an HTTP request to persist state, well, now you're into a space where you do need anti-forgery protection, because this is the risk.
Now, of course, if you use bearer tokens you don't have the same risk, because you have got to manually construct that request to get the bearer token added on, and an attacker's malicious page which is constructing that request shouldn't have access to it, because it would be in, say, your localStorage or your sessionStorage, which is outside, or rather within the sandbox of the site, and the attacker is outside of that. Yeah, it makes it a little bit harder for sure.
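As a rough sketch of that distinction (storage key and endpoint here are illustrative, not from the demo site), a bearer token has to be attached explicitly by our own code, so the request can't be assembled implicitly on an attacker's page the way a cookie-authenticated one can:

```javascript
// The Authorization header is built by our code from our origin's
// storage — nothing here is sent automatically by the browser.
function buildAuthHeaders(token) {
  return {
    "Content-Type": "application/json",
    Authorization: `Bearer ${token}`,
  };
}

// In the browser this would be used something like:
// fetch("/api/cars/1/votes", {
//   method: "POST",
//   headers: buildAuthHeaders(sessionStorage.getItem("auth_token")),
//   body: JSON.stringify({ comment: "Nice" }),
// });
```

An attacker's page can't read our origin's sessionStorage, so it can't build that header, which is exactly why bearer tokens sidestep the classic CSRF risk.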
Using Anti-forgery Tokens for a GET Request
Alright, so here's another good one. This is probably my final question. Would you need an anti-forgery token for a get request to an API? No, generally not, if you're doing a get request correctly. Here's the rub. Yeah, because a get should not be changing any state on your server. So when I have an endpoint which is like /widgets/5/delete, that's not good? That probably shouldn't be a get request, because a get should just be giving you data from your service. To use analogies, it should be immutable and not changing server state. It shouldn't hit anything other than a select statement from a database or something like that. So, you generally shouldn't need an anti-forgery token in that scenario, because it's about validating that you are who you say you are, and were allowed to send that data. So I get what you're saying, and I think the discussion there is the correct semantic use of verbs in terms of what you're actually doing with the resource. And I almost sort of fall back again to that same position where we go, look, there is a whole discussion here, and I get why they say this, however, if it is a request that you would not want someone to forge, regardless of whether it is a put, delete, whatever it's going to be, then you need that token again. And the only other sort of bit that I come back to sometimes is, what if there's a get request which, let's just say it retrieves an entity, but in the process of retrieval it logs that activity, and you wouldn't want someone to accidentally be tricked into having that activity logged against their identity? Yeah, so if you're starting to do auditing and audit trails, then you've got to think about how you're validating the creation of said audit trail. What is the mechanism that ties it back to the person that made that request?
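To make the verb-semantics point concrete, here's a tiny illustrative dispatcher (not from the course demo): reads stay on GET, and the mutation moves to DELETE, so the "/widgets/5/delete as a GET" anti-pattern never exists, and anti-forgery protection has a clear place to sit.

```javascript
// GET only reads; DELETE mutates. CSRF defenses (tokens, bearer auth)
// then only need to guard the mutating verbs.
const widgets = new Map([[5, { name: "Widget five" }]]);

function handle(method, path) {
  const match = path.match(/^\/widgets\/(\d+)$/);
  if (!match) return { status: 404 };
  const id = Number(match[1]);
  if (method === "GET") {
    // Safe and idempotent: nothing worth forging (auditing aside).
    return widgets.has(id) ? { status: 200, body: widgets.get(id) } : { status: 404 };
  }
  if (method === "DELETE") {
    // State change: this is where forgery protection belongs.
    return widgets.delete(id) ? { status: 204 } : { status: 404 };
  }
  return { status: 405 };
}
```

The auditing caveat from the discussion still applies: if even your GETs write an audit trail against the caller's identity, they inherit some of the same forgery concerns.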
So I think where we're sort of ending there is that if you don't want a request to be potentially forged, and you are using cookies for any sort of persistence, regardless of why, good or bad or otherwise, then you're back in that anti-forgery space. So I think the lesson there is that you've got to consider those anti-forgery tokens. And where we're sort of ending now as we wrap up the course is that so much of what we've spoken about is about consciously thinking about how the different constructs work. So how do cookies work versus the local and sessionStorage constructs? If you are making requests that use any sort of cookie for authentication, and you're orchestrating it from JavaScript, well, you know, maybe there's going to be a CSRF issue there as well. Yeah, we also talked a bit about the ways you can store data in the browser and the challenges and the risks that come with doing that. And the more data you're storing in the browser, the more you can be potentially exposing, particularly if you start taking in dependencies and you're not really understanding those dependencies or validating them as securely as you could be. And, you know, that's a good point too, because a lot of what we touched on with the dependency bit, and even things like the way you secure the APIs, I mean, they're not necessarily issues with JavaScript per se, right, that's the ecosystem that you work within. And I think maybe the note to wrap that up on is that you've got to have a good understanding of that broader ecosystem, not just the one little bit of code you're working on. Yeah, you can't just survive on knowing how to write Angular and build a secure web application. Right, good point. Alright, I think that's a good note to wrap it up on, and thank you very much everyone for watching. Thanks.
Course authors
Troy Hunt
Aaron Powell
Course info
Level: Beginner
Duration: 1h 13m
Released: 2 May 2018