Week 7
Data security: validation, sanitization
Quiz 6: Data models 15 mins
There will be a quiz today. It will be worth 2% of your final grade.
# Agenda
- AMA (10 mins)
- Quiz (10 mins)
- Better Debugging (20 mins)
- Break (10 mins)
- Input Sanitization (30 mins)
- Data Validation (30 mins)
- Break (10 mins)
- ES Modules (20 mins)
- Lab Time (30 mins)
# Better Debugging
Before we dig into recommended practices for guarding against data corruption and scripting attacks through input data validation and sanitization, let's add another tool to our development toolbox.
The debug NPM module is a better console.log that you can turn on and off with an environment variable instead of commenting out the debugging statements in your code. It is used internally by all of the modules in the Express framework, and we can use it in our modules as well.
Install debug as a project dependency.
npm install debug
Make it available in app.js and set the namespace to the application name.
Set the primary namespace
The primary debug namespace should match the name property in package.json.
'use strict'
const debug = require('debug')('week7')
const express = require('express')
require('./startup/database')() // immediately call the exported function
const app = express()
app.use(express.json())
const port = process.env.PORT || 3030
app.listen(port, () => debug(`Express is listening on port ${port} ...`))
To keep the main entry module for our project (app.js) as clean as possible, it is a good practice to move one-time set-up activities, like connecting to the database, into separate modules in a /startup or a /bootstrap folder.
Put your Mongoose connection set-up code from week 6 into a new module called /startup/database.js.
Then at the top of that file, require the debug module, setting the namespace to week7:db, and change any console.log() statements to debug() statements. The completed code should look like this.
const debug = require('debug')('week7:db')
const mongoose = require('mongoose')
module.exports = () => {
  mongoose
    .connect(`mongodb://localhost:27017/mad9124`, {
      useNewUrlParser: true,
      useCreateIndex: true,
      useFindAndModify: false,
      useUnifiedTopology: true
    })
    .then(() => {
      debug(`Connected to MongoDB ...`)
    })
    .catch(err => {
      debug(`Error connecting to MongoDB ...`, err.message)
      process.exit(1)
    })
}
# Environment Variables
Remember
The debug module suppresses output by default.
If you run the app now with node app.js it will not output anything to the console. You need to activate it by setting the DEBUG environment variable to our application namespace before running the application.
DEBUG=week7 node app.js
Hmmm ... still not quite right. We only saw the debug message from app.js, not from the database connection module. Debug allows us to be very selective about which module's messages we want to see. If we run it again with ...
DEBUG=week7:db node app.js
... now we only see the database connection message.
To see all related namespaces we can use the * wildcard character.
DEBUG=week7* node app.js
Now you will see any debug output where the namespace starts with 'week7'.
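To make the matching rules above concrete, here is a minimal sketch of how a DEBUG value could be compared against a namespace. The isEnabled function is a hypothetical helper written for this example. It is not the debug module's actual implementation and it ignores any other features the module offers.

```javascript
// Hypothetical helper for illustration; not the debug module's own code.
function isEnabled(debugValue, namespace) {
  return debugValue
    .split(',') // DEBUG may hold several comma-separated patterns
    .map((pattern) => pattern.trim())
    .filter(Boolean)
    .some((pattern) => {
      // Escape regex metacharacters, then turn each * into "match anything"
      const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&')
      const matcher = new RegExp('^' + escaped.replace(/\*/g, '.*') + '$')
      return matcher.test(namespace)
    })
}

console.log(isEnabled('week7', 'week7:db')) // false
console.log(isEnabled('week7:db', 'week7:db')) // true
console.log(isEnabled('week7*', 'week7:db')) // true
console.log(isEnabled('week7*,sanitize*', 'sanitize:body')) // true
```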
# NPM Scripts
The application start-up command is starting to get long. What if we want to add some other environment variables like NODE_ENV or PORT?
DEBUG=week7* NODE_ENV=dev PORT=3000 node app.js
NPM has a solution to make this easier. In the package.json file there is a scripts option. We can set different start-up instructions for different environments. By default, it looks like this.
"scripts": {
  "test": "echo \"Error: no test specified\" && exit 1"
},
Add a start script for production and a dev script to use with our local development environment.
"scripts": {
  "start": "NODE_ENV=production PORT=433 node app.js",
  "dev": "DEBUG=week7* NODE_ENV=dev PORT=3000 nodemon app.js",
  "test": "echo \"Error: no test specified\" && exit 1"
},
Now you can start your development server by running ...
npm run dev
Or you can start up in production mode with npm start.
TIP
We have more configuration work to do before we are ready for production mode. We will cover that in a few weeks.
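Inside the application, the variables set by these scripts arrive as strings on process.env. As a rough sketch, a small helper could collect them in one place and apply defaults. The loadConfig name, the shape of the returned object, and the 'development' default are assumptions made for this example; the 3030 fallback mirrors the one used in app.js above.

```javascript
// Hypothetical helper: gathers the environment variables used in this lesson.
function loadConfig(env = process.env) {
  // Environment variables are always strings, so PORT must be parsed
  const port = Number.parseInt(env.PORT, 10)
  return {
    nodeEnv: env.NODE_ENV || 'development', // assumed default for this sketch
    port: Number.isInteger(port) ? port : 3030, // same fallback as app.js
    debug: env.DEBUG || '' // empty string means no debug output
  }
}

console.log(loadConfig({ NODE_ENV: 'dev', PORT: '3000', DEBUG: 'week7*' }))
console.log(loadConfig({})) // falls back to the defaults
```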
# Sanitization
The very nature of our application is to take user input and act on it in some way, either storing information or searching and filtering information in the database. The data stored is ultimately returned to the client application.
This means that a web service application is inherently vulnerable to a variety of malicious attacks.
We cannot trust any data coming from the client application. It might not be our code. It may have been hacked. A malicious actor may be using it to directly input data formatted to cause our system to malfunction.
The Open Web Application Security Project (OWASP) is an excellent resource for staying up to date on application security risks and best practices. Among their many resources, the periodically updated OWASP Top 10 Critical Web Application Security Risks should be on your reading list.
Directly relevant to our task is the OWASP Node Goat Tutorial, which provides examples of the Top 10 in a Node.js application environment.
Let's look at solving the two most damaging attack types: Cross-site scripting (XSS) and Database Injection. To do that, we need to filter and cleanse all incoming data.
# XSS Protection
To help protect against Cross-site scripting (XSS) attacks you need to strip out any HTML or script tags from the input values so that they will not be interpreted as scripts or alter the browser rendering of any returned data. You can use the xss NPM package to solve this problem.
npm install xss
Build a small middleware function to implement this behaviour. Create a new file in the middleware folder called sanitizeBody.js. Require the debug and xss modules at the top. Then add an empty middleware function signature. (We will fill in the logic in the next step.)
const debug = require('debug')('sanitize:body')
const xss = require('xss')
module.exports = (req, res, next) => {
  // sanitization logic goes here
  next()
}
Remember
It is a good practice not to take destructive action on the original request body. So, make a copy to work with and set the resulting sanitized version as a new property of the request object that downstream route handlers can access.
Start by stripping out any id or _id properties. We never want those.
const {id, _id, ...attributes} = req.body
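A quick way to convince yourself that this line leaves the original object alone is to run it against a sample payload. The sample body below is made up for this example.

```javascript
// Made-up payload standing in for req.body
const body = { id: 7, _id: 'abc123', make: 'Tesla', model: 'S' }

// Pull id and _id out; the rest syntax collects everything else into a new object
const { id, _id, ...attributes } = body

console.log(attributes) // { make: 'Tesla', model: 'S' }
console.log(body) // still contains id and _id
```

Note that this only removes id and _id at the top level of the object. Nested objects keep theirs.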
The xss module applies its filter rules to strings. To apply it to the various properties of our user supplied data, we will need to loop over the members of the attributes object (our copy of req.body) with a for...in loop.
The xss function takes an optional configuration object to customize how it works.
for (let key in attributes) {
  attributes[key] = xss(attributes[key], {
    whiteList: [], // empty, means filter out all tags
    stripIgnoreTag: true, // filter out all HTML not in the whitelist
    stripIgnoreTagBody: ['script']
    // the script tag is a special case, we need
    // to filter out its content
  })
}
Lastly, set the modified attributes as the value of a new req.sanitizedBody property and then call next(). The whole thing should look similar to this.
const debug = require('debug')('sanitize:body')
const xss = require('xss')
module.exports = (req, res, next) => {
  debug({body: req.body})
  const {id, _id, ...attributes} = req.body
  debug({attributes})
  for (let key in attributes) {
    attributes[key] = xss(attributes[key], {
      whiteList: [],
      stripIgnoreTag: true,
      stripIgnoreTagBody: ['script']
    })
  }
  debug({sanitizedBody: attributes})
  req.sanitizedBody = attributes
  next()
}
TIP
Notice that I added some debug statements just to help verify that everything is working. To see these print out, modify the dev script's DEBUG environment variable in the package.json file to DEBUG=week7*,sanitize*
OK, we have our body sanitizer middleware function. How do we use it?
Remember, Express route method declarations can take more than one callback function. This lets us call one or more middleware functions directly for any given route.
app.post('/test', middleware, routeHandler)
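To see why this works, it can help to picture each callback receiving a next function that hands control to the following one. The runHandlers function below is a hypothetical, heavily simplified stand-in written for this example. It is not how Express is implemented, and the two handlers are placeholders.

```javascript
// Hypothetical stand-in for a router's dispatch logic, for illustration only.
function runHandlers(handlers, req, res) {
  let index = 0
  const next = () => {
    const handler = handlers[index++]
    if (handler) handler(req, res, next) // each handler decides whether to continue
  }
  next()
}

// Placeholder middleware: adds a new property instead of changing req.body
const addSanitizedBody = (req, res, next) => {
  const { id, _id, ...attributes } = req.body
  req.sanitizedBody = attributes
  next()
}

// Placeholder route handler: reads what the middleware prepared
const routeHandler = (req, res) => {
  res.sent = { data: req.sanitizedBody }
}

const req = { body: { _id: '42', make: 'Honda' } }
const res = {}
runHandlers([addSanitizedBody, routeHandler], req, res)
console.log(res.sent) // { data: { make: 'Honda' } }
```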
So, in our /routes/cars.js module, we could do this ...
// paths are relative to the /routes folder
const sanitizeBody = require('../middleware/sanitizeBody')
const Car = require('../models/Car')
// ... other set-up
router.post('/', sanitizeBody, async (req, res) => {
  try {
    const newCar = new Car(req.sanitizedBody)
    await newCar.save()
    res.send({data: newCar})
  } catch (err) {
    errorHandlerFunction(err) // placeholder for your own error handling
  }
})
You can now be reasonably confident that only plain text strings will be passed to Mongoose for validation. Potentially malicious HTML and JavaScript will be removed from input strings.
You can test it with Postman.
Yay! It worked!
But wait. What if the payload attributes are not simple strings? We need to refactor the sanitizeBody middleware to call itself recursively for more complex data structures.
# Recursion
From The Modern JavaScript Tutorial
When a function solves a task, in the process it can call many other functions. A partial case of this is when a function calls itself. That’s called recursion.
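As a small warm-up before refactoring the middleware, here is a sketch of a recursive function that walks a nested structure. The countLeaves function and the sample object are made up for this example.

```javascript
// Counts the primitive values ("leaves") in a nested object or array.
function countLeaves(value) {
  // Base case: anything that is not an object or array is a single leaf
  if (value === null || typeof value !== 'object') return 1
  // Recursive case: add up the leaves of every child
  return Object.values(value).reduce(
    (total, child) => total + countLeaves(child),
    0
  )
}

console.log(countLeaves({ a: 1, b: { c: 2, d: [3, 4] } })) // 4
```

Every recursive function needs a base case like the first line, otherwise it would keep calling itself forever.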
Start by creating a new function called stripTags. It should take a single argument - let's call it payload. Now cut the for...in loop from the main function and paste it into the new one. As a best practice, we should not mutate the original payload object that is passed in, so let's make a copy of that with the line let attributes = {...payload}. Don't forget to return the sanitized attributes at the end of this new function.
Then call the new stripTags function from within the primary function. The refactored middleware should now look like this.
const debug = require('debug')('sanitize:body')
const xss = require('xss')
const stripTags = payload => {
  let attributes = {...payload}
  for (let key in attributes) {
    attributes[key] = xss(attributes[key], {
      whiteList: [],
      stripIgnoreTag: true,
      stripIgnoreTagBody: ['script']
    })
  }
  return attributes
}
module.exports = (req, res, next) => {
  debug({body: req.body})
  const {id, _id, ...attributes} = req.body
  debug({attributes})
  const sanitizedBody = stripTags(attributes)
  debug({sanitizedBody: sanitizedBody})
  req.sanitizedBody = sanitizedBody
  next()
}
This version should be functionally equivalent to what we had before. Test it with Postman to be sure that it is working.
OK, now we can augment the stripTags function to check for objects and then call itself to loop over that nested object and sanitize its properties. Wrap the logic inside the for...in loop in an if/else block.
for (let key in attributes) {
  if (attributes[key] instanceof Object) {
    attributes[key] = stripTags(attributes[key])
  } else {
    attributes[key] = xss(attributes[key], {
      whiteList: [],
      stripIgnoreTag: true,
      stripIgnoreTagBody: ['script']
    })
  }
}
Test that with Postman ...
Great! Now let's make sure that we handle Arrays properly. Since arrays inherit from Object in JavaScript, we need to check for that case first. Then instead of just recursively calling stripTags on the array, we need to use the .map() method to loop over the array.
Each element could be another complex object or a simple string. We need to check and handle both cases. If it is a string we can call xss(), otherwise call stripTags() again.
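The reason the array check has to come first can be seen in a short experiment. The kindOf function is a hypothetical helper written for this example.

```javascript
// Hypothetical helper showing the order in which the checks need to run
function kindOf(value) {
  if (Array.isArray(value)) return 'array' // must come before the object check
  if (value instanceof Object) return 'object'
  return 'primitive'
}

console.log(['a'] instanceof Object) // true, arrays are objects too
console.log(kindOf(['a'])) // 'array'
console.log(kindOf({ a: 1 })) // 'object'
console.log(kindOf('a')) // 'primitive'
console.log(kindOf(null)) // 'primitive'
```

If the two checks were swapped, every array would be reported as a plain object and the array branch would never run.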
Rather than duplicate the code for the xss() function call, let's extract that to a separate function in our module.
const sanitize = sourceString => {
  return xss(sourceString, {
    whiteList: [],
    stripIgnoreTag: true,
    stripIgnoreTagBody: ['script']
  })
}
Then the conditional block in the stripTags() function becomes ...
if (attributes[key] instanceof Array) {
  attributes[key] = attributes[key].map(element => {
    return typeof element === 'string' ? sanitize(element) : stripTags(element)
  })
} else if (attributes[key] instanceof Object) {
  attributes[key] = stripTags(attributes[key])
} else {
  attributes[key] = sanitize(attributes[key])
}
TIP
The return statement in the map() callback above uses JavaScript's ternary operator, rather than a more verbose if/else block.
Writing out that single ternary expression the long way would look something like this ...
let cleanedElement
if (typeof element === 'string') {
  cleanedElement = sanitize(element)
} else {
  cleanedElement = stripTags(element)
}
return cleanedElement
OK. Test that with Postman ...
Phew!
The final version of the sanitizeBody.js middleware module should look similar to this (with no debug statements).
const xss = require('xss')
const sanitize = sourceString => {
  return xss(sourceString, {
    whiteList: [],
    stripIgnoreTag: true,
    stripIgnoreTagBody: ['script']
  })
}
const stripTags = payload => {
  let attributes = { ...payload } // don't mutate the source data
  for (let key in attributes) {
    if (attributes[key] instanceof Array) {
      attributes[key] = attributes[key].map(element => {
        return typeof element === 'string'
          ? sanitize(element) // if true
          : stripTags(element) // if false
      })
    } else if (attributes[key] instanceof Object) {
      attributes[key] = stripTags(attributes[key])
    } else {
      attributes[key] = sanitize(attributes[key])
    }
  }
  return attributes
}
module.exports = (req, res, next) => {
  const { id, _id, ...attributes } = req.body
  const sanitizedBody = stripTags(attributes)
  req.sanitizedBody = sanitizedBody
  next()
}
OK. Now we can be reasonably sure that all HTML tags and scripts will be removed from all req.body properties, no matter how deeply they are buried.
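One thing the version above does not distinguish is non-string primitives. For example, a number inside an array is passed to stripTags(), where spreading it produces an empty object, and how xss() treats non-string input depends on the library. As a possible variation, the sketch below only cleans strings and returns every other primitive unchanged. The deepSanitize and removeTags names are assumptions for this example, and removeTags is a crude stand-in so that the code runs without the xss package. In the real middleware you would pass in the sanitize() function instead.

```javascript
// Hypothetical generic walker: cleanString is injected so the example
// runs without the xss package.
function deepSanitize(value, cleanString) {
  if (typeof value === 'string') return cleanString(value)
  if (Array.isArray(value)) {
    return value.map((element) => deepSanitize(element, cleanString))
  }
  if (value !== null && typeof value === 'object') {
    const copy = {} // build a new object, don't mutate the source data
    for (const key of Object.keys(value)) {
      copy[key] = deepSanitize(value[key], cleanString)
    }
    return copy
  }
  return value // numbers, booleans, null and undefined pass through
}

// Crude stand-in cleaner for demonstration only; NOT a substitute for xss.
const removeTags = (text) => text.replace(/<[^>]*>/g, '')

const input = {
  name: '<b>Bob</b>',
  age: 42,
  tags: ['<i>a</i>', ['<u>b</u>']],
  owner: { active: true, note: null }
}
const output = deepSanitize(input, removeTags)
console.log(output)
```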
# Database Injection
Using Mongoose, which enforces schema validation and attempts type coercion to ensure that only data in the correct format is stored, goes a long way toward protecting MongoDB from various attacks. However, there are still some cases that are not covered.
Read A NoSQL Injection Primer (with Mongo) to learn more about this kind of vulnerability.
To help protect against database injection attacks, we will use the Express Mongo Sanitize (express-mongo-sanitize) NPM package.
From the docs ...
# What?
This module searches for any keys in objects that begin with a $ sign or contain a ., from req.body, req.query or req.params. It can then either:
- completely remove these keys and associated data from the object, or
- replace the prohibited characters with another allowed character.
The behaviour is governed by the passed option, replaceWith. Set this option to have the sanitizer replace the prohibited characters with the character passed in.
# Why?
Object keys starting with a $ or containing a . are reserved for use by MongoDB as operators. Without this sanitization, malicious users could send an object containing a $ operator, or including a ., which could change the context of a database operation. Most notorious is the $where operator, which can execute arbitrary JavaScript on the database.
The best way to prevent this is to sanitize the received data, and remove any offending keys, or replace the characters with a 'safe' one.
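As a rough illustration of the first option, the sketch below removes any key that starts with $ or contains a . from a nested object. The removeProhibitedKeys function and the sample login payload are hypothetical and written for this example. This is not the package's source code.

```javascript
// Simplified illustration only; NOT the express-mongo-sanitize source code.
function removeProhibitedKeys(value) {
  if (Array.isArray(value)) {
    return value.map((element) => removeProhibitedKeys(element))
  }
  if (value === null || typeof value !== 'object') return value
  const clean = {}
  for (const [key, child] of Object.entries(value)) {
    // Skip keys that MongoDB could interpret as operators or nested paths
    if (key.startsWith('$') || key.includes('.')) continue
    clean[key] = removeProhibitedKeys(child)
  }
  return clean
}

// Made-up malicious payload: $ne would match any password that is not null
const payload = {
  username: 'admin',
  password: { $ne: null },
  'profile.role': 'admin'
}
const cleaned = removeProhibitedKeys(payload)
console.log(cleaned) // { username: 'admin', password: {} }
```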
# Let's install it
npm install express-mongo-sanitize
# Let's implement it
We could implement this middleware on a route by route basis. e.g.
router.post('/:id', sanitizeMongo(), (req, res) => {})
Or at the router level ...
app.use('/api/cars', sanitizeMongo(), carsRouter)
Or at the application level ...
app.use(sanitizeMongo())
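Let's choose the application level -- set it and forget it! Register it after express.json() so that req.body has already been parsed by the time the sanitizer runs.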
Don't forget
You need to require the module before you can use it.
const sanitizeMongo = require('express-mongo-sanitize')
# ES Modules in Node.js
Up until now, all of the code examples that we have looked at have been using the traditional (for Node) CommonJS module syntax using require() and module.exports. When Node.js was created, this module system was developed to fill a gap in the core JavaScript language, which did not have a module system.
Since mid-2018 all major browsers have had support for the new official ECMAScript module standard. Node.js has had experimental support for ES Modules for a couple of years, but as of v14.x, which became the LTS (long term support) version in October 2020, Node.js has official production support for ES Modules. Since this is now the officially supported JavaScript module syntax, you should expect to see teams preferring to develop new applications using this syntax rather than the legacy CommonJS syntax.
ES modules: A cartoon deep-dive
Here is a great primer on JavaScript modules, how they work and what problems they solve.
So, let's convert the simple Node/Express application that we have built in this lesson to use the newer ESM (ECMAScript Modules) import and export syntax. Note that Node.js treats .js files as CommonJS by default. To have them loaded as ES Modules, set "type": "module" in package.json or use the .mjs file extension.
Instead of using the const keyword to declare a new variable and then assign its value with the result of the require() function, the ES Module syntax uses the import keyword to declare the variable and the from keyword to describe the module name or path.
# server.js
# CommonJS
const http = require('http')
const app = require('./app')
const createDebug = require('debug')
const debug = createDebug('week7:httpServer')
const httpServer = http.createServer(app)
const port = process.env.PORT || 3030
httpServer.listen(port, () => {
  debug(`HTTP server is listening on port ${httpServer.address().port}`)
})
# ES Modules
import http from 'http'
import app from './app.js'
import createDebug from 'debug'
const debug = createDebug('week7:httpServer')
const httpServer = http.createServer(app)
const port = process.env.PORT || 3030
httpServer.listen(port, () => {
  debug(`HTTP server is listening on port ${httpServer.address().port}`)
})
WARNING
Notice that unlike CommonJS modules, ES Modules do not assume the .js file extension. You must explicitly include it in the from module path.
# app.js
Here again you need to make the same substitutions for importing modules at the top of the file. But, this time you also need to update the export syntax.
CommonJS has an exports key on the module object. We can assign any valid JavaScript data type (typically an object or a function) to that key to make it available to import in another module.
ES Modules use the export keyword in front of any variable or function declaration to make that variable or function available as a named export. You can also use the export default keywords together to define the default export. That is the method that we will use here.
# CommonJS
const morgan = require('morgan')
const express = require('express')
const sanitizeMongo = require('express-mongo-sanitize')
const sanitizeBody = require('./middleware/sanitizeBody')
require('./startup/connectDatabase')()
const app = express()
app.use(morgan('tiny'))
app.use(express.json())
app.use(sanitizeMongo())
// routes
app.get('/', (req, res) => res.send('Hello'))
app.post('/test', sanitizeBody, (req, res) => {
  res.status(201).send(req.sanitizedBody)
})
module.exports = app
# ES Modules
import morgan from 'morgan'
import express from 'express'
import sanitizeMongo from 'express-mongo-sanitize'
import sanitizeBody from './middleware/sanitizeBody.js'
import connectDatabase from './startup/connectDatabase.js'
connectDatabase()
const app = express()
app.use(morgan('tiny'))
app.use(express.json())
app.use(sanitizeMongo())
// routes
app.get('/', (req, res) => res.send('Hello'))
app.post('/test', sanitizeBody, (req, res) => {
  res.status(201).send(req.sanitizedBody)
})
export default app
WARNING
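Notice that the technique of immediately invoking the connectDatabase() function on the same line as the CommonJS require statement doesn't work with ES Modules. An import declaration is not an expression that returns a value, so we import the function first and then call it on the next line.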
# connectDatabase.js
# CommonJS
const debug = require('debug')('week7:db')
const mongoose = require('mongoose')
module.exports = function () {
  mongoose
    .connect(`mongodb://localhost:27017/mad9124`, {
      useNewUrlParser: true,
      useCreateIndex: true,
      useFindAndModify: false,
      useUnifiedTopology: true,
    })
    .then(() => {
      debug(`Connected to MongoDB ...`)
    })
    .catch((err) => {
      debug(`Error connecting to MongoDB ...`, err.message)
      process.exit(1)
    })
}
# ES Modules
import mongoose from 'mongoose'
import createDebug from 'debug'
const debug = createDebug('week7:db')
export default function () {
  mongoose
    .connect(`mongodb://localhost:27017/mad9124`, {
      useNewUrlParser: true,
      useCreateIndex: true,
      useFindAndModify: false,
      useUnifiedTopology: true,
    })
    .then(() => {
      debug(`Connected to MongoDB ...`)
    })
    .catch((err) => {
      debug(`Error connecting to MongoDB ...`, err.message)
      process.exit(1)
    })
}
# sanitizeBody.js
# CommonJS
const debug = require('debug')('week7:sanitize')
const xss = require('xss')
const sanitize = (sourceString) => {
  return xss(sourceString, {
    whiteList: [],
    stripIgnoreTag: true,
    stripIgnoreTagBody: ['script'],
  })
}
const stripTags = (payload) => {
  const attributes = Object.assign({}, payload) // same as {...payload}
  for (let key in attributes) {
    if (attributes[key] instanceof Array) {
      attributes[key] = attributes[key].map((element) => {
        return typeof element === 'string'
          ? sanitize(element)
          : stripTags(element)
        // The ternary expression above could be written with this if..else block
        // if (typeof element === 'string') {
        //   return sanitize(element)
        // } else {
        //   return stripTags(element)
        // }
      })
    } else if (attributes[key] instanceof Object) {
      attributes[key] = stripTags(attributes[key])
    } else {
      attributes[key] = sanitize(attributes[key])
    }
  }
  return attributes
}
function sanitizeBodyMiddleware(req, res, next) {
  const { id, _id, ...attributes } = req.body
  req.sanitizedBody = stripTags(attributes)
  next()
}
module.exports = sanitizeBodyMiddleware
# ES Modules
import xss from 'xss'
import createDebug from 'debug'
const debug = createDebug('week7:sanitize')
const sanitize = (sourceString) => {
  return xss(sourceString, {
    whiteList: [],
    stripIgnoreTag: true,
    stripIgnoreTagBody: ['script'],
  })
}
const stripTags = (payload) => {
  const attributes = Object.assign({}, payload) // same as {...payload}
  for (let key in attributes) {
    if (attributes[key] instanceof Array) {
      attributes[key] = attributes[key].map((element) => {
        return typeof element === 'string'
          ? sanitize(element)
          : stripTags(element)
      })
    } else if (attributes[key] instanceof Object) {
      attributes[key] = stripTags(attributes[key])
    } else {
      attributes[key] = sanitize(attributes[key])
    }
  }
  return attributes
}
export default function sanitizeBodyMiddleware(req, res, next) {
  const { id, _id, ...attributes } = req.body
  req.sanitizedBody = stripTags(attributes)
  next()
}
# For next week
Before next week's class, please read these additional online resources.
- Learn and Understand Recursion in JavaScript
- An Introduction to Content Security Policy
- OWASP Node Goat Tutorial
Assignment Reminder
Assignment 2 - Mongo CRUD - is due before the start of class on Monday March 8th.
Quiz
There will be a short quiz next class. The questions could come from any of the material referenced above.
Next week is Break Week
This is a great opportunity to review the course notes and all of the linked hybrid study materials from what we have covered so far.
As a self-assessment tool to help you identify areas that might need more review, I have posted an optional ungraded review quiz covering topics from the first half of the term.