Week 8
Data Sanitization, XSS and Logging

# Agenda

AMA (10 mins)
Better Debugging
Sanitization
XSS
Logging

# Better debugging

Before we dig into recommended practices for guarding against data corruption and scripting attacks through input data validation and sanitization, let's add another tool to our development toolbox.

The debug NPM module is a better console.log that you can turn on and off with an environment variable instead of commenting out the debugging statements in your code. It is used internally by all of the modules in the Express framework, and we can use it in our modules as well.

Install debug as a project dependency.

npm install debug

Make it available in app.js and set the namespace to the application name.

Set the primary namespace

The primary debug namespace should match the name property in package.json.


'use strict'

const debug = require('debug')('week12')
const express = require('express')

require('./startup/database')() // require the module and immediately call the function it exports

const app = express()
app.use(express.json())

const port = process.env.PORT || 3030
app.listen(port, () => debug(`Express is listening on port ${port} ...`))

To keep the main entry module for our project (app.js) as clean as possible, it is a good practice to move one-time set-up activities, like connecting to the database, into separate modules in a /startup or a /bootstrap folder.

Put your Mongoose connection set-up code from week 6 into a new module called /startup/database.js.

Then at the top of that file, require the debug module, setting the namespace to week12:db, and change any console.log() statements to debug() statements. The completed code should look like this.

const debug = require('debug')('week12:db')
const mongoose = require('mongoose')

module.exports = () => {
  mongoose
    .connect(`mongodb://localhost:27017/mad9124`)
    .then(() => {
      debug(`Connected to MongoDB ...`)
    })
    .catch(err => {
      debug(`Error connecting to MongoDB ...`, err.message)
      process.exit(1)
    })
}

# Environment Variables

Remember

The debug module suppresses output by default.

If you run the app now with node app.js it will not output anything to the console. You need to activate it by setting the DEBUG environment variable to our application namespace before running the application.

DEBUG=week12 node app.js

Hmmm ... still not quite right. We only saw the debug message from app.js, not from the database connection module. Debug allows us to be very selective about which module's messages we want to see. If we run it again with ...

DEBUG=week12:db node app.js

... now we only see the database connection message.

To see all related namespaces we can use the * wildcard character.

DEBUG=week12:* node app.js

DEBUG=week12* node app.js

Now you will see any debug output where the namespace starts with week12.
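
To get a feel for how a DEBUG pattern selects namespaces, here is a minimal sketch of the matching idea. It is a simplified model for illustration, not the debug module's actual implementation, and the helper name isEnabled is hypothetical.

```javascript
// Simplified model of DEBUG pattern matching (an assumption for illustration,
// not the debug module's real code). `*` matches any run of characters.
const isEnabled = (pattern, namespace) => {
  // Escape regex metacharacters, then turn each escaped `*` into `.*`.
  const source = pattern
    .replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\\\*/g, '.*');
  return new RegExp(`^${source}$`).test(namespace);
};

console.log(isEnabled('week12', 'week12'));       // true
console.log(isEnabled('week12', 'week12:db'));    // false
console.log(isEnabled('week12:db', 'week12:db')); // true
console.log(isEnabled('week12*', 'week12'));      // true
console.log(isEnabled('week12*', 'week12:db'));   // true
```

An exact name only matches itself, which is why DEBUG=week12 showed the app.js message but not the one from the database module.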

The application start-up command is starting to get long. What if we want to add some other environment variables like NODE_ENV, PORT or API_KEY?

DEBUG=week12* NODE_ENV=dev PORT=3000 API_KEY=91ij08-1fehi=1feef node app.js

There is a library called dotenv that reads a special file called .env and automatically adds all of the values in that file as environment variables for our Node application.

Install the library:

yarn add dotenv

And add it to the very top of our app.js file, so that the values are available before any other module loads. We can simply require the configuration module, and it will set everything up for us!

// app.js

require('dotenv/config');

# .env
NODE_ENV=development
PORT=3000
DEBUG=week12:*
API_KEY=91ij08-1fehi=1feef

With this setup, we can keep all of our environment variables in one place!
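
To make the behaviour less magical, here is a minimal sketch of the idea behind dotenv: read KEY=VALUE lines and copy them into the environment without overwriting variables that are already set. This is a simplified illustration, not the library's real code (which also handles quotes, multi-line values and more), and the helper names parseEnv and applyEnv are hypothetical.

```javascript
// Simplified sketch of the idea behind dotenv (not the library's real code):
// parse KEY=VALUE lines into a plain object.
const parseEnv = text => {
  const values = {};
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue; // skip blanks and comments
    const index = trimmed.indexOf('='); // split on the FIRST "=" only
    if (index === -1) continue;
    values[trimmed.slice(0, index).trim()] = trimmed.slice(index + 1).trim();
  }
  return values;
};

// Copy the parsed values into an environment object.
const applyEnv = (values, env = process.env) => {
  for (const [key, value] of Object.entries(values)) {
    // Variables that are already set keep their value.
    if (env[key] === undefined) env[key] = value;
  }
  return env;
};

const sample = 'NODE_ENV=development\nPORT=3000\nAPI_KEY=91ij08-1fehi=1feef';
const env = applyEnv(parseEnv(sample), { PORT: '8080' });
console.log(env);
```

Note that PORT stays at 8080 because it was already set, and that the API_KEY value keeps its own = character because only the first = separates the name from the value.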

# IMPORTANT NOTE

MAKE SURE TO ADD .env TO YOUR .gitignore FILE!

.env files often contain sensitive information that shouldn't be shared on GitHub, so they should not be committed. Instead, we make a .env.example file, so that anyone who clones our repo knows that these values are needed for our application to run, but won't know the exact sensitive values (like passwords or API keys). It should look like this:

# .env.example
NODE_ENV=
PORT=
DEBUG=
API_KEY=
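
Since the application needs these values to run, one option is to check for them at start-up and report anything that is missing. The following is a minimal sketch under that assumption; the helper name missingEnv and the sample values are hypothetical.

```javascript
// Hypothetical start-up check: list the required variables that are not set.
const missingEnv = (required, env = process.env) =>
  required.filter(name => env[name] === undefined || env[name] === '');

// The same names that appear in .env.example.
const required = ['NODE_ENV', 'PORT', 'DEBUG', 'API_KEY'];

// A made-up environment where only two of the four values are present.
const missing = missingEnv(required, { NODE_ENV: 'development', PORT: '3000' });
if (missing.length > 0) {
  console.log(`Missing environment variables: ${missing.join(', ')}`);
}
```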

# NPM Scripts

Add a start script for production and a dev script to use with our local development environment.

"scripts": {
    "start": "node app.js",
    "dev": "nodemon app.js"
  },

Now you can start your development server by running ...

yarn dev

Or you can start up in production mode with yarn start.

We have more configuration work to do before we are ready for production mode. We will cover that in a few weeks.

# Sanitization

The very nature of our application is to take user input and act on it in some way, either to store information or to search and filter information from the database. The data stored is ultimately returned to the client application.

This means that a web service application is inherently vulnerable to a variety of malicious attacks.

We cannot trust any data coming from the client application. It might not be our code. It may have been hacked. A malicious actor may be using it to directly input data formatted to cause our system to malfunction.

The Open Web Application Security Project (OWASP) is an excellent resource for staying up to date on application security risks and best practices. Among their many resources, the periodically updated OWASP Top 10 Critical Web Application Security Risks should be on your reading list.

Let's look at solving the two most damaging attack types: Cross-site scripting (XSS) and Database Injection. To do that, we need to filter and cleanse all incoming data.

# XSS Protection

To help protect against Cross-site scripting (XSS) attacks you need to strip out any HTML or script tags from the input values so that they would not be interpreted as scripts or alter the browser rendering of any returned data. You can use the xss NPM package to solve this problem.

npm install xss

Build a small middleware function to implement this behaviour. Create a new file in the middleware folder called sanitizeBody.js. Require the debug and xss modules at the top. Then add an empty middleware function signature. (We will fill in the logic in the next step.)

const debug = require('debug')('sanitize:body');
const xss = require('xss');

const sanitizeBody = (req, res, next) => {
  // sanitization logic goes here
  next();
}

module.exports = sanitizeBody;

It is a good practice not to take destructive action on the original request body. So, make a copy to work with and set the resulting sanitized version as a new property of the request object that downstream route handlers can access.

Start by stripping out any id or _id properties. We never want the client to set those.

const {id, _id, ...attributes} = req.body;
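
To see what that destructuring line does, here is a small stand-alone example. The sample body and its values are made up for illustration.

```javascript
// A sample request body (made-up values) that includes both id fields.
const body = { id: 42, _id: 'abc123', name: 'Mickey', email: 'mickey@example.com' };

// id and _id are pulled out into their own variables;
// every other property is collected into the new attributes object.
const { id, _id, ...attributes } = body;

console.log(attributes); // { name: 'Mickey', email: 'mickey@example.com' }
console.log(body.id);    // 42 (the original object is not changed)
```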

The xss module applies its filter rules to strings. To apply it to the various properties of our user supplied data, we will need to loop over the members of the req.body object with a for...in loop.

The xss function takes an optional configuration object to customize how it works.

for (let key in attributes) {
  attributes[key] = xss(attributes[key], {
    whiteList: [], // empty, means filter out all tags
    stripIgnoreTag: true, // filter out all HTML not in the whitelist
    stripIgnoreTagBody: ['script']
    // the script tag is a special case, we need
    // to filter out its content
  })
}

Lastly set the modified attributes as the value of a new req.sanitizedBody property and then call next(). The whole thing should look similar to this.

'use strict'

const debug = require('debug')('sanitize:body');
const xss = require('xss');

const sanitizeBody = (req, res, next) => {
  debug({body: req.body});
  const {id, _id, ...attributes} = req.body;
  debug({attributes});
  for (const key in attributes) {
    attributes[key] = xss(attributes[key], {
      whiteList: [],
      stripIgnoreTag: true,
      stripIgnoreTagBody: ['script']
    });
  }
  debug({sanitizedBody: attributes});
  req.sanitizedBody = attributes;
  next();
}

module.exports = sanitizeBody;

OK, we have our body sanitizer middleware function. How do we use it?

Remember, Express route method declarations can take more than one callback function. This lets us call one or more middleware functions directly for any given route.

app.post('/test', middleware, routeHandler);
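
The way control passes from one callback to the next can be sketched without Express at all. This is a simplified model for illustration, not Express's real dispatcher; runChain and the sample handlers are hypothetical.

```javascript
// Simplified model of how a chain of route callbacks runs (an illustration,
// not Express's real dispatcher). Each handler receives a next() function
// that starts the following handler.
const runChain = (handlers, req, res) => {
  const dispatch = index => {
    if (index >= handlers.length) return;
    handlers[index](req, res, () => dispatch(index + 1));
  };
  dispatch(0);
};

// Hypothetical middleware and route handler for the demonstration.
const middleware = (req, res, next) => {
  req.sanitizedBody = { name: String(req.body.name).trim() };
  next(); // without this call, routeHandler would never run
};
const routeHandler = (req, res) => {
  res.data = req.sanitizedBody;
};

const req = { body: { name: '  Mickey  ' } };
const res = {};
runChain([middleware, routeHandler], req, res);
console.log(res.data); // { name: 'Mickey' }
```

The important detail is that a middleware function attaches its result to the request object and then calls next(), so the route handler further down the chain can read it.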

So, in a router module such as /router/students.js, we could do this ...

// router/students.js
const sanitizeBody = require('../middleware/sanitizeBody');
// ... other set-up (studentRouter, studentController)

studentRouter.post('/', sanitizeBody, studentController.create);

// controllers/students.js
const StudentService = require('../services/Student');
// ... other set-up

const create = async (req, res, next) => {
  try {
    const newStudent = await StudentService.create(req.sanitizedBody);
    res.send({data: newStudent});
  } catch (err) {
    next(err);
  }
}

You can now be reasonably confident that only plain text strings will be passed to Mongoose for validation. Potentially malicious HTML and JavaScript will be removed from input strings.

You can test it with Postman.

screenshot of Postman test

It worked!

But wait. What if the payload attributes are not simple strings? We need to refactor the sanitizeBody middleware to call itself recursively for more complex data structures.

# Recursion

From The Modern JavaScript Tutorial

When a function solves a task, in the process it can call many other functions. A partial case of this is when a function calls itself. That’s called recursion.

Start by creating a new function called stripTags. It should take a single argument - let's call it payload. Now cut the for...in loop from the main function and paste it into the new one. As a best practice, we should not mutate the original payload object that is passed in, so let's make a copy of that with the line const attributes = {...payload}. Don't forget to return the sanitized attributes at the end of this new function.

Then call the new stripTags function from within the primary function. The refactored middleware should now look like this.


const debug = require('debug')('sanitize:body')
const xss = require('xss')

const stripTags = payload => {
  const attributes = {...payload};
  for (const key in attributes) {
    attributes[key] = xss(attributes[key], {
      whiteList: [],
      stripIgnoreTag: true,
      stripIgnoreTagBody: ['script']
    });
  }
  return attributes;
}

const sanitizeBody = (req, res, next) => {
  debug({body: req.body})
  const {id, _id, ...attributes} = req.body
  debug({attributes})
  const sanitizedBody = stripTags(attributes)
  debug({sanitizedBody: sanitizedBody})
  req.sanitizedBody = sanitizedBody
  next()
}

module.exports = sanitizeBody

This version should be functionally equivalent to what we had before. Test it with Postman to be sure that it is working.

OK, now we can augment the stripTags function to check for objects and then call itself to loop over that nested object and sanitize its properties. Wrap the logic inside the for...in loop in an if/else block.


for (let key in attributes) {
  if (attributes[key] instanceof Object) {
    attributes[key] = stripTags(attributes[key]);
  } else {
    attributes[key] = xss(attributes[key], {
      whiteList: [],
      stripIgnoreTag: true,
      stripIgnoreTagBody: ['script']
    });
  }
}

Test that with Postman ...

screenshot of Postman test

Great! Now let's make sure that we handle Arrays properly. Since arrays inherit from Object in JavaScript, we need to check for that case first. Then instead of just recursively calling stripTags on the array, we need to use the .map() method to loop over the array.

Each element could be another complex object or a simple string. We need to check and handle both cases. If it is a string we can call xss() otherwise call stripTags() again.

Rather than duplicate the code for the xss() function call, let's extract that to a separate function in our module.


const sanitize = sourceString => {
  return xss(sourceString, {
    whiteList: [],
    stripIgnoreTag: true,
    stripIgnoreTagBody: ['script']
  })
}

Then the conditional block in the stripTags() function becomes ...

if (attributes[key] instanceof Array) {
  attributes[key] = attributes[key].map(element => {
    return typeof element === 'string' ? sanitize(element) : stripTags(element)
  })
} else if (attributes[key] instanceof Object) {
  attributes[key] = stripTags(attributes[key])
} else {
  attributes[key] = sanitize(attributes[key])
}

The return statement of the map() method above is using JavaScript's ternary operator, rather than a more verbose if/else block.

Writing out that single ternary expression the long way would look something like this ...

let cleanedElement
if (typeof element === 'string') {
  cleanedElement = sanitize(element)
} else {
  cleanedElement = stripTags(element)
}
return cleanedElement

OK. Test that with Postman ...

screenshot of Postman test

Phew!

The final version of the sanitizeBody.js middleware module should look similar to this (with no debug statements).


const xss = require('xss')

const sanitize = sourceString => {
  return xss(sourceString, {
    whiteList: [],
    stripIgnoreTag: true,
    stripIgnoreTagBody: ['script']
  });
}

const stripTags = payload => {
  const attributes = { ...payload } // don't mutate the source data
  for (let key in attributes) {
    if (attributes[key] instanceof Array) {
      attributes[key] = attributes[key].map(element => {
        return typeof element === 'string'
          ? sanitize(element) // if true
          : stripTags(element); // if false
      });
    } else if (attributes[key] instanceof Object) {
      attributes[key] = stripTags(attributes[key]);
    } else {
      attributes[key] = sanitize(attributes[key]);
    }
  }
  return attributes;
}

const sanitizeBody = (req, res, next) => {
  const { id, _id, ...attributes } = req.body;
  const sanitizedBody = stripTags(attributes);
  req.sanitizedBody = sanitizedBody;
  next();
}

module.exports = sanitizeBody;

OK. Now we can be reasonably sure that all HTML tags and scripts will be removed from all req.body properties, no matter how deeply they are buried.
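
One edge case is worth considering. In the version above, a number inside an array is passed to stripTags(), and spreading a number into an object produces an empty object, so [1, 2] would come back as [{}, {}]. Below is a sketch of a variant that leaves non-string values alone and also recurses into nested arrays. The names deepSanitize and cleanString are hypothetical, and cleanString is only a stand-in so the example runs without dependencies; in the real module you would pass the xss-based sanitize function instead.

```javascript
// Stand-in string cleaner so the sketch runs on its own (an assumption for
// illustration only; in the real module pass the xss-based sanitize function).
const cleanString = text => text.replace(/<[^>]*>/g, '');

// Hypothetical variant of stripTags that handles every JSON value type.
const deepSanitize = (value, clean = cleanString) => {
  if (typeof value === 'string') return clean(value);
  if (Array.isArray(value)) return value.map(item => deepSanitize(item, clean));
  if (value !== null && typeof value === 'object') {
    const copy = {}; // build a new object, don't mutate the source data
    for (const [key, nested] of Object.entries(value)) {
      copy[key] = deepSanitize(nested, clean);
    }
    return copy;
  }
  return value; // numbers, booleans and null pass through unchanged
};

const payload = {
  name: '<b>Mickey</b>',
  age: 21,
  tags: ['<i>a</i>', 7, ['<u>nested</u>']],
  address: { city: '<p>Ottawa</p>', verified: false, unit: null }
};
console.log(deepSanitize(payload));
```

Checking the type of each value first means the recursion has a single entry point for strings, arrays and objects, rather than separate branches inside the loop.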

# Database Injection

Using Mongoose, which enforces schema validation and attempts type coercion to ensure that only data in the correct format is stored, goes a long way toward protecting MongoDB from various attacks. However, there are still some cases that are not covered.

Read A NoSQL Injection Primer with Mongo to learn more about this kind of vulnerability.

To help protect against database injection attacks, we will use the Express Mongo Sanitize (express-mongo-sanitize) NPM package.

From the docs ...

# What?

This module searches for any keys in objects that begin with a $ sign or contain a ., from req.body, req.query or req.params. It can then either completely remove these keys and associated data from the object, or replace the prohibited characters with another allowed character.

The behaviour is governed by the passed option, replaceWith. Set this option to have the sanitizer replace the prohibited characters with the character passed in.

# Why?

Object keys starting with a $ or containing a . are reserved for use by MongoDB as operators. Without this sanitization, malicious users could send an object containing a $ operator, or including a ., which could change the context of a database operation. Most notorious is the $where operator, which can execute arbitrary JavaScript on the database.

The best way to prevent this is to sanitize the received data, and remove any offending keys, or replace the characters with a 'safe' one.
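
To make the threat concrete, here is a made-up payload that smuggles in a query operator, followed by a minimal sketch of the "remove" behaviour described above. It is a simplified illustration, not the package's real code, and the helper name removeProhibitedKeys is hypothetical.

```javascript
// A login-style payload where the attacker sends an operator object
// instead of a plain string (made-up example).
const malicious = { email: 'admin@example.com', password: { $ne: null } };

// Simplified sketch of the "remove" behaviour (not the package's real code):
// drop any key that starts with "$" or contains ".", at any depth.
const removeProhibitedKeys = value => {
  if (Array.isArray(value)) return value.map(removeProhibitedKeys);
  if (value === null || typeof value !== 'object') return value;
  const cleaned = {};
  for (const [key, nested] of Object.entries(value)) {
    if (key.startsWith('$') || key.includes('.')) continue;
    cleaned[key] = removeProhibitedKeys(nested);
  }
  return cleaned;
};

console.log(removeProhibitedKeys(malicious));
// { email: 'admin@example.com', password: {} }
```

If the original payload were used directly as a query filter, the $ne operator would match any document whose password is not null. Once the key is removed, the value can no longer act as an operator.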

# Let's install it

yarn add express-mongo-sanitize

# Let's implement it

We could implement this middleware on a route by route basis. e.g.

// At the route level
router.post('/:id', sanitizeMongo(), (req, res) => {})

// Or at the router level ...
app.use('/api/cars', sanitizeMongo(), carsRouter);

// Or at the application level ...
app.use(sanitizeMongo());

Let's choose the application level -- set it and forget it!

Don't forget

You need to require the module before you can use it.

const sanitizeMongo = require('express-mongo-sanitize');

# Logging

In production applications, it is a good practice to use a logger like Winston instead of writing debug or console.log statements. Instead of just writing debug statements to the console, a logger can also write them out to a file or database table that can be analysed later.

A logger lets you tag your output with a severity level. This makes it easier to filter for the right level of detail.
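
The filtering idea can be sketched in a few lines. The level numbers below are Winston's default npm-style levels, where a lower number means a more severe message; the helper name shouldLog is hypothetical and this is not Winston's own code.

```javascript
// Winston's default (npm-style) levels: a lower number means more severe.
const levels = { error: 0, warn: 1, info: 2, http: 3, verbose: 4, debug: 5, silly: 6 };

// Hypothetical helper: a message is written when it is at least as severe
// as the configured level.
const shouldLog = (configuredLevel, messageLevel) =>
  levels[messageLevel] <= levels[configuredLevel];

console.log(shouldLog('info', 'error')); // true
console.log(shouldLog('info', 'debug')); // false
console.log(shouldLog('error', 'warn')); // false
```

So a logger configured with the level info records error, warn and info messages, and ignores the more detailed ones.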

# Winston

Let's install and setup Winston.

npm install winston

Now define a new logger.js module in the /util folder.

// src/util/logger
'use strict'

const winston = require('winston');
const logger = winston.createLogger({
  level: 'info',
  format: winston.format.json(),
  defaultMeta: {service: 'student-service'},
  transports: [
    //
    // - Write to all logs with level `info` and below to `combined.log`
    // - Write all logs error (and below) to `error.log`.
    //
    new winston.transports.File({filename: 'error.log', level: 'error'}),
    new winston.transports.File({filename: 'combined.log'})
  ]
});

//
// If we're not in production then log to the `console` with the format:
// `${info.level}: ${info.message} JSON.stringify({ ...rest }) `
//
if (process.env.NODE_ENV !== 'production') {
  logger.add(
    new winston.transports.Console({
      format: winston.format.simple()
    })
  );
}

module.exports = logger;

Import this logger module in every other module and update any console.log or debug statements. For example, in the database connection module ...

// src/util/db.js
const mongoose = require('mongoose');
const logger = require('./logger');

module.exports = () => {
  mongoose
    .connect(process.env.MONGO_URL)
    .then(() => {
      logger.log('info', `Connected to MongoDB ...`);
    })
    .catch(err => {
      // logger.error() already sets the level, so pass only the message
      logger.error(`Error connecting to MongoDB ... ${err.message}`);
      process.exit(1);
    })
}

Winston is a very flexible logger with a lot of additional capabilities. Be sure to read the full documentation to know what it can do.

# NODE_ENV

Like in the Winston configuration above, it is very common to have conditional logic in the main app startup module(s) that set different behaviours based on the deployment environment. For example, we generally want to see debugging or detailed logging during development, but want to suppress these in production.

As you know we can access command shell environment variables using the process.env object. It is a common best practice to set the NODE_ENV environment variable to indicate the current runtime environment. Some typical values are: dev, development, test, staging, prod, production.

If NODE_ENV is not set, process.env.NODE_ENV returns undefined.

Because this is such a common use case, Express provides another way to read the NODE_ENV environment variable – using app.get('env'). If NODE_ENV is not set, app.get('env') returns development as a default value.
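
When you read the variable directly from process.env, you can apply the same fallback yourself. Here is a minimal sketch; the helper names currentEnvironment and isProduction are hypothetical.

```javascript
// Hypothetical helper that mirrors the fallback described above:
// use NODE_ENV when it is set, otherwise assume 'development'.
const currentEnvironment = (env = process.env) => env.NODE_ENV || 'development';

const isProduction = (env = process.env) => currentEnvironment(env) === 'production';

console.log(currentEnvironment({}));                         // development
console.log(currentEnvironment({ NODE_ENV: 'production' })); // production
console.log(isProduction({ NODE_ENV: 'test' }));             // false
```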

To declare the production environment run this command in the terminal.

export NODE_ENV=production

To set it back to development simply reset the value of the NODE_ENV variable.

export NODE_ENV=development

# Generate JWT Secret

The JWT secret key can be any string value. To make it more effective, it should be at least 30 characters and preferably random. Here is a simple script to generate a new random key. Create a new top-level file called genKey.js.

console.log([...Array(30)].map(e => ((Math.random() * 36) | 0).toString(36)).join(''))

When you are setting up your deployment environment, run the script and then copy the output to set the environment variable.

node genKey.js
ixzz7ph7goovu62b6hz3k6egyghhbn

export APP_JWTKEY=ixzz7ph7goovu62b6hz3k6egyghhbn
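
Math.random() is not a cryptographically secure source of randomness, so for a real deployment you may prefer Node's built-in crypto module. The following is a sketch of an alternative genKey.js under that assumption; the function name generateKey and the choice of 32 bytes are illustrative.

```javascript
const crypto = require('crypto');

// 32 random bytes from the operating system's secure generator,
// encoded as 64 hexadecimal characters.
const generateKey = (bytes = 32) => crypto.randomBytes(bytes).toString('hex');

const key = generateKey();
console.log(key);
```

The output is used the same way as before: run the script once and copy the value into the APP_JWTKEY environment variable.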

DANGER

The JWT secret key should only be changed if there is a suspected security breach. Changing it will immediately invalidate all existing tokens.

# Compression

To improve network communications performance, we look for any opportunity to reduce the payload size for any given request/response cycle. One such possibility is to use standard text compression algorithms on the response payload.

The NPM module called compression is a Node.js compression middleware. It will attempt to compress the response.body using gzip for responses with a compatible Content-Type header value (e.g. text/html or application/json). See the compressible module for default behaviour.

npm install compression

const compression = require('compression');
const express = require('express');
const app = express();

// attempt to compress all routes
app.use(compression());
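
To get a sense of the savings, you can gzip a JSON payload directly with Node's built-in zlib module. This sketch only illustrates the effect of gzip on repetitive text; the payload is made up, and the middleware's own behaviour (thresholds, content negotiation) is not modelled here.

```javascript
const zlib = require('zlib');

// A made-up, repetitive JSON payload similar to a list response from an API.
const payload = JSON.stringify({
  data: Array.from({ length: 200 }, (_, i) => ({ id: i, name: 'student', active: true }))
});

const original = Buffer.byteLength(payload);
const compressed = zlib.gzipSync(payload).length;

console.log(`original: ${original} bytes, gzipped: ${compressed} bytes`);
```

JSON responses tend to repeat the same property names many times, which is exactly the kind of text that compresses well.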

# Security Middleware

As we have discussed before, good application security is not "a feature" and not "a bolt-on module". It is about applying multi-layered best practices throughout the design and development cycle. As we prepare to deploy our web service to a production environment, CORS and Helmet are two important middleware modules that help.

# CORS

Read this HTML5 Rocks backgrounder on Cross-Origin Resource Sharing (CORS).

The use-case for CORS is simple. Imagine the site alice.com has some data that the site bob.com wants to access. This type of request traditionally wouldn’t be allowed under the browser’s same origin policy. However, by supporting CORS requests, alice.com can add a few special response headers that allows bob.com to access the data.

To manage CORS in our Express web service use the cors package from NPM.

npm install cors

Then require it in the main app.js module and apply it as the first middleware with the default settings.

const express = require('express');
const cors = require('cors');

const app = express();

app.use(cors());
// other middleware goes here

The cors() middleware constructor function takes an optional configuration object. The default settings are equivalent to setting the options object with these values.

{
  "origin": "*",
  "methods": "GET,HEAD,PUT,PATCH,POST,DELETE",
  "preflightContinue": false,
  "optionsSuccessStatus": 204
}

This will let a client app served from any domain name access the API resources.
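
If you would rather limit access to known client applications, the decision can be sketched as a simple allow-list check. This is a simplified illustration of the idea, not the cors package's real code; the origins and the helper name corsHeadersFor are hypothetical.

```javascript
// Hypothetical allow-list; replace with the origins of your own client apps.
const allowedOrigins = ['https://app.example.com', 'http://localhost:5173'];

// Simplified sketch of the decision a CORS middleware makes per request
// (not the cors package's real code).
const corsHeadersFor = requestOrigin => {
  // No CORS header means the browser will not let the page read the response.
  if (!allowedOrigins.includes(requestOrigin)) return {};
  return {
    'Access-Control-Allow-Origin': requestOrigin,
    Vary: 'Origin' // the response differs per origin, so caches must key on it
  };
};

console.log(corsHeadersFor('https://app.example.com'));
console.log(corsHeadersFor('https://evil.example.net')); // {}
```

With the cors package, a similar restriction can be configured by passing an origin option, for example cors({ origin: allowedOrigins }).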

# Helmet

The helmet NPM module is an integrated middleware package (with 14 sub-modules) that implements many network communications security best practices.

It's not a silver bullet, but it can help!

npm install helmet

const express = require('express');
const helmet = require('helmet');

const app = express();

app.use(helmet());

Not all of the included middleware functions are enabled by default. See the full documentation for all of the options and details about the kinds of attacks they help to prevent.

# NPM audit

Both NPM and Yarn have a built-in audit command (npm audit or yarn audit) that scans the entire dependency tree for known security vulnerabilities. It outputs a list of packages that may need to be updated or replaced with a more secure library module. It is a good idea to run this check regularly, as new attack vectors are discovered and reported on an ongoing basis.

yarn audit

# Assignment

Assignment 3 - Pokemon API with Authentication - is due before next week's class.
