File I/O, Streams, File Upload

In this topic we will look at how to perform file input/output (file I/O) in Node.js. This allows you to create new files on your server, or open files which already exist there. We will also look at how to upload files to a server in Node.js.

The Node.js fs module

Node.js comes with an inbuilt module fs to handle basic file I/O (see the documentation). You can open an existing file and loop through each line, read an entire file in one go, or write to a new file. There are two versions of fs: the non-promise (callback-based) version and the promise-based version. We will use the promise-based version, as it allows simpler code using async/await. Here are some examples:


import * as fs from 'fs/promises'; // use promise-based version

async function fileTest() {
    try {
        const fp = await fs.open('bytes.bin', 'r'); // open in read mode
        const buffer = Buffer.alloc(10); // allocate a 10-byte buffer
        const { bytesRead } = await fp.read({buffer: buffer});
        console.log(buffer);
        if(bytesRead != 10) {
            console.error("Less than 10 bytes read. File corrupted");
            process.exit(1);
        }
        await fp.close();
    } catch(e) {
        console.error(e);
        process.exit(1);
    }
}

fileTest();

This example reads in 10 bytes from a file and stores them in a Buffer. A Buffer is a Node.js data structure used to represent bytes in memory. The first thing we need to do is open the file:

const fp = await fs.open('bytes.bin', 'r');
This opens the file in read mode (hence 'r'), and resolves with a FileHandle (the variable fp in the example above). A FileHandle is a 'pointer' to an open file which can be used to perform operations on it; when opened it 'points' to the start of the file but as the file is read, it moves through the file.

Note how we then allocate memory for a buffer with a capacity of 10 bytes. We then call the read() method of the FileHandle to read bytes from the file into the buffer. This resolves with an object containing two properties:

- bytesRead - the number of bytes actually read;
- buffer - the buffer the bytes were read into (either the one we supplied, or one allocated for us).

Note the line:
const { bytesRead } = await fp.read({buffer: buffer});
What is this doing? fp.read() resolves with an object containing a bytesRead property representing the number of bytes read. So we could equally well do:
const obj = await fp.read({buffer: buffer});
if(obj.bytesRead != 10) { 
    // ...
However, destructuring syntax ({ }) allows us to "unpack" the object into variables with names corresponding to the property names of the object. So if we wanted to use the buffer property of the object returned from read() (which we might do if we didn't create a Buffer ourselves, as a Buffer of 16384 bytes is created for us if we don't supply one) then we could also obtain it using destructuring:
const { buffer, bytesRead } = await fp.read({});
Note here that we do not pass in a buffer to read(), and instead give it an empty options object; in this case read() will allocate a buffer for us.
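As an aside, the following sketch (assuming bytes.bin contains at least 10 bytes) illustrates the point made earlier about the FileHandle moving through the file: two successive read() calls return consecutive chunks of the file.

import * as fs from 'fs/promises';

async function successiveReads() {
    const fp = await fs.open('bytes.bin', 'r');
    // each read() starts where the previous one finished
    const first = await fp.read({ buffer: Buffer.alloc(5) });  // bytes 0-4
    const second = await fp.read({ buffer: Buffer.alloc(5) }); // bytes 5-9
    console.log(first.buffer, second.buffer);
    await fp.close();
}

successiveReads();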

Writing to file

The next example will illustrate writing to file.

import * as fs from 'fs/promises'; // use promise-based version

async function fileWriteTest() {
    try {
        const array = [ 1, 4, 9, 16, 25, 36, 49, 64, 81, 100 ];
        const buffer = Buffer.from(array); // create a buffer holding these bytes
        const fp = await fs.open('outbytes.bin', 'w'); // open in write mode
        const { bytesWritten } = await fp.write(buffer);
        console.log(`Number of bytes written: ${bytesWritten}`);
        await fp.close();
    } catch(e) {
        console.error(e);
        process.exit(1);
    }
}

fileWriteTest();
Note how we:

- create a Buffer from an array of numbers using Buffer.from();
- open the file in write mode (hence 'w');
- write the buffer to the file with the FileHandle's write() method, which resolves with an object containing a bytesWritten property, the number of bytes written.

Note that the write() method of the FileHandle will also accept a string, rather than a buffer. This allows us to easily write text to file. The following example shows this:

import * as fs from 'fs/promises'; // use promise-based version

async function textFileWriteTest() {
    try {
        const array = [ "C++", "Java", "JavaScript", "PHP", "Python", "Kotlin" ];
        const fp = await fs.open('languages.txt', 'w');
        for(const language of array) {
            await fp.write(`${language}\n`); // write each string, adding a newline
        }
        await fp.close();
    } catch(e) {
        console.error(e);
        process.exit(1);
    }
}

textFileWriteTest();
This example creates an array of strings and then loops through them with a for/of loop. We write each string in turn: note how we need to add a newline character (\n) ourselves, as write() does not add one for us.
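As an aside, the promise-based fs module also provides a higher-level writeFile() function (the counterpart of readFile(), covered below) which could achieve the same result in a single call. A sketch:

import * as fs from 'fs/promises';

async function writeWholeFile() {
    const array = [ "C++", "Java", "JavaScript", "PHP", "Python", "Kotlin" ];
    // join() builds one string with a newline between each language;
    // we add a trailing newline to match the loop version above
    await fs.writeFile('languages.txt', array.join('\n') + '\n');
}

writeWholeFile();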

Higher-level operations

We can also perform some commonly-used higher-level operations with the fs package with minimal code. For example we can read in the entire contents of a file with fs.readFile():

import * as fs from 'fs/promises'; // use promise-based version

async function readWholeFile() {
    const wholeFile = (await fs.readFile("data.txt")).toString();
    const lines = wholeFile.split("\n");
    console.log(lines);
}

readWholeFile();
Note how readFile() resolves with a buffer containing the entire contents of the file, which is then converted to a string with toString(). We then split the string into an array of lines; note how we can use the split() method, with a given delimiter string, to split a string into an array at a certain character. (We could use the same technique to split a line at the commas in a CSV file.)
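To illustrate the CSV idea, here is a sketch, assuming a hypothetical file data.csv with one comma-separated record per line:

import * as fs from 'fs/promises';

async function readCsv() {
    const csv = (await fs.readFile("data.csv")).toString();
    for(const line of csv.split("\n")) {
        // split each line into its individual fields at the commas
        console.log(line.split(","));
    }
}

readCsv();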

Streams

To understand some features of Node file I/O it is useful to understand the meaning of the term stream. A stream represents the flow of data from one device to another. Examples of streams include:

- reading from, or writing to, a file;
- keyboard input (standard input) and screen output (standard output);
- data being received from, or sent across, a network.

Streams divide into input and output streams. An input stream represents input to the program (from file, keyboard, network, etc), while an output stream represents output from the program (to file, screen, network etc). In Node parlance, input streams are known as readable streams and output streams as writable streams.

The diagram below shows examples of streams:

[Diagram: examples of readable and writable streams]

Pipes

A nice feature of the Node.js file I/O functionality is the ability to pipe streams. What does this mean? Remember, from above, that a stream is a flow of bytes from one entity to another. We can connect one stream to another, to transfer information from an input source to an output destination.

Here is the simplest possible use of a pipe. We pipe standard input (process.stdin) to standard output (process.stdout). What will this do? It will read in input from the keyboard and echo it to the screen. The stream of information into the program from the keyboard is piped straight out to the screen (standard output).

process.stdin.pipe(process.stdout);
This example will continue until the user terminates the input stream with Control-D.

Here is an example which will pipe standard input to a file:

import * as fs from 'fs';
const outStream = fs.createWriteStream('output.txt');
process.stdin.pipe(outStream);
Again this will continue until the user terminates standard input with Control-D. Here is an example which will create an input stream from a file, and stream it to standard output (i.e. the file contents will appear on screen):
import * as fs from 'fs'; 
const inStream = fs.createReadStream('file.txt');
inStream.pipe(process.stdout);
and finally here is an example which will copy one file to another by opening an input stream from the first file, an output stream to the second file, and pipe the input stream to the output stream:
import * as fs from 'fs';
const inStream = fs.createReadStream('file1.txt');
const outStream = fs.createWriteStream('file2.txt');
inStream.pipe(outStream);
You can also build more complex pipelines by chaining multiple pipe() calls; such a technique can be used, for example, to zip files using a zip stream from the zlib module. The following example (adapted from the documentation) reads one file, pipes it through a gzip stream to compress it to gz (gzip) format, and then outputs the compressed data to a .gz file. The data is piped through all three streams.
import * as fs from 'fs';
import * as zlib from 'zlib';
const instrm = fs.createReadStream('file2.txt');
const outstrm = fs.createWriteStream('file2.txt.gz');
const z = zlib.createGzip();
instrm.pipe(z).pipe(outstrm);
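As an aside, modern versions of Node also provide a promise-based pipeline() function in the stream/promises module, which connects a chain of streams in one call and rejects if any stream in the chain fails. A sketch of the same gzip example using it:

import * as fs from 'fs';
import * as zlib from 'zlib';
import { pipeline } from 'stream/promises';

async function compress() {
    try {
        // pipeline() wires the streams together and resolves when all
        // data has flowed through
        await pipeline(
            fs.createReadStream('file2.txt'),
            zlib.createGzip(),
            fs.createWriteStream('file2.txt.gz')
        );
        console.log('Compression complete.');
    } catch(e) {
        console.error(e);
    }
}

compress();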

Pipes and the web

If we connect to a remote web API from our own server, we receive a stream of bytes from the remote API. Now, let's say we want our server to act as a proxy, or middle-person, between the remote API and the user, so that the user requests a resource from our own server but it is ultimately sourced from a remote third-party API (an example of when you might want to do this is given below). In this case, there is no reason to save the response from the remote API in the Node server's memory; you can instead just forward - or pipe - the stream on to the ultimate client, i.e. the web browser.


Proxies

A good example of where this might be useful is where a remote API requires an API key. An API key is a code, unique to a given registered user, which is used to access certain APIs which may have usage limits, or require payment for heavy use. Use of an API key ensures that users of the remote API do not overload the API's servers, or allows charging for APIs if a client exceeds a certain number of requests, e.g. per month. Many mapping providers use API keys. API keys, however, need to be secret, so you will not want to include them in your client-side JavaScript code as otherwise people might 'steal' them. So a better pattern is for your client-side JavaScript to access the remote API via a proxy route on your own server. The proxy route can forward the request on to the remote API and add the API key. So the API key can, for example, be stored in a .env file.
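For instance, the .env file might contain an entry such as the following (the value shown is of course made up):

# .env - kept on the server and out of version control
API_KEY=abc123exampleonly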

So here is an example of a proxy route which forwards a request on to a remote API, adding an API key, and using piping:

app.get('/map/:z/:x/:y.png', async(req, res) => {
    const mapsResponse = await fetch(`https://megamaps.example.com/map/${req.params.z}/${req.params.x}/${req.params.y}.png?apiKey=${process.env.API_KEY}`);
    // When piping we need to set the correct content type on our headers
    res.set({'Content-Type': 'image/png'});
    mapsResponse.body.pipe(res);
});
Note how this is working. We use the node-fetch API (you should install version 2 - npm install node-fetch@2 - for this to work as written, since version 3 is distributed as an ES-module-only package and works slightly differently) to perform a fetch request to the megamaps.example.com remote map server, forwarding on the requested x, y and z tile coordinates and adding the API key stored in .env. We await the response and then take the body of the response: this will be a stream of the bytes making up the requested tile. This stream is piped into the response from our own route (i.e. res), which results in the map tile being forwarded directly to the ultimate client - the browser - without being processed by our own server.
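As an aside, this piping works because node-fetch exposes the response body as a Node stream. If you use Node's built-in fetch (available from Node 18) instead, the body is a web ReadableStream, which has no pipe() method; one approach is to convert it to a Node stream first, as in this sketch (replacing the final line of the route above):

import { Readable } from 'stream';

// Convert the web stream from built-in fetch into a Node readable stream,
// which can then be piped into our route's response as before
Readable.fromWeb(mapsResponse.body).pipe(res);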

Another use of proxy routes is to allow an AJAX front end on our own server to access remote APIs which do not have CORS set up. The same-origin policy prohibits AJAX requests to such third-party servers, but we can circumvent this by using a proxy. Our AJAX front end communicates with the proxy on our own server, and the proxy forwards the request on to the remote API. Again, we can set up a pipe on our proxy route which forwards the response stream on to our AJAX front end. Here is a real example, used to access the OpenStreetMap Nominatim API, which provides search services:

app.get('/nominatimSearch/:query', async(req, res) => {
    const nominatimResponse = await fetch(`https://nominatim.openstreetmap.org/search?format=json&q=${req.params.query}`, {
        headers: {
            'User-Agent': 'OpenTrailView 4',
            'Referer': 'https://opentrailview.org'
        }
    });

    // When piping we need to set the correct content type on our headers
    res.set({'Content-Type': 'application/json'});
    nominatimResponse.body.pipe(res);
});

Stream events

Streams emit certain events which we can handle. For example, the error event indicates an error (e.g. opening a non-existent file), the end event is emitted by readable (input) streams when all data has been read, and the finish event is emitted by writable (output) streams when all data has been written. For example:

import * as fs from 'fs';

const instrm =  fs.createReadStream('file1.txt');
instrm.on("error", e => { console.error(`Input stream: ERROR ${e}`); } );
instrm.on("end", () => console.log("READ...") );

const outstrm = fs.createWriteStream('file2.txt');
outstrm.on("error", e => { console.error(`Output stream: ERROR ${e}`); } );
outstrm.on("finish", () => console.log("WRITTEN...") );

instrm.pipe(outstrm);
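Readable streams also emit a data event each time a chunk of data becomes available, which gives a way of consuming a stream without piping it. A brief sketch:

import * as fs from 'fs';

const instrm = fs.createReadStream('file1.txt');
// a 'data' event fires for each chunk read from the file
instrm.on("data", chunk => console.log(`Read a chunk of ${chunk.length} bytes`));
instrm.on("end", () => console.log("All data read"));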
More information on streams can be found in the API documentation.

File upload

Having looked at file I/O in general, we will now consider how to upload files to a server. When a file is uploaded, it is sent in multipart/form-data format. An example of this is shown below; here we are uploading a file called cats_small.jpg via an item of POST data called userPhoto.

POST /photos/upload HTTP/1.1
Host: example.com
Content-Type: multipart/form-data; boundary=-----------------------------36350044621708912132908698620

-----------------------------36350044621708912132908698620
Content-Disposition: form-data; name="userPhoto"; filename="cats_small.jpg"
Content-Type: image/jpeg

[JPEG bytes follow here]
-----------------------------36350044621708912132908698620--
As can be seen, the POST data begins and ends with a delimiter (the -----------------------------36350044621708912132908698620 here; the end delimiter has two additional dashes), and the opening delimiter is followed by a Content-Disposition header describing the data, including its identifier (userPhoto here) and the filename of the file we're uploading (cats_small.jpg here).

If a form is sent with multipart/form-data, then any non-file POST data will be sent in the same way too. Here is an example in which we are sending an item of POST data called message with the photo:

POST /photos/upload HTTP/1.1
Host: example.com
Content-Type: multipart/form-data; boundary=-----------------------------36350044621708912132908698620

-----------------------------36350044621708912132908698620
Content-Disposition: form-data; name="userPhoto"; filename="cats_small.jpg"
Content-Type: image/jpeg

[JPEG bytes follow here]
-----------------------------36350044621708912132908698620
Content-Disposition: form-data; name="message"

A photo of two street cats, Binnie and Clyde.
-----------------------------36350044621708912132908698620--
See the Mozilla documentation. Note how the same delimiter is used to separate each piece of data.

As should be apparent, multipart/form-data is trickier to parse on the server side than straightforward POST data, and for that reason you will probably want to use a third-party library to handle file uploads.

Specifying the form on the client-side

Before we look at how to handle file uploads on the server, we need to look at how we can handle multipart/form-data forms on the client side. You must set the enctype (encoding type) of the form to multipart/form-data, and include a form field of type file to allow the user to select the file to upload:

<form method='post' enctype='multipart/form-data'>
Select your file: <input type='file' id='userPhoto' />
<input type='button' id='uploadBtn' value='Upload!' />
</form>
Here is some JavaScript code which could then be used to upload the files:
// run the upload when the Upload button is clicked
document.getElementById("uploadBtn").addEventListener("click", async() => {
    const photoFiles = document.getElementById("userPhoto").files;
    if (photoFiles.length == 0) {
        alert('No files selected!');
    } else {
        const formData = new FormData();
        formData.append("userPhoto", photoFiles[0]);
        const response = await fetch('/photos/upload', {
            method: 'POST',
            body: formData
        });
        // ... continue ...
    }
});
Note how this works:

- we obtain the selected file(s) via the files property of the file input field;
- if no file was selected, we display an error;
- otherwise, we create a FormData object, append the first selected file to it under the name userPhoto (the name the server will use to access it), and send it to the server as the body of a POST request made with fetch().

The server side

Having looked at how we upload files on the client side, how do we then process them on the server? As discussed above, we can make the job easier by relying on a third-party Node module; we are going to use express-fileupload. First of all we have to add this as middleware to our Express app:

import express from 'express';
const app = express();
import fileUpload from 'express-fileupload';
import 'dotenv/config';

app.use(fileUpload({
    useTempFiles: true,
    tempFileDir: process.env.TMPDIR,
    limits: { fileSize: process.env.UPLOAD_LIMIT_IN_MB * 1024 * 1024 }
}));
Note how we import the express-fileupload module and then add it to our app as middleware. We configure it with various settings:

- useTempFiles - store uploaded files in temporary files, rather than in memory, which is more efficient for large files;
- tempFileDir - the directory in which to store the temporary files, here read from the TMPDIR variable in our .env file;
- limits.fileSize - the maximum allowed upload size in bytes, here calculated from an UPLOAD_LIMIT_IN_MB variable in .env.
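The corresponding entries in the .env file might look something like this (the values are hypothetical):

# Hypothetical .env entries for the upload middleware and route below
TMPDIR=/tmp
UPLOAD_LIMIT_IN_MB=4
PERMANENT_UPLOAD_DIR=/var/www/uploads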

Once this middleware is added, the uploaded files can be accessed via req.files. You would probably set up an upload route, e.g.:

app.post('/photos/upload', async(req, res) => {
    try {
        const fileName = req.files.userPhoto.name;
        await req.files.userPhoto.mv(`${process.env.PERMANENT_UPLOAD_DIR}/${fileName}`);
        res.json({success: 1});
    } catch(e) {
        res.status(500).json({error: e});
    }
});
In this example we simply move the file (using the mv() method) to its permanent location. Note how we use req.files to access the file; this is a dictionary-like object with properties corresponding to the POST data name(s) for each file uploaded. In our case the POST data name was userPhoto, so we use it here. In more sophisticated applications we might want to add checks, for example to prevent non-logged-in users from uploading files, or to store a record of which user uploaded which file; a sketch of such a check is given below.

Note also how mv() returns a promise, which resolves if the file could be moved to the requested location, or rejects otherwise - so to handle this, we await the resolution of the promise and catch a rejection with the catch block.
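As a sketch of the checks mentioned above (assuming, hypothetically, that session middleware such as express-session is in use and that a username is stored in the session on login):

app.post('/photos/upload', async(req, res) => {
    // Hypothetical check: reject the request if the user is not logged in
    if(!req.session?.username) {
        return res.status(401).json({error: "You must be logged in to upload."});
    }
    // Reject the request if no file was actually supplied
    if(!req.files?.userPhoto) {
        return res.status(400).json({error: "No file was uploaded."});
    }
    try {
        const fileName = req.files.userPhoto.name;
        await req.files.userPhoto.mv(`${process.env.PERMANENT_UPLOAD_DIR}/${fileName}`);
        res.json({success: 1});
    } catch(e) {
        res.status(500).json({error: e});
    }
});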

Exercise

  1. Available here is a CSV file describing restaurants in the local area. Create a Node application which reads this CSV file into a string, and splits it into an array of lines. Display the array directly with console.log().
  2. Now try splitting each line into a sub-array, by splitting at the commas. Output the data in this format:
    Name: restaurant name
    Address: restaurant address (without town) 
    Town: restaurant town
    Cuisine: restaurant cuisine
    Latitude: restaurant latitude
    Longitude: restaurant longitude
    --------
    Name: next restaurant name
    ... etc ...
    
  3. Add functionality to your mapping application to add a Nominatim search via a proxy. There should be a "search" box with a button, and when this is pressed, you should call a proxy route on your server to forward the search string to Nominatim. Ensure you supply the User-Agent and Referer headers, as shown in the example; these are required to use Nominatim, otherwise you are breaching its terms and conditions. The User-Agent can just be something like "Student test app", while the Referer should be a localhost URL, e.g. http://localhost:3000. Pipe the JSON returned from Nominatim to your browser. By examining the JSON returned from Nominatim, work out how to set the map position to that of the found location (if more than one result, use the first).
  4. Write a simple test file-upload application. It should just contain a form to allow the user to upload a file, plus some client side code to upload the file and Node.js code to handle receiving the file.
  5. Return to question 1 and try to add a route to your Express application to send back JSON of all the restaurants to the client. You will need to parse the CSV file into an array of objects, each object representing a restaurant (use appropriate keys). Then, connect the route to your map front end, so that the locations of all restaurants are shown.
  6. Modify your Nominatim search to handle multiple results by creating a <div> for the results, and list each result in the <div>. Add a button to each result, and move the map to that location when the user clicks the button.