In the previous article, we delved into creating a simple web server that serves static files, a flat file database that uses a lock file, and the API endpoints that allow for real-time chat interaction. In this article, we'll follow up by taking a look at writing unit tests for our flat file database to ensure that our chat interaction flow works.
What and how to test
We will look at how to set up the testing environment and how to write the tests using a Large Language Model (LLM). The LLM parts of the article mostly cover using an LLM in an agentic manner, and briefly mention alternative ways of achieving a similar result with a non-agentic approach. However, areas such as the Model Context Protocol (MCP) are outside the scope of this article and will not be touched on. While going through the 'how' aspects of testing, we will also evaluate 'what' exactly we are trying to test.
Testing environment setup
For our testing, we will need some developer dependencies for mocking and spying on certain NodeJS promise-based functions. In this article, we are going to set up the environment using popular tools such as Jest. Jest is a comprehensive testing library that includes a test runner, assertions and mocking tools, and it can be used with NodeJS. There are other tools out there that will do the job equally well for our use case; the main reason Jest was picked is its current popularity. Even if you decide to use a different tool, the same concepts can be applied.
For the purpose of this article, I am running a Long-Term Support (LTS) version of NodeJS, specifically Node v22.16.0. You will need both NodeJS and the Node Package Manager (NPM) to continue. For the sake of brevity, I will not go through how to install NodeJS or NPM; refer to Node Version Manager (NVM) for installation instructions for your specific Operating System (OS).
Once you have ensured that you have the correct Node environment, we can set up the testing tools that we will need. Make sure your NPM environment in your directory has been initialised with a package.json configuration file. If it has not been initialised, you can run the following command to initialise it. Fill in the details as you deem necessary.
npm init
Next, we can install some optional tools that will help us. The Node types and Jest global types are optional, as they are only there to help your IDE or code editor understand the code's type hints. These type hints allow the IDE to provide "intellisense" or auto-completions, which can be extremely useful. You can run the following command in your Command Line Interface (CLI) to install the Jest types and Node types:
npm install --save-dev @jest/globals @types/node
In the following parts, we'll install and configure Jest. We will also need to edit our configuration a little and limit Jest to running our unit tests.
Firstly, let's install Jest and ask it to initialise a basic configuration file for us. Run the following command to do that. This should create a configuration file named jest.config.mjs, which we will configure in a while.
npm init jest@latest
Now, we'll need to configure our newly generated Jest configuration file. The following snippet alters the configuration so that only files matching the .jest.test naming pattern are run by Jest. Since we are using Jest to write tests for NodeJS, we also need to specify the test environment.
// jest.config.mjs
...
  testEnvironment: "node",
...
  testRegex: [
    '.*\\.jest\\.test\\.[jt]sx?$'
  ],
...
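If you prefer to see the whole picture, a minimal jest.config.mjs with just these two options set could look like the sketch below. The file generated by npm init jest@latest will contain many more commented-out options, which you can leave untouched.

```javascript
// jest.config.mjs — minimal sketch; keep the rest of the generated options as they are
/** @type {import('jest').Config} */
const config = {
  // Run the tests in a NodeJS environment rather than a browser-like one
  testEnvironment: "node",
  // Only pick up files ending in .jest.test.js / .jest.test.jsx / .jest.test.ts / .jest.test.tsx
  testRegex: [
    '.*\\.jest\\.test\\.[jt]sx?$'
  ],
};

export default config;
```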
Finally, we'll need to edit the NPM scripts in our package.json file. This is because we'll be writing our Jest test code as ECMAScript modules instead of CommonJS, so we need to tell NodeJS to use experimental-vm-modules. The following change adds the appropriate test script.
// package.json
...
  "scripts": {
    "test": "node --experimental-vm-modules node_modules/jest/bin/jest.js"
  },
...
Full package.json output
For the sake of clarity, the following is the full output of my package.json file. This is especially useful if you’re viewing this from the future where versions might have changed and the installation commands installed a newer version which might not be relevant or correct.
{
  "name": "zenika-chat-app",
  "version": "1.0.0",
  "type": "module",
  "main": "index.js",
  "scripts": {
    "test": "node --experimental-vm-modules node_modules/jest/bin/jest.js"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "description": "",
  "devDependencies": {
    "@jest/globals": "^30.0.4",
    "@playwright/test": "^1.53.1",
    "@types/node": "^24.0.4",
    "jest": "^30.0.4"
  }
}
Leveraging LLM for testing
We will first define the scope of our LLM usage. We will not rely on the LLM for brainstorming or generating test cases. The LLM might be drawing on a pool of badly reasoned test cases, or the test cases it suggests might not be applicable to your use case. It is extremely important to use our own reasoning skills to craft bespoke test cases. There is also a common misunderstanding, fuelled by the marketing used by LLM companies, that presents their models as "reasoning". As far as my limited understanding and personal opinion go, scientific reasoning is not yet possible with the current models; they are still mostly statistical compute machines. This means that you, as the engineer writing the prompts, need to do the reasoning and thinking.

Hence, we will create and reason about the test cases ourselves and, thereafter, prompt the LLM to generate the code for them. You need to provide a lot of guidance to the LLM to make the code work, and there is a high chance that the code will not work on the first attempt and will require adjustments or fixes. This usually means an iterative process to improve the LLM's output: you feed it a prompt and adjust by giving it more information or asking it to correct certain parts, iterating until it reaches the ideal state. For the appropriate scenarios, I will provide a potential prompt which you can try out with an LLM to scaffold the initial test code for you. Here is how I imagine the process looks:
- Identify important parts or critical paths to be tested
- Brainstorm and describe the workflows that involve these critical paths
- Describe a prompt with specific steps on areas which requires careful guidance
- Feed the prompt to the LLM
- Verify, evaluate and test the output
- Iterate the process until the output is in an ideal state
LLM setup
You can use an LLM either by asking it directly or by using it in an agentic manner, where the agent automatically reads your files and understands your code structure. For the purpose of this exercise, I will be using GitHub Copilot in the JetBrains WebStorm IDE, using its "Ask" function and attaching the relevant files as context. It then generates the code in the Copilot conversation and I copy the code over. You could also use GitHub Copilot in VS Code and the experience will be almost the same. Similarly, you could use ChatGPT (without GPT Codex) and provide the necessary prompt and context, but that will be much more tedious. If you do not have any of those installed, I would recommend installing the GitHub Copilot plugin in your favourite code editor; if you do not wish to do so, you can also just use ChatGPT and follow along manually. Specification Driven Development (SDD) will not be covered in the scope of this article as there are complexities, nuances and extra security measures that need to be covered.
In the following segment, we will write some helper utility code to assist in testing the flat file database we have written and some of the full chat workflow interactions. In order to generate random data containing random usernames, messages and timestamps, we want convenient and reusable methods for creating random valid strings and random valid timestamps. Here are some handy functions for generating test data. You should attach the code files as context so that the LLM will be able to use these functions when writing the test code. If you're using ChatGPT, you will need to copy this code into your prompt context.
// dbTestUtils.js
import {randomBytes} from "node:crypto";

// Returns a random ISO-8601 timestamp within the first 1000 days of the Unix epoch
export function generateRandomTimestamp() {
  const randomDay = Math.floor(Math.random() * 1000);
  return (new Date(randomDay * 24 * 60 * 60 * 1000)).toISOString();
}

// Returns a random base64 string derived from byteLength random bytes
export function generateRandomString(byteLength) {
  return randomBytes(byteLength).toString('base64');
}
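As a quick illustration of how these helpers can be combined, a random but structurally valid chat message (the field names mirror the ones used in the test code later in this article) could be built like this. The import path assumes the helpers live in a file called dbTestUtils.js next to the script using them.

```javascript
import {randomUUID} from "node:crypto";
import {generateRandomString, generateRandomTimestamp} from "./dbTestUtils.js";

// A random but structurally valid chat message to use as test data
const randomMessage = {
  id: randomUUID().toString(),
  author: generateRandomString(15),
  content: generateRandomString(100),
  timestamp: generateRandomTimestamp(),
};

console.log(randomMessage);
```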
Now, we can proceed to prompt the LLM to write code to test for the areas mentioned.
Testing flat file database
We want to test certain key parts of the flat file database, since it is an important component of our system. Our system would not work if the flat file database did not behave according to our assumptions. The composition of the flat file database is interesting in the sense that it might not be obvious that there are many other parts abstracted from our view. Whenever our code reads from or writes to the flat file database, it calls NodeJS functions which rely on the operating system's filesystem to read and write. There are parts we are interested in, but also parts we are not really interested in testing. Let's illustrate this with a simplified diagram.
In our specific use case, we aren't really interested in the actual underlying read or write operations at the operating system filesystem level. We can assume that NodeJS and the operating system's filesystem are reading and writing as expected. This reduces the number of tests we need to write and the complexity we need to handle. But how can we do that? We can mock the implementations of the NodeJS APIs.
When mocking, we can also spy on the parameters used when calling the mocked functions. This allows us to make assertions that the parameters are behaving according to our expectations.
And thereafter, we can even return a mocked response that would allow our software application to function as if it was in an ideal environment.
How Jest can help
Jest is extremely useful here as it makes spying much less of a hassle. We want to mock the read and write operations but still know that these functions have been called with the appropriate parameters. Jest's mocking ability lets us observe the parameters sent to the NodeJS APIs easily and make assertions to confirm that the behaviour is as expected.
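To make this concrete, here is a minimal, self-contained sketch of spying on fs.readFile: the mock prevents any real file access, records the parameters it was called with so we can assert on them, and returns a canned response. The file path used here is purely illustrative.

```javascript
import fs from "node:fs/promises";
import {afterEach, expect, jest, test} from '@jest/globals';

afterEach(() => jest.restoreAllMocks());

test("should spy on readFile and return a mocked response", async () => {
  // Replace the real implementation so no file is actually read
  const readFileSpy = jest.spyOn(fs, 'readFile')
    .mockResolvedValue(JSON.stringify([{content: "hello"}]));

  // In the real tests this call happens inside our db module; it is inlined here for illustration
  const messages = JSON.parse(await fs.readFile('/tmp/messages.json', 'utf8'));

  expect(messages).toEqual([{content: "hello"}]);
  // The spy records its calls, so we can assert on the parameters that were used
  expect(readFileSpy).toHaveBeenCalledWith('/tmp/messages.json', 'utf8');
});
```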
Diving into the testing parts
First, let's identify the parts which can be considered important. There are a few ways of identifying them, one of which is to look at the critical paths. In our critical paths of reading and writing messages, there are a few requirements that need to work. These are the important parts that we want to test.
For the use case of reading from the database:
- Data needs to be readable: the data from the file must be parsed correctly, follow the specified JSON structure for our defined message, and our code must be able to reference the newly created parsed JSON object.
For the use case of writing to the database:
- Data needs to be readable: because writes overwrite the previous data, the application must read the previous data and merge the new data with it before writing.
- Data needs to be writable: the JSON object data we write to the file must be serialised into string format correctly, produce the specified stringified JSON structure as defined and, finally, be appended correctly to the list of existing JSON string messages that we have.
- There needs to be write-safety using a write-lock: the write-lock file must exist while we are doing writes, and other server nodes trying to write to the same file must wait until the write-lock is released.
For the use case of storing data over longer periods of time:
- Data needs to be persistent: the data we write should not be temporary and must exist unless we take action to remove it. Restarting or shutting down the server should not affect the existence of data.
For the use case of being able to scale horizontally:
- The data should be relatively consistent across several independent nodes (cloud native): when other server nodes try to access or read the same data, they should all receive the same data.
After identifying the key components, we need to understand how we can test these parts. There are various ways to test different parts, and for our use case we can go with something simple. As mentioned above, we can mock the parts that write to or read from the filesystem, as we do not need to care about their actual underlying behaviour. Data persistence might be difficult to verify in a unit test, so that is better addressed in an integration test or end-to-end test.
Next, we will evaluate the happy paths related to our workflows. The ideal scenario workflow for message retrieval would look like this:
- Attempt to read from the directory and the database flat file’s contents
- If the database flat file does not exist, we do not create a new database file and we only return empty
- If the database flat file exists, read the contents and JSON parse it.
- It should successfully parse the JSON contents and return the data in an expected data format
- Return the expected Promise data based on the given data stored in the flat file
And the ideal scenario workflow for message sending would look like this (see the sketch after this list):
- Attempt to see if there is a write-lock
- If there is no write-lock, attempt to create a write-lock file
- Attempt to read from the directory and the database flat file’s contents
- If the database flat file does not exist, return empty data
- If the database flat file exists, read the contents and JSON parse it.
- Add the new message to the JSON file
- Remove the write-lock file
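Before prompting, here is a hedged sketch of what a test for this write flow could look like. The exported write function is called db.addMessage here purely as a placeholder, since its actual name depends on the code from the previous article, and the assertions only check which paths were written and unlinked rather than the exact arguments, because those depend on your implementation.

```javascript
import db from "../db.js";
import fs from "node:fs/promises";
import {afterEach, expect, jest, test} from '@jest/globals';

afterEach(() => jest.restoreAllMocks());

test("should write the new message and release the write-lock", async () => {
  // Pretend that neither the lock file nor the database file exists yet
  const enoent = Object.assign(new Error('File does not exist'), {code: 'ENOENT'});
  jest.spyOn(fs, 'access').mockRejectedValue(enoent);
  jest.spyOn(fs, 'readFile').mockRejectedValue(enoent);
  const writeFileSpy = jest.spyOn(fs, 'writeFile').mockResolvedValue();
  const unlinkSpy = jest.spyOn(fs, 'unlink').mockResolvedValue();

  // db.addMessage is a placeholder for whatever your db module exposes for sending messages
  await db.addMessage({author: "alice", content: "hello", timestamp: Date.now()});

  const writtenPaths = writeFileSpy.mock.calls.map(call => call[0]);
  expect(writtenPaths).toContain(db.lockFilePath); // the write-lock file was created
  expect(writtenPaths).toContain(db.dbFilePath);   // the merged message list was written
  expect(unlinkSpy).toHaveBeenCalledWith(db.lockFilePath); // the write-lock was released
});
```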
Potential prompting
Given the key parts that we want to test and the happy path scenarios, we can formulate our prompt. We need to provide guidance on some specific executions, such as how to mock. There are many ways to spy and mock, and without guidance specific to our code, the LLM might end up generating code that is irrelevant or incorrect.
Based on the given source code context, write the Jest test-cases for the given ideal scenario workflow for message retrieval and message sending. Take note that this is a Jest setup in NodeJS using pure JavaScript, use Jest global imports wherever possible. The test function should start with "should XXX", which means it should look like this:
test("should XXX", () => {
});
The data is stored in a flat file so that means that you need to mock the implementation of the appropriate functions to prevent actual file read or writes. I am giving you a tutorial example of how to do a mock implementation, please use this style of mocking to write the whole test file, including all beforeEach and beforeAll and afterEach and afterAll, and also include the Jest spyOn and mock implementation code where appropriate.
jest.spyOn(fs, 'readFile').mockImplementation(async (path) => {
  if (path === db.dbFilePath) {
    return Promise.resolve(JSON.stringify(writtenData?.data || []));
  }
  if (path === db.lockFilePath) {
    return Promise.resolve(lockFileContent || '');
  }
  return Promise.resolve();
});
And then here is the workflow scenario:
<insert the workflow scenario here>
Result
Your prompt result should, at a minimum, use the mock for readFile and all other related parts. The following is the raw, unedited code output the LLM generated for me. You might get a similar output. However, you should not use the code as-is without reviewing or editing it.
import db from "../db.js";
import {randomUUID} from "node:crypto";
import fs from "node:fs/promises";
import {generateRandomString, generateRandomTimestamp} from "../dbTestUtils.js";
import {afterAll, afterEach, beforeAll, describe, expect, jest, test} from '@jest/globals';
describe("Message Retrieval Workflow Testing", () => {
let readFileSpy;
let accessSpy;
let writeFileSpy;
let unlinkSpy;
let dbFileExists = false;
let dbFileData = null;
let lockFileExists = false;
let lockFileContent = null;
beforeAll(() => {
readFileSpy = jest.spyOn(fs, 'readFile').mockImplementation(async (path) => {
if (path === db.dbFilePath) {
if (!dbFileExists) {
const error = new Error('File does not exist');
error.code = 'ENOENT';
throw error;
}
return Promise.resolve(JSON.stringify(dbFileData || []));
}
if (path === db.lockFilePath) {
if (!lockFileExists) {
const error = new Error('File does not exist');
error.code = 'ENOENT';
throw error;
}
return Promise.resolve(lockFileContent || '');
}
return Promise.resolve();
});
accessSpy = jest.spyOn(fs, 'access').mockImplementation(async (path) => {
if (path === db.dbFilePath && dbFileExists) {
return Promise.resolve();
}
if (path === db.lockFilePath && lockFileExists) {
return Promise.resolve();
}
const error = new Error('File does not exist');
error.code = 'ENOENT';
throw error;
});
writeFileSpy = jest.spyOn(fs, 'writeFile').mockImplementation(async (path, data) => {
return Promise.resolve();
});
unlinkSpy = jest.spyOn(fs, 'unlink').mockImplementation(async (path) => {
return Promise.resolve();
});
});
afterEach(() => {
jest.clearAllMocks();
dbFileExists = false;
dbFileData = null;
lockFileExists = false;
lockFileContent = null;
});
afterAll(() => {
readFileSpy.mockRestore();
writeFileSpy.mockRestore();
accessSpy.mockRestore();
unlinkSpy.mockRestore();
});
test("should return empty array when database file does not exist", async () => {
dbFileExists = false;
const messages = await db.readMessages();
expect(messages).toEqual([]);
expect(readFileSpy).toHaveBeenCalledWith(db.dbFilePath);
expect(writeFileSpy).not.toHaveBeenCalled();
});
test("should read and parse JSON contents when database file exists with valid data", async () => {
const testMessages = [
{
id: randomUUID().toString(),
author: generateRandomString(15),
content: generateRandomString(100),
timestamp: generateRandomTimestamp()
},
{
id: randomUUID().toString(),
author: generateRandomString(20),
content: generateRandomString(150),
timestamp: generateRandomTimestamp()
}
];
dbFileExists = true;
dbFileData = testMessages;
const messages = await db.readMessages();
expect(messages).toEqual(testMessages);
expect(readFileSpy).toHaveBeenCalledWith(db.dbFilePath);
expect(messages).toHaveLength(2);
expect(messages[0]).toHaveProperty('id');
expect(messages[0]).toHaveProperty('author');
expect(messages[0]).toHaveProperty('content');
expect(messages[0]).toHaveProperty('timestamp');
});
test("should return empty array when database file exists but contains empty array", async () => {
dbFileExists = true;
dbFileData = [];
const messages = await db.readMessages();
expect(messages).toEqual([]);
expect(readFileSpy).toHaveBeenCalledWith(db.dbFilePath);
});
test("should successfully parse JSON and return data in expected format", async () => {
const expectedMessage = {
id: randomUUID().toString(),
author: "TestUser",
content: "Test message content",
timestamp: Date.now()
};
dbFileExists = true;
dbFileData = [expectedMessage];
const messages = await db.readMessages();
expect(messages).toEqual([expectedMessage]);
expect(typeof messages[0].id).toBe('string');
expect(typeof messages[0].author).toBe('string');
expect(typeof messages[0].content).toBe('string');
expect(typeof messages[0].timestamp).toBe('number');
});
test("should handle multiple messages with different data types correctly", async () => {
const testMessages = [
{
id: randomUUID().toString(),
author: generateRandomString(10),
content: generateRandomString(50),
timestamp: generateRandomTimestamp()
},
{
id: randomUUID().toString(),
author: generateRandomString(25),
content: generateRandomString(200),
timestamp: generateRandomTimestamp()
},
{
id: randomUUID().toString(),
author: generateRandomString(8),
content: generateRandomString(75),
timestamp: generateRandomTimestamp()
}
];
dbFileExists = true;
dbFileData = testMessages;
const messages = await db.readMessages();
expect(messages).toEqual(testMessages);
expect(messages).toHaveLength(3);
messages.forEach(message => {
expect(message).toHaveProperty('id');
expect(message).toHaveProperty('author');
expect(message).toHaveProperty('content');
expect(message).toHaveProperty('timestamp');
});
});
test("should not create new database file when file does not exist", async () => {
dbFileExists = false;
await db.readMessages();
expect(writeFileSpy).not.toHaveBeenCalledWith(db.dbFilePath, expect.any(String));
});
test("should return Promise that resolves to expected data format", async () => {
const testMessage = {
id: randomUUID().toString(),
author: "PromiseTestUser",
content: "Testing Promise resolution",
timestamp: Date.now()
};
dbFileExists = true;
dbFileData = [testMessage];
const messagesPromise = db.readMessages();
expect(messagesPromise).toBeInstanceOf(Promise);
const messages = await messagesPromise;
expect(messages).toEqual([testMessage]);
});
});
Validating the prompt
The code generated by the LLM cannot be trusted and must be validated manually. There could also be potential security concerns. There are several steps required to validate the prompt's output.
1) The code generated should fulfill the given workflow and assert the necessary parts
2) The code generated should have utilised the example given and replicated its parts accordingly where necessary
3) The code generated should not contain any strange code that does not make sense, as it could potentially contain a shell fork bomb or even a virus
4) The code generated should ideally be bug-free
5) The code generated should be optimal
6) Finally, the code generated should run without any issue
Looking at the code, it was relatively impressive that it was able to use the test helper utility code we set up and managed to follow the structure of the specified code. It looks mostly correct at first glance, but if you look carefully, there are some areas where it fell short. You can run the tests with the following command in your project root directory.
npm run test
However, just because it runs doesn't mean the code itself is correct. This was the output I received after running it.
npm run test
> zenika-chat-app@1.0.0 test
> node --experimental-vm-modules node_modules/jest/bin/jest.js
(node:65995) ExperimentalWarning: VM Modules is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
FAIL __tests__/db.jest.test.js
Message Retrieval Workflow Testing
✕ should return empty array when database file does not exist (4 ms)
✕ should read and parse JSON contents when database file exists with valid data (2 ms)
✕ should return empty array when database file exists but contains empty array (1 ms)
✓ should successfully parse JSON and return data in expected format (1 ms)
✓ should handle multiple messages with different data types correctly (1 ms)
✓ should not create new database file when file does not exist
✓ should return Promise that resolves to expected data format
If we were to inspect the test case "should handle multiple messages with different data types correctly", we would find that it is not useful for our context. This is because our simple flat file database does not have a string length limitation, unlike databases such as MySQL or PostgreSQL, which impose different string length limits through column types like varchar(255) or varchar(63).
If we were to nitpick even further, there were unused variables in the mockImplementation that could be removed. The code here suggests that the LLM failed to understand how mockImplementation works in Jest: we do not need to declare parameters as arguments even if the function provides them, and we can simply omit them if they're not used in our mock implementation.
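In other words, a trimmed-down version of the readFile spy from the generated test above could simply be the following; the mock only declares the parameters it actually uses.

```javascript
// fs.readFile also receives an encoding/options argument at call time,
// but the mock can omit it entirely since it never uses it
jest.spyOn(fs, 'readFile').mockImplementation(async (path) => {
  if (path === db.dbFilePath) {
    return JSON.stringify(dbFileData || []);
  }
  return '';
});
```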
Your experience might differ from mine. Ideally, you should have some coding experience, as it will be valuable in helping you debug what went wrong with the output code; if it is a simple error, you will then be able to fix it manually. However, if the output code does not even have a sound structure, I would recommend starting a new conversation with a new, adjusted prompt instead of asking the LLM to fix the issues. Patience is required as this is an iterative process, and the LLM might not get it right on the first attempt.
Fixing the LLM generated code
In my case, I could debug what went wrong and ask it to fix the errors. Because I have experience with Jest and JavaScript, I could read the error and understand that the problem was the encoding information missing from the assertion.
Based on the error, I identified a few ways of fixing this (option (1) is sketched just after this list):
1) Change the assertion for readFileSpy, specifically its toHaveBeenCalledWith assertion, to include 'utf-8' as its 2nd parameter
2) OR remove the toHaveBeenCalledWith assertion for readFileSpy entirely
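For reference, option (1) amounts to a one-line change to the failing assertion, using whatever encoding string your db module actually passes to readFile ('utf8' in my case):

```javascript
// Before: expect(readFileSpy).toHaveBeenCalledWith(db.dbFilePath);
expect(readFileSpy).toHaveBeenCalledWith(db.dbFilePath, 'utf8');
```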
I did not give it the solution. Instead, I simply pasted the entire error log output and asked the LLM to fix it for me. It then made corrections and removals to the test code. To make it more succinct, I've removed the areas which are not relevant to the fix.
describe("Message Retrieval Workflow Testing", () => {
beforeAll(() => {
readFileSpy = jest.spyOn(fs, 'readFile').mockImplementation(async (path, encoding) => {
if (path === db.dbFilePath) {
if (!dbFileExists) {
const error = new Error('File does not exist');
error.code = 'ENOENT';
throw error;
}
return Promise.resolve(JSON.stringify(dbFileData || []));
}
if (path === db.lockFilePath) {
if (!lockFileExists) {
const error = new Error('File does not exist');
error.code = 'ENOENT';
throw error;
}
return Promise.resolve(lockFileContent || '');
}
return Promise.resolve();
});
// other code
});
// other code
test("should return empty array when database file does not exist", async () => {
dbFileExists = false;
const messages = await db.readMessages();
expect(messages).toEqual([]);
expect(accessSpy).toHaveBeenCalledWith(db.dbFilePath);
expect(writeFileSpy).not.toHaveBeenCalled();
});
test("should read and parse JSON contents when database file exists with valid data", async () => {
const testMessages = [
{
id: randomUUID().toString(),
author: generateRandomString(15),
content: generateRandomString(100),
timestamp: generateRandomTimestamp()
},
{
id: randomUUID().toString(),
author: generateRandomString(20),
content: generateRandomString(150),
timestamp: generateRandomTimestamp()
}
];
dbFileExists = true;
dbFileData = testMessages;
const messages = await db.readMessages();
expect(messages).toEqual(testMessages);
expect(readFileSpy).toHaveBeenCalledWith(db.dbFilePath, "utf8");
expect(messages).toHaveLength(2);
expect(messages[0]).toHaveProperty('id');
expect(messages[0]).toHaveProperty('author');
expect(messages[0]).toHaveProperty('content');
expect(messages[0]).toHaveProperty('timestamp');
});
test("should return empty array when database file exists but contains empty array", async () => {
dbFileExists = true;
dbFileData = [];
const messages = await db.readMessages();
expect(messages).toEqual([]);
expect(readFileSpy).toHaveBeenCalledWith(db.dbFilePath, "utf8");
});
// other code here
});
Reviewing and evaluating its proposed fix
It seems the LLM chose an option similar to (1), but not exactly. It changed the readFileSpy assertion to accessSpy for test case 1, whereas for test cases 2 and 3 it used option (1). These corrections are generally fine. However, it also did something that isn't really correct; it added legs to a snake (a Chinese idiom for a superfluous addition). For whatever reason, it decided that adding an encoding parameter to the mockImplementation for readFileSpy would fix the problem. This reaffirms my earlier guess that it failed to understand how mockImplementation works. That addition does not fix the problem, nor does it create any new ones, so it is not a big deal in the grand scheme of things. To the LLM's credit, the test code was fixed!
Let's continue with other scenarios that may happen and that we want to handle safely. Here are some unhappy scenarios that might be worth considering (a sketch for one of them follows this list):
- What happens when the directory cannot be written?
- What happens when the file cannot be read?
- What happens when the lock-file cannot be read/written?
- What happens if someone manually changed the JSON contents but made the formatting incorrect?
- What happens if someone manually changed the JSON contents but contained invalid JSON format?
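As an illustration, here is a hedged sketch for the invalid-JSON scenario. It assumes readMessages rejects when JSON.parse fails; if your implementation instead catches the error and returns an empty list, flip the assertion as noted in the comment.

```javascript
import db from "../db.js";
import fs from "node:fs/promises";
import {afterEach, expect, jest, test} from '@jest/globals';

afterEach(() => jest.restoreAllMocks());

test("should fail safely when the database file contains invalid JSON", async () => {
  // The file "exists" but someone has corrupted its contents
  jest.spyOn(fs, 'access').mockResolvedValue(undefined);
  jest.spyOn(fs, 'readFile').mockResolvedValue("{ this is not valid JSON ]");

  // If your implementation swallows the parse error, assert on the fallback instead:
  // await expect(db.readMessages()).resolves.toEqual([]);
  await expect(db.readMessages()).rejects.toThrow();
});
```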
For the above-mentioned scenarios, you can apply the same concept with a similar prompt and ask the LLM to generate the test code for you. But it seems tedious to keep copying and pasting the prompt. On top of that, it might be tedious for your team members to use this shared prompt too.
Reusable instructions
The following segments are specific to GitHub Copilot, but similar concepts apply to other LLM tools such as GPT Codex, Claude (where this would be similar to Claude skills) and, to a limited extent, Junie. We will explore two methods of reusing prompts and instructions. Both methods have their instructions written in Markdown format, specifically CommonMark, in files with the .md extension. They can be used together and do not need to be mutually exclusive.
Transforming into repository-wide instructions
GitHub Copilot and other agents like Junie allow you to add repository-wide custom instructions that are automatically used every time the agent runs. However, although this works fine most of the time, it is not guaranteed to be reliable. Even if the agent has read the file and picked up your instructions, it might still misbehave and not use the included information and steps. This is not really the feature's fault but more an issue with the LLM's reliability. You should treat repository instructions as helpful hints to the LLM and not as a safety guarantee.
We can create a sub-section where we include details and guidelines as to how to write Jest tests for the repository.
What / where
As of writing, the WebStorm IDE supports a single Copilot instructions file. The instructions for our prompt should go into .github/copilot-instructions.md. You would want to include project information such as its language, software architecture, testing framework, tools and other dependencies. In addition, you should add explicit information about how code and tests should be structured, and include short examples so the LLM has concrete patterns to follow. You can also ask the LLM to generate the repository-wide guidelines for you and then iterate on the generated draft.
How to write effective repository instructions
Keep three constraints in mind: concise, canonical, and scoped.
- Concise: keep each instruction sentence short and focused. Long paragraphs are easy for the model to ignore.
- Canonical: provide a small number of canonical examples that demonstrate the exact structure you expect (file header, imports, test naming, assertions).
- Scoped: include broad project rules but avoid step-by-step workflows or long multi-step procedures. Those belong in prompts.
Include:
- Project-level facts (language, runtime, module system).
- Testing framework and versions (e.g., Jest, config caveats).
- Global conventions (naming, where tests live, mocking preferences).
- A single short example test that demonstrates preferred structure.
Avoid:
- Secrets, long logs, huge code dumps.
- Stepwise “do X then Y then Z” flows that are task-specific.
- Overly long lists; prefer 2–3 clear rules with one compact example.
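To make these rules concrete, here is a hypothetical skeleton of what a .github/copilot-instructions.md following these constraints could look like. This is an illustration, not the file that was generated for me.

```markdown
# Copilot instructions

## Project
- JavaScript (ESM) NodeJS server; tests are written with Jest.
- Tests live in __tests__/ and are named *.jest.test.js.

## Testing conventions
- Import test helpers from @jest/globals rather than relying on injected globals.
- Mock filesystem access with jest.spyOn and mockImplementation; never touch the real filesystem.
- Test names start with "should ...".

## Canonical example
(one short example test demonstrating the structure above goes here)
```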
Example
This is what I prompted the LLM with to generate my guidelines. For the part where it says <insert your prompt here>, I used the example prompt mentioned above. If you have adjusted or changed that prompt, you should use yours where appropriate.
Based on my following prompt, generate a guideline .github/copilot-instructions.md instructions. The project is a JavaScript webstorm IDE project with a server (NodeJS) and the tests are written with Jest. It uses ESM import style and our Jest version supports .spyOn (without need of using unstable module mock). This guideline should be for copilot to write jest tests. Here is my prompt.
<insert your prompt here>
Result
Here's a screenshot of what that looks like. I am not able to share the output here as it contains code blocks in the instructions too.
Using repository-wide instructions
There are no extra steps you need to take in your prompting process. GitHub Copilot automatically reads the instructions.
Limitations
This is a repository-wide instruction; its purpose is better suited to project-wide guidance than to specific tasks. It might not be suited for one-off instructions or complex tasks that involve multiple steps. If our repository instruction were extremely long due to the specifics of how tests should be written, we would be polluting the global context whenever we ask the LLM to work on code that is not related to testing. This is not just costly; it might confuse the LLM, making it more error-prone and less likely to follow the instructions provided. We want our LLM context to be concise and focused.
As an alternative, we can instead create markdown prompt files that can be reused.
Transforming into prompt files
Prompt files are compact, reusable task specifications you store alongside your code so the team can run focused workflows on demand. Their purpose is to separate how something should be done (detailed, runnable steps) from what the project’s global conventions are (the repository instructions). Use prompt files for single-purpose jobs you want repeatably executed and reviewed: generate a unit test for a specific module, scaffold a component with the project’s patterns, run a structured code review, or produce a standardised README from package.json and repo context. Because prompt files are explicit and executable, they make outputs more predictable, speed up tedious work, and improve consistency when the authoring and usage conventions are enforced via Merge Requests (MRs) and README documentation.
It also presents a sweet spot that does not require the Model Context Protocol (MCP). Some organisations might disable MCP in GitHub Copilot as part of their security policies. This is where prompt files can act as a less dynamic and less powerful version of an MCP service while still performing well.
What / where
For GitHub Copilot, prompt filenames should include the .prompt suffix (e.g., write-jest-tests.prompt.md or generate-readme.prompt.md). Store workspace prompts under .github/prompts/ so other contributors can discover them. User prompts can live in your editor profile, but for team use keep them in the repo.
Prompt files can include a small front-matter block to declare mode, model, or tools (if supported by the client). The body should be a focused task spec: inputs to use, the expected output format, and any constraints.
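As a small illustration, the front-matter plus the start of the body of such a prompt file could look like the following, reusing the fields from the example prompt later in this section; the exact set of supported keys depends on your Copilot client.

```markdown
---
mode: agent
model: gpt-5
description: "Generate Jest unit test for a single module (ESM + jest.spyOn)."
---

Task: generate one focused Jest test for a specified module and exported function.
Inputs: modulePath, exportedSymbols, testGoal (one-line).
```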
How to write effective prompt files
A good prompt file reads like a small, self-contained specification that is brief enough for the model to grasp entirely but structured enough to yield relatively consistent results every run.
Keep three constraints in mind: concise, canonical, and scoped.
- Concise: each prompt should focus on a single, well-defined task. Avoid long explanations or meta-discussions; the LLM performs best when the intent and output are clearly defined in just a few sentences. Short prompts are easier to debug and maintain.
- Canonical: include a clear, single example that demonstrates exactly what the desired output should look like. The example should be representative and complete — for instance, a short Jest test, a scaffolded component, or a generated README snippet. This gives the LLM a pattern to imitate rather than an abstract description.
- Scoped: prompts should cover only one workflow or deliverable. Do not combine unrelated tasks (e.g., generating both tests and documentation in one prompt). When the task grows too broad, break it into multiple smaller prompt files that can be run independently.
Include:
- The goal or purpose of the task, described in one sentence at the start.
- Specific inputs that the LLM should expect, such as module paths, component names, or function signatures.
- Output format or constraints, for example, "return only the .test.js file content" or "use ESM imports." If you're doing TypeScript, then change to TS.
- One canonical example that demonstrates the correct file structure or output.
- Optional front-matter to specify the mode (agent), model (gpt-4o or similar), and a short description for clarity.
- Clear separation between instructions, inputs, and examples.
Avoid:
- Excessive explanation, reasoning, or commentary. The more verbose the prompt, the more likely the LLM will diverge from your intended output.
- Combining multiple unrelated tasks into one prompt. Keep each file dedicated to a single purpose.
- Ambiguous wording such as “use good practices” without defining what “good” means.
- Redundant examples or conflicting conventions within the same prompt file.
- Hidden or sensitive data such as secrets, API keys, internal URLs, or credentials. Use placeholders if necessary.
Example
There are many examples available in the GitHub awesome-copilot repository that you can browse through to understand and get a feel for how it works. Thereafter, you can ask the LLM to create a prompt file for you. Here's a prompt to feed the LLM that will produce a ready-to-drop .prompt.md file for generating Jest tests for a single module:
Generate a prompt file suitable for `.github/prompts/write-jest-tests.prompt.md`.
Output the full prompt file contents only (no commentary).
The prompt file should target an ESM Node.js project using Jest and must produce a single ready-to-drop `.test.js` file.
Include front-matter with:
- mode: agent
- model: gpt-5
- description: "Generate Jest unit test for a single module (ESM + jest.spyOn)."
In the body, include these labeled sections: Task, Inputs, Constraints, Output, Example output.
Task: generate one focused Jest test for a specified module and exported function.
Inputs: modulePath, exportedSymbols, testGoal (one-line).
Constraints: use ESM imports, use jest.spyOn() if mocking dependencies, include `afterEach(() => jest.restoreAllMocks())`, keep tests small (<=20 lines).
Output: return only the `.test.js` file content; do not include explanations or extra text.
Example output: provide a minimal working test demonstrating imports, a describe block, afterEach, and one it() block.
Result
Here's a screenshot of what that looks like. I am not able to share the output here as it contains code blocks in the instructions too.
Using prompt files
They only execute when called upon. Where appropriate in your conversation, you can type /write-jest-tests to execute the prompt file.
And here's how the output of that prompt file command should look:
Limitations
Prompt files are powerful but not guarantees. They must be explicitly invoked; they won't automatically change Copilot's real-time suggestions unless a user runs them. Editor support and behaviour are still evolving between IDEs, so a prompt that works in one client may behave slightly differently in another. As of writing, this feature is still in public preview and undergoing changes. Because they are executable instructions, prompt files also require maintenance: naming, discoverability, example fidelity and expected output formats must be kept in sync with code changes, and they should be reviewed via Merge Requests (MRs) like other behaviour-changing files. Finally, because LLM outputs can vary, assume a validation step: run generated artifacts in a sandbox, add simple tests or linting rules, and keep prompts intentionally small and single-purpose to reduce unpredictable results.
Quick Summary
| Aspect | Repository instructions | Prompt files |
|---|---|---|
| Description | Persistent, repository-wide configuration stored at .github/copilot-instructions.md. Copilot automatically includes it when generating suggestions. | Smaller reusable instructions stored as .prompt.md files. Each file can be manually triggered or referenced when needed. |
| Pro | Automatically applied for anyone working in the repository, ensuring consistent behavior. | Designed for specific, repeatable tasks such as scaffolding, testing, or documentation generation. |
| Pro | Ideal for defining project-wide conventions, language style, architecture rules, or testing frameworks. | Can specify mode, model, and tools in front-matter, making prompts flexible and specialized. |
| Pro | Helps onboard new contributors by embedding general guidance directly into Copilot’s context. | Encourages modularity: different teams or features can maintain their own task-focused prompts. |
| Con | Less effective for multi-step or one-off operations that don’t apply to all areas of the project. | It is not automatically applied. Someone has to explicitly run or invoke the prompt. |
| Con | Large or overly detailed instructions can dilute Copilot’s focus and reduce reliability. | Feature support is still evolving across IDEs; may behave differently between VS Code and WebStorm. |
| Limitation | Scope is repository-wide. Overly specific content can pollute the context for unrelated tasks. | Scope is local to each prompt; doesn’t persist unless called. |
| Limitation | Usually only one file is supported; formatting rules are strict and extra whitespace can affect parsing. | Requires consistent naming (.prompt.md) and manual organising to remain discoverable. |
| When to use | When you want persistent, general guidance that shapes all Copilot suggestions be it language, architecture, or testing frameworks. | When you need focused, repeatable workflows that can be run on demand such as code reviews, test generation, or documentation. |
| Example | .github/copilot-instructions.md describes how Jest tests should be structured across the entire repo. | .github/prompts/write-jest-tests.prompt.md contains detailed steps and examples for generating individual test cases. |
Combined approach
We can combine repository-wide instructions and prompt files in a hybrid approach. This can greatly improve your overall prompting experience, as the results will be better tuned to your setup and tasks. It decreases the amount of file scanning and the number of tokens required by the LLM, since the repository-wide instructions already provide the necessary project information. It also provides more targeted results through task-specific prompt files.
What / how
- Keep repository instructions broad: language, architecture, high-level testing rules, and one canonical example.
- Move detailed steps, templates, and per-task examples into prompt files that are invoked when needed.
- Exclude detailed test structure from repo instructions and put those specifics in prompt files like write-jest-tests.prompt.md.
This reduces context noise while giving engineers actionable prompts for specific tasks.
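In practice, the resulting layout is small; using the filenames from this article, it would look something like this:

```
.github/
├── copilot-instructions.md          # broad, always-on project guidance
└── prompts/
    └── write-jest-tests.prompt.md   # detailed, on-demand test generation steps
```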
Example
Let's prompt our LLM to generate a new set of repository-wide guidelines specific to our hybrid approach. These new repository-wide guidelines need to accommodate the existence of prompt files.
Generate a guideline file .github/copilot-instructions.md instructions. Take note that there will eventually be GitHub co-pilot prompt files that cover their specific areas such as testing. This means that you should exclude specific or detailed information or examples about how exactly such areas should be implemented. However, you are free to provide broad general guidelines on these areas. The project is a JavaScript webstorm IDE project with a server (NodeJS) and the tests are written with Jest. It uses ESM import style and our Jest version supports .spyOn (without need of using unstable module mock).
As for the prompt file, we can keep it as it is because it is already scoped to its specific task.
Result
Here's a screenshot of what that looks like. I am not able to share the output here as it contains code blocks in the instructions too.
Limitations
Even with the combined effectiveness of repository-wide instructions and task-specific prompts, this approach still falls short in certain areas. It is not as dynamic as MCP and thus cannot tightly integrate with other areas such as your Continuous Integration (CI) setup or other tools that exist elsewhere. Depending on the complexity of your use case, MCP might be more appropriate. As these are static Markdown instructions, there is also a maintenance cost to consider in the long run: you will need to regenerate the repository-wide instructions or task-specific prompts as your software and its dependencies evolve.
Limitations and Tradeoffs
Requires understanding of how to write unit tests
You may not need tool-specific knowledge for the basics of writing Jest unit tests using an LLM. However, it does test your knowledge and understanding of how to structure and write unit tests, including concepts like spying and mocking and when to use them effectively.
May require debugging
For our basic scenario, asking the LLM to debug and fix the issues mentioned was relatively easy. But sometimes it can be completely off, and even starting a new conversation with a different prompt will not work. This might mean that the problem is not within the LLM's ability to fix, and manual intervention is required. In other words, you'll still need to be proficient in debugging and understanding what is wrong.
Simple workflows
The examples given are by no means representative of real-world applications, where there are many moving parts, imposed resource limitations and a much larger scale.
End-to-end tests covered in next article
As this article was getting extremely long, I have decided to break the testing coverage into two parts. In the next article in this series, we will look at how we can use Playwright and LLM assistance to write our end-to-end (e2e) tests. So be sure to stay tuned!
Source code repository
The code repository for this article will be made available on GitHub after the last article in the series is published.
Conclusion
We’ve delved into the depths of testing using Jest and LLM assistance. Through this process, we have learnt:
- When to use LLM and when not to use LLM for assisting in writing tests
- How to document important test specifications for the LLM
- Specifying certain code-pieces and structure for the LLM to follow
- Iterative approach of prompting the LLM
- How to debug when the LLM result fails
- Pitfalls of LLMs and how to intervene when their fixes fail
- Writing reusable prompt files
Based on your experience and knowledge, is there anything that you think should be added or improved upon? Leave your suggestions or comments below! Or if you have any questions, please feel free to ask. Collaboration is how our community grows. Happy coding!