Test Stability Fixes

Issue

Tests in the screenflow service failed intermittently: the Jest process was killed partway through a run and printed "Killed" before all test suites had completed.

Root Causes Identified

1. Memory Leaks from Unclosed NestJS Test Modules

Problem: Test files using Test.createTestingModule() were not calling module.close() in afterEach hooks, causing memory leaks that accumulated over time.

Impact: Memory usage grew with every test until the OS killed the process (presumably its out-of-memory handling), which is why the run ended with "Killed" instead of a Jest summary.

Affected Files:

  • All use-case test files in services/screenflow/src/modules/application-layer/use-cases/
  • Infrastructure service tests in services/screenflow/src/modules/infrastructure-layer/guacamole/
  • Package tests in packages/nest/src/framework/
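
The accumulation described above can be illustrated with a toy model. This is a minimal sketch, not NestJS internals: `ToyModule` and `simulateSuite` are hypothetical names, and the 1 MiB buffer merely stands in for whatever a real module keeps alive until `close()` is called.

```typescript
// Toy stand-in for a test module that holds resources until closed.
// This is NOT how NestJS is implemented; it only models "open until close() is called".
class ToyModule {
    // Registry of modules that have not been closed yet.
    static readonly open = new Set<ToyModule>()
    private buffer: Uint8Array | null = new Uint8Array(1024 * 1024) // 1 MiB per module

    constructor() {
        ToyModule.open.add(this)
    }

    get heldBytes(): number {
        return this.buffer === null ? 0 : this.buffer.byteLength
    }

    async close(): Promise<void> {
        this.buffer = null
        ToyModule.open.delete(this)
    }
}

// Simulates `tests` beforeEach/afterEach cycles and returns the bytes still held afterwards.
async function simulateSuite(tests: number, cleanUp: boolean): Promise<number> {
    ToyModule.open.clear()
    for (let i = 0; i < tests; i++) {
        const testModule = new ToyModule() // plays the role of beforeEach
        if (cleanUp) {
            await testModule.close() // plays the role of afterEach
        }
    }
    let held = 0
    ToyModule.open.forEach((m) => {
        held += m.heldBytes
    })
    return held
}
```

With five simulated tests, skipping cleanup leaves all five buffers (5 MiB) reachable, while closing after each test leaves none. In a real suite the retained amount per test is unknown, but the growth pattern is the same.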

2. Unhandled setTimeout Calls

Problem: Some tests placed their assertions inside a setTimeout() callback without awaiting it, so the assertions ran after the test had already completed.

Impact: Tests would pass or fail randomly depending on timing, and could leave pending timers that interfere with subsequent tests.

Affected Files:

  • services/screenflow/src/modules/domain-layer/aggregates/resource/resource.aggregate.spec.ts
  • services/screenflow/src/modules/domain-layer/entities/instance-definition/instance-definition.entity.spec.ts
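
The ordering problem can be reproduced without Jest. The sketch below is a simplified model, assuming only that a runner treats a test as finished once its function returns or its returned promise settles; `runAndTrace`, `unawaited`, and `awaited` are hypothetical names.

```typescript
const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms))

// Runs a test body and records the order of events relative to the end of the test.
async function runAndTrace(body: (log: string[]) => void | Promise<void>): Promise<string[]> {
    const log: string[] = []
    await body(log) // a runner considers the test done at this point
    log.push('test finished')
    await sleep(20) // demonstration only: give stray timers a chance to fire
    return log
}

// Unawaited timer: the callback fires after the test has finished.
const unawaited = (log: string[]): void => {
    setTimeout(() => log.push('assertion ran'), 1)
}

// Awaited timer: the assertion runs before the test finishes.
const awaited = async (log: string[]): Promise<void> => {
    await sleep(1)
    log.push('assertion ran')
}
```

Tracing `unawaited` yields `test finished` followed by `assertion ran`, so a failing assertion there can no longer fail the test it belongs to. Tracing `awaited` yields the reverse order.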

Solutions Applied

Fix 1: Added Proper Test Module Cleanup

Added afterEach hooks to properly close NestJS test modules:

typescript
import { Test, TestingModule } from '@nestjs/testing'

describe('SomeUseCase', () => {
    let useCase: SomeUseCase
    let module: TestingModule // Added module variable
    
    beforeEach(async () => {
        module = await Test.createTestingModule({ // Changed from const
            providers: [...]
        }).compile()
        
        useCase = module.get<SomeUseCase>(SomeUseCase)
    })
    
    afterEach(async () => {
        await module?.close() // Added cleanup
    })
})

Files Fixed:

  • services/screenflow/src/modules/application-layer/use-cases/aggregates/resource/start-instance/start-instance.use-case.spec.ts
  • services/screenflow/src/modules/application-layer/use-cases/aggregates/resource/delete-instance/delete-instance.use-case.spec.ts
  • services/screenflow/src/modules/application-layer/use-cases/aggregates/resource/delete-instance-definition/delete-instance-definition.use-case.spec.ts
  • services/screenflow/src/modules/application-layer/use-cases/aggregates/resource/update-instance/update-instance.use-case.spec.ts
  • services/screenflow/src/modules/application-layer/use-cases/aggregates/resource/update-instance-definition/update-instance-definition.use-case.spec.ts
  • services/screenflow/src/modules/infrastructure-layer/guacamole/input-classifier/guacamole-input-classifier.service.spec.ts
  • services/screenflow/src/modules/infrastructure-layer/guacamole/guacamole-parser/guacamole-parser.service.spec.ts
  • packages/nest/src/framework/log/logger.service.test.ts
  • packages/nest/src/framework/langchain/langchain-hub.service.spec.ts

Fix 2: Properly Awaited setTimeout Calls

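Replaced fire-and-forget setTimeout callbacks with awaited promises, so the assertions run before the test completes. The delay was also raised from 1 ms to 10 ms: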

Before:

typescript
it('should handle multiple delete calls', () => {
    resource.deleteInstanceDefinition()
    const firstDeleteTime = resource.instanceDefinition.getDeletedAt()
    
    setTimeout(() => {
        resource.deleteInstanceDefinition()
        const secondDeleteTime = resource.instanceDefinition.getDeletedAt()
        expect(secondDeleteTime?.getTime()).toBeGreaterThan(firstDeleteTime?.getTime() || 0)
    }, 1)
})

After:

typescript
it('should handle multiple delete calls', async () => {
    resource.deleteInstanceDefinition()
    const firstDeleteTime = resource.instanceDefinition.getDeletedAt()
    
    await new Promise((resolve) => setTimeout(resolve, 10))
    
    resource.deleteInstanceDefinition()
    const secondDeleteTime = resource.instanceDefinition.getDeletedAt()
    expect(secondDeleteTime?.getTime()).toBeGreaterThan(firstDeleteTime?.getTime() || 0)
})

Files Fixed:

  • services/screenflow/src/modules/domain-layer/aggregates/resource/resource.aggregate.spec.ts (line 524)
  • services/screenflow/src/modules/domain-layer/entities/instance-definition/instance-definition.entity.spec.ts (line 159)

Test Results

Before Fixes

PASS screenflow src/modules/domain-layer/aggregates/procedure/procedure.aggregate.spec.ts (8.974 s)
PASS screenflow src/modules/domain-layer/aggregates/resource/resource.aggregate.spec.ts
PASS screenflow src/modules/domain-layer/entities/instance-definition/instance-definition.entity.spec.ts
PASS screenflow src/modules/domain-layer/entities/instance/instance.entity.spec.ts
PASS screenflow src/modules/infrastructure-layer/guacamole/guacamole-parser/guacamole-parser.service.spec.ts
Killed
Warning: command "jest" exited with non-zero status code

After Fixes

Test Suites: 12 passed, 12 total
Tests:       176 passed, 176 total
Time:        2.451 s - 7.237 s (varies)
Ran all test suites.

✓ Ran 3 consecutive times without failures
✓ All 4 projects (screenflow, coding, nest, valuation) passing

Best Practices for Future Tests

1. Always Clean Up NestJS Modules

typescript
let module: TestingModule

beforeEach(async () => {
    module = await Test.createTestingModule({...}).compile()
})

afterEach(async () => {
    await module?.close()
})
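
The optional chaining in `module?.close()` matters when setup fails: if `createTestingModule` rejects, `module` is never assigned, and a plain `module.close()` in the cleanup hook would throw a second, misleading error. The sketch below models this without Jest or NestJS. `Closable` and `runWithCleanup` are hypothetical names, and it assumes cleanup still runs after a failed setup.

```typescript
// Minimal structural type for anything with an async close(); TestingModule has this shape.
interface Closable {
    close(): Promise<void>
}

// Mimics one beforeEach/afterEach pair; `create` is a hypothetical module factory.
async function runWithCleanup(create: () => Promise<Closable>): Promise<string> {
    let testModule: Closable | undefined
    let outcome = 'setup ok'
    try {
        testModule = await create() // beforeEach
    } catch {
        outcome = 'setup failed' // testModule stays undefined
    }
    await testModule?.close() // afterEach: a no-op when setup failed
    return outcome
}
```

When `create` resolves, the module is closed exactly once. When it rejects, the cleanup step is skipped silently and only the original setup failure remains to be reported.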

2. Never Use Unhandled Timers in Tests

typescript
// ❌ Bad
setTimeout(() => { /* test assertions */ }, 100)

// ✅ Good
await new Promise((resolve) => setTimeout(resolve, 100))
/* test assertions */

3. Use Async/Await for All Async Tests

typescript
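// ❌ Bad: the test returns before the callback runs
it('test name', () => {
    setTimeout(() => { /* test assertions */ }, 100)
})

// ✅ Good: the runner waits for the returned promise
it('test name', async () => {
    await new Promise((resolve) => setTimeout(resolve, 100))
    /* test assertions */
})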

The existing Jest configuration already contains settings that help contain these issues:

services/screenflow/jest.config.ts:

  • testTimeout: 60000 - 60 second timeout for individual tests
  • maxWorkers: 1 - Run tests sequentially to avoid memory issues

jest.preset.js:

  • testTimeout: 30000 - 30 second default timeout
  • maxWorkers: '50%' - Use half of available cores
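
For orientation, the screenflow settings listed above might appear in the config file roughly as follows. This is a sketch, not the actual file: only `testTimeout` and `maxWorkers` come from this document, while `displayName` and the `preset` path are assumptions.

```typescript
// Sketch of services/screenflow/jest.config.ts.
// Only testTimeout and maxWorkers are taken from this document.
const config = {
    displayName: 'screenflow', // assumption: matches the project label in the test output
    preset: '../../jest.preset.js', // assumption: relative path to the shared preset
    testTimeout: 60000, // 60 s per individual test
    maxWorkers: 1, // a single worker, so test files run one after another
}

export default config
```

Options set next to `preset` should take precedence over the preset's values, so screenflow would run with the 60 second timeout and a single worker rather than the preset defaults.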

Impact

  • Stability: Tests now run consistently without random failures
  • Performance: Test runs now finish in roughly 2-7 s; previously they hung or were killed
  • Memory: No memory leaks from unclosed modules
  • Reliability: Can be run multiple times consecutively without issues

Verification

Run tests multiple times to verify stability:

bash
# Run screenflow tests 3 times
for i in {1..3}; do nx run screenflow:test --skip-nx-cache; done

# Run all project tests
nx run-many --target=test --skip-nx-cache --projects=screenflow,coding,nest,valuation