Developers Guide to the JobTech Taxonomy API

DON'T PANIC

The JobTech Taxonomy API is big. You just won't believe how vastly, hugely, mind-bogglingly big it is.

Here you can find information on how to install a development environment, build requirements, how the project is structured, and how to configure, build, test, and finally deploy the JobTech Taxonomy API.

Open Source

The JobTech Taxonomy API is provided under the EPL-2.0 license.

Copyright © 2019 JobTech

Introduction

The documentation is split into two parts.

This part is aimed at developers of the taxonomy; the other part is for taxonomy users. If you have questions about what a taxonomy is, how the JobTech Taxonomy relates to the job market, or how to use the taxonomy, it is that second part that you want.

For new developers

An overview of chapters describing fundamental project information.

From source to running

Chapters describing different steps in getting the taxonomy off the ground.

Development

The following chapters describe how the repository is structured, where and how the documentation is built, how to set up the development environment, and the major building blocks used in developing the taxonomy.

Project Layout

This chapter describes how the project is structured. Where to find what files.

Documentation

Some information on how this and the user documentation is written, where it is stored, and how to build it.

Development environment

How to set up a local development environment.

Technologies

This chapter introduces the larger components that are used by the taxonomy. Things like the database layer, the web server, logging, testing, documentation, and so on.

Getting started

Follow the instructions in https://clojure.org/guides/install_clojure to get Clojure up and running.

Starting the system

Now it should suffice to run clj -M:dev -m dev in the clone of the taxonomy API repo to build and deploy the API. The logging messages should provide a link to where the API can be tested. If you run from your local machine and use the default config the link should be http://localhost:8080 or similar.

Running tests

  • clj -M:test:kaocha runs all tests.
  • clj -M:test:kaocha --watch runs all tests continuously and looks for changes to the code.
  • clj -M:test:kaocha --focus :unit --watch runs only the (slightly faster) unit tests continuously.

Troubleshooting

Sometimes, documentation and code can get out of sync. When this happens—when the system builds and runs but not as described in the documentation—here are some tips to help you resolve the issue.

Note

If you find discrepancies between the code and documentation, please create a merge request to update the documentation!

To troubleshoot continuous integration pipeline bugs, build errors, or unexpected runtime behavior, refer to the following files. They often provide clues about what is supposed to work:

  1. Dockerfile – Contains setup instructions for the Docker environment.
  2. .gitlab-ci.yml – Configures the continuous integration pipeline.
  3. build.clj – Manages the build process and dependencies.

Look to the CI system

The .gitlab-ci.yml file contains information on how the system is built, tested, and deployed. It is an excellent starting point to see what step fails and why. It also gives a hint of what the expected environment for the build is and what commands should work.

When this documentation was built it looked like this:

include:
- remote: https://gitlab.com/arbetsformedlingen/devops/gitlab-cicd/jobtech-ci/-/raw/v2.1.1/jobtech-ci.yml
- remote: https://gitlab.com/arbetsformedlingen/devops/gitlab-cicd/jobtech-ci/-/raw/v2.1.1/deploy.yml
variables:
  TEST_DISABLED: 1
  DEPLOYS_DISABLED: 1
  
  SCHEMATHESIS_IMAGE: schemathesis/schemathesis:3.28.1
  
  CLOJURE_BUILD_IMAGE: clojure:temurin-22-tools-deps-1.11.3.1456-jammy

  RUNTIME_BASE_IMAGE: eclipse-temurin:22.0.1_8-jre-alpine
  
  AUTO_DEVOPS_BUILD_IMAGE_EXTRA_ARGS: "--build-arg BUILDER_BASE_IMAGE=$CLOJURE_BUILD_IMAGE --build-arg RUNTIME_BASE_IMAGE=$RUNTIME_BASE_IMAGE"
  TAXONOMY_DEVELOP_SWAGGER_URL: http://jobtech-taxonomy-api-develop.apps.testing.services.jtech.se/v1/taxonomy/swagger.json

# This will store the libraries from deps.edn in a cache that will
# be restored in the next job that uses the same cache key
.cache-template: &cache-template
  - key:
      files:
        - deps.edn
    paths:
      - .m2/repository

# Prepare and download the libraries in deps.edn.
# The result is cached between jobs and pipelines
# if there are no changes in deps.edn.
prepare_taxonomy_api_libraries:
  stage: build
  image: $CLOJURE_BUILD_IMAGE
  cache: *cache-template
  script:
    - clojure -A:dev:prof:kaocha:prod:build:test:clj-kondo:eastwood:cljfmt:nsorg:outdated -P
    - clojure -T:build update-build
  artifacts:
    paths:
      - target

# This job is defined in Auto DevOps template.
# Here we tell it to use cached dependencies and
# start after the documentation is build.
build:
  needs: [prepare_taxonomy_api_libraries]
  cache: *cache-template

schemathesis_get_graphql_tests:
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  allow_failure: true
  stage: test  
  image:
    name: $SCHEMATHESIS_IMAGE
    entrypoint: [""]
  script:
    - st run --junit-xml=schemathesis-get-graphql-junit.xml --hypothesis-deadline=None --show-trace --method GET --endpoint "^/v1/taxonomy/graphql" $TAXONOMY_DEVELOP_SWAGGER_URL
  artifacts:
    paths:
      - schemathesis-get-graphql-junit.xml
    reports:
      junit: schemathesis-get-graphql-junit.xml

schemathesis_exclude_graphql_tests:
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  allow_failure: true
  stage: test
  image:
    name: $SCHEMATHESIS_IMAGE
    entrypoint: [""]
  script:
    - st run --junit-xml=schemathesis-exclude-graphql-junit.xml --hypothesis-deadline=None --show-trace --endpoint "^/v1/taxonomy/(?!graphql).+" $TAXONOMY_DEVELOP_SWAGGER_URL
  artifacts:
    paths:
      - schemathesis-exclude-graphql-junit.xml
    reports:
      junit: schemathesis-exclude-graphql-junit.xml

datomic_kaocha_tests:
  stage: test
  needs: [prepare_taxonomy_api_libraries]
  image: $CLOJURE_BUILD_IMAGE
  cache: *cache-template
  script:
    - apt-get update && apt-get install -y python3-pip graphviz && pip3 install lcov_cobertura
    - DATABASE_BACKEND=:datomic-v.kaocha clj -M:test:kaocha --profile :ci
    - lcov_cobertura target/coverage/lcov.info -o coverage.xml
  coverage: /^\|\s+ALL FILES\s\|\s+(\d+\.\d+)\s\|\s+\d+\.\d+\s\|\s*$/
  artifacts:
    paths:
      - target/coverage/lcov.info
      - cobertura.xml
    reports:
      junit: target/junit.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml

server_api_tests:
  stage: test
  needs: [prepare_taxonomy_api_libraries]
  image: $CLOJURE_BUILD_IMAGE
  cache: *cache-template
  script:
    - apt-get update && apt-get install -y graphviz
    - ./test/gitlab-ci/api-tests.sh
  artifacts:
    paths:
      - target/junit-api-test-report.xml
    reports:
      junit: target/junit-api-test-report.xml

autodeploy_testing_read:
  stage: test
  image: docker:25.0.0-beta.2
  services:
    - docker:25.0.0-beta.2-dind
  resource_group: write-infra
  variables:
    OVERLAY: "jobtech-taxonomy-api-testing-read"
  rules:
    - if: $CI_COMMIT_REF_SLUG == $CI_DEFAULT_BRANCH
  needs:
    - code_quality
    - container_scanning
    - datomic_kaocha_tests
    - dockerfile-lint
    - secret_detection
    - semgrep-sast
    - server_api_tests
    - schemathesis_get_graphql_tests
    - schemathesis_exclude_graphql_tests
  script:
    - !reference [.deploy, script]

Look to the Container Build

In general it is a good idea to take a look in the Dockerfile to see what base image is being used and what additional software is installed before the taxonomy is built and installed.

Currently the content of the Dockerfile is the following:

# This is a multi-stage build.
# The first stage is used to prepare the runtime image.
# The second stage is used to build the application.
# The third stage is used to run the application.

# The local versions of the builder and runtime images.
# These are replaced by the CI/CD pipeline.
ARG BUILDER_BASE_IMAGE=clojure:temurin-22-tools-deps-jammy
ARG RUNTIME_BASE_IMAGE=eclipse-temurin:22-jre-alpine

# The image used to run the application
# Here we add the helper software that do not change often.
# Also setup the user that will run the application and
# give it permissions for the application folder.
# This caches the layer and speeds up the build.
FROM $RUNTIME_BASE_IMAGE as runtime

RUN apk add --update --no-cache graphviz ttf-freefont && \
    addgroup -S runner && \
    adduser -S runner -G runner

# The image used to build the application.
# We copy the entire directory so that we get the benefit
# of caching the dependencies and prepared libraries.
FROM $BUILDER_BASE_IMAGE as builder

WORKDIR /build

COPY . .
# If built in a clean repo, this step is needed
# to prepare the clojure libraries.
RUN clojure -A:prod:build -P

# Build the application uber-jar.
RUN clojure -T:build uber

# The final image that will run the application.
# We copy the built jar and the resources,
# expose the default port and run the application.
FROM runtime

WORKDIR /app

COPY --from=builder /build/target/app.jar app.jar
COPY --from=builder /build/resources/taxonomy.zip resources/taxonomy.zip
COPY --from=builder /build/resources/mappings resources/mappings

RUN chown -R runner:runner /app

USER runner

EXPOSE 3000

CMD java -jar app.jar

Look to the Build System

If the build fails, the build.clj file can contain information about what was expected. This file is mainly used by the build steps in the Dockerfile, but it can also be useful on its own.

(ns build
  (:require [clojure.tools.build.api :as b]
            [java-time.api :as jt]))

(def class-dir "target/classes")
(def basis (b/create-basis {:project "deps.edn"}))

(defn ^:export update-build [_]
  (let [commit (b/git-process {:git-args "rev-parse --short HEAD"})
        branch (b/git-process {:git-args "branch --show-current"})
        build-info {:branch branch
                    :built (jt/format (jt/zoned-date-time))
                    :commit commit}]
    (spit "resources/build-info.edn" build-info)))

(def uber-file "target/app.jar")

(def copy-src ["src/clj" "env/prod/clj" "env/prod/resources" "resources"])

(defn clean [_]
  (b/delete {:path "target"}))

(defn ^:export uber [_]
  (clean nil)
  (b/copy-dir {:src-dirs copy-src
               :target-dir class-dir})
  (b/compile-clj {:basis basis
                  :src-dirs ["src"]
                  :ns-compile '[prod]
                  :bindings {#'clojure.core/*warn-on-reflection* true}
                  :class-dir class-dir})
  (b/uber {:class-dir class-dir
           :uber-file uber-file
           :basis basis
           :main 'prod}))
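The update-build function above records the current branch, commit, and build time. The resulting resources/build-info.edn is a small EDN map; the values below are purely illustrative, not taken from a real build:

```clojure
;; Illustrative sketch of resources/build-info.edn -- the real file holds
;; the actual branch name, formatted timestamp, and short commit hash.
{:branch "main"
 :built "2024-05-01T12:00:00+02:00[Europe/Stockholm]"
 :commit "abc1234"}
```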

From source to running

The following chapters describe (roughly in logical order) how to go from the source code in the repository to a running JobTech Taxonomy API in a production environment.

Configuration

How to configure the taxonomy; both for build and when starting the built software.

Building

How to build the taxonomy.

Testing

How to test the taxonomy.

Running

How to run the taxonomy.

Deploying

How to deploy the taxonomy to a production environment.

Monitoring

How to monitor the running taxonomy.

Project Layout

The JobTech Taxonomy API project is structured to make it easy to find the code and resources you need. The project is divided into several directories, each with a specific purpose. The following table provides an overview of the directories in the project.

Major features that may not be necessary for all users are placed in the features directory. The env directory contains aliases and configurations for different environments. The resources directory contains files that are used by the JobTech Taxonomy API but are not generated during the build, stored in a database, or part of the source code.

Features

The features directory contains optional features that can be enabled or disabled. These features are not necessary for the core functionality of the JobTech Taxonomy API. The features are implemented as middleware and can be enabled or disabled depending on the configuration. The features are intended to allow for easy extensibility of the JobTech Taxonomy API and to increase security by, for example, deploying without the taxonomy editor related features.

Directories

| Directory | Description |
| --- | --- |
| docs | Documentation. |
| docs/development | Development-related documentation. |
| docs/taxonomy | End user documentation. |
| env | Functionality related to various aliases. |
| env/dev | Settings, tools and helpers for development. |
| env/kaocha | Configuration for the Kaocha test runner. |
| env/prod | Production environment configurations. |
| env/prof | Profiling configurations. |
| env/test | Testing related code. |
| resources | Files that are used by the JobTech Taxonomy API but are not generated during the build, stored in a database, or part of the source code. |
| resources/public | Static web resources and documents used by the JobTech Taxonomy API. |
| src | Source code for most of the implementation of the JobTech Taxonomy API. |
| src/clj/jobtech_taxonomy/api | The server, database and API implementation. |
| src/clj/jobtech_taxonomy/common | Utilities and code shared by the API and features. |
| src/clj/jobtech_taxonomy/features | Optional features. |
| test | Root directory for test sources and resources. |
| test/clj/integration | For tests that go from HTTP request to response. |
| test/clj/unit | Unit test code. |
| test/clj/utils | Helper functions for the tests. |
| test/resources | Data used in tests. |

Documentation

The documentation is written in markdown and is built using mdbook.

See documentation tools for more information on how to install and use mdbook.

The documentation is located in the docs directory and is split in two parts: taxonomy and development.

This is the docs/development part of the documentation and it is intended for developers working on the JobTech Taxonomy API. It explains how the system is built, how to set up a development environment, how to build and run the system, how to test it, and how to deploy it.

The docs/taxonomy part is the user documentation and is intended for users of the JobTech Taxonomy API. It explains what the Taxonomy is, what information it contains, how to use it, and how to integrate it with other systems.

Writing the documentation

Linting

When writing the documentation, use the markdown linter and the spell checker.

The documentation is written in markdown and requires no special tools to edit. You can use any text editor to write the documentation. To see the changes as you write, however, you can use mdbook to serve the documentation locally.

Serving the documentation

To serve the documentation locally run:

mdbook serve docs/taxonomy

or, if you wish to run mdbook in a way that allows you to access the documentation from other devices on your network:

mdbook serve -n 0.0.0.0 docs/taxonomy

Substitute docs/taxonomy for docs/development if you wish to work on the development part of the documentation.

Building the documentation

The CI system will build both the development and taxonomy parts of the documentation. It will then publish the development part to the GitLab Pages for the JobTech Taxonomy API repository. The taxonomy part is embedded into the image built for deployment of the JobTech Taxonomy API.

To build the documentation locally run:

mdbook build docs/taxonomy

or, if you wish to build the development part of the documentation:

mdbook build docs/development

Tools

Both mdbook and the plugins listed below are included in the CI Tools image. The plugins are used to render diagrams, admonition boxes, and provide tools to be used in the documentation. They are not required to build the documentation but are useful to make it more readable and informative.

Plugins

The following plugins are used to render diagrams, admonition boxes, and provide tools to be used in the documentation:

mdbook-admonish

This plugin allows you to add fancy notes to your documentation.

    ```admonish warning
    Writing software without tests is like skydiving without a parachute.
    ```

Will display as:

Warning

Writing software without tests is like skydiving without a parachute.

mdbook-katex

This lets us write math in the LaTeX style.

For example, the source $$ \eta \beta \pi $$ renders as the Greek letters η, β, π.

mdbook-plantuml
    ```puml
    @startuml Hello!
    Bob->Alice : Hello!
    @enduml
    ```

Will display as a rendered sequence diagram in which Bob sends "Hello!" to Alice.

mdbook-mermaid
    ```mermaid
    mindmap
      root((Mermaid))
        Can be used for graphs
          Graphs
          Diagrams
          Flowcharts
        Just
          Another
            Tool
            Brick
          This
    ```

Will display as a rendered mind map of the same content.

Configuration

There are two main ways the system is configured. The first is during build when the selected aliases change how the system will start and what components get enabled. The second way the system is configured is by the provided command line arguments, environment variables and configuration files during startup.

More on how to build the taxonomy for specific configurations can be found in the At Build section and (with more focus on the build system) in the build chapter.

The available configurations for the software are described in the At Start section.

At Build

Configuration at build is achieved by enabling aliases that select one or more extra source directories (by convention stored under env/<alias name>/clj/).

These aliases are enabled like any other aliases: use the -M flag followed by a colon-separated list of alias names. For example, to enable the test sources and the Kaocha functionality, supply the flag -M:test:kaocha.

The aliases are defined in deps.edn.
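As an illustration of the convention described above (this is a hypothetical sketch, not the project's actual deps.edn, which may differ), an alias that adds extra source directories typically looks like this:

```clojure
;; Hypothetical sketch of aliases in deps.edn.
{:aliases
 {:dev  {:extra-paths ["env/dev/clj" "env/dev/resources"]}
  :test {:extra-paths ["env/test/clj"
                       "test/clj/unit"
                       "test/clj/integration"]}}}
```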

At Start

The server can be configured from the command-line when it is started. For example, the following will start a server with the dev alias and default configuration:

clj -M:dev -m dev

By default, it will use the configuration found at config.edn:


{:backends
 [{:allow-create true
   :cfg
   {:db-name "jobtech-taxonomy-db" ,
    :server-type :datomic-local,
    :storage-dir ".taxonomy-db",
    :system "jobtech-taxonomy"},
   :id :datomic-v.dev,
   :init-db {:filename "resources/taxonomy.zip", :unzip true},
   :type :datomic}],
 :jobtech-taxonomy-api
 {:auth-tokens {:111 :user, :222 :admin}, :user-ids {:222 "admin-1"}},
 :mappings {:source-directory "resources/mappings"},
 :options
 {:deployment-name "Default Develop", :jetty-threads 32, :log-level :debug, :port 8080}}

The server port depends on which alias is being used.

The configuration file can be explicitly set using the --config flag or its short form -c:

clj -M:dev -m dev --config config.edn

The server port can be explicitly set using the --port flag or its short form -p:

clj -M:dev -m dev --port 5020

Summary of command-line flags

The following table is a summary of the available flags.

| Flag | Argument | Description |
| --- | --- | --- |
| --config, -c | Configuration file | Specify which configuration file to use. |
| --port, -p | Port number | Specify the server port at which the API can be reached. |

Provided configurations

Here is a list of some configuration files provided in this repository. For the actual configuration files used when the API is deployed, see the config*.edn files in the repository jobtech-taxonomy-api-infra.

| Path | Description |
| --- | --- |
| config.edn | Configuration with sane defaults suitable for development. |
| env/dev/resources/config.dev.edn | Default configuration used with the dev alias. |
| env/prod/resources/config.edn | Default configuration used for production. |
| config.container.edn | Configuration suitable for containers. |
| test/resources/config/config.edn | Configuration used in some tests. |
| test/resources/config/config.datomic.edn | Configuration used for testing Datomic. |
| test/resources/config/config.datahike.edn | Configuration used for testing Datahike. |
| test/resources/config/config-local-storage.edn | Configuration used for testing local storage. |

Configuration file structure

The configuration file contains an EDN-encoded data structure with the following keys:

| Key | Required? | Description |
| --- | --- | --- |
| :backends | Yes | A vector of configurations for database backends. Each configuration has an id. |
| :database-backend | No | Key referring to the id of a backend under :backends. |
| :compare-backends | No | List of backends to compare. |
| :options | | Options for the server, such as port number. |
| :jobtech-taxonomy-api | | Settings regarding the taxonomy. |

At the :backends key, a vector of configurations for database backends is listed, each with a unique id at its :id key. Under normal conditions there will only be one backend configuration, and in that case there is no need to provide values for the keys :database-backend and :compare-backends.

However, to compare the correctness of different database backends, it is possible to list several backends at the :backends key. If the :database-backend key has no value, the first backend in the list is used by default; it is also possible to specify explicitly which backend the API should use at the :database-backend key. The :compare-backends key can be set to a vector of backend ids whose results should be compared.

Here is an example of what a configuration can look like:

{:id :prod
 :options {:port 3000
           :log-level :info}
 :wanderung-source :nippy
 :wanderung {:nippy {:wanderung/type :nippy :filename "./test/data/taxonomy.nippy"}}
 :database-backend :datahike
 :compare-backends [:datomic :datahike]
 :backends [{:id :datahike
             :type :datahike
             :threads 12
             :cfg {:store {:backend :mem}
                   :attribute-refs? true
                   :name "from-nippy"}}
            {:id :datomic
             :type :datomic
             :threads 12
             :cfg {:db-name "jobtech-taxonomy-prod-2023-02-07-11-19-36"
                   :server-type :ion
                   :region "eu-central-1"
                   :system "tax-prod-v4"
                   :endpoint "http://entry.tax-prod-v4.eu-central-1.datomic.net:8182/"}}]}

Under the :backends key, we see the backends with ids :datahike and :datomic. Of these two, we specify that :datahike should be used at the :database-backend key and that the results of the two backends should be compared at the :compare-backends key.

Specific Database Backends

This section describes some specific database backends.

Datomic local file based - EBS

To utilize Datomic's local file-based backend for persistent storage, setting up the database storage is necessary. The most straightforward approach currently involves mounting a volume to the pod, with the underlying file system being Elastic Block Store (EBS) in AWS. For detailed configuration steps, refer to the infra repository's kustomize overlays. Below is a brief guide on setting up EBS in AWS.

  1. Log in to your AWS account and navigate to the EC2 dashboard. From there, locate the "Elastic Block Store" section and click on "Volumes."
  2. Within the "Volumes" section, locate the "Create Volume" button in the top right corner and click on it to create a new volume.
  3. Select the "gp2" volume type and specify the desired size (ensure it matches what you've set in the Persistent Volume (PV) and Persistent Volume Claim (PVC) configurations in the infra repository). Once the volume is created, copy the volume ID and paste it into your kustomize configuration in the infra repository.

Building

For the sake of deploying the taxonomy API to cloud infrastructure, it is recommended to build an uberjar that will contain all the dependencies needed for running the taxonomy. The following command will build an uberjar:

clj -T:build uber

It is possible to build a Docker image with the taxonomy API. The Docker image build process is specified in the Dockerfile at the root of this repository. The Dockerfile will invoke the command RUN clojure -T:build uber to build an uberjar used inside the Docker image. Building an uberjar ensures that all dependencies are resolved when the Docker image is built instead of when it is executed.

The Docker image can be built using podman (or docker) with the following command invoked from the root of this repository, where we tag the image my-taxonomy-image:

podman build -t my-taxonomy-image .

Build Requirements

The build requirements tend to vary a bit depending on what technologies are currently being used and features under development. The more stable ones are listed here and new ones should be added or removed as they appear and disappear.

Clojure

Install the latest version of Clojure for your platform. Except for Clojure's Java dependency this should be all that is required for a [first launch].

Testing the Taxonomy

To test the Taxonomy, we use the Kaocha test runner. There are two types of tests: unit tests, which test isolated parts of the code, should run very fast for quick feedback during development, and are located under test/clj/unit; and integration tests, which test various aspects of the system as a whole and are located under test/clj/integration. The integration test setup creates a temporary database for each test, which makes it safe to make any modifications without leaving traces behind. Test resource files can be found under test/resources. There are also some old tests under the base directory that have not yet been split into either unit or integration tests.

clj -M:test:kaocha will run all tests on the default database backend (Datomic) in memory, excluding Datahike-specific tests.

It is possible to run the test suite with a specific database backend, specified via the environment variable DATABASE_BACKEND. The unit tests use the configuration file test/resources/config/config.edn, which lists the backends with ids :datomic-v.kaocha and :datahike-v.kaocha. By setting the environment variable to one of these keys before invoking the above command, the tests will be executed with that backend. In Bash, the environment variable can be set on the same line as the command:

DATABASE_BACKEND=:datomic-v.kaocha clj -M:test:kaocha
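A variable set this way applies only to that single command; it does not leak into the surrounding shell session. A quick illustration, with echo standing in for the real test command:

```shell
# The variable is visible to the command on the same line...
DATABASE_BACKEND=:datomic-v.kaocha sh -c 'echo "backend is $DATABASE_BACKEND"'
# ...but not afterwards in the current shell
echo "afterwards: '$DATABASE_BACKEND'"
```

The first echo prints the backend keyword; the second prints an empty value.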

Useful Kaocha Commands

Kaocha provides several flags that can help tailor your testing process. Here are a few important ones:

--reporter

By default, Kaocha doesn't show much detail about the tests being run. To get more information, you can use the --reporter option. A good choice is the documentation reporter, which shows detailed output for each test.

  • Purpose: Controls how test results are displayed.
  • Example:
    clj -M:test:kaocha --reporter kaocha.report/documentation
    
    This command provides detailed output about each test run.

--focus

  • Purpose: Runs only a specific test or group of tests by name.
  • Example:
    clj -M:test:kaocha --focus my.test.namespace/specific-test
    

--focus-meta

  • Purpose: Runs tests that have specific metadata tags.
  • Example:
    clj -M:test:kaocha --focus-meta :integration
    

--seed

  • Purpose: Sets the seed for random test order to make test runs reproducible.
  • Example:
    clj -M:test:kaocha --seed 12345
    
    This ensures that tests run in the same order every time, which is useful for debugging.

--watch

  • Purpose: Automatically re-runs tests when files change.
  • Example:
    clj -M:test:kaocha --watch
    
    This keeps Kaocha running in the background, watching for file changes and rerunning tests as needed.

--fail-fast

  • Purpose: Stops the test suite as soon as a failure is encountered.
  • Example:
    clj -M:test:kaocha --fail-fast
    
    This is useful when you want to address the first error before running the rest of the tests.

You can combine different flags and commands to suit your needs. For a full list of reporters, see the Kaocha documentation.

Running Specific Tests

Kaocha lets you focus on specific tests with these options:

  • --focus: Run a specific test by its name.

    Command:

    clj -M:test:kaocha --focus my.test.namespace/specific-test
    
  • --focus-meta: Run tests that have specific metadata tags.

    Command:

    clj -M:test:kaocha --focus-meta :integration
    

Testing profiles

Select a test profile by providing the --profile <keyword> flag, for example clojure -M:kaocha --profile :ci. Config files for the different profiles can be found under env/kaocha/resources.

| Profile | Purpose | Test Paths | Plugins | Reporter | Output Options |
| --- | --- | --- | --- | --- | --- |
| ci | For continuous integration. Runs a comprehensive set of tests with detailed reporting. | test/clj/base, test/clj/unit, test/clj/integration, test/resources | :kaocha.plugin/cloverage, :kaocha.plugin/profiling, :kaocha.plugin/gc-profiling, :kaocha.plugin/junit-xml | kaocha.report/documentation | JUnit XML report (target/junit.xml), LCOV coverage report |
| dev | For the development environment. Runs different sets of tests with minimal output for quick feedback. | test/clj/unit, test/clj/integration, test/clj/base, test/clj/utils, test/resources | None | kaocha.report/dots | Simple console output |
| full | For full test suite execution. Provides detailed coverage and profiling data. | test/clj/unit, test/clj/integration, test/clj/base, test/clj/utils, test/resources | :kaocha.plugin/cloverage, :kaocha.plugin/profiling, :kaocha.plugin/gc-profiling, :kaocha.plugin/junit-xml | kaocha.report/documentation | JUnit XML report (target/junit.xml), LCOV and HTML coverage reports |

Running code coverage

Code coverage is enabled by default in the :ci and :full profiles, which means a coverage report is produced when the CI pipeline runs. See the --profile flag for more information.

The coverage report is generated by cloverage managed by the kaocha-cloverage plugin. The report is generated in the target/coverage directory.

To run with coverage enabled from the command line:

clj -M:test:kaocha --plugin cloverage

Testing in the REPL

Start the REPL with the test alias enabled, then run the tests:

user=> (use 'kaocha.repl)
user=> (run 'jobtech-taxonomy-api.test.graphql-test/graphql-test-1)

How to write an integration test

File and namespace

The tests and test resources reside in the test directory.

The test files are separated into two categories: unit and integration. The unit tests are for testing functions directly, while the integration tests are for testing calls through the API.

Test files are stored in test/clj/unit or test/clj/integration depending on what kind of test they contain. From that root they mirror the namespace they are testing. For example, the namespace jobtech-taxonomy.api.routes.services would have its unit tests in test/clj/unit/jobtech-taxonomy/api/routes/services_test.clj and its integration tests in test/clj/integration/jobtech-taxonomy/api/routes/services_test.clj. Sometimes when a module is large it can be tested in multiple files, for example test/clj/integration/jobtech-taxonomy/api/routes/services_test.clj and test/clj/unit/jobtech-taxonomy/api/routes/services_graphql_test.clj. This is generally an indication that the module should be split up.
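The namespace-to-path mapping follows the usual Clojure convention: dots become directory separators and hyphens become underscores. As a small sketch, the unit test path for a namespace can be derived like this:

```shell
ns="jobtech-taxonomy.api.routes.services"
# Dots become slashes, hyphens become underscores, and _test.clj is appended
path="test/clj/unit/$(echo "$ns" | tr '.-' '/_')_test.clj"
echo "$path"
```

This prints test/clj/unit/jobtech_taxonomy/api/routes/services_test.clj, matching the layout described above.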

You need to require [jobtech-taxonomy-api.test.test-utils :as util].

Define fixtures

Place one occurrence of this line in your test file: (test/use-fixtures :each util/fixture).
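Putting the require and the fixture together, a minimal test file skeleton might look like this (the namespace and test body are hypothetical placeholders, not real project code):

```clojure
(ns jobtech-taxonomy-api.test.example-test ; hypothetical example namespace
  (:require [clojure.test :as test]
            [jobtech-taxonomy-api.test.test-utils :as util]))

(test/use-fixtures :each util/fixture)

(test/deftest example-test
  (test/testing "A placeholder test."
    (test/is (= 1 1))))
```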

Define a test which calls functions directly

Here is a simple example of a test which asserts a skill concept, and then checks for its existence.

First, require

[jobtech-taxonomy-api.db.concept :as c]

Then write a test:

(test/deftest ^:concept-test-0 concept-test-0
  (test/testing "Test concept assertion."
    (c/assert-concept "skill" "cykla" "cykla")
    (let [found-concept (first (c/find-concept-by-preferred-term "cykla"))]
      (test/is (= "cykla" (get found-concept :preferred-label))))))

Integration API tests

The file test/clj/integration/jobtech_taxonomy_api/api_test.clj contains tests of the API, derived mainly from OpenSearch logs. The tests can be run in the following ways:

  • Against a local server launched by the fixture.
  • Against a mock app launched by the fixture.
  • Against a remote server.

The namespace of that file is tagged with ^:real-server-test. That means it will be detected by the function utils.api-test-helpers/run-real-server-tests-with-host, which is called with the host address of a real server to test against, for example

(run-real-server-tests-with-host "https://taxonomy.api.jobtechdev.se")

The tests in this namespace serve three purposes:

  • To test the correctness of the taxonomy before deployment
  • To test the correctness of the taxonomy after deployment
  • To provide a set of tests for measuring performance

NOTE: We cannot always be sure exactly which taxonomy version we are testing. When testing a remote server, we assume that the version running there is no older than the version deployed on any production server at the time these tests were written. If a test breaks against a remote server because the taxonomy deployed there is too new, the tests must be updated. Such errors are usually caught during development, since these tests are part of the test suite.

The data used in the tests is stored under test/resources/sample_requests in a folder hierarchy that reflects the hierarchy of the HTTP paths. For some of the tests, the response should be expected to be constant apart from variation in how elements in lists are ordered. But there are also tests where this may not be true. Those tests are:

  • API requests where we limit the result size using some limit parameter. Ideally the ordering would be stable, but the API makes no such guarantee.
  • API requests where we don't ask for data from the latest version. When no version parameter is present in the request, the default version is typically the latest version.
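For example, a request of the first kind might look like this (the parameter names here are assumptions for illustration; check the Swagger UI for the exact names):

```shell
curl -s --header 'Accept: application/json' \
  'http://localhost:3000/v1/taxonomy/main/concepts?type=skill&limit=10'
```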

Running the Taxonomy

After having cloned the repository and installed the [required software] there are several ways to start the Taxonomy API: from the command line, from an IDE, or from a REPL. Starting from the command line uses one of several deps.edn aliases that modify how the taxonomy starts.

The path to the configuration file can be set using either the environment variable TAXONOMY_CONFIG_PATH or the CLI flag --config <path> or -c <path>.

Starting the Taxonomy

The quickest way to get an interactive server is to just run:

clj -M:dev -m dev

This will start the API using the default configuration in config.edn.

The REPL and server port numbers can likewise be overridden: the REPL port via the environment variable TAXONOMY_REPL or the flags -r NNNN / --repl NNNN, and the server port via TAXONOMY_PORT or -p NNNN / --port NNNN.
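Concretely, the variables and flags can be combined like this (the paths and port numbers are illustrative):

```shell
# Using environment variables:
TAXONOMY_CONFIG_PATH=./config.edn TAXONOMY_PORT=8081 clj -M:dev -m dev

# The same using CLI flags, plus a custom REPL port:
clj -M:dev -m dev --config ./config.edn --port 8081 --repl 7888
```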

Next step

Then open the following URL in a web browser:

http://localhost:3000/v1/taxonomy/swagger-ui/index.html

Authorize

Authorization is only needed to read unpublished taxonomy versions and for DB write access. To get write access, click the Authorize button, and enter your account code, defined in env/dev/resources/config.edn.
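Outside the Swagger UI, the same credential can be sent as a request header. The header name api-key is an assumption here; verify it against the Authorize dialog or the API documentation:

```shell
# <your-account-code> is the code defined in env/dev/resources/config.edn.
curl -X GET 'http://localhost:3000/v1/taxonomy/main/concepts?preferred-label=Danska' \
  --header 'Accept: application/json' \
  --header 'api-key: <your-account-code>'
```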

Running a query

curl -X GET --header 'Accept: application/json' 'http://localhost:3000/v1/taxonomy/main/concepts?preferred-label=Danska'

Deploying the Taxonomy

The Taxonomy is deployed using Docker/Podman and Kubernetes. There are specific instructions about how to deploy the Taxonomy to the JobTech infrastructure.

At JobTech, we use Podman instead of Docker for building containers. See specific instructions about Podman.

JobTech infrastructure

How to deploy the JobTech Taxonomy API in the various deployment environments. To be eligible for deployment to production a version needs to first have been deployed to testing.

Environments

The taxonomy is deployed to five different environments: testing and production, each in read and write variants, plus an onprem environment (internal to AF).

  1. jobtech-taxonomy-api-testing-read
  2. jobtech-taxonomy-api-testing-write
  3. jobtech-taxonomy-api-prod-read
  4. jobtech-taxonomy-api-prod-write
  5. onprem

The GitLab CI pipeline automatically deploys jobtech-taxonomy-api-testing-read.

Testing

Production

Traffic Timing

Prefer deploying to production early in the week, after 10:00 but before lunch if possible; Tuesday at 10:00 is probably the best time. This window is less traffic intensive with respect to internal AF traffic, and it leaves more time to roll back a problematic deploy.

Deployment merge requests

When deploying, it is good practice to create an MR against the infra repository and have it approved. In some cases it is acceptable to commit directly to the main branch: for example, when several people are working together on a deployment, or when working alone and the deploy is an important bugfix or security fix.

Deployment to prod-read

  1. Select the latest commit SHA from https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/jobtech-taxonomy-api.
  2. Verify that it is the same SHA as the one that can be found under newTag in https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/jobtech-taxonomy-api-infra/-/blob/main/kustomize/overlays/jobtech-taxonomy-api-testing-read/kustomization.yaml?ref_type=heads.
  3. Verify that the testing read environment has updated and is functioning correctly.
  4. Verify that the configuration for the taxonomy API is the same in testing-read and prod-read.
  5. Edit newTag in https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/jobtech-taxonomy-api-infra/-/blob/main/kustomize/overlays/jobtech-taxonomy-api-prod-read/kustomization.yaml?ref_type=heads to match the testing-read version.
  6. Await approval if necessary.
  7. Verify that the deployment went well in https://openshift-gitops-server-openshift-gitops.prod.services.jtech.se and https://console-openshift-console.prod.services.jtech.se/dashboards.
  8. Perform a staggered restart of the Varnish cache system. Currently this does not happen automatically, so some changes will not be visible to users until the cache is restarted.
  9. Verify that the https://taxonomy.api.jobtechdev.se/v1/taxonomy/status/build call has the correct commit SHA.

Deployment to prod-write

  1. MAKE SURE THAT REDAKTIONEN KNOWS THIS IS HAPPENING!
  2. Select the latest commit SHA from https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/jobtech-taxonomy-api.
  3. Verify that it is the same SHA as the one that can be found under newTag in https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/jobtech-taxonomy-api-infra/-/blob/main/kustomize/overlays/jobtech-taxonomy-api-testing-write/kustomization.yaml?ref_type=heads.
  4. Verify that the testing write environment has updated and is functioning correctly.
  5. Verify that the configuration for the taxonomy API is the same in testing-write and prod-write.
  6. Edit newTag in https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/jobtech-taxonomy-api-infra/-/blob/main/kustomize/overlays/jobtech-taxonomy-api-prod-write/kustomization.yaml?ref_type=heads to match the testing-write version.
  7. Await approval if necessary.
  8. Verify that the deployment went well in https://openshift-gitops-server-openshift-gitops.prod.services.jtech.se and https://console-openshift-console.prod.services.jtech.se/dashboards.
  9. Perform a staggered restart of the Varnish cache system. Currently this does not happen automatically, so some changes will not be visible to users until the cache is restarted.
  10. Verify that the https://api-jobtech-taxonomy-api-prod-write.prod.services.jtech.se/v1/taxonomy/status/build call has the correct commit SHA.

Getting up and running using Podman

Ensure that the files aws-secrets-test.txt and api-secrets-test.txt are present.

podman build -t jt-api .

podman run --network="host" --name api --env-file=../jobtech-taxonomy-api-gitops/jobtech-taxonomy-api-deploy-secrets/test/aws-secrets-test.txt --env-file=../jobtech-taxonomy-api-gitops/jobtech-taxonomy-api-deploy-secrets/test/api-secrets-test.txt --env-file=dev.env --rm jt-api

Building and running

Build the image:

podman build . -t api

To run the API locally using podman with Datomic as the backend, run:

podman run -p 3000:3000 --name api --rm api

To run the API locally using podman with Datahike as the backend, with the database in the folder datahike-file-db, run:

podman run -v $PWD/datahike-file-db:/datahike-file-db:Z -e DATABASE_BACKEND=datahike -p 3000:3000 --name api --rm api

To stop the API, run in another terminal:

podman container stop api


Monitoring

Please find the logs of deployed instances of the Taxonomy at one of the following addresses:

Technologies and libraries

This chapter contains some information about the larger components used to build the taxonomy: the database and the logging systems.

Database

Database backends

The application uses Datomic-flavoured Datalog as its database query language. The actual database backend can be configured to be any Datomic-compatible database, such as Datomic or Datahike.

The Datomic model is a bit different from the traditional SQL model. Data is stored as a set of facts, represented as tuples called datoms. A datom records an entity, an attribute, a value, the transaction that added it, and a boolean indicating whether the fact is asserted or retracted. A transacted datom is never changed, but the currently visible value can be superseded by later transactions. This means that the database is append-only and the full history is kept, so it is always possible to review the changes that led to the current state of the database.
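As a sketch, with illustrative attribute names (not taken from the actual schema), a datom and a Datalog query over such facts might look like:

```clojure
;; A datom is a tuple: [entity attribute value transaction added?]
;; e.g. [42 :concept/preferred-label "cykla" 1234 true]

;; A Datalog query that finds the labels of skill concepts:
'[:find ?label
  :where
  [?e :concept/type "skill"]
  [?e :concept/preferred-label ?label]]
```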

Logging

By default, logging functionality is provided by the timbre library. It is used together with the slf4j-timbre and slf4j-api helpers, which manage logging from components that use slf4j.

Any Clojure data structures can be logged directly.

Examples

(ns example
 (:require [taoensso.timbre :as log]))

(log/info "Hello")
=>[2015-12-24 09:04:25,711][INFO][myapp.handler] Hello

(log/debug {:user {:id "Anonymous"}})
=>[2015-12-24 09:04:25,711][DEBUG][myapp.handler] {:user {:id "Anonymous"}}

Configuring logging

Each profile has its own log configuration, found at env/*/resources/taoensso.timbre.config.edn. For example, this is what the dev logging config looks like:

{:min-level [[#{"io.methvin.watcher.*"} :info]
             [#{"datahike.*"} :info]
             [#{"org.eclipse.jetty.*"} :warn]
             [#{"jobtech-taxonomy-api.db.database-connection"} :warn]
             [#{"jobtech-taxonomy.features.mapping.*"} :info]
             [#{"*"} :debug]]}

There is also a common configuration that is not specific to any profile found at resources/taoensso.timbre.config.edn:

{:min-level [[#{"*"} :info]]}
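Assuming timbre's usual semantics, the first pattern in :min-level that matches the caller's namespace decides the minimum level. With the dev config above, roughly:

```clojure
;; namespace datahike.connector    → matches "datahike.*" → :info,
;;                                   so (log/debug ...) is suppressed
;; namespace jobtech-taxonomy.core → falls through to "*" → :debug,
;;                                   so (log/debug ...) is logged
```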

Logging of exceptions

(ns example
 (:require [taoensso.timbre :as log]))

(log/error (Exception. "I'm an error") "something bad happened")
=>[2015-12-24 09:43:47,193][ERROR][myapp.handler] something bad happened
  java.lang.Exception: I'm an error
    at myapp.handler$init.invoke(handler.clj:21)
    at myapp.core$start_http_server.invoke(core.clj:44)
    at myapp.core$start_app.invoke(core.clj:61)
    ...

Local development

When developing new functionality for jobtech-taxonomy-api, you want to use the read version of the taxonomy database, which is contained in the tax-api source code repo under ~/resources/taxonomy.zip. It is used by the default configuration, so you just need to start the application, either from a REPL or with the clj command described above, and it will work.

Editor specific instructions

These are some specific instructions for various editors to help you integrate it with this project and get a development REPL up and running in your editor.

  • Emacs: The file .dir-locals.el configures the REPL for development. Just call M-x cider-jack-in to run the REPL.
  • Vim: ...
  • VSCode, Calva: ...

Adding auto code formatting as a pre-commit hook

Run this command to configure git to look for hooks in .githooks/:

git config --local core.hooksPath .githooks/
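To check that the setting took effect, you can read it back. A self-contained sketch using a throwaway repository:

```shell
tmp=$(mktemp -d)
git init -q "$tmp"
git -C "$tmp" config --local core.hooksPath .githooks/
git -C "$tmp" config --local core.hooksPath   # prints .githooks/
```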

COMMON ERRORS

If you get the error :server-type must be :cloud, :peer-server, or :local, you have forgotten to start the taxonomy API. Run (start) in the user namespace.

References

Using the REPL

The common way of launching a REPL is from within an IDE. However, you can also launch a REPL directly from the terminal using the following command:

clj -M:dev

The advantage of running a REPL from the IDE is typically that you can send code from the IDE directly to the REPL to be evaluated.

Once started, it will look like this:

Clojure 1.11.3
user=>

To launch the server with the default configuration, do

user=> (start)

which will start a server and once started will display something like this:

2024-05-03T08:31:39.402Z jonas INFO [jobtech-taxonomy.api.core:33] - Started http://jonas:8080
{:started ["#'dev/http-server"]}

The user namespace is initialised from env/dev/clj/user; in particular, the comment clauses towards the end describe how the functionality in the namespace is intended to be used.

(ns user
  {:clj-kondo/config '{:linters {:clojure-lsp/unused-public-var {:level :off}}}}
  (:require [clj-http.client :as client]
            [clojure.data.json :as cdj]
            [clojure.pprint :refer [pprint]]
            [clojure.spec.alpha :as s]
            [dev]
            [expound.alpha :as expound]
            [jobtech-taxonomy.api.config :refer [get-config transduce-backends]]
            [jobtech-taxonomy.api.core]
            [mount.core :as mount]
            [taoensso.timbre :as log]))

(alter-var-root #'s/*explain-out* (constantly expound/printer))

(defn add-taps []
  (add-tap (bound-fn* pprint)))

(def dev-cfg (atom (get-config [])))

(defn show-configs []
  (let [cfg @dev-cfg]
    {:available (mapv :id (:backends cfg))
     :active (:database-backend cfg)
     :multi (:compare-backends cfg)}))

(defn activate-config [cfg-id & cfg-ids]
  (swap! dev-cfg
         #(-> %
              (assoc :database-backend cfg-id)
              (dissoc  :compare-backends)
              (merge (if cfg-ids {:compare-backends (into [cfg-id] cfg-ids)} {}))))
  (show-configs))

(comment
  (load-config)
  (show-configs)
  (activate-config ':datahike-v.dev)
  @dev-cfg
  'bye)

(def api-root
  (str "http://localhost"
       (if-let [port (get-in @dev-cfg [:options :port])]
         (str ":" port "/")
         "/")))

(defn ^:export ppt []
  (pprint *1))

(defn ^:export get-raw-body [path]
  (-> (str api-root path)
      client/get
      :body))

(defn get-json-body [path]
  (-> (get-raw-body path)
      (cdj/read-str :key-fn keyword)))

(defn ^:export api-json []
  (get-json-body "v1/taxonomy/swagger.json"))

(defn ^:export tax-get [path]
  (get-json-body path))

(defn ^:export load-config
  ([] (reset! dev-cfg (get-config [])))
  ([cfg] (reset! dev-cfg cfg)))

(defn start []
  (reset! @#'mount/-args @dev-cfg)
  (mount/start-without #'dev/repl-server))

(defn stop []
  (mount/stop-except #'dev/repl-server))

(defn restart []
  (stop)
  (start))

(defn query-url [& re-parts]
  (->> (api-json)
       :paths
       (keep (fn [[k m]]
               (let [url (name k)]
                 (when (every? #(re-find % url) re-parts)
                   {:url (name k) :methods (keys m) :info m}))))
       vec))

(defn reduce-param [params]
  (->> (group-by :in params)
       (map (fn [[n g]]
              [n (mapv #(let [dissoc-falsy ((remove %) dissoc)]
                          (-> (dissoc % :in)
                              (dissoc-falsy :required)
                              (dissoc-falsy :deprecated))) g)]))))

(defn get-query [api-entry method]
  (when-let [method-info (get-in api-entry [:info method])]
    {:url (:url api-entry)
     :method method
     :summary (:summary method-info)
     :parameters (reduce-param (:parameters method-info))}))

(comment
  ;; This example demonstrates how you can start a short-lived server
  ;; to try things out. In this particular case, we attempt to reproduce
  ;; the datomic timeout reported in
  ;;
  ;; https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/jobtech-taxonomy-api/-/issues/652
  ;;
  #_{:clj-kondo/ignore [:unresolved-symbol]}
  (dev/with-http-server
    [{:keys [url]} (->> []
                        get-config
                        (transduce-backends (filter (comp #{:datomic} :type))))]
    (client/get
     (str url "/v1/taxonomy/suggesters/autocomplete?query-string=mjaoelimjaoelimjao&related-ids=i46j_HmG_v64&relation=narrower")
     {:accept :json}))

  )

(comment
  ;; Show what will be logged and how.
  (pprint log/*config*)
  ;; Possibly add taps.
  (add-taps)
  ;; Start by loading a config, any config, for the mount-system.
  ;; This is optional, but allows any config to be loaded.
  (load-config (get-config []))
  ;; Proceed by starting the mountable things, except for the REPL, since you are in it.
  (start)
  ;; Let's look at the Swagger API!
  ;; query-url takes some reg-exps that has to match the url
  (->> (query-url #"relation" #"changes" #"main")
       (keep #(get-query % :get))
       pprint)
  ;; (log/set-min-level! :debug)
  ;; Change the config. This particular one will break the system.
  (reset! @#'mount/-args {:options {:port 3000}})
  ;; Just to clear the air a bit.
  (restart)
  ;; Stop the system!
  (stop)
  'bye!
  )

IDE integration

Clojure development is usually easiest with a REPL running inside an IDE.

VSCode and Calva

Install VS Code and the Calva plugin.

Then follow the instructions in Local development to configure your system for local development.

Emacs and CIDER

Platforms

The Taxonomy can be run on Windows, Linux, or macOS. On Windows, run the Taxonomy API under WSL 2.

After installing WSL 2 with a Linux distribution, follow the Linux instructions below to set up the environment.

VS Code and Calva are likely to be a good choice for development.

To run the Taxonomy API under Linux, you will need:

  • Some JDK (Java Development Kit)
  • Clojure CLI, which also requires
    • bash
    • curl
    • rlwrap.

The Taxonomy API is Java-based and requires a JDK in order to be built and run. Make sure that you have a JDK on your system. OpenJDK is a good default. Read about how it can be installed on this page: https://openjdk.org/install/. However, the official Clojure page recommends Adoptium which might also be a good choice.

To install Clojure, follow the instructions at https://clojure.org/guides/install_clojure.

For the Graphviz endpoint to work locally, you need to have Graphviz installed. See https://graphviz.org/download/#linux for installation instructions under Linux.