Developers Guide to the JobTech Taxonomy API
The JobTech Taxonomy API is big. You just won't believe how vastly, hugely, mind-bogglingly big it is.
Here you can find information on how to install a development environment, build requirements, how the project is structured, and how to configure, build, test, and finally deploy the JobTech Taxonomy API.
Open Source
The JobTech Taxonomy API is provided under the EPL-2.0 license.
Copyright © 2019 JobTech
Introduction
The documentation is split into two parts: this part, aimed at developers of the taxonomy, and a part for taxonomy users. If you have questions about what a taxonomy is, how the JobTech Taxonomy relates to the job market, or how to use the taxonomy, that second part is the one you want.
For new developers
An overview of chapters describing fundamental project information.
From source to running
Chapters describing different steps in getting the taxonomy off the ground.
Development
The following chapters describe how the repository is structured, where and how the documentation is built, how to set up the development environment, and the major building blocks used in developing the taxonomy.
Project Layout
This chapter describes how the project is structured. Where to find what files.
Documentation
Some information on how this and the user documentation is written, where it is stored, and how to build it.
Development environment
How to set up a local development environment.
Technologies
This chapter introduces the larger components that are used by the taxonomy. Things like the database layer, the web server, logging, testing, documentation, and so on.
Getting started
Follow the instructions in https://clojure.org/guides/install_clojure to get Clojure up and running.
Starting the system
Now it should suffice to run

```shell
clj -M:dev -m dev
```

in the clone of the taxonomy API repo to build and deploy the API. The log messages should provide a link to where the API can be tested. If you run from your local machine with the default config, the link should be http://localhost:8080 or similar.
Running tests
```shell
clj -M:test:kaocha
```

runs all tests.

```shell
clj -M:test:kaocha --watch
```

runs all tests continuously, watching for changes to the code.

```shell
clj -M:test:kaocha --focus :unit --watch
```

runs only the (slightly faster) unit tests continuously.
Troubleshooting
Sometimes, documentation and code can get out of sync. When this happens—when the system builds and runs but not as described in the documentation—here are some tips to help you resolve the issue.
If you find discrepancies between the code and documentation, please create a merge request to update the documentation!
To troubleshoot continuous integration pipeline bugs, build errors, or unexpected runtime behavior, refer to the following files. They often provide clues about what is supposed to work:
- Dockerfile – Contains setup instructions for the Docker environment.
- .gitlab-ci.yml – Configures the continuous integration pipeline.
- build.clj – Manages the build process and dependencies.
Look to the CI system
The .gitlab-ci.yml
file contains information on how the system is built, tested, and deployed. It is an excellent starting point to see what step fails and why. It also gives a hint of what the expected environment for the build is and what commands should work.
When this documentation was built it looked like this:
```yaml
include:
  - remote: https://gitlab.com/arbetsformedlingen/devops/gitlab-cicd/jobtech-ci/-/raw/v2.1.1/jobtech-ci.yml
  - remote: https://gitlab.com/arbetsformedlingen/devops/gitlab-cicd/jobtech-ci/-/raw/v2.1.1/deploy.yml

variables:
  TEST_DISABLED: 1
  DEPLOYS_DISABLED: 1
  SCHEMATHESIS_IMAGE: schemathesis/schemathesis:3.28.1
  CLOJURE_BUILD_IMAGE: clojure:temurin-22-tools-deps-1.11.3.1456-jammy
  RUNTIME_BASE_IMAGE: eclipse-temurin:22.0.1_8-jre-alpine
  AUTO_DEVOPS_BUILD_IMAGE_EXTRA_ARGS: "--build-arg BUILDER_BASE_IMAGE=$CLOJURE_BUILD_IMAGE --build-arg RUNTIME_BASE_IMAGE=$RUNTIME_BASE_IMAGE"
  TAXONOMY_DEVELOP_SWAGGER_URL: http://jobtech-taxonomy-api-develop.apps.testing.services.jtech.se/v1/taxonomy/swagger.json

# This will store the libraries from deps.edn in a cache that will
# be restored in the next job that uses the same cache key
.cache-template: &cache-template
  - key:
      files:
        - deps.edn
    paths:
      - .m2/repository

# Prepare and download the libraries in deps.edn.
# The result is cached between jobs and pipelines
# if there are no changes in deps.edn.
prepare_taxonomy_api_libraries:
  stage: build
  image: $CLOJURE_BUILD_IMAGE
  cache: *cache-template
  script:
    - clojure -A:dev:prof:kaocha:prod:build:test:clj-kondo:eastwood:cljfmt:nsorg:outdated -P
    - clojure -T:build update-build
  artifacts:
    paths:
      - target

# This job is defined in the Auto DevOps template.
# Here we tell it to use cached dependencies and
# start after the documentation is built.
build:
  needs: [prepare_taxonomy_api_libraries]
  cache: *cache-template

schemathesis_get_graphql_tests:
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      allow_failure: true
  stage: test
  image:
    name: $SCHEMATHESIS_IMAGE
    entrypoint: [""]
  script:
    - st run --junit-xml=schemathesis-get-graphql-junit.xml --hypothesis-deadline=None --show-trace --method GET --endpoint "^/v1/taxonomy/graphql" $TAXONOMY_DEVELOP_SWAGGER_URL
  artifacts:
    paths:
      - schemathesis-get-graphql-junit.xml
    reports:
      junit: schemathesis-get-graphql-junit.xml

schemathesis_exclude_graphql_tests:
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      allow_failure: true
  stage: test
  image:
    name: $SCHEMATHESIS_IMAGE
    entrypoint: [""]
  script:
    - st run --junit-xml=schemathesis-exclude-graphql-junit.xml --hypothesis-deadline=None --show-trace --endpoint "^/v1/taxonomy/(?!graphql).+" $TAXONOMY_DEVELOP_SWAGGER_URL
  artifacts:
    paths:
      - schemathesis-exclude-graphql-junit.xml
    reports:
      junit: schemathesis-exclude-graphql-junit.xml

datomic_kaocha_tests:
  stage: test
  needs: [prepare_taxonomy_api_libraries]
  image: $CLOJURE_BUILD_IMAGE
  cache: *cache-template
  script:
    - apt-get update && apt-get install -y python3-pip graphviz && pip3 install lcov_cobertura
    - DATABASE_BACKEND=:datomic-v.kaocha clj -M:test:kaocha --profile :ci
    - lcov_cobertura target/coverage/lcov.info -o coverage.xml
  coverage: /^\|\s+ALL FILES\s\|\s+(\d+\.\d+)\s\|\s+\d+\.\d+\s\|\s*$/
  artifacts:
    paths:
      - target/coverage/lcov.info
      - cobertura.xml
    reports:
      junit: target/junit.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml

server_api_tests:
  stage: test
  needs: [prepare_taxonomy_api_libraries]
  image: $CLOJURE_BUILD_IMAGE
  cache: *cache-template
  script:
    - apt-get update && apt-get install -y graphviz
    - ./test/gitlab-ci/api-tests.sh
  artifacts:
    paths:
      - target/junit-api-test-report.xml
    reports:
      junit: target/junit-api-test-report.xml

autodeploy_testing_read:
  stage: test
  image: docker:25.0.0-beta.2
  services:
    - docker:25.0.0-beta.2-dind
  resource_group: write-infra
  variables:
    OVERLAY: "jobtech-taxonomy-api-testing-read"
  rules:
    - if: $CI_COMMIT_REF_SLUG == $CI_DEFAULT_BRANCH
  needs:
    - code_quality
    - container_scanning
    - datomic_kaocha_tests
    - dockerfile-lint
    - secret_detection
    - semgrep-sast
    - server_api_tests
    - schemathesis_get_graphql_tests
    - schemathesis_exclude_graphql_tests
  script:
    - !reference [.deploy, script]
```
Look to the Container Build
In general it is a good idea to take a look at the `Dockerfile` to see which base image is used and what additional software is installed before the taxonomy is built and installed.

Currently the content of the `Dockerfile` is the following:
```dockerfile
# This is a multi-stage build.
# The first stage is used to prepare the runtime image.
# The second stage is used to build the application.
# The third stage is used to run the application.

# The local versions of the builder and runtime images.
# These are replaced by the CI/CD pipeline.
ARG BUILDER_BASE_IMAGE=clojure:temurin-22-tools-deps-jammy
ARG RUNTIME_BASE_IMAGE=eclipse-temurin:22-jre-alpine

# The image used to run the application.
# Here we add the helper software that does not change often.
# Also set up the user that will run the application and
# give it permissions for the application folder.
# This caches the layer and speeds up the build.
FROM $RUNTIME_BASE_IMAGE as runtime
RUN apk add --update --no-cache graphviz ttf-freefont && \
    addgroup -S runner && \
    adduser -S runner -G runner

# The image used to build the application.
# We copy the entire directory so that we get the benefit
# of caching the dependencies and prepared libraries.
FROM $BUILDER_BASE_IMAGE as builder
WORKDIR /build
COPY . .
# If built in a clean repo, this step is needed
# to prepare the clojure libraries.
RUN clojure -A:prod:build -P
# Build the application uber-jar.
RUN clojure -T:build uber

# The final image that will run the application.
# We copy the built jar and the resources,
# expose the default port and run the application.
FROM runtime
WORKDIR /app
COPY --from=builder /build/target/app.jar app.jar
COPY --from=builder /build/resources/taxonomy.zip resources/taxonomy.zip
COPY --from=builder /build/resources/mappings resources/mappings
RUN chown -R runner:runner /app
USER runner
EXPOSE 3000
CMD java -jar app.jar
```
Look to the Build System
If the build fails, the `build.clj` file can contain information about what was expected. This file is mainly used for the build in the Dockerfile, but it may also be useful on its own.
```clojure
(ns build
  (:require [clojure.tools.build.api :as b]
            [java-time.api :as jt]))

(def class-dir "target/classes")
(def basis (b/create-basis {:project "deps.edn"}))

(defn ^:export update-build [_]
  (let [commit (b/git-process {:git-args "rev-parse --short HEAD"})
        branch (b/git-process {:git-args "branch --show-current"})
        build-info {:branch branch
                    :built (jt/format (jt/zoned-date-time))
                    :commit commit}]
    (spit "resources/build-info.edn" build-info)))

(def uber-file "target/app.jar")
(def copy-src ["src/clj" "env/prod/clj" "env/prod/resources" "resources"])

(defn clean [_]
  (b/delete {:path "target"}))

(defn ^:export uber [_]
  (clean nil)
  (b/copy-dir {:src-dirs copy-src
               :target-dir class-dir})
  (b/compile-clj {:basis basis
                  :src-dirs ["src"]
                  :ns-compile '[prod]
                  :bindings {#'clojure.core/*warn-on-reflection* true}
                  :class-dir class-dir})
  (b/uber {:class-dir class-dir
           :uber-file uber-file
           :basis basis
           :main 'prod}))
```
From source to running
The following chapters describe (roughly in logical order) how to go from the source code in the repository to a running JobTech Taxonomy API in a production environment.
Configuration
How to configure the taxonomy; both for build and when starting the built software.
Building
How to build the taxonomy.
Testing
How to test the taxonomy.
Running
How to run the taxonomy.
Deploying
How to deploy the taxonomy to a production environment.
Monitoring
How to monitor the running taxonomy.
Project Layout
The JobTech Taxonomy API project is structured to make it easy to find the code and resources you need. The project is divided into several directories, each with a specific purpose. The table below provides an overview of the directories in the project.

Major features that may not be necessary for all users are placed in the `features` directory. The `env` directory contains aliases and configurations for different environments. The `resources` directory contains files that are used by the JobTech Taxonomy API but that are not generated during the build, stored in a database, or part of the source code.
Features
The features
directory contains optional features that can be enabled or disabled. These features are not necessary for the core functionality of the JobTech Taxonomy API. The features are implemented as middleware and can be enabled or disabled depending on the configuration. The features are intended to allow for easy extensibility of the JobTech Taxonomy API and to increase security by, for example, deploying without the taxonomy editor related features.
Directories
Directory | Description |
---|---|
docs | Documentation |
docs/development | Development-related documentation. |
docs/taxonomy | End user documentation. |
env | Functionality related to various aliases. |
env/dev | Settings, tools and helpers for development. |
env/kaocha | Configuration for the Kaocha test runner. |
env/prod | Production environment configurations. |
env/prof | Profiling configurations. |
env/test | Testing related code. |
resources | Files that are used by the JobTech Taxonomy API but that are not generated during build, found in a database or source code. |
resources/public | Static web resources and documents used by the JobTech Taxonomy API. |
src | Source code for most of the implementation of the JobTech Taxonomy API. |
src/clj/jobtech_taxonomy/api | The server, database and API implementation. |
src/clj/jobtech_taxonomy/common | Utilities and code shared by the API and features. |
src/clj/jobtech_taxonomy/features | Optional features. |
test | Root directory for test sources and resources. |
test/clj/integration | For tests that go from HTTP request to response. |
test/clj/unit | Unit test code. |
test/clj/utils | Helper functions for the tests. |
test/resources | Data used in tests. |
Documentation
The documentation is written in Markdown and is built using mdbook. See documentation tools for more information on how to install and use mdbook.
The documentation is located in the `docs` directory and is split in two parts: `taxonomy` and `development`.
This is the docs/development
part of the documentation and it is intended for developers working on the JobTech Taxonomy API. It explains how the system is built, how to set up a development environment, how to build and run the system, how to test it, and how to deploy it.
The docs/taxonomy
part is the user documentation and is intended for users of the JobTech Taxonomy API. It explains what the Taxonomy is, what information it contains, how to use it, and how to integrate it with other systems.
Writing the documentation
The documentation is written in Markdown and requires no special tools to edit; you can use any text editor. To see your changes as you write, however, you can use mdbook to serve the documentation locally.
Serving the documentation
To serve the documentation locally run:

```shell
mdbook serve docs/taxonomy
```

or, if you wish to run mdbook in a way that allows you to access the documentation from other devices on your network:

```shell
mdbook serve -n 0.0.0.0 docs/taxonomy
```

Substitute `docs/development` for `docs/taxonomy` if you wish to work on the development part of the documentation.
Building the documentation
The CI system will build both the `development` and `taxonomy` parts of the documentation. It will then publish the `development` part to the GitLab Pages for the JobTech Taxonomy API repository. The `taxonomy` part is embedded into the image built for deployment of the JobTech Taxonomy API.

To build the documentation locally run:

```shell
mdbook build docs/taxonomy
```

or, if you wish to build the development part of the documentation:

```shell
mdbook build docs/development
```
Tools
Both mdbook and the plugins listed below are included in the CI Tools image. The plugins are used to render diagrams, admonition boxes, and provide tools to be used in the documentation. They are not required to build the documentation but are useful to make it more readable and informative.
Plugins
The following plugins are used to render diagrams, admonition boxes, and provide tools to be used in the documentation:
mdbook-admonish
This plugin allows you to add fancy notes to your documentation.
```admonish warning
Writing software without tests is like skydiving without a parachute.
```
Will display as:
mdbook-katex
This lets us write math in the LaTeX style.
$$ \eta \beta \pi $$
becomes $$ \eta \beta \pi $$
mdbook-plantuml
```puml
@startuml Hello!
Bob->Alice : Hello!
@enduml
```
Will display as:
(a rendered PlantUML sequence diagram)
mdbook-mermaid
```mermaid
mindmap
root((Mermaid))
Can be used for graphs
Graphs
Diagrams
Flowcharts
Just
Another
Tool
Brick
This
```
Will display as:
(a rendered Mermaid mind map of the source above)
Configuration
There are two main ways the system is configured. The first is at build time, when the selected aliases change how the system will start and which components are enabled. The second is at startup, through command-line arguments, environment variables, and configuration files.

More on how to build the taxonomy for specific configurations can be found in the At Build section and (with more focus on the build system) in the Building chapter.

The available configurations for the software are described in the At Start section.
At Build
Configuration at build time is achieved by enabling aliases that select one or more extra source directories (by convention stored under `env/<alias name>/clj/`).

These aliases are enabled like any other alias: pass `-M` followed by a colon-separated list of alias names. So, to enable the `test` sources and the `kaocha` functionality, one would supply the flag `-M:test:kaocha`.

The aliases are defined in `deps.edn`.
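To make the convention concrete, here is a small shell sketch that prints the extra source root each of a few aliases selects. The alias names come from the directory table in the Project Layout chapter; the loop itself is only an illustration of the `env/<alias name>/clj` pattern, not a project script:

```shell
# Each alias selects an extra source root under env/<alias name>/clj
for alias in dev test prod prof; do
  printf 'alias :%s -> env/%s/clj\n' "$alias" "$alias"
done
```

Running it prints one line per alias, e.g. `alias :dev -> env/dev/clj`.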
At Start
The server can be configured from the command line when it is started. For example, the following will start a server with the `dev` alias and default configuration:

```shell
clj -M:dev -m dev
```

By default, it will use the configuration found at `config.edn`:
```clojure
{:backends
 [{:allow-create true
   :cfg {:db-name "jobtech-taxonomy-db"
         :server-type :datomic-local
         :storage-dir ".taxonomy-db"
         :system "jobtech-taxonomy"}
   :id :datomic-v.dev
   :init-db {:filename "resources/taxonomy.zip", :unzip true}
   :type :datomic}]
 :jobtech-taxonomy-api
 {:auth-tokens {:111 :user, :222 :admin}, :user-ids {:222 "admin-1"}}
 :mappings {:source-directory "resources/mappings"}
 :options
 {:deployment-name "Default Develop", :jetty-threads 32, :log-level :debug, :port 8080}}
```
The server port depends on which alias is being used.
The configuration file can be explicitly set using the `--config` flag or its short form `-c`:

```shell
clj -M:dev -m dev --config config.edn
```

The server port can be explicitly set using the `--port` flag or its short form `-p`:

```shell
clj -M:dev -m dev --port 5020
```
Summary of command-line flags
The following table is a summary of the available flags.
Flag | Argument | Description |
---|---|---|
--config, -c | Configuration file | Specify which configuration file to use. |
--port, -p | Port number | Specify the server port at which the API can be reached. |
Provided configurations
Here is a list of some configuration files provided in this repository. For the actual configuration files used when the API is deployed, see the config*.edn files in the repository jobtech-taxonomy-api-infra.
Path | Description |
---|---|
config.edn | Configuration with sane defaults suitable for development. |
env/dev/resources/config.dev.edn | Default configuration used with the dev alias. |
env/prod/resources/config.edn | Default configuration used for production. |
config.container.edn | Configuration suitable for containers. |
test/resources/config/config.edn | Configuration used in some tests. |
test/resources/config/config.datomic.edn | Configuration used for testing Datomic. |
test/resources/config/config.datahike.edn | Configuration used for testing Datahike. |
test/resources/config/config-local-storage.edn | Configuration used for testing local storage. |
Configuration file structure
The configuration file contains an EDN-encoded data structure with the following keys:
Key | Required? | Description |
---|---|---|
:backends | Yes | A vector of configurations for database backends. Each configuration has an id. |
:database-backend | No | Key referring to the id of a backend under :backends. |
:compare-backends | No | List of backends to compare. |
:options | | Options for the server, such as port number. |
:jobtech-taxonomy-api | | Settings regarding the taxonomy. |
At the `:backends` key, a list of configurations for database backends is given, each with a unique id at its `:id` key. Under normal conditions there will only be one backend configuration, in which case there is no need to provide values for the keys `:database-backend` and `:compare-backends`.

However, for the sake of comparing the correctness of different database backends, it is possible to list several backends at the `:backends` key. If the `:database-backend` key has no value, the first backend in the list is used by default. The `:compare-backends` key can be set to a vector of backend ids for backends to be compared. It is also possible to specify which of the backends should be used by the API at the `:database-backend` key.
Here is an example of what a configuration can look like:
```clojure
{:id :prod
 :options {:port 3000
           :log-level :info}
 :wanderung-source :nippy
 :wanderung {:nippy {:wanderung/type :nippy :filename "./test/data/taxonomy.nippy"}}
 :database-backend :datahike
 :compare-backends [:datomic :datahike]
 :backends [{:id :datahike
             :type :datahike
             :threads 12
             :cfg {:store {:backend :mem}
                   :attribute-refs? true
                   :name "from-nippy"}}
            {:id :datomic
             :type :datomic
             :threads 12
             :cfg {:db-name "jobtech-taxonomy-prod-2023-02-07-11-19-36"
                   :server-type :ion
                   :region "eu-central-1"
                   :system "tax-prod-v4"
                   :endpoint "http://entry.tax-prod-v4.eu-central-1.datomic.net:8182/"}}]}
```
Under the `:backends` key, we see the backends with ids `:datahike` and `:datomic`. Of these two, we specify at the `:database-backend` key that `:datahike` should be used, and at the `:compare-backends` key that the results of the two backends should be compared.
Specific Database Backends
This section describes some specific database backends.
Datomic local file based - EBS
To utilize Datomic's local file-based backend for persistent storage, setting up the database storage is necessary. The most straightforward approach currently involves mounting a volume to the pod, with the underlying file system being Elastic Block Store (EBS) in AWS. For detailed configuration steps, refer to the infra repository's kustomize overlays. Below is a brief guide on setting up EBS in AWS.
To start, log in to your AWS account and navigate to the EC2 dashboard. From there, locate the "Elastic Block Store" section and click on "Volumes."
Within the "Volumes" section, locate the "Create Volume" button in the top right corner and click on it to proceed with creating a new volume.
Select the "gp2" volume type and specify the desired size (ensure it matches what you've set in the Persistent Volume (PV) and Persistent Volume Claim (PVC) configurations in the infra repository). Once the volume is created, you can copy the volume ID and paste it into your kustomize configuration in the infra repository.
Building
For the sake of deploying the taxonomy API to cloud infrastructure, it is recommended to build an uberjar that contains all the dependencies needed for running the taxonomy. The following command will build an uberjar:

```shell
clj -T:build uber
```

It is possible to build a Docker image with the taxonomy API. The Docker image build process is specified in the Dockerfile at the root of this repository. The Dockerfile invokes the command `RUN clojure -T:build uber` to build an uberjar used inside the Docker image. Building an uberjar ensures that all dependencies are resolved when the Docker image is built instead of when it is executed.

The Docker image can be built using `podman` (or `docker`) with the following command invoked from the root of this repository, where we tag the image `my-taxonomy-image`:

```shell
podman build -t my-taxonomy-image .
```
Build Requirements
The build requirements tend to vary a bit depending on what technologies are currently being used and features under development. The more stable ones are listed here and new ones should be added or removed as they appear and disappear.
Clojure
Install the latest version of Clojure for your platform. Apart from Clojure's Java dependency, this should be all that is required for a [first launch].
Testing the Taxonomy
To test the Taxonomy, we use the Kaocha test runner. There are two types of tests. Unit tests test isolated parts of the code, should run very fast for quick feedback during development, and are located under `test/clj/unit`. Integration tests test various aspects of the system as a whole and are located under `test/clj/integration`. The integration test setup creates a temporary database for each test, which makes it safe to do any modifications without leaving traces behind. Test resource files can be found under `test/resources`. There are also some old tests, under the directory `base`, that have not yet been split into either `integration` or `unit`.
```shell
clj -M:test:kaocha
```

will run all tests on the default database backend (Datomic) in memory, excluding Datahike-specific tests.
It is possible to run the test suite with a specific database backend, selected using the environment variable `DATABASE_BACKEND`. The unit tests use the configuration file `test/resources/config/config.edn`, which lists the backends with ids `:datomic-v.kaocha` and `:datahike-v.kaocha`. By setting the environment variable to one of these keys before invoking the command above, the tests will be executed with that backend. In Bash, the environment variable can be set on the same line as the command:

```shell
DATABASE_BACKEND=:datomic-v.kaocha clj -M:test:kaocha
```
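The `VAR=value` prefix is standard POSIX shell behaviour: the variable is exported only into the environment of that single command, not into the surrounding shell. A minimal demonstration of the mechanism, using `sh` and `echo` instead of `clj` so it can be run anywhere:

```shell
# The prefix exports DATABASE_BACKEND only for the one command that follows
DATABASE_BACKEND=:datomic-v.kaocha sh -c 'echo "backend=$DATABASE_BACKEND"'
# afterwards the variable is not set in the surrounding shell
echo "after: ${DATABASE_BACKEND:-unset}"
```

This prints `backend=:datomic-v.kaocha` followed by `after: unset`.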
Useful Kaocha Commands
Kaocha provides several flags that can help tailor your testing process. Here are a few important ones:
--reporter

By default, Kaocha doesn't show much detail about the tests being run. To get more information, you can use the `--reporter` option. A good choice is the `documentation` reporter, which shows detailed output for each test.

- Purpose: Controls how test results are displayed.
- Example:

```shell
clj -M:test:kaocha --reporter kaocha.report/documentation
```

This command provides detailed output about each test run.
--focus

- Purpose: Runs only a specific test or group of tests by name.
- Example:

```shell
clj -M:test:kaocha --focus my.test.namespace/specific-test
```

--focus-meta

- Purpose: Runs tests that have specific metadata tags.
- Example:

```shell
clj -M:test:kaocha --focus-meta :integration
```
--seed

- Purpose: Sets the seed for random test order to make test runs reproducible.
- Example:

```shell
clj -M:test:kaocha --seed 12345
```

This ensures that tests run in the same order every time, which is useful for debugging.
--watch

- Purpose: Automatically re-runs tests when files change.
- Example:

```shell
clj -M:test:kaocha --watch
```

This keeps Kaocha running in the background, watching for file changes and rerunning tests as needed.
--fail-fast

- Purpose: Stops the test suite as soon as a failure is encountered.
- Example:

```shell
clj -M:test:kaocha --fail-fast
```

This is useful when you want to address the first error before running the rest of the tests.
You can combine different flags and commands to suit your needs. For a full list of reporters, see the Kaocha documentation.
Running Specific Tests
Kaocha lets you focus on specific tests with these options:
- `--focus`: Run a specific test by its name. Command:

  ```shell
  clj -M:test:kaocha --focus my.test.namespace/specific-test
  ```

- `--focus-meta`: Run tests that have specific metadata tags. Command:

  ```shell
  clj -M:test:kaocha --focus-meta :integration
  ```
Testing profiles
Select a test profile by providing the `--profile <keyword>` flag, for example `clojure -M:kaocha --profile ci`. Config files for the different profiles can be found under `env/kaocha/resources`.
Profile | Purpose | Test Paths | Plugins | Reporter | Output Options |
---|---|---|---|---|---|
ci | For continuous integration. Runs a comprehensive set of tests with detailed reporting. | test/clj/base test/clj/unit test/clj/integration test/resources | :kaocha.plugin/cloverage :kaocha.plugin/profiling :kaocha.plugin/gc-profiling :kaocha.plugin/junit-xml | kaocha.report/documentation | JUnit XML report (target/junit.xml )LCOV coverage report |
dev | For development environment. Runs different sets of tests with minimal output for quick feedback. | test/clj/unit test/clj/integration test/clj/base test/clj/utils test/resources | None | kaocha.report/dots | Simple console output |
full | For full test suite execution. Provides detailed coverage and profiling data. | test/clj/unit test/clj/integration test/clj/base test/clj/utils test/resources | :kaocha.plugin/cloverage :kaocha.plugin/profiling :kaocha.plugin/gc-profiling :kaocha.plugin/junit-xml | kaocha.report/documentation | JUnit XML report (target/junit.xml )LCOV and HTML coverage reports |
Running code coverage
Code coverage is enabled by default in the `:ci` and `:full` profiles. This means that we get a coverage report when the CI pipeline runs. See --profile for more information.

The coverage report is generated by `cloverage`, managed by the `kaocha-cloverage` plugin. The report is written to the `target/coverage` directory.
To run with coverage enabled from the command line:

```shell
clj -M:test:kaocha --plugin cloverage
```
Testing in the REPL
Start the REPL with the test alias enabled, then run the tests:

```clojure
user=> (use 'kaocha.repl)
user=> (run 'jobtech-taxonomy-api.test.graphql-test/graphql-test-1)
```
How to write an integration test
File and namespace
The tests and test resources reside in the `test` directory.

The test files are separated into two categories: `unit` and `integration`. The `unit` tests are for testing functions directly, while the `integration` tests are for testing calls through the API.

Test files are stored in `test/clj/unit` or `test/clj/integration` depending on what kind of test they contain. From that root they mirror the namespace they are testing. For example, the namespace `jobtech-taxonomy.api.routes.services` would have its unit tests in `test/clj/unit/jobtech-taxonomy/api/routes/services_test.clj` and its integration tests in `test/clj/integration/jobtech-taxonomy/api/routes/services_test.clj`. Sometimes, when a module is large, it can be tested in multiple files, for example `test/clj/integration/jobtech-taxonomy/api/routes/services_test.clj` and `test/clj/unit/jobtech-taxonomy/api/routes/services_graphql_test.clj`. This is generally an indication that the module should be split up.
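The namespace-to-path mapping described above can be sketched in shell: the dots in the namespace become directory separators, and the file name gets a `_test.clj` suffix under the chosen test root. This is only an illustration mirroring the example paths in this section, not a project script (note that Clojure source file names normally also munge dashes to underscores; the paths here follow the examples given above):

```shell
ns="jobtech-taxonomy.api.routes.services"
# dots become directory separators; the file name gets a _test.clj suffix
echo "test/clj/unit/$(echo "$ns" | tr '.' '/')_test.clj"
```

This prints `test/clj/unit/jobtech-taxonomy/api/routes/services_test.clj`, matching the unit-test path from the example.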
You need to require `[jobtech-taxonomy-api.test.test-utils :as util]`.

Define fixtures

Place one occurrence of this line in your test file:

```clojure
(test/use-fixtures :each util/fixture)
```
Define a test which calls functions directly
Here is a simple example of a test which asserts a skill concept and then checks for its existence.

First, require `[jobtech-taxonomy-api.db.concept :as c]`. Then write a test:

```clojure
(test/deftest ^:concept-test-0 concept-test-0
  (test/testing "Test concept assertion."
    (c/assert-concept "skill" "cykla" "cykla")
    (let [found-concept (first (core/find-concept-by-preferred-term "cykla"))]
      (test/is (= "cykla" (get found-concept :preferred-label))))))
```
Integration API tests
The file `test/clj/integration/jobtech_taxonomy_api/api_test.clj` contains tests of the API, derived mainly from OpenSearch logs. The tests can run in the following ways:

- Against a local server launched by the fixture.
- Against a mock app launched by the fixture.
- Against a remote server.

The namespace of that file is tagged with `^:real-server-test`. That means it will be detected by the function `utils.api-test-helpers/run-real-server-tests-with-host`, which is called with the host address of a real server to test against, for example:

```clojure
(run-real-server-tests-with-host "https://taxonomy.api.jobtechdev.se")
```
The purpose of the tests in this namespace is:
- To test the correctness of the taxonomy before deployment
- To test the correctness of the taxonomy after deployment
- To have a set of tests to measure the performance.
NOTE: We cannot always be sure exactly which taxonomy we test. When testing a remote server, we assume that the version running there is no older than the version deployed on any production server at the time these tests were written. If a test breaks against a remote server because the taxonomy deployed there is too new, these tests must be updated. Such errors would usually be caught during development, since these tests are part of the test suite.
The data used in the tests is stored under test/resources/sample_requests
in a folder hierarchy that reflects the hierarchy of the HTTP paths.
For some of the tests, the response is expected to be constant apart from variation in how elements in lists are ordered. But there are also tests where this may not be true. Those tests are:
- API requests where we limit the result size using some limit parameter. Ideally the ordering would be stable, but the API makes no such guarantee.
- API requests where we don't ask for data from the latest version. When no version parameter is present in the request, the default version is typically the latest version.
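As an illustration of these two request classes, the sketch below only builds and prints the request URLs; the parameter values are made up, and the concepts path is the one used elsewhere in this guide.

```shell
# Sketch: the two non-deterministic request classes, as URLs.
base='http://localhost:3000/v1/taxonomy/main/concepts'
echo "${base}?preferred-label=Danska&limit=5"   # result size limited: ordering not guaranteed
echo "${base}?preferred-label=Danska&version=1" # explicit version pin instead of the default latest
# Fetch with: curl -s -H 'Accept: application/json' "<url>"
```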
Running the Taxonomy
After having cloned the repository and installed the [required software] there are several ways to start the Taxonomy API: from the command line, from an IDE, or from a REPL. Starting from the command line uses one of several deps.edn aliases that modify how the taxonomy starts.
Starting the Taxonomy
The quickest way to get an interactive server is to just run:
clj -M:dev -m dev
This will start the API using the default configuration in config.edn.
The path to the configuration file can be set using either the environment variable TAXONOMY_CONFIG_PATH or the CLI flag --config <path> (short form -c <path>). Similarly, the port numbers for the REPL and the server can be set: for the REPL using TAXONOMY_REPL, -r NNNN, or --repl NNNN, and for the server using TAXONOMY_PORT, -p NNNN, or --port NNNN.
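Assuming a POSIX shell, the environment-variable and CLI-flag styles can be combined as sketched below. The config path is the dev default mentioned in this guide; the port values are examples only.

```shell
# Pick a config and ports via environment variables, with example defaults.
export TAXONOMY_CONFIG_PATH="${TAXONOMY_CONFIG_PATH:-env/dev/resources/config.edn}"
export TAXONOMY_PORT="${TAXONOMY_PORT:-3000}"   # server port (example value)
export TAXONOMY_REPL="${TAXONOMY_REPL:-7000}"   # REPL port (example value)
echo "config=$TAXONOMY_CONFIG_PATH port=$TAXONOMY_PORT repl=$TAXONOMY_REPL"
# Equivalent CLI flags, run inside the repo clone:
#   clj -M:dev -m dev --config "$TAXONOMY_CONFIG_PATH" --port "$TAXONOMY_PORT" --repl "$TAXONOMY_REPL"
```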
Next step
Then open the following URL in a web browser:
http://localhost:3000/v1/taxonomy/swagger-ui/index.html
Authorize
Authorization is only needed to read unpublished taxonomy versions and for DB write access.
To get write access, click the Authorize button and enter your account code, defined in env/dev/resources/config.edn.
Running a query
curl -X GET --header 'Accept: application/json' 'http://localhost:3000/v1/taxonomy/main/concepts?preferred-label=Danska'
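If jq is not installed, the response can be pretty-printed with Python's standard json.tool. The inline JSON below is a made-up stand-in for a real response, used only so the filter can be demonstrated.

```shell
# Pretty-print an API response without extra tooling:
#   curl -s -H 'Accept: application/json' \
#     'http://localhost:3000/v1/taxonomy/main/concepts?preferred-label=Danska' \
#     | python3 -m json.tool
# The same filter works on any JSON (illustrative payload, not real API output):
echo '[{"taxonomy/preferred-label": "Danska"}]' | python3 -m json.tool
```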
Deploying the Taxonomy
The Taxonomy is deployed using Docker/Podman and Kubernetes. There are specific instructions about how to deploy the Taxonomy to the JobTech infrastructure.
At JobTech, we use Podman instead of Docker for building containers. See specific instructions about Podman.
JobTech infrastructure
How to deploy the JobTech Taxonomy API in the various deployment environments. To be eligible for deployment to production a version needs to first have been deployed to testing.
Environments
The taxonomy is deployed to five different environments: testing and production, each in read and write variants, plus an onprem (internal to AF) environment.
jobtech-taxonomy-api-testing-read
jobtech-taxonomy-api-testing-write
jobtech-taxonomy-api-prod-read
jobtech-taxonomy-api-prod-write
onprem
The GitLab CI pipeline automatically deploys jobtech-taxonomy-api-testing-read.
Testing
- https://openshift-gitops-server-openshift-gitops.apps.testing.services.jtech.se
- https://console-openshift-console.apps.testing.services.jtech.se
Production
- https://openshift-gitops-server-openshift-gitops.prod.services.jtech.se
- https://console-openshift-console.prod.services.jtech.se/dashboards
Traffic Timing
Prefer deploying to production early in the week, after 10:00 but before lunch if possible; Tuesday at 10:00 is probably the best time. This window sees less internal AF traffic and leaves a better chance of rolling back problematic deploys.
Deployment merge requests
When deploying, it is good practice to create an MR against the infra repository and have it approved. There are cases where it is acceptable to commit directly to the main branch: for example, when multiple people are working together on a deployment, when one is working alone and the deploy is an important bugfix or security fix, or if one really wants to.
Deployment to prod-read
- Select the latest commit SHA from https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/jobtech-taxonomy-api.
- Verify that it is the same SHA as the one found under newTag in https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/jobtech-taxonomy-api-infra/-/blob/main/kustomize/overlays/jobtech-taxonomy-api-testing-read/kustomization.yaml?ref_type=heads.
- Verify that the testing read environment has updated and is functioning correctly.
- Verify that the configuration for the taxonomy API is the same in testing-read and prod-read.
- Edit newTag in https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/jobtech-taxonomy-api-infra/-/blob/main/kustomize/overlays/jobtech-taxonomy-api-prod-read/kustomization.yaml?ref_type=heads to match the testing-read version.
- Await approval if necessary.
- Verify that the deployment went well in https://openshift-gitops-server-openshift-gitops.prod.services.jtech.se and https://console-openshift-console.prod.services.jtech.se/dashboards.
- Perform a staggered restart of the varnish cache system. This restart is currently not automatic, so some changes will not reach users until the cache is restarted.
- Verify that the https://taxonomy.api.jobtechdev.se/v1/taxonomy/status/build call reports the correct commit SHA.
Deployment to prod-write
- MAKE SURE THAT REDAKTIONEN KNOWS THIS IS HAPPENING!
- Select the latest commit SHA from https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/jobtech-taxonomy-api.
- Verify that it is the same SHA as the one found under newTag in https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/jobtech-taxonomy-api-infra/-/blob/main/kustomize/overlays/jobtech-taxonomy-api-testing-write/kustomization.yaml?ref_type=heads.
- Verify that the testing write environment has updated and is functioning correctly.
- Verify that the configuration for the taxonomy API is the same in testing-write and prod-write.
- Edit newTag in https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/jobtech-taxonomy-api-infra/-/blob/main/kustomize/overlays/jobtech-taxonomy-api-prod-write/kustomization.yaml?ref_type=heads to match the testing-write version.
- Await approval if necessary.
- Verify that the deployment went well in https://openshift-gitops-server-openshift-gitops.prod.services.jtech.se and https://console-openshift-console.prod.services.jtech.se/dashboards.
- Perform a staggered restart of the varnish cache system. This restart is currently not automatic, so some changes will not reach users until the cache is restarted.
- Verify that the https://api-jobtech-taxonomy-api-prod-write.prod.services.jtech.se/v1/taxonomy/status/build call reports the correct commit SHA.
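Both checklists end by comparing the deployed build SHA against the commit that was deployed. A minimal sketch of that comparison is shown below; since the exact shape of the status/build response is not documented here, the curl extraction is left as a comment and the SHA values are placeholders.

```shell
# Sketch: compare an expected commit SHA with the one a deployment reports.
check_sha() {
  if [ "$1" = "$2" ]; then
    echo "deploy verified: $1"
  else
    echo "SHA mismatch: expected $1, got $2" >&2
    return 1
  fi
}
# In practice the second argument would come from the endpoint, e.g.:
#   deployed=$(curl -s https://taxonomy.api.jobtechdev.se/v1/taxonomy/status/build)
check_sha "abc1234" "abc1234"
```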
Getting up and running using Podman
Ensure that the files aws-secrets-test.txt and api-secrets-test.txt are present.
podman build -t jt-api .
podman run --network="host" --name api --env-file=../jobtech-taxonomy-api-gitops/jobtech-taxonomy-api-deploy-secrets/test/aws-secrets-test.txt --env-file=../jobtech-taxonomy-api-gitops/jobtech-taxonomy-api-deploy-secrets/test/api-secrets-test.txt --env-file=dev.env --rm jt-api
Building and running
Build the image:
podman build . -t api
To run the API locally using podman with Datomic as the backend, run:
podman run -p 3000:3000 --name api --rm api
To run the API locally using podman with Datahike as the backend, with the database in the folder datahike-file-db, run:
podman run -v $PWD/datahike-file-db:/datahike-file-db:Z -e DATABASE_BACKEND=datahike -p 3000:3000 --name api --rm api
To stop the API, run in another terminal:
podman container stop api
Monitoring
Please find the logs of deployed instances of the Taxonomy at one of the following addresses:
Technologies and libraries
This chapter contains some information regarding the larger components used to build the taxonomy.
The database and the logging systems.
Database
Database backends
The application uses Datomic flavoured datalog as its database query language. The actual database backend can be configured to be any Datomic compatible database, such as Datomic or Datahike.
The Datomic model differs from the traditional SQL model. Data is stored as a set of facts called datoms. A datom records an entity, an attribute, and a value, together with the transaction and a boolean indicating whether the fact is asserted or retracted. A transacted datom is never changed, but the current view of a value can be updated by later transactions. This means that the database is append-only and the history of the database is kept, so it is always possible to review the changes that led to the current state of the database.
Logging
By default, logging functionality is provided by the timbre library. It is used together with the slf4j-timbre and slf4j-api helpers, which manage logging from components that use slf4j.
Any Clojure data structures can be logged directly.
Examples
(ns example
(:require [taoensso.timbre :as log]))
(log/info "Hello")
=>[2015-12-24 09:04:25,711][INFO][myapp.handler] Hello
(log/debug {:user {:id "Anonymous"}})
=>[2015-12-24 09:04:25,711][DEBUG][myapp.handler] {:user {:id "Anonymous"}}
Configuring logging
Each profile has its own log configuration, found under env/*/resources/taoensso.timbre.config.edn. For example, this is what the dev logging config looks like:
{:min-level [[#{"io.methvin.watcher.*"} :info]
             [#{"datahike.*"} :info]
             [#{"org.eclipse.jetty.*"} :warn]
             [#{"jobtech-taxonomy-api.db.database-connection"} :warn]
             [#{"jobtech-taxonomy.features.mapping.*"} :info]
             [#{"*"} :debug]]}
There is also a common configuration, not specific to any profile, found at resources/taoensso.timbre.config.edn:
{:min-level [[#{"*"} :info]]}
Logging of exceptions
(ns example
(:require [taoensso.timbre :as log]))
(log/error (Exception. "I'm an error") "something bad happened")
=>[2015-12-24 09:43:47,193][ERROR][myapp.handler] something bad happened
java.lang.Exception: I'm an error
at myapp.handler$init.invoke(handler.clj:21)
at myapp.core$start_http_server.invoke(core.clj:44)
at myapp.core$start_app.invoke(core.clj:61)
...
Local development
When developing new functionality for jobtech-taxonomy-api, you want to use the read version of the taxonomy database, which is contained in the tax-api source code repo under ~/resources/taxonomy.zip. It is used in the default configuration, so you just need to start the application and it will work. You can run the taxonomy API either from your terminal or from a REPL, using the clj command described earlier.
Editor specific instructions
These are some specific instructions for various editors to help you integrate it with this project and get a development REPL up and running in your editor.
Editor | Instructions |
---|---|
Emacs | The file .dir-locals.el configures the REPL for development. Just call M-x cider-jack-in to run the REPL. |
Vim | ... |
VSCode, Calva | ... |
Adding auto code formatting as a pre-commit hook
Run this command to configure git to look for hooks in .githooks/:
git config --local core.hooksPath .githooks/
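As a sketch of what such a hook could contain (the cljfmt alias is an assumption, not something this guide defines; substitute whatever formatter the project actually standardizes on):

```shell
# Install a hypothetical formatting pre-commit hook under .githooks/.
mkdir -p .githooks
cat > .githooks/pre-commit <<'EOF'
#!/bin/sh
# Format staged Clojure sources, then re-stage them.
# The :cljfmt alias is an assumption; adjust to this project's tooling.
clojure -M:cljfmt fix && git add -u
EOF
chmod +x .githooks/pre-commit
# Point git at the hook directory (the command shown above); safe to re-run.
git config --local core.hooksPath .githooks/ 2>/dev/null || true
echo "hook installed: $(test -x .githooks/pre-commit && echo yes || echo no)"
```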
COMMON ERRORS
If you get :server-type must be :cloud, :peer-server, or :local you have forgotten to start the taxonomy API. Run (start) in the user> namespace.
References
Using the REPL
The common way of launching a REPL is from within an IDE. However, you can also launch a REPL directly from the terminal using the following command:
clj -M:dev
The advantage of running a REPL from the IDE is typically that you can send code from the IDE directly to the REPL to be evaluated.
Once started, it will look like this:
Clojure 1.11.3
user=>
To launch the server with the default configuration, do
user=> (start)
which will start a server and once started will display something like this:
2024-05-03T08:31:39.402Z jonas INFO [jobtech-taxonomy.api.core:33] - Started http://jonas:8080
{:started ["#'dev/http-server"]}
The user namespace is initialised from env/dev/clj/user. In particular, the comment clauses towards the end describe how the functionality in the namespace is intended to be used.
(ns user
  {:clj-kondo/config '{:linters {:clojure-lsp/unused-public-var {:level :off}}}}
  (:require [clj-http.client :as client]
            [clojure.data.json :as cdj]
            [clojure.pprint :refer [pprint]]
            [clojure.spec.alpha :as s]
            [dev]
            [expound.alpha :as expound]
            [jobtech-taxonomy.api.config :refer [get-config transduce-backends]]
            [jobtech-taxonomy.api.core]
            [mount.core :as mount]
            [taoensso.timbre :as log]))
(alter-var-root #'s/*explain-out* (constantly expound/printer))

(defn add-taps []
  (add-tap (bound-fn* pprint)))

(def dev-cfg (atom (get-config [])))

(defn show-configs []
  (let [cfg @dev-cfg]
    {:available (mapv :id (:backends cfg))
     :active (:database-backend cfg)
     :multi (:compare-backends cfg)}))

(defn activate-config [cfg-id & cfg-ids]
  (swap! dev-cfg
         #(-> %
              (assoc :database-backend cfg-id)
              (dissoc :compare-backends)
              (merge (if cfg-ids {:compare-backends (into [cfg-id] cfg-ids)} {}))))
  (show-configs))
(comment
  (load-config)
  (show-configs)
  (activate-config ':datahike-v.dev)
  @dev-cfg
  'bye)

(def api-root
  (str "http://localhost"
       (if-let [port (get-in @dev-cfg [:options :port])]
         (str ":" port "/")
         "/")))
(defn ^:export ppt []
  (pprint *1))

(defn ^:export get-raw-body [path]
  (-> (str api-root path)
      client/get
      :body))

(defn get-json-body [path]
  (-> (get-raw-body path)
      (cdj/read-str :key-fn keyword)))

(defn ^:export api-json []
  (get-json-body "v1/taxonomy/swagger.json"))

(defn ^:export tax-get [path]
  (get-json-body path))

(defn ^:export load-config
  ([] (reset! dev-cfg (get-config [])))
  ([cfg] (reset! dev-cfg cfg)))
(defn start []
  (reset! @#'mount/-args @dev-cfg)
  (mount/start-without #'dev/repl-server))

(defn stop []
  (mount/stop-except #'dev/repl-server))

(defn restart []
  (stop)
  (start))

(defn query-url [& re-parts]
  (->> (api-json)
       :paths
       (keep (fn [[k m]]
               (let [url (name k)]
                 (when (every? #(re-find % url) re-parts)
                   {:url (name k) :methods (keys m) :info m}))))
       vec))

(defn reduce-param [params]
  (->> (group-by :in params)
       (map (fn [[n g]]
              [n (mapv #(let [dissoc-falsy ((remove %) dissoc)]
                          (-> (dissoc % :in)
                              (dissoc-falsy :required)
                              (dissoc-falsy :deprecated))) g)]))))

(defn get-query [api-entry method]
  (when-let [method-info (get-in api-entry [:info method])]
    {:url (:url api-entry)
     :method method
     :summary (:summary method-info)
     :parameters (reduce-param (:parameters method-info))}))
(comment
  ;; This example demonstrates how you can start a short-lived server
  ;; to try things out. In this particular case, we attempt to reproduce
  ;; the datomic timeout reported in
  ;;
  ;; https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/jobtech-taxonomy-api/-/issues/652
  ;;
  #_{:clj-kondo/ignore [:unresolved-symbol]}
  (dev/with-http-server
    [{:keys [url]} (->> []
                        get-config
                        (transduce-backends (filter (comp #{:datomic} :type))))]
    (client/get
     (str url "/v1/taxonomy/suggesters/autocomplete?query-string=mjaoelimjaoelimjao&related-ids=i46j_HmG_v64&relation=narrower")
     {:accept :json})))
(comment
  ;; Show what will be logged and how.
  (pprint log/*config*)
  ;; Possibly add taps.
  (add-taps)
  ;; Start by loading a config, any config, for the mount-system.
  ;; This is optional, but allows any config to be loaded.
  (load-config (get-config []))
  ;; Proceed by starting the mountable things, except for the REPL, since you are in it.
  (start)
  ;; Let's look at the Swagger API!
  ;; query-url takes some reg-exps that have to match the url
  (->> (query-url #"relation" #"changes" #"main")
       (keep #(get-query % :get))
       pprint)
  ;; (log/set-min-level! :debug)
  ;; Change the config. This particular one will break the system.
  (reset! @#'mount/-args {:options {:port 3000}})
  ;; Just to clear the air a bit.
  (restart)
  ;; Stop the system!
  (stop)
  'bye!)
IDE integration
Clojure development is usually easiest with a REPL running inside an IDE.
VSCode and Calva
Install VS Code and the Calva plugin.
Then follow the instructions in Local development to configure your system for local development.
Emacs and CIDER
Platforms
The Taxonomy can be run on Windows, Linux, or macOS. On Windows, run the Taxonomy API using WSL 2. After having installed WSL 2 with a Linux distro, you can follow the Linux instructions below on how to set up the environment. VS Code and Calva are likely to be a good choice for development.
To run the Taxonomy API under Linux, you will need:
- Some JDK (Java Development Kit)
- Clojure CLI, which also requires bash, curl, and rlwrap.
The Taxonomy API is Java-based and requires a JDK in order to be built and run. Make sure that you have a JDK on your system. OpenJDK is a good default. Read about how it can be installed on this page: https://openjdk.org/install/. However, the official Clojure page recommends Adoptium which might also be a good choice.
To install Clojure, follow the official guide: https://clojure.org/guides/install_clojure
For the Graphviz endpoint to work locally, you'll need to have Graphviz installed locally. See https://graphviz.org/download/#linux for installation instructions under Linux.
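A quick, non-authoritative way to check that the prerequisites above (plus Graphviz's dot binary) are on your PATH:

```shell
# Check for the tools this guide asks for; 'dot' is the Graphviz binary.
for tool in java bash curl rlwrap clojure dot; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: ok"
  else
    echo "$tool: MISSING"
  fi
done
```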