Getting Started with Docupedia Autopilot

Introduction

The Docupedia Fetcher Autopilot is a tool that downloads the content and the metadata of a Docupedia page. The fetcher stores three types of content in the evidence folder:

  • a simple HTML file,

  • a metadata file, and

  • a styled HTML version along with all its assets (images, CSS styles, fonts, JS, etc.).

Preparation

Docupedia Access Prerequisites

For the autopilot to work, you need a Personal Access Token (PAT) with access to the requested Docupedia page.

To obtain the PAT, click your Docupedia profile icon in the top-right corner of the page, select Settings, and click Personal Access Tokens. Alternatively, go directly to the token page of your Docupedia instance, e.g. https://<your-docupedia-site>/confluence/plugins/personalaccesstokens/usertokens.action, to generate your PAT quickly.

Warning

The default value of DOCUPEDIA_URL points to a Bosch server that can only be accessed from within the BCN (Bosch Corporate Network). If you are outside the BCN, you must change it to a URL that includes the context, for example http://example.com:8080/confluence, where confluence is the context.
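
As a sketch, DOCUPEDIA_URL can be set explicitly in the env block of the autopilot, using the same structure as the qg-config.yaml example in this guide. The URL is only the placeholder from the warning; replace host, port, and context with those of your own Docupedia instance.

```yaml
autopilots:
  docupedia-autopilot:
    run: |
      docupedia-fetcher
    env:
      DOCUPEDIA_PAGE_ID: ${{ env.DOCUPEDIA_PAGE_ID }}
      DOCUPEDIA_PAT: ${{ secrets.DOCUPEDIA_PAT }}
      # Placeholder URL: host and port of your instance plus the context ("confluence")
      DOCUPEDIA_URL: http://example.com:8080/confluence
```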

Limitations

The Docupedia Fetcher currently supports only Docupedia pages without restrictions. If a page is restricted, the fetcher cannot download its content, because the Docupedia REST API does not allow downloading the content of restricted pages.

Adjust the environment variables

To configure the Docupedia Fetcher, set the following required environment variables:

  1. Set DOCUPEDIA_PAGE_ID to the ID of your page. To find it, open your Docupedia page, click the three dots in the top-right corner, select Page Information, and read the number following pageId= in the URL.

  2. Set DOCUPEDIA_PAT to the PAT you generated as described in Docupedia Access Prerequisites.

Optionally, you can set the following variables:

  1. Set DOCUPEDIA_URL if you need a Docupedia instance other than the default. The value must include the context, for example http://example.com:8080/confluence.

  2. Set DOCUPEDIA_PAGE_DIFF_VERSIONS if you need the diff between two versions of the page content. The value consists of two numbers, each less than or equal to 0 (zero), separated by a comma.

  3. Set DOCUPEDIA_PAGE_DIFF_DATE_THRESHOLD if you need a page version relative to a certain point in time. The value must be a date and time in ISO 8601-1:2019 format.

  4. Set OUTPUT_NAME to the desired base name of the output files. The default is docupedia_content.

  5. Set OUTPUT_PATH to the desired output directory. The default is the current working directory.
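
As a sketch, the optional variables can be added to the env block of the autopilot, following the structure of the qg-config.yaml example in this guide. All values are illustrative assumptions: "0,-1" merely satisfies the stated format for DOCUPEDIA_PAGE_DIFF_VERSIONS, the timestamp is an arbitrary ISO 8601 date and time, and my_page and evidence are hypothetical names. Both diff variables appear together for illustration only; this guide does not state whether they can be combined. The filecheck line assumes that the fetcher writes <OUTPUT_NAME>.html directly into OUTPUT_PATH.

```yaml
autopilots:
  docupedia-autopilot:
    # Assumption: the HTML file is written directly into OUTPUT_PATH
    run: |
      docupedia-fetcher
      filecheck exists "${{ env.OUTPUT_PATH }}/${{ env.OUTPUT_NAME }}.html"
    env:
      DOCUPEDIA_PAGE_ID: ${{ env.DOCUPEDIA_PAGE_ID }}
      DOCUPEDIA_PAT: ${{ secrets.DOCUPEDIA_PAT }}
      # Two numbers <= 0, comma-separated; "0,-1" is an illustrative value
      DOCUPEDIA_PAGE_DIFF_VERSIONS: "0,-1"
      # ISO 8601 date and time (illustrative); quoted so YAML keeps it a string
      DOCUPEDIA_PAGE_DIFF_DATE_THRESHOLD: "2024-01-31T12:00:00Z"
      # Hypothetical output base name and output directory
      OUTPUT_NAME: my_page
      OUTPUT_PATH: evidence
```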

For advanced use cases, you can also configure:

  1. To obtain DOCUPEDIA_SCHEME_ID, open your Docupedia page, open the Network tab in your browser's developer tools, and clear the activity. Click the three dots in the top-right corner and select Export to HTML. Select the request to the export-scheme endpoint; its response contains a list of scheme IDs. Use the id value of the desired scheme. If no scheme is specified, the default is bundled_default.

  2. To obtain DOCUPEDIA_EXPORTER_ID, look at the request parameters of the request from the previous step. The value is that of the exporterId request parameter. The default is com.k15t.scroll.scroll-html:html-exporter.
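
As a sketch, the advanced variables go into the same env block as the others. The values below are simply the documented defaults, so this fragment should behave like leaving both variables unset; replace them with the id and exporterId values you captured in your browser's Network tab.

```yaml
autopilots:
  docupedia-autopilot:
    run: |
      docupedia-fetcher
    env:
      DOCUPEDIA_PAGE_ID: ${{ env.DOCUPEDIA_PAGE_ID }}
      DOCUPEDIA_PAT: ${{ secrets.DOCUPEDIA_PAT }}
      # id of the desired scheme from the export-scheme response (default shown)
      DOCUPEDIA_SCHEME_ID: bundled_default
      # value of the exporterId request parameter (default shown)
      DOCUPEDIA_EXPORTER_ID: "com.k15t.scroll.scroll-html:html-exporter"
```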

Adjust the config file

The qg-config.yaml

Below is an example configuration file that runs the Docupedia Fetcher. The autopilot is configured in lines 7-15. The required environment variables are read from the environment variables provided for the run or from secrets. The autopilot is then used in line 30 by check 1.1, which is part of requirement 2.6.

In this example, line 10 performs a simple check to verify that the page was downloaded successfully.

 1metadata:
 2  version: v1
 3header:
 4  name: MACMA
 5  version: 1.16.0
 6autopilots:
 7  docupedia-autopilot:
 8    run: |
 9      docupedia-fetcher
10      filecheck exists "${{ env.OUTPUT_NAME }}.html"
11    env:
12      DOCUPEDIA_PAGE_ID: ${{ env.DOCUPEDIA_PAGE_ID }}
13      DOCUPEDIA_PAT: ${{ secrets.DOCUPEDIA_PAT }}
14      DOCUPEDIA_URL: ${{ env.DOCUPEDIA_URL }}
15      OUTPUT_NAME: docupedia_content
16finalize:
17  run: |
18    html-finalizer
19chapters:
20  "1":
21    title: Project management
22    requirements:
23      "2.6":
24        title: The requirements for information security and data protection are considered.
25        text: Data protection compliance has to be guaranteed.
26        checks:
27          "1.1":
28            title: Download docupedia page content
29            automation:
30              autopilot: docupedia-autopilot