Getting Started with Docupedia Autopilot#
Introduction#
The Docupedia Fetcher Autopilot is a tool that downloads the content of a Docupedia page along with its metadata. The fetcher stores three types of content in the evidence folder:
a simple HTML file,
a metadata file and
a styled HTML version along with all its assets (images, CSS styles, fonts, JS, etc.).
Preparation#
Docupedia Access Prerequisites#
In order for the Autopilot to work properly, a Personal Access Token (PAT) with access to the requested Docupedia page is needed.
To obtain the PAT, click your Docupedia profile icon in the top-right corner of the page, select Settings and click on Personal Access Tokens. Alternatively, go directly to your Docupedia URL, e.g. https://<your-docupedia-site>/confluence/plugins/personalaccesstokens/usertokens.action, to generate your PAT.
Warning
The default value for DOCUPEDIA_URL points to a Bosch server that can only be accessed from within the BCN (Bosch Corporate Network). If you are not located in the BCN, you must change this value. An appropriate value is a URL that includes the context, for example http://example.com:8080/confluence, where confluence is the context.
Limitations#
The Docupedia Fetcher is currently limited to Docupedia pages without restrictions. If the page is restricted, the fetcher cannot download its content, because the Docupedia REST API does not allow downloading the content of restricted pages.
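One way to notice a restricted or inaccessible page before running the autopilot is to probe the page over REST with your PAT. The sketch below assumes that Docupedia exposes the standard Confluence content endpoint (/rest/api/content/<id>) and accepts the PAT as a Bearer token; both are assumptions about your installation, not a description of what the fetcher does internally. The helper names content_url and page_status are hypothetical, and DOCUPEDIA_URL, DOCUPEDIA_PAGE_ID and DOCUPEDIA_PAT are the variables described in the next section.

```shell
# Build the REST URL for a page from the base URL (including context) and the page ID.
# Assumption: the standard Confluence content endpoint is available.
content_url() {
  printf '%s/rest/api/content/%s\n' "${1%/}" "$2"
}

# Print the HTTP status code returned for the page when authenticating with the PAT.
# Requires curl and network access to the Docupedia server.
page_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $DOCUPEDIA_PAT" \
    "$(content_url "$DOCUPEDIA_URL" "$DOCUPEDIA_PAGE_ID")"
}

# Usage (not executed here, because it needs a real server and token):
#   status="$(page_status)"
#   [ "$status" = 200 ] || echo "page not accessible: HTTP $status" >&2
```

A status other than 200 suggests that the page is restricted, the PAT lacks access, or the URL context is wrong.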
Adjust the environment variables#
To configure the Docupedia Fetcher, you will need to set the following environment variables:
To obtain DOCUPEDIA_PAGE_ID, go to your Docupedia page, click on the 3 dots in the top-right corner, select Page Information and look at the number following pageId= in the URL.
Set DOCUPEDIA_PAT to the PAT obtained in the step above, where you set up your Personal Access Tokens.
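As a minimal sketch of these two steps, the snippet below pulls the number following pageId= out of a Page Information URL and exports both required variables. The host, path and page ID in the example URL are made up, the helper name extract_page_id is hypothetical, and the PAT value is a placeholder; in practice the token should come from a secret store rather than from a script.

```shell
# Print the numeric value that follows pageId= in a URL.
# Prints nothing if the URL has no pageId parameter.
extract_page_id() {
  printf '%s\n' "$1" | sed -n 's/.*[?&]pageId=\([0-9][0-9]*\).*/\1/p'
}

# Example Page Information URL (hypothetical host and page ID).
page_info_url='https://example.com/confluence/pages/viewinfo.action?pageId=123456'

DOCUPEDIA_PAGE_ID="$(extract_page_id "$page_info_url")"
export DOCUPEDIA_PAGE_ID

# Placeholder only: replace with the PAT generated in your Docupedia settings.
export DOCUPEDIA_PAT='<your-personal-access-token>'

echo "DOCUPEDIA_PAGE_ID=$DOCUPEDIA_PAGE_ID"
```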
Optionally:
Set DOCUPEDIA_URL in case you need a Docupedia URL including the context. An example would be http://example.com:8080/confluence.
Set DOCUPEDIA_PAGE_DIFF_VERSIONS in case you need to obtain the diff between two versions of the Docupedia page content. This should be represented by two numbers equal to or less than 0 (zero), separated by a comma.
Set DOCUPEDIA_PAGE_DIFF_DATE_THRESHOLD in case you need to get a page version relative to a certain threshold. This should be a date and time in ISO 8601-1:2019 format.
Set OUTPUT_NAME to the desired name of the output files. The default value is docupedia_content.
Set OUTPUT_PATH to the desired output path. The default value is the current working directory.
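The optional variables can be set the same way. The sketch below also checks that a DOCUPEDIA_PAGE_DIFF_VERSIONS value consists of two numbers equal to or less than 0, separated by a comma, before exporting it. The helper name is_valid_diff_versions is hypothetical, and the check assumes that no spaces are allowed around the comma, which the description above does not state. All values are illustrative; whether the two diff-related variables are meant to be combined is not specified here, so they are shown together only to illustrate their formats.

```shell
# Succeed if the value is two integers <= 0 separated by a comma, e.g. "0,-1".
# Assumption: no spaces around the comma.
is_valid_diff_versions() {
  printf '%s\n' "$1" | grep -Eq '^(0|-[0-9]+),(0|-[0-9]+)$'
}

versions='0,-1'
if is_valid_diff_versions "$versions"; then
  export DOCUPEDIA_PAGE_DIFF_VERSIONS="$versions"
else
  echo "invalid DOCUPEDIA_PAGE_DIFF_VERSIONS: $versions" >&2
fi

# Docupedia URL including the context (example value from the text above).
export DOCUPEDIA_URL='http://example.com:8080/confluence'

# Example ISO 8601 date and time value.
export DOCUPEDIA_PAGE_DIFF_DATE_THRESHOLD='2024-01-31T12:00:00Z'

# Output file name and location (these match the documented defaults).
export OUTPUT_NAME='docupedia_content'
export OUTPUT_PATH="$PWD"
```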
For advanced use cases, you may configure:
To obtain DOCUPEDIA_SCHEME_ID, go to your Docupedia page, open the Network tab in your browser's developer tools and clear the activity. Click on the 3 dots in the top-right corner and select Export to HTML. Select the request to the export-scheme endpoint; its response contains a list of scheme IDs. Use the desired id value from the list. The default value is bundled_default if no scheme is specified.
To obtain DOCUPEDIA_EXPORTER_ID, look at the request parameters from the previous step. The value is associated with the exporterId request parameter. The default value is com.k15t.scroll.scroll-html:html-exporter.
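If you prefer to pin these two values explicitly rather than rely on the fetcher's defaults, a sketch like the following keeps any value that is already set and otherwise falls back to the defaults named above. Setting them is not required; this only makes the chosen scheme and exporter visible in your environment.

```shell
# Keep a value that is already set; otherwise use the documented default.
export DOCUPEDIA_SCHEME_ID="${DOCUPEDIA_SCHEME_ID:-bundled_default}"
export DOCUPEDIA_EXPORTER_ID="${DOCUPEDIA_EXPORTER_ID:-com.k15t.scroll.scroll-html:html-exporter}"

echo "scheme:   $DOCUPEDIA_SCHEME_ID"
echo "exporter: $DOCUPEDIA_EXPORTER_ID"
```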
Adjust the config file#
The qg-config.yaml#
Below is an example configuration file that runs the Docupedia Fetcher. The autopilot is configured in lines 7-15. Required environment variables are read from the provided run environment variables or secrets. The autopilot is then used by check 1.1 in line 30, which is part of requirement 2.6.
In this example, line 10 performs a simple check to verify that the page was downloaded successfully.
 1  metadata:
 2    version: v1
 3  header:
 4    name: MACMA
 5    version: 1.16.0
 6  autopilots:
 7    docupedia-autopilot:
 8      run: |
 9        docupedia-fetcher
10        filecheck exists "${{ env.OUTPUT_NAME }}.html"
11      env:
12        DOCUPEDIA_PAGE_ID: ${{ env.DOCUPEDIA_PAGE_ID }}
13        DOCUPEDIA_PAT: ${{ secrets.DOCUPEDIA_PAT }}
14        DOCUPEDIA_URL: ${{ env.DOCUPEDIA_URL }}
15        OUTPUT_NAME: docupedia_content
16  finalize:
17    run: |
18      html-finalizer
19  chapters:
20    "1":
21      title: Project management
22      requirements:
23        "2.6":
24          title: The requirements for information security and data protection are considered.
25          text: The data protection compliance have to be guaranteed
26          checks:
27            "1.1":
28              title: Download docupedia page content
29              automation:
30                autopilot: docupedia-autopilot
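When trying the fetcher outside of a configured run, the existence check from the configuration can be approximated in plain shell. This is a sketch under the assumption that the simple HTML file is written to OUTPUT_PATH and named after OUTPUT_NAME with an .html extension; the helper name output_exists is hypothetical and is not part of the autopilot.

```shell
# Succeed if <output path>/<output name>.html exists.
# Falls back to the documented defaults when the variables are unset.
output_exists() {
  [ -f "${OUTPUT_PATH:-.}/${OUTPUT_NAME:-docupedia_content}.html" ]
}

if output_exists; then
  echo "download OK"
else
  echo "output file missing" >&2
fi
```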