Getting Started with Docupedia Autopilot
Introduction
The Docupedia Fetcher Autopilot downloads the content of a Docupedia page along with its metadata. The fetcher stores three types of content in the evidence folder:
a simple HTML file,
a metadata file, and
a styled HTML version along with all its assets (images, CSS styles, fonts, JS, etc.).
Preparation
Docupedia Access Prerequisites
The Autopilot requires a Personal Access Token (PAT) with access to the requested Docupedia page.
To obtain the PAT, click your Docupedia profile icon in the top-right corner of the page, select Settings, and click Personal Access Tokens. Alternatively, go directly to your Docupedia URL, e.g. https://<your-docupedia-site>/confluence/plugins/personalaccesstokens/usertokens.action, to quickly generate your PAT.
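As a rough illustration of how a PAT is typically presented to a Confluence-based server, the sketch below builds (but does not send) an HTTP request for a single page. It is not taken from the fetcher itself: the `/rest/api/content/<id>` path and the `Bearer` authorization scheme are assumptions based on the standard Confluence REST API, and `build_page_request`, the page ID, and the token value are hypothetical.

```python
import urllib.request


def build_page_request(base_url: str, page_id: str, pat: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one page."""
    # Assumption: the standard Confluence content endpoint lives under the context path.
    url = f"{base_url.rstrip('/')}/rest/api/content/{page_id}"
    # Assumption: the PAT is sent as a Bearer token in the Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {pat}"})


# Placeholder values; replace them with your own URL, page ID, and token.
request = build_page_request("http://example.com:8080/confluence", "12345", "my-token")
print(request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` from inside the network and receiving a successful response would suggest that the token can read the page, but treat this only as a manual sanity check.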
Warning
The default value for DOCUPEDIA_URL points to a Bosch server that can only be accessed from within the BCN (Bosch Corporate Network). If you are not located in the BCN, you must change this value. An appropriate value is a URL that includes the context, for example http://example.com:8080/confluence, where confluence is the context.
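To make the idea of a "URL including the context" concrete, here is a minimal sketch that extracts the context from a candidate DOCUPEDIA_URL value. The helper name `context_of` is hypothetical, and treating the first path segment as the context is an assumption drawn from the example above.

```python
from urllib.parse import urlsplit


def context_of(docupedia_url: str) -> str:
    """Return the context (first path segment) of a base URL, or '' if there is none."""
    parts = urlsplit(docupedia_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {docupedia_url!r}")
    # Drop empty segments caused by leading or trailing slashes.
    segments = [segment for segment in parts.path.split("/") if segment]
    return segments[0] if segments else ""


print(context_of("http://example.com:8080/confluence"))  # prints: confluence
```

An empty result means the URL has no context and is probably not a suitable value.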
Limitations
The Docupedia Fetcher is currently limited to Docupedia pages without restrictions. If the page is restricted, the fetcher cannot download its content, because the Docupedia REST API does not allow downloading the content of restricted pages.
Adjust the environment variables
To configure the Docupedia Fetcher, set the following environment variables:
DOCUPEDIA_PAGE_ID: To obtain it, go to your Docupedia page, click the three dots in the top-right corner, select Page Information, and look at the number following pageId= in the URL.
DOCUPEDIA_PAT: Set this to the PAT you generated in the Docupedia Access Prerequisites step above.
Optionally, you can also set:
DOCUPEDIA_URL: Set this if you need a Docupedia URL including the context, for example http://example.com:8080/confluence.
DOCUPEDIA_PAGE_DIFF_VERSIONS: Set this if you need the diff between two versions of the page content. The value consists of two numbers, each equal to or less than 0 (zero), separated by a comma.
DOCUPEDIA_PAGE_DIFF_DATE_THRESHOLD: Set this if you need a page version relative to a certain threshold. The value must be a date and time in ISO 8601-1:2019 format.
OUTPUT_NAME: Set this to the desired name of the output files. The default value is docupedia_content.
OUTPUT_PATH: Set this to the desired output path. The default value is the current working directory.
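The value formats described above can be checked before a run. The following is a minimal sketch, not part of the fetcher: the function names are hypothetical, the sample page information URL and the sample values are made-up placeholders, and `datetime.fromisoformat` accepts only a subset of ISO 8601 on older Python versions.

```python
from __future__ import annotations

from datetime import datetime
from urllib.parse import parse_qs, urlsplit


def page_id_from_url(page_info_url: str) -> str:
    """Extract the number following pageId= from a page information URL."""
    values = parse_qs(urlsplit(page_info_url).query).get("pageId", [])
    if len(values) != 1 or not values[0].isdigit():
        raise ValueError("no numeric pageId parameter found")
    return values[0]


def parse_diff_versions(value: str) -> tuple[int, int]:
    """Parse 'a,b' where both numbers must be equal to or less than 0."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 2:
        raise ValueError("expected two comma-separated numbers")
    first, second = (int(part) for part in parts)
    if first > 0 or second > 0:
        raise ValueError("both numbers must be equal to or less than 0")
    return first, second


def parse_date_threshold(value: str) -> datetime:
    """Parse a date-and-time string; raises ValueError if it is malformed."""
    return datetime.fromisoformat(value)


# Placeholder inputs chosen only to satisfy the documented formats.
print(page_id_from_url("http://example.com:8080/confluence/pages/viewinfo.action?pageId=123456"))
print(parse_diff_versions("-1,0"))
print(parse_date_threshold("2023-05-01T12:00:00+00:00"))
```

Each helper raises `ValueError` on malformed input, so a wrapper script can fail early instead of starting a run with a value the fetcher might reject.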
For advanced use cases, you may also configure:
DOCUPEDIA_SCHEME_ID: To obtain it, go to your Docupedia page, open the Network tab in your browser's developer tools, and clear the activity. Click the three dots in the top-right corner and select Export to HTML. Select the request to the export-scheme endpoint; its response contains a list of scheme IDs. Use the desired id value from the list. If no scheme is specified, the default value bundled_default is used.
DOCUPEDIA_EXPORTER_ID: To obtain it, look at the request parameters from the previous step. The value is the one associated with the exporterId request parameter. The default value is com.k15t.scroll.scroll-html:html-exporter.
Adjust the config file
The qg-config.yaml
Below is an example configuration file that runs the Docupedia Fetcher. The autopilot is configured in lines 7-15. Required environment variables are read from the provided run environment variables or secrets. The autopilot is then used by check 1.1 in line 30, which is part of requirement 2.6.
In this example, line 10 performs a simple check that the page was downloaded successfully.
 1  metadata:
 2    version: v1
 3  header:
 4    name: MACMA
 5    version: 1.16.0
 6  autopilots:
 7    docupedia-autopilot:
 8      run: |
 9        docupedia-fetcher
10        filecheck exists "${{ env.OUTPUT_NAME }}.html"
11      env:
12        DOCUPEDIA_PAGE_ID: ${{ env.DOCUPEDIA_PAGE_ID }}
13        DOCUPEDIA_PAT: ${{ secrets.DOCUPEDIA_PAT }}
14        DOCUPEDIA_URL: ${{ env.DOCUPEDIA_URL }}
15        OUTPUT_NAME: docupedia_content
16  finalize:
17    run: |
18      html-finalizer
19  chapters:
20    "1":
21      title: Project management
22      requirements:
23        "2.6":
24          title: The requirements for information security and data protection are considered.
25          text: The data protection compliance have to be guaranteed
26          checks:
27            "1.1":
28              title: Download docupedia page content
29              automation:
30                autopilot: docupedia-autopilot
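
When debugging a configuration like this one locally, it can help to check which required variables are missing and which file the `filecheck exists` step would look for. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the simple HTML file is written to OUTPUT_PATH as `<OUTPUT_NAME>.html`, using the defaults described in the environment variables section.

```python
import os
from pathlib import Path

# The two variables the documentation lists as required.
REQUIRED = ("DOCUPEDIA_PAGE_ID", "DOCUPEDIA_PAT")


def missing_required(env) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]


def expected_html_file(env) -> Path:
    """Return the path where the simple HTML file is assumed to be written."""
    output_name = env.get("OUTPUT_NAME") or "docupedia_content"  # documented default
    output_path = Path(env.get("OUTPUT_PATH") or Path.cwd())  # documented default
    return output_path / f"{output_name}.html"


missing = missing_required(os.environ)
if missing:
    print("Missing required variables:", ", ".join(missing))
print("Expected HTML file:", expected_html_file(os.environ))
```

Passing a plain dictionary instead of `os.environ` makes it easy to try out different combinations of values without touching the real environment.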