<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xml:base="https://idee.frank-siebert.de/">
 <channel>
  <title>Concept of new cognition elicitation personally thinking</title>
  <atom:link href="concept-rss.xml" rel="self" type="application/rss+xml"/>
  <link>https://idee.frank-siebert.de/concept-index.html</link>
  <description>Concept</description>
  <lastBuildDate>Wed, 18 Mar 2026 18:49:25 +0000</lastBuildDate>
  <language>en</language>
  <generator>pandoc, fs-commit-msg-hook 1.0</generator>
  <image>
   <url>https://idee.frank-siebert.de/image/favicon-256x256-150x150.png</url>
   <title>Concept of new cognition elicitation personally thinking</title>
   <link>https://idee.frank-siebert.de/concept-index.html</link>
   <width>64</width>
   <height>64</height>
  </image>
  <item>
   <title>Vim Plugin for Web Publishing</title>
   <link>https://idee.frank-siebert.de/article/vim-plugin-for-web-publishing.html</link>
   <pubDate>Mon, 09 Mar 2026 10:08:33 +0000</pubDate>
   <guid isPermaLink="false">https://idee.frank-siebert.de/article/vim-plugin-for-web-publishing.html</guid>
   <description><![CDATA[I wrote at length about how I moved away from WordPress to write articles in MediaWiki and publish via git hook to my then newly designed web site. Now it's time to report the sunset of my MediaWiki. I moved on to Vim and GitLab Flavored Markdown. ...]]></description>
   <content:encoded><![CDATA[<div> <div> <h1> Vim Plugin for Web Publishing </h1> <div> <time datetime="2026-03-09T10:08:33" pubdate="true"> 2026-03-09 </time> <address> Frank Siebert </address> </div> <div> <figure> <a href="https://idee.frank-siebert.de/qrcode/vim-plugin-for-web-publishing.png"> <img src="https://idee.frank-siebert.de/qrcode/vim-plugin-for-web-publishing.png"/> </a> <figcaption> </figcaption> </figure> <figure> <a accesskey="p" href="https://idee.frank-siebert.de/pdf/vim-plugin-for-web-publishing.pdf" target="_blank" type="application/pdf"> <img src="https://idee.frank-siebert.de/image/3cd97bab8bb20288768b35fd72979ec3bbf4b2a8.png"/> </a> </figure> <a href="https://idee.frank-siebert.de/legal/creative-commons-cc0-1-0-universal.html"> <img src="https://idee.frank-siebert.de/image/CC-Icon.png"/> </a> <a href="https://idee.frank-siebert.de/legal/creative-commons-cc0-1-0-universal.html"> <img src="https://idee.frank-siebert.de/image/CC0-Icon.png"/> </a> </div> </div> <p> <strong> I wrote at length about how I moved away from WordPress to write articles in MediaWiki and publish via git hook to my then newly designed web site. Now it's time to report the sunset of my MediaWiki. I moved on to Vim and GitLab Flavored Markdown. </strong> </p> <!-- pdf --> <!-- article-licence: cc0 --> <div class="note"> <p> <strong> Update 5 2026-03-18: </strong> In the past I used blockquotes for update notes like this as well. Since I decided to render quotes with a light-gray left border, as has become common, I now need a different solution for the update notes.
</p> <p> These are now maintained as: </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb1-1"></a><span class="op">&gt;</span> <span class="ex">[!note]</span></span> <span><a aria-hidden="true" href="#cb1-2"></a><span class="op">&gt;</span> <span class="ex">**Update</span> yyyy-MM-dd:<span class="pp">**</span> Note text as it applies</span></code></pre> </div> <p> which creates <code> &lt;div class="note"&gt;&lt;div class="title"&gt;Note&lt;/div&gt;...&lt;/div&gt; </code> in the HTML. I updated the code to remove the superfluous <code> &lt;div class="title"&gt; </code> tag. I consider this a style matter; see the function <code> style() </code> . </p> <p> <strong> Update 4 2026-03-18: </strong> Markdown does not define a standard way to control the width of table columns. While applying a style update to an old article, I needed such a feature. </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb2-1"></a><span class="kw">|</span> <span class="ex">%25%</span> Header1 <span class="kw">|</span> <span class="ex">%25%</span> Header2 <span class="kw">|</span> <span class="ex">%50%</span> Header3 <span class="kw">|</span></span> <span><a aria-hidden="true" href="#cb2-2"></a><span class="kw">|</span><span class="ex">:-------------</span><span class="kw">|</span><span class="ex">:-------------</span><span class="kw">|</span><span class="ex">:----------------------</span><span class="kw">|</span></span> <span><a aria-hidden="true" href="#cb2-3"></a><span class="kw">|</span> <span class="ex">text</span> 1 <span class="kw">|</span> <span class="ex">text</span> 2 <span class="kw">|</span> <span class="ex">Much</span> longer text here <span class="kw">|</span></span></code></pre> </div> <p> The formatting directives above produce column width entries on the <code> &lt;th&gt; </code> tags.
The implementation is in the first part of the new function <code> tables() </code> . </p> <p> <strong> Update 3 2026-03-18: </strong> Markdown does not define a standard way to create a rowspan for table cells. I needed it for the update of an existing article. </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb3-1"></a><span class="kw">|</span> <span class="ex">Header1</span> <span class="kw">|</span> <span class="ex">Header2</span> <span class="kw">|</span> <span class="ex">Header3</span> <span class="kw">|</span></span> <span><a aria-hidden="true" href="#cb3-2"></a><span class="kw">|</span><span class="ex">:-------------</span><span class="kw">|</span><span class="ex">:-------------</span><span class="kw">|</span><span class="ex">:----------------------</span><span class="kw">|</span></span> <span><a aria-hidden="true" href="#cb3-3"></a><span class="kw">|</span> <span class="ex">text</span> 1 <span class="kw">|</span> <span class="ex">text</span> a <span class="kw">|</span> <span class="ex">belongs</span> to 1.a <span class="kw">|</span></span> <span><a aria-hidden="true" href="#cb3-4"></a><span class="kw">|</span> <span class="ex">^</span> <span class="kw">|</span> <span class="ex">text</span> b <span class="kw">|</span> <span class="ex">belongs</span> to 1.b <span class="kw">|</span></span> <span><a aria-hidden="true" href="#cb3-5"></a><span class="kw">|</span> <span class="ex">text</span> 2 <span class="kw">|</span> <span class="ex">^</span> <span class="kw">|</span> <span class="ex">belongs</span> to 2.b <span class="kw">|</span></span></code></pre> </div> <p> The character <code> ^ </code> signals that this cell gets joined with the cell above it. The cell must not contain any further text. The implementation is in the second part of the new function <code> tables() </code> . </p> <p> <strong> Update 2 2026-03-18: </strong> WeasyPrint breaks tables across pages automatically, even repeating the table header on the new page.
However, depending on how much space the individual cells need, some cells of a row might end up on one page and the others on the next. </p> <p> I have now made it possible to set a manual page break via <code> &lt;!-- page-break --&gt; </code> in the first table cell of the row that should move to the next page. The implementation is in method <code> page_break() </code> . </p> <p> <strong> Update 1 2026-03-18: </strong> The subprocess calls now pass the additional parameter <code> check=True </code> . </p> </div> <br/> <br/> <br/> <hr/> <p> I won't go into the details of the migration. It's enough to mention that I extracted all wiki content as MediaWiki files and converted those to markdown via Pandoc. </p> <p> The MediaWiki files that were my article source files were converted the same way. </p> <p> No, this time I will restrict myself to describing the authoring and publishing plugin I created for Vim. Another article might follow to describe the implementation of the deployment via git hook. </p> <p> The appearance of the web site stays the same, and so do my design goals for the site. </p> <ul class="incremental"> <li> No JavaScript, at least not for the portal, only for the content, if it requires it. </li> <li> No resources linked into the site that are loaded from third-party servers. </li> </ul> <p> Sure, my content has links to other sites, as I usually provide references in my articles. But you have to click on these consciously to be taken to the referenced page. The visitor is in control. </p> <!-- page-break --> <h2> Table of Contents </h2> <!-- page-break --> <h2> Plugin Short Description </h2> <p> The plugin does a bit more, but this is the core functionality: </p> <ul class="incremental"> <li> Render markdown into HTML, applying the semantic HTML structure, the style and, if applicable, media links, as they show up in the published result. </li> <li> Hand the rendered HTML and its assets over to the deployment staging area of my site.
</li> </ul> <p> Some assets are generated together with the HTML: the QR code, which contains the link to the article, and the PDF, which contains the same article as the HTML. </p> <p> If I record audio to provide the article as a podcast, it is linked into the HTML as well. </p> <p> These steps were previously done by software launched via git hook during commit, which means that my deployment process has become much leaner. But that is part of the deployment to the web site, and I plan to write a separate article about that. </p> <h2> Meta Data </h2> <p> There are two kinds of meta data which, for lack of a better place, have to be inserted into the markdown file. To do so, I decided to use XML/HTML comments like: </p> <pre class="hmtl"><code>&lt;!-- pdf --&gt;</code></pre> <p> The plugin provides a list of meta data comments and inserts the selected one at the current cursor position. </p> <h3> Control Data </h3> <p> Shall a PDF be created? Do I want to have a table of contents, and where would I like to have it? Where shall the references be placed? Where are additional page breaks required in the PDF? Which language is the article in? </p> <p> While I write mainly in German, there are occasional exceptions. This article, for example, is in English, and will therefore not be shown as part of the "idee"-portal, but of the "concept"-portal. In my case, this makes the language information control data. </p> <p> It is, however, also article meta data. </p> <h3> Article Meta Data </h3> <p> Who is the author? Under which licence is the content provided? </p> <p> As long as I write all articles myself, always using the Creative Commons Zero licence, I could just hard code this information. And a lot of things that should be customizable currently are not. But I decided to enable guest articles with a different content licence.
At least in principle that function is there; I would need to add some information about this other content licence the first time it is used. </p> <!-- page-break --> <h2> Workflow and File System Structure </h2> <p> Let's first take a look at the target file system structure. </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb5-1"></a><span class="ex">frank@Asimov:~/projects/idee/website$</span> tree <span class="at">-d</span> <span class="at">-L</span> 1</span> <span><a aria-hidden="true" href="#cb5-2"></a><span class="bu">.</span></span> <span><a aria-hidden="true" href="#cb5-3"></a><span class="ex">├──</span> archive</span> <span><a aria-hidden="true" href="#cb5-4"></a><span class="ex">├──</span> article</span> <span><a aria-hidden="true" href="#cb5-5"></a><span class="ex">├──</span> audio</span> <span><a aria-hidden="true" href="#cb5-6"></a><span class="ex">├──</span> css</span> <span><a aria-hidden="true" href="#cb5-7"></a><span class="ex">├──</span> env</span> <span><a aria-hidden="true" href="#cb5-8"></a><span class="ex">├──</span> files</span> <span><a aria-hidden="true" href="#cb5-9"></a><span class="ex">├──</span> image</span> <span><a aria-hidden="true" href="#cb5-10"></a><span class="ex">├──</span> js</span> <span><a aria-hidden="true" href="#cb5-11"></a><span class="ex">├──</span> legal</span> <span><a aria-hidden="true" href="#cb5-12"></a><span class="ex">├──</span> MathJax <span class="at">-</span><span class="op">&gt;</span> SimpleMathJax/resources/MathJax/es5</span> <span><a aria-hidden="true" href="#cb5-13"></a><span class="ex">├──</span> pdf</span> <span><a aria-hidden="true" href="#cb5-14"></a><span class="ex">├──</span> portal</span> <span><a aria-hidden="true" href="#cb5-15"></a><span class="ex">├──</span> qrcode</span> <span><a aria-hidden="true" href="#cb5-16"></a><span class="ex">├──</span> SimpleMathJax</span> <span><a aria-hidden="true" href="#cb5-17"></a><span
class="ex">└──</span> sitemap</span></code></pre> </div> <p> Articles, audios, files, images, JavaScripts, PDFs and QR codes are placed in separate folders next to each other. </p> <p> On the one hand we might want to re-use an existing image or JavaScript file; on the other hand we want easy access to the files we are working with, without searching for them in a long list of files day after day. </p> <p> To serve both needs, authoring happens in a separate folder, one per article in progress, with the asset folders beneath it and a symbolic link to the idee folder, through which existing assets can be referenced in the markdown file. </p> <p> That looks as follows: </p> <h3> File System Structure </h3> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb6-1"></a><span class="ex">frank@Asimov:~/projects/writing/1-vim-plugin-idee$</span> tree </span> <span><a aria-hidden="true" href="#cb6-2"></a><span class="bu">.</span></span> <span><a aria-hidden="true" href="#cb6-3"></a><span class="ex">├──</span> audio</span> <span><a aria-hidden="true" href="#cb6-4"></a><span class="ex">├──</span> files</span> <span><a aria-hidden="true" href="#cb6-5"></a><span class="ex">├──</span> idee <span class="at">-</span><span class="op">&gt;</span> /home/frank/projects/idee</span> <span><a aria-hidden="true" href="#cb6-6"></a><span class="ex">├──</span> image</span> <span><a aria-hidden="true" href="#cb6-7"></a><span class="ex">├──</span> js</span> <span><a aria-hidden="true" href="#cb6-8"></a><span class="ex">├──</span> pdf</span> <span><a aria-hidden="true" href="#cb6-9"></a><span class="ex">├──</span> qrcode</span> <span><a aria-hidden="true" href="#cb6-10"></a><span class="ex">└──</span> vim-plugin-for-web-publishing.md</span></code></pre> </div> <p> Creating this authoring setup by hand again and again would be too much work.
It is done by the plugin command <code> IdeeFolders </code> . </p> <h3> Workflow </h3> <p> The workflow is quite lean and basic: </p> <ol class="incremental" type="1"> <li> create article folder in your preferred location </li> <li> <code> vim your-markdown-file.md </code> </li> <li> write the article </li> <li> call command <code> IdeeFolders </code> </li> <li> create and link assets </li> <li> add meta data via command <code> IdeeMeta </code> or <code> &lt;c-y&gt; </code> </li> <li> call command <code> IdeeDisplay </code> to render HTML (opens the page in Firefox) </li> <li> repeat steps 5 to 7 until everything fits </li> <li> call command <code> IdeePublish </code> </li> </ol> <p> The command <code> IdeePublish </code> copies the markdown, the HTML and all assets to the idee folder structure and adjusts the relative links used during authoring to fit the new location. </p> <p> Step 9 is the handover to the deployment of the article. </p> <h2> Feature List </h2> <p> As shown in the workflow description, the plugin features four commands: </p> <ul class="incremental"> <li> <code> IdeeFolders </code> creates the asset folders and the symbolic link to the idee website project. </li> <li> <code> IdeeMeta </code> provides a list of meta data comments </li> <li> <code> IdeeDisplay </code> creates the HTML and opens it in the web browser </li> <li> <code> IdeePublish </code> hands the article and its assets over to the deployment </li> </ul> <h2> Plugin Folder Structure </h2> <p> This, I confess, took some time to figure out. Where do I need to place what, and how do I make sure that it is loaded when a markdown file is edited? </p> <p> Once it works, it seems simple.
</p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb7-1"></a><span class="ex">frank@Asimov:~/projects/writing/1-vim-plugin-idee$</span> tree <span class="at">-L</span> 5 ~/.vim/pack/idee/</span> <span><a aria-hidden="true" href="#cb7-2"></a><span class="ex">/home/frank/.vim/pack/idee/</span></span> <span><a aria-hidden="true" href="#cb7-3"></a><span class="ex">└──</span> start</span> <span><a aria-hidden="true" href="#cb7-4"></a> <span class="ex">└──</span> plugin</span> <span><a aria-hidden="true" href="#cb7-5"></a> <span class="ex">├──</span> ftplugin</span> <span><a aria-hidden="true" href="#cb7-6"></a> <span class="ex">│ </span> └── markdown_idee.vim</span> <span><a aria-hidden="true" href="#cb7-7"></a> <span class="ex">└──</span> python3</span> <span><a aria-hidden="true" href="#cb7-8"></a> <span class="ex">└──</span> idee</span> <span><a aria-hidden="true" href="#cb7-9"></a> <span class="ex">├──</span> display.py</span> <span><a aria-hidden="true" href="#cb7-10"></a> <span class="ex">├──</span> folders.py</span> <span><a aria-hidden="true" href="#cb7-11"></a> <span class="ex">├──</span> __init__.py</span> <span><a aria-hidden="true" href="#cb7-12"></a> <span class="ex">├──</span> publish.py</span> <span><a aria-hidden="true" href="#cb7-13"></a> <span class="ex">└──</span> __pycache__</span></code></pre> </div> <p> The folder <code> pack </code> probably stands for package; I do not know for sure. If so, then I have a plugin package named <code> idee </code> , with a folder <code> start </code> which, as I understand it, vim searches for applicable plugins during startup. </p> <p> Below the folder <code> plugin </code> is the <code> ftplugin </code> folder, which holds the code that makes the python functions available via new vim commands. </p> <p> For whatever reason, the python code needs to be placed below a python3 folder, or vim does not find it.
</p> <!-- page-break --> <h2> File Type Plugin markdown_idee.vim </h2> <p> There is a binding naming convention for file type plugins, and if you do not adhere to it, it is not loaded when a file of that type is edited. </p> <p> The name has to start with the file type, as it is known to vim. That can be followed by an underline and whatever you like to write afterwards. I stuck with the plugin name, which is also my web-site project name. </p> <p> <strong> markdown_idee.vim </strong> </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb8-1"></a><span class="ex">py3</span> import idee</span> <span><a aria-hidden="true" href="#cb8-2"></a></span> <span><a aria-hidden="true" href="#cb8-3"></a><span class="kw">function</span><span class="ot">! </span><span class="fu">markdown_idee#meta_comments()</span></span> <span><a aria-hidden="true" href="#cb8-4"></a> <span class="bu">let</span> <span class="va">l</span><span class="op">:</span><span class="va">comments</span> <span class="op">=</span> <span class="op">[</span></span> <span><a aria-hidden="true" href="#cb8-5"></a> <span class="dt">\ </span><span class="st">'&lt;!-- author: --&gt; '</span>,</span> <span><a aria-hidden="true" href="#cb8-6"></a> <span class="dt">\ </span><span class="st">'&lt;!-- article-licence: cc0 --&gt; '</span>,</span> <span><a aria-hidden="true" href="#cb8-7"></a> <span class="dt">\ </span><span class="st">'&lt;!-- en-US --&gt; '</span>,</span> <span><a aria-hidden="true" href="#cb8-8"></a> <span class="dt">\ </span><span class="st">'&lt;!-- page-break --&gt; '</span>,</span> <span><a aria-hidden="true" href="#cb8-9"></a> <span class="dt">\ </span><span class="st">'&lt;!-- pdf --&gt; '</span>,</span> <span><a aria-hidden="true" href="#cb8-10"></a> <span class="dt">\ </span><span class="st">'&lt;!-- references --&gt; '</span>,</span> <span><a aria-hidden="true" href="#cb8-11"></a> <span class="dt">\ </span><span 
class="st">'&lt;!-- toc --&gt; '</span></span> <span><a aria-hidden="true" href="#cb8-12"></a> <span class="dt">\ </span><span class="op">]</span></span> <span><a aria-hidden="true" href="#cb8-13"></a></span> <span><a aria-hidden="true" href="#cb8-14"></a> <span class="kw">function</span><span class="ot">! </span><span class="ex">s:meta_insert</span><span class="er">(</span><span class="fu">id</span>, result<span class="kw">)</span> <span class="ex">closure</span></span> <span><a aria-hidden="true" href="#cb8-15"></a> <span class="cf">if</span> <span class="ex">a:result</span> <span class="op">&gt;</span> 0 <span class="kw">&amp;&amp;</span> <span class="ex">a:result</span> <span class="op">&lt;</span>= len<span class="er">(</span><span class="ex">l:comments</span><span class="kw">)</span></span> <span><a aria-hidden="true" href="#cb8-16"></a> <span class="ex">execute</span> <span class="st">"normal! i"</span> . l:comments[a:result <span class="at">-</span> 1]</span> <span><a aria-hidden="true" href="#cb8-17"></a> <span class="ex">endif</span></span> <span><a aria-hidden="true" href="#cb8-18"></a> <span class="ex">endfunction</span></span> <span><a aria-hidden="true" href="#cb8-19"></a></span> <span><a aria-hidden="true" href="#cb8-20"></a> <span class="bu">let</span> <span class="va">winid</span> <span class="op">=</span> <span class="va">popup_menu</span><span class="er">(</span><span class="ex">l:comments,</span> <span class="co">#{</span></span> <span><a aria-hidden="true" href="#cb8-21"></a> <span class="ex">\ callback:</span> <span class="st">'s:meta_insert'</span></span> <span><a aria-hidden="true" href="#cb8-22"></a> <span class="ex">\ }</span><span class="kw">)</span></span> <span><a aria-hidden="true" href="#cb8-23"></a><span class="ex">endfunction</span></span> <span><a aria-hidden="true" href="#cb8-24"></a></span> <span><a aria-hidden="true" href="#cb8-25"></a><span class="bu">command</span> IdeeFolders :py3 idee.folders.folders<span 
class="er">(</span><span class="ex">vim.eval</span><span class="er">(</span><span class="st">"expand('%')"</span><span class="kw">))</span></span> <span><a aria-hidden="true" href="#cb8-26"></a><span class="bu">command</span> IdeeDisplay :py3 idee.display.display<span class="er">(</span><span class="ex">vim.eval</span><span class="er">(</span><span class="st">"expand('%')"</span><span class="kw">))</span></span> <span><a aria-hidden="true" href="#cb8-27"></a><span class="bu">command</span> IdeeMeta call markdown_idee#meta_comments<span class="er">(</span><span class="kw">)</span></span> <span><a aria-hidden="true" href="#cb8-28"></a><span class="bu">command</span> IdeePublish :py3 idee.publish.publish<span class="er">(</span><span class="ex">vim.eval</span><span class="er">(</span><span class="st">"expand('%')"</span><span class="kw">))</span></span></code></pre> </div> <p> The documentation of vim still refers to the command <code> python </code> , but that doesn't work, it needs to be <code> py3 </code> . The file type plugin imports the python package idee with its modules. </p> <p> The <code> meta_comments() </code> function is also defined in vim-script, defining a nested callback function to insert the selected comment into the current editor buffer at the current cursor position. </p> <p> Oh, yes, the function shows a popup menu with the comments to choose from. </p> <p> After loading the python package and defining the function <code> meta_comments() </code> the file type plugin binds the python functions and the vim-script function to new vim commands. </p> <p> The current markdown file, the file the author is working in when using the commands, is represented by <code> % </code> . Python, called via <code> py3 </code> doesn't know that and wouldn't know what to do with that percent sign. 
Therefore this placeholder for the current file name needs to be expanded into the file name using the python function <code> vim.eval() </code> provided by the vim developers. </p> <!-- page-break --> <h2> Module initialization </h2> <p> The python modules are imported via the <code> __init__.py </code> of the package. </p> <p> <strong> __init__.py </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb9-1"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb9-2"></a><span class="co">The package 'idee' provides functions to be used via command in vim during the</span></span> <span><a aria-hidden="true" href="#cb9-3"></a><span class="co">editing of markdown files, which are meant to be published as articles on the</span></span> <span><a aria-hidden="true" href="#cb9-4"></a><span class="co">idee-website or the concept-website.</span></span> <span><a aria-hidden="true" href="#cb9-5"></a></span> <span><a aria-hidden="true" href="#cb9-6"></a><span class="co">Functions provided are:</span></span> <span><a aria-hidden="true" href="#cb9-7"></a><span class="co"> idee.folders():</span></span> <span><a aria-hidden="true" href="#cb9-8"></a><span class="co"> - Creates the folder structure for assets</span></span> <span><a aria-hidden="true" href="#cb9-9"></a><span class="co"> - Creates a link to the idee/concept website project</span></span> <span><a aria-hidden="true" href="#cb9-10"></a><span class="co"> idee.display()</span></span> <span><a aria-hidden="true" href="#cb9-11"></a><span class="co"> - Creates idee/concept website style HTML from the markdown file.</span></span> <span><a aria-hidden="true" href="#cb9-12"></a><span class="co"> - Creates a corresponding PDF file, if requested.</span></span> <span><a aria-hidden="true" href="#cb9-13"></a><span class="co"> - Shows the result in Firefox</span></span> <span><a aria-hidden="true" href="#cb9-14"></a><span 
class="co"> idee.publish()</span></span> <span><a aria-hidden="true" href="#cb9-15"></a><span class="co"> - Does the handover to the idee/concept website project for deployment</span></span> <span><a aria-hidden="true" href="#cb9-16"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb9-17"></a><span class="im">import</span> os</span> <span><a aria-hidden="true" href="#cb9-18"></a><span class="im">from</span> pathlib <span class="im">import</span> Path</span> <span><a aria-hidden="true" href="#cb9-19"></a><span class="im">import</span> pkg_resources</span> <span><a aria-hidden="true" href="#cb9-20"></a></span> <span><a aria-hidden="true" href="#cb9-21"></a><span class="im">import</span> idee.folders</span> <span><a aria-hidden="true" href="#cb9-22"></a><span class="im">import</span> idee.display</span> <span><a aria-hidden="true" href="#cb9-23"></a><span class="im">import</span> idee.publish</span></code></pre> </div> <!-- page-break --> <h2> IdeeFolders </h2> <p> The module <code> folders </code> creates subfolders below the folder the current markdown file is in. </p> <h3> Function folders() </h3> <p> <strong> folders.py - folders() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb10-1"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb10-2"></a><span class="co">folders(filepath) creates folders for idee website assets. 
It creates also a</span></span> <span><a aria-hidden="true" href="#cb10-3"></a><span class="co">symbolic link to the idee project to make existing assets available.</span></span> <span><a aria-hidden="true" href="#cb10-4"></a></span> <span><a aria-hidden="true" href="#cb10-5"></a><span class="co">These folders are:</span></span> <span><a aria-hidden="true" href="#cb10-6"></a><span class="co"> - files - arbitrary linked files</span></span> <span><a aria-hidden="true" href="#cb10-7"></a><span class="co"> - image - image files</span></span> <span><a aria-hidden="true" href="#cb10-8"></a><span class="co"> - js - javascript files for interactive content</span></span> <span><a aria-hidden="true" href="#cb10-9"></a><span class="co"> - pdf - article pdf</span></span> <span><a aria-hidden="true" href="#cb10-10"></a><span class="co"> - qrcode - article qrcode</span></span> <span><a aria-hidden="true" href="#cb10-11"></a><span class="co"> - audio - recording of the article</span></span> <span><a aria-hidden="true" href="#cb10-12"></a></span> <span><a aria-hidden="true" href="#cb10-13"></a><span class="co">For the target project idee a symbolic link is created</span></span> <span><a aria-hidden="true" href="#cb10-14"></a><span class="co"> - ln -s idee</span></span> <span><a aria-hidden="true" href="#cb10-15"></a></span> <span><a aria-hidden="true" href="#cb10-16"></a><span class="co">@date: 2026-02-19</span></span> <span><a aria-hidden="true" href="#cb10-17"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb10-18"></a></span> <span><a aria-hidden="true" href="#cb10-19"></a><span class="im">import</span> os</span> <span><a aria-hidden="true" href="#cb10-20"></a><span class="im">import</span> string</span> <span><a aria-hidden="true" href="#cb10-21"></a><span class="im">from</span> pathlib <span class="im">import</span> Path</span> <span><a aria-hidden="true" href="#cb10-22"></a></span> <span><a aria-hidden="true" href="#cb10-23"></a><span 
class="kw">def</span> folders(filepath: <span class="bu">str</span>):</span> <span><a aria-hidden="true" href="#cb10-24"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb10-25"></a><span class="co"> Make the folder of the given markdown file a workfolder.</span></span> <span><a aria-hidden="true" href="#cb10-26"></a><span class="co"> This is done by creating the subfolders for possible assets.</span></span> <span><a aria-hidden="true" href="#cb10-27"></a></span> <span><a aria-hidden="true" href="#cb10-28"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb10-29"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb10-30"></a><span class="co"> Feedback Message via stdout</span></span> <span><a aria-hidden="true" href="#cb10-31"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb10-32"></a></span> <span><a aria-hidden="true" href="#cb10-33"></a> dlist <span class="op">=</span> [<span class="st">"files"</span>, <span class="st">"image"</span>, <span class="st">"js"</span>, <span class="st">"pdf"</span>, <span class="st">"qrcode"</span>, <span class="st">"audio"</span>]</span> <span><a aria-hidden="true" href="#cb10-34"></a></span> <span><a aria-hidden="true" href="#cb10-35"></a> inpath <span class="op">=</span> Path(filepath)</span> <span><a aria-hidden="true" href="#cb10-36"></a> workdirpath <span class="op">=</span> inpath.parents[<span class="dv">0</span>]</span> <span><a aria-hidden="true" href="#cb10-37"></a> workdirpath <span class="op">=</span> workdirpath.resolve()</span> <span><a aria-hidden="true" href="#cb10-38"></a></span> <span><a aria-hidden="true" href="#cb10-39"></a> <span class="co"># create symbolic link to idee project</span></span> <span><a aria-hidden="true" href="#cb10-40"></a> idee <span class="op">=</span> workdirpath <span class="op">/</span> <span class="st">"idee"</span></span> <span><a aria-hidden="true" href="#cb10-41"></a> <span class="cf">if</span>
<span class="kw">not</span> idee.exists():</span> <span><a aria-hidden="true" href="#cb10-42"></a> os.symlink(Path(<span class="st">"/home/frank/projects/idee"</span>).resolve(), idee)</span> <span><a aria-hidden="true" href="#cb10-43"></a></span> <span><a aria-hidden="true" href="#cb10-44"></a> <span class="cf">for</span> d <span class="kw">in</span> dlist:</span> <span><a aria-hidden="true" href="#cb10-45"></a> d <span class="op">=</span> workdirpath <span class="op">/</span> d</span> <span><a aria-hidden="true" href="#cb10-46"></a> <span class="cf">if</span> <span class="kw">not</span> d.exists():</span> <span><a aria-hidden="true" href="#cb10-47"></a> d.mkdir()</span></code></pre> </div> <!-- page-break --> <h2> IdeeDisplay </h2> <p> The module <code> display </code> creates the HTML and, if requested via meta data comment, the PDF, and opens the created page in Firefox. It makes sense to look at this module part by part. </p> <p> <strong> display.py - imports and debugging </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb11-1"></a><span class="co">#!/usr/bin/python3</span></span> <span><a aria-hidden="true" href="#cb11-2"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb11-3"></a><span class="co">Generate and Display HTML in idee-website style from github flavored</span></span> <span><a aria-hidden="true" href="#cb11-4"></a><span class="co">markdown.</span></span> <span><a aria-hidden="true" href="#cb11-5"></a></span> <span><a aria-hidden="true" href="#cb11-6"></a><span class="co"> To debug the function display(), start this executable python file with</span></span> <span><a aria-hidden="true" href="#cb11-7"></a><span class="co"> the markdown file as parameter.</span></span> <span><a aria-hidden="true" href="#cb11-8"></a></span> <span><a aria-hidden="true" href="#cb11-9"></a><span class="co"> </span><span class="al">TODO</span><span class="co">: Check
alternatives to soft deprecated module getopt.</span></span> <span><a aria-hidden="true" href="#cb11-10"></a></span> <span><a aria-hidden="true" href="#cb11-11"></a><span class="co">@date: 2026-02-05</span></span> <span><a aria-hidden="true" href="#cb11-12"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb11-13"></a></span> <span><a aria-hidden="true" href="#cb11-14"></a><span class="im">import</span> sys</span> <span><a aria-hidden="true" href="#cb11-15"></a><span class="im">import</span> os</span> <span><a aria-hidden="true" href="#cb11-16"></a><span class="im">import</span> subprocess</span> <span><a aria-hidden="true" href="#cb11-17"></a><span class="im">import</span> string</span> <span><a aria-hidden="true" href="#cb11-18"></a><span class="im">from</span> pathlib <span class="im">import</span> Path</span> <span><a aria-hidden="true" href="#cb11-19"></a><span class="im">from</span> datetime <span class="im">import</span> datetime</span> <span><a aria-hidden="true" href="#cb11-20"></a><span class="im">import</span> re</span> <span><a aria-hidden="true" href="#cb11-21"></a><span class="im">import</span> copy</span> <span><a aria-hidden="true" href="#cb11-22"></a><span class="im">import</span> getopt</span> <span><a aria-hidden="true" href="#cb11-23"></a></span> <span><a aria-hidden="true" href="#cb11-24"></a><span class="im">from</span> selenium <span class="im">import</span> webdriver</span> <span><a aria-hidden="true" href="#cb11-25"></a><span class="im">from</span> selenium.webdriver.chrome.options <span class="im">import</span> Options</span> <span><a aria-hidden="true" href="#cb11-26"></a></span> <span><a aria-hidden="true" href="#cb11-27"></a><span class="im">from</span> bs4 <span class="im">import</span> BeautifulSoup</span> <span><a aria-hidden="true" href="#cb11-28"></a><span class="im">from</span> bs4 <span class="im">import</span> Comment</span> <span><a aria-hidden="true" href="#cb11-29"></a><span class="im">from</span> 
bs4.builder._htmlparser <span class="im">import</span> HTMLParserTreeBuilder</span> <span><a aria-hidden="true" href="#cb11-30"></a><span class="im">from</span> weasyprint <span class="im">import</span> HTML</span> <span><a aria-hidden="true" href="#cb11-31"></a><span class="im">from</span> weasyprint <span class="im">import</span> CSS</span> <span><a aria-hidden="true" href="#cb11-32"></a><span class="im">import</span> qrcode</span> <span><a aria-hidden="true" href="#cb11-33"></a>...</span> <span><a aria-hidden="true" href="#cb11-34"></a><span class="cf">if</span> <span class="va">__name__</span> <span class="op">==</span> <span class="st">"__main__"</span>:</span> <span><a aria-hidden="true" href="#cb11-35"></a></span> <span><a aria-hidden="true" href="#cb11-36"></a> MARKDOWNFILE <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb11-37"></a></span> <span><a aria-hidden="true" href="#cb11-38"></a> <span class="cf">try</span>:</span> <span><a aria-hidden="true" href="#cb11-39"></a> opts, args <span class="op">=</span> getopt.getopt(sys.argv[<span class="dv">1</span>:], [<span class="st">"o"</span>])</span> <span><a aria-hidden="true" href="#cb11-40"></a></span> <span><a aria-hidden="true" href="#cb11-41"></a> <span class="cf">except</span> getopt.GetoptError:</span> <span><a aria-hidden="true" href="#cb11-42"></a> <span class="bu">print</span>(<span class="st">"No Parameter given"</span>)</span> <span><a aria-hidden="true" href="#cb11-43"></a> sys.exit(<span class="dv">2</span>)</span> <span><a aria-hidden="true" href="#cb11-44"></a></span> <span><a aria-hidden="true" href="#cb11-45"></a> <span class="cf">if</span> <span class="bu">len</span>(args) <span class="op">==</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" href="#cb11-46"></a> <span class="bu">print</span>(<span class="st">"No Parameter given"</span>)</span> <span><a aria-hidden="true" href="#cb11-47"></a> sys.exit(<span 
class="dv">2</span>)</span> <span><a aria-hidden="true" href="#cb11-48"></a></span> <span><a aria-hidden="true" href="#cb11-49"></a> MARKDOWNFILE <span class="op">=</span> args[<span class="dv">0</span>]</span> <span><a aria-hidden="true" href="#cb11-50"></a> <span class="cf">if</span> <span class="kw">not</span> MARKDOWNFILE:</span> <span><a aria-hidden="true" href="#cb11-51"></a> <span class="bu">print</span>(<span class="st">"No Parameter given"</span>)</span> <span><a aria-hidden="true" href="#cb11-52"></a> sys.exit(<span class="dv">2</span>)</span> <span><a aria-hidden="true" href="#cb11-53"></a></span> <span><a aria-hidden="true" href="#cb11-54"></a> display(Path(MARKDOWNFILE))</span> <span><a aria-hidden="true" href="#cb11-55"></a></span> <span><a aria-hidden="true" href="#cb11-56"></a> sys.exit(<span class="dv">0</span>)</span></code></pre> </div> <p> The code shows the imports and the <code> __main__ </code> part, which is used to call the display function from the terminal, e.g. for debugging. The ellipsis in the middle of this code snippet stands for all the function definitions in between, which are shown one by one in the following sections. </p> <h3> Function display() </h3> <p> The display function uses Pandoc to create the initial HTML from the markdown file. Afterwards the module <code> BeautifulSoup </code> is used to adjust the HTML to the style used for the idee/concept portal. 
</p> <p> <strong> display.py - display() - part 1 </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb12-1"></a><span class="kw">def</span> display(filepath: str):</span> <span><a aria-hidden="true" href="#cb12-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb12-3"></a><span class="co"> Creates an html web page from the markdown file.</span></span> <span><a aria-hidden="true" href="#cb12-4"></a></span> <span><a aria-hidden="true" href="#cb12-5"></a><span class="co"> The html web page is fully flavored for that site,</span></span> <span><a aria-hidden="true" href="#cb12-6"></a><span class="co"> including the optional linked in pdf and audio versions</span></span> <span><a aria-hidden="true" href="#cb12-7"></a><span class="co"> and content licence information.</span></span> <span><a aria-hidden="true" href="#cb12-8"></a></span> <span><a aria-hidden="true" href="#cb12-9"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb12-10"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb12-11"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb12-12"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb12-13"></a></span> <span><a aria-hidden="true" href="#cb12-14"></a> mdpath <span class="op">=</span> Path(filepath)</span> <span><a aria-hidden="true" href="#cb12-15"></a> workdirpath <span class="op">=</span> mdpath.parents[<span class="dv">0</span>]</span> <span><a aria-hidden="true" href="#cb12-16"></a></span> <span><a aria-hidden="true" href="#cb12-17"></a> <span class="co"># check that this is a designated work directory</span></span> <span><a aria-hidden="true" href="#cb12-18"></a> idee <span class="op">=</span> workdirpath <span class="op">/</span> <span class="st">"idee"</span></span> <span><a aria-hidden="true" href="#cb12-19"></a> <span 
class="cf">if</span> <span class="kw">not</span> idee.exists():</span> <span><a aria-hidden="true" href="#cb12-20"></a> <span class="bu">print</span>(<span class="st">"Use first IdeeFolders to designate this location as"</span></span> <span><a aria-hidden="true" href="#cb12-21"></a> <span class="st">" workdirectory, or use the PreVim-Plugin instead."</span></span> <span><a aria-hidden="true" href="#cb12-22"></a> )</span> <span><a aria-hidden="true" href="#cb12-23"></a> <span class="cf">return</span></span> <span><a aria-hidden="true" href="#cb12-24"></a></span> <span><a aria-hidden="true" href="#cb12-25"></a> htmlpath <span class="op">=</span> workdirpath <span class="op">/</span> mdpath.stem</span> <span><a aria-hidden="true" href="#cb12-26"></a> htmlpath <span class="op">=</span> htmlpath.with_suffix(<span class="st">".html"</span>)</span> <span><a aria-hidden="true" href="#cb12-27"></a></span> <span><a aria-hidden="true" href="#cb12-28"></a> <span class="co"># To enable --toc, the parameter -s (standalone) needs to be set.</span></span> <span><a aria-hidden="true" href="#cb12-29"></a> <span class="co"># This parameter leads to the generation of an html header with</span></span> <span><a aria-hidden="true" href="#cb12-30"></a> <span class="co"># some meta tags.</span></span> <span><a aria-hidden="true" href="#cb12-31"></a></span> <span><a aria-hidden="true" href="#cb12-32"></a> <span class="co"># The TOC is created as &lt;nav id="TOC"&gt; tag,</span></span> <span><a aria-hidden="true" href="#cb12-33"></a></span> <span><a aria-hidden="true" href="#cb12-34"></a> <span class="co"># Own meta data lines need be injected and</span></span> <span><a aria-hidden="true" href="#cb12-35"></a> <span class="co"># the toc needs to be moved to the correct location if specified,</span></span> <span><a aria-hidden="true" href="#cb12-36"></a> <span class="co"># or removed, if specified.</span></span> <span><a aria-hidden="true" href="#cb12-37"></a> htmltext <span 
class="op">=</span> subprocess.run( [ <span class="st">"pandoc"</span>, <span class="st">"-s"</span>,</span> <span><a aria-hidden="true" href="#cb12-38"></a> <span class="co"># create table of contents</span></span> <span><a aria-hidden="true" href="#cb12-39"></a> <span class="st">"--toc"</span>, <span class="st">"--toc-depth=5"</span>,</span> <span><a aria-hidden="true" href="#cb12-40"></a> <span class="co"># use MathJax</span></span> <span><a aria-hidden="true" href="#cb12-41"></a> <span class="st">"--mathjax"</span> <span class="op">+</span></span> <span><a aria-hidden="true" href="#cb12-42"></a> <span class="st">"=./idee/website/MathJax/tex-chtml.js"</span>,</span> <span><a aria-hidden="true" href="#cb12-43"></a> <span class="co"># github flavor markdown</span></span> <span><a aria-hidden="true" href="#cb12-44"></a> <span class="st">"-f"</span>, <span class="st">"gfm"</span>,</span> <span><a aria-hidden="true" href="#cb12-45"></a> <span class="co"># html as output format</span></span> <span><a aria-hidden="true" href="#cb12-46"></a> <span class="st">"-t"</span>, <span class="st">"html"</span>,</span> <span><a aria-hidden="true" href="#cb12-47"></a> <span class="co"># input file as positional argument (-i would mean --incremental)</span></span> <span><a aria-hidden="true" href="#cb12-48"></a> mdpath</span> <span><a aria-hidden="true" href="#cb12-49"></a> <span class="co"># don't use stdout, return the result</span></span> <span><a aria-hidden="true" href="#cb12-50"></a> ],</span> <span><a aria-hidden="true" href="#cb12-51"></a> capture_output<span class="op">=</span><span class="va">True</span>,</span> <span><a aria-hidden="true" href="#cb12-52"></a> check<span class="op">=</span><span class="va">True</span>)</span></code></pre> </div> <p> The function first checks the existence of the symbolic link to the idee website project. If that link does not exist, the current directory has obviously not been prepared to serve as an authoring directory for an idee/concept article. 
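</p> <p> The work directory check and the derivation of the HTML file name can be tried in isolation. The following is a minimal sketch, not the code of display.py: the helper name <code> html_target </code> is hypothetical, and a plain directory named <code> idee </code> stands in for the symbolic link. </p>

```python
from pathlib import Path
import tempfile


def html_target(mdfile):
    """Return the HTML path next to the markdown file.

    Returns None if the folder has no 'idee' entry, i.e. if it was
    not prepared as a work directory. (Hypothetical helper.)
    """
    mdpath = Path(mdfile)
    workdir = mdpath.parent  # same as mdpath.parents[0]
    if not (workdir / "idee").exists():
        return None
    return (workdir / mdpath.stem).with_suffix(".html")


# Demo in a throw-away directory.
with tempfile.TemporaryDirectory() as tmp:
    md = Path(tmp) / "vim-plugin-for-web-publishing.md"
    md.write_text("# demo\n", encoding="utf-8")

    unprepared = html_target(md)   # no 'idee' entry yet

    (Path(tmp) / "idee").mkdir()   # stand-in for the symbolic link
    prepared_name = html_target(md).name
```

<p> For file names without additional dots the result equals <code> mdpath.with_suffix(".html") </code> . A stem that itself contains a dot, such as <code> notes.v2 </code> , would lose the part after the dot, which is one more reason to keep to simple file names.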
</p> <!-- page-break --> <p> Pandoc is called to convert the markdown, read as GitHub flavored markdown ( <code> gfm </code> ), to HTML and is told to also process TeX math such as: </p> <p> From "The Cosmological Constant as Event Horizon" <sup> ( 1 ) </sup> </p> <p> <span class="math display"> \[\begin{equation} \nabla_{\mu}g^{\mu} = \frac{d\mathsf{\Theta}}{d\tau} + \frac{1}{3}\mathsf{\Theta}^{2} = R_{\mu\nu}u^{\mu}u^{\nu} = \mathsf{\Lambda} - 4\pi G(\rho + 3p) \end{equation}\] </span> </p> <p> The HTML comments in the markdown are preserved during conversion to HTML. With this meta information and with the target HTML structure in mind, the returned HTML is processed further. </p> <p> <strong> display.py - display() - part 2 </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb13-1"></a> html_doc <span class="op">=</span> htmltext.stdout.decode(<span class="st">"utf-8"</span>)</span> <span><a aria-hidden="true" href="#cb13-2"></a></span> <span><a aria-hidden="true" href="#cb13-3"></a> <span class="co"># do some things the soup does not want to do</span></span> <span><a aria-hidden="true" href="#cb13-4"></a> html_doc <span class="op">=</span> html_doc.replace(<span class="st">"&lt;body&gt;"</span>,</span> <span><a aria-hidden="true" href="#cb13-5"></a> <span class="st">"&lt;body&gt;&lt;main&gt;&lt;article&gt;&lt;header&gt;&lt;/header&gt;"</span>)</span> <span><a aria-hidden="true" href="#cb13-6"></a> html_doc <span class="op">=</span> html_doc.replace(<span class="st">"&lt;/body&gt;"</span>, <span class="st">"&lt;/article&gt;&lt;/main&gt;&lt;/body&gt;"</span>)</span> <span><a aria-hidden="true" href="#cb13-7"></a></span> <span><a aria-hidden="true" href="#cb13-8"></a> builder <span class="op">=</span> HTMLParserTreeBuilder()</span> <span><a aria-hidden="true" href="#cb13-9"></a> soup <span class="op">=</span> BeautifulSoup(html_doc, builder<span class="op">=</span>builder)</span> <span><a aria-hidden="true" 
href="#cb13-10"></a></span> <span><a aria-hidden="true" href="#cb13-11"></a> <span class="co"># remove script added by pandoc</span></span> <span><a aria-hidden="true" href="#cb13-12"></a> scripts <span class="op">=</span> soup.find_all(<span class="st">"script"</span>)</span> <span><a aria-hidden="true" href="#cb13-13"></a> <span class="cf">for</span> script <span class="kw">in</span> scripts:</span> <span><a aria-hidden="true" href="#cb13-14"></a> script.decompose()</span> <span><a aria-hidden="true" href="#cb13-15"></a></span> <span><a aria-hidden="true" href="#cb13-16"></a> style(soup)</span> <span><a aria-hidden="true" href="#cb13-17"></a> urnmeta(soup, mdpath.stem)</span> <span><a aria-hidden="true" href="#cb13-18"></a> images(soup, workdirpath)</span> <span><a aria-hidden="true" href="#cb13-19"></a> language(soup)</span> <span><a aria-hidden="true" href="#cb13-20"></a> title(soup)</span> <span><a aria-hidden="true" href="#cb13-21"></a> dates(soup, workdirpath, mdpath.stem)</span> <span><a aria-hidden="true" href="#cb13-22"></a> tables(soup)</span> <span><a aria-hidden="true" href="#cb13-23"></a> article_author(soup)</span> <span><a aria-hidden="true" href="#cb13-24"></a> article_qrcode(soup, workdirpath, mdpath.stem)</span> <span><a aria-hidden="true" href="#cb13-25"></a> article_licence(soup)</span> <span><a aria-hidden="true" href="#cb13-26"></a> article_audio(soup, workdirpath, mdpath.stem)</span> <span><a aria-hidden="true" href="#cb13-27"></a> movetoc(soup)</span> <span><a aria-hidden="true" href="#cb13-28"></a> footnotes(soup)</span> <span><a aria-hidden="true" href="#cb13-29"></a> <span class="co"># keep this as last step</span></span> <span><a aria-hidden="true" href="#cb13-30"></a> article_pdf(soup, workdirpath, mdpath.stem, htmlpath)</span> <span><a aria-hidden="true" href="#cb13-31"></a></span> <span><a aria-hidden="true" href="#cb13-32"></a> html_doc <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" 
href="#cb13-33"></a></span> <span><a aria-hidden="true" href="#cb13-34"></a> <span class="cf">with</span> <span class="bu">open</span>(htmlpath, <span class="st">'w'</span>, encoding<span class="op">=</span><span class="st">'utf-8'</span>) <span class="im">as</span> outfile:</span> <span><a aria-hidden="true" href="#cb13-35"></a> <span class="bu">print</span>(html_doc, <span class="bu">file</span><span class="op">=</span>outfile)</span> <span><a aria-hidden="true" href="#cb13-36"></a> outfile.flush()</span> <span><a aria-hidden="true" href="#cb13-37"></a> outfile.close()</span> <span><a aria-hidden="true" href="#cb13-38"></a></span> <span><a aria-hidden="true" href="#cb13-39"></a> <span class="bu">print</span>(<span class="ss">f'wrote file </span><span class="sc">{</span>htmlpath<span class="sc">}</span><span class="ss">'</span>)</span> <span><a aria-hidden="true" href="#cb13-40"></a></span> <span><a aria-hidden="true" href="#cb13-41"></a> subprocess.run([<span class="st">"firefox"</span>, htmlpath], capture_output<span class="op">=</span><span class="va">False</span>, check<span class="op">=</span><span class="va">True</span>)</span></code></pre> </div> <p> That's it. First I sneak in the target HTML structure: a main tag containing the article tag, which in turn contains a header as its first element to hold some header information about the article. </p> <p> Afterwards I create a BeautifulSoup and remove the scripts placed by Pandoc, since I want the HTML to be free of scripts unless they are absolutely required. </p> <p> Then a number of functions run on that soup to add and modify details based on the meta data and the assets found. We will look at them in detail next. </p> <!-- page-break --> <h3> Function style() </h3> <p> The function style() adds everything to the soup that I consider to be part of the portal's style. 
</p> <p> <strong> display.py - style() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb14-1"></a><span class="kw">def</span> style(soup):</span> <span><a aria-hidden="true" href="#cb14-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb14-3"></a><span class="co"> Show some personal style.</span></span> <span><a aria-hidden="true" href="#cb14-4"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb14-5"></a> <span class="co"># inject the generator meta information.</span></span> <span><a aria-hidden="true" href="#cb14-6"></a> <span class="co"># one exists already</span></span> <span><a aria-hidden="true" href="#cb14-7"></a> tag <span class="op">=</span> soup.find(<span class="st">"meta"</span>, attrs<span class="op">=</span>{<span class="st">"name"</span>: <span class="st">"generator"</span>})</span> <span><a aria-hidden="true" href="#cb14-8"></a> tag.attrs.update({<span class="st">"name"</span>: <span class="st">"generator"</span>, <span class="st">"content"</span>: <span class="st">"vim, pandoc, idee"</span>})</span> <span><a aria-hidden="true" href="#cb14-9"></a></span> <span><a aria-hidden="true" href="#cb14-10"></a> head <span class="op">=</span> soup.find(<span class="st">"head"</span>)</span> <span><a aria-hidden="true" href="#cb14-11"></a> <span class="co"># inject ghost message from Terry Pratchett</span></span> <span><a aria-hidden="true" href="#cb14-12"></a> <span class="co"># http://www.gnuterrypratchett.com/</span></span> <span><a aria-hidden="true" href="#cb14-13"></a> meta <span class="op">=</span> soup.new_tag(<span class="st">"meta"</span>)</span> <span><a aria-hidden="true" href="#cb14-14"></a> meta.attrs.update({<span class="st">"http-equiv"</span>: <span class="st">"X-Clacks-Overhead"</span>,</span> <span><a aria-hidden="true" href="#cb14-15"></a> <span class="st">"content"</span>: <span class="st">"GNU 
Terry Pratchett"</span>})</span> <span><a aria-hidden="true" href="#cb14-16"></a> head.insert(<span class="dv">6</span>, meta)</span> <span><a aria-hidden="true" href="#cb14-17"></a></span> <span><a aria-hidden="true" href="#cb14-18"></a> <span class="co"># pandoc creates a style tag if the --mathml option is set</span></span> <span><a aria-hidden="true" href="#cb14-19"></a> tag <span class="op">=</span> soup.find(<span class="st">"style"</span>)</span> <span><a aria-hidden="true" href="#cb14-20"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb14-21"></a> tag.decompose()</span> <span><a aria-hidden="true" href="#cb14-22"></a> <span class="co"># my own style</span></span> <span><a aria-hidden="true" href="#cb14-23"></a> <span class="co"># inject stylesheet link &lt;link rel="stylesheet"</span></span> <span><a aria-hidden="true" href="#cb14-24"></a> <span class="co"># href="../website/css/fs.css"/&gt;</span></span> <span><a aria-hidden="true" href="#cb14-25"></a> link <span class="op">=</span> soup.new_tag(<span class="st">"link"</span>)</span> <span><a aria-hidden="true" href="#cb14-26"></a> link.attrs.update({<span class="st">"rel"</span>: <span class="st">"stylesheet"</span>, <span class="st">"href"</span>: <span class="st">"./idee/website/css/fs.css"</span>})</span> <span><a aria-hidden="true" href="#cb14-27"></a> head.insert(<span class="dv">7</span>, link)</span> <span><a aria-hidden="true" href="#cb14-28"></a></span> <span><a aria-hidden="true" href="#cb14-29"></a> <span class="co"># pandoc creates &lt;div class="note"&gt;&lt;div class="title"&gt;...&lt;/div&gt;&lt;/div&gt;</span></span> <span><a aria-hidden="true" href="#cb14-30"></a> <span class="co"># for &gt; [!note] and similar for &gt; [!important]. 
The titles have to go.</span></span> <span><a aria-hidden="true" href="#cb14-31"></a> tags <span class="op">=</span> soup.find_all(<span class="st">"div"</span>, class_<span class="op">=</span><span class="st">"title"</span>)</span> <span><a aria-hidden="true" href="#cb14-32"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb14-33"></a> tag.decompose()</span> <span><a aria-hidden="true" href="#cb14-34"></a></span> <span><a aria-hidden="true" href="#cb14-35"></a> <span class="co"># favicon</span></span> <span><a aria-hidden="true" href="#cb14-36"></a> <span class="co"># href="../image/favicon.ico" rel="icon" type="image/x-icon"</span></span> <span><a aria-hidden="true" href="#cb14-37"></a> link <span class="op">=</span> soup.new_tag(<span class="st">"link"</span>)</span> <span><a aria-hidden="true" href="#cb14-38"></a> link.attrs.update({<span class="st">"rel"</span>: <span class="st">"icon"</span>, <span class="st">"href"</span>: <span class="st">"./idee/website/image/favicon.ico"</span>,</span> <span><a aria-hidden="true" href="#cb14-39"></a> <span class="st">"type"</span>: <span class="st">"image/x-icon"</span></span> <span><a aria-hidden="true" href="#cb14-40"></a> })</span> <span><a aria-hidden="true" href="#cb14-41"></a> head.insert(<span class="dv">8</span>, link)</span> <span><a aria-hidden="true" href="#cb14-42"></a></span> <span><a aria-hidden="true" href="#cb14-43"></a> math <span class="op">=</span> soup.find(<span class="st">"span"</span>, class_<span class="op">=</span><span class="st">"math"</span>)</span> <span><a aria-hidden="true" href="#cb14-44"></a> <span class="cf">if</span> math:</span> <span><a aria-hidden="true" href="#cb14-45"></a> <span class="co"># behind title tag, before MathJax loading</span></span> <span><a aria-hidden="true" href="#cb14-46"></a> script <span class="op">=</span> soup.new_tag(<span class="st">"script"</span>)</span> <span><a aria-hidden="true" 
href="#cb14-47"></a> script.attrs.update({<span class="st">"type"</span>: <span class="st">"text/javascript"</span>})</span> <span><a aria-hidden="true" href="#cb14-48"></a> script.append(<span class="st">"window.MathJax={tex: {tags: 'span'</span><span class="sc">}}</span><span class="st">;"</span>) <span class="co"># instead of all</span></span> <span><a aria-hidden="true" href="#cb14-49"></a> head.insert(<span class="dv">9</span>, script)</span> <span><a aria-hidden="true" href="#cb14-50"></a></span> <span><a aria-hidden="true" href="#cb14-51"></a> script <span class="op">=</span> soup.new_tag(<span class="st">"script"</span>)</span> <span><a aria-hidden="true" href="#cb14-52"></a> script.attrs.update({<span class="st">"type"</span>: <span class="st">"text/javascript"</span>, </span> <span><a aria-hidden="true" href="#cb14-53"></a> <span class="st">"src"</span>: <span class="st">"./idee/website/MathJax/tex-chtml.js"</span>})</span> <span><a aria-hidden="true" href="#cb14-54"></a> head.insert(<span class="dv">9</span>, script)</span></code></pre> </div> <p> Part of the style is honoring the software used to create the web page in the generator meta data. I used vim, Pandoc and now my own software, which is simply named 'idee' like my web site and my vim plugin. </p> <p> Terry Pratchett is honored with X-Clacks-Overhead <sup> ( 2 ) </sup> . The meta tag by itself has little effect, but my web server also honors him in the HTTP header, which makes the X-Clacks browser extension signal his presence in the overhead. </p> <p> No less important are the link to the CSS stylesheet, the web site's favicon and, only if span tags of class 'math' are found, the scripts for MathJax. </p> <h3> Function urnmeta() </h3> <p> In my previous implementation the title used in mediawiki became the file name of the mediawiki file, and I had to compute a URN, a unique resource name, to be used as the HTML file name. 
</p> <p> That meant very long file names and, well, the mediawiki file names even had spaces. That mess is over now, at least for new articles. </p> <p> <strong> display.py - urnmeta() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb15-1"></a><span class="kw">def</span> urnmeta(soup, urn):</span> <span><a aria-hidden="true" href="#cb15-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb15-3"></a><span class="co"> The name of the markdown file becomes urn for the web.</span></span> <span><a aria-hidden="true" href="#cb15-4"></a><span class="co"> Filenames are all lowercase with '-' instead of spaces,</span></span> <span><a aria-hidden="true" href="#cb15-5"></a><span class="co"> and roman letters only (no german umlaute).</span></span> <span><a aria-hidden="true" href="#cb15-6"></a></span> <span><a aria-hidden="true" href="#cb15-7"></a><span class="co"> I got tired of mixed case filenames with spaces anyhow during</span></span> <span><a aria-hidden="true" href="#cb15-8"></a><span class="co"> my use of my mediawiki extract tool.</span></span> <span><a aria-hidden="true" href="#cb15-9"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb15-10"></a> <span class="co"># meta data</span></span> <span><a aria-hidden="true" href="#cb15-11"></a> head <span class="op">=</span> soup.find(<span class="st">"head"</span>)</span> <span><a aria-hidden="true" href="#cb15-12"></a> meta <span class="op">=</span> soup.new_tag(<span class="st">"meta"</span>)</span> <span><a aria-hidden="true" href="#cb15-13"></a> meta.attrs.update({</span> <span><a aria-hidden="true" href="#cb15-14"></a> <span class="st">"property"</span>: <span class="st">"article:urn"</span>,</span> <span><a aria-hidden="true" href="#cb15-15"></a> <span class="st">"content"</span>: urn})</span> <span><a aria-hidden="true" href="#cb15-16"></a> head.insert(<span 
class="dv">6</span>, meta)</span></code></pre> </div> <p> <code> mdpath.stem </code> is passed as the URN parameter, which forces me to adhere to my self-imposed rules whenever I name a markdown file. </p> <p> Yes, I should probably add a check for that, and probably I will. And probably I should retire that meta data entry, since the information exists first in the file name and later in the URL. </p> <!-- page-break --> <h3> Function images() </h3> <p> This function looks at all img tags and, if an image is given without a relative path, checks which of the two possible image folders contains it. </p> <p> That means I can use the bare image file name in the markdown, without any path. </p> <p> Images used in the header, especially the qrcode image, are in different folders. Therefore this function has to be called early and must not be moved to later process stages. </p> <p> <strong> display.py - images() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb16-1"></a><span class="kw">def</span> images(soup, workpath):</span> <span><a aria-hidden="true" href="#cb16-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb16-3"></a><span class="co"> Run early, before we place pictures in the header.</span></span> <span><a aria-hidden="true" href="#cb16-4"></a></span> <span><a aria-hidden="true" href="#cb16-5"></a><span class="co"> Inspect image source path. 
Keep relative paths alone,</span></span> <span><a aria-hidden="true" href="#cb16-6"></a><span class="co"> lookup image location if only the image filename is given.</span></span> <span><a aria-hidden="true" href="#cb16-7"></a></span> <span><a aria-hidden="true" href="#cb16-8"></a><span class="co"> PARAMETER</span></span> <span><a aria-hidden="true" href="#cb16-9"></a><span class="co"> soup:</span></span> <span><a aria-hidden="true" href="#cb16-10"></a><span class="co"> The html soup</span></span> <span><a aria-hidden="true" href="#cb16-11"></a><span class="co"> workpath: </span></span> <span><a aria-hidden="true" href="#cb16-12"></a><span class="co"> The location of the md-file</span></span> <span><a aria-hidden="true" href="#cb16-13"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb16-14"></a> imgs <span class="op">=</span> soup.find_all(<span class="st">"img"</span>)</span> <span><a aria-hidden="true" href="#cb16-15"></a> <span class="cf">for</span> image <span class="kw">in</span> imgs:</span> <span><a aria-hidden="true" href="#cb16-16"></a> src <span class="op">=</span> image.attrs.get(<span class="st">"src"</span>)</span> <span><a aria-hidden="true" href="#cb16-17"></a> <span class="cf">if</span> src <span class="kw">and</span> <span class="st">"./"</span> <span class="kw">not</span> <span class="kw">in</span> src:</span> <span><a aria-hidden="true" href="#cb16-18"></a> imagepath <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb16-19"></a> imgp <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb16-20"></a> <span class="cf">for</span> imgp <span class="kw">in</span> [<span class="st">"image"</span>, <span class="st">"idee/website/image"</span>]:</span> <span><a aria-hidden="true" href="#cb16-21"></a> imgp <span class="op">=</span> workpath <span class="op">/</span> imgp <span class="op">/</span> src</span> <span><a aria-hidden="true" 
href="#cb16-22"></a> <span class="cf">if</span> imgp.exists():</span> <span><a aria-hidden="true" href="#cb16-23"></a> imagepath <span class="op">=</span> imgp</span> <span><a aria-hidden="true" href="#cb16-24"></a> <span class="cf">break</span></span> <span><a aria-hidden="true" href="#cb16-25"></a></span> <span><a aria-hidden="true" href="#cb16-26"></a> <span class="cf">if</span> imagepath:</span> <span><a aria-hidden="true" href="#cb16-27"></a> src <span class="op">=</span> <span class="ss">f"./</span><span class="sc">{</span>os<span class="sc">.</span>path<span class="sc">.</span>relpath(imagepath, workpath)<span class="sc">}</span><span class="ss">"</span></span> <span><a aria-hidden="true" href="#cb16-28"></a> image.attrs.update({<span class="st">"src"</span>: src})</span> <span><a aria-hidden="true" href="#cb16-29"></a></span> <span><a aria-hidden="true" href="#cb16-30"></a> <span class="cf">if</span> src:</span> <span><a aria-hidden="true" href="#cb16-31"></a> <span class="co"># make image anchors with target _blank to open in separate tab</span></span> <span><a aria-hidden="true" href="#cb16-32"></a> anchor <span class="op">=</span> soup.new_tag(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb16-33"></a> anchor.attrs.update({<span class="st">"href"</span>: src, <span class="st">"target"</span>: <span class="st">"_blank"</span>})</span> <span><a aria-hidden="true" href="#cb16-34"></a> image.insert_after(anchor)</span> <span><a aria-hidden="true" href="#cb16-35"></a> anchor.append(image)</span></code></pre> </div> <h3> Function iscomment() </h3> <p> Time to introduce a very small helper function to find all comments we placed in the markdown file. 
</p> <p> <strong> display.py - iscomment() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb17-1"></a><span class="kw">def</span> iscomment(elem):</span> <span><a aria-hidden="true" href="#cb17-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb17-3"></a><span class="co"> Helper function to search for comments</span></span> <span><a aria-hidden="true" href="#cb17-4"></a></span> <span><a aria-hidden="true" href="#cb17-5"></a><span class="co"> source:</span></span> <span><a aria-hidden="true" href="#cb17-6"></a><span class="co"> https://www.tutorialspoint.com/beautiful_soup/beautiful_soup_find_all_comments.htm</span></span> <span><a aria-hidden="true" href="#cb17-7"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb17-8"></a> <span class="cf">return</span> <span class="bu">isinstance</span>(elem, Comment)</span></code></pre> </div> <h3> Function language() </h3> <p> The language function handles all language-specific aspects: the <code> lang </code> and <code> xml:lang </code> attributes on the HTML tag, the language meta tag, and the HTML comment that tells the web server whether to include the idee or the concept portal header via SSI. </p> <p> If no language information is provided in the markdown file, the language defaults to German. 
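</p> <p> The decision logic can be sketched without BeautifulSoup. The following is a simplified illustration, not the code from display.py: the names <code> detect_language </code> and <code> portal_for </code> are hypothetical, the comment texts are assumed to arrive as plain strings, and the pattern only covers tags of the form <code> xx-XX </code> . </p>

```python
import re

# Assumption: language comments look like "en-US" or "de-DE".
LANGUAGE_TAG = re.compile(r"[a-z]{2}-[A-Z]{2}")


def detect_language(comment_texts, default="de-DE"):
    """Return the first comment that is exactly a language tag, else the default."""
    for text in comment_texts:
        candidate = text.strip()
        if LANGUAGE_TAG.fullmatch(candidate):
            return candidate
    return default


def portal_for(lang):
    """German articles belong to the idee portal, all others to concept."""
    return "idee" if lang.startswith("de") else "concept"


if __name__ == "__main__":
    lang = detect_language([" pdf ", " en-US "])
    print(lang, portal_for(lang))
```

<p> Matching the whole stripped comment keeps an empty or unrelated comment from being mistaken for a language tag. 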
</p> <p> <strong> display.py - language() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb18-1"></a><span class="kw">def</span> language(soup):</span> <span><a aria-hidden="true" href="#cb18-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb18-3"></a><span class="co"> Check for &lt;!-- en-US --&gt; language information comment.</span></span> <span><a aria-hidden="true" href="#cb18-4"></a><span class="co"> If not, assume de-DE.</span></span> <span><a aria-hidden="true" href="#cb18-5"></a></span> <span><a aria-hidden="true" href="#cb18-6"></a><span class="co"> Populate the language attributes at the html tag and insert</span></span> <span><a aria-hidden="true" href="#cb18-7"></a><span class="co"> the include comment for the portal.</span></span> <span><a aria-hidden="true" href="#cb18-8"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb18-9"></a></span> <span><a aria-hidden="true" href="#cb18-10"></a> <span class="co"># look for a language comment</span></span> <span><a aria-hidden="true" href="#cb18-11"></a> comments <span class="op">=</span> soup.find_all(string<span class="op">=</span>iscomment)</span> <span><a aria-hidden="true" href="#cb18-12"></a> c <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb18-13"></a> <span class="cf">for</span> comment <span class="kw">in</span> comments:</span> <span><a aria-hidden="true" href="#cb18-14"></a> <span class="cf">if</span> comment <span class="kw">in</span> <span class="st">' en-US '</span>:</span> <span><a aria-hidden="true" href="#cb18-15"></a> c <span class="op">=</span> comment</span> <span><a aria-hidden="true" href="#cb18-16"></a> <span class="cf">break</span></span> <span><a aria-hidden="true" href="#cb18-17"></a> <span class="cf">if</span> c:</span> <span><a aria-hidden="true" href="#cb18-18"></a> lang <span 
class="op">=</span> <span class="bu">str</span>(c).strip()</span> <span><a aria-hidden="true" href="#cb18-19"></a> c.decompose()</span> <span><a aria-hidden="true" href="#cb18-20"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb18-21"></a> lang <span class="op">=</span> <span class="st">"de-DE"</span></span> <span><a aria-hidden="true" href="#cb18-22"></a></span> <span><a aria-hidden="true" href="#cb18-23"></a> <span class="co"># SSI header injection is a function of the language</span></span> <span><a aria-hidden="true" href="#cb18-24"></a> body <span class="op">=</span> soup.find(<span class="st">"body"</span>)</span> <span><a aria-hidden="true" href="#cb18-25"></a> <span class="cf">if</span> lang.startswith(<span class="st">"de"</span>):</span> <span><a aria-hidden="true" href="#cb18-26"></a> c <span class="op">=</span> Comment(<span class="st">'# include file="/portal/idee-header.html" '</span>)</span> <span><a aria-hidden="true" href="#cb18-27"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb18-28"></a> c <span class="op">=</span> Comment(<span class="st">'# include file="/portal/concept-header.html" '</span>)</span> <span><a aria-hidden="true" href="#cb18-29"></a></span> <span><a aria-hidden="true" href="#cb18-30"></a> body.insert(<span class="dv">0</span>, c)</span> <span><a aria-hidden="true" href="#cb18-31"></a></span> <span><a aria-hidden="true" href="#cb18-32"></a> <span class="co"># inject language information</span></span> <span><a aria-hidden="true" href="#cb18-33"></a> html <span class="op">=</span> soup.find(<span class="st">"html"</span>)</span> <span><a aria-hidden="true" href="#cb18-34"></a> html.attrs.update({<span class="st">"lang"</span>: lang})</span> <span><a aria-hidden="true" href="#cb18-35"></a> html.attrs.update({<span class="st">"xml:lang"</span>: lang})</span> <span><a aria-hidden="true" href="#cb18-36"></a></span> <span><a aria-hidden="true" href="#cb18-37"></a> <span 
class="co"># and 2 lines meta information</span></span> <span><a aria-hidden="true" href="#cb18-38"></a> <span class="co"># &lt;meta content="de-DE" property="og:locale"/&gt;</span></span> <span><a aria-hidden="true" href="#cb18-39"></a> <span class="co"># &lt;meta content="Idee" property="og:site_name"/&gt;</span></span> <span><a aria-hidden="true" href="#cb18-40"></a> head <span class="op">=</span> soup.find(<span class="st">"head"</span>)</span> <span><a aria-hidden="true" href="#cb18-41"></a></span> <span><a aria-hidden="true" href="#cb18-42"></a> meta <span class="op">=</span> soup.new_tag(<span class="st">"meta"</span>)</span> <span><a aria-hidden="true" href="#cb18-43"></a> meta.attrs.update({</span> <span><a aria-hidden="true" href="#cb18-44"></a> <span class="st">"property"</span>: <span class="st">"og:locale"</span>,</span> <span><a aria-hidden="true" href="#cb18-45"></a> <span class="st">"content"</span>: lang})</span> <span><a aria-hidden="true" href="#cb18-46"></a> head.insert(<span class="dv">6</span>, meta)</span> <span><a aria-hidden="true" href="#cb18-47"></a></span> <span><a aria-hidden="true" href="#cb18-48"></a> meta <span class="op">=</span> soup.new_tag(<span class="st">"meta"</span>)</span> <span><a aria-hidden="true" href="#cb18-49"></a> meta.attrs.update({</span> <span><a aria-hidden="true" href="#cb18-50"></a> <span class="st">"property"</span>: <span class="st">"og:site_name"</span>,</span> <span><a aria-hidden="true" href="#cb18-51"></a> <span class="st">"content"</span>: <span class="st">"idee"</span> <span class="cf">if</span> lang.startswith(<span class="st">"de"</span>) <span class="cf">else</span> <span class="st">"concept"</span>})</span> <span><a aria-hidden="true" href="#cb18-52"></a> head.insert(<span class="dv">6</span>, meta)</span></code></pre> </div> <!-- page-break --> <h3> Function title() </h3> <p> The title is also inserted in multiple locations, in a title tag in the header and in a meta tag in the HTML head. 
</p> <p> <strong> display.py - title() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb19-1"></a><span class="kw">def</span> title(soup):</span> <span><a aria-hidden="true" href="#cb19-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb19-3"></a><span class="co"> Move the title to the location we want it to have and fill also the</span></span> <span><a aria-hidden="true" href="#cb19-4"></a><span class="co"> corresponding meta data tag with the title information.</span></span> <span><a aria-hidden="true" href="#cb19-5"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb19-6"></a> h1 <span class="op">=</span> soup.find(<span class="st">"h1"</span>)</span> <span><a aria-hidden="true" href="#cb19-7"></a> t <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb19-8"></a> <span class="cf">if</span> h1:</span> <span><a aria-hidden="true" href="#cb19-9"></a> t <span class="op">=</span> h1.text</span> <span><a aria-hidden="true" href="#cb19-10"></a> h1.decompose()</span> <span><a aria-hidden="true" href="#cb19-11"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb19-12"></a> t <span class="op">=</span> <span class="st">"No title found"</span></span> <span><a aria-hidden="true" href="#cb19-13"></a></span> <span><a aria-hidden="true" href="#cb19-14"></a> <span class="co"># target header structure: check div exists, otherwise create</span></span> <span><a aria-hidden="true" href="#cb19-15"></a> <span class="co"># &lt;header&gt;&lt;h1&gt;&lt;/h1&gt;&lt;div&gt;&lt;time&gt;&lt;/time&gt;&lt;address&gt;&lt;/address&gt;&lt;/div&gt;&lt;/header&gt;</span></span> <span><a aria-hidden="true" href="#cb19-16"></a> header <span class="op">=</span> soup.find(<span class="st">"header"</span>)</span> <span><a aria-hidden="true" href="#cb19-17"></a> h1 <span 
class="op">=</span> soup.new_tag(<span class="st">"h1"</span>)</span> <span><a aria-hidden="true" href="#cb19-18"></a> h1.string <span class="op">=</span> t</span> <span><a aria-hidden="true" href="#cb19-19"></a> header.insert(<span class="dv">0</span>, h1)</span> <span><a aria-hidden="true" href="#cb19-20"></a></span> <span><a aria-hidden="true" href="#cb19-21"></a> <span class="co"># meta data</span></span> <span><a aria-hidden="true" href="#cb19-22"></a> head <span class="op">=</span> soup.find(<span class="st">"head"</span>)</span> <span><a aria-hidden="true" href="#cb19-23"></a> meta <span class="op">=</span> soup.new_tag(<span class="st">"meta"</span>)</span> <span><a aria-hidden="true" href="#cb19-24"></a> meta.attrs.update({</span> <span><a aria-hidden="true" href="#cb19-25"></a> <span class="st">"property"</span>: <span class="st">"og:title"</span>,</span> <span><a aria-hidden="true" href="#cb19-26"></a> <span class="st">"content"</span>: t})</span> <span><a aria-hidden="true" href="#cb19-27"></a> head.insert(<span class="dv">6</span>, meta)</span> <span><a aria-hidden="true" href="#cb19-28"></a></span> <span><a aria-hidden="true" href="#cb19-29"></a> titletag <span class="op">=</span> soup.find(<span class="st">"title"</span>)</span> <span><a aria-hidden="true" href="#cb19-30"></a> titletag.string <span class="op">=</span> t</span></code></pre> </div> <p> Since Pandoc already created an <code> h1 </code> tag, it needs just to be moved into the desired location. </p> <!-- page-break --> <h3> Function dates() </h3> <p> Only two dates are saved in the HTML, the published date, which is put into the meta data and visibly into the article header, and the last modified date. </p> <p> As I canceled my central meta data storage, I have to look whether an older version of the article exists and read the correct publishing date from that article. 
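</p> <p> Reading the publishing date back from an already published page can be sketched with the standard library alone. This is an illustration under assumptions, not the code from display.py: the class <code> PubdateFinder </code> and the function <code> published_time </code> are hypothetical names, and the published page is assumed to carry a <code> time </code> tag with a <code> pubdate </code> attribute, as the article header does. </p>

```python
from datetime import datetime
from html.parser import HTMLParser


class PubdateFinder(HTMLParser):
    """Remember the datetime attribute of the first time tag that has pubdate."""

    def __init__(self):
        super().__init__()
        self.pubdate = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "time" and "pubdate" in attributes and self.pubdate is None:
            self.pubdate = attributes.get("datetime")


def published_time(published_html, now=None):
    """Return the stored publishing time, or the current time for a new article."""
    finder = PubdateFinder()
    if published_html:
        finder.feed(published_html)
    stamp = finder.pubdate or (now or datetime.now()).isoformat()
    return stamp[:19]  # keep date and time, drop fractional seconds
```

<p> Cutting the ISO string to 19 characters means that a stored timestamp and a freshly generated one end up in the same format. 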
</p> <p> <strong> display.py - dates() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb20-1"></a><span class="kw">def</span> dates(soup, workpath, urn):</span> <span><a aria-hidden="true" href="#cb20-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb20-3"></a><span class="co"> Create the meta and also the visual date tags.</span></span> <span><a aria-hidden="true" href="#cb20-4"></a><span class="co"> - article:modified_time: current local time</span></span> <span><a aria-hidden="true" href="#cb20-5"></a><span class="co"> - article_published_time: current local time if article is published first</span></span> <span><a aria-hidden="true" href="#cb20-6"></a><span class="co"> time, otherwise the time is taken from the published article.</span></span> <span><a aria-hidden="true" href="#cb20-7"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb20-8"></a> head <span class="op">=</span> soup.find(<span class="st">"head"</span>)</span> <span><a aria-hidden="true" href="#cb20-9"></a> d <span class="op">=</span> datetime.now().isoformat()</span> <span><a aria-hidden="true" href="#cb20-10"></a></span> <span><a aria-hidden="true" href="#cb20-11"></a> meta <span class="op">=</span> soup.new_tag(<span class="st">"meta"</span>)</span> <span><a aria-hidden="true" href="#cb20-12"></a> meta.attrs.update({</span> <span><a aria-hidden="true" href="#cb20-13"></a> <span class="st">"property"</span>: <span class="st">"article:modified_time"</span>,</span> <span><a aria-hidden="true" href="#cb20-14"></a> <span class="st">"content"</span>: d[:<span class="dv">19</span>]})</span> <span><a aria-hidden="true" href="#cb20-15"></a> head.insert(<span class="dv">6</span>, meta)</span> <span><a aria-hidden="true" href="#cb20-16"></a></span> <span><a aria-hidden="true" href="#cb20-17"></a> htmlpath <span class="op">=</span> workpath <span 
class="op">/</span>Path(<span class="ss">f"idee/website/article/</span><span class="sc">{</span>urn<span class="sc">}</span><span class="ss">"</span>).with_suffix(<span class="st">".html"</span>)</span> <span><a aria-hidden="true" href="#cb20-18"></a> <span class="cf">if</span> htmlpath.exists():</span> <span><a aria-hidden="true" href="#cb20-19"></a> builder <span class="op">=</span> HTMLParserTreeBuilder()</span> <span><a aria-hidden="true" href="#cb20-20"></a> <span class="cf">with</span> <span class="bu">open</span>(htmlpath, <span class="st">'r'</span>, encoding<span class="op">=</span><span class="st">'utf-8'</span>) <span class="im">as</span> pf:</span> <span><a aria-hidden="true" href="#cb20-21"></a> published_html <span class="op">=</span> pf.read()</span> <span><a aria-hidden="true" href="#cb20-22"></a> pf.close()</span> <span><a aria-hidden="true" href="#cb20-23"></a> published_soup <span class="op">=</span> BeautifulSoup(published_html, builder<span class="op">=</span>builder)</span> <span><a aria-hidden="true" href="#cb20-24"></a> timetag <span class="op">=</span> published_soup.find(<span class="st">"time"</span>, attrs<span class="op">=</span>{<span class="st">"pubdate"</span>: <span class="st">"true"</span>})</span> <span><a aria-hidden="true" href="#cb20-25"></a> <span class="cf">if</span> timetag:</span> <span><a aria-hidden="true" href="#cb20-26"></a> d <span class="op">=</span> timetag[<span class="st">"datetime"</span>]</span> <span><a aria-hidden="true" href="#cb20-27"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb20-28"></a> <span class="bu">print</span>(<span class="st">"Warning: Published article has no pubdate."</span>)</span> <span><a aria-hidden="true" href="#cb20-29"></a></span> <span><a aria-hidden="true" href="#cb20-30"></a> meta <span class="op">=</span> soup.new_tag(<span class="st">"meta"</span>)</span> <span><a aria-hidden="true" href="#cb20-31"></a> meta.attrs.update({</span> <span><a 
aria-hidden="true" href="#cb20-32"></a> <span class="st">"property"</span>: <span class="st">"article:published_time"</span>,</span> <span><a aria-hidden="true" href="#cb20-33"></a> <span class="st">"content"</span>: d[:<span class="dv">19</span>]})</span> <span><a aria-hidden="true" href="#cb20-34"></a> head.insert(<span class="dv">6</span>, meta)</span> <span><a aria-hidden="true" href="#cb20-35"></a></span> <span><a aria-hidden="true" href="#cb20-36"></a> <span class="co"># target header structure: check div exists, otherwise create</span></span> <span><a aria-hidden="true" href="#cb20-37"></a> <span class="co"># &lt;header&gt;&lt;h1&gt;&lt;/h1&gt;&lt;div&gt;&lt;time&gt;&lt;/time&gt;&lt;address&gt;&lt;/address&gt;&lt;/div&gt;&lt;/header&gt;</span></span> <span><a aria-hidden="true" href="#cb20-38"></a></span> <span><a aria-hidden="true" href="#cb20-39"></a> header <span class="op">=</span> soup.find(<span class="st">"header"</span>)</span> <span><a aria-hidden="true" href="#cb20-40"></a> div <span class="op">=</span> header.find(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" href="#cb20-41"></a></span> <span><a aria-hidden="true" href="#cb20-42"></a> <span class="cf">if</span> <span class="kw">not</span> div:</span> <span><a aria-hidden="true" href="#cb20-43"></a> div <span class="op">=</span> soup.new_tag(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" href="#cb20-44"></a> header.append(div)</span> <span><a aria-hidden="true" href="#cb20-45"></a></span> <span><a aria-hidden="true" href="#cb20-46"></a> t <span class="op">=</span> soup.new_tag(<span class="st">"time"</span>)</span> <span><a aria-hidden="true" href="#cb20-47"></a> div.insert(<span class="dv">0</span>, t)</span> <span><a aria-hidden="true" href="#cb20-48"></a> t.string <span class="op">=</span> d[:<span class="dv">10</span>]</span> <span><a aria-hidden="true" href="#cb20-49"></a> t.attrs.update({<span class="st">"datetime"</span>: d[:<span 
class="dv">19</span>]})</span> <span><a aria-hidden="true" href="#cb20-50"></a> <span class="co"># probably deprecated by itemprop alternative</span></span> <span><a aria-hidden="true" href="#cb20-51"></a> t.attrs.update({<span class="st">"pubdate"</span>: <span class="st">"true"</span>})</span></code></pre> </div> <!-- page-break --> <h3> Function tables() </h3> <p> Enables control of the table column width via a <code> %dd% </code> entry in front of the respective column title. The number is always interpreted as a percentage. </p> <p> Enables row spans for individual cells by marking each cell that is to be joined with the cell above it with <code> ^ </code> . </p> <p> <strong> display.py - tables() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb21-1"></a><span class="kw">def</span> tables(soup):</span> <span><a aria-hidden="true" href="#cb21-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb21-3"></a><span class="co"> 1. Look in &lt;th&gt; tags for %dd% entries at the start of the text.</span></span> <span><a aria-hidden="true" href="#cb21-4"></a><span class="co"> Use dd for style: width:dd%</span></span> <span><a aria-hidden="true" href="#cb21-5"></a><span class="co"> 2. Look in &lt;td&gt; tags for lone ^ entries.</span></span> <span><a aria-hidden="true" href="#cb21-6"></a><span class="co"> Update the rowspan in the respective &lt;td&gt; tag of the preceding row.</span></span> <span><a aria-hidden="true" href="#cb21-7"></a><span class="co"> Dispose the ^ tag.</span></span> <span><a aria-hidden="true" href="#cb21-8"></a><span class="co"> 3. Look in &lt;td&gt; tags for lone &lt; entries.</span></span> <span><a aria-hidden="true" href="#cb21-9"></a><span class="co"> Update the columnspan in the preceding &lt;td&gt; tag.</span></span> <span><a aria-hidden="true" href="#cb21-10"></a><span class="co"> Dispose the &lt; tags. 
(implementation pending)</span></span> <span><a aria-hidden="true" href="#cb21-11"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb21-12"></a></span> <span><a aria-hidden="true" href="#cb21-13"></a> <span class="co"># 1</span></span> <span><a aria-hidden="true" href="#cb21-14"></a> tags <span class="op">=</span> soup.find_all(<span class="st">"th"</span>)</span> <span><a aria-hidden="true" href="#cb21-15"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb21-16"></a> text <span class="op">=</span> tag.string</span> <span><a aria-hidden="true" href="#cb21-17"></a> match <span class="op">=</span> re.match(<span class="vs">r'^%(\d</span><span class="sc">{2}</span><span class="vs">%)'</span>, text)</span> <span><a aria-hidden="true" href="#cb21-18"></a> width <span class="op">=</span> match.group(<span class="dv">1</span>) <span class="cf">if</span> match <span class="cf">else</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb21-19"></a> <span class="cf">if</span> width:</span> <span><a aria-hidden="true" href="#cb21-20"></a> s <span class="op">=</span> tag.get(<span class="st">"style"</span>)</span> <span><a aria-hidden="true" href="#cb21-21"></a> <span class="cf">if</span> s:</span> <span><a aria-hidden="true" href="#cb21-22"></a> s <span class="op">=</span> <span class="ss">f"</span><span class="sc">{</span>s<span class="sc">}</span><span class="ss"> width:</span><span class="sc">{</span>width<span class="sc">}</span><span class="ss">;"</span></span> <span><a aria-hidden="true" href="#cb21-23"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb21-24"></a> s <span class="op">=</span> <span class="ss">f"width:</span><span class="sc">{</span>width<span class="sc">}</span><span class="ss">"</span></span> <span><a aria-hidden="true" href="#cb21-25"></a> tag.attrs.update({<span class="st">"style"</span>: s})</span> <span><a 
aria-hidden="true" href="#cb21-26"></a> text <span class="op">=</span> text[<span class="dv">4</span>:].strip() <span class="co"># fits for one and two digits</span></span> <span><a aria-hidden="true" href="#cb21-27"></a> tag.string <span class="op">=</span> text</span> <span><a aria-hidden="true" href="#cb21-28"></a></span> <span><a aria-hidden="true" href="#cb21-29"></a> <span class="co"># 2</span></span> <span><a aria-hidden="true" href="#cb21-30"></a> <span class="co"># to prevent index becoming a moving target, iterate backwards</span></span> <span><a aria-hidden="true" href="#cb21-31"></a> <span class="cf">while</span> <span class="va">True</span>:</span> <span><a aria-hidden="true" href="#cb21-32"></a> tags <span class="op">=</span> soup.find_all(<span class="st">"td"</span>, string<span class="op">=</span><span class="st">'^'</span>)</span> <span><a aria-hidden="true" href="#cb21-33"></a> <span class="cf">if</span> <span class="kw">not</span> tags:</span> <span><a aria-hidden="true" href="#cb21-34"></a> <span class="cf">break</span></span> <span><a aria-hidden="true" href="#cb21-35"></a> tag <span class="op">=</span> tags[<span class="op">-</span><span class="dv">1</span>]</span> <span><a aria-hidden="true" href="#cb21-36"></a> <span class="cf">if</span> tag.parent: <span class="co"># we decompose tags as we go, therefore we have to check</span></span> <span><a aria-hidden="true" href="#cb21-37"></a> sibl <span class="op">=</span> tag.parent.find_all(<span class="st">"td"</span>)</span> <span><a aria-hidden="true" href="#cb21-38"></a> <span class="co">## obviously search for index does compare strings, not identity</span></span> <span><a aria-hidden="true" href="#cb21-39"></a> index <span class="op">=</span> <span class="bu">len</span>(sibl) <span class="op">-</span> <span class="dv">1</span> <span class="op">-</span> sibl[::<span class="op">-</span><span class="dv">1</span>].index(tag) <span class="co"># backward search</span></span> <span><a 
aria-hidden="true" href="#cb21-40"></a> pstag <span class="op">=</span> tag.parent.find_previous_sibling(<span class="st">"tr"</span>) <span class="co"># parent search tag</span></span> <span><a aria-hidden="true" href="#cb21-41"></a> rowspan <span class="op">=</span> <span class="dv">2</span></span> <span><a aria-hidden="true" href="#cb21-42"></a> tag.decompose()</span> <span><a aria-hidden="true" href="#cb21-43"></a> <span class="cf">while</span> pstag <span class="kw">and</span> pstag.find_all(<span class="st">"td"</span>)[index].string <span class="kw">in</span> <span class="st">'^'</span>:</span> <span><a aria-hidden="true" href="#cb21-44"></a> rowspan <span class="op">+=</span> <span class="dv">1</span></span> <span><a aria-hidden="true" href="#cb21-45"></a> pstag.find_all(<span class="st">"td"</span>)[index].decompose()</span> <span><a aria-hidden="true" href="#cb21-46"></a> pstag <span class="op">=</span> pstag.find_previous_sibling(<span class="st">"tr"</span>)</span> <span><a aria-hidden="true" href="#cb21-47"></a> <span class="cf">if</span> pstag:</span> <span><a aria-hidden="true" href="#cb21-48"></a> pstag.find_all(<span class="st">"td"</span>)[index].attrs.update({<span class="st">"rowspan"</span>: <span class="ss">f"</span><span class="sc">{</span>rowspan<span class="sc">}</span><span class="ss">"</span>})</span></code></pre> </div> <!-- page-break --> <h3> Function article_author() </h3> <p> The author of the article is written into the meta data and visibly into the article header. </p> <p> If a comment exists, naming an author, this information is used. Otherwise my name is assumed to be correct. 
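</p> <p> How the author could be taken from such a comment can be sketched on plain strings. This is an illustration, not the code from display.py: the function name <code> author_from_comments </code> is hypothetical, and the comment format <code> author: Name </code> is an assumption. </p>

```python
DEFAULT_AUTHOR = "Frank Siebert"


def author_from_comments(comment_texts, default=DEFAULT_AUTHOR):
    """Return the name from the first 'author: Name' comment, else the default."""
    for text in comment_texts:
        key, separator, value = text.partition(":")
        # Only accept comments whose key is exactly "author" and that name someone.
        if separator and key.strip() == "author" and value.strip():
            return value.strip()
    return default


if __name__ == "__main__":
    print(author_from_comments([" pdf ", " author: Jane Doe "]))
    print(author_from_comments([" article-licence: cc0 "]))
```

<p> Using <code> partition </code> instead of <code> split </code> keeps a name intact even if it contains a further colon. 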
</p> <p> <strong> display.py - article_author() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb22-1"></a><span class="kw">def</span> article_author(soup):</span> <span><a aria-hidden="true" href="#cb22-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb22-3"></a><span class="co"> Check for author information provided as comment.</span></span> <span><a aria-hidden="true" href="#cb22-4"></a><span class="co"> If not, assume Frank Siebert.</span></span> <span><a aria-hidden="true" href="#cb22-5"></a></span> <span><a aria-hidden="true" href="#cb22-6"></a><span class="co"> Populate the author information in the soup.</span></span> <span><a aria-hidden="true" href="#cb22-7"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb22-8"></a> comments <span class="op">=</span> soup.find_all(string<span class="op">=</span>iscomment)</span> <span><a aria-hidden="true" href="#cb22-9"></a> tag <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb22-10"></a> a <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb22-11"></a> <span class="cf">for</span> comment <span class="kw">in</span> comments:</span> <span><a aria-hidden="true" href="#cb22-12"></a> <span class="cf">if</span> <span class="st">'author:'</span> <span class="kw">in</span> comment.text:</span> <span><a aria-hidden="true" href="#cb22-13"></a> tag <span class="op">=</span> comment</span> <span><a aria-hidden="true" href="#cb22-14"></a> <span class="cf">break</span></span> <span><a aria-hidden="true" href="#cb22-15"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb22-16"></a> a <span class="op">=</span> tag.text.split(<span class="st">':'</span>)[<span class="dv">1</span>].strip()</span> <span><a aria-hidden="true" href="#cb22-17"></a> tag.decompose()</span> 
<span><a aria-hidden="true" href="#cb22-18"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb22-19"></a> a <span class="op">=</span> <span class="st">"Frank Siebert"</span></span> <span><a aria-hidden="true" href="#cb22-20"></a></span> <span><a aria-hidden="true" href="#cb22-21"></a> head <span class="op">=</span> soup.find(<span class="st">"head"</span>)</span> <span><a aria-hidden="true" href="#cb22-22"></a> meta <span class="op">=</span> soup.new_tag(<span class="st">"meta"</span>)</span> <span><a aria-hidden="true" href="#cb22-23"></a> meta.attrs.update({</span> <span><a aria-hidden="true" href="#cb22-24"></a> <span class="st">"property"</span>: <span class="st">"article:author"</span>,</span> <span><a aria-hidden="true" href="#cb22-25"></a> <span class="st">"content"</span>: a})</span> <span><a aria-hidden="true" href="#cb22-26"></a> head.insert(<span class="dv">6</span>, meta)</span> <span><a aria-hidden="true" href="#cb22-27"></a></span> <span><a aria-hidden="true" href="#cb22-28"></a> <span class="co"># target header structure: check div exists, otherwise create</span></span> <span><a aria-hidden="true" href="#cb22-29"></a> <span class="co"># &lt;header&gt;&lt;h1&gt;&lt;/h1&gt;&lt;div&gt;&lt;time&gt;&lt;/time&gt;&lt;address&gt;&lt;/address&gt;&lt;/div&gt;&lt;/header&gt;</span></span> <span><a aria-hidden="true" href="#cb22-30"></a> header <span class="op">=</span> soup.find(<span class="st">"header"</span>)</span> <span><a aria-hidden="true" href="#cb22-31"></a> div <span class="op">=</span> header.find(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" href="#cb22-32"></a></span> <span><a aria-hidden="true" href="#cb22-33"></a> <span class="cf">if</span> <span class="kw">not</span> div:</span> <span><a aria-hidden="true" href="#cb22-34"></a> div <span class="op">=</span> soup.new_tag(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" href="#cb22-35"></a> header.append(div)</span> <span><a 
aria-hidden="true" href="#cb22-36"></a></span> <span><a aria-hidden="true" href="#cb22-37"></a> address <span class="op">=</span> soup.new_tag(<span class="st">"address"</span>)</span> <span><a aria-hidden="true" href="#cb22-38"></a> address.string <span class="op">=</span> a</span> <span><a aria-hidden="true" href="#cb22-39"></a> div.append(address)</span></code></pre> </div> <!-- page-break --> <h3> Function article_qrcode() </h3> <p> Every article shows the QR code of its URL. This makes it easy to open the article on a mobile device if you first see it on a computer screen or on paper. </p> <p> The future location of the article is fully determined by its URN. </p> <p> <strong> display.py - article_qrcode() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb23-1"></a><span class="kw">def</span> article_qrcode(soup, workpath, urn):</span> <span><a aria-hidden="true" href="#cb23-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb23-3"></a><span class="co"> Run after author()</span></span> <span><a aria-hidden="true" href="#cb23-4"></a><span class="co"> PARAMETER</span></span> <span><a aria-hidden="true" href="#cb23-5"></a><span class="co"> soup:</span></span> <span><a aria-hidden="true" href="#cb23-6"></a><span class="co"> workpath: The location of the md-file</span></span> <span><a aria-hidden="true" href="#cb23-7"></a><span class="co"> urn: unique resource name of the article</span></span> <span><a aria-hidden="true" href="#cb23-8"></a></span> <span><a aria-hidden="true" href="#cb23-9"></a><span class="co"> Based on the urn we know the url the web-page will have on</span></span> <span><a aria-hidden="true" href="#cb23-10"></a><span class="co"> idee.frank-siebert.de, and the url the qrcode image will have.</span></span> <span><a aria-hidden="true" href="#cb23-11"></a></span> <span><a aria-hidden="true" href="#cb23-12"></a><span class="co"> For this url we create 
the qrcode image, if it does not already exist, and</span></span> <span><a aria-hidden="true" href="#cb23-13"></a><span class="co"> show it in the html header.</span></span> <span><a aria-hidden="true" href="#cb23-14"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb23-15"></a> qrpath <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb23-16"></a> qrp <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb23-17"></a> <span class="cf">for</span> qrp <span class="kw">in</span> [<span class="st">"idee/website/qrcode"</span>, <span class="st">"qrcode"</span>]:</span> <span><a aria-hidden="true" href="#cb23-18"></a> qrp <span class="op">=</span> workpath <span class="op">/</span> qrp <span class="op">/</span> urn</span> <span><a aria-hidden="true" href="#cb23-19"></a> qrp <span class="op">=</span> qrp.with_suffix(<span class="st">".png"</span>)</span> <span><a aria-hidden="true" href="#cb23-20"></a> <span class="cf">if</span> qrp.exists():</span> <span><a aria-hidden="true" href="#cb23-21"></a> qrpath <span class="op">=</span> qrp</span> <span><a aria-hidden="true" href="#cb23-22"></a> <span class="cf">break</span></span> <span><a aria-hidden="true" href="#cb23-23"></a></span> <span><a aria-hidden="true" href="#cb23-24"></a> <span class="cf">if</span> <span class="kw">not</span> qrpath: <span class="co"># create it</span></span> <span><a aria-hidden="true" href="#cb23-25"></a> qrpath <span class="op">=</span> qrp</span> <span><a aria-hidden="true" href="#cb23-26"></a> docurl <span class="op">=</span> <span class="st">"https://idee.frank-siebert.de/article/"</span> <span class="op">+</span> urn <span class="op">+</span> <span class="st">".html"</span></span> <span><a aria-hidden="true" href="#cb23-27"></a> image <span class="op">=</span> qrcode.make(data<span class="op">=</span>docurl)</span> <span><a aria-hidden="true" href="#cb23-28"></a> 
image.save(qrpath)</span> <span><a aria-hidden="true" href="#cb23-29"></a></span> <span><a aria-hidden="true" href="#cb23-30"></a> <span class="co"># write into the soup, first figure in the second div of the header</span></span> <span><a aria-hidden="true" href="#cb23-31"></a> header <span class="op">=</span> soup.find(<span class="st">"header"</span>)</span> <span><a aria-hidden="true" href="#cb23-32"></a> div <span class="op">=</span> header.find_all(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" href="#cb23-33"></a> <span class="cf">if</span> <span class="bu">len</span>(div) <span class="op">==</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" href="#cb23-34"></a> <span class="bu">print</span>(<span class="st">"call article_qrcode() after author()"</span>)</span> <span><a aria-hidden="true" href="#cb23-35"></a> <span class="cf">elif</span> <span class="bu">len</span>(div) <span class="op">==</span> <span class="dv">1</span>:</span> <span><a aria-hidden="true" href="#cb23-36"></a> div <span class="op">=</span> soup.new_tag(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" href="#cb23-37"></a> header.append(div)</span> <span><a aria-hidden="true" href="#cb23-38"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb23-39"></a> div <span class="op">=</span> div[<span class="dv">1</span>]</span> <span><a aria-hidden="true" href="#cb23-40"></a></span> <span><a aria-hidden="true" href="#cb23-41"></a> figure <span class="op">=</span> soup.new_tag(<span class="st">"figure"</span>)</span> <span><a aria-hidden="true" href="#cb23-42"></a> div.append(figure)</span> <span><a aria-hidden="true" href="#cb23-43"></a></span> <span><a aria-hidden="true" href="#cb23-44"></a> caption <span class="op">=</span> soup.new_tag(<span class="st">"figcaption"</span>)</span> <span><a aria-hidden="true" href="#cb23-45"></a> figure.insert(<span class="dv">0</span>, caption)</span> <span><a aria-hidden="true" 
href="#cb23-46"></a></span> <span><a aria-hidden="true" href="#cb23-47"></a> anchor <span class="op">=</span> soup.new_tag(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb23-48"></a> anchor.attrs.update({<span class="st">"href"</span>: <span class="ss">f"./</span><span class="sc">{</span>os<span class="sc">.</span>path<span class="sc">.</span>relpath(qrpath, workpath)<span class="sc">}</span><span class="ss">"</span>})</span> <span><a aria-hidden="true" href="#cb23-49"></a> figure.insert(<span class="dv">0</span>, anchor)</span> <span><a aria-hidden="true" href="#cb23-50"></a></span> <span><a aria-hidden="true" href="#cb23-51"></a> img <span class="op">=</span> soup.new_tag(<span class="st">"img"</span>)</span> <span><a aria-hidden="true" href="#cb23-52"></a> img.attrs.update({<span class="st">"width"</span>: <span class="st">"150px"</span>, <span class="st">"height"</span>: <span class="st">"150px"</span>})</span> <span><a aria-hidden="true" href="#cb23-53"></a> img.attrs.update({<span class="st">"src"</span>: <span class="ss">f"./</span><span class="sc">{</span>os<span class="sc">.</span>path<span class="sc">.</span>relpath(qrpath, workpath)<span class="sc">}</span><span class="ss">"</span>})</span> <span><a aria-hidden="true" href="#cb23-54"></a> img.attrs.update({<span class="st">"alt"</span>: <span class="st">"QR Code"</span>})</span> <span><a aria-hidden="true" href="#cb23-55"></a> anchor.insert(<span class="dv">0</span>, img)</span></code></pre> </div> <!-- page-break --> <h3> Function article_licence() </h3> <p> This function checks whether a comment with licence information exists and, if so, inserts the respective licence information into the article header. </p> <p> A dictionary with the respective configuration currently exists only in the Python module and contains only two entries, both for the "Creative Commons Zero" licence. This is also the licence used if no other licence information is found. 
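</p> <p> The lookup of the licence key can be sketched without BeautifulSoup. This is a minimal illustration and not part of <code> display.py </code> : it assumes the comment bodies are already available as plain strings, and the names <code> licence_key </code> and <code> DEFAULT_LICENCE </code> are hypothetical. It falls back to <code> cc0 </code> when no key is given, as the text above describes. </p>

```python
DEFAULT_LICENCE = "cc0"  # assumption: the default licence named in the text


def licence_key(comments, default=DEFAULT_LICENCE):
    """Return the licence key from the first 'article-licence:' comment.

    comments: iterable of comment bodies, i.e. the text between the
    comment delimiters, as plain strings.
    """
    for comment in comments:
        if "article-licence:" in comment:
            # partition() splits at the first colon only
            key = comment.partition(":")[2].strip()
            return key or default
    return default
```

<p> The returned key would then be used to look up the entries in the licence dictionary.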
</p> <p> <strong> Module variable LICENCES </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb24-1"></a>LICENCES <span class="op">=</span> {</span> <span><a aria-hidden="true" href="#cb24-2"></a> <span class="st">"cc0"</span>: [</span> <span><a aria-hidden="true" href="#cb24-3"></a> {</span> <span><a aria-hidden="true" href="#cb24-4"></a> <span class="st">"href"</span>:</span> <span><a aria-hidden="true" href="#cb24-5"></a> <span class="st">"./idee/website/legal/creative-commons-cc0-1-0-universal.html"</span>,</span> <span><a aria-hidden="true" href="#cb24-6"></a> <span class="st">"alt"</span>: <span class="st">"Creative Commons"</span>,</span> <span><a aria-hidden="true" href="#cb24-7"></a> <span class="st">"img"</span>: <span class="st">"./idee/website/image/CC-Icon.png"</span></span> <span><a aria-hidden="true" href="#cb24-8"></a> },</span> <span><a aria-hidden="true" href="#cb24-9"></a> {</span> <span><a aria-hidden="true" href="#cb24-10"></a> <span class="st">"href"</span>:</span> <span><a aria-hidden="true" href="#cb24-11"></a> <span class="st">"./idee/website/legal/creative-commons-cc0-1-0-universal.html"</span>,</span> <span><a aria-hidden="true" href="#cb24-12"></a> <span class="st">"alt"</span>: <span class="st">"Zero"</span>,</span> <span><a aria-hidden="true" href="#cb24-13"></a> <span class="st">"img"</span>: <span class="st">"./idee/website/image/CC0-Icon.png"</span></span> <span><a aria-hidden="true" href="#cb24-14"></a> }</span> <span><a aria-hidden="true" href="#cb24-15"></a> ]</span> <span><a aria-hidden="true" href="#cb24-16"></a> }</span></code></pre> </div> <p> <strong> display.py - article_licence() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb25-1"></a><span class="kw">def</span> article_licence(soup):</span> <span><a aria-hidden="true" href="#cb25-2"></a> <span 
class="co">"""</span></span> <span><a aria-hidden="true" href="#cb25-3"></a><span class="co"> Insert the licence information, if requested by the author via</span></span> <span><a aria-hidden="true" href="#cb25-4"></a><span class="co"> &lt;!-- article-licence: cc0 --&gt;. Other licences might be added as required</span></span> <span><a aria-hidden="true" href="#cb25-5"></a></span> <span><a aria-hidden="true" href="#cb25-6"></a><span class="co"> </span><span class="al">TODO</span><span class="co">: A variant of this method could search for &lt;!-- licence: xy --&gt;</span></span> <span><a aria-hidden="true" href="#cb25-7"></a><span class="co"> informations to replace them with licence information for 3d-party content.</span></span> <span><a aria-hidden="true" href="#cb25-8"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb25-9"></a> comments <span class="op">=</span> soup.find_all(string<span class="op">=</span>iscomment)</span> <span><a aria-hidden="true" href="#cb25-10"></a> c <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb25-11"></a> <span class="cf">for</span> comment <span class="kw">in</span> comments:</span> <span><a aria-hidden="true" href="#cb25-12"></a> <span class="cf">if</span> <span class="st">'article-licence:'</span> <span class="kw">in</span> comment:</span> <span><a aria-hidden="true" href="#cb25-13"></a> c <span class="op">=</span> comment</span> <span><a aria-hidden="true" href="#cb25-14"></a> <span class="cf">break</span></span> <span><a aria-hidden="true" href="#cb25-15"></a></span> <span><a aria-hidden="true" href="#cb25-16"></a> <span class="cf">if</span> <span class="kw">not</span> c:</span> <span><a aria-hidden="true" href="#cb25-17"></a> <span class="cf">return</span></span> <span><a aria-hidden="true" href="#cb25-18"></a></span> <span><a aria-hidden="true" href="#cb25-19"></a> c <span class="op">=</span> c.split(<span class="st">":"</span>)[<span 
class="dv">1</span>].strip()</span> <span><a aria-hidden="true" href="#cb25-20"></a></span> <span><a aria-hidden="true" href="#cb25-21"></a> licences <span class="op">=</span> LICENCES.get(c)</span> <span><a aria-hidden="true" href="#cb25-22"></a> <span class="cf">if</span> <span class="kw">not</span> licences:</span> <span><a aria-hidden="true" href="#cb25-23"></a> <span class="bu">print</span>(<span class="ss">f"No matching licence information found for </span><span class="sc">{</span>c<span class="sc">}</span><span class="ss">."</span>)</span> <span><a aria-hidden="true" href="#cb25-24"></a> <span class="cf">return</span></span> <span><a aria-hidden="true" href="#cb25-25"></a></span> <span><a aria-hidden="true" href="#cb25-26"></a> header <span class="op">=</span> soup.find(<span class="st">"header"</span>)</span> <span><a aria-hidden="true" href="#cb25-27"></a> div <span class="op">=</span> header.find_all(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" href="#cb25-28"></a> <span class="cf">if</span> <span class="bu">len</span>(div) <span class="op">&lt;</span> <span class="dv">2</span>:</span> <span><a aria-hidden="true" href="#cb25-29"></a> <span class="bu">print</span>(<span class="st">"Call article_licence() after article_qrcode()"</span>)</span> <span><a aria-hidden="true" href="#cb25-30"></a> <span class="cf">return</span></span> <span><a aria-hidden="true" href="#cb25-31"></a></span> <span><a aria-hidden="true" href="#cb25-32"></a> div <span class="op">=</span> div[<span class="dv">1</span>]</span> <span><a aria-hidden="true" href="#cb25-33"></a></span> <span><a aria-hidden="true" href="#cb25-34"></a> <span class="cf">for</span> licence <span class="kw">in</span> licences:</span> <span><a aria-hidden="true" href="#cb25-35"></a> anchor <span class="op">=</span> soup.new_tag(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb25-36"></a> div.append(anchor)</span> <span><a aria-hidden="true" href="#cb25-37"></a> 
anchor.attrs.update({<span class="st">"href"</span>: licence.get(<span class="st">"href"</span>)})</span> <span><a aria-hidden="true" href="#cb25-38"></a> img <span class="op">=</span> soup.new_tag(<span class="st">"img"</span>)</span> <span><a aria-hidden="true" href="#cb25-39"></a> <span class="co"># The following scaling is for the PDF</span></span> <span><a aria-hidden="true" href="#cb25-40"></a> <span class="co"># In the browser the CSS overwrites this scaling:</span></span> <span><a aria-hidden="true" href="#cb25-41"></a> <span class="co"># newtag.attrs.update({"width": "28px", "height": "28px"})</span></span> <span><a aria-hidden="true" href="#cb25-42"></a> img.attrs.update({<span class="st">"src"</span>: licence.get(<span class="st">"img"</span>)})</span> <span><a aria-hidden="true" href="#cb25-43"></a> img.attrs.update({<span class="st">"alt"</span>: licence.get(<span class="st">"alt"</span>)})</span> <span><a aria-hidden="true" href="#cb25-44"></a> anchor.append(img)</span></code></pre> </div> <h3> Function article_audio() </h3> <p> This function looks whether a recording exists, sharing the same name as the HTML, just with extension <code> .mp3 </code> . It looks into the two folders designated for audio files. </p> <p> If the audio is found, it is added to the article header. 
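</p> <p> The search through the designated folders can be sketched with <code> pathlib </code> alone. This is a minimal sketch under assumptions, not the code of <code> display.py </code> : the helper name <code> first_existing </code> and its parameters are hypothetical. </p>

```python
from pathlib import Path


def first_existing(workpath, folders, urn, suffix):
    """Return the first existing candidate file, or None if there is none."""
    for folder in folders:
        # Appending the suffix keeps a URN that contains a dot intact,
        # whereas with_suffix() would replace the part after the last dot.
        candidate = Path(workpath) / folder / (urn + suffix)
        if candidate.exists():
            return candidate
    return None
```

<p> Called with the folders <code> audio </code> and <code> idee/website/audio </code> and the suffix <code> .mp3 </code> , the helper returns the path of the recording or <code> None </code> if no recording exists.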
</p> <p> <strong> display.py - article_audio() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb26-1"></a><span class="kw">def</span> article_audio(soup, workpath, urn):</span> <span><a aria-hidden="true" href="#cb26-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb26-3"></a><span class="co"> Run after article_licence()</span></span> <span><a aria-hidden="true" href="#cb26-4"></a><span class="co"> PARAMETER</span></span> <span><a aria-hidden="true" href="#cb26-5"></a><span class="co"> soup:</span></span> <span><a aria-hidden="true" href="#cb26-6"></a><span class="co"> workpath: The location of the md-file</span></span> <span><a aria-hidden="true" href="#cb26-7"></a><span class="co"> urn: unique resource name of the article</span></span> <span><a aria-hidden="true" href="#cb26-8"></a></span> <span><a aria-hidden="true" href="#cb26-9"></a><span class="co"> Based on the urn we know the url the audio file has to have.</span></span> <span><a aria-hidden="true" href="#cb26-10"></a></span> <span><a aria-hidden="true" href="#cb26-11"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb26-12"></a> audiopath <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb26-13"></a> aup <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb26-14"></a> <span class="cf">for</span> aup <span class="kw">in</span> [<span class="st">"audio"</span>, <span class="st">"idee/website/audio"</span>]:</span> <span><a aria-hidden="true" href="#cb26-15"></a> aup <span class="op">=</span> workpath <span class="op">/</span> aup <span class="op">/</span> urn</span> <span><a aria-hidden="true" href="#cb26-16"></a> aup <span class="op">=</span> aup.with_suffix(<span class="st">".mp3"</span>)</span> <span><a aria-hidden="true" href="#cb26-17"></a> <span 
class="cf">if</span> aup.exists():</span> <span><a aria-hidden="true" href="#cb26-18"></a> audiopath <span class="op">=</span> aup</span> <span><a aria-hidden="true" href="#cb26-19"></a> <span class="cf">break</span></span> <span><a aria-hidden="true" href="#cb26-20"></a></span> <span><a aria-hidden="true" href="#cb26-21"></a> <span class="cf">if</span> <span class="kw">not</span> audiopath:</span> <span><a aria-hidden="true" href="#cb26-22"></a> <span class="cf">return</span></span> <span><a aria-hidden="true" href="#cb26-23"></a></span> <span><a aria-hidden="true" href="#cb26-24"></a> header <span class="op">=</span> soup.find(<span class="st">"header"</span>)</span> <span><a aria-hidden="true" href="#cb26-25"></a> div <span class="op">=</span> header.find_all(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" href="#cb26-26"></a> <span class="cf">if</span> <span class="bu">len</span>(div) <span class="op">&lt;</span> <span class="dv">2</span>:</span> <span><a aria-hidden="true" href="#cb26-27"></a> <span class="bu">print</span>(<span class="st">"call article_audio() after article_licence()"</span>)</span> <span><a aria-hidden="true" href="#cb26-28"></a> <span class="cf">return</span></span> <span><a aria-hidden="true" href="#cb26-29"></a></span> <span><a aria-hidden="true" href="#cb26-30"></a> div <span class="op">=</span> div[<span class="dv">1</span>]</span> <span><a aria-hidden="true" href="#cb26-31"></a> figure <span class="op">=</span> soup.new_tag(<span class="st">"figure"</span>)</span> <span><a aria-hidden="true" href="#cb26-32"></a> div.append(figure)</span> <span><a aria-hidden="true" href="#cb26-33"></a> audio <span class="op">=</span> soup.new_tag(<span class="st">"audio"</span>)</span> <span><a aria-hidden="true" href="#cb26-34"></a> figure.append(audio)</span> <span><a aria-hidden="true" href="#cb26-35"></a> audio.attrs.update({<span class="st">"accesskey"</span>: <span class="st">"a"</span>,</span> <span><a aria-hidden="true" 
href="#cb26-36"></a> <span class="st">"type"</span>: <span class="st">"audio/mp3"</span>,</span> <span><a aria-hidden="true" href="#cb26-37"></a> <span class="st">"preload"</span>: <span class="st">"none"</span>,</span> <span><a aria-hidden="true" href="#cb26-38"></a> <span class="st">"controls"</span>: <span class="st">"true"</span>,</span> <span><a aria-hidden="true" href="#cb26-39"></a> <span class="st">"src"</span>: <span class="ss">f"./</span><span class="sc">{</span>os<span class="sc">.</span>path<span class="sc">.</span>relpath(audiopath, workpath)<span class="sc">}</span><span class="ss">"</span>})</span></code></pre> </div> <!-- page-break --> <h3> Function movetoc() </h3> <p> If a comment for the TOC exists, the method moves the TOC generated by Pandoc to the location of the comment, replacing it. </p> <p> If no comment for the TOC exists, then no TOC was desired and the TOC is removed. </p> <p> The highest level of the TOC, containing the article title, is removed. The highest level used for headlines inside the articles is <code> &lt;h2&gt; </code> . 
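</p> <p> Whether a comment is the TOC marker can be decided on its text alone. A minimal sketch, assuming the comment bodies have already been collected as plain strings; the helper names <code> is_marker </code> and <code> find_marker </code> are hypothetical and not part of <code> display.py </code> : </p>

```python
def is_marker(comment, name):
    """True if the comment body consists of exactly the marker name."""
    return comment.strip() == name


def find_marker(comments, name):
    """Return the first comment body that is the given marker, else None."""
    return next((c for c in comments if is_marker(c, name)), None)
```

<p> Comparing the stripped text for equality avoids that an empty comment, or a comment that merely contains the letters of the marker, is mistaken for the marker.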
</p> <p> <strong> display.py - movetoc() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb27-1"></a><span class="kw">def</span> movetoc(soup):</span> <span><a aria-hidden="true" href="#cb27-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb27-3"></a><span class="co"> Move the TOC to the correct location</span></span> <span><a aria-hidden="true" href="#cb27-4"></a><span class="co"> The author sets &lt;!-- toc --&gt; if a TOC is wanted,</span></span> <span><a aria-hidden="true" href="#cb27-5"></a><span class="co"> and places it where the TOC should appear.</span></span> <span><a aria-hidden="true" href="#cb27-6"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb27-7"></a> toc <span class="op">=</span> soup.find(<span class="st">"nav"</span>, <span class="bu">id</span><span class="op">=</span><span class="st">'TOC'</span>)</span> <span><a aria-hidden="true" href="#cb27-8"></a> <span class="co"># Eliminate the article title from the toc</span></span> <span><a aria-hidden="true" href="#cb27-9"></a> ul1 <span class="op">=</span> toc.find(<span class="st">"ul"</span>)</span> <span><a aria-hidden="true" href="#cb27-10"></a> ul2 <span class="op">=</span> ul1.find(<span class="st">"ul"</span>)</span> <span><a aria-hidden="true" href="#cb27-11"></a> ul1.replace_with(ul2)</span> <span><a aria-hidden="true" href="#cb27-12"></a> tag <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb27-13"></a> comments <span class="op">=</span> soup.find_all(string<span class="op">=</span>iscomment)</span> <span><a aria-hidden="true" href="#cb27-14"></a> <span class="cf">for</span> comment <span class="kw">in</span> comments:</span> <span><a aria-hidden="true" href="#cb27-15"></a> <span class="cf">if</span> comment.strip() <span class="op">==</span> <span class="st">'toc'</span>:</span> 
<span><a aria-hidden="true" href="#cb27-16"></a> tag <span class="op">=</span> comment</span> <span><a aria-hidden="true" href="#cb27-17"></a> <span class="cf">break</span></span> <span><a aria-hidden="true" href="#cb27-18"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb27-19"></a> tag.replace_with(toc)</span> <span><a aria-hidden="true" href="#cb27-20"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb27-21"></a> toc.decompose()</span></code></pre> </div> <h3> Function footnotes() </h3> <p> The footnotes function moves the references section created by Pandoc to the location of the references comment. </p> <p> More a style matter, but best done here, is the replacement of the default back-reference symbol with a symbol that indeed points backwards. </p> <p> <strong> display.py - footnotes() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb28-1"></a><span class="kw">def</span> footnotes(soup):</span> <span><a aria-hidden="true" href="#cb28-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb28-3"></a><span class="co"> Move footnotes to the place of the authors choice</span></span> <span><a aria-hidden="true" href="#cb28-4"></a><span class="co"> given by &lt;!-- references --&gt;</span></span> <span><a aria-hidden="true" href="#cb28-5"></a><span class="co"> Change back reference symbol to reference back and &lt;CR&gt;</span></span> <span><a aria-hidden="true" href="#cb28-6"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb28-7"></a> <span class="co"># use a better symbol for backreferences</span></span> <span><a aria-hidden="true" href="#cb28-8"></a> tags <span class="op">=</span> soup.find_all(<span class="st">"a"</span>, string<span class="op">=</span><span class="st">'↩︎'</span>)</span> <span><a aria-hidden="true" href="#cb28-9"></a> <span 
class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb28-10"></a> tag.clear()</span> <span><a aria-hidden="true" href="#cb28-11"></a> tag.append(<span class="st">'↑'</span>)</span> <span><a aria-hidden="true" href="#cb28-12"></a></span> <span><a aria-hidden="true" href="#cb28-13"></a> <span class="co"># Footnotes are generated as section &lt;section class="footnotes" </span></span> <span><a aria-hidden="true" href="#cb28-14"></a> <span class="co"># role="doc-endnotes"&gt; - Search Section and use it to replace references comment.</span></span> <span><a aria-hidden="true" href="#cb28-15"></a> fnotes <span class="op">=</span> soup.find(<span class="st">"section"</span>, class_<span class="op">=</span><span class="st">"footnotes"</span>)</span> <span><a aria-hidden="true" href="#cb28-16"></a> <span class="cf">if</span> fnotes:</span> <span><a aria-hidden="true" href="#cb28-17"></a> tag <span class="op">=</span> soup.find(string<span class="op">=</span>iscomment, text<span class="op">=</span><span class="st">'references'</span>)</span> <span><a aria-hidden="true" href="#cb28-18"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb28-19"></a> tag.replace_with(fnotes)</span></code></pre> </div> <h3> article_pdf() </h3> <p> Finally, the function checks whether PDF creation is requested. If so, the PDF and the link to it in the HTML page are created. 
</p> <p> <strong> display.py - article_pdf() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb29-1"></a><span class="kw">def</span> article_pdf(soup, workpath, urn, htmlpath):</span> <span><a aria-hidden="true" href="#cb29-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb29-3"></a><span class="co"> Checks for &lt;!-- pdf --&gt;, which is the request to create PDF.</span></span> <span><a aria-hidden="true" href="#cb29-4"></a></span> <span><a aria-hidden="true" href="#cb29-5"></a><span class="co"> Applies some changes to the html:</span></span> <span><a aria-hidden="true" href="#cb29-6"></a><span class="co"> - figure for the audio is senseless in PDF</span></span> <span><a aria-hidden="true" href="#cb29-7"></a><span class="co"> - relative URL's need to become absolute.</span></span> <span><a aria-hidden="true" href="#cb29-8"></a><span class="co"> - footnotes need written links to make sense on paper</span></span> <span><a aria-hidden="true" href="#cb29-9"></a><span class="co"> - creates the pdf</span></span> <span><a aria-hidden="true" href="#cb29-10"></a><span class="co"> - places the link to the PDF into the soup</span></span> <span><a aria-hidden="true" href="#cb29-11"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb29-12"></a> comments <span class="op">=</span> soup.find_all(string<span class="op">=</span>iscomment)</span> <span><a aria-hidden="true" href="#cb29-13"></a> c <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb29-14"></a> <span class="cf">for</span> comment <span class="kw">in</span> comments:</span> <span><a aria-hidden="true" href="#cb29-15"></a> <span class="cf">if</span> <span class="st">'pdf'</span> <span class="kw">in</span> comment:</span> <span><a aria-hidden="true" href="#cb29-16"></a> c <span class="op">=</span> comment</span> <span><a 
aria-hidden="true" href="#cb29-17"></a> <span class="cf">break</span></span> <span><a aria-hidden="true" href="#cb29-18"></a></span> <span><a aria-hidden="true" href="#cb29-19"></a> <span class="cf">if</span> <span class="kw">not</span> c:</span> <span><a aria-hidden="true" href="#cb29-20"></a> <span class="cf">return</span></span> <span><a aria-hidden="true" href="#cb29-21"></a></span> <span><a aria-hidden="true" href="#cb29-22"></a> csspath <span class="op">=</span> Path(<span class="vs">r"/home/frank/projects/idee/website/css/fspdf.css"</span>)</span> <span><a aria-hidden="true" href="#cb29-23"></a></span> <span><a aria-hidden="true" href="#cb29-24"></a> workpath <span class="op">=</span> workpath.resolve()</span> <span><a aria-hidden="true" href="#cb29-25"></a> pdfpath <span class="op">=</span> workpath <span class="op">/</span> <span class="st">"pdf"</span> <span class="op">/</span> urn</span> <span><a aria-hidden="true" href="#cb29-26"></a> pdfpath <span class="op">=</span> pdfpath.with_suffix(<span class="st">".pdf"</span>)</span> <span><a aria-hidden="true" href="#cb29-27"></a></span> <span><a aria-hidden="true" href="#cb29-28"></a> pdfsoup <span class="op">=</span> pdf_soup(soup)</span> <span><a aria-hidden="true" href="#cb29-29"></a> page_break(pdfsoup)</span> <span><a aria-hidden="true" href="#cb29-30"></a></span> <span><a aria-hidden="true" href="#cb29-31"></a> html_doc <span class="op">=</span> pdfsoup.prettify()</span> <span><a aria-hidden="true" href="#cb29-32"></a></span> <span><a aria-hidden="true" href="#cb29-33"></a> <span class="co"># weasy can't render MathTex, since it doesn't run the required JavaScript</span></span> <span><a aria-hidden="true" href="#cb29-34"></a> <span class="co"># let chromium render MathTex to HTML and create PDF from that</span></span> <span><a aria-hidden="true" href="#cb29-35"></a> math <span class="op">=</span> soup.find(<span class="st">"span"</span>, class_<span class="op">=</span><span class="st">"math"</span>)</span> <span><a aria-hidden="true" 
href="#cb29-36"></a> <span class="cf">if</span> math:</span> <span><a aria-hidden="true" href="#cb29-37"></a> html_doc <span class="op">=</span> pdf_math(html_doc, workpath, htmlpath)</span> <span><a aria-hidden="true" href="#cb29-38"></a></span> <span><a aria-hidden="true" href="#cb29-39"></a> weasy_html <span class="op">=</span> HTML(string<span class="op">=</span>html_doc, base_url<span class="op">=</span><span class="bu">str</span>(workpath))</span> <span><a aria-hidden="true" href="#cb29-40"></a> bpdf <span class="op">=</span> weasy_html.write_pdf(stylesheets<span class="op">=</span>[CSS(filename<span class="op">=</span><span class="bu">str</span>(csspath))])</span> <span><a aria-hidden="true" href="#cb29-41"></a> <span class="cf">with</span> <span class="bu">open</span>(pdfpath, <span class="st">'wb'</span>) <span class="im">as</span> outfile:</span> <span><a aria-hidden="true" href="#cb29-42"></a> outfile.write(bpdf)</span> <span><a aria-hidden="true" href="#cb29-43"></a> outfile.flush()</span> <span><a aria-hidden="true" href="#cb29-44"></a> os.fsync(outfile)</span> <span><a aria-hidden="true" href="#cb29-45"></a> outfile.close()</span> <span><a aria-hidden="true" href="#cb29-46"></a></span> <span><a aria-hidden="true" href="#cb29-47"></a> <span class="co"># insert pdf symbol and link into the original soup</span></span> <span><a aria-hidden="true" href="#cb29-48"></a> link_pdf(soup, urn)</span></code></pre> </div> <p> As the code shows, PDF creation requires a number of steps, which are detailed below. </p> <!-- page-break --> <h3> pdf_soup() </h3> <p> The HTML needs a few adjustments before the PDF is created. To avoid altering the HTML that will be published, a separate BeautifulSoup object is created. 
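</p> <p> One adjustment needed for the PDF is turning relative links into absolute ones. As an alternative to replacing a prefix with a regular expression, the standard library function <code> urllib.parse.urljoin </code> resolves a relative reference against the URL of the page itself. The following is a sketch of that alternative technique, not the method used in <code> display.py </code> ; the helper name <code> absolute_href </code> is hypothetical, and the page URL follows the pattern used for the QR code. </p>

```python
from urllib.parse import urljoin


def absolute_href(href, page_url):
    """Resolve href against the URL of the page that contains it."""
    # urljoin leaves hrefs that are already absolute unchanged
    return urljoin(page_url, href)
```

<p> Note that this resolves a leading <code> ../ </code> relative to the location of the page, which may give a different result than a plain prefix replacement.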
</p> <p> <strong> display.py - pdf_soup() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb30-1"></a><span class="kw">def</span> pdf_soup(soup):</span> <span><a aria-hidden="true" href="#cb30-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb30-3"></a><span class="co"> Creates a separate soup for the PDF, because there are some slight</span></span> <span><a aria-hidden="true" href="#cb30-4"></a><span class="co"> differences.</span></span> <span><a aria-hidden="true" href="#cb30-5"></a></span> <span><a aria-hidden="true" href="#cb30-6"></a><span class="co"> First of all we need to honor additional software in the generator</span></span> <span><a aria-hidden="true" href="#cb30-7"></a><span class="co"> information. Then PDF is not suited for audio and we need to remove that.</span></span> <span><a aria-hidden="true" href="#cb30-8"></a></span> <span><a aria-hidden="true" href="#cb30-9"></a><span class="co"> We also change all links into absolute links to make sure that the</span></span> <span><a aria-hidden="true" href="#cb30-10"></a><span class="co"> generator finds all images and cascading style sheets.</span></span> <span><a aria-hidden="true" href="#cb30-11"></a><span class="co"> </span></span> <span><a aria-hidden="true" href="#cb30-12"></a><span class="co"> Returns:</span></span> <span><a aria-hidden="true" href="#cb30-13"></a><span class="co"> soup</span></span> <span><a aria-hidden="true" href="#cb30-14"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb30-15"></a> <span class="co"># create a separate soup for the pdf</span></span> <span><a aria-hidden="true" href="#cb30-16"></a> html <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" href="#cb30-17"></a></span> <span><a aria-hidden="true" href="#cb30-18"></a> builder <span class="op">=</span> HTMLParserTreeBuilder()</span> <span><a aria-hidden="true" 
href="#cb30-19"></a> pdfsoup <span class="op">=</span> BeautifulSoup(html, builder<span class="op">=</span>builder)</span> <span><a aria-hidden="true" href="#cb30-20"></a></span> <span><a aria-hidden="true" href="#cb30-21"></a> <span class="co"># honor weasy as generator, and also chromium if we render math</span></span> <span><a aria-hidden="true" href="#cb30-22"></a> math <span class="op">=</span> soup.find(<span class="st">"span"</span>, class_<span class="op">=</span><span class="st">"math"</span>)</span> <span><a aria-hidden="true" href="#cb30-23"></a> meta <span class="op">=</span> pdfsoup.find(<span class="st">"meta"</span>, attrs<span class="op">=</span>{<span class="st">"name"</span>: <span class="st">"generator"</span>})</span> <span><a aria-hidden="true" href="#cb30-24"></a> generators <span class="op">=</span> meta[<span class="st">"content"</span>]</span> <span><a aria-hidden="true" href="#cb30-25"></a> <span class="cf">if</span> math:</span> <span><a aria-hidden="true" href="#cb30-26"></a> meta.attrs.update({<span class="st">"name"</span>: <span class="st">"generator"</span>,</span> <span><a aria-hidden="true" href="#cb30-27"></a> <span class="st">"content"</span>: <span class="ss">f"</span><span class="sc">{</span>generators<span class="sc">}</span><span class="ss">, chromium, weasy"</span>})</span> <span><a aria-hidden="true" href="#cb30-28"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb30-29"></a> meta.attrs.update({<span class="st">"name"</span>: <span class="st">"generator"</span>,</span> <span><a aria-hidden="true" href="#cb30-30"></a> <span class="st">"content"</span>: <span class="ss">f"</span><span class="sc">{</span>generators<span class="sc">}</span><span class="ss">, weasy"</span>})</span> <span><a aria-hidden="true" href="#cb30-31"></a></span> <span><a aria-hidden="true" href="#cb30-32"></a> <span class="co"># The article header</span></span> <span><a aria-hidden="true" href="#cb30-33"></a> article <span 
class="op">=</span> pdfsoup.find(<span class="st">"article"</span>)</span> <span><a aria-hidden="true" href="#cb30-34"></a> header <span class="op">=</span> article.find(<span class="st">"header"</span>)</span> <span><a aria-hidden="true" href="#cb30-35"></a></span> <span><a aria-hidden="true" href="#cb30-36"></a> tag <span class="op">=</span> header.find(<span class="st">"figure"</span>, attrs<span class="op">=</span>{<span class="st">"class"</span>: <span class="st">"audio"</span>})</span> <span><a aria-hidden="true" href="#cb30-37"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb30-38"></a> tag.decompose()</span> <span><a aria-hidden="true" href="#cb30-39"></a></span> <span><a aria-hidden="true" href="#cb30-40"></a> <span class="co"># We need to change relative paths to own articles into absolute</span></span> <span><a aria-hidden="true" href="#cb30-41"></a> <span class="co"># paths.</span></span> <span><a aria-hidden="true" href="#cb30-42"></a> rhref <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r"^\.\.\/"</span>)</span> <span><a aria-hidden="true" href="#cb30-43"></a> anchors <span class="op">=</span> pdfsoup.find_all(<span class="st">"a"</span>, href<span class="op">=</span>rhref)</span> <span><a aria-hidden="true" href="#cb30-44"></a> <span class="cf">for</span> anchor <span class="kw">in</span> anchors:</span> <span><a aria-hidden="true" href="#cb30-45"></a> url <span class="op">=</span> rhref.sub(<span class="st">"https://idee.frank-siebert.de/article/"</span>,</span> <span><a aria-hidden="true" href="#cb30-46"></a> anchor[<span class="st">"href"</span>])</span> <span><a aria-hidden="true" href="#cb30-47"></a> anchor.attrs.update({<span class="st">"href"</span>: url})</span> <span><a aria-hidden="true" href="#cb30-48"></a></span> <span><a aria-hidden="true" href="#cb30-49"></a> <span class="co"># On paper we need complete written URLs</span></span> <span><a aria-hidden="true" 
href="#cb30-50"></a> rhref <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r"^http.*"</span>)</span> <span><a aria-hidden="true" href="#cb30-51"></a> tag <span class="op">=</span> pdfsoup.find(<span class="st">"section"</span>, class_<span class="op">=</span><span class="st">"footnotes"</span>)</span> <span><a aria-hidden="true" href="#cb30-52"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb30-53"></a> anchors <span class="op">=</span> tag.find_all(<span class="st">"a"</span>, href<span class="op">=</span>rhref)</span> <span><a aria-hidden="true" href="#cb30-54"></a> <span class="cf">for</span> anchor <span class="kw">in</span> anchors:</span> <span><a aria-hidden="true" href="#cb30-55"></a> url <span class="op">=</span> anchor[<span class="st">"href"</span>]</span> <span><a aria-hidden="true" href="#cb30-56"></a> anchor.parent.append(pdfsoup.new_tag(<span class="st">"br"</span>))</span> <span><a aria-hidden="true" href="#cb30-57"></a> anchor.parent.append(url)</span> <span><a aria-hidden="true" href="#cb30-58"></a> <span class="cf">return</span> pdfsoup</span></code></pre> </div> <!-- page-break --> <h3> page_break() </h3> <p> Creating a PDF also means that automatic page breaks can look awful. For example, a headline may be left stranded at the bottom of one page while its chapter starts on the next. </p> <p> As mentioned, page-break comments can be placed anywhere in the markdown file to take control of the page breaks in the PDF. 
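</p> <p> The core idea, turning an author comment into a forced page break, can be sketched with a plain regular expression instead of BeautifulSoup. This is only a minimal illustration, not the plugin's own code; the names <code> PAGE_BREAK </code> , <code> BREAK_DIV </code> and <code> insert_page_breaks() </code> are assumptions made for this example. </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python">import re

# Matches the author's marker comment, tolerating extra whitespace.
PAGE_BREAK = re.compile(r"&lt;!--\s*page-break\s*--&gt;")

# The element the PDF renderer interprets as a forced page break.
BREAK_DIV = '&lt;div style="page-break-before: always"&gt;&lt;/div&gt;'


def insert_page_breaks(html_doc):
    """Replace every page-break comment with a page-breaking div."""
    return PAGE_BREAK.sub(BREAK_DIV, html_doc)


sample = "&lt;p&gt;Intro&lt;/p&gt; &lt;!-- page-break --&gt; &lt;h3&gt;Next chapter&lt;/h3&gt;"
print(insert_page_breaks(sample))
</code></pre> </div> <p> Unlike <code> page_break() </code> , this sketch does not split a table when the marker sits inside a table cell; it only covers the simple case. 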
</p> <p> <strong> display.py - page_break() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb31-1"></a><span class="kw">def</span> page_break(soup):</span> <span><a aria-hidden="true" href="#cb31-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb31-3"></a><span class="co"> Find &lt;!-- page-break --&gt; comments set by the author and replace them</span></span> <span><a aria-hidden="true" href="#cb31-4"></a><span class="co"> with &lt;div style="page-break-before: always;"&gt; to control page break</span></span> <span><a aria-hidden="true" href="#cb31-5"></a><span class="co"> positions in the pdf</span></span> <span><a aria-hidden="true" href="#cb31-6"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb31-7"></a> comments <span class="op">=</span> soup.find_all(string<span class="op">=</span>iscomment)</span> <span><a aria-hidden="true" href="#cb31-8"></a> <span class="cf">for</span> comment <span class="kw">in</span> comments:</span> <span><a aria-hidden="true" href="#cb31-9"></a> <span class="cf">if</span> <span class="st">'page-break'</span> <span class="kw">in</span> comment:</span> <span><a aria-hidden="true" href="#cb31-10"></a> pb <span class="op">=</span> soup.new_tag(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" href="#cb31-11"></a> pb.attrs.update({<span class="st">"style"</span>: <span class="st">"page-break-before: always"</span>})</span> <span><a aria-hidden="true" href="#cb31-12"></a> comment.replace_with(pb)</span> <span><a aria-hidden="true" href="#cb31-13"></a> <span class="co"># if the page-break is in the table, split table</span></span> <span><a aria-hidden="true" href="#cb31-14"></a> <span class="cf">if</span> pb.parent.name <span class="op">==</span> <span class="st">'td'</span>:</span> <span><a aria-hidden="true" href="#cb31-15"></a> tr <span class="op">=</span> pb.parent.parent 
<span class="co"># should be "tr"</span></span> <span><a aria-hidden="true" href="#cb31-16"></a> tbody <span class="op">=</span> tr.parent</span> <span><a aria-hidden="true" href="#cb31-17"></a> table <span class="op">=</span> tbody.parent</span> <span><a aria-hidden="true" href="#cb31-18"></a> thead <span class="op">=</span> table.find(<span class="st">"thead"</span>)</span> <span><a aria-hidden="true" href="#cb31-19"></a> newthead <span class="op">=</span> copy.copy(thead)</span> <span><a aria-hidden="true" href="#cb31-20"></a> newtable <span class="op">=</span> soup.new_tag(<span class="st">"table"</span>)</span> <span><a aria-hidden="true" href="#cb31-21"></a> table.insert_after(pb)</span> <span><a aria-hidden="true" href="#cb31-22"></a> pb.insert_after(newtable)</span> <span><a aria-hidden="true" href="#cb31-23"></a> newtable.append(newthead)</span> <span><a aria-hidden="true" href="#cb31-24"></a> tbody <span class="op">=</span> soup.new_tag(<span class="st">"tbody"</span>)</span> <span><a aria-hidden="true" href="#cb31-25"></a> newtable.append(tbody)</span> <span><a aria-hidden="true" href="#cb31-26"></a> <span class="cf">while</span> (tr_next <span class="op">:=</span> tr.find_next_sibling(<span class="st">"tr"</span>)) <span class="kw">is</span> <span class="kw">not</span> <span class="va">None</span>:</span> <span><a aria-hidden="true" href="#cb31-27"></a> tbody.append(tr_next)</span> <span><a aria-hidden="true" href="#cb31-28"></a> tbody.insert(<span class="dv">0</span>, tr)</span></code></pre> </div> <!-- page-break --> <h3> pdf_math() </h3> <p> The PDF that satisfied me most was the one created with weasy. I first tried Pandoc, and the results could most probably be made to look the same with that tool, but for now I am staying with weasy. </p> <p> The only problem: weasy does not support JavaScript, which MathJax requires to render MathTex into MathML. 
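</p> <p> Only articles that actually contain formulas are affected by this limitation. A minimal sketch of such a check, assuming Pandoc marks formulas with a class attribute whose first class name is <code> math </code> (as the <code> soup.find("span", class_="math") </code> lookup in <code> article_pdf() </code> suggests); the names <code> MATH_CLASS </code> and <code> needs_javascript() </code> are made up for this illustration: </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python">import re

# A class attribute whose first class name is "math", e.g. class="math inline".
MATH_CLASS = re.compile(r'class="math(\s[^"]*)?"')


def needs_javascript(html_doc):
    """Return True if the document contains markup that MathJax must render."""
    return MATH_CLASS.search(html_doc) is not None


print(needs_javascript('span class="math inline"'))
print(needs_javascript('span class="note"'))
</code></pre> </div> <p> A plain-text check like this is cruder than the BeautifulSoup lookup, since it would also match such markup quoted inside a code listing, but it needs nothing beyond the standard library. 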
</p> <p> Several solutions are conceivable, including running JavaScript directly from Python, but I thought it a genius idea to let chromium render the MathML. </p> <p> <strong> display.py - pdf_math() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb32-1"></a><span class="kw">def</span> pdf_math(html_doc, workpath, htmlpath):</span> <span><a aria-hidden="true" href="#cb32-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb32-3"></a><span class="co"> Use Chromium to render MathTex into MathML via MathJax, since weasy does</span></span> <span><a aria-hidden="true" href="#cb32-4"></a><span class="co"> not support javascript.</span></span> <span><a aria-hidden="true" href="#cb32-5"></a></span> <span><a aria-hidden="true" href="#cb32-6"></a><span class="co"> </span><span class="al">TODO</span><span class="co">: Implement load and script control for webdriver to know exactly when</span></span> <span><a aria-hidden="true" href="#cb32-7"></a><span class="co"> MathJax scripting finished, to get rid of the sleep.</span></span> <span><a aria-hidden="true" href="#cb32-8"></a></span> <span><a aria-hidden="true" href="#cb32-9"></a><span class="co"> Returns:</span></span> <span><a aria-hidden="true" href="#cb32-10"></a><span class="co"> string - html_doc with rendered MathML</span></span> <span><a aria-hidden="true" href="#cb32-11"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb32-12"></a> <span class="co"># we use the same filename as for the final html</span></span> <span><a aria-hidden="true" href="#cb32-13"></a> <span class="co"># that way we do not need to delete the temporary file</span></span> <span><a aria-hidden="true" href="#cb32-14"></a> <span class="cf">with</span> <span class="bu">open</span>(htmlpath, <span class="st">'w'</span>, encoding<span class="op">=</span><span class="st">'utf-8'</span>) 
<span class="im">as</span> outfile:</span> <span><a aria-hidden="true" href="#cb32-15"></a> <span class="bu">print</span>(html_doc, <span class="bu">file</span><span class="op">=</span>outfile)</span> <span><a aria-hidden="true" href="#cb32-16"></a> outfile.flush()</span> <span><a aria-hidden="true" href="#cb32-17"></a> os.fsync(outfile)</span> <span><a aria-hidden="true" href="#cb32-18"></a> outfile.close()</span> <span><a aria-hidden="true" href="#cb32-19"></a></span> <span><a aria-hidden="true" href="#cb32-20"></a> <span class="co"># with webdriver</span></span> <span><a aria-hidden="true" href="#cb32-21"></a> chrome_options <span class="op">=</span> Options()</span> <span><a aria-hidden="true" href="#cb32-22"></a> chrome_service <span class="op">=</span> <span class="op">\</span></span> <span><a aria-hidden="true" href="#cb32-23"></a> webdriver.ChromeService(executable_path<span class="op">=</span><span class="st">"/usr/bin/chromedriver"</span>)</span> <span><a aria-hidden="true" href="#cb32-24"></a> chrome_options.add_argument(<span class="st">"--headless"</span>)</span> <span><a aria-hidden="true" href="#cb32-25"></a> chrome_options.add_argument(<span class="st">"--enable-javascript"</span>)</span> <span><a aria-hidden="true" href="#cb32-26"></a> chrome_options.add_argument(<span class="st">"--window.navigator.webdriver=False"</span>)</span> <span><a aria-hidden="true" href="#cb32-27"></a> chrome_options.add_argument(<span class="st">"--useAutomationExtension=False"</span>)</span> <span><a aria-hidden="true" href="#cb32-28"></a> chrome_options.add_argument(<span class="st">"--disable-blink-features=AutomationControlled"</span>)</span> <span><a aria-hidden="true" href="#cb32-29"></a> chrome_options.add_argument(<span class="st">"--user-agent=Mozilla/5.0 (X11; Linux x86_64)"</span> \</span> <span><a aria-hidden="true" href="#cb32-30"></a> <span class="st">" AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 "</span> \</span> <span><a aria-hidden="true" 
href="#cb32-31"></a> <span class="st">"Safari/537.36"</span>)</span> <span><a aria-hidden="true" href="#cb32-32"></a> chrome_options.add_experimental_option(<span class="st">"excludeSwitches"</span>,</span> <span><a aria-hidden="true" href="#cb32-33"></a> [<span class="st">"enable-automation"</span>])</span> <span><a aria-hidden="true" href="#cb32-34"></a> browser <span class="op">=</span> webdriver.Chrome(options<span class="op">=</span>chrome_options, service<span class="op">=</span>chrome_service)</span> <span><a aria-hidden="true" href="#cb32-35"></a></span> <span><a aria-hidden="true" href="#cb32-36"></a> temp <span class="op">=</span> workpath <span class="op">/</span> htmlpath</span> <span><a aria-hidden="true" href="#cb32-37"></a> temp <span class="op">=</span> temp.resolve()</span> <span><a aria-hidden="true" href="#cb32-38"></a> browser.get( <span class="ss">f"file://</span><span class="sc">{</span>temp<span class="sc">}</span><span class="ss">"</span> )</span> <span><a aria-hidden="true" href="#cb32-39"></a> <span class="co"># wait till the page finished changing</span></span> <span><a aria-hidden="true" href="#cb32-40"></a> time.sleep(<span class="dv">3</span>)</span> <span><a aria-hidden="true" href="#cb32-41"></a></span> <span><a aria-hidden="true" href="#cb32-42"></a> test <span class="op">=</span> <span class="st">""</span></span> <span><a aria-hidden="true" href="#cb32-43"></a> <span class="cf">while</span> <span class="va">True</span>:</span> <span><a aria-hidden="true" href="#cb32-44"></a> html_doc <span class="op">=</span> browser.page_source</span> <span><a aria-hidden="true" href="#cb32-45"></a> <span class="cf">if</span> html_doc <span class="kw">in</span> test:</span> <span><a aria-hidden="true" href="#cb32-46"></a> <span class="cf">break</span></span> <span><a aria-hidden="true" href="#cb32-47"></a> test <span class="op">=</span> html_doc</span> <span><a aria-hidden="true" href="#cb32-48"></a></span> <span><a aria-hidden="true" 
href="#cb32-49"></a> <span class="cf">return</span> html_doc</span></code></pre> </div> <p> I consider the use of chromium a workaround, and I always tell everyone that one workaround breeds the next. And indeed, the next workaround follows immediately in the form of <code> time.sleep(3) </code> . </p> <p> I am not dogmatic about workarounds, only cautious, so I am leaving this as it is for now. But I expect this part of the code to change in the future. </p> <h3> link_pdf() </h3> <p> Finally, after the PDF has been written to disk, the article HTML needs a link to the PDF in the article header. </p> <p> <strong> display.py - link_pdf() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb33-1"></a><span class="kw">def</span> link_pdf(soup, urn):</span> <span><a aria-hidden="true" href="#cb33-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb33-3"></a><span class="co"> Adds the link to the PDF into the article header, represented by a</span></span> <span><a aria-hidden="true" href="#cb33-4"></a><span class="co"> pdfimage.</span></span> <span><a aria-hidden="true" href="#cb33-5"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb33-6"></a></span> <span><a aria-hidden="true" href="#cb33-7"></a> pdfimage <span class="op">=</span> <span class="st">"3cd97bab8bb20288768b35fd72979ec3bbf4b2a8.png"</span></span> <span><a aria-hidden="true" href="#cb33-8"></a></span> <span><a aria-hidden="true" href="#cb33-9"></a> header <span class="op">=</span> soup.find(<span class="st">"header"</span>)</span> <span><a aria-hidden="true" href="#cb33-10"></a> divs <span class="op">=</span> header.find_all(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" href="#cb33-11"></a> <span class="cf">if</span> <span class="bu">len</span>(divs) <span class="op">&lt;</span> <span class="dv">2</span>:</span> <span><a aria-hidden="true" 
href="#cb33-12"></a> <span class="bu">print</span>(<span class="st">"Call article_pdf() last before printing html"</span>)</span> <span><a aria-hidden="true" href="#cb33-13"></a> <span class="cf">return</span></span> <span><a aria-hidden="true" href="#cb33-14"></a></span> <span><a aria-hidden="true" href="#cb33-15"></a> div <span class="op">=</span> divs[<span class="dv">1</span>]</span> <span><a aria-hidden="true" href="#cb33-16"></a></span> <span><a aria-hidden="true" href="#cb33-17"></a> figure <span class="op">=</span> soup.new_tag(<span class="st">"figure"</span>)</span> <span><a aria-hidden="true" href="#cb33-18"></a> div.insert(<span class="dv">1</span>, figure)</span> <span><a aria-hidden="true" href="#cb33-19"></a> anchor <span class="op">=</span> soup.new_tag(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb33-20"></a> figure.append(anchor)</span> <span><a aria-hidden="true" href="#cb33-21"></a></span> <span><a aria-hidden="true" href="#cb33-22"></a> anchor.attrs.update({<span class="st">"accesskey"</span>: <span class="st">"p"</span>,</span> <span><a aria-hidden="true" href="#cb33-23"></a> <span class="co"># "download": "",</span></span> <span><a aria-hidden="true" href="#cb33-24"></a> <span class="st">"href"</span>: <span class="ss">f"./pdf/</span><span class="sc">{</span>urn<span class="sc">}</span><span class="ss">.pdf"</span>,</span> <span><a aria-hidden="true" href="#cb33-25"></a> <span class="st">"target"</span>: <span class="st">"_blank"</span>,</span> <span><a aria-hidden="true" href="#cb33-26"></a> <span class="st">"type"</span>: <span class="st">"application/pdf"</span></span> <span><a aria-hidden="true" href="#cb33-27"></a> })</span> <span><a aria-hidden="true" href="#cb33-28"></a></span> <span><a aria-hidden="true" href="#cb33-29"></a> <span class="co"># Inject the PDF Icon</span></span> <span><a aria-hidden="true" href="#cb33-30"></a> img <span class="op">=</span> soup.new_tag(<span class="st">"img"</span>)</span> 
<span><a aria-hidden="true" href="#cb33-31"></a> anchor.append(img)</span> <span><a aria-hidden="true" href="#cb33-32"></a> img.attrs.update({<span class="st">"src"</span>: <span class="ss">f"./idee/website/image/</span><span class="sc">{</span>pdfimage<span class="sc">}</span><span class="ss">"</span>})</span></code></pre> </div> <p> This is really the last thing done before the article HTML is written to disk and shown in the Firefox web browser. </p> <p> Once everything is copy-edited and all page breaks are nicely set for the PDF, the next step is the handover to deployment via git. </p> <!-- page-break --> <h2> IdeePublish </h2> <p> The handover to deployment is done with the command <code> IdeePublish </code> . </p> <p> <strong> publish.py - imports and debugging </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb34-1"></a><span class="co">#!/usr/bin/python3</span></span> <span><a aria-hidden="true" href="#cb34-2"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb34-3"></a><span class="co">Publish generated HTML and attached assets to the idee website git.</span></span> <span><a aria-hidden="true" href="#cb34-4"></a></span> <span><a aria-hidden="true" href="#cb34-5"></a><span class="co">@date: 2026-02-05</span></span> <span><a aria-hidden="true" href="#cb34-6"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb34-7"></a></span> <span><a aria-hidden="true" href="#cb34-8"></a><span class="im">import</span> sys</span> <span><a aria-hidden="true" href="#cb34-9"></a><span class="im">import</span> os</span> <span><a aria-hidden="true" href="#cb34-10"></a><span class="im">import</span> string</span> <span><a aria-hidden="true" href="#cb34-11"></a><span class="im">import</span> shutil</span> <span><a aria-hidden="true" href="#cb34-12"></a><span class="im">import</span> getopt</span> <span><a aria-hidden="true" 
href="#cb34-13"></a><span class="im">from</span> pathlib <span class="im">import</span> Path</span> <span><a aria-hidden="true" href="#cb34-14"></a><span class="im">from</span> bs4 <span class="im">import</span> BeautifulSoup</span> <span><a aria-hidden="true" href="#cb34-15"></a><span class="im">from</span> bs4.builder._htmlparser <span class="im">import</span> HTMLParserTreeBuilder</span> <span><a aria-hidden="true" href="#cb34-16"></a></span> <span><a aria-hidden="true" href="#cb34-17"></a>...</span> <span><a aria-hidden="true" href="#cb34-18"></a></span> <span><a aria-hidden="true" href="#cb34-19"></a><span class="cf">if</span> <span class="va">__name__</span> <span class="op">==</span> <span class="st">"__main__"</span>:</span> <span><a aria-hidden="true" href="#cb34-20"></a></span> <span><a aria-hidden="true" href="#cb34-21"></a> MARKDOWNFILE <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb34-22"></a></span> <span><a aria-hidden="true" href="#cb34-23"></a> <span class="cf">try</span>:</span> <span><a aria-hidden="true" href="#cb34-24"></a> opts, args <span class="op">=</span> getopt.getopt(sys.argv[<span class="dv">1</span>:], [<span class="st">"o"</span>])</span> <span><a aria-hidden="true" href="#cb34-25"></a></span> <span><a aria-hidden="true" href="#cb34-26"></a> <span class="cf">except</span> getopt.GetoptError:</span> <span><a aria-hidden="true" href="#cb34-27"></a> <span class="bu">print</span>(<span class="st">"No Parameter given"</span>)</span> <span><a aria-hidden="true" href="#cb34-28"></a> sys.exit(<span class="dv">2</span>)</span> <span><a aria-hidden="true" href="#cb34-29"></a></span> <span><a aria-hidden="true" href="#cb34-30"></a> <span class="cf">if</span> <span class="bu">len</span>(args) <span class="op">==</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" href="#cb34-31"></a> <span class="bu">print</span>(<span class="st">"No Parameter given"</span>)</span> <span><a 
aria-hidden="true" href="#cb34-32"></a> sys.exit(<span class="dv">2</span>)</span> <span><a aria-hidden="true" href="#cb34-33"></a></span> <span><a aria-hidden="true" href="#cb34-34"></a> MARKDOWNFILE <span class="op">=</span> args[<span class="dv">0</span>]</span> <span><a aria-hidden="true" href="#cb34-35"></a> <span class="cf">if</span> <span class="kw">not</span> MARKDOWNFILE:</span> <span><a aria-hidden="true" href="#cb34-36"></a> <span class="bu">print</span>(<span class="st">"No Parameter given"</span>)</span> <span><a aria-hidden="true" href="#cb34-37"></a> sys.exit(<span class="dv">2</span>)</span> <span><a aria-hidden="true" href="#cb34-38"></a></span> <span><a aria-hidden="true" href="#cb34-39"></a> publish(Path(MARKDOWNFILE))</span> <span><a aria-hidden="true" href="#cb34-40"></a></span> <span><a aria-hidden="true" href="#cb34-41"></a> sys.exit(<span class="dv">0</span>)</span></code></pre> </div> <p> The code shows the imports and the <code> __main__ </code> part used to call the publish function from the terminal, e.g. for debugging. The ellipsis in the middle of this code snippet stands for all the functions in between, which are shown one by one below. </p> <!-- page-break --> <h3> Function publish() </h3> <p> The publish function does not have much to do. It boils down to the following steps: </p> <ul class="incremental"> <li> adjust resource links to fit the target location </li> <li> move the assets to the target location </li> <li> copy the markdown file to the author directory of the idee website project. </li> </ul> <p> The last step matters because it keeps the source of the published article available for corrections that might be required later. </p> <p> Often something that needs correcting turns up right after publishing. Copy-editing sometimes fails, especially if the author is also the copy-editor, as in my case. 
I therefore decided not to clean up the authoring directory automatically, but to delete it manually some days after publishing. </p> <p> <strong> publish.py - publish() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb35-1"></a><span class="kw">def</span> publish(filepath: Path):</span> <span><a aria-hidden="true" href="#cb35-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb35-3"></a><span class="co"> Creates an html web page from the markdown file.</span></span> <span><a aria-hidden="true" href="#cb35-4"></a></span> <span><a aria-hidden="true" href="#cb35-5"></a><span class="co"> The html web page is fully flavored for that site, including the optional</span></span> <span><a aria-hidden="true" href="#cb35-6"></a><span class="co"> linked in pdf and audio versions and content licence information.</span></span> <span><a aria-hidden="true" href="#cb35-7"></a></span> <span><a aria-hidden="true" href="#cb35-8"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb35-9"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb35-10"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb35-11"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb35-12"></a> mdpath <span class="op">=</span> Path(filepath)</span> <span><a aria-hidden="true" href="#cb35-13"></a> workdirpath <span class="op">=</span> mdpath.parents[<span class="dv">0</span>]</span> <span><a aria-hidden="true" href="#cb35-14"></a></span> <span><a aria-hidden="true" href="#cb35-15"></a> <span class="co"># check that this a designated workdirectory</span></span> <span><a aria-hidden="true" href="#cb35-16"></a> idee <span class="op">=</span> workdirpath <span class="op">/</span> <span class="st">"idee"</span></span> <span><a aria-hidden="true" href="#cb35-17"></a> <span 
class="cf">if</span> <span class="kw">not</span> idee.exists():</span> <span><a aria-hidden="true" href="#cb35-18"></a> <span class="bu">print</span>(<span class="st">"Use first IdeeFolders to designate this location as"</span></span> <span><a aria-hidden="true" href="#cb35-19"></a> <span class="st">" workdirectory, and use IdeeDisplay to generate idee-website"</span></span> <span><a aria-hidden="true" href="#cb35-20"></a> <span class="st">" styled HTML"</span></span> <span><a aria-hidden="true" href="#cb35-21"></a> )</span> <span><a aria-hidden="true" href="#cb35-22"></a> sys.exit(<span class="dv">0</span>)</span> <span><a aria-hidden="true" href="#cb35-23"></a></span> <span><a aria-hidden="true" href="#cb35-24"></a> htmlpath <span class="op">=</span> workdirpath <span class="op">/</span> mdpath.stem</span> <span><a aria-hidden="true" href="#cb35-25"></a> htmlpath <span class="op">=</span> htmlpath.with_suffix(<span class="st">".html"</span>)</span> <span><a aria-hidden="true" href="#cb35-26"></a> <span class="co"># read html</span></span> <span><a aria-hidden="true" href="#cb35-27"></a> <span class="cf">if</span> htmlpath.exists():</span> <span><a aria-hidden="true" href="#cb35-28"></a> builder <span class="op">=</span> HTMLParserTreeBuilder()</span> <span><a aria-hidden="true" href="#cb35-29"></a> <span class="cf">with</span> <span class="bu">open</span>(htmlpath, <span class="st">'r'</span>, encoding<span class="op">=</span><span class="st">'utf-8'</span>) <span class="im">as</span> pf:</span> <span><a aria-hidden="true" href="#cb35-30"></a> published_html <span class="op">=</span> pf.read()</span> <span><a aria-hidden="true" href="#cb35-31"></a> pf.close()</span> <span><a aria-hidden="true" href="#cb35-32"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb35-33"></a> <span class="bu">print</span>(<span class="st">"Use IdeeDisplay to generate idee-website style HTML first"</span>)</span> <span><a aria-hidden="true" 
href="#cb35-34"></a> sys.exit(<span class="dv">0</span>)</span> <span><a aria-hidden="true" href="#cb35-35"></a></span> <span><a aria-hidden="true" href="#cb35-36"></a> soup <span class="op">=</span> BeautifulSoup(published_html, builder<span class="op">=</span>builder)</span> <span><a aria-hidden="true" href="#cb35-37"></a> resource_paths(soup)</span> <span><a aria-hidden="true" href="#cb35-38"></a> html_doc <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" href="#cb35-39"></a></span> <span><a aria-hidden="true" href="#cb35-40"></a> outpath <span class="op">=</span> workdirpath <span class="op">/</span> <span class="st">"idee"</span> <span class="op">/</span> <span class="st">"website"</span> <span class="op">/</span> <span class="st">"article"</span> <span class="op">/</span> mdpath.stem</span> <span><a aria-hidden="true" href="#cb35-41"></a> outpath <span class="op">=</span> outpath.with_suffix(<span class="st">".html"</span>)</span> <span><a aria-hidden="true" href="#cb35-42"></a> <span class="cf">with</span> <span class="bu">open</span>(outpath, <span class="st">"w"</span>, encoding<span class="op">=</span><span class="st">'utf-8'</span>) <span class="im">as</span> outfile:</span> <span><a aria-hidden="true" href="#cb35-43"></a> <span class="bu">print</span>(html_doc, <span class="bu">file</span><span class="op">=</span>outfile)</span> <span><a aria-hidden="true" href="#cb35-44"></a> outfile.flush()</span> <span><a aria-hidden="true" href="#cb35-45"></a> outfile.close()</span> <span><a aria-hidden="true" href="#cb35-46"></a></span> <span><a aria-hidden="true" href="#cb35-47"></a> copy_assets(workdirpath)</span> <span><a aria-hidden="true" href="#cb35-48"></a> <span class="co"># copy the markdown file itself also</span></span> <span><a aria-hidden="true" href="#cb35-49"></a> shutil.copy(mdpath, idee <span class="op">/</span> <span class="st">"author"</span>)</span> <span><a aria-hidden="true" href="#cb35-50"></a></span> <span><a 
aria-hidden="true" href="#cb35-51"></a> <span class="bu">print</span>(<span class="ss">f"Published to: </span><span class="sc">{</span>outpath<span class="sc">}</span><span class="ss">"</span>)</span></code></pre> </div> <h3> Function resource_paths() </h3> <p> This function adjusts the relative paths to the article's assets as required by the web-site's directory structure. </p> <p> <strong> publish.py - resource_paths() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb36-1"></a><span class="kw">def</span> resource_paths(soup):</span> <span><a aria-hidden="true" href="#cb36-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb36-3"></a><span class="co"> Change the resource paths to fit for the idee website</span></span> <span><a aria-hidden="true" href="#cb36-4"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb36-5"></a> <span class="cf">for</span> i <span class="kw">in</span> [<span class="st">"src"</span>, <span class="st">"href"</span>]:</span> <span><a aria-hidden="true" href="#cb36-6"></a> tags <span class="op">=</span> soup.find_all(attrs<span class="op">=</span>{i: <span class="va">True</span>})</span> <span><a aria-hidden="true" href="#cb36-7"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb36-8"></a> p <span class="op">=</span> tag[i]</span> <span><a aria-hidden="true" href="#cb36-9"></a> p <span class="op">=</span> p.replace(<span class="st">"./idee/website/"</span>, <span class="st">"./"</span>)</span> <span><a aria-hidden="true" href="#cb36-10"></a> p <span class="op">=</span> p.replace(<span class="st">"./"</span>, <span class="st">"../"</span>)</span> <span><a aria-hidden="true" href="#cb36-11"></a> tag.attrs.update({i: p})</span></code></pre> </div> <h3> Function copy_assets() </h3> <p> The function <code> copy_assets() </code> copies the assets in 
the authoring subfolders to their target location in the web-site folder structure. </p> <p> <strong> publish.py - copy_assets() </strong> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb37-1"></a><span class="kw">def</span> copy_assets(workpath):</span> <span><a aria-hidden="true" href="#cb37-2"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb37-3"></a><span class="co"> Copy the assets located in subfolders to the respective subfolder in the</span></span> <span><a aria-hidden="true" href="#cb37-4"></a><span class="co"> idee-website</span></span> <span><a aria-hidden="true" href="#cb37-5"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb37-6"></a> <span class="cf">for</span> d <span class="kw">in</span> [<span class="st">"audio"</span>, <span class="st">"files"</span>, <span class="st">"image"</span>, <span class="st">"js"</span>, <span class="st">"pdf"</span>, <span class="st">"qrcode"</span>]:</span> <span><a aria-hidden="true" href="#cb37-7"></a> wd <span class="op">=</span> workpath <span class="op">/</span> d</span> <span><a aria-hidden="true" href="#cb37-8"></a> wtd <span class="op">=</span> workpath <span class="op">/</span> <span class="st">"idee"</span> <span class="op">/</span> <span class="st">"website"</span> <span class="op">/</span> d</span> <span><a aria-hidden="true" href="#cb37-9"></a> <span class="cf">for</span> f <span class="kw">in</span> os.listdir(wd):</span> <span><a aria-hidden="true" href="#cb37-10"></a> wdf <span class="op">=</span> wd <span class="op">/</span> f</span> <span><a aria-hidden="true" href="#cb37-11"></a> <span class="cf">if</span> os.path.isfile(wdf):</span> <span><a aria-hidden="true" href="#cb37-12"></a> shutil.copy(wdf, wtd)</span></code></pre> </div> <h2> Changes in vimrc </h2> <p> There is just one more thing to mention. 
My vimrc file, which by default is in the folder <code> ~/.vim/ </code> , got some additional lines defining a key mapping, in normal as well as in insert mode, to run the command <code> IdeeMeta </code> . </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb38-1"></a><span class="st">" Markdown autocmd for idee ftplugin</span></span> <span><a aria-hidden="true" href="#cb38-2"></a><span class="st">augroup md</span></span> <span><a aria-hidden="true" href="#cb38-3"></a><span class="st"> autocmd!</span></span> <span><a aria-hidden="true" href="#cb38-4"></a><span class="st"> autocmd FileType markdown imap &lt;C-y&gt; &lt;C-O&gt;:IdeeMeta&lt;CR&gt;</span></span> <span><a aria-hidden="true" href="#cb38-5"></a><span class="st"> autocmd FileType markdown nmap &lt;C-y&gt; :IdeeMeta&lt;CR&gt;</span></span> <span><a aria-hidden="true" href="#cb38-6"></a><span class="st">augroup END</span></span></code></pre> </div> <p> This makes it very smooth to insert the comments used for meta data and to control aspects of the HTML/PDF creation. </p> <h2> Deployment </h2> <p> Deployment is a separate topic. As a teaser: I commit everything placed into the idee web-site project to git. During this commit the RSS feed, an archive page for the month, the index page of the web-site and the sitemap.xml are updated. </p> <p> These then need a second commit. It does not seem possible to avoid that. </p> <p> After pushing to the server-git, a client-git on the web-server is triggered automatically to fetch the latest changes, whereby they go live on the web-site. </p> <p> Two commits and one push put a new or updated article live. </p> <p> That is the topic of a future article; stay tuned. </p> <h2> Final Notes </h2> <p> Obviously this code is highly specialized for my own purposes. It is not meant to be copied, pasted and used as is in any other setting. 
</p> <p> It shows how I adjusted my own workspace to my own needs, which tools I used for this and how I utilized them. </p> <p> I'm pretty sure it can be adjusted to serve the needs of someone else, and it could probably also be developed further to become customizable. At present I have no plans to do this. </p> <p> However, I hope this serves as a showcase of how to get rid of all the 3rd-party scripts loaded from 3rd-party servers, which do not much more than expose web-site visitors to private data sellers. </p> <p> If you think it worthwhile to derive a customizable version of this, usable by a broader audience, go ahead and do it. It's Creative Commons Zero after all. </p> <p> Or just ask. If someone asks for it, I have an incentive to go on with it. </p> <h2> Footnotes </h2> <!-- references --> <hr/> <ol> <li> <p> <a href="https://www.mdpi.com/2073-8994/14/2/300"> The Cosmological Constant as Event Horizon </a> ; Enrique Gaztanaga; Symmetry, volume 14; Multidisciplinary Digital Publishing Institute; DOI: <a href="https://doi.org/10.3390/sym14020300"> <span> https://doi.org/10.3390/sym14020300 </span> </a> ; 2022-02-01 </p> </li> <li> <p> <a href="https://xclacksoverhead.org/home/about"> XClacksOverhead.org </a> ; XClacksOverhead.org; X-Clacks-Overhead </p> </li> </ol> </div>]]></content:encoded>
  </item>
  <item>
   <title>Taking England's all cause mortality data literally</title>
   <link>https://idee.frank-siebert.de/article/taking-englands-all-cause-mortality-data-literally.html</link>
   <pubDate>Sat, 18 Dec 2021 12:26:27 +0000</pubDate>
   <guid isPermaLink="false">https://idee.frank-siebert.de/article/taking-englands-all-cause-mortality-data-literally.html</guid>
   <description><![CDATA[Update 2022-09-17: After the migration from WordPress one picture went missing; it is now linked correctly. I also realised that I had mentioned people dropping out of the vaccinated group without getting registered in the next group. The word vaccinated is now corrected to unvaccinated. ...]]></description>
   <content:encoded><![CDATA[<div> <div> <h1> Taking England's all cause mortality data literally </h1> <div> <time datetime="2021-12-18 12:26:27" pubdate="true"> 2021-12-18 </time> <address> Frank Siebert </address> </div> <div> <figure> <a href="https://idee.frank-siebert.de/qrcode/taking-englands-all-cause-mortality-data-literally.png"> <img src="https://idee.frank-siebert.de/qrcode/taking-englands-all-cause-mortality-data-literally.png"/> </a> <figcaption> </figcaption> </figure> <figure> <a accesskey="p" href="https://idee.frank-siebert.de/pdf/taking-englands-all-cause-mortality-data-literally.pdf" target="_blank" type="application/pdf"> <img src="https://idee.frank-siebert.de/image/3cd97bab8bb20288768b35fd72979ec3bbf4b2a8.png"/> </a> </figure> <a href="https://idee.frank-siebert.de/article/creative-commons-cc0-1-0-universal.html"> <img src="https://idee.frank-siebert.de/image/CC-Icon.png"/> </a> <a href="https://idee.frank-siebert.de/article/creative-commons-cc0-1-0-universal.html"> <img src="https://idee.frank-siebert.de/image/CC0-Icon.png"/> </a> </div> </div> <p> <strong> Update 2022-09-17: </strong> After the migration from WordPress one picture went missing; it is now linked correctly. I also realised that I had mentioned people dropping out of the <strong> vaccinated </strong> group without getting registered in the next group. The word <strong> vaccinated </strong> is now corrected to <strong> unvaccinated </strong> . </p> <p> The data from the ONS dataset "Deaths by vaccination status, England" <sup> ( 1 ) </sup> from the Office for National Statistics got a lot of attention. Information like this is pretty much unavailable everywhere else. In Germany I do not know of any source providing such information. </p> <p> I already referred in an article to the work done on this data set by Norman Fenton and Martin Neil <sup> ( 2 ) </sup> . </p> <p> Follow me into a deep dive into some aspects of the data, and what it tells us if taken literally. 
</p> <h2> Conclusions </h2> <p> The data about all cause deaths by vaccination status in England, from the Office for National Statistics, makes the following interesting statements: </p> <ul class="incremental"> <li> The vaccinations increase the all cause death rate of the unvaccinated. </li> <li> It is possible to get vaccinated with both doses on the same day. </li> </ul> <p> This, I want to emphasize, is not a statement from me. It is a statement encoded in the data from the Office for National Statistics. I explicitly emphasize that I do not believe in the truth of these two statements. </p> <p> In the following part you may follow my investigation into the data, to dig out these two statements together with me. As I dug even deeper, I came to a different conclusion. Feel free to draw your own. </p> <h2> Investigating the Data </h2> <p> On Tab 4 of the Excel file provided, all cause mortality data is provided with the following column names: "Week ending", "Week number", "Vaccination status", "Age group", "Number of deaths", "Population", "Death rate per 100,000", "Lower confidence limit", "Upper confidence limit". </p> <p> You might easily miss one column, which comes without any column header, featuring some 'u' entries to mark these lines as "unreliable". The criteria for this flag would probably be of interest, but they are not documented below the table. From the position of that column we can tell that it refers to the "Death rate per 100,000", as do the two confidence limits. </p> <p> But we learn from the documentation below the table: <em> Figures are based on provisional mortality data and the Public Health Data Asset (PHDA), a linked dataset of people resident in England who could be linked to the 2011 Census and GP Patient Register. 
</em> </p> <p> The table provides the data separated into 4 age groups and 4 vaccination statuses: the age groups '10-59', '60-69', '70-79' and '80+', and the vaccination statuses 'Unvaccinated', 'Within 21 days of the first dose', '21 days or more after first dose' and 'Second dose'. </p> <p> Why ONS decided to use only one big age group for the 10 to 59 year old people remains a mystery to me. It would have made the life of countless statisticians much simpler if the age groups had been aligned with those of the vaccination status tracking. </p> <h3> Calculated age specific death rate versus provided age specific death rate </h3> <p> For counting numbers of deaths by vaccination status and age group and calculating the death rate by dividing deaths by the group size, you wouldn't need a confidence interval, if you knew those numbers. </p> <p> But if you do not know these numbers by counting, or do not know them exactly, wouldn't you state a confidence interval for each number you are not sure about? Can we find out more about the source of the confidence limits and the "unreliable" marker? </p> <p> Let us take some lines to compare the self-calculated death rate with the provided death rate. On June 4th, the unvaccinated age group 10 to 59 registers 125 deaths and a population size of 11,832,842 people. </p> <p> To calculate the death rate per 100,000 we just multiply 125 by 100,000 and divide the result by 11,832,842. </p> <p> 125*100,000/11,832,842 = 1.056381 </p> <p> With just one digit shown after the decimal point, our result matches the result in the sheet. But we should take a line where more digits are available (it does not matter whether these are in front of or behind the decimal point) to get a better resolution in our comparison. </p> <p> The June 4 unvaccinated 80+ entry seems to be suitable, not being marked as unreliable by a lower-case "u". 
</p> <p> 209*100,000/75,045 = 278.499566 </p> <p> We get a perfect match of self-calculated and provided death rate for this line as well. This is the point where it is probably better to start using the spreadsheet we are sitting in front of, to make that comparison for all lines in search of a difference. </p> <p> Doing this, you can discover that all self-calculated death rates match those provided as age-specific rate per 100,000. In a number of lines the death rate is omitted by ONS; in some other lines a lower-case "u" indicates unreliability caused by too low numbers. </p> <p> This u-signal raises questions by itself. It might or might not be related to the confidence interval, but I have not seen such information in the documentation so far. What we found out by the investigation: the age-specific rate per 100,000 seemingly was not altered by any additional information. It is really just calculated by the most simple math. </p> <h3> A look at the group transitions </h3> <p> What happens at the time of vaccinations with regard to the death toll? </p> <p> To look at this, we first use the death rate of the two neighboring groups, and visualize less prominently the population change, which indicates the vaccine uptake, while to a lesser degree it also includes the population change by death. But as that second aspect is not dominating, we ignore it for a start. </p> <p> We start our view with the age class 70 to 79, because the age class 80+ started vaccination in 2020 and would not provide the full view of the scenario. We do not start with the group 10 to 59, because that group is much too big and the vaccination uptake of its undisclosed subgroups is spread over a long time period. </p> <p> The age group 70 to 79, being a small age group embedded between two other small age groups, is the perfect choice to start with. 
</p> <p> <strong> Graph 1 </strong> </p> <figure> <a href="https://idee.frank-siebert.de/image/ONS-All-Cause-Mortality-Rate-2021-70-79-Unvaccinated-and-21-Days.png" target="_blank"> <img src="https://idee.frank-siebert.de/image/ONS-All-Cause-Mortality-Rate-2021-70-79-Unvaccinated-and-21-Days.png"/> </a> <figcaption> No Caption </figcaption> </figure> <p> This graph tells us that the death rate of the unvaccinated follows, in very good approximation, the group-size changes of the "Within 21 days of the first dose" group. </p> <p> If this is puzzling you, then you are not alone. </p> <p> <strong> If we take this result literally, we have to claim that the first dose of vaccine kills the unvaccinated. </strong> </p> <p> The wild peaks up and down in the death rate of the "Within 21 days of the first dose" group starting around CW 17 are due to the small population size, and need not bother you at all. </p> <p> If we do not take this result literally, then we need to think about the question: <strong> What is wrong with this data? </strong> </p> <p> Three possible explanations for such an effect were given by Norman Fenton, Professor of Risk Information Management, and Martin Neil, Professor of Computer Science and Statistics, both teaching at Queen Mary University of London, asking: "Is vaccine efficacy a statistical illusion?" <sup> ( 3 ) </sup> : </p> <ol class="incremental"> <li> A delay in the reporting of deaths could be responsible, leading in a shrinking population to a spike in the death rate, which inversely follows the rate of shrinking. Officially this explanation cannot apply, since ONS claims the data is based on the day of death, not on the day of death reporting. </li> <li> The population of the unvaccinated may have been underestimated, which would lead to a growing death rate as the population shrinks. </li> <li> Deaths may have been wrongly categorized, meaning vaccinated deaths are attributed to the unvaccinated group. 
</li> </ol> <p> In the end other explanations or a combination of explanations might also be possible. We don't yet know the explanation, but we know that something is wrong with this data, because we have to believe that the unvaccinated are not dying from the vaccination of others. So the data as we see it requires an explanation. To claim it is true that unvaccinated people are dying because others get their vaccine is an insult to our common sense. </p> <p> We found an artifact in this data, which definitely shows an effect not existing in real life. As we do not know the cause of that artifact, it seems impossible to remove it reliably. But maybe it is possible; let's take a deeper look. </p> <h3> Death rates only by age group </h3> <p> For further analysis, I installed PostgreSQL <sup> ( 4 ) </sup> and wrote some SQL views. This way it is best documented what the respective view does exactly to produce the graph shown here <sup> ( 5 ) </sup> . </p> <p> <strong> Graph 2 </strong> </p> <figure> <a href="https://idee.frank-siebert.de/image/6228c874aba172a2532192cc5408165c804f71b6.png" target="_blank"> <img src="https://idee.frank-siebert.de/image/6228c874aba172a2532192cc5408165c804f71b6.png"/> </a> <figcaption> No Caption </figcaption> </figure> <p> This graph shows us that nothing special happens with the overall death rates. The highest death rate in all age groups is in CW 3, indicating that this is not something vaccination related, but simply a seasonal increase as usual in the winter season. </p> <p> Neither the COVID deaths nor the vaccination deaths are expected to show up here dominantly, and the all cause mortality was in decline at the time when the strange increase in all cause mortality in the unvaccinated shows up in the ONS data. </p> <p> <strong> This can be seen as an additional indicator that the strange late death spike in the unvaccinated group is just a statistics-only effect without real-life relevance. 
</strong> </p> <h3> Death rate inoculated versus uninoculated </h3> <p> In the further deep dive I'll focus mainly on the age group 70-79. Less as a deep dive but just to have it, I also created a simplified view <sup> ( 6 ) </sup> distinguishing only two vaccination status groups: the group of inoculated people who got at least one shot, and the group of uninoculated people who never got any COVID shot. </p> <p> <strong> Graph 3 </strong> </p> <figure> <a href="https://idee.frank-siebert.de/image/9fc2e517200c52424484faf739ab443a5e90bf0b.png" target="_blank"> <img src="https://idee.frank-siebert.de/image/9fc2e517200c52424484faf739ab443a5e90bf0b.png"/> </a> <figcaption> No Caption </figcaption> </figure> <p> A discussion of the conditions which could explain such a graph, where the vaccination seems to cause deaths in the unvaccinated, was provided by Norman Fenton, Professor of Risk Information Management, and Martin Neil, Professor of Computer Science and Statistics, both at Queen Mary University of London, in their article "Is vaccine efficacy a statistical illusion?" <sup> ( 7 ) </sup> . </p> <p> My intention is to go further with the investigation. </p> <h3> Checking the population balance </h3> <p> The data from the Office for National Statistics is limited to a population which took part in a Census in 2010 or probably the year after, plus one additional criterion I forgot about. The point is: it is a closed population, which can be left only by death and which is not joined by anyone at any time in 2021. </p> <p> Because it is a closed group of the population, we can check the changes in the sub-populations with simple math. </p> <p> E.g. the death rates calculated by ONS use the population number of the week and the deaths of the week to calculate the death rate. Checking some weeks, you can easily verify that statement. This means that the population given for the week is always the population at the beginning of that week. 
</p> <p> If we subtract the deaths of the week from the population of the week, we get the population at the end of the week, which would be the expected population at the beginning of the next week. </p> <p> If we did not have any age groups, this would already be all. But since we have age groups, one can move from one age group to another by having a birthday. For a start we can ignore this and come back to this point later. The separation into four different groups by vaccination status is also too complex for a start. </p> <p> Therefore the next graph just shows, for each age group, how much the previous week's end population differed from the week's start population. A negative balance (below zero) means that people join the group by birthday. A positive balance (above zero) means that people leave the group by birthday. At least that's the only known reason to change the age group, as deaths take no other part in this balance check than in the calculation of the population at the end of the week <sup> ( 8 ) </sup> . </p> <p> <strong> Graph 4 </strong> </p> <figure> <a href="https://idee.frank-siebert.de/image/5e50ca36cc693feb6f08e0cd12c26eb8e9cab021.png" target="_blank"> <img src="https://idee.frank-siebert.de/image/5e50ca36cc693feb6f08e0cd12c26eb8e9cab021.png"/> </a> <figcaption> No Caption </figcaption> </figure> <p> Looks strange? What's going on in the age group 10-59 in week 13? A post by Anonymous from December 1st explains that the Census was closed on the 27th of March in 2011 <sup> ( 9 ) </sup> . </p> <p> So my knowledge was weak: until the 27th of March in 2021, new persons (children) still joined the age group 10-59 by birthday. Now I know better. The age group 70-79, which I intended to focus on, is unaffected by this detail anyhow. </p> <p> But it is definitely good to know that I do not have to bother too much because of missing people suddenly appearing where there is no possibility to do so. 
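</p> <p> The balance plotted in Graph 4 is simple arithmetic. As a sketch with purely hypothetical numbers (illustrative only, not taken from the ONS sheet): a group that starts a week with 1,000,000 people and registers 500 deaths is expected to start the next week with 999,500 people. If the sheet then lists 1,003,500 people for the next week, the balance is: </p> <p> (1,000,000 - 500) - 1,003,500 = -4,000 </p> <p> The negative sign says that, in this made-up case, 4,000 people joined the group during that week, with the deaths already accounted for. 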
</p> <p> We see in this graph that a lot of people leave the group 10-59 every week to join the group 60-69. At the same time somewhat fewer move on into the group 70-79, causing a negative balance. The group 80+ cannot be left other than by death, therefore it needs to have a negative balance. </p> <p> Probably I should switch signs and let a positive balance mean that the group grows. I could easily do this, as it is the way it is now just by definition. But I stay with the current definition and implementation. It doesn't tell us whether the group grows anyhow, since deaths are unaccounted for in the balance. </p> <p> But we can also look at the population balance between the unvaccinated population and the "Within 21 days of the first dose" subgroups in the age group 70-79. </p> <p> <strong> Graph 5 </strong> </p> <figure> <a href="https://idee.frank-siebert.de/image/314d5d1e4cdd58f474e40936f1a7656ac8fd9616.png" target="_blank"> <img src="https://idee.frank-siebert.de/image/314d5d1e4cdd58f474e40936f1a7656ac8fd9616.png"/> </a> <figcaption> No Caption </figcaption> </figure> <p> We see a very good match until week 4, but then the two lines get out of sync. This is generally correct, though, since everyone has to leave the second group after the 21 days. </p> <h3> Calculating the balance of the 21 days group </h3> <p> Just by looking at the balance shown above, we cannot really tell whether the 21 days balance fits with the unvaccinated balance. 
But since the two balances are tightly related, the 21 days balance can easily be calculated from the unvaccinated balance <sup> ( 10 ) </sup> . </p> <p> <strong> Graph 6 </strong> </p> <figure> <a href="https://idee.frank-siebert.de/image/e2fe6492e84e0e9f73a5ca97ac3abb13bb3b64b6.png" target="_blank"> <img src="https://idee.frank-siebert.de/image/e2fe6492e84e0e9f73a5ca97ac3abb13bb3b64b6.png"/> </a> <figcaption> No Caption </figcaption> </figure> <p> The dotted red line is the calculated balance, based on the transition of people from the unvaccinated group to the 21 days group. The orange line is the line given by the difference between the end-of-last-week population and the beginning-of-new-week population as provided by the ONS data. </p> <p> The two curves are a very good match, with seemingly tiny mismatches between the 12th and 21st week. Taken literally, a large number of unvaccinated people were either vaccinated at that time, all dying and not being able to join the 21 days group, or an extraordinarily large number of people had their birthday in these weeks, making the unvaccinated disappear into the next age group. </p> <p> The event peaks in week 16, and if we look at Graph 4 we do not see any special birthday event with a change of age group reflected there for the age group 70-79. And death is also not an alternative, because if their deaths had been registered, the peak in the graph would not exist. </p> <p> <strong> We see people dropping out of the unvaccinated group without getting registered in the next group and without being registered as dead. </strong> </p> <p> The ledger is not in balance. The books are not well kept. </p> <p> But it's not much, is it? </p> <h3> Zooming into the deviation in the group transition from unvaccinated into the 21 days group </h3> <p> In Graph 4 we see the net movement between the groups, not directly, but we get an approximation. 
</p> <ul class="incremental"> <li> About 12,000 move every week from age group 10-59 into 60-69 </li> <li> About 9,000 move every week from age group 60-69 into 70-79 </li> <li> About 5,500 move every week from age group 70-79 into 80+ </li> </ul> <p> Sometimes it's more or less, but in the time frame we look at it's a quite stable situation. </p> <p> Looking at age group 70-79 we could also say: it's a 10 year group, each year having 365 days, every day being as likely a birthday as any other. We can calculate the weekly number of birthdays shifting members beyond 79 years as: New <sub> 80+ </sub> = Population <sub> 70-79 </sub> * 7 days / 10 years / 365 days/year <sup> ( 11 ) </sup> . </p> <p> This calculation gives an upper limit for our expectation of the range in which the population change of the group can be during one week without being very suspect. It is a very generous threshold, because: </p> <ul class="incremental"> <li> It is expected that the most vulnerable older people of the group got vaccinated earlier, making birthday events with a move to the age group 80+ less likely in the unvaccinated sub-group than in the overall population. </li> <li> The unvaccinated people joining from the next younger age group are not included in this view, and those would narrow down the threshold even more. </li> </ul> <p> On the other hand: </p> <ul class="incremental"> <li> Including the vaccinated people of this age group leaving into the age group 80+ would increase the threshold, but let's see if these about 5,500 weekly age group changing birthdays in this age group would make a difference. If we get that impression, we can extend our SQL statement. </li> </ul> <p> We can apply this view also to subgroups of that age group, which change their size because of vaccination events. 
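</p> <p> To get a feeling for the size of this threshold, here is a sketch with a purely hypothetical sub-group of 365,000 people (an illustrative round number, not a value from the ONS sheet), using the formula given above: </p> <p> 365,000 * 7 / 10 / 365 = 700 </p> <p> Under the uniform-birthday assumption, about 700 members of such a sub-group would turn 80 in a given week, and the threshold shrinks in proportion as the sub-group shrinks. 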
</p> <p> In the following picture the dotted lines show the upper and lower threshold for the maximal expected birthday related deviation in the weekly population balance, based on the size of the unvaccinated group of the age group 70-79. </p> <p> <strong> Graph 7 </strong> </p> <figure> <a href="https://idee.frank-siebert.de/image/5cb19dfbd35663933abfb2b95547f65f7f473088.png" target="_blank"> <img src="https://idee.frank-siebert.de/image/5cb19dfbd35663933abfb2b95547f65f7f473088.png"/> </a> <figcaption> No Caption </figcaption> </figure> <p> We see a quite strong signal here, way outside the expected birthday related effects. <strong> In CW 16 the deviation of the population imbalance exceeds the expected birthday related effects by about 80,000 people. </strong> </p> <p> What is the exact meaning of this signal? </p> <p> About 120,000 people left the unvaccinated group of this age group during the weeks 14, 15 and 16 without becoming members of the 21 days group. A possible move into the age group 80+ cannot explain this. Looking at Graph 4 we see very clearly that these 80,000 did not move to the 80+ group during those 3 weeks. </p> <p> In Graph 3 the group of inoculated people shows a significant increase in week 16, in sync with a decrease in the uninoculated people at the same time. This probably means that the major part of these 120,000 people moved into another group. The question remains: into which group did they move? </p> <h3> Searching for the transition target group of 120,000 people </h3> <p> Since the move into age group 80+ definitely cannot account for these 120,000 people, we have to take a look at other groups inside the age group 70-79. 
</p> <p> <strong> Graph 8 </strong> </p> <figure> <a href="https://idee.frank-siebert.de/image/24f8d5c9088fa35864feaaea74c55670b2c168cf.png" target="_blank"> <img src="https://idee.frank-siebert.de/image/24f8d5c9088fa35864feaaea74c55670b2c168cf.png"/> </a> <figcaption> No Caption </figcaption> </figure> <p> Now that our eye is trained by the data analysis we did up to this point, we can see immediately that at least the major part of these 120,000 unvaccinated people moved directly to the 2nd dose group. Do you see it? </p> <p> <strong> About 120,000 people moved directly from the unvaccinated into the 2nd dose group in the calendar weeks 14, 15 and 16. </strong> </p> <p> Do not ask me how this is possible. I only formulate a statement made by the "all cause deaths data set" from the Office for National Statistics. It is not my own statement, it is a statement made by the data itself. </p> <p> One possible reason comes to mind: Maybe they were vaccinated with the Janssen / Johnson&amp;Johnson single shot vaccine? </p> <p> The BBC reported on the 28th of May: "Janssen single-dose Covid vaccine approved by UK" <sup> ( 12 ) </sup> . </p> <p> This is the end of calendar week 21, which is much later. The single dose vaccination cannot provide any explanation. </p> <h2> Aftermath </h2> <p> 75,884 all cause deaths were reported in the age group 70-79 in the time frame covered by the ONS data. </p> <p> If we do not believe that the vaccinations are responsible for deaths in the unvaccinated group, and if we do not believe that some people get both doses on the same day, which significantly deviates from the official vaccination plan, what is the final conclusion? </p> <p> About 120,000 unvaccinated suddenly being vaccinated with the second dose, while the total of all cause deaths in the same age group was lower. 
</p> <p> I can only draw the conclusion that the complete number of all cause deaths can be covered by the 120,000 people who suddenly jump across two intermediate groups. </p> <p> And these, I also have to conclude, were the surviving so-called "unvaccinated". I have to name them "so-called unvaccinated" now, because they obviously were vaccinated and survived vaccination until the second dose was applied. </p> <p> No one can tell how many non-surviving so-called "unvaccinated" died before getting the second dose, increasing the official death toll of the unvaccinated group. </p> <p> Going back to the phenomenon that the death rate of the unvaccinated seemingly follows, for the most part, the population size of the "Within 21 days" group, as shown in Graph 1: now that we see about 120,000 "unvaccinated" getting their 2nd dose in the weeks 14, 15 and 16, it is much more than just a guess that this death rate of the "unvaccinated" is dominated by vaccinated people who were wrongly counted as unvaccinated. </p> <p> <strong> 120,000+ people aged 70 to 79 did not move into the 'Within 21 days of the first dose' group after getting their first dose. </strong> </p> <p> This tells us also: <strong> Every single death rate provided for the "Unvaccinated" aged 70 to 79 in the ONS data deserves to be flagged as 'unreliable'. </strong> </p> <p> Indeed, obviously the same has to be true for all other death rates in the age group 70 to 79. </p> <p> <strong> The Office for National Statistics did not deliver on its promise. </strong> </p> <p> Give me a different satisfying conclusion, please. </p> <h2> Going Further? </h2> <p> Since the theme of this statistic, together with the choice of a well-defined closed group, established by itself a kind of double-entry bookkeeping in the data, it was possible to identify a discrepancy in the data proving the bookkeeping to be faulty. 
</p> <p> Thanks to the nature of a double-entry bookkeeping data set, it might still be possible to correct errors in this bookkeeping to some extent. If this turns out to work, I may be able to provide additional insights in a later article. </p> <hr/> <p> Cognition is always temporary and always of individual nature. You decide for yourself whether you adopt the findings of others as opinions or whether you acquire your own insights. My references are intended to help you with the latter, but you should always use other sources as well. </p> <p> Do not believe, not even me, but examine and conclude for yourself. </p> <h2> Footnotes </h2> <hr/> <ol> <li> <a href="https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/datasets/deathsbyvaccinationstatusengland"> Deaths by vaccination status, England </a> ; www.ons.gov.uk; Release Date 2021-11-01 </li> <li> <a href="https://idee.frank-siebert.de/article/professoren-stellen-die-frage-ist-impfstoffwirksamkeit-eine-statistische-illusion.html"> Professoren stellen die Frage: Ist Impfstoffwirksamkeit eine statistische Illusion? </a> ; Frank Siebert; idee.frank-siebert.de; 2021-12-02 </li> <li> <a href="https://probabilityandlaw.blogspot.com/2021/11/is-vaccine-efficacy-statistical-illusion.html"> Is vaccine efficacy a statistical illusion? 
</a> ; Norman Fenton, Martin Neil; probabilityandlaw.blogspot.com; 2021-11-14 </li> <li> <a href="https://wiki.frank-siebert.de/script-inst/index.php?title=PostgreSQL_Installation"> PostgreSQL Installation </a> ; wiki.frank-siebert.de/inst </li> <li> <a href="https://wiki.frank-siebert.de/script-inst/index.php?title=PostgreSQL_-_Create_View_-_ons.v_aggr_pop_deaths"> View - ons.v_aggr_pop_deaths </a> ; wiki.frank-siebert.de/inst </li> <li> <a href="https://wiki.frank-siebert.de/script-inst/index.php?title=PostgreSQL_-_Create_View_-_ons.v_population_and_deaths_by_inoculation_status_and_age_group"> View - ons.v population and deaths by inoculation status and age group </a> ; wiki.frank-siebert.de/inst </li> <li> <a href="https://probabilityandlaw.blogspot.com/2021/11/is-vaccine-efficacy-statistical-illusion.html"> Is vaccine efficacy a statistical illusion? </a> ; Norman Fenton, Martin Neil; probabilityandlaw.blogspot.com; 2021-11-14 </li> <li> <a href="https://wiki.frank-siebert.de/script-inst/index.php?title=PostgreSQL_-_Create_View_-_ons.v_weekly_population_balance"> View - ons.v_weekly_population_balance </a> ; wiki.frank-siebert.de/inst </li> <li> <a href="https://probabilityandlaw.blogspot.com/2021/12/the-impact-of-misclassifying-deaths-in.html"> The impact of misclassifying deaths in evaluating vaccine safety: the same statistical illusion </a> ; Norman Fenton, Martin Neil; Comment by Anonymous, 2021-12-01 </li> <li> <a href="https://wiki.frank-siebert.de/script-inst/index.php?title=PostgreSQL_-_Create_View_-_ons.v_population_transition_21d"> View - ons.v_population_transition_21d </a> ; wiki.frank-siebert.de/inst </li> <li> <a href="https://wiki.frank-siebert.de/script-inst/index.php?title=PostgreSQL_-_Create_View_-_ons.v_population_transition_21d_deviation#VIEW_ons.v_population_transition_21d_deviation"> View - ons.v population transition 21d deviation </a> ; wiki.frank-siebert.de/inst </li> <li> <a 
href="https://web.archive.org/web/20210528150035/https://www.bbc.com/news/health-57283837"> Janssen single-dose Covid vaccine approved by UK </a> ; Philippa Roxby; BBC; via archive.org, 2021-05-28 </li> </ol> </div>]]></content:encoded>
  </item>
  <item>
   <title>Replacing WordPress</title>
   <link>https://idee.frank-siebert.de/article/replacing-wordpress.html</link>
   <pubDate>Wed, 16 Feb 2022 01:29:28 +0000</pubDate>
   <guid isPermaLink="false">https://idee.frank-siebert.de/article/replacing-wordpress.html</guid>
   <description><![CDATA[Is it a story, or is it technical documentation? Probably it is both; I only know it started with a specification of some kind and went on to become a solution. And now I try to compile the specification and the implementation notes into a journey description. My journey to Python and my new web presence, and what I had to learn on my way. ...]]></description>
   <content:encoded><![CDATA[<div> <div> <h1> Replacing WordPress </h1> <div> <time datetime="2022-02-16T01:29:28" pubdate="true"> 2022-02-16 </time> <address> Frank Siebert </address> </div> <div> <figure> <a href="https://idee.frank-siebert.de/qrcode/replacing-wordpress.png"> <img src="https://idee.frank-siebert.de/qrcode/replacing-wordpress.png"/> </a> <figcaption> </figcaption> </figure> <figure> <a accesskey="p" href="https://idee.frank-siebert.de/pdf/replacing-wordpress.pdf" target="_blank" type="application/pdf"> <img src="https://idee.frank-siebert.de/image/3cd97bab8bb20288768b35fd72979ec3bbf4b2a8.png"/> </a> </figure> <a href="https://idee.frank-siebert.de/article/creative-commons-cc0-1-0-universal.html"> <img src="https://idee.frank-siebert.de/image/CC-Icon.png"/> </a> <a href="https://idee.frank-siebert.de/article/creative-commons-cc0-1-0-universal.html"> <img src="https://idee.frank-siebert.de/image/CC0-Icon.png"/> </a> </div> </div> <p> Is it a story, or is it technical documentation? Probably it is both; I only know it started with a specification of some kind and went on to become a solution. And now I try to compile the specification and the implementation notes into a journey description. My journey to Python and my new web representation and what I learned along the way. </p> <p> <em> The code in this article is still of hot-needle quality. </em> </p> <h2> Motivation </h2> <h3> Part One </h3> <p> I think WordPress is a good platform to create a web presence. It is just a pain in the ass if you start to care about privacy. 
You start by removing all the included tracker features that contact Google Analytics, you continue with the removal of "social" media features, because you are not sure whether these communicate with the respective platforms even without the social media button being pressed, and you remove all those resource links that load resources from foreign servers, because the respective resource servers might be able to identify you and also the page you visited. </p> <p> After this isolation of your web presence, you are halfway sure that the privacy of your visitors is safe. Then you might receive a notification that you should update WordPress or one of the plugins. ... </p> <p> If you do not update, you might run into the situation where your web presence becomes insecure for your visitors. If you update, you have to review everything again. Is your server isolation still in effect, or did the update reinstate some of the removed side-communications? </p> <h3> Part Two </h3> <p> I decided to learn Python and needed a project for this. I stumbled upon the article "Gitblog - the software that powers my blog" <sup> ( 1 ) </sup> , and I liked the idea to publish via </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb1-1"></a>$ <span class="fu">git</span> push</span></code></pre> </div> <p> I'd guess most developers are in agreement with this. Posting articles just the same way as you push new code to a git server - developers are bound to like this. </p> <p> The Gitblog solution is based on Java and nothing is wrong with this. I used Java in software development for more than 20 years, with a noticeable break in which I mainly used ABAP before coming back to Java. </p> <p> But since I wanted to learn Python anyhow, this was the ideal project to get a start with that language. </p> <h2> Project Duration </h2> <p> From start to go live it took slightly less than 2 months. 
The first Wiki entry dates back to December 31 <sup> st </sup> 2021, but this was nothing more than a note to myself. On January 16 <sup> th </sup> the real drafting of the specification started, and hardly anything from that first specification survived. But the most important requirements stayed stable and are met by the solution. </p> <ul class="incremental"> <li> No JavaScript <ul class="incremental"> <li> There is one article containing JavaScript, because the article contains a quiz. But articles are content, not part of the publishing solution. </li> </ul> </li> <li> All plain HTML+CSS static content </li> <li> State of the art semantic HTML </li> <li> Search (the exception from the static content) <ul class="incremental"> <li> Currently YaCy integration </li> <li> A Python-based search is planned </li> </ul> </li> </ul> <p> The website with all previously published articles migrated went live on March 14 <sup> th </sup> . And migration was a really heavy topic. Most articles were originally written in one of my Wikis and then published in WordPress, but some of the very short video, audio or article recommendations were not written in the Wiki. Some articles got corrections maintained directly in WordPress instead of a correction in the Wiki with republishing afterwards. For some of those corrections I just made a comment in WordPress to make readers aware of the mistake. </p> <p> In short: Inconsistency in my previous publishing made migration a major effort. </p> <p> Other inconsistencies were: </p> <ul class="incremental"> <li> Articles maintained in the Wiki had chapter headlines partly starting at header level 2 and partly starting at header level 3. </li> <li> Quotations were only preceded by "Zitat:" and concluded with "Zitat Ende", but not everywhere, since I started this only when I started with the audio recordings of my articles - to make sure I do not forget to mention the start and the end of the quotation in the recording. 
</li> <li> Quotations were not enclosed in &lt; blockquote &gt; tags. </li> </ul> <p> To provide state of the art semantic HTML I had to copy-edit every article during the migration. The good news is: The new setup will drive and support me to publish in a more consistent manner. </p> <p> Considering this major migration effort, I'm pretty proud the project took "only" 2 months, especially since it was the Python learning project. </p> <p> As often happens along the way, I learned much more than just Python. I learned new things about PDF generation, fonts, git, regex, HTML, CSS, vim, the IDE Spyder, the web server nginx and even more. </p> <h2> Requirement Specification </h2> <p> <em> The requirement specification was subject to changes. As often happens, this was mainly because it not only described requirements, but also already made assumptions about technical details of the solution. </em> </p> <p> <em> A funny fact: I spent years explaining to my own customers that it is important not to write requirement specifications with a technical solution in mind. The requirement specification should focus strictly on non-technical scenario descriptions. The rationale behind this: Very often a customer would ask to eliminate work efforts caused by previously implemented workarounds. Such a workaround is viewed as a tool by the customer, and following the customer's suggestion leads to the implementation of yet another workaround. Very often you get a much better overall solution if you also sunset existing workarounds, which is difficult because those were so helpful in the past. </em> </p> <h3> My previous publishing scenario </h3> <p> I use one of my MediaWiki siblings to collect information and I also use this wiki to create articles based on that information. This part of the publishing scenario stays in place. 
I considered changing this as well and writing future articles in the editor vim, but I decided to keep information collection and article compilation together in one place. </p> <p> To get an article published, together with its audio recording, I used the HTML export option of a PDF export extension. </p> <p> I then used the editor vim with 3 regex statements to strip the header and the footer from that export, and, if necessary, also the references to categories, which would otherwise establish links pointing into the void when displayed in WordPress. </p> <p> The remaining HTML was then pasted into one HTML input field in the Create Post UI of WordPress. Thus the page internal links in the table of contents and to and from the reference section of the page stayed functional. </p> <p> If pictures were included, I uploaded these first to WordPress and used these uploaded pictures already in the wiki. That way the links to the pictures stayed as they were in the later WordPress version of the article. </p> <p> All in all not too cumbersome a process, but with room for improvement. Especially when corrections were required it was much too easy to apply the correction directly in WordPress instead of doing the correction in the Wiki and republishing it. And this leads to problems in the long run. For some time I thought about some automation of text deployment to WordPress to mitigate this. But those thoughts are now obviously obsolete. </p> <h3> How do I want to do it in future? </h3> <p> <em> This chapter is from my early specification notes. I tried to figure out what I really want. </em> </p> <p> This is not really easy to tell. I'm still struggling to agree with myself on this topic. </p> <p> I'd like to edit my pages with MediaWiki markup or, as an alternative, with markdown. From the implementation side it would be simplest to keep the editing process as I do it today and to change only the publishing. 
</p> <p> The publishing and the result as shown in Gitblog is quite to my taste. However, this solution is based on Java, and I think 20+ years of Java is enough. I'd like to base my own solution on Python. Not because it is so much better than Java, which might or might not be the case, but because I decided to learn Python down to its depths and such a project is a perfect opportunity. </p> <p> This does not mean that I need to write everything from scratch; there are already a lot of modules in existence to build upon. </p> <p> On the other hand I'd like to be able to write my articles also completely offline, just using vim as markup editor. But is this a realistic scenario? Am I not researching every detail online anyhow while authoring? There are so many things you have read about and are quite sure about, but you need a source as reference when you write them into an article. Will I ever really do authoring offline? </p> <p> But why not both options? </p> <p> During commit a pre-commit handler can check the mime type and do one thing if it is an HTML fragment to be placed into an empty HTML page template, and do another thing if it is a markdown file with the extension md. </p> <p> Looking at GitLab Docs <sup> ( 2 ) </sup> it is undeniable that powerful versions of markdown exist. However, installing GitLab also means installing a big bunch of software, which is not really smaller than WordPress. In the search for a small, minimalistic solution GitLab is probably out of scope. </p> <p> <em> To be honest, I'm not sure I ended up with fewer installations than GitLab. But as you see in this text, I initially expected that I would need to meddle with the HTML from the MediaWiki, as I did before. Fortunately specifications are a moving target, something we developers will often complain about. </em> </p> <h3> How shall it look in the future </h3> <p> <em> This chapter is from my early specification notes as well. It was less off the mark than the previous chapter. </em> 
</p> <p> For a start it should look like before, just without those things no longer required. E.g. a logon is no longer required, since I push and merge new articles to the server, instead of logging in and using an authoring front-end. </p> <p> <em> It looks different, but not too much different </em> </p> <h4> Search </h4> <p> Initially the search will use my YaCy instance. I have to see how well this integrates. </p> <p> <em> Yes, YaCy is integrated. But I consider the current search integration as improvable. </em> </p> <h4> Side Pane? </h4> <p> Is a side pane required any longer? Probably not, I'm not sure. </p> <p> <em> No side pane any more. </em> </p> <h4> Header Collapse or not? </h4> <p> Can I collapse the header with the site navigation during scroll down and make it available when the user starts to scroll up? I mean without JavaScript, only with CSS? I will see. </p> <p> <em> This point got no priority at all. Nice idea probably, but in the end I didn't care. </em> </p> <h4> Small Header </h4> <p> The header will become smaller, since I will shorten the main site name to "Idee" with a smaller "der eigenen Erkenntnis" and I will write it in small-caps to hide the problem with uppercase "Erkenntnis" not fitting exactly to the lowercase "e" at the end of "Idee". And the header text will move on top of the header picture. <em> Forget it. Ok, the header got smaller, but that's it. </em> </p> <h4> Article PDF </h4> <p> Every article will get a PDF download button. The PDF is not necessarily optimized for print and offline reading, but it is nonetheless a good idea to simplify the access to references in the reference section via QR code for the respective links. </p> <p> <em> Article PDF is implemented, along with a way to suppress its generation for low value content, e.g. if the "article" is just a recommendation note. </em> </p> <p> <em> References in the PDF are rendered as in the online version, but additionally showing the HTTP address as text. 
QR code creation for every reference does not take place. </em> </p> <h4> Article Archive </h4> <p> WordPress shows an archive drop-down. That needs scripting and dynamic population. An alternative would be the generation of one archive page, which allows drill down to the year, which allows drill down to the month. Such pages can stay unchanged as soon as the respective month or year has passed, as long as I do not change the portal part. The usability is most probably not less than that of a drop-down, which at some point gets a bit messy to scroll on small screens. </p> <p> <em> Implemented as described, except that in the chosen implementation a portal change does not require a regeneration of pages, which is a huge improvement compared with the initial specification. </em> </p> <h4> Sizing Pictures? </h4> <p> Should I size pictures during commit? Should I sample audio files to a number of different qualities? A lot of options are open now, with the development of my own page factory. </p> <p> <em> The question mark in the title can be answered today with No. </em> </p> <h4> Picture based article selection </h4> <p> I could create a picture gallery to select articles by picture. But then probably I should create a picture for every article... Not really, I also, in rare cases, do not create audio. I wouldn't force myself into picture creation where the picture does not add value. </p> <p> <em> The original text hints at it already. Nothing in this regard has happened. </em> </p> <h4> Semantic Web </h4> <p> Articles will have state of the art HTML5 article structure. This needs some intelligent logic when it comes to the correct use of tags like the cite-tag. I probably need to think about the markdown and MediaWiki representation of the HTML cite-tag to make this work nicely. </p> <p> <em> I obviously meant quotations. The markup representation for quotations is the respective HTML tag &lt; blockquote &gt;. 
The mentioned &lt; cite &gt; tag could probably come into use in my articles as well in the references, but this is probably not a good idea. Possibly I'll introduce this later. </em> </p> <p> However, a lot of the semantics is simple. The article content resides in the article-tag. The article-tag contains a header-tag, whose headline and media are descriptive for the complete article, like the QR Code of the URL, the PDF file and the audio file. Video is not planned. </p> <p> The HTML head meta-tags for articles and the og meta-tags bring a lot of invisible semantics to the page. </p> <p> A lot of options. In the end I will strip this text down to those things which made it into the product. </p> <p> <em> It's all in, apart from the cite tag, which was an error in the specification. </em> </p> <h4> Citation </h4> <p> That would be an interactive page function. </p> <ul class="incremental"> <li> Reuse citations I made in various citation formats. <ul class="incremental"> <li> Click on a function link at the footnote. The citation is shown and a citation format can be selected. </li> <li> The result can be used via copy-paste buffer. </li> </ul> </li> <li> Cite statements made by me in various citation formats. <ul class="incremental"> <li> Mark a text passage in my article and a citation function link is shown. </li> <li> Then as above. </li> </ul> </li> </ul> <p> Yes, this function needs JavaScript, which would be a step back from the plain HTML philosophy I started with. Since I cite a lot of other publications myself and know the effort it takes to create the citations as I need them, I think such a function is worth scripting. </p> <p> Plain HTML is not a religion. It is rather: Avoid scripting where it is not required. <em> Ok, nice idea to make it simple for others to cite me. But it is not implemented and probably will never be. 
</em> </p> <h4> JavaScript </h4> <p> JavaScript can be disabled in the browser without any impact for the casual user. Only "extended features" may rely on JavaScript, such as the citation feature. Features not enabled due to disabled JavaScript stay invisible to the user. </p> <p> <em> I translate myself for myself :-). Function-links are written into the page via JavaScript. If JavaScript is not enabled, the page will not contain any malfunctioning function links. </em> </p> <p> <em> Till now I did not need to follow this specification, since the implementation doesn't use JavaScript anywhere. But the specification stays valid. </em> </p> <h4> Sitemaps </h4> <p> The "Sitemaps XML format" <sup> ( 3 ) </sup> description explains the concept and the XML document structure of sitemaps. </p> <h4> RSS </h4> <p> Updates have a reason; most probably additional or corrected information went into the article. To make an announcement of such changes is imperative for an information provider, and the RSS.xml is the place for this. </p> <h4> Monthly Archive </h4> <p> As sitemaps can be structured by one or more sitemap index files, it does make sense to use this to structure the sitemaps by "yyyy-MM", getting one sitemap per month. </p> <p> Thus it's probably simple to create a monthly page of articles to be selected by the user in the archive overview created from the sitemaps index. </p> <p> <em> The first sentence describes how the implementation was done later. But the second sentence was too optimistic. The sitemap, if not pepped up with extensions, does not contain enough information to create monthly archive pages from it. </em> </p> <h3> HTML5 Article (with prepared location for portal injection) </h3> <p> <em> This is the article HTML draft. It contains a line with div id="main", which I planned to be the place where I place the portal part via Python. 
That div turned out to be unnecessary, instead an xml comment is now placed after body and before main tag as include instruction. The include is performed by the nginx web-server. </em> </p> <div class="sourceCode"> <pre class="sourceCode html"><code class="sourceCode html"><span><a aria-hidden="true" href="#cb2-1"></a><span class="dt">&lt;!DOCTYPE </span>html<span class="dt">&gt;</span></span> <span><a aria-hidden="true" href="#cb2-2"></a><span class="kw">&lt;html</span><span class="ot"> lang=</span><span class="st">"de-DE"</span><span class="ot"> xml:lang=</span><span class="st">"de-DE"</span><span class="ot"> xmlns=</span><span class="st">"http://www.w3.org/1999/xhtml"</span><span class="kw">&gt;</span></span> <span><a aria-hidden="true" href="#cb2-3"></a> <span class="kw">&lt;head&gt;</span></span> <span><a aria-hidden="true" href="#cb2-4"></a> <span class="kw">&lt;meta</span><span class="ot"> charset=</span><span class="st">"utf-8"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb2-5"></a> <span class="kw">&lt;meta</span><span class="ot"> content=</span><span class="st">"pandoc, fs-commit-msg-hook 1.0"</span><span class="ot"> name=</span><span class="st">"generator"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb2-6"></a> <span class="kw">&lt;meta</span><span class="ot"> content=</span><span class="st">"width=device-width, initial-scale=1.0, user-scalable=yes"</span> </span> <span><a aria-hidden="true" href="#cb2-7"></a><span class="ot"> name=</span><span class="st">"viewport"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb2-8"></a> <span class="kw">&lt;meta</span><span class="ot"> content=</span><span class="st">"2022-01-19T16:05:43"</span><span class="ot"> property=</span><span class="st">"article:modified_time"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb2-9"></a> <span class="kw">&lt;meta</span><span class="ot"> 
content=</span><span class="st">"2020-10-15 09:49:27"</span><span class="ot"> property=</span><span class="st">"article:published_time"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb2-10"></a> <span class="kw">&lt;meta</span><span class="ot"> content=</span><span class="st">"Frank Siebert"</span><span class="ot"> property=</span><span class="st">"article:author"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb2-11"></a> <span class="kw">&lt;meta</span><span class="ot"> content=</span><span class="st">"Idee"</span><span class="ot"> property=</span><span class="st">"og:site_name"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb2-12"></a> <span class="kw">&lt;meta</span><span class="ot"> content=</span><span class="st">"de-DE"</span><span class="ot"> property=</span><span class="st">"og:locale"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb2-13"></a> <span class="kw">&lt;meta</span><span class="ot"> content=</span><span class="st">"The Article Title"</span><span class="ot"> property=</span><span class="st">"og:title"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb2-14"></a> <span class="kw">&lt;link</span><span class="ot"> href=</span><span class="st">"../website/css/fs.css"</span><span class="ot"> rel=</span><span class="st">"stylesheet"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb2-15"></a> <span class="kw">&lt;title&gt;</span></span> <span><a aria-hidden="true" href="#cb2-16"></a> The Article Title</span> <span><a aria-hidden="true" href="#cb2-17"></a> <span class="kw">&lt;/title&gt;</span></span> <span><a aria-hidden="true" href="#cb2-18"></a> <span class="kw">&lt;style&gt;</span></span> <span><a aria-hidden="true" href="#cb2-19"></a> &lt;!-- styles by pandoc --<span class="op">&gt;</span></span> <span><a aria-hidden="true" href="#cb2-20"></a> code{<span 
class="kw">white-space</span>: <span class="dv">pre-wrap</span><span class="op">;</span>}</span> <span><a aria-hidden="true" href="#cb2-21"></a> span<span class="fu">.smallcaps</span>{<span class="kw">font-variant</span>: <span class="dv">small-caps</span><span class="op">;</span>}</span> <span><a aria-hidden="true" href="#cb2-22"></a> span<span class="fu">.underline</span>{<span class="kw">text-decoration</span>: <span class="dv">underline</span><span class="op">;</span>}</span> <span><a aria-hidden="true" href="#cb2-23"></a> div<span class="fu">.column</span>{<span class="kw">display</span>: <span class="dv">inline-block</span><span class="op">;</span> <span class="kw">vertical-align</span>: <span class="dv">top</span><span class="op">;</span> <span class="kw">width</span>: <span class="dv">50</span><span class="dt">%</span><span class="op">;</span>}</span> <span><a aria-hidden="true" href="#cb2-24"></a> div<span class="fu">.hanging-indent</span>{<span class="kw">margin-left</span>: <span class="dv">1.5</span><span class="dt">em</span><span class="op">;</span> <span class="kw">text-indent</span>: <span class="dv">-1.5</span><span class="dt">em</span><span class="op">;</span>}</span> <span><a aria-hidden="true" href="#cb2-25"></a> ul<span class="fu">.task-list</span>{<span class="kw">list-style</span>: <span class="dv">none</span><span class="op">;</span>}</span> <span><a aria-hidden="true" href="#cb2-26"></a> <span class="kw">&lt;/style&gt;</span></span> <span><a aria-hidden="true" href="#cb2-27"></a> <span class="kw">&lt;/head&gt;</span></span> <span><a aria-hidden="true" href="#cb2-28"></a> <span class="kw">&lt;body&gt;</span></span> <span><a aria-hidden="true" href="#cb2-29"></a> <span class="kw">&lt;div</span><span class="ot"> id=</span><span class="st">"main"</span><span class="kw">&gt;</span> <span class="co">&lt;!-- Portal injection parent --&gt;</span></span> <span><a aria-hidden="true" href="#cb2-30"></a> <span class="kw">&lt;main&gt;</span></span> 
<span><a aria-hidden="true" href="#cb2-31"></a> <span class="kw">&lt;article&gt;</span></span> <span><a aria-hidden="true" href="#cb2-32"></a> <span class="kw">&lt;header&gt;</span></span> <span><a aria-hidden="true" href="#cb2-33"></a> <span class="kw">&lt;h1&gt;</span></span> <span><a aria-hidden="true" href="#cb2-34"></a> The Article Title</span> <span><a aria-hidden="true" href="#cb2-35"></a> <span class="kw">&lt;/h1&gt;</span></span> <span><a aria-hidden="true" href="#cb2-36"></a> <span class="kw">&lt;div&gt;</span></span> <span><a aria-hidden="true" href="#cb2-37"></a> <span class="kw">&lt;time</span><span class="ot"> datetime=</span><span class="st">"yyyy-MM-dd hh:mm:ss"</span><span class="ot"> pubdate=</span><span class="st">"true"</span><span class="kw">&gt;</span></span> <span><a aria-hidden="true" href="#cb2-38"></a> yyyy-MM-dd</span> <span><a aria-hidden="true" href="#cb2-39"></a> <span class="kw">&lt;/time&gt;</span></span> <span><a aria-hidden="true" href="#cb2-40"></a> <span class="kw">&lt;address&gt;</span></span> <span><a aria-hidden="true" href="#cb2-41"></a> Author Name</span> <span><a aria-hidden="true" href="#cb2-42"></a> <span class="kw">&lt;/address&gt;</span></span> <span><a aria-hidden="true" href="#cb2-43"></a> <span class="co">&lt;!-- probably PDF download link location --&gt;</span></span> <span><a aria-hidden="true" href="#cb2-44"></a> <span class="kw">&lt;/div&gt;</span></span> <span><a aria-hidden="true" href="#cb2-45"></a> <span class="co">&lt;!-- probably audio player location --&gt;</span></span> <span><a aria-hidden="true" href="#cb2-46"></a> <span class="kw">&lt;/header&gt;</span></span> <span><a aria-hidden="true" href="#cb2-47"></a> <span class="co">&lt;!-- article content (paragraphs, toc, headlines (&lt; h1), images, footnotes) --&gt;</span></span> <span><a aria-hidden="true" href="#cb2-48"></a> <span class="kw">&lt;/article&gt;</span></span> <span><a aria-hidden="true" href="#cb2-49"></a> 
<span class="kw">&lt;/main&gt;</span></span> <span><a aria-hidden="true" href="#cb2-50"></a> <span class="kw">&lt;/div&gt;</span></span> <span><a aria-hidden="true" href="#cb2-51"></a> <span class="kw">&lt;/body&gt;</span></span> <span><a aria-hidden="true" href="#cb2-52"></a><span class="kw">&lt;/html&gt;</span></span></code></pre> </div> <h3> Scenario </h3> <p> <em> This is still specification before the development started. It seems to be repetition, but it is not, because it contains a decision not made before. But it also contains things which should not be written into a scenario. If you wear the hat of the customer, the architect and the developer all in one person, then there is no second person taking care to write the specification correctly. </em> </p> <p> Obviously I'm not completely sure about the scenario. But if it changes over time, then that's a common thing often seen also in other projects. </p> <p> <em> Decision: </em> Material collection and writing happens in a MediaWiki. The export may happen with the current export tool, or it might happen with a Python based Wiki-page parser <sup> ( 4 ) </sup> . </p> <p> <em> Aspiration: </em> I want to have useful meta tags generated during the commit. The author information and the date of publishing, the date of update and a documentation of changes should be automatically processed into HTML meta information and into a standardized representation in the visible text. This is important to ensure that corrections are processed in a transparent, reader friendly manner. </p> <p> For the sake of usability, a commit should not lead to an automatic release of the text. The commit is for the draft version only. This basically means that I will work in branches and the final publishing is done with a merge into the master branch. </p> <p> <em> Forget this branching explanation. Yes, commits generate HTML for review, but branches are not necessary and therefore also no merge. 
The final publishing is done via the git push command, as explained earlier. </em> </p> <p> During the merge into the master branch on the server: </p> <ul class="incremental"> <li> The HTML is processed to contain a header, a style-sheet, meta information and change markers if the document is not being merged for the first time. </li> <li> The page is fed into a search engine for indexing </li> <li> The page is fed into an RSS feed generator to provide a new entry in the RSS feed. </li> <li> The page is fed into a sitemap generator to provide an updated sitemap </li> </ul> <p> <em> In the end everything is done during commit, with the exception of search engine indexing, which can be done by the YaCy search engine only after publishing. </em> </p> <h4> Search Index </h4> <p> <em> And even more specification, if you like to call it such. Probably it is more an investigation of options regarding search. </em> </p> <p> For Python some search index implementations exist. There is one do-it-yourself example by Bart de Goede <sup> ( 5 ) </sup> , at the opposite end of the spectrum we find Gensim <sup> ( 6 ) </sup> , which probably can do much more than just index, and there is a module named Whoosh <sup> ( 7 ) </sup> , and there is rank-bm25 <sup> ( 8 ) </sup> , which implements multiple variants of the BM25 search algorithm. </p> <p> I tend to base my search on the latter module, and I'm curious how well this will work. </p> <h4> Search Index Related Learning Material </h4> <ul class="incremental"> <li> "Improvements to BM25 and Language Models Examined" <sup> ( 9 ) </sup> </li> <li> "What is the difference between Okapi bm25 and NMSLIB?" <sup> ( 10 ) </sup> </li> </ul> <h2> Implementation </h2> <p> <em> With the chapter "Toolchain" the implementation started. The chapters are sorted by initial implementation sequence. 
</em> </p> <h3> Toolchain </h3> <h4> MediaWiki-Tools git ~/projects/wikitools/ </h4> <p> This is the git repository used to implement the tools to access the MediaWiki instances. The default instance used is my private sammel-wiki, but there is no reason why I should not also access my installations-wiki to create postings from it. Well, the language probably, since my blog is in German. </p> <p> <em> The language problem is solved with the creation of two sites, one in German and one in English. </em> </p> <p> <em> The wikitools project git existed already, hosting the code for a program "reference.py" to scrape web pages for the creation of a reference tag stored in a newly created reference wiki-page for the scraped website. </em> </p> <p> Related to the WordPress replacement project is the new tool "export.py", which extracts a wiki-page with expanded templates as a MediaWiki markup file. The output of this tool is placed into a configured directory, which is, how convenient, the authoring directory of the authoring git. </p> <h4> Authoring git ~/projects/idee </h4> <p> Authoring takes place in the folder <strong> ./author/ </strong> , for a start via MediaWiki files. This means I can also use my MediaWiki instances for authoring and afterwards use the export.py from my wikitools to save the article as "authoring source" into this folder. </p> <p> During <em> git commit </em> the <strong> commit-msg </strong> hook implementation checks for committed <strong> ./author/*.mediawiki </strong> files to be processed. </p> <p> <em> TODO: Consider options to structure this in <strong> ./author/yyyy/MM/ </strong> folders. </em> <br/> <em> DONE: The result is NO. </em> </p> <p> The respective mediawiki files are processed into plain HTML by the method <strong> pandocmw() </strong> , pandoc being the conversion tool used. 
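</p> <p> The conversion step described above could be sketched roughly as follows. This is only a minimal illustration, not the actual pandocmw() implementation: the function names build_pandoc_command and convert_mediawiki and the default target folder are assumptions made for this example. The only external facts relied on are that pandoc accepts "mediawiki" as an input format and "html" as an output format. </p>

```python
import subprocess
from pathlib import Path


def build_pandoc_command(source, target_dir):
    """Return the pandoc argument list for one MediaWiki source file.

    Hypothetical helper: the output file keeps the stem of the source
    file and gets the extension .html inside target_dir.
    """
    source = Path(source)
    target = Path(target_dir) / (source.stem + ".html")
    return [
        "pandoc",
        "--from", "mediawiki",   # input format: MediaWiki markup
        "--to", "html",          # output format: plain HTML fragment
        "--output", str(target),
        str(source),
    ]


def convert_mediawiki(source, target_dir="plain"):
    """Run pandoc for one file; raises CalledProcessError on failure."""
    command = build_pandoc_command(source, target_dir)
    subprocess.run(command, check=True)
    # The output path is the value following the --output flag.
    return Path(command[command.index("--output") + 1])
```

<p> Keeping the command construction separate from the subprocess call makes it possible to inspect or log the exact pandoc invocation without running the conversion. 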
</p> <p> Processing results are stored in: </p> <ul class="incremental"> <li> <strong> ./plain/ </strong> - the plain html files </li> <li> <strong> ./website/image/ </strong> - the image files </li> </ul> <p> The plain html files are further processed into PDF files by the method <strong> pandoc-html-pdf() </strong> , pandoc again being the conversion tool used. </p> <p> Processing results are stored in: </p> <ul class="incremental"> <li> <strong> ./website/pdf/ </strong> - the pdf files </li> </ul> <p> Created HTML and PDF files open automatically in Firefox for review. </p> <p> At this point of the processing the pictures as well as the PDF files are supposed to be final, at least after some possible round trips of review and correction. </p> <p> <em> TODO: check the creation of an asset list for each authored article, to prevent the deployment of the article without the corresponding pictures and audio files. </em> <br/> <em> DONE: The result is NO. The risk of this happening is minimal, and it is also quickly corrected if it should happen. </em> </p> <p> <em> TODO: one option to simplify the commit of all required assets is the creation of an asset list for the committed article. Probably this can be in the form of a prepared commit message for these assets. </em> <br/> <em> DONE: The result is NO. All git commands can be issued in the root of the git repository, making sure everything is included. Just "git add .", "git commit", that is simple enough. </em> </p> <p> During <em> git commit </em> the <strong> commit-msg </strong> hook implementation checks for committed <strong> ./plain/*.html </strong> files to be processed. 
</p> <p> The respective plain html pages are processed by the method <strong> injectportal() </strong> into webpages of a website via: </p> <ul class="incremental"> <li> the injection of the portal into the page </li> <li> the placement of the PDF access link, if a pdf with the same name exists </li> <li> the placement of the HTML5 audio player, if an audio file with the same name exists </li> </ul> <p> The processing results are stored in: </p> <ul class="incremental"> <li> <strong> ./website/article/ </strong> - webpages containing articles </li> </ul> <p> Then the website is updated via: </p> <ul class="incremental"> <li> the update of the sitemap.xml </li> <li> the update of the feed.xml </li> <li> the update of the index.html featuring the latest post as first entry. </li> </ul> <p> Processing results are stored in: </p> <ul class="incremental"> <li> <strong> ./website/ </strong> - Entry point of the web representation </li> </ul> <p> <strong> ./website/ </strong> contains all website related content. </p> <p> Resulting in the structure: </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb3-1"></a> <span class="ex">website</span></span> <span><a aria-hidden="true" href="#cb3-2"></a> ├── <span class="ex">article</span></span> <span><a aria-hidden="true" href="#cb3-3"></a> ├── <span class="ex">css</span></span> <span><a aria-hidden="true" href="#cb3-4"></a> ├── <span class="ex">media</span></span> <span><a aria-hidden="true" href="#cb3-5"></a> └── <span class="ex">pdf</span></span></code></pre> </div> <p> The privacy statement and similar administrative overhead will be deployed as articles and linked as special pages in the portal. </p> <p> <em> TODO: Think further about URL compatibility with the current WordPress site. </em> <br/> <em> Option: I have exported data from the MySQL table, which enables the creation of a redirect list. </em> <br/> <em> DONE: Compatibility is a must and it is ensured. 
It is important not only for the redirect, but also to preserve the correct dates of the articles. The stem of the article page serves as urn to access the article data in the publishing list. </em> </p> <p> This git is a client git, connected to the server git. Deployment to the server is done via <strong> git merge </strong> . </p> <h4> Authoring git - Configuration </h4> <p> The configuration of the git gets stored and versioned in the git repository. The path to the configuration is <strong> ./config/ </strong> . The implemented hooks are part of the configuration and are stored in <strong> ./config/hooks/ </strong> , the preexisting examples are stored in <strong> ./config/hooks/samples/ </strong> . </p> <h5> ./config/gitconfig </h5> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb4-1"></a><span class="co">#!/bin/bash</span></span> <span><a aria-hidden="true" href="#cb4-2"></a><span class="co"># configure the wiki</span></span> <span><a aria-hidden="true" href="#cb4-3"></a></span> <span><a aria-hidden="true" href="#cb4-4"></a><span class="co"># We develop hooks and want version control for that</span></span> <span><a aria-hidden="true" href="#cb4-5"></a><span class="fu">git</span> config --local core.hooksPath ./config/hooks</span> <span><a aria-hidden="true" href="#cb4-6"></a></span> <span><a aria-hidden="true" href="#cb4-7"></a><span class="co"># We want easy reading of German äüö in the file names</span></span> <span><a aria-hidden="true" href="#cb4-8"></a><span class="fu">git</span> config --local core.quotepath off</span> <span><a aria-hidden="true" href="#cb4-9"></a></span> <span><a aria-hidden="true" href="#cb4-10"></a><span class="co"># We provide some variable override options in a modified template</span></span> <span><a aria-hidden="true" href="#cb4-11"></a><span class="fu">git</span> config --local commit.template ./config/commit-message</span> <span><a aria-hidden="true" 
href="#cb4-12"></a></span> <span><a aria-hidden="true" href="#cb4-13"></a><span class="co"># We process the files committed and need absolute file paths from $GIT_DIR</span></span> <span><a aria-hidden="true" href="#cb4-14"></a><span class="co"># written into the commit-message</span></span> <span><a aria-hidden="true" href="#cb4-15"></a><span class="fu">git</span> config --local status.relativePaths false</span></code></pre> </div> <p> The configuration settings are applied with the bash script shown above. </p> <h4> Folder Structure </h4> <p> <em> To give you a complete overview of the final git folder structure, here it is: </em> </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb5-1"></a><span class="ex">frank</span> @Asimov:~/projects/idee$ tree -d</span> <span><a aria-hidden="true" href="#cb5-2"></a><span class="ex">.</span></span> <span><a aria-hidden="true" href="#cb5-3"></a>├── <span class="ex">author</span></span> <span><a aria-hidden="true" href="#cb5-4"></a>├── <span class="fu">bash</span></span> <span><a aria-hidden="true" href="#cb5-5"></a>├── <span class="ex">config</span></span> <span><a aria-hidden="true" href="#cb5-6"></a>│ └── <span class="ex">hooks</span></span> <span><a aria-hidden="true" href="#cb5-7"></a>│ └── <span class="ex">samples</span></span> <span><a aria-hidden="true" href="#cb5-8"></a>├── <span class="ex">generator</span></span> <span><a aria-hidden="true" href="#cb5-9"></a>│ └── <span class="ex">__pycache__</span></span> <span><a aria-hidden="true" href="#cb5-10"></a>├── <span class="ex">nginx</span></span> <span><a aria-hidden="true" href="#cb5-11"></a>├── <span class="ex">plain</span></span> <span><a aria-hidden="true" href="#cb5-12"></a>├── <span class="bu">test</span></span> <span><a aria-hidden="true" href="#cb5-13"></a>└── <span class="ex">website</span></span> <span><a aria-hidden="true" href="#cb5-14"></a> ├── <span class="ex">archive</span></span> 
<span><a aria-hidden="true" href="#cb5-15"></a> ├── <span class="ex">article</span></span> <span><a aria-hidden="true" href="#cb5-16"></a> ├── <span class="ex">audio</span></span> <span><a aria-hidden="true" href="#cb5-17"></a> ├── <span class="ex">css</span></span> <span><a aria-hidden="true" href="#cb5-18"></a> ├── <span class="fu">env</span></span> <span><a aria-hidden="true" href="#cb5-19"></a> │ └── <span class="ex">bootstrap</span></span> <span><a aria-hidden="true" href="#cb5-20"></a> │ └── <span class="ex">css</span></span> <span><a aria-hidden="true" href="#cb5-21"></a> ├── <span class="ex">files</span></span> <span><a aria-hidden="true" href="#cb5-22"></a> ├── <span class="ex">image</span></span> <span><a aria-hidden="true" href="#cb5-23"></a> ├── <span class="ex">js</span></span> <span><a aria-hidden="true" href="#cb5-24"></a> ├── <span class="ex">pdf</span></span> <span><a aria-hidden="true" href="#cb5-25"></a> ├── <span class="ex">portal</span></span> <span><a aria-hidden="true" href="#cb5-26"></a> ├── <span class="ex">qrcode</span></span> <span><a aria-hidden="true" href="#cb5-27"></a> └── <span class="ex">sitemap</span></span> <span><a aria-hidden="true" href="#cb5-28"></a></span> <span><a aria-hidden="true" href="#cb5-29"></a><span class="ex">25</span> directories</span></code></pre> </div> <p> The folder website/env/bootstrap/css contains the CSS to format the YaCy search result page. In the current implementation I refrained from merging the portal header into that page. Probably I could convince YaCy to return the results as XML and render a portal page via XSLT. There is room for improvement. </p> <p> The folder /generator contains the Python part of the project. If you consider software development and content development as two different projects, then the git contains the development project as a nested project. 
</p> <p> To get a re-usable software product from this, the projects need to be split into separate git repositories. But for initial development the combined repository saved a lot of time. </p> <h4> Server git: /home/git/idee.git </h4> <p> This is the server git, and as such it has no work directory. Once the pushed changes have been merged, a hook needs to take care of writing the content into the web-server directory. </p> <p> Not all content needs to be processed, only the content belonging to the website as documented above. </p> <p> <em> The implementation uses a simple fetch by a client git located in the web-server directory. </em> </p> <h4> Export </h4> <p> There are tools you can install in your MediaWiki instance to support the export to PDF and to HTML. However, these require you to change the wiki installation, and the result might not be tailored to your needs. </p> <p> To publish articles I decided to write my own export tool, extracting a single page containing the composed article, with all included templates expanded. MediaWiki ships with a special page Special:Export, which uses the same API function as used in my implementation. The API call was already implemented in the module mwclient.py, but to get it to function with long pages I had to change the respective GET request into a POST request. I informed the developers via their GitHub issue tracker with the message "expandtemplates should use "post" instead of "get" · Issue -272 · mwclient-mwclient" <sup> ( 11 ) </sup> . </p> <p> The current implementation of the export.py is by no means beautified. I just began with Python and I'm pretty much unaware of established coding conventions. As I progress with Python, things will get nicer over time. </p> <h4> export.py </h4> <p> <em> wikitools git repository </em> </p> <p> <em> I let a lot of comments survive, which also document some of the wrong ideas I had. E.g. 
during the migration I used the Pandoc feature to download the images from their web location on my WordPress installation. Pandoc creates its own filenames for the pictures via SHA1 hash during this process. </em> </p> <p> <em> Afterwards I thought I should change the export function to enable Pandoc to also download images from the wiki. But the wiki requires authentication, and the identifier for the image might be Image:, File:, Bild:, or Datei:, and in additional languages you might get additional alternatives. I'm not sure whether Pandoc addresses all of those correctly, but at least that's SEP <sup> ( 12 ) </sup> as long as I do not meddle in that soup. </em> </p> <p> <em> Now nothing is done in the export; the media download in Pandoc is deactivated, and the image path is adjusted right after HTML creation, at a point where I already meddled with that path anyhow. </em> </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb6-1"></a><span class="co">#!/usr/bin/python3</span></span> <span><a aria-hidden="true" href="#cb6-2"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb6-3"></a><span class="co">Export MediaWiki Pages with expanded templates and page includes.</span></span> <span><a aria-hidden="true" href="#cb6-4"></a></span> <span><a aria-hidden="true" href="#cb6-5"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb6-6"></a><span class="co">@website: https://idee.frank-siebert.de</span></span> <span><a aria-hidden="true" href="#cb6-7"></a><span class="co">@license: https://creativecommons.org/publicdomain/zero/1.0/deed.en</span></span> <span><a aria-hidden="true" href="#cb6-8"></a><span class="co">@date: 2022-03-15</span></span> <span><a aria-hidden="true" href="#cb6-9"></a></span> <span><a aria-hidden="true" href="#cb6-10"></a><span class="co">"""</span></span> <span><a aria-hidden="true" 
href="#cb6-11"></a></span> <span><a aria-hidden="true" href="#cb6-12"></a><span class="im">import</span> sys</span> <span><a aria-hidden="true" href="#cb6-13"></a><span class="im">import</span> os</span> <span><a aria-hidden="true" href="#cb6-14"></a><span class="im">import</span> getopt</span> <span><a aria-hidden="true" href="#cb6-15"></a><span class="im">import</span> termios</span> <span><a aria-hidden="true" href="#cb6-16"></a><span class="im">import</span> fcntl</span> <span><a aria-hidden="true" href="#cb6-17"></a><span class="im">import</span> subprocess</span> <span><a aria-hidden="true" href="#cb6-18"></a><span class="im">import</span> time</span> <span><a aria-hidden="true" href="#cb6-19"></a><span class="im">import</span> re</span> <span><a aria-hidden="true" href="#cb6-20"></a></span> <span><a aria-hidden="true" href="#cb6-21"></a><span class="im">from</span> pathlib <span class="im">import</span> Path</span> <span><a aria-hidden="true" href="#cb6-22"></a><span class="im">import</span> configparser</span> <span><a aria-hidden="true" href="#cb6-23"></a><span class="im">from</span> termcolor <span class="im">import</span> colored</span> <span><a aria-hidden="true" href="#cb6-24"></a><span class="im">from</span> mwclient <span class="im">import</span> Site</span> <span><a aria-hidden="true" href="#cb6-25"></a><span class="im">from</span> mwclient.errors <span class="im">import</span> LoginError</span> <span><a aria-hidden="true" href="#cb6-26"></a></span> <span><a aria-hidden="true" href="#cb6-27"></a>HELPTEXT <span class="op">=</span> <span class="st">'Usage: export.py [-w </span><span class="ch">\'</span><span class="st">wiki</span><span class="ch">\'</span><span class="st">] </span><span class="ch">\'</span><span class="st">Page_Name</span><span class="ch">\'\n</span><span class="st">'</span>\</span> <span><a aria-hidden="true" href="#cb6-28"></a> <span class="st">'</span><span class="ch">\n</span><span class="st">'</span>\</span> <span><a 
aria-hidden="true" href="#cb6-29"></a> <span class="st">'-w </span><span class="ch">\'</span><span class="st">wiki</span><span class="ch">\'</span><span class="st"> Name the wiki to be used, using the section</span><span class="ch">\n</span><span class="st">'</span>\</span> <span><a aria-hidden="true" href="#cb6-30"></a> <span class="st">' name in the configuration file.</span><span class="ch">\n</span><span class="st">'</span>\</span> <span><a aria-hidden="true" href="#cb6-31"></a> <span class="st">'</span><span class="ch">\n</span><span class="st">'</span>\</span> <span><a aria-hidden="true" href="#cb6-32"></a> <span class="st">'Page_Name The page to be exported. In case of spaces either</span><span class="ch">\n</span><span class="st">'</span>\</span> <span><a aria-hidden="true" href="#cb6-33"></a> <span class="st">' surrounded by </span><span class="ch">\'</span><span class="st"> or with _ instead of spaces.</span><span class="ch">\n</span><span class="st">'</span></span> <span><a aria-hidden="true" href="#cb6-34"></a></span> <span><a aria-hidden="true" href="#cb6-35"></a></span> <span><a aria-hidden="true" href="#cb6-36"></a><span class="kw">def</span> askforkeypress(prompt, keylist, onerror):</span> <span><a aria-hidden="true" href="#cb6-37"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb6-38"></a><span class="co"> Ask and wait for user input.</span></span> <span><a aria-hidden="true" href="#cb6-39"></a></span> <span><a aria-hidden="true" href="#cb6-40"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb6-41"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb6-42"></a><span class="co"> prompt: Str</span></span> <span><a aria-hidden="true" href="#cb6-43"></a><span class="co"> The prompt shown to the user to ask for input.</span></span> <span><a aria-hidden="true" href="#cb6-44"></a></span> <span><a aria-hidden="true" href="#cb6-45"></a><span class="co"> keylist: 
List</span></span> <span><a aria-hidden="true" href="#cb6-46"></a><span class="co"> A list of characters as possible input keys.</span></span> <span><a aria-hidden="true" href="#cb6-47"></a></span> <span><a aria-hidden="true" href="#cb6-48"></a><span class="co"> onerror: Object</span></span> <span><a aria-hidden="true" href="#cb6-49"></a><span class="co"> An object to return to the caller in case of an error.</span></span> <span><a aria-hidden="true" href="#cb6-50"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb6-51"></a> fileno <span class="op">=</span> sys.stdin.fileno()</span> <span><a aria-hidden="true" href="#cb6-52"></a> oldterm <span class="op">=</span> termios.tcgetattr(fileno)</span> <span><a aria-hidden="true" href="#cb6-53"></a> newattr <span class="op">=</span> termios.tcgetattr(fileno)</span> <span><a aria-hidden="true" href="#cb6-54"></a> newattr[<span class="dv">3</span>] <span class="op">=</span> newattr[<span class="dv">3</span>] <span class="op">&amp;</span> <span class="op">~</span>termios.ICANON <span class="op">&amp;</span> <span class="op">~</span>termios.ECHO</span> <span><a aria-hidden="true" href="#cb6-55"></a> termios.tcsetattr(fileno, termios.TCSANOW, newattr)</span> <span><a aria-hidden="true" href="#cb6-56"></a></span> <span><a aria-hidden="true" href="#cb6-57"></a> oldflags <span class="op">=</span> fcntl.fcntl(fileno, fcntl.F_GETFL)</span> <span><a aria-hidden="true" href="#cb6-58"></a> fcntl.fcntl(fileno, fcntl.F_SETFL, oldflags <span class="op">|</span> os.O_NONBLOCK)</span> <span><a aria-hidden="true" href="#cb6-59"></a></span> <span><a aria-hidden="true" href="#cb6-60"></a> <span class="co"># stay in the same line to wait for input</span></span> <span><a aria-hidden="true" href="#cb6-61"></a> <span class="bu">print</span>(prompt, end<span class="op">=</span><span class="st">' '</span>, flush<span class="op">=</span><span class="va">True</span>)</span> <span><a aria-hidden="true" href="#cb6-62"></a> 
char <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb6-63"></a> <span class="cf">try</span>:</span> <span><a aria-hidden="true" href="#cb6-64"></a> <span class="cf">while</span> char <span class="kw">not</span> <span class="kw">in</span> keylist:</span> <span><a aria-hidden="true" href="#cb6-65"></a> <span class="cf">try</span>:</span> <span><a aria-hidden="true" href="#cb6-66"></a> char <span class="op">=</span> sys.stdin.read(<span class="dv">1</span>)</span> <span><a aria-hidden="true" href="#cb6-67"></a> time.sleep(<span class="fl">.1</span>)</span> <span><a aria-hidden="true" href="#cb6-68"></a> <span class="cf">except</span> <span class="pp">IOError</span>:</span> <span><a aria-hidden="true" href="#cb6-69"></a> char <span class="op">=</span> onerror</span> <span><a aria-hidden="true" href="#cb6-70"></a> <span class="cf">except</span> <span class="pp">ValueError</span>:</span> <span><a aria-hidden="true" href="#cb6-71"></a> char <span class="op">=</span> onerror</span> <span><a aria-hidden="true" href="#cb6-72"></a> <span class="bu">print</span>(char)</span> <span><a aria-hidden="true" href="#cb6-73"></a> <span class="cf">finally</span>:</span> <span><a aria-hidden="true" href="#cb6-74"></a> termios.tcsetattr(fileno, termios.TCSAFLUSH, oldterm)</span> <span><a aria-hidden="true" href="#cb6-75"></a> fcntl.fcntl(fileno, fcntl.F_SETFL, oldflags)</span> <span><a aria-hidden="true" href="#cb6-76"></a> <span class="cf">return</span> char</span> <span><a aria-hidden="true" href="#cb6-77"></a></span> <span><a aria-hidden="true" href="#cb6-78"></a></span> <span><a aria-hidden="true" href="#cb6-79"></a><span class="kw">def</span> export():</span> <span><a aria-hidden="true" href="#cb6-80"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb6-81"></a><span class="co"> Export the page as mediawiki markup.</span></span> <span><a aria-hidden="true" href="#cb6-82"></a></span> <span><a 
aria-hidden="true" href="#cb6-83"></a><span class="co"> Uses the API used by Special:Export including templates</span></span> <span><a aria-hidden="true" href="#cb6-84"></a><span class="co"> https://wiki.frank-siebert.de/script-inst/index.php?title=Special:Export</span></span> <span><a aria-hidden="true" href="#cb6-85"></a></span> <span><a aria-hidden="true" href="#cb6-86"></a><span class="co"> For long pages the use of POST is important. I changed the library function</span></span> <span><a aria-hidden="true" href="#cb6-87"></a><span class="co"> in this regard. A respective fix awaits its merge into the mwclient module.</span></span> <span><a aria-hidden="true" href="#cb6-88"></a><span class="co"> Request Type: POST</span></span> <span><a aria-hidden="true" href="#cb6-89"></a><span class="co"> Request Parameters:</span></span> <span><a aria-hidden="true" href="#cb6-90"></a><span class="co"> catname=&amp;pages=Replacing+Wordpress&amp;curonly=1&amp;templates=1&amp;wpDownload=</span></span> <span><a aria-hidden="true" href="#cb6-91"></a><span class="co"> 1&amp;wpEditToken=%2B%5C&amp;title=Special%3AExport</span></span> <span><a aria-hidden="true" href="#cb6-92"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb6-93"></a> <span class="bu">print</span>(pagename)</span> <span><a aria-hidden="true" href="#cb6-94"></a> <span class="co"># Login</span></span> <span><a aria-hidden="true" href="#cb6-95"></a> host <span class="op">=</span> config[CFGSECTION][<span class="st">'Host'</span>]</span> <span><a aria-hidden="true" href="#cb6-96"></a> scriptpath <span class="op">=</span> config[CFGSECTION][<span class="st">'ScriptPath'</span>]</span> <span><a aria-hidden="true" href="#cb6-97"></a> user <span class="op">=</span> config[CFGSECTION][<span class="st">'User'</span>]</span> <span><a aria-hidden="true" href="#cb6-98"></a> password <span class="op">=</span> config[CFGSECTION][<span class="st">'Passwort'</span>]</span> <span><a aria-hidden="true" 
href="#cb6-99"></a> exportdir <span class="op">=</span> config[CFGSECTION][<span class="st">'ExportDirectory'</span>]</span> <span><a aria-hidden="true" href="#cb6-100"></a> references <span class="op">=</span> config[CFGSECTION][<span class="st">'References'</span>]</span> <span><a aria-hidden="true" href="#cb6-101"></a></span> <span><a aria-hidden="true" href="#cb6-102"></a> site <span class="op">=</span> Site(host, path<span class="op">=</span>scriptpath)</span> <span><a aria-hidden="true" href="#cb6-103"></a> <span class="cf">try</span>:</span> <span><a aria-hidden="true" href="#cb6-104"></a> site.login(user, password)</span> <span><a aria-hidden="true" href="#cb6-105"></a> site.get_token(<span class="st">"login"</span>)</span> <span><a aria-hidden="true" href="#cb6-106"></a> <span class="cf">except</span> LoginError:</span> <span><a aria-hidden="true" href="#cb6-107"></a> <span class="bu">print</span>(<span class="st">"login failed"</span>)</span> <span><a aria-hidden="true" href="#cb6-108"></a></span> <span><a aria-hidden="true" href="#cb6-109"></a> <span class="cf">for</span> result <span class="kw">in</span> site.search(pagename, what<span class="op">=</span><span class="st">'title'</span>):</span> <span><a aria-hidden="true" href="#cb6-110"></a> <span class="bu">print</span>(result)</span> <span><a aria-hidden="true" href="#cb6-111"></a> page <span class="op">=</span> site.pages.get(pagename)</span> <span><a aria-hidden="true" href="#cb6-112"></a> <span class="cf">if</span> page:</span> <span><a aria-hidden="true" href="#cb6-113"></a> <span class="bu">print</span>(page)</span> <span><a aria-hidden="true" href="#cb6-114"></a> <span class="co"># expand = page.templates().count &gt; 0</span></span> <span><a aria-hidden="true" href="#cb6-115"></a> <span class="co"># might fail if client.py is updated, because</span></span> <span><a aria-hidden="true" href="#cb6-116"></a> <span class="co"># def expandtemplates(self, text, title=None, 
generatexml=False)</span></span> <span><a aria-hidden="true" href="#cb6-117"></a> <span class="co"># should use post instead of get</span></span> <span><a aria-hidden="true" href="#cb6-118"></a> wikitext <span class="op">=</span> page.text(section<span class="op">=</span><span class="va">None</span>,</span> <span><a aria-hidden="true" href="#cb6-119"></a> expandtemplates<span class="op">=</span><span class="va">True</span>,</span> <span><a aria-hidden="true" href="#cb6-120"></a> cache<span class="op">=</span><span class="va">True</span>,</span> <span><a aria-hidden="true" href="#cb6-121"></a> slot<span class="op">=</span><span class="st">'main'</span>)</span> <span><a aria-hidden="true" href="#cb6-122"></a></span> <span><a aria-hidden="true" href="#cb6-123"></a> <span class="co"># load page until it is no longer changing</span></span> <span><a aria-hidden="true" href="#cb6-124"></a> <span class="co"># stop waiting after 5 seconds</span></span> <span><a aria-hidden="true" href="#cb6-125"></a> webpage <span class="op">=</span> wikitext</span> <span><a aria-hidden="true" href="#cb6-126"></a> time.sleep(<span class="fl">.5</span>)</span> <span><a aria-hidden="true" href="#cb6-127"></a> i <span class="op">=</span> <span class="dv">10</span></span> <span><a aria-hidden="true" href="#cb6-128"></a> <span class="cf">while</span> <span class="bu">len</span>(webpage) <span class="op">!=</span> <span class="bu">len</span>(wikitext) <span class="kw">and</span> i <span class="op">&gt;</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" href="#cb6-129"></a> webpage <span class="op">=</span> wikitext</span> <span><a aria-hidden="true" href="#cb6-130"></a> time.sleep(<span class="fl">.5</span>)</span> <span><a aria-hidden="true" href="#cb6-131"></a> i <span class="op">-=</span> <span class="dv">1</span></span> <span><a aria-hidden="true" href="#cb6-132"></a> <span class="co"># make clear, which one is not used from now on</span></span> <span><a aria-hidden="true" 
href="#cb6-133"></a> webpage <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb6-134"></a></span> <span><a aria-hidden="true" href="#cb6-135"></a> <span class="co"># Later we might need a &lt;references/&gt; tag to identify the</span></span> <span><a aria-hidden="true" href="#cb6-136"></a> <span class="co"># location where footnotes shall be placed.</span></span> <span><a aria-hidden="true" href="#cb6-137"></a> <span class="co"># Let's check non-existence of the tag and existence of the</span></span> <span><a aria-hidden="true" href="#cb6-138"></a> <span class="co"># reference section.</span></span> <span><a aria-hidden="true" href="#cb6-139"></a> <span class="co"># Insert the tag now, if it is not present at the expected place.</span></span> <span><a aria-hidden="true" href="#cb6-140"></a> <span class="co"># In vim the regex :%s:^=.*Fußnoten.*=\n&lt;references.+/&gt; finds</span></span> <span><a aria-hidden="true" href="#cb6-141"></a> <span class="co"># the German footnotes with the reference tag in the next line.</span></span> <span><a aria-hidden="true" href="#cb6-142"></a></span> <span><a aria-hidden="true" href="#cb6-143"></a> <span class="co"># strip down to headline text only</span></span> <span><a aria-hidden="true" href="#cb6-144"></a> ref_cfg <span class="op">=</span> references.strip().strip(<span class="st">'='</span>).strip()</span> <span><a aria-hidden="true" href="#cb6-145"></a></span> <span><a aria-hidden="true" href="#cb6-146"></a> <span class="co"># I tried repr(pattern) to get the raw string, but got it with</span></span> <span><a aria-hidden="true" href="#cb6-147"></a> <span class="co"># ' at the start and end</span></span> <span><a aria-hidden="true" href="#cb6-148"></a> <span class="co"># repr(pattern)[1:-1] would strip them off, but this is easier</span></span> <span><a aria-hidden="true" href="#cb6-149"></a> <span class="co"># to read and to write</span></span> <span><a aria-hidden="true" 
href="#cb6-150"></a></span> <span><a aria-hidden="true" href="#cb6-151"></a> <span class="co"># requires re.M</span></span> <span><a aria-hidden="true" href="#cb6-152"></a> r_ref_cfg_pattern <span class="op">=</span> <span class="vs">r""</span> <span class="op">+</span> <span class="st">"</span><span class="ch">\n</span><span class="st">=.*</span><span class="sc">{}</span><span class="st">.*=.*"</span>.<span class="bu">format</span>(ref_cfg)</span> <span><a aria-hidden="true" href="#cb6-153"></a> <span class="co"># a negative lookahead for the reference tag</span></span> <span><a aria-hidden="true" href="#cb6-154"></a> r_nreference <span class="op">=</span> <span class="vs">r""</span> <span class="op">+</span> <span class="st">"(?!</span><span class="ch">\n</span><span class="st">.*&lt;references.*/&gt;.*$)"</span></span> <span><a aria-hidden="true" href="#cb6-155"></a> <span class="co"># ok if footnote header found without reference tag</span></span> <span><a aria-hidden="true" href="#cb6-156"></a> r_okpattern <span class="op">=</span> <span class="vs">r""</span> <span class="op">+</span> r_ref_cfg_pattern <span class="op">+</span> r_nreference</span> <span><a aria-hidden="true" href="#cb6-157"></a></span> <span><a aria-hidden="true" href="#cb6-158"></a> okcheck <span class="op">=</span> re.<span class="bu">compile</span>(r_okpattern)</span> <span><a aria-hidden="true" href="#cb6-159"></a> exists <span class="op">=</span> okcheck.search(wikitext)</span> <span><a aria-hidden="true" href="#cb6-160"></a> <span class="cf">if</span> exists:</span> <span><a aria-hidden="true" href="#cb6-161"></a> <span class="bu">print</span>(colored(<span class="st">'</span><span class="ch">\n</span><span class="st">WARNING:'</span>, <span class="st">'yellow'</span>),</span> <span><a aria-hidden="true" href="#cb6-162"></a> <span class="st">"&lt;references/&gt; tag is not in the expected location"</span>)</span> <span><a aria-hidden="true" href="#cb6-163"></a> <span 
class="bu">print</span>(<span class="st">"Automatic insertion of the tag takes place."</span>)</span> <span><a aria-hidden="true" href="#cb6-164"></a> wikitext <span class="op">=</span> okcheck.sub(exists.group(<span class="dv">0</span>) <span class="op">+</span></span> <span><a aria-hidden="true" href="#cb6-165"></a> <span class="st">"</span><span class="ch">\n</span><span class="st">&lt;references/&gt;"</span>, wikitext)</span> <span><a aria-hidden="true" href="#cb6-166"></a></span> <span><a aria-hidden="true" href="#cb6-167"></a> <span class="co"># replace category references</span></span> <span><a aria-hidden="true" href="#cb6-168"></a> categorypattern1 <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r"(.*)"</span>,</span> <span><a aria-hidden="true" href="#cb6-169"></a> flags<span class="op">=</span>re.MULTILINE)</span> <span><a aria-hidden="true" href="#cb6-170"></a> wikitext <span class="op">=</span> categorypattern1.sub(<span class="vs">r"\1"</span>, wikitext)</span> <span><a aria-hidden="true" href="#cb6-171"></a></span> <span><a aria-hidden="true" href="#cb6-172"></a> <span class="co"># replace category references, take care not to touch Images</span></span> <span><a aria-hidden="true" href="#cb6-173"></a> categorypattern2 <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r"\[\[[KC]ategor.*\|(.*)\]\]"</span>,</span> <span><a aria-hidden="true" href="#cb6-174"></a> flags<span class="op">=</span>re.MULTILINE)</span> <span><a aria-hidden="true" href="#cb6-175"></a> wikitext <span class="op">=</span> categorypattern2.sub(<span class="vs">r"\1"</span>, wikitext)</span> <span><a aria-hidden="true" href="#cb6-176"></a></span> <span><a aria-hidden="true" href="#cb6-177"></a> <span class="co"># A mediawiki extension enables me to embed images from my</span></span> <span><a aria-hidden="true" href="#cb6-178"></a> <span class="co"># idee domain into articles I write in the wiki just by pasting</span></span> 
<span><a aria-hidden="true" href="#cb6-179"></a> <span class="co"># the URL into an otherwise empty paragraph.</span></span> <span><a aria-hidden="true" href="#cb6-180"></a> <span class="co"># https://something....filename.ext</span></span> <span><a aria-hidden="true" href="#cb6-181"></a> <span class="co"># This, unsurprisingly, is not recognized by pandoc.</span></span> <span><a aria-hidden="true" href="#cb6-182"></a> <span class="co"># I need to beautify these image links.</span></span> <span><a aria-hidden="true" href="#cb6-183"></a></span> <span><a aria-hidden="true" href="#cb6-184"></a> <span class="co"># only if the line starts with http</span></span> <span><a aria-hidden="true" href="#cb6-185"></a> <span class="co"># transitional code for the migration</span></span> <span><a aria-hidden="true" href="#cb6-186"></a> imagepattern <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r""</span> <span class="op">+</span> <span class="st">"^http(.*)png"</span>, re.MULTILINE)</span> <span><a aria-hidden="true" href="#cb6-187"></a> wikitext <span class="op">=</span> imagepattern.sub(<span class="vs">r"[[Image:http\1png|No Caption]]"</span>,</span> <span><a aria-hidden="true" href="#cb6-188"></a> wikitext)</span> <span><a aria-hidden="true" href="#cb6-189"></a></span> <span><a aria-hidden="true" href="#cb6-190"></a> imagepattern <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r""</span> <span class="op">+</span> <span class="st">"^http(.*)jpg"</span>, re.MULTILINE)</span> <span><a aria-hidden="true" href="#cb6-191"></a> wikitext <span class="op">=</span> imagepattern.sub(<span class="vs">r"[[Image:http\1jpg|No Caption]]"</span>,</span> <span><a aria-hidden="true" href="#cb6-192"></a> wikitext)</span> <span><a aria-hidden="true" href="#cb6-193"></a></span> <span><a aria-hidden="true" href="#cb6-194"></a> <span class="co"># The imagepattern replacements above take care of external</span></span> <span><a 
aria-hidden="true" href="#cb6-195"></a> <span class="co"># links, as I used them in the past to upload images to WordPress</span></span> <span><a aria-hidden="true" href="#cb6-196"></a> <span class="co"># instead of to the MediaWiki, when I intended to use them in</span></span> <span><a aria-hidden="true" href="#cb6-197"></a> <span class="co"># articles.</span></span> <span><a aria-hidden="true" href="#cb6-198"></a> <span class="co">#</span></span> <span><a aria-hidden="true" href="#cb6-199"></a> <span class="co"># The new scenario is the upload to the MediaWiki and to create</span></span> <span><a aria-hidden="true" href="#cb6-200"></a> <span class="co"># image entries with Caption Text and probably Size Information.</span></span> <span><a aria-hidden="true" href="#cb6-201"></a> <span class="co"># For those the image file name needs to be expanded with the</span></span> <span><a aria-hidden="true" href="#cb6-202"></a> <span class="co"># URL to retrieve the images from the Wiki.</span></span> <span><a aria-hidden="true" href="#cb6-203"></a> <span class="co">#</span></span> <span><a aria-hidden="true" href="#cb6-204"></a> <span class="co"># The exported image information looks as follows:</span></span> <span><a aria-hidden="true" href="#cb6-205"></a> <span class="co"># [[Image:Imagename.png|NNxNNpx|Caption Text]]</span></span> <span><a aria-hidden="true" href="#cb6-206"></a> <span class="co"># Or:</span></span> <span><a aria-hidden="true" href="#cb6-207"></a> <span class="co"># [[Image:Imagename.png|Caption Text]]</span></span> <span><a aria-hidden="true" href="#cb6-208"></a> <span class="co">#</span></span> <span><a aria-hidden="true" href="#cb6-209"></a> <span class="co"># The text "Imagename.png" needs to be expanded into:</span></span> <span><a aria-hidden="true" href="#cb6-210"></a> <span class="co"># $host/$script-path/index.php?title=File:Imagename.png</span></span> <span><a aria-hidden="true" href="#cb6-211"></a> <span class="co">#</span></span> <span><a 
aria-hidden="true" href="#cb6-212"></a> <span class="co"># Instead of Image the returned WikiText may contain Bild or File</span></span> <span><a aria-hidden="true" href="#cb6-213"></a> <span class="co"># or Datei as Keyword.</span></span> <span><a aria-hidden="true" href="#cb6-214"></a></span> <span><a aria-hidden="true" href="#cb6-215"></a> <span class="co"># This became a NOOP, because changing the image path here</span></span> <span><a aria-hidden="true" href="#cb6-216"></a> <span class="co"># to help pandoc to download them does not make any sense.</span></span> <span><a aria-hidden="true" href="#cb6-217"></a> <span class="co"># Probably an image export function would make sense here.</span></span> <span><a aria-hidden="true" href="#cb6-218"></a></span> <span><a aria-hidden="true" href="#cb6-219"></a> <span class="co"># mstr1 = r"\[\[(Image|Bild|Datei|File):"</span></span> <span><a aria-hidden="true" href="#cb6-220"></a> <span class="co"># mstr2 = r"([^h][^t][^t][^p].*p[n|j]g)|.*\]\]"</span></span> <span><a aria-hidden="true" href="#cb6-221"></a> <span class="co"># mstr = mstr1 + mstr2</span></span> <span><a aria-hidden="true" href="#cb6-222"></a> <span class="co"># imagepattern = re.compile(mstr, flags=re.MULTILINE)</span></span> <span><a aria-hidden="true" href="#cb6-223"></a> <span class="co"># replstr = r"" + site + r"index.php?title=" + r"\2"</span></span> <span><a aria-hidden="true" href="#cb6-224"></a> <span class="co"># wikitext = imagepattern.sub(replstr, wikitext)</span></span> <span><a aria-hidden="true" href="#cb6-225"></a></span> <span><a aria-hidden="true" href="#cb6-226"></a> <span class="co"># Write the result to disk.</span></span> <span><a aria-hidden="true" href="#cb6-227"></a> <span class="co"># </span><span class="al">TODO</span><span class="co">: Also enable environment variables to determine HOME.</span></span> <span><a aria-hidden="true" href="#cb6-228"></a> <span class="co"># Using ~ is nice, but quite OS dependent</span></span> 
<span><a aria-hidden="true" href="#cb6-229"></a> <span class="cf">if</span> exportdir[<span class="dv">0</span>] <span class="op">==</span> <span class="st">'~'</span>:</span> <span><a aria-hidden="true" href="#cb6-230"></a> wikifile <span class="op">=</span> Path.home() <span class="op">/</span> exportdir.strip(<span class="st">'~'</span>).strip(<span class="st">'/'</span>) <span class="op">\</span></span> <span><a aria-hidden="true" href="#cb6-231"></a> <span class="op">/</span> <span class="st">"</span><span class="sc">{0}</span><span class="st">.</span><span class="sc">{1}</span><span class="st">"</span>.<span class="bu">format</span>(pagename, <span class="st">"mediawiki"</span>)</span> <span><a aria-hidden="true" href="#cb6-232"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb6-233"></a> wikifile <span class="op">=</span> Path(exportdir) <span class="op">\</span></span> <span><a aria-hidden="true" href="#cb6-234"></a> <span class="op">/</span> <span class="st">"</span><span class="sc">{0}</span><span class="st">.</span><span class="sc">{1}</span><span class="st">"</span>.<span class="bu">format</span>(pagename, <span class="st">"mediawiki"</span>)</span> <span><a aria-hidden="true" href="#cb6-235"></a> wikifile <span class="op">=</span> wikifile.resolve()</span> <span><a aria-hidden="true" href="#cb6-236"></a> <span class="cf">with</span> <span class="bu">open</span>(wikifile, <span class="st">'w'</span>) <span class="im">as</span> outfile:</span> <span><a aria-hidden="true" href="#cb6-237"></a> <span class="bu">print</span>(wikitext, <span class="bu">file</span><span class="op">=</span>outfile)</span> <span><a aria-hidden="true" href="#cb6-238"></a> outfile.flush()</span> <span><a aria-hidden="true" href="#cb6-239"></a> outfile.close()</span> <span><a aria-hidden="true" href="#cb6-240"></a> <span class="bu">print</span>(<span class="st">'</span><span class="ch">\n</span><span class="st">The mediawiki file was exported 
to:</span><span class="ch">\n</span><span class="st">'</span> <span class="op">+</span> outfile.name)</span> <span><a aria-hidden="true" href="#cb6-241"></a></span> <span><a aria-hidden="true" href="#cb6-242"></a> prompt <span class="op">=</span> <span class="st">'</span><span class="ch">\n</span><span class="st">Do you want to review the file? yes/no (y/n):'</span></span> <span><a aria-hidden="true" href="#cb6-243"></a> pressed <span class="op">=</span> askforkeypress(prompt<span class="op">=</span>prompt,</span> <span><a aria-hidden="true" href="#cb6-244"></a> keylist<span class="op">=</span>[<span class="st">'y'</span>, <span class="st">'Y'</span>, <span class="st">'n'</span>, <span class="st">'N'</span>],</span> <span><a aria-hidden="true" href="#cb6-245"></a> onerror<span class="op">=</span><span class="st">'n'</span>)</span> <span><a aria-hidden="true" href="#cb6-246"></a> <span class="cf">if</span> pressed <span class="kw">in</span> [<span class="st">'y'</span>, <span class="st">'Y'</span>]:</span> <span><a aria-hidden="true" href="#cb6-247"></a> subprocess.run([<span class="st">"vim"</span>, wikifile])</span> <span><a aria-hidden="true" href="#cb6-248"></a></span> <span><a aria-hidden="true" href="#cb6-249"></a> <span class="co"># </span><span class="al">TODO</span><span class="co">: Consider to execute the commit into the wiki git</span></span> <span><a aria-hidden="true" href="#cb6-250"></a> <span class="co"># htmltext = subprocess.run(["pandoc", "-f", "mediawiki", \</span></span> <span><a aria-hidden="true" href="#cb6-251"></a> <span class="co"># "-t", "html"], input=bytearray(wikitext.encode()), \</span></span> <span><a aria-hidden="true" href="#cb6-252"></a> <span class="co"># capture_output=True)</span></span> <span><a aria-hidden="true" href="#cb6-253"></a> <span class="co"># print(htmltext.stdout.decode("utf-8"))</span></span> <span><a aria-hidden="true" href="#cb6-254"></a> <span class="cf">break</span></span> <span><a aria-hidden="true" 
href="#cb6-255"></a> <span class="cf">return</span></span> <span><a aria-hidden="true" href="#cb6-256"></a></span> <span><a aria-hidden="true" href="#cb6-257"></a></span> <span><a aria-hidden="true" href="#cb6-258"></a><span class="cf">if</span> <span class="va">__name__</span> <span class="op">==</span> <span class="st">"__main__"</span>:</span> <span><a aria-hidden="true" href="#cb6-259"></a> <span class="co"># Check command line arguments, provide help and call the functions</span></span> <span><a aria-hidden="true" href="#cb6-260"></a></span> <span><a aria-hidden="true" href="#cb6-261"></a> CREATEFLAG <span class="op">=</span> <span class="va">False</span></span> <span><a aria-hidden="true" href="#cb6-262"></a></span> <span><a aria-hidden="true" href="#cb6-263"></a> <span class="cf">try</span>:</span> <span><a aria-hidden="true" href="#cb6-264"></a> opts, args <span class="op">=</span> getopt.getopt(sys.argv[<span class="dv">1</span>:], <span class="st">"hw:"</span>, [<span class="st">"help"</span>, <span class="st">"wiki="</span>])</span> <span><a aria-hidden="true" href="#cb6-265"></a></span> <span><a aria-hidden="true" href="#cb6-266"></a> <span class="cf">except</span> getopt.GetoptError:</span> <span><a aria-hidden="true" href="#cb6-267"></a> <span class="bu">print</span>(HELPTEXT)</span> <span><a aria-hidden="true" href="#cb6-268"></a> sys.exit(<span class="dv">2</span>)</span> <span><a aria-hidden="true" href="#cb6-269"></a></span> <span><a aria-hidden="true" href="#cb6-270"></a> <span class="co"># Defaults</span></span> <span><a aria-hidden="true" href="#cb6-271"></a> CFGSECTION <span class="op">=</span> <span class="st">'Default'</span></span> <span><a aria-hidden="true" href="#cb6-272"></a></span> <span><a aria-hidden="true" href="#cb6-273"></a> <span class="cf">for</span> opt, arg <span class="kw">in</span> opts:</span> <span><a aria-hidden="true" href="#cb6-274"></a> <span class="cf">if</span> opt <span class="kw">in</span> {<span 
class="st">"-h"</span>, <span class="st">"--help"</span>}:</span> <span><a aria-hidden="true" href="#cb6-275"></a> <span class="bu">print</span>(HELPTEXT)</span> <span><a aria-hidden="true" href="#cb6-276"></a> sys.exit()</span> <span><a aria-hidden="true" href="#cb6-277"></a> <span class="cf">if</span> opt <span class="kw">in</span> {<span class="st">"-w"</span>, <span class="st">"--wiki"</span>}:</span> <span><a aria-hidden="true" href="#cb6-278"></a> CFGSECTION <span class="op">=</span> arg</span> <span><a aria-hidden="true" href="#cb6-279"></a></span> <span><a aria-hidden="true" href="#cb6-280"></a> arg_names <span class="op">=</span> [<span class="st">'pagename'</span>]</span> <span><a aria-hidden="true" href="#cb6-281"></a> args <span class="op">=</span> <span class="bu">dict</span>(<span class="bu">zip</span>(arg_names, args))</span> <span><a aria-hidden="true" href="#cb6-282"></a></span> <span><a aria-hidden="true" href="#cb6-283"></a> <span class="co"># print(args)</span></span> <span><a aria-hidden="true" href="#cb6-284"></a></span> <span><a aria-hidden="true" href="#cb6-285"></a> <span class="co"># Kept as inspiration for future</span></span> <span><a aria-hidden="true" href="#cb6-286"></a> <span class="co"># ------------------------------</span></span> <span><a aria-hidden="true" href="#cb6-287"></a> <span class="co"># Arg_list = collections.namedtuple('Arg_list', arg_names)</span></span> <span><a aria-hidden="true" href="#cb6-288"></a> <span class="co"># args = Arg_list(*(args.get(arg, None) for arg in arg_names))</span></span> <span><a aria-hidden="true" href="#cb6-289"></a></span> <span><a aria-hidden="true" href="#cb6-290"></a> pagename <span class="op">=</span> args.get(<span class="st">'pagename'</span>)</span> <span><a aria-hidden="true" href="#cb6-291"></a> <span class="cf">if</span> <span class="kw">not</span> pagename:</span> <span><a aria-hidden="true" href="#cb6-292"></a> <span class="bu">print</span>(colored(<span class="st">'</span><span 
class="ch">\n</span><span class="st">ERROR:'</span>, <span class="st">'red'</span>), <span class="st">'Page_Name parameter is missing.'</span>)</span> <span><a aria-hidden="true" href="#cb6-293"></a> <span class="bu">print</span>(HELPTEXT)</span> <span><a aria-hidden="true" href="#cb6-294"></a> sys.exit(<span class="dv">2</span>)</span> <span><a aria-hidden="true" href="#cb6-295"></a></span> <span><a aria-hidden="true" href="#cb6-296"></a> config <span class="op">=</span> configparser.ConfigParser()</span> <span><a aria-hidden="true" href="#cb6-297"></a></span> <span><a aria-hidden="true" href="#cb6-298"></a> configpath <span class="op">=</span> Path.home() <span class="op">/</span> <span class="st">'.config'</span> <span class="op">/</span> <span class="st">'wikitools'</span> <span class="op">/</span> <span class="st">'wikitools.cfg'</span></span> <span><a aria-hidden="true" href="#cb6-299"></a> config.read(configpath)</span> <span><a aria-hidden="true" href="#cb6-300"></a> sections <span class="op">=</span> config.sections()</span> <span><a aria-hidden="true" href="#cb6-301"></a> <span class="cf">if</span> CFGSECTION <span class="kw">not</span> <span class="kw">in</span> sections:</span> <span><a aria-hidden="true" href="#cb6-302"></a> <span class="bu">print</span>(colored(<span class="st">'</span><span class="ch">\n</span><span class="st">ERROR:'</span>, <span class="st">'red'</span>), <span class="st">'Configuration is missing.'</span>)</span> <span><a aria-hidden="true" href="#cb6-303"></a> <span class="bu">print</span>(HELPTEXT)</span> <span><a aria-hidden="true" href="#cb6-304"></a> sys.exit(<span class="dv">3</span>)</span> <span><a aria-hidden="true" href="#cb6-305"></a></span> <span><a aria-hidden="true" href="#cb6-306"></a> export()</span> <span><a aria-hidden="true" href="#cb6-307"></a></span> <span><a aria-hidden="true" href="#cb6-308"></a> sys.exit(<span class="dv">0</span>)</span></code></pre> </div> <h4> ~/.config/wikitools/wikitools.cfg </h4> <p> 
The code reads a configuration file, which enables me to post the code without the risk of exposing my user name and password for the wiki instances I use. </p> <p> One default wiki can be configured, plus as many additional wikis as you like. With the command line parameter -w you can select the configuration section you want to use in the current export. </p> <p> The configuration is shared between multiple wikitools. </p> <div class="sourceCode"> <pre class="sourceCode INI"><code class="sourceCode ini"><span><a aria-hidden="true" href="#cb7-1"></a><span class="co">#</span></span> <span><a aria-hidden="true" href="#cb7-2"></a><span class="co"># The configuration's location has to be</span></span> <span><a aria-hidden="true" href="#cb7-3"></a><span class="co">#</span></span> <span><a aria-hidden="true" href="#cb7-4"></a><span class="co"># UserHome/.config/wikitools/</span></span> <span><a aria-hidden="true" href="#cb7-5"></a><span class="co">#</span></span> <span><a aria-hidden="true" href="#cb7-6"></a><span class="co"># If the command line names no wiki section</span></span> <span><a aria-hidden="true" href="#cb7-7"></a><span class="co"># the Default section is used.</span></span> <span><a aria-hidden="true" href="#cb7-8"></a><span class="co">#</span></span> <span><a aria-hidden="true" href="#cb7-9"></a><span class="co"># The command line option -w</span></span> <span><a aria-hidden="true" href="#cb7-10"></a><span class="co"># with a parameter can be used to name a wiki,</span></span> <span><a aria-hidden="true" href="#cb7-11"></a><span class="co"># for which a configuration section exists.</span></span> <span><a aria-hidden="true" href="#cb7-12"></a><span class="co">#</span></span> <span><a aria-hidden="true" href="#cb7-13"></a><span class="co"># access to a key from python</span></span> <span><a aria-hidden="true" href="#cb7-14"></a><span class="co"># config[cfgsection]['key']</span></span> <span><a aria-hidden="true" href="#cb7-15"></a></span> <span><a aria-hidden="true" 
href="#cb7-16"></a><span class="kw">[Default]</span></span> <span><a aria-hidden="true" href="#cb7-17"></a><span class="dt">DefaultCategory </span><span class="ot">=</span><span class="st"> Your Category</span></span> <span><a aria-hidden="true" href="#cb7-18"></a><span class="dt">Host </span><span class="ot">=</span><span class="st"> Your wiki host name</span></span> <span><a aria-hidden="true" href="#cb7-19"></a><span class="dt">ScriptPath </span><span class="ot">=</span><span class="st"> Your wiki script path</span></span> <span><a aria-hidden="true" href="#cb7-20"></a><span class="dt">User </span><span class="ot">=</span><span class="st"> Your wiki user name</span></span> <span><a aria-hidden="true" href="#cb7-21"></a><span class="dt">Passwort </span><span class="ot">=</span><span class="st"> Your wiki password</span></span> <span><a aria-hidden="true" href="#cb7-22"></a><span class="dt">References </span><span class="ot">=</span><span class="st"> == Fußnoten ==</span></span> <span><a aria-hidden="true" href="#cb7-23"></a><span class="dt">ExportDirectory </span><span class="ot">=</span><span class="st"> ~/projects/idee/author</span></span> <span><a aria-hidden="true" href="#cb7-24"></a></span> <span><a aria-hidden="true" href="#cb7-25"></a><span class="kw">[yourwiki]</span></span> <span><a aria-hidden="true" href="#cb7-26"></a><span class="dt">DefaultCategory </span><span class="ot">=</span><span class="st"> Your Category</span></span> <span><a aria-hidden="true" href="#cb7-27"></a><span class="dt">Host </span><span class="ot">=</span><span class="st"> Your wiki host name</span></span> <span><a aria-hidden="true" href="#cb7-28"></a><span class="dt">ScriptPath </span><span class="ot">=</span><span class="st"> Your wiki script path</span></span> <span><a aria-hidden="true" href="#cb7-29"></a><span class="dt">User </span><span class="ot">=</span><span class="st"> Your wiki user name</span></span> <span><a aria-hidden="true" href="#cb7-30"></a><span 
class="dt">Passwort </span><span class="ot">=</span><span class="st"> Your wiki password</span></span> <span><a aria-hidden="true" href="#cb7-31"></a><span class="dt">References </span><span class="ot">=</span><span class="st"> == Footnotes ==</span></span> <span><a aria-hidden="true" href="#cb7-32"></a><span class="dt">ExportDirectory </span><span class="ot">=</span><span class="st"> ~/projects/idee/author</span></span></code></pre> </div> <h3> WordPress Migration </h3> <p> Meta data and, if I decide to use it, also the content of the WordPress articles and pages are available in the MariaDB database wp_idee. </p> <p> In the context of the project it made sense to grant remote access to the MariaDB server from the local area network. This is well described in the official documentation <sup> ( 13 ) </sup> . </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb8-1"></a><span class="ex">root</span> @sol:~# cd /etc/mysql/mariadb.conf.d/</span> <span><a aria-hidden="true" href="#cb8-2"></a><span class="ex">root</span> @sol:/etc/mysql/mariadb.conf.d# ls</span> <span><a aria-hidden="true" href="#cb8-3"></a><span class="ex">50-client.cnf</span> 50-mysql-clients.cnf 50-mysqld_safe.cnf 50-server.cnf</span> <span><a aria-hidden="true" href="#cb8-4"></a><span class="ex">root</span> @sol:/etc/mysql/mariadb.conf.d# vim 50-server.cnf </span></code></pre> </div> <p> In the file 50-server.cnf, comment out the bind-address 127.0.0.1 line, so that the server no longer listens only on localhost but binds to the addresses of the network cards. 
In my case there is just one of these. </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb9-1"></a>[<span class="ex">...</span>]</span> <span><a aria-hidden="true" href="#cb9-2"></a><span class="co"># Instead of skip-networking the default is now to listen only on</span></span> <span><a aria-hidden="true" href="#cb9-3"></a><span class="co"># localhost which is more compatible and is not less secure.</span></span> <span><a aria-hidden="true" href="#cb9-4"></a><span class="co"># bind-address = 127.0.0.1</span></span> <span><a aria-hidden="true" href="#cb9-5"></a>[<span class="ex">...</span>]</span></code></pre> </div> <p> Restart the SQL server. </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb10-1"></a><span class="ex">root</span> @sol:/etc/mysql/mariadb.conf.d# systemctl restart mysql</span></code></pre> </div> <p> Log in to the SQL server as root to manage users. 
</p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb11-1"></a><span class="ex">root</span> @sol:/etc/mysql/mariadb.conf.d# mysql -p</span></code></pre> </div> <pre class="mysql"><code>MariaDB [(none)]&gt; SELECT User, Host FROM mysql.user;
+-----------+-----------+
| User      | Host      |
+-----------+-----------+
| ninja     | localhost |
| root      | localhost |
| wiki      | localhost |
| wordpress | localhost |
+-----------+-----------+
4 rows in set (0.000 sec)

MariaDB [(none)]&gt; CREATE USER wpremote@'10.19.67.%' IDENTIFIED BY 'password-of-new-user';
Query OK, 0 rows affected (0.001 sec)

MariaDB [(none)]&gt; SELECT User, Host FROM mysql.user;
+-----------+------------+
| User      | Host       |
+-----------+------------+
| wpremote  | 10.19.67.% |
| ninja     | localhost  |
| root      | localhost  |
| wiki      | localhost  |
| wordpress | localhost  |
+-----------+------------+
5 rows in set (0.000 sec)

MariaDB [(none)]&gt; GRANT ALL PRIVILEGES ON wp_idee.* TO 'wpremote'@'10.19.67.%' WITH GRANT OPTION;
Query OK, 0 rows affected (0.017 sec)</code></pre> <p> Remote Access with the new user. </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb13-1"></a><span class="ex">frank</span> @Asimov:~$ mysql -u wpremote -h sol -p wp_idee</span> <span><a aria-hidden="true" href="#cb13-2"></a><span class="ex">Enter</span> password: </span> <span><a aria-hidden="true" href="#cb13-3"></a><span class="ex">Reading</span> table information for completion of table and column names</span> <span><a aria-hidden="true" href="#cb13-4"></a><span class="ex">You</span> can turn off this feature to get a quicker startup with -A</span> <span><a aria-hidden="true" href="#cb13-5"></a></span> <span><a aria-hidden="true" href="#cb13-6"></a><span class="ex">Welcome</span> to the MariaDB monitor. 
Commands end with <span class="kw">;</span> <span class="ex">or</span> \g.</span> <span><a aria-hidden="true" href="#cb13-7"></a><span class="ex">Your</span> MariaDB connection id is 205</span> <span><a aria-hidden="true" href="#cb13-8"></a><span class="ex">Server</span> version: 10.3.31-MariaDB-0+deb10u1 Debian 10</span> <span><a aria-hidden="true" href="#cb13-9"></a></span> <span><a aria-hidden="true" href="#cb13-10"></a><span class="ex">Copyright</span> (c) <span class="ex">2000</span>, 2018, Oracle, MariaDB Corporation Ab and others.</span> <span><a aria-hidden="true" href="#cb13-11"></a></span> <span><a aria-hidden="true" href="#cb13-12"></a><span class="ex">Type</span> <span class="st">'help;'</span> or <span class="st">'\h'</span> for help. Type <span class="st">'\c'</span> to clear the current input statement.</span> <span><a aria-hidden="true" href="#cb13-13"></a></span> <span><a aria-hidden="true" href="#cb13-14"></a><span class="ex">MariaDB</span> [wp_idee]<span class="op">&gt;</span> </span></code></pre> </div> <h4> Table wp_posts </h4> <pre class="mysql"><code>MariaDB [wp_idee]&gt; describe wp_posts;
+-----------------------+---------------------+------+-----+---------------------+
| Field                 | Type                | Null | Key | Default             |...
+-----------------------+---------------------+------+-----+---------------------+
| ID                    | bigint(20) unsigned | NO   | PRI | NULL                |
| post_author           | bigint(20) unsigned | NO   | MUL | 0                   |
| post_date             | datetime            | NO   |     | 0000-00-00 00:00:00 |
| post_date_gmt         | datetime            | NO   |     | 0000-00-00 00:00:00 |
| post_content          | longtext            | NO   |     | NULL                |
| post_title            | text                | NO   |     | NULL                |
| post_excerpt          | text                | NO   |     | NULL                |
| post_status           | varchar(20)         | NO   |     | publish             |
| comment_status        | varchar(20)         | NO   |     | open                |
| ping_status           | varchar(20)         | NO   |     | open                |
| post_password         | varchar(255)        | NO   |     |                     |
| post_name             | varchar(200)        | NO   | MUL |                     |
| to_ping               | text                | NO   |     | NULL                |
| pinged                | text                | NO   |     | NULL                |
| post_modified         | datetime            | NO   |     | 0000-00-00 00:00:00 |
| post_modified_gmt     | datetime            | NO   |     | 0000-00-00 00:00:00 |
| post_content_filtered | longtext            | NO   |     | NULL                |
| post_parent           | bigint(20) unsigned | NO   | MUL | 0                   |
| guid                  | varchar(255)        | NO   |     |                     |
| menu_order            | int(11)             | NO   |     | 0                   |
| post_type             | varchar(20)         | NO   | MUL | post                |
| post_mime_type        | varchar(100)        | NO   |     |                     |
| comment_count         | bigint(20)          | NO   |     | 0                   |
+-----------------------+---------------------+------+-----+---------------------+
23 rows in set (0.003 sec)</code></pre> <h4> Post Types </h4> <pre class="mysql"><code>MariaDB [wp_idee]&gt; select distinct post_type from wp_posts;
+---------------+
| post_type     |
+---------------+
| attachment    |
| nav_menu_item |
| page          |
| post          |
| revision      |
+---------------+
5 rows in set (0.003 sec)</code></pre> <p> The post types of interest are most probably only page and post. I'm not sure about revision, but we can take a look at what is tagged as post type "revision". </p> <p> The query "select post_title from wp_posts where post_type='revision';" returns the same title repeatedly; most probably the post_modified values differ between those rows. 
</p> <p> The query "select post_title from wp_posts where post_type='page';" shows only 2 pages, which are already merged into one for the new solution. </p> <p> The query "select post_title from wp_posts where post_type='post';" shows every title only once, I hope with the initial post information only. I'll not only hope, but check this before I continue. </p> <p> The query <strong> "select post_name from wp_posts where post_type='post';" </strong> shows the pages' unique identifiers, the last element of each page's URL. This information is very helpful to ensure that the migrated pages are named exactly the same in the new solution and are found via one redirect instruction in nginx. The redirect is important, since I will not have the date encoded in the URL, as I had it in WordPress. </p> <h4> Post Mime Types </h4> <p> Post MIME types help to select the PDF, audio, zip and spreadsheet files which had been embedded into the posts in WordPress. The HTML posts themselves have the MIME type "" in this DB, which saves space, but is confusing. </p> <h4> Missing Information </h4> <p> One piece of information is missing: the language or locale of the posted content. In the new solution I'll use de-DE and en-US as locales and as language information. I did not really create a lot of English pages, but this might change, and the few I already have shall be presented correctly. </p> <h4> WordPress Meta Data Export </h4> <p> Create an export query for the HTML pages posted in WordPress. Include </p> <ul class="incremental"> <li> a site column with default value "Idee", </li> <li> a locale column with default value "de-DE", </li> <li> an author column with default value "Frank Siebert" </li> </ul> <p> For the few existing English posts, these values are to be changed manually to </p> <ul class="incremental"> <li> site = "Concept" (Concept of new cognition elicitation personally thinking) replacing Idee (Idee der eigenen Erkenntnis). That's the best recursive acronym translation I found. 
</li> <li> locale = "en-US" </li> </ul> <p> Get the earliest post date into one column and the latest modification date into another column of the same row. Have one row per post. </p> <p> The export query is created in a bash script, whose output can be piped into a local tab-delimited file. The last line of this file needs to be deleted, since it contains an automatically saved draft, and .. (see next chapter). </p> <p> wpmeta bash script: </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb16-1"></a><span class="co">#!/bin/bash</span></span> <span><a aria-hidden="true" href="#cb16-2"></a><span class="va">sql=</span><span class="st">"select \</span></span> <span><a aria-hidden="true" href="#cb16-3"></a><span class="st"> min(post_date) over (partition by post_name) as post_date, \</span></span> <span><a aria-hidden="true" href="#cb16-4"></a><span class="st"> max(post_modified) over (partition by post_name), \</span></span> <span><a aria-hidden="true" href="#cb16-5"></a><span class="st"> max(comment_count) over (partition by post_name), \</span></span> <span><a aria-hidden="true" href="#cb16-6"></a><span class="st"> 'Idee' as site, \</span></span> <span><a aria-hidden="true" href="#cb16-7"></a><span class="st"> 'de-DE' as locale, \</span></span> <span><a aria-hidden="true" href="#cb16-8"></a><span class="st"> 'Frank Siebert' as author, \</span></span> <span><a aria-hidden="true" href="#cb16-9"></a><span class="st"> post_name, \</span></span> <span><a aria-hidden="true" href="#cb16-10"></a><span class="st"> post_title \</span></span> <span><a aria-hidden="true" href="#cb16-11"></a><span class="st"> from wp_posts \</span></span> <span><a aria-hidden="true" href="#cb16-12"></a><span class="st"> where (post_mime_type='' and post_type='post') \</span></span> <span><a aria-hidden="true" href="#cb16-13"></a><span class="st"> order by post_date asc;"</span></span> <span><a aria-hidden="true" 
href="#cb16-14"></a></span> <span><a aria-hidden="true" href="#cb16-15"></a><span class="ex">mysql</span> wp_idee -u wpremote -h sol -p<span class="st">'not-exposed-pwd'</span> <span class="kw">\</span></span> <span><a aria-hidden="true" href="#cb16-16"></a><span class="ex">--default-character-set</span>=utf8 -N -e <span class="st">"</span><span class="va">$sql</span><span class="st">"</span> <span class="op">&gt;</span> ../config/migrationlist.csv</span></code></pre> </div> <p> Because the query is run from bash and its output is redirected into a file, the data in the file is tab-delimited. </p> <h4> Special-Case for nginx redirect </h4> <p> Querying the database revealed: The post "Social Distancing und Lockdown" has, for unknown reasons, the URL "/2021/01/26/255/". This will not be the case in the new solution; I will not have a 255.html there. I'll either create a special case in the redirect for this article or ignore this case altogether. </p> <p> In the result of the final export query a corrected post_name needs to be maintained, to migrate this article correctly. </p> <h4> Option under Consideration </h4> <p> For all articles written in the Wiki and published afterwards, export.py from the wikitools will do perfect service, I hope. </p> <p> But things change over time, and quite a number of articles were written in WordPress when I had no MediaWiki in place. Also, some last-minute typo corrections, I know this, were made directly in WordPress after publishing. </p> <p> And the article "Das SARI-Rätsel" contains an interactive JavaScript part, which required some authoring in HTML. This last point might not remain a one-off: as a developer I might more often find a reason to extend a page functionally, or even to write the complete page directly in HTML as the source format. </p> <p> My concept does support such deviations from the standard scenario. But this is not the point here. 
The point is: I need a wp-export tool to generate MediaWiki files from the current WordPress pages for the migration. That's the only way to ensure that the migrated pages contain exactly what they are supposed to contain. With this I can also incorporate comments posted by me after publishing as Update Notes. </p> <p> <em> I need to leave a note about the last paragraph. For the reasons already mentioned above I made a highly manual migration. Overall content quality improved considerably during migration, even if I missed one or two typos. </em> </p> <p> A migration file created via SQL from the wp_posts table is stored as <strong> ./config/migration.csv </strong> (with spaces instead of colons) containing date, time and title columns. </p> <h4> wp-export.py </h4> <p> <em> This was never implemented. </em> I place this tool in the directory <strong> ./tools/ </strong> of the <strong> Authoring </strong> GIT and create an alias <strong> wpe </strong> for the single file export, and probably I'll also have a <strong> wpbe </strong> command as WordPress batch-export. </p> <h3> Manage Meta Data </h3> <p> I now have migration data available in a tab delimited csv, and I need to manage publishing meta data to make sure the correct meta data is shown in every web page, in the sitemaps and in the RSS feed. </p> <p> I started to implement this as a specialized dictionary implementation, but I hesitate to proceed in this direction. The Python modules pandas and agate seem to offer fascinating power for working with csv files, and I have to investigate both in much more detail in the future. </p> <p> My choice fell on the module agate (Documentation: "agate 1.6.3" <sup> ( 14 ) </sup> ) for this implementation, since it is a smaller implementation and because I do not need any extensive statistical horsepower for the meta data stored. I use the module in version 1.6.1 as it is currently provided by Debian Stable. </p> <p> I have to revise my decision. 
Agate makes reading the csv easy and provides a powerful table object, which makes the data very nicely accessible. The documentation states that it always returns copies of the data structure, which I like very much. But the documentation fails to show options to update data in the table object, which I will need to do without getting new instances of the meta data table next to the singleton created for that purpose. </p> <h4> Installing python3-pandas </h4> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb17-1"></a><span class="ex">frank</span> @Asimov:~$ sudo apt-get install python-pandas-doc python3-pandas</span> <span><a aria-hidden="true" href="#cb17-2"></a>[<span class="ex">sudo</span>] password for frank: </span> <span><a aria-hidden="true" href="#cb17-3"></a><span class="ex">Reading</span> package lists... Done</span> <span><a aria-hidden="true" href="#cb17-4"></a><span class="ex">Building</span> dependency tree... Done</span> <span><a aria-hidden="true" href="#cb17-5"></a><span class="ex">Reading</span> state information... 
Done</span> <span><a aria-hidden="true" href="#cb17-6"></a><span class="ex">The</span> following additional packages will be installed:</span> <span><a aria-hidden="true" href="#cb17-7"></a> <span class="ex">libblosc1</span> libclang-cpp9 libffi-dev liblbfgsb0 libllvm9 libncurses-dev libpfm4 </span> <span><a aria-hidden="true" href="#cb17-8"></a> <span class="ex">libtinfo-dev</span> libz3-dev llvm-9 llvm-9-dev llvm-9-runtime llvm-9-tools </span> <span><a aria-hidden="true" href="#cb17-9"></a> <span class="ex">numba-doc</span> python-odf-doc python-odf-tools python-tables-data </span> <span><a aria-hidden="true" href="#cb17-10"></a> <span class="ex">python3-bottleneck</span> python3-et-xmlfile python3-iniconfig python3-jdcal </span> <span><a aria-hidden="true" href="#cb17-11"></a> <span class="ex">python3-llvmlite</span> python3-numba python3-numexpr python3-odf python3-openpyxl </span> <span><a aria-hidden="true" href="#cb17-12"></a> <span class="ex">python3-pandas-lib</span> python3-py python3-pytest python3-scipy python3-tables </span> <span><a aria-hidden="true" href="#cb17-13"></a> <span class="ex">python3-tables-lib</span> python3-xlwt</span> <span><a aria-hidden="true" href="#cb17-14"></a><span class="ex">Suggested</span> packages:</span> <span><a aria-hidden="true" href="#cb17-15"></a> <span class="ex">ncurses-doc</span> llvm-9-doc python-bottleneck-doc llvmlite-doc nvidia-cuda-toolkit </span> <span><a aria-hidden="true" href="#cb17-16"></a> <span class="ex">python3-statsmodels</span> python-scipy-doc python3-netcdf4 python-tables-doc </span> <span><a aria-hidden="true" href="#cb17-17"></a> <span class="ex">vitables</span> python3-xlrd python-xlrt-doc</span> <span><a aria-hidden="true" href="#cb17-18"></a><span class="ex">The</span> following NEW packages will be installed:</span> <span><a aria-hidden="true" href="#cb17-19"></a> <span class="ex">libblosc1</span> libclang-cpp9 libffi-dev liblbfgsb0 libllvm9 libncurses-dev libpfm4 </span> <span><a 
aria-hidden="true" href="#cb17-20"></a> <span class="ex">libtinfo-dev</span> libz3-dev llvm-9 llvm-9-dev llvm-9-runtime llvm-9-tools numba-doc </span> <span><a aria-hidden="true" href="#cb17-21"></a> <span class="ex">python-odf-doc</span> python-odf-tools python-pandas-doc python-tables-data </span> <span><a aria-hidden="true" href="#cb17-22"></a> <span class="ex">python3-bottleneck</span> python3-et-xmlfile python3-iniconfig python3-jdcal </span> <span><a aria-hidden="true" href="#cb17-23"></a> <span class="ex">python3-llvmlite</span> python3-numba python3-numexpr python3-odf python3-openpyxl </span> <span><a aria-hidden="true" href="#cb17-24"></a> <span class="ex">python3-pandas</span> python3-pandas-lib python3-py python3-pytest python3-scipy </span> <span><a aria-hidden="true" href="#cb17-25"></a> <span class="ex">python3-tables</span> python3-tables-lib python3-xlwt</span> <span><a aria-hidden="true" href="#cb17-26"></a><span class="ex">0</span> upgraded, 35 newly installed, 0 to remove and 1 not upgraded.</span> <span><a aria-hidden="true" href="#cb17-27"></a><span class="ex">Need</span> to get 84.2 MB of archives.</span> <span><a aria-hidden="true" href="#cb17-28"></a><span class="ex">After</span> this operation, 496 MB of additional disk space will be used.</span> <span><a aria-hidden="true" href="#cb17-29"></a><span class="ex">Do</span> you want to continue? [Y/n] y</span></code></pre> </div> <p> A lot of suggestions next to a lot of required packages. If I would not intend to go deeper into data analysis with python, I would stay with my initial dict-based implementation instead. 
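</p> <p> The kind of meta data handling I need from pandas can be sketched roughly as follows. This is a minimal sketch and not the final implementation: the column names are an assumption derived from the select list of the wpmeta script above (the export itself writes no header row because of -N), the function names are hypothetical, and the sample row with its post_name and dates is made up for illustration. </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python">import io

import pandas as pd

# Assumed column names, following the select list of the wpmeta query.
# The exported file has no header row (mysql option -N).
COLUMNS = ["post_date", "post_modified", "comment_count",
           "site", "locale", "author", "post_name", "post_title"]


def load_meta(source):
    """Read the tab delimited export into a DataFrame indexed by post_name."""
    frame = pd.read_csv(source, sep="\t", names=COLUMNS, header=None,
                        parse_dates=["post_date", "post_modified"])
    return frame.set_index("post_name")


def touch(frame, post_name, modified):
    """Update one row in place, without creating a new table object."""
    frame.loc[post_name, "post_modified"] = pd.Timestamp(modified)


# Made-up sample row standing in for one line of the exported file.
sample = ("2021-01-26 10:00:00\t2021-01-27 08:30:00\t0\tIdee\tde-DE\t"
          "Frank Siebert\tsocial-distancing\tSocial Distancing und Lockdown\n")

meta = load_meta(io.StringIO(sample))
touch(meta, "social-distancing", "2022-03-15 12:00:00")
print(meta.loc["social-distancing", "post_modified"])
</code></pre> </div> <p> Assigning via loc changes the existing DataFrame instead of returning a copy, which is the in-place update the agate documentation did not show. 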
</p> <h4> python3-pandas web references </h4> <ul class="incremental"> <li> "pandas documentation" <sup> ( 15 ) </sup> </li> <li> "Add new rows and columns to Pandas dataframe" <sup> ( 16 ) </sup> With the most helpful explanation of how to insert rows via loc (or update them via iloc): </li> </ul> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb18-1"></a>df.loc[<span class="bu">len</span>(df.index)]<span class="op">=</span><span class="bu">list</span>(data[<span class="dv">0</span>].values())</span></code></pre> </div> <ul class="incremental"> <li> "Pandas Tutorial" <sup> ( 17 ) </sup> </li> </ul> <h4> Bioconductor </h4> <p> The search "derive from dataframe" had one result on startpage.com. This search result does not help me find out what to take special care of when I derive my own class from the DataFrame class, but it relates strongly to a lot of articles I posted on my site. </p> <ul class="incremental"> <li> "Getting Started with Bioconductor 3.7" <sup> ( 18 ) </sup> </li> </ul> <p> A quick check reveals that it is available in Debian, right at my fingertips. 
</p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb19-1"></a><span class="ex">frank</span> @Asimov:~$ sudo apt-cache search bioconductor</span> <span><a aria-hidden="true" href="#cb19-2"></a><span class="ex">bio-tradis</span> - analyse the output from TraDIS analyses of genomic sequences</span> <span><a aria-hidden="true" href="#cb19-3"></a><span class="ex">libtfbs-perl</span> - scanning DNA sequence with a position weight matrix</span> <span><a aria-hidden="true" href="#cb19-4"></a><span class="ex">q2-dada2</span> - QIIME 2 plugin to work with adapters in sequence data</span> <span><a aria-hidden="true" href="#cb19-5"></a><span class="ex">r-bioc-affy</span> - BioConductor methods for Affymetrix Oligonucleotide Arrays</span> <span><a aria-hidden="true" href="#cb19-6"></a><span class="ex">r-bioc-affyio</span> - BioConductor tools for parsing Affymetrix data files</span> <span><a aria-hidden="true" href="#cb19-7"></a><span class="ex">...</span></span> <span><a aria-hidden="true" href="#cb19-8"></a><span class="ex">r-bioc-variantannotation</span> - BioConductor annotation of genetic variants</span> <span><a aria-hidden="true" href="#cb19-9"></a><span class="ex">r-bioc-xvector</span> - BioConductor representation and manpulation of external</span> <span><a aria-hidden="true" href="#cb19-10"></a> <span class="ex">sequences</span></span> <span><a aria-hidden="true" href="#cb19-11"></a><span class="ex">r-bioc-zlibbioc</span> - (Virtual) <span class="ex">zlibbioc</span> Bioconductor package</span> <span><a aria-hidden="true" href="#cb19-12"></a><span class="ex">r-cran-ape</span> - GNU R package for Analyses of Phylogenetics and Evolution</span> <span><a aria-hidden="true" href="#cb19-13"></a><span class="ex">r-cran-biocmanager</span> - access the Bioconductor project package repository</span></code></pre> </div> <p> I suppose I'll never have enough leisure time to dig into everything I'm interested 
in. And it is written in R, a programming language I would like to learn as well. </p> <p> I have to remind myself from time to time that I'm doing this to learn and not to prove I can do this implementation in a very short time. More reading, less coding, take your time and it will take less time. </p> <h3> MediaWiki to Plain HTML Conversion </h3> <p> <em> You are right if you assume that I started writing Python code much earlier. But we are now finally at the point where the first parts of the final Python implementation can be used in the final setup. </em> </p> <p> <em> I spare you every evolutionary step of the Python code development. All following code is in its most recent state. </em> </p> <h4> Commit-msg Message Hook </h4> <p> Originally I wrote a commit-msg hook directly as an executable Python program. But the message hook shown next is a bash script. </p> <p> <strong> ~/projects/idee/config/hooks/commit-msg </strong> </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb20-1"></a><span class="co">#!/bin/bash</span></span> <span><a aria-hidden="true" href="#cb20-2"></a><span class="ex">/usr/bin/python3</span> generator/commitmsg.py <span class="va">$1</span></span></code></pre> </div> <p> The parameter $1 is the path to the file that holds the commit message, for which the solution uses a modified template. </p> <h4> Commit Message Template </h4> <p> The commit message template makes it possible to define some meta data to be taken into the content and to select between different content creation options. 
</p> <p> <strong> ~/projects/idee/config/commit-message </strong> </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb21-1"></a> </span> <span><a aria-hidden="true" href="#cb21-2"></a><span class="co"># Overwrite values if necessary, based on https://ogp.me/</span></span> <span><a aria-hidden="true" href="#cb21-3"></a><span class="co"># pdf:draft=false</span></span> <span><a aria-hidden="true" href="#cb21-4"></a><span class="co"># og:locale=de-DE</span></span> <span><a aria-hidden="true" href="#cb21-5"></a><span class="co"># og:site_name=Idee</span></span> <span><a aria-hidden="true" href="#cb21-6"></a><span class="co"># article:author=Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb21-7"></a><span class="co">#</span></span></code></pre> </div> <p> Note: The empty first line is significant. </p> <p> <em> Probably I will eliminate either the og:site_name or the og:locale line in the future. I'm not yet sure. </em> </p> <h4> Main Program: commit-msg.py </h4> <p> As you might have seen above, this is the Python program called by the message hook bash script. The commit message file is passed on as a parameter. </p> <p> commit-msg.py registers workers at a dispatcher class. Each worker takes care of specific commit message entries, which are identified by a regex match. 
</p> <p> <strong> ~/projects/idee/generator/commit-msg.py </strong> </p> <div class="sourceCode"> <pre class="sourceCode Python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb22-1"></a><span class="co">"""Website Generator - "pandoc, fs-commit-msg-hook 1.0".</span></span> <span><a aria-hidden="true" href="#cb22-2"></a></span> <span><a aria-hidden="true" href="#cb22-3"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb22-4"></a><span class="co">@license: https://creativecommons.org/publicdomain/zero/1.0/deed.en</span></span> <span><a aria-hidden="true" href="#cb22-5"></a><span class="co">@date: 2022-03-15</span></span> <span><a aria-hidden="true" href="#cb22-6"></a></span> <span><a aria-hidden="true" href="#cb22-7"></a><span class="co">https://wiki.frank-siebert.de/inst/Replacing_Wordpress</span></span> <span><a aria-hidden="true" href="#cb22-8"></a><span class="co">https://idee.frank-siebert.de/article/replacing-wordpress.html</span></span> <span><a aria-hidden="true" href="#cb22-9"></a></span> <span><a aria-hidden="true" href="#cb22-10"></a><span class="co">Website Generator uses Beautiful Soup, Pandoc and GIT to manage</span></span> <span><a aria-hidden="true" href="#cb22-11"></a><span class="co">authoring in *.mediawiki files and to convert those into:</span></span> <span><a aria-hidden="true" href="#cb22-12"></a><span class="co">* plain html pages, one per mediawiki file as article</span></span> <span><a aria-hidden="true" href="#cb22-13"></a><span class="co">* PDF files, one per mediawiki file for article download</span></span> <span><a aria-hidden="true" href="#cb22-14"></a><span class="co">* Web site portal pages by injecting the portal into the plain html pages</span></span> <span><a aria-hidden="true" href="#cb22-15"></a></span> <span><a aria-hidden="true" href="#cb22-16"></a><span class="co">Website Generator generates as additional portal assets:</span></span> <span><a 
aria-hidden="true" href="#cb22-17"></a><span class="co">* sitemap.xml</span></span> <span><a aria-hidden="true" href="#cb22-18"></a><span class="co">* feed.xml</span></span> <span><a aria-hidden="true" href="#cb22-19"></a><span class="co">* ...</span></span> <span><a aria-hidden="true" href="#cb22-20"></a></span> <span><a aria-hidden="true" href="#cb22-21"></a><span class="co">Website Generator works with Python 3 and up. It works better if lxml</span></span> <span><a aria-hidden="true" href="#cb22-22"></a><span class="co">and/or html5lib is installed, as Beautiful Soup states it runs better then.</span></span> <span><a aria-hidden="true" href="#cb22-23"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb22-24"></a><span class="co"># System Imports</span></span> <span><a aria-hidden="true" href="#cb22-25"></a><span class="im">import</span> sys</span> <span><a aria-hidden="true" href="#cb22-26"></a><span class="im">import</span> getopt</span> <span><a aria-hidden="true" href="#cb22-27"></a></span> <span><a aria-hidden="true" href="#cb22-28"></a><span class="im">from</span> termcolor <span class="im">import</span> colored</span> <span><a aria-hidden="true" href="#cb22-29"></a></span> <span><a aria-hidden="true" href="#cb22-30"></a><span class="im">from</span> gitmsgdispatcher <span class="im">import</span> GitMsgDispatcher</span> <span><a aria-hidden="true" href="#cb22-31"></a><span class="im">from</span> mwworker <span class="im">import</span> MwWorker</span> <span><a aria-hidden="true" href="#cb22-32"></a><span class="im">from</span> pdfworker <span class="im">import</span> PdfWorker</span> <span><a aria-hidden="true" href="#cb22-33"></a><span class="im">from</span> plainworker <span class="im">import</span> PlainWorker</span> <span><a aria-hidden="true" href="#cb22-34"></a></span> <span><a aria-hidden="true" href="#cb22-35"></a><span class="co"># Ask for a key press and return the pressed key, if it is part of the keylist.</span></span> 
<span><a aria-hidden="true" href="#cb22-36"></a><span class="co"># In case of errors return the value of onerror, to enable the caller to</span></span> <span><a aria-hidden="true" href="#cb22-37"></a><span class="co"># decide on the most convenient way to proceed.</span></span> <span><a aria-hidden="true" href="#cb22-38"></a></span> <span><a aria-hidden="true" href="#cb22-39"></a></span> <span><a aria-hidden="true" href="#cb22-40"></a><span class="cf">if</span> <span class="va">__name__</span> <span class="op">==</span> <span class="st">"__main__"</span>:</span> <span><a aria-hidden="true" href="#cb22-41"></a></span> <span><a aria-hidden="true" href="#cb22-42"></a> HELPTEXT <span class="op">=</span> <span class="st">'Usage: commit-msg </span><span class="ch">\'</span><span class="st">message-file</span><span class="ch">\'\n</span><span class="st">'</span>\</span> <span><a aria-hidden="true" href="#cb22-43"></a> <span class="st">'</span><span class="ch">\n</span><span class="st">'</span>\</span> <span><a aria-hidden="true" href="#cb22-44"></a> <span class="st">'message-file The commit message file with the list of files</span><span class="ch">\n</span><span class="st">'</span>\</span> <span><a aria-hidden="true" href="#cb22-45"></a> <span class="st">' to be processed.</span><span class="ch">\n</span><span class="st">'</span>\</span> <span><a aria-hidden="true" href="#cb22-46"></a></span> <span><a aria-hidden="true" href="#cb22-47"></a> <span class="cf">try</span>:</span> <span><a aria-hidden="true" href="#cb22-48"></a> opts, args <span class="op">=</span> getopt.getopt(sys.argv[<span class="dv">1</span>:], <span class="st">"h"</span>, [<span class="st">"help"</span>])</span> <span><a aria-hidden="true" href="#cb22-49"></a></span> <span><a aria-hidden="true" href="#cb22-50"></a> <span class="cf">except</span> getopt.GetoptError:</span> <span><a aria-hidden="true" href="#cb22-51"></a> <span class="bu">print</span>(HELPTEXT)</span> <span><a aria-hidden="true" 
href="#cb22-52"></a> sys.exit(<span class="dv">2</span>)</span> <span><a aria-hidden="true" href="#cb22-53"></a></span> <span><a aria-hidden="true" href="#cb22-54"></a> <span class="cf">for</span> opt, arg <span class="kw">in</span> opts:</span> <span><a aria-hidden="true" href="#cb22-55"></a> <span class="cf">if</span> opt <span class="kw">in</span> {<span class="st">"-h"</span>, <span class="st">"--help"</span>}:</span> <span><a aria-hidden="true" href="#cb22-56"></a> <span class="bu">print</span>(HELPTEXT)</span> <span><a aria-hidden="true" href="#cb22-57"></a> sys.exit()</span> <span><a aria-hidden="true" href="#cb22-58"></a></span> <span><a aria-hidden="true" href="#cb22-59"></a> arg_names <span class="op">=</span> [<span class="st">'message-file'</span>]</span> <span><a aria-hidden="true" href="#cb22-60"></a> args <span class="op">=</span> <span class="bu">dict</span>(<span class="bu">zip</span>(arg_names, args))</span> <span><a aria-hidden="true" href="#cb22-61"></a></span> <span><a aria-hidden="true" href="#cb22-62"></a> <span class="co"># print(args)</span></span> <span><a aria-hidden="true" href="#cb22-63"></a></span> <span><a aria-hidden="true" href="#cb22-64"></a> <span class="co"># Kept as inspiration for future</span></span> <span><a aria-hidden="true" href="#cb22-65"></a> <span class="co"># ------------------------------</span></span> <span><a aria-hidden="true" href="#cb22-66"></a> <span class="co"># Arg_list = collections.namedtuple('Arg_list', arg_names)</span></span> <span><a aria-hidden="true" href="#cb22-67"></a> <span class="co"># args = Arg_list(*(args.get(arg, None) for arg in arg_names))</span></span> <span><a aria-hidden="true" href="#cb22-68"></a></span> <span><a aria-hidden="true" href="#cb22-69"></a> messagefile <span class="op">=</span> args.get(<span class="st">'message-file'</span>)</span> <span><a aria-hidden="true" href="#cb22-70"></a> <span class="cf">if</span> <span class="kw">not</span> messagefile:</span> <span><a 
aria-hidden="true" href="#cb22-71"></a> <span class="bu">print</span>(colored(<span class="st">'</span><span class="ch">\n</span><span class="st">ERROR:'</span>, <span class="st">'red'</span>), <span class="st">'message-file parameter is missing.'</span>)</span> <span><a aria-hidden="true" href="#cb22-72"></a> <span class="bu">print</span>(HELPTEXT)</span> <span><a aria-hidden="true" href="#cb22-73"></a> sys.exit(<span class="dv">2</span>)</span> <span><a aria-hidden="true" href="#cb22-74"></a></span> <span><a aria-hidden="true" href="#cb22-75"></a> mwworker <span class="op">=</span> MwWorker(<span class="vs">r".*(new file|modified).*author[/].*\.mediawiki"</span>)</span> <span><a aria-hidden="true" href="#cb22-76"></a> plainworker <span class="op">=</span> PlainWorker(<span class="vs">r".*(new file|modified).*plain[/].*\.html"</span>)</span> <span><a aria-hidden="true" href="#cb22-77"></a> pdfworker <span class="op">=</span> PdfWorker(<span class="vs">r""</span> <span class="op">+</span> PdfWorker.pdfworkitem)</span> <span><a aria-hidden="true" href="#cb22-78"></a></span> <span><a aria-hidden="true" href="#cb22-79"></a> disp <span class="op">=</span> GitMsgDispatcher(messagefile, [mwworker, plainworker, pdfworker])</span> <span><a aria-hidden="true" href="#cb22-80"></a></span> <span><a aria-hidden="true" href="#cb22-81"></a> sys.exit(<span class="dv">0</span>)</span></code></pre> </div> <h4> Message Dispatcher: gitmsgdispatcher.py </h4> <p> The message dispatcher dispatches work-items from the git message to registered workers. Those workers then can place new work-items for later workers, to pick up work where they stopped working. 
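</p> <p> The hand-over between workers described here can be illustrated with a strongly simplified sketch. It is not the implementation listed below: the class EchoWorker, its method work and the sample file name are hypothetical and only serve to show the principle that a worker claims items by regex and may return a follow-up item for later workers. </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python">import re


class EchoWorker:
    """Hypothetical worker which claims the lines matching its pattern."""

    def __init__(self, pattern, suffix=None):
        self.pattern = re.compile(pattern)
        self.suffix = suffix
        self.done = []

    def work(self, item):
        """Process one work-item and optionally return a follow-up item."""
        self.done.append(item)
        if self.suffix:
            # Replace the file extension to create the item for later workers.
            return re.sub(r"\.\w+$", self.suffix, item)
        return None


def dispatch(worklist, workers):
    """Offer every work-item to each worker, in queue order."""
    for worker in workers:
        # Iterate over a snapshot, so follow-up items only reach later workers.
        for item in list(worklist):
            if worker.pattern.match(item):
                result = worker.work(item)
                if result:
                    worklist.append(result)
    return worklist


first = EchoWorker(r".*\.mediawiki$", suffix=".html")
second = EchoWorker(r".*\.html$")
items = dispatch(["author/example.mediawiki"], [first, second])
print(items)
print(second.done)
</code></pre> </div> <p> In this sketch the first worker turns the .mediawiki item into an .html item, and only the second worker picks that new item up. 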
</p> <p> <strong> ~/projects/idee/generator/gitmsgdispatcher.py </strong> </p> <div class="sourceCode"> <pre class="sourceCode Python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb23-1"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb23-2"></a><span class="co">GitMsgDispatcher with MsgWorker base class.</span></span> <span><a aria-hidden="true" href="#cb23-3"></a></span> <span><a aria-hidden="true" href="#cb23-4"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb23-5"></a><span class="co">@license: https://creativecommons.org/publicdomain/zero/1.0/deed.en</span></span> <span><a aria-hidden="true" href="#cb23-6"></a><span class="co">@date: 2022-03-15</span></span> <span><a aria-hidden="true" href="#cb23-7"></a></span> <span><a aria-hidden="true" href="#cb23-8"></a><span class="co">Instantiate the GitMsgDispatcher with the message</span></span> <span><a aria-hidden="true" href="#cb23-9"></a><span class="co">and with specialized workers. 
The workers provide a</span></span> <span><a aria-hidden="true" href="#cb23-10"></a><span class="co">pattern matching the lines they claim for work.</span></span> <span><a aria-hidden="true" href="#cb23-11"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb23-12"></a><span class="im">import</span> re</span> <span><a aria-hidden="true" href="#cb23-13"></a><span class="im">from</span> pathlib <span class="im">import</span> Path</span> <span><a aria-hidden="true" href="#cb23-14"></a><span class="im">from</span> pubmetadata <span class="im">import</span> PubMetaData</span> <span><a aria-hidden="true" href="#cb23-15"></a><span class="im">from</span> sitemap <span class="im">import</span> SiteMap</span> <span><a aria-hidden="true" href="#cb23-16"></a><span class="im">from</span> archive <span class="im">import</span> Archive</span> <span><a aria-hidden="true" href="#cb23-17"></a><span class="im">from</span> rssbuilder <span class="im">import</span> RSSBuilder</span> <span><a aria-hidden="true" href="#cb23-18"></a><span class="im">from</span> idxbuilder <span class="im">import</span> IDXBuilder</span> <span><a aria-hidden="true" href="#cb23-19"></a></span> <span><a aria-hidden="true" href="#cb23-20"></a></span> <span><a aria-hidden="true" href="#cb23-21"></a><span class="kw">class</span> GitMsgDispatcher:</span> <span><a aria-hidden="true" href="#cb23-22"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb23-23"></a><span class="co"> Dispatch the lines of the git message to registered workers.</span></span> <span><a aria-hidden="true" href="#cb23-24"></a></span> <span><a aria-hidden="true" href="#cb23-25"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb23-26"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb23-27"></a><span class="co"> gitmessagepath : Path</span></span> <span><a aria-hidden="true" href="#cb23-28"></a><span class="co"> Path as type 
str or type Path pointing to the git message.</span></span> <span><a aria-hidden="true" href="#cb23-29"></a></span> <span><a aria-hidden="true" href="#cb23-30"></a><span class="co"> msgworkers : List of MsgWorker</span></span> <span><a aria-hidden="true" href="#cb23-31"></a><span class="co"> The list of message workers is used as worker queue. Workers first</span></span> <span><a aria-hidden="true" href="#cb23-32"></a><span class="co"> in the queue get their workitems first.</span></span> <span><a aria-hidden="true" href="#cb23-33"></a></span> <span><a aria-hidden="true" href="#cb23-34"></a><span class="co"> Workers can return their work result to be picked up by</span></span> <span><a aria-hidden="true" href="#cb23-35"></a><span class="co"> later workers.</span></span> <span><a aria-hidden="true" href="#cb23-36"></a></span> <span><a aria-hidden="true" href="#cb23-37"></a><span class="co"> The ParameterValueWorker always runs first. Do not add</span></span> <span><a aria-hidden="true" href="#cb23-38"></a><span class="co"> the ParameterValueWorker to the provided list of workers,</span></span> <span><a aria-hidden="true" href="#cb23-39"></a><span class="co"> or it will run twice.</span></span> <span><a aria-hidden="true" href="#cb23-40"></a></span> <span><a aria-hidden="true" href="#cb23-41"></a><span class="co"> The ParameterValueWorker takes care to provide the parameter</span></span> <span><a aria-hidden="true" href="#cb23-42"></a><span class="co"> values provided by the message in place for all workers.</span></span> <span><a aria-hidden="true" href="#cb23-43"></a></span> <span><a aria-hidden="true" href="#cb23-44"></a><span class="co"> When all workers finished, the sitemap is updated, the RSS</span></span> <span><a aria-hidden="true" href="#cb23-45"></a><span class="co"> feed is updated and the index page is updated.</span></span> <span><a aria-hidden="true" href="#cb23-46"></a></span> <span><a aria-hidden="true" href="#cb23-47"></a><span class="co"> 
Returns</span></span> <span><a aria-hidden="true" href="#cb23-48"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb23-49"></a><span class="co"> GitMsgDispatcher.</span></span> <span><a aria-hidden="true" href="#cb23-50"></a></span> <span><a aria-hidden="true" href="#cb23-51"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb23-52"></a></span> <span><a aria-hidden="true" href="#cb23-53"></a> <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, gitmessagepath, msgworkers):</span> <span><a aria-hidden="true" href="#cb23-54"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb23-55"></a><span class="co"> Dispatch the lines of the git message to registered workers.</span></span> <span><a aria-hidden="true" href="#cb23-56"></a></span> <span><a aria-hidden="true" href="#cb23-57"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb23-58"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb23-59"></a><span class="co"> gitmessagepath : Path</span></span> <span><a aria-hidden="true" href="#cb23-60"></a><span class="co"> DESCRIPTION.</span></span> <span><a aria-hidden="true" href="#cb23-61"></a><span class="co"> *msgworkers : List of MsgWorker</span></span> <span><a aria-hidden="true" href="#cb23-62"></a><span class="co"> DESCRIPTION.</span></span> <span><a aria-hidden="true" href="#cb23-63"></a></span> <span><a aria-hidden="true" href="#cb23-64"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb23-65"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb23-66"></a><span class="co"> GitMsgDispatcher.</span></span> <span><a aria-hidden="true" href="#cb23-67"></a></span> <span><a aria-hidden="true" href="#cb23-68"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb23-69"></a> <span 
class="va">self</span>.gitmessagepath <span class="op">=</span> gitmessagepath</span> <span><a aria-hidden="true" href="#cb23-70"></a></span> <span><a aria-hidden="true" href="#cb23-71"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb23-72"></a><span class="co"> Extract the relevant part of the message</span></span> <span><a aria-hidden="true" href="#cb23-73"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb23-74"></a> <span class="va">self</span>.worklist <span class="op">=</span> []</span> <span><a aria-hidden="true" href="#cb23-75"></a> <span class="va">self</span>.parameters <span class="op">=</span> ParameterValueWorker()</span> <span><a aria-hidden="true" href="#cb23-76"></a></span> <span><a aria-hidden="true" href="#cb23-77"></a> <span class="cf">with</span> <span class="bu">open</span>(<span class="va">self</span>.gitmessagepath, <span class="st">'r'</span>) <span class="im">as</span> infile:</span> <span><a aria-hidden="true" href="#cb23-78"></a> <span class="co"># </span><span class="al">TODO</span><span class="co">: Do Better. 
Latest when the git server joins the game</span></span> <span><a aria-hidden="true" href="#cb23-79"></a> <span class="co"># Message section of most interest: "Changes to be committed"</span></span> <span><a aria-hidden="true" href="#cb23-80"></a> <span class="co"># But we are also interested in the parameter values</span></span> <span><a aria-hidden="true" href="#cb23-81"></a> <span class="co"># we placed earlier into the file.</span></span> <span><a aria-hidden="true" href="#cb23-82"></a> <span class="co"># The Start helps us to find the End.</span></span> <span><a aria-hidden="true" href="#cb23-83"></a> start <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r"^# Changes to be committed:"</span>)</span> <span><a aria-hidden="true" href="#cb23-84"></a> <span class="co"># Next uppercase entry starts another message section</span></span> <span><a aria-hidden="true" href="#cb23-85"></a> end <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r"^# [A-Z]"</span>)</span> <span><a aria-hidden="true" href="#cb23-86"></a></span> <span><a aria-hidden="true" href="#cb23-87"></a> started <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb23-88"></a> <span class="cf">for</span> line <span class="kw">in</span> infile:</span> <span><a aria-hidden="true" href="#cb23-89"></a> <span class="cf">if</span> <span class="va">self</span>.parameters.pattern.match(line):</span> <span><a aria-hidden="true" href="#cb23-90"></a> <span class="va">self</span>.worklist.append(line)</span> <span><a aria-hidden="true" href="#cb23-91"></a> <span class="cf">if</span> <span class="kw">not</span> started <span class="kw">and</span> start.match(line):</span> <span><a aria-hidden="true" href="#cb23-92"></a> started <span class="op">=</span> <span class="va">True</span></span> <span><a aria-hidden="true" href="#cb23-93"></a> <span class="cf">elif</span> started <span class="kw">and</span> 
end.match(line):</span> <span><a aria-hidden="true" href="#cb23-94"></a> started <span class="op">=</span> <span class="va">False</span></span> <span><a aria-hidden="true" href="#cb23-95"></a> <span class="cf">if</span> started <span class="kw">is</span> <span class="va">True</span>:</span> <span><a aria-hidden="true" href="#cb23-96"></a> <span class="va">self</span>.worklist.append(line)</span> <span><a aria-hidden="true" href="#cb23-97"></a> <span class="cf">if</span> started <span class="kw">is</span> <span class="va">False</span>:</span> <span><a aria-hidden="true" href="#cb23-98"></a> <span class="cf">break</span></span> <span><a aria-hidden="true" href="#cb23-99"></a> infile.close()</span> <span><a aria-hidden="true" href="#cb23-100"></a></span> <span><a aria-hidden="true" href="#cb23-101"></a> <span class="va">self</span>.msgworkers <span class="op">=</span> msgworkers</span> <span><a aria-hidden="true" href="#cb23-102"></a></span> <span><a aria-hidden="true" href="#cb23-103"></a> <span class="va">self</span>.msgworkers.insert(<span class="dv">0</span>, <span class="va">self</span>.parameters)</span> <span><a aria-hidden="true" href="#cb23-104"></a> <span class="va">self</span>.dispatch()</span> <span><a aria-hidden="true" href="#cb23-105"></a></span> <span><a aria-hidden="true" href="#cb23-106"></a> <span class="co"># Save changed publishing meta data, if any.</span></span> <span><a aria-hidden="true" href="#cb23-107"></a> <span class="cf">if</span> PubMetaData.instance:</span> <span><a aria-hidden="true" href="#cb23-108"></a> PubMetaData.instance.save()</span> <span><a aria-hidden="true" href="#cb23-109"></a> <span class="co"># Generate Sitemaps (bilingual)</span></span> <span><a aria-hidden="true" href="#cb23-110"></a> SiteMap().update()</span> <span><a aria-hidden="true" href="#cb23-111"></a> <span class="co"># Generate Archive</span></span> <span><a aria-hidden="true" href="#cb23-112"></a> Archive().update()</span> <span><a aria-hidden="true" 
href="#cb23-113"></a> <span class="co"># Generate RSS feed (bilingual)</span></span> <span><a aria-hidden="true" href="#cb23-114"></a> RSSBuilder().update()</span> <span><a aria-hidden="true" href="#cb23-115"></a> <span class="co"># Generate Index Pages (English and German Version)</span></span> <span><a aria-hidden="true" href="#cb23-116"></a> IDXBuilder().update()</span> <span><a aria-hidden="true" href="#cb23-117"></a></span> <span><a aria-hidden="true" href="#cb23-118"></a> <span class="kw">def</span> dispatch(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb23-119"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb23-120"></a><span class="co"> Dispatch the git message lines to registered message workers.</span></span> <span><a aria-hidden="true" href="#cb23-121"></a></span> <span><a aria-hidden="true" href="#cb23-122"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb23-123"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb23-124"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb23-125"></a></span> <span><a aria-hidden="true" href="#cb23-126"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb23-127"></a> <span class="cf">for</span> worker <span class="kw">in</span> <span class="va">self</span>.msgworkers:</span> <span><a aria-hidden="true" href="#cb23-128"></a> <span class="bu">print</span>(<span class="st">"Dispatching work to: </span><span class="sc">{}</span><span class="st">"</span>.<span class="bu">format</span>(<span class="bu">type</span>(worker)))</span> <span><a aria-hidden="true" href="#cb23-129"></a> <span class="cf">for</span> item <span class="kw">in</span> <span class="va">self</span>.worklist:</span> <span><a aria-hidden="true" href="#cb23-130"></a> <span class="cf">if</span> <span class="bu">type</span>(item) <span class="op">==</span> <span class="bu">str</span>:</span> 
<span><a aria-hidden="true" href="#cb23-131"></a> <span class="cf">if</span> worker.pattern.match(item):</span> <span><a aria-hidden="true" href="#cb23-132"></a> worker.work(<span class="va">self</span>, item)</span> <span><a aria-hidden="true" href="#cb23-133"></a> <span class="cf">if</span> <span class="bu">type</span>(item) <span class="op">==</span> <span class="bu">dict</span>:</span> <span><a aria-hidden="true" href="#cb23-134"></a> worker_match <span class="op">=</span> item[MsgWorker.task_worker_match]</span> <span><a aria-hidden="true" href="#cb23-135"></a> <span class="cf">if</span> worker.pattern.match(worker_match):</span> <span><a aria-hidden="true" href="#cb23-136"></a> worker.work(<span class="va">self</span>, item)</span> <span><a aria-hidden="true" href="#cb23-137"></a></span> <span><a aria-hidden="true" href="#cb23-138"></a></span> <span><a aria-hidden="true" href="#cb23-139"></a><span class="kw">class</span> MsgWorker:</span> <span><a aria-hidden="true" href="#cb23-140"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb23-141"></a><span class="co"> Create a worker for lines matching the pattern.</span></span> <span><a aria-hidden="true" href="#cb23-142"></a></span> <span><a aria-hidden="true" href="#cb23-143"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb23-144"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb23-145"></a><span class="co"> pattern : Pattern</span></span> <span><a aria-hidden="true" href="#cb23-146"></a><span class="co"> A regex pattern matching the lines the worker takes care for.</span></span> <span><a aria-hidden="true" href="#cb23-147"></a></span> <span><a aria-hidden="true" href="#cb23-148"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb23-149"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb23-150"></a><span class="co"> MsgWorker.</span></span> <span><a 
aria-hidden="true" href="#cb23-151"></a></span> <span><a aria-hidden="true" href="#cb23-152"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb23-153"></a></span> <span><a aria-hidden="true" href="#cb23-154"></a> <span class="co"># Cases to work against</span></span> <span><a aria-hidden="true" href="#cb23-155"></a> CREATEPAT <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r"^#.*(modified:|new file:)"</span>)</span> <span><a aria-hidden="true" href="#cb23-156"></a> RENAMEPAT <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r"^#.*renamed:"</span>)</span> <span><a aria-hidden="true" href="#cb23-157"></a> DELETEPAT <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r"^#.*deleted:"</span>)</span> <span><a aria-hidden="true" href="#cb23-158"></a></span> <span><a aria-hidden="true" href="#cb23-159"></a> <span class="co"># Task types</span></span> <span><a aria-hidden="true" href="#cb23-160"></a> task_worker_match <span class="op">=</span> <span class="st">"workermatch"</span></span> <span><a aria-hidden="true" href="#cb23-161"></a> task_type <span class="op">=</span> <span class="st">"tasktype"</span></span> <span><a aria-hidden="true" href="#cb23-162"></a> task_create <span class="op">=</span> <span class="st">"create"</span></span> <span><a aria-hidden="true" href="#cb23-163"></a> task_rename <span class="op">=</span> <span class="st">"rename"</span></span> <span><a aria-hidden="true" href="#cb23-164"></a> task_delete <span class="op">=</span> <span class="st">"delete"</span></span> <span><a aria-hidden="true" href="#cb23-165"></a></span> <span><a aria-hidden="true" href="#cb23-166"></a> <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, pattern):</span> <span><a aria-hidden="true" href="#cb23-167"></a> <span class="cf">if</span> <span class="bu">isinstance</span>(pattern, <span class="bu">str</span>):</span> <span><a 
aria-hidden="true" href="#cb23-168"></a> <span class="va">self</span>.pattern <span class="op">=</span> re.<span class="bu">compile</span>(pattern)</span> <span><a aria-hidden="true" href="#cb23-169"></a> <span class="cf">elif</span> <span class="bu">isinstance</span>(pattern, re.Pattern):</span> <span><a aria-hidden="true" href="#cb23-170"></a> <span class="va">self</span>.pattern <span class="op">=</span> pattern</span> <span><a aria-hidden="true" href="#cb23-171"></a> <span class="va">self</span>.dispatcher <span class="op">=</span> <span class="va">None</span> <span class="co"># initialized in work method</span></span> <span><a aria-hidden="true" href="#cb23-172"></a> <span class="va">self</span>.item <span class="op">=</span> <span class="va">None</span> <span class="co"># initialized in work method</span></span> <span><a aria-hidden="true" href="#cb23-173"></a> <span class="va">self</span>.inpath <span class="op">=</span> <span class="va">None</span> <span class="co"># initialized in work method, if any</span></span> <span><a aria-hidden="true" href="#cb23-174"></a> <span class="va">self</span>.delpath <span class="op">=</span> <span class="va">None</span> <span class="co"># initialized in work method, if any</span></span> <span><a aria-hidden="true" href="#cb23-175"></a> <span class="va">self</span>.outpath <span class="op">=</span> <span class="va">None</span> <span class="co"># initialized in work method, if any</span></span> <span><a aria-hidden="true" href="#cb23-176"></a></span> <span><a aria-hidden="true" href="#cb23-177"></a> <span class="kw">def</span> get_pattern(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb23-178"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb23-179"></a><span class="co"> Get the pattern for matching list items.</span></span> <span><a aria-hidden="true" href="#cb23-180"></a></span> <span><a aria-hidden="true" href="#cb23-181"></a><span class="co"> Returns</span></span> 
<span><a aria-hidden="true" href="#cb23-182"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb23-183"></a><span class="co"> Pattern</span></span> <span><a aria-hidden="true" href="#cb23-184"></a><span class="co"> A regex pattern matching the lines the worker takes care for.</span></span> <span><a aria-hidden="true" href="#cb23-185"></a></span> <span><a aria-hidden="true" href="#cb23-186"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb23-187"></a> <span class="cf">return</span> <span class="va">self</span>.pattern</span> <span><a aria-hidden="true" href="#cb23-188"></a></span> <span><a aria-hidden="true" href="#cb23-189"></a> <span class="kw">def</span> work(<span class="va">self</span>, dispatcher, item):</span> <span><a aria-hidden="true" href="#cb23-190"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb23-191"></a><span class="co"> Overwrite this method to implement if required.</span></span> <span><a aria-hidden="true" href="#cb23-192"></a></span> <span><a aria-hidden="true" href="#cb23-193"></a><span class="co"> Call the super().work() method in your new method, to get</span></span> <span><a aria-hidden="true" href="#cb23-194"></a><span class="co"> - self.dispatcher initialized</span></span> <span><a aria-hidden="true" href="#cb23-195"></a><span class="co"> - the inpath initialized (if any), to the file to be processed</span></span> <span><a aria-hidden="true" href="#cb23-196"></a><span class="co"> - the delpath initialized (if any)</span></span> <span><a aria-hidden="true" href="#cb23-197"></a><span class="co"> - the delete() or the process() method or both called, whatever applies</span></span> <span><a aria-hidden="true" href="#cb23-198"></a></span> <span><a aria-hidden="true" href="#cb23-199"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb23-200"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" 
href="#cb23-201"></a><span class="co"> dispatcher : GitMsgDispatcher</span></span> <span><a aria-hidden="true" href="#cb23-202"></a><span class="co"> The dispatcher, which assigned the work item</span></span> <span><a aria-hidden="true" href="#cb23-203"></a></span> <span><a aria-hidden="true" href="#cb23-204"></a><span class="co"> item : str or dict</span></span> <span><a aria-hidden="true" href="#cb23-205"></a><span class="co"> One matching line from the git message.</span></span> <span><a aria-hidden="true" href="#cb23-206"></a><span class="co"> Or complex workitem added by earlier workers.</span></span> <span><a aria-hidden="true" href="#cb23-207"></a></span> <span><a aria-hidden="true" href="#cb23-208"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb23-209"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb23-210"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb23-211"></a></span> <span><a aria-hidden="true" href="#cb23-212"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb23-213"></a> <span class="va">self</span>.dispatcher <span class="op">=</span> dispatcher</span> <span><a aria-hidden="true" href="#cb23-214"></a> <span class="va">self</span>.item <span class="op">=</span> item</span> <span><a aria-hidden="true" href="#cb23-215"></a></span> <span><a aria-hidden="true" href="#cb23-216"></a> <span class="co"># For some workers a dictionary is passed as item</span></span> <span><a aria-hidden="true" href="#cb23-217"></a> <span class="cf">if</span> <span class="bu">isinstance</span>(item, <span class="bu">str</span>):</span> <span><a aria-hidden="true" href="#cb23-218"></a> filename <span class="op">=</span> item[<span class="dv">14</span>:].strip()</span> <span><a aria-hidden="true" href="#cb23-219"></a> <span class="va">self</span>.inpath <span class="op">=</span> Path(filename)</span> <span><a aria-hidden="true" 
href="#cb23-220"></a></span> <span><a aria-hidden="true" href="#cb23-221"></a> <span class="cf">if</span> <span class="va">self</span>.RENAMEPAT.match(item):</span> <span><a aria-hidden="true" href="#cb23-222"></a> <span class="co"># part filename in new and old</span></span> <span><a aria-hidden="true" href="#cb23-223"></a> <span class="co"># filename = line[14:].strip()</span></span> <span><a aria-hidden="true" href="#cb23-224"></a> <span class="co"># self.inpath = Path(filename)</span></span> <span><a aria-hidden="true" href="#cb23-225"></a> <span class="va">self</span>.delpath <span class="op">=</span> <span class="va">None</span> <span class="co"># Needs to be assigned now</span></span> <span><a aria-hidden="true" href="#cb23-226"></a> <span class="va">self</span>.rename()</span> <span><a aria-hidden="true" href="#cb23-227"></a></span> <span><a aria-hidden="true" href="#cb23-228"></a> <span class="cf">if</span> <span class="va">self</span>.CREATEPAT.match(item):</span> <span><a aria-hidden="true" href="#cb23-229"></a> <span class="va">self</span>.process()</span> <span><a aria-hidden="true" href="#cb23-230"></a></span> <span><a aria-hidden="true" href="#cb23-231"></a> <span class="cf">if</span> <span class="va">self</span>.DELETEPAT.match(item):</span> <span><a aria-hidden="true" href="#cb23-232"></a> <span class="va">self</span>.delpath <span class="op">=</span> <span class="va">self</span>.inpath <span class="co"># clear deletion request</span></span> <span><a aria-hidden="true" href="#cb23-233"></a> <span class="va">self</span>.delete()</span> <span><a aria-hidden="true" href="#cb23-234"></a></span> <span><a aria-hidden="true" href="#cb23-235"></a> <span class="cf">if</span> <span class="bu">isinstance</span>(item, <span class="bu">dict</span>):</span> <span><a aria-hidden="true" href="#cb23-236"></a> <span class="va">self</span>.inpath <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb23-237"></a> <span 
class="va">self</span>.delpath <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb23-238"></a></span> <span><a aria-hidden="true" href="#cb23-239"></a> <span class="cf">if</span> item[<span class="va">self</span>.task_type] <span class="op">==</span> <span class="va">self</span>.task_rename:</span> <span><a aria-hidden="true" href="#cb23-240"></a> <span class="va">self</span>.rename()</span> <span><a aria-hidden="true" href="#cb23-241"></a> <span class="cf">if</span> item[<span class="va">self</span>.task_type] <span class="op">==</span> <span class="va">self</span>.task_create:</span> <span><a aria-hidden="true" href="#cb23-242"></a> <span class="va">self</span>.process()</span> <span><a aria-hidden="true" href="#cb23-243"></a> <span class="cf">if</span> item[<span class="va">self</span>.task_type] <span class="op">==</span> <span class="va">self</span>.task_delete:</span> <span><a aria-hidden="true" href="#cb23-244"></a> <span class="va">self</span>.delete()</span> <span><a aria-hidden="true" href="#cb23-245"></a></span> <span><a aria-hidden="true" href="#cb23-246"></a> <span class="kw">def</span> delete(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb23-247"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb23-248"></a><span class="co"> Overwrite this method to implement the actual work to be done.</span></span> <span><a aria-hidden="true" href="#cb23-249"></a></span> <span><a aria-hidden="true" href="#cb23-250"></a><span class="co"> The method is called by super.work(), if the message line is</span></span> <span><a aria-hidden="true" href="#cb23-251"></a><span class="co"> about a rename or a deletion.</span></span> <span><a aria-hidden="true" href="#cb23-252"></a></span> <span><a aria-hidden="true" href="#cb23-253"></a><span class="co"> Since renames might go along with additional content change, deletion</span></span> <span><a aria-hidden="true" 
href="#cb23-254"></a><span class="co"> and re-processing take place both in that case.</span></span> <span><a aria-hidden="true" href="#cb23-255"></a></span> <span><a aria-hidden="true" href="#cb23-256"></a><span class="co"> The path to the resource named in the message is available via</span></span> <span><a aria-hidden="true" href="#cb23-257"></a><span class="co"> self.delpath</span></span> <span><a aria-hidden="true" href="#cb23-258"></a></span> <span><a aria-hidden="true" href="#cb23-259"></a><span class="co"> Depending on the type of content, more than just deleting the</span></span> <span><a aria-hidden="true" href="#cb23-260"></a><span class="co"> file might be required.</span></span> <span><a aria-hidden="true" href="#cb23-261"></a></span> <span><a aria-hidden="true" href="#cb23-262"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb23-263"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb23-264"></a><span class="co"> None</span></span> <span><a aria-hidden="true" href="#cb23-265"></a></span> <span><a aria-hidden="true" href="#cb23-266"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb23-267"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb23-268"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb23-269"></a></span> <span><a aria-hidden="true" href="#cb23-270"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb23-271"></a></span> <span><a aria-hidden="true" href="#cb23-272"></a> <span class="kw">def</span> process(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb23-273"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb23-274"></a><span class="co"> Overwrite this method to implement the actual work to be done.</span></span> <span><a aria-hidden="true" href="#cb23-275"></a></span> <span><a aria-hidden="true" 
href="#cb23-276"></a><span class="co"> The method is called by super.work(), if the message line is</span></span> <span><a aria-hidden="true" href="#cb23-277"></a><span class="co"> about a rename, a modified file or a new file.</span></span> <span><a aria-hidden="true" href="#cb23-278"></a></span> <span><a aria-hidden="true" href="#cb23-279"></a><span class="co"> Since renames might go along with additional content change, deletion</span></span> <span><a aria-hidden="true" href="#cb23-280"></a><span class="co"> and re-processing take place both in that case.</span></span> <span><a aria-hidden="true" href="#cb23-281"></a></span> <span><a aria-hidden="true" href="#cb23-282"></a><span class="co"> The path to the resource named in the message is available via</span></span> <span><a aria-hidden="true" href="#cb23-283"></a><span class="co"> self.inpath</span></span> <span><a aria-hidden="true" href="#cb23-284"></a></span> <span><a aria-hidden="true" href="#cb23-285"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb23-286"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb23-287"></a><span class="co"> None</span></span> <span><a aria-hidden="true" href="#cb23-288"></a></span> <span><a aria-hidden="true" href="#cb23-289"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb23-290"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb23-291"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb23-292"></a></span> <span><a aria-hidden="true" href="#cb23-293"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb23-294"></a></span> <span><a aria-hidden="true" href="#cb23-295"></a> <span class="kw">def</span> rename(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb23-296"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb23-297"></a><span class="co"> Rename by delete 
and process.</span></span> <span><a aria-hidden="true" href="#cb23-298"></a></span> <span><a aria-hidden="true" href="#cb23-299"></a><span class="co"> Since we do not know, whether next to the rename additional</span></span> <span><a aria-hidden="true" href="#cb23-300"></a><span class="co"> changes were applied, deletion and recreation is safest.</span></span> <span><a aria-hidden="true" href="#cb23-301"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb23-302"></a> <span class="va">self</span>.delete()</span> <span><a aria-hidden="true" href="#cb23-303"></a> <span class="va">self</span>.process()</span> <span><a aria-hidden="true" href="#cb23-304"></a></span> <span><a aria-hidden="true" href="#cb23-305"></a></span> <span><a aria-hidden="true" href="#cb23-306"></a><span class="kw">class</span> ParameterValueWorker(MsgWorker):</span> <span><a aria-hidden="true" href="#cb23-307"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb23-308"></a><span class="co"> The ParameterValueWorker reads parameter value pairs.</span></span> <span><a aria-hidden="true" href="#cb23-309"></a></span> <span><a aria-hidden="true" href="#cb23-310"></a><span class="co"> Example of a line with parameter value pair:</span></span> <span><a aria-hidden="true" href="#cb23-311"></a><span class="co"> # article:author=Firstname Lastname</span></span> <span><a aria-hidden="true" href="#cb23-312"></a></span> <span><a aria-hidden="true" href="#cb23-313"></a><span class="co"> These parameters in the git message allow the injection</span></span> <span><a aria-hidden="true" href="#cb23-314"></a><span class="co"> of values for metadata, which would be otherwise not available.</span></span> <span><a aria-hidden="true" href="#cb23-315"></a></span> <span><a aria-hidden="true" href="#cb23-316"></a><span class="co"> Other workers can access the values dictionary via:</span></span> <span><a aria-hidden="true" href="#cb23-317"></a><span class="co"> 
dispatcher.parameters.values</span></span> <span><a aria-hidden="true" href="#cb23-318"></a></span> <span><a aria-hidden="true" href="#cb23-319"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb23-320"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb23-321"></a><span class="co"> super: MsgWorker</span></span> <span><a aria-hidden="true" href="#cb23-322"></a><span class="co"> The ParameterValueWorker is derived from the MsgWorker.</span></span> <span><a aria-hidden="true" href="#cb23-323"></a></span> <span><a aria-hidden="true" href="#cb23-324"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb23-325"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb23-326"></a><span class="co"> ParameterValueWorker.</span></span> <span><a aria-hidden="true" href="#cb23-327"></a></span> <span><a aria-hidden="true" href="#cb23-328"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb23-329"></a></span> <span><a aria-hidden="true" href="#cb23-330"></a> <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, pattern<span class="op">=</span><span class="vs">r"^#.*="</span>):</span> <span><a aria-hidden="true" href="#cb23-331"></a> <span class="bu">super</span>().<span class="fu">__init__</span>(pattern)</span> <span><a aria-hidden="true" href="#cb23-332"></a> <span class="va">self</span>.values <span class="op">=</span> {}</span> <span><a aria-hidden="true" href="#cb23-333"></a></span> <span><a aria-hidden="true" href="#cb23-334"></a> <span class="kw">def</span> work(<span class="va">self</span>, dispatcher, item):</span> <span><a aria-hidden="true" href="#cb23-335"></a> <span class="co">"""."""</span></span> <span><a aria-hidden="true" href="#cb23-336"></a> <span class="bu">super</span>().work(dispatcher, item)</span> <span><a aria-hidden="true" href="#cb23-337"></a></span> <span><a 
aria-hidden="true" href="#cb23-338"></a> <span class="cf">if</span> item.count(<span class="st">'='</span>) <span class="op">==</span> <span class="dv">1</span>:</span> <span><a aria-hidden="true" href="#cb23-339"></a> lineparts <span class="op">=</span> item.rpartition(<span class="st">'='</span>)</span> <span><a aria-hidden="true" href="#cb23-340"></a> <span class="va">self</span>.values.update(</span> <span><a aria-hidden="true" href="#cb23-341"></a> {lineparts[<span class="dv">0</span>].strip(<span class="st">'#'</span>).strip():</span> <span><a aria-hidden="true" href="#cb23-342"></a> lineparts[<span class="dv">2</span>].strip()}</span> <span><a aria-hidden="true" href="#cb23-343"></a> )</span> <span><a aria-hidden="true" href="#cb23-344"></a></span> <span><a aria-hidden="true" href="#cb23-345"></a></span> <span><a aria-hidden="true" href="#cb23-346"></a><span class="cf">if</span> <span class="va">__name__</span> <span class="op">==</span> <span class="st">"__main__"</span>:</span> <span><a aria-hidden="true" href="#cb23-347"></a> <span class="cf">pass</span></span></code></pre> </div> <h4> MediaWiki Worker: mwworker.py </h4> <p> This worker converts the MediaWiki file to the plain HTML version used for copy-edit reading, a task I combine with the audio recording. </p> <p> I found that I catch typos best when I read the text aloud. Creating the audio is therefore a good way for me to improve the quality of the written text. 
</p> <p> <strong> ~/projects/idee/generator/mwworker.py </strong> </p> <div class="sourceCode"> <pre class="sourceCode Python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb24-1"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb24-2"></a><span class="co">MwWorker is derived from the MsgWorker base class.</span></span> <span><a aria-hidden="true" href="#cb24-3"></a></span> <span><a aria-hidden="true" href="#cb24-4"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb24-5"></a><span class="co">@license: https://creativecommons.org/publicdomain/zero/1.0/deed.en</span></span> <span><a aria-hidden="true" href="#cb24-6"></a><span class="co">@date: 2022-03-15</span></span> <span><a aria-hidden="true" href="#cb24-7"></a></span> <span><a aria-hidden="true" href="#cb24-8"></a><span class="co">The MwWorker takes care of *.mediawiki files</span></span> <span><a aria-hidden="true" href="#cb24-9"></a><span class="co">in the author directory, if changes are committed</span></span> <span><a aria-hidden="true" href="#cb24-10"></a><span class="co">for them.</span></span> <span><a aria-hidden="true" href="#cb24-11"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb24-12"></a><span class="im">import</span> re</span> <span><a aria-hidden="true" href="#cb24-13"></a><span class="im">import</span> subprocess</span> <span><a aria-hidden="true" href="#cb24-14"></a><span class="im">import</span> sys</span> <span><a aria-hidden="true" href="#cb24-15"></a></span> <span><a aria-hidden="true" href="#cb24-16"></a><span class="im">from</span> bs4 <span class="im">import</span> BeautifulSoup</span> <span><a aria-hidden="true" href="#cb24-17"></a><span class="im">from</span> bs4 <span class="im">import</span> Comment</span> <span><a aria-hidden="true" href="#cb24-18"></a><span class="im">from</span> bs4.builder._htmlparser <span class="im">import</span> HTMLParserTreeBuilder</span> 
<span><a aria-hidden="true" href="#cb24-19"></a></span> <span><a aria-hidden="true" href="#cb24-20"></a><span class="im">from</span> gitmsgdispatcher <span class="im">import</span> MsgWorker</span> <span><a aria-hidden="true" href="#cb24-21"></a><span class="im">from</span> gitmsgconstants <span class="im">import</span> GitMsgConstants <span class="im">as</span> gmc</span> <span><a aria-hidden="true" href="#cb24-22"></a><span class="im">from</span> pdfworker <span class="im">import</span> PdfWorker</span> <span><a aria-hidden="true" href="#cb24-23"></a><span class="im">from</span> pubmetadata <span class="im">import</span> PubMetaData</span> <span><a aria-hidden="true" href="#cb24-24"></a><span class="im">from</span> pubmetadata <span class="im">import</span> pageurn</span> <span><a aria-hidden="true" href="#cb24-25"></a></span> <span><a aria-hidden="true" href="#cb24-26"></a></span> <span><a aria-hidden="true" href="#cb24-27"></a><span class="kw">class</span> MwWorker(MsgWorker):</span> <span><a aria-hidden="true" href="#cb24-28"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb24-29"></a><span class="co"> The MwWorker takes care of *.mediawiki files in the author/ directory.</span></span> <span><a aria-hidden="true" href="#cb24-30"></a></span> <span><a aria-hidden="true" href="#cb24-31"></a><span class="co"> Example of a line taken care of</span></span> <span><a aria-hidden="true" href="#cb24-32"></a><span class="co"> # modified: author/PDF-Icon.mediawiki</span></span> <span><a aria-hidden="true" href="#cb24-33"></a></span> <span><a aria-hidden="true" href="#cb24-34"></a><span class="co"> The line has to be from the git message section:</span></span> <span><a aria-hidden="true" href="#cb24-35"></a><span class="co"> # Changes to be committed:</span></span> <span><a aria-hidden="true" href="#cb24-36"></a></span> <span><a aria-hidden="true" href="#cb24-37"></a><span class="co"> The main output is an HTML created from the 
mediawiki file,</span></span> <span><a aria-hidden="true" href="#cb24-38"></a><span class="co"> which is plain (without portal part) and stored in the</span></span> <span><a aria-hidden="true" href="#cb24-39"></a><span class="co"> folder GITROOT/plain/</span></span> <span><a aria-hidden="true" href="#cb24-40"></a></span> <span><a aria-hidden="true" href="#cb24-41"></a><span class="co"> A minor output, a PDF, might be requested via the message line:</span></span> <span><a aria-hidden="true" href="#cb24-42"></a><span class="co"> # pdf:draft=true</span></span> <span><a aria-hidden="true" href="#cb24-43"></a></span> <span><a aria-hidden="true" href="#cb24-44"></a><span class="co"> The respective PDF is created from HTML and stored in the folder</span></span> <span><a aria-hidden="true" href="#cb24-45"></a><span class="co"> GITROOT/website/pdf/</span></span> <span><a aria-hidden="true" href="#cb24-46"></a></span> <span><a aria-hidden="true" href="#cb24-47"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb24-48"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb24-49"></a><span class="co"> super: MsgWorker</span></span> <span><a aria-hidden="true" href="#cb24-50"></a><span class="co"> The MwWorker is derived from the MsgWorker.</span></span> <span><a aria-hidden="true" href="#cb24-51"></a></span> <span><a aria-hidden="true" href="#cb24-52"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb24-53"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb24-54"></a><span class="co"> MwWorker.</span></span> <span><a aria-hidden="true" href="#cb24-55"></a></span> <span><a aria-hidden="true" href="#cb24-56"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb24-57"></a></span> <span><a aria-hidden="true" href="#cb24-58"></a> <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, 
pattern):</span> <span><a aria-hidden="true" href="#cb24-59"></a> <span class="bu">super</span>().<span class="fu">__init__</span>(pattern)</span> <span><a aria-hidden="true" href="#cb24-60"></a> <span class="va">self</span>.values <span class="op">=</span> {}</span> <span><a aria-hidden="true" href="#cb24-61"></a></span> <span><a aria-hidden="true" href="#cb24-62"></a> <span class="at">@staticmethod</span></span> <span><a aria-hidden="true" href="#cb24-63"></a> <span class="kw">def</span> __make_url_migration__(soup):</span> <span><a aria-hidden="true" href="#cb24-64"></a> <span class="co">r"""</span></span> <span><a aria-hidden="true" href="#cb24-65"></a><span class="co"> Migrate the wordpress url pattern to the new one.</span></span> <span><a aria-hidden="true" href="#cb24-66"></a></span> <span><a aria-hidden="true" href="#cb24-67"></a><span class="co"> Articles: r"idee.frank.siebert.de.\d{4}.\d{2}.\d{2}"</span></span> <span><a aria-hidden="true" href="#cb24-68"></a></span> <span><a aria-hidden="true" href="#cb24-69"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb24-70"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb24-71"></a><span class="co"> soup : BeautifulSoup</span></span> <span><a aria-hidden="true" href="#cb24-72"></a><span class="co"> HTML represented by BeautifulSoup top level object</span></span> <span><a aria-hidden="true" href="#cb24-73"></a></span> <span><a aria-hidden="true" href="#cb24-74"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb24-75"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb24-76"></a><span class="co"> soup. 
r"https://idee.frank-siebert/"</span></span> <span><a aria-hidden="true" href="#cb24-77"></a></span> <span><a aria-hidden="true" href="#cb24-78"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb24-79"></a> site_r <span class="op">=</span> <span class="vs">r"(https://idee\.frank-siebert\.de)"</span></span> <span><a aria-hidden="true" href="#cb24-80"></a> date_r <span class="op">=</span> <span class="vs">r"([/]\d</span><span class="sc">{4}</span><span class="vs">[/]\d</span><span class="sc">{2}</span><span class="vs">[/]\d</span><span class="sc">{2}</span><span class="vs">[/])"</span> <span class="co"># '/yyyy/MM/dd'</span></span> <span><a aria-hidden="true" href="#cb24-81"></a> article_r <span class="op">=</span> <span class="vs">r"[/][a][r][t][i][c][l][e][/]"</span></span> <span><a aria-hidden="true" href="#cb24-82"></a></span> <span><a aria-hidden="true" href="#cb24-83"></a> <span class="co"># Links to own articles will be addressed by relative path,</span></span> <span><a aria-hidden="true" href="#cb24-84"></a> <span class="co"># In article migration we point to pages in the same location.</span></span> <span><a aria-hidden="true" href="#cb24-85"></a></span> <span><a aria-hidden="true" href="#cb24-86"></a> repattern <span class="op">=</span> re.<span class="bu">compile</span>(site_r <span class="op">+</span> date_r)</span> <span><a aria-hidden="true" href="#cb24-87"></a> tags <span class="op">=</span> soup.find_all(<span class="st">"a"</span>, attrs<span class="op">=</span>{<span class="st">"href"</span>: repattern})</span> <span><a aria-hidden="true" href="#cb24-88"></a></span> <span><a aria-hidden="true" href="#cb24-89"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb24-90"></a> <span class="co"># in case page internal id was addressed</span></span> <span><a aria-hidden="true" href="#cb24-91"></a> url <span class="op">=</span> tag[<span 
class="st">"href"</span>].split(<span class="st">"#"</span>)</span> <span><a aria-hidden="true" href="#cb24-92"></a> url[<span class="dv">0</span>] <span class="op">=</span> <span class="st">"./"</span> <span class="op">+</span> repattern.sub(<span class="st">""</span>, url[<span class="dv">0</span>].rstrip(<span class="st">"/"</span>))<span class="op">\</span></span> <span><a aria-hidden="true" href="#cb24-93"></a> <span class="op">+</span> <span class="st">".html"</span></span> <span><a aria-hidden="true" href="#cb24-94"></a> new_url <span class="op">=</span> <span class="st">'#'</span>.join(url)</span> <span><a aria-hidden="true" href="#cb24-95"></a> new_url <span class="op">=</span> new_url.lower() <span class="co"># change camel case to lower case</span></span> <span><a aria-hidden="true" href="#cb24-96"></a> tag.attrs.update({<span class="st">"href"</span>: new_url})</span> <span><a aria-hidden="true" href="#cb24-97"></a></span> <span><a aria-hidden="true" href="#cb24-98"></a> <span class="co"># References to own articles in the new portal</span></span> <span><a aria-hidden="true" href="#cb24-99"></a> <span class="co"># shall be relative as well.</span></span> <span><a aria-hidden="true" href="#cb24-100"></a> reart <span class="op">=</span> re.<span class="bu">compile</span>(site_r<span class="op">+</span>article_r)</span> <span><a aria-hidden="true" href="#cb24-101"></a> tags <span class="op">=</span> soup.find_all(re.<span class="bu">compile</span>(<span class="vs">r"^a$"</span>), attrs<span class="op">=</span>{<span class="st">"href"</span>: reart})</span> <span><a aria-hidden="true" href="#cb24-102"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb24-103"></a> new_url <span class="op">=</span> <span class="st">"./"</span> <span class="op">+</span> reart.sub(<span class="st">""</span>, tag[<span class="st">"href"</span>])</span> <span><a aria-hidden="true" href="#cb24-104"></a> new_url <span 
class="op">=</span> new_url.lower()</span> <span><a aria-hidden="true" href="#cb24-105"></a> tag.attrs.update({<span class="st">"href"</span>: new_url})</span> <span><a aria-hidden="true" href="#cb24-106"></a> <span class="cf">return</span> soup</span> <span><a aria-hidden="true" href="#cb24-107"></a></span> <span><a aria-hidden="true" href="#cb24-108"></a> <span class="kw">def</span> process(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb24-109"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb24-110"></a><span class="co"> Process the mediawiki files into plain html files.</span></span> <span><a aria-hidden="true" href="#cb24-111"></a></span> <span><a aria-hidden="true" href="#cb24-112"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb24-113"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb24-114"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb24-115"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb24-116"></a> <span class="co"># The file name is the title</span></span> <span><a aria-hidden="true" href="#cb24-117"></a> title <span class="op">=</span> <span class="va">self</span>.inpath.stem</span> <span><a aria-hidden="true" href="#cb24-118"></a></span> <span><a aria-hidden="true" href="#cb24-119"></a> <span class="co"># inject meta information from commit message</span></span> <span><a aria-hidden="true" href="#cb24-120"></a> <span class="co"># Creates the single instance of Publishing Dictionary</span></span> <span><a aria-hidden="true" href="#cb24-121"></a> PubMetaData(<span class="va">self</span>.dispatcher.parameters.values)</span> <span><a aria-hidden="true" href="#cb24-122"></a></span> <span><a aria-hidden="true" href="#cb24-123"></a> article_data <span class="op">=</span> PubMetaData.instance.get_new_revision(</span> <span><a aria-hidden="true" href="#cb24-124"></a> 
title</span> <span><a aria-hidden="true" href="#cb24-125"></a> )</span> <span><a aria-hidden="true" href="#cb24-126"></a></span> <span><a aria-hidden="true" href="#cb24-127"></a> <span class="co"># compose the output path</span></span> <span><a aria-hidden="true" href="#cb24-128"></a> <span class="va">self</span>.outpath <span class="op">=</span> gmc.plainpath <span class="op">/</span> pageurn(title)</span> <span><a aria-hidden="true" href="#cb24-129"></a> <span class="va">self</span>.outpath <span class="op">=</span> <span class="va">self</span>.outpath.with_suffix(<span class="st">".html"</span>)</span> <span><a aria-hidden="true" href="#cb24-130"></a> <span class="va">self</span>.outpath.resolve()</span> <span><a aria-hidden="true" href="#cb24-131"></a></span> <span><a aria-hidden="true" href="#cb24-132"></a> <span class="co"># To enable --toc, the parameter -s (standalone) needs to be set.</span></span> <span><a aria-hidden="true" href="#cb24-133"></a> <span class="co"># This parameter leads to the generation of an html header with</span></span> <span><a aria-hidden="true" href="#cb24-134"></a> <span class="co"># some meta tags.</span></span> <span><a aria-hidden="true" href="#cb24-135"></a></span> <span><a aria-hidden="true" href="#cb24-136"></a> <span class="co"># &lt;!DOCTYPE html&gt;</span></span> <span><a aria-hidden="true" href="#cb24-137"></a> <span class="co"># &lt;html lang="" xml:lang="" xmlns="http://www.w3.org/1999/xhtml"&gt;</span></span> <span><a aria-hidden="true" href="#cb24-138"></a> <span class="co"># &lt;head&gt;</span></span> <span><a aria-hidden="true" href="#cb24-139"></a> <span class="co"># &lt;meta charset="utf-8"/&gt;</span></span> <span><a aria-hidden="true" href="#cb24-140"></a> <span class="co"># &lt;meta content="pandoc" name="generator"/&gt;</span></span> <span><a aria-hidden="true" href="#cb24-141"></a> <span class="co"># &lt;meta content="width=device-width, initial-scale=1.0,</span></span> <span><a aria-hidden="true" 
href="#cb24-142"></a> <span class="co"># user-scalable=yes" name="viewport"/&gt;</span></span> <span><a aria-hidden="true" href="#cb24-143"></a> <span class="co"># &lt;title&gt;</span></span> <span><a aria-hidden="true" href="#cb24-144"></a> <span class="co"># Verstehen</span></span> <span><a aria-hidden="true" href="#cb24-145"></a> <span class="co"># &lt;/title&gt;</span></span> <span><a aria-hidden="true" href="#cb24-146"></a> <span class="co"># &lt;style&gt;</span></span> <span><a aria-hidden="true" href="#cb24-147"></a> <span class="co"># code{white-space: pre-wrap;}</span></span> <span><a aria-hidden="true" href="#cb24-148"></a> <span class="co"># span.smallcaps{font-variant: small-caps;}</span></span> <span><a aria-hidden="true" href="#cb24-149"></a> <span class="co"># span.underline{text-decoration: underline;}</span></span> <span><a aria-hidden="true" href="#cb24-150"></a> <span class="co"># div.column{display: inline-block; vertical-align:</span></span> <span><a aria-hidden="true" href="#cb24-151"></a> <span class="co"># top; width: 50%;}</span></span> <span><a aria-hidden="true" href="#cb24-152"></a> <span class="co"># div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}</span></span> <span><a aria-hidden="true" href="#cb24-153"></a> <span class="co"># ul.task-list{list-style: none;}</span></span> <span><a aria-hidden="true" href="#cb24-154"></a> <span class="co"># &lt;/style&gt;</span></span> <span><a aria-hidden="true" href="#cb24-155"></a> <span class="co"># &lt;/head&gt;</span></span> <span><a aria-hidden="true" href="#cb24-156"></a> <span class="co"># &lt;body&gt;</span></span> <span><a aria-hidden="true" href="#cb24-157"></a> <span class="co"># &lt;/body&gt;</span></span> <span><a aria-hidden="true" href="#cb24-158"></a> <span class="co"># &lt;/html&gt;</span></span> <span><a aria-hidden="true" href="#cb24-159"></a></span> <span><a aria-hidden="true" href="#cb24-160"></a> <span class="co"># The TOC is created as &lt;nav id="TOC"&gt; 
tag,</span></span> <span><a aria-hidden="true" href="#cb24-161"></a> <span class="co"># and it is not placed at the __TOC__</span></span> <span><a aria-hidden="true" href="#cb24-162"></a> <span class="co"># location specified in the mediawiki page.</span></span> <span><a aria-hidden="true" href="#cb24-163"></a></span> <span><a aria-hidden="true" href="#cb24-164"></a> <span class="co"># Also __NOTOC__ is not honored.</span></span> <span><a aria-hidden="true" href="#cb24-165"></a></span> <span><a aria-hidden="true" href="#cb24-166"></a> <span class="co"># Own meta data lines need be injected and</span></span> <span><a aria-hidden="true" href="#cb24-167"></a> <span class="co"># the toc needs to be moved to the correct location if specified,</span></span> <span><a aria-hidden="true" href="#cb24-168"></a> <span class="co"># or removed, if specified.</span></span> <span><a aria-hidden="true" href="#cb24-169"></a> imgdir <span class="op">=</span> gmc.imagepath</span> <span><a aria-hidden="true" href="#cb24-170"></a> imgdir.resolve()</span> <span><a aria-hidden="true" href="#cb24-171"></a> htmltext <span class="op">=</span> subprocess.run([<span class="st">"pandoc"</span>,</span> <span><a aria-hidden="true" href="#cb24-172"></a> <span class="co"># extract media to the folder</span></span> <span><a aria-hidden="true" href="#cb24-173"></a> <span class="co"># disabled after migration</span></span> <span><a aria-hidden="true" href="#cb24-174"></a> <span class="co"># "--extract-media={}".format(imgdir),</span></span> <span><a aria-hidden="true" href="#cb24-175"></a> <span class="co"># standalone (full html)</span></span> <span><a aria-hidden="true" href="#cb24-176"></a> <span class="st">"-s"</span>,</span> <span><a aria-hidden="true" href="#cb24-177"></a> <span class="co"># create table of content</span></span> <span><a aria-hidden="true" href="#cb24-178"></a> <span class="st">"--toc"</span>,</span> <span><a aria-hidden="true" href="#cb24-179"></a> <span 
class="st">"--toc-depth=5"</span>,</span> <span><a aria-hidden="true" href="#cb24-180"></a> <span class="co"># mediawiki markup as input format</span></span> <span><a aria-hidden="true" href="#cb24-181"></a> <span class="st">"-f"</span>, <span class="st">"mediawiki"</span>,</span> <span><a aria-hidden="true" href="#cb24-182"></a> <span class="co"># html as output format</span></span> <span><a aria-hidden="true" href="#cb24-183"></a> <span class="st">"-t"</span>, <span class="st">"html"</span>,</span> <span><a aria-hidden="true" href="#cb24-184"></a> <span class="co"># input file</span></span> <span><a aria-hidden="true" href="#cb24-185"></a> <span class="st">"-i"</span>, <span class="va">self</span>.inpath</span> <span><a aria-hidden="true" href="#cb24-186"></a> <span class="co"># don't use stdout, return the result</span></span> <span><a aria-hidden="true" href="#cb24-187"></a> ], capture_output<span class="op">=</span><span class="va">True</span>)</span> <span><a aria-hidden="true" href="#cb24-188"></a></span> <span><a aria-hidden="true" href="#cb24-189"></a> html_doc <span class="op">=</span> htmltext.stdout.decode(<span class="st">"utf-8"</span>)</span> <span><a aria-hidden="true" href="#cb24-190"></a> builder <span class="op">=</span> HTMLParserTreeBuilder()</span> <span><a aria-hidden="true" href="#cb24-191"></a> soup <span class="op">=</span> BeautifulSoup(html_doc, builder<span class="op">=</span>builder)</span> <span><a aria-hidden="true" href="#cb24-192"></a></span> <span><a aria-hidden="true" href="#cb24-193"></a> <span class="co"># stupid but not avoidable:</span></span> <span><a aria-hidden="true" href="#cb24-194"></a> <span class="co"># pandoc does not know where we store the plain html.</span></span> <span><a aria-hidden="true" href="#cb24-195"></a> <span class="co"># therefore it cannot set the links to medias correctly.</span></span> <span><a aria-hidden="true" href="#cb24-196"></a> <span class="co"># we have to give a helping hand</span></span> 
<span><a aria-hidden="true" href="#cb24-197"></a> <span class="co"># We could tell pandoc to use another working directory to get</span></span> <span><a aria-hidden="true" href="#cb24-198"></a> <span class="co"># the paths correct. </span><span class="al">TODO</span><span class="co">: change this when folders move again</span></span> <span><a aria-hidden="true" href="#cb24-199"></a> tags <span class="op">=</span> soup.find_all(<span class="st">"img"</span>)</span> <span><a aria-hidden="true" href="#cb24-200"></a> <span class="cf">if</span> tags:</span> <span><a aria-hidden="true" href="#cb24-201"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb24-202"></a> tag.attrs.update({<span class="st">"src"</span>: <span class="st">"../website/image/"</span> <span class="op">+</span> tag[<span class="st">"src"</span>]})</span> <span><a aria-hidden="true" href="#cb24-203"></a> <span class="co"># since we are already here, provide a cheap</span></span> <span><a aria-hidden="true" href="#cb24-204"></a> <span class="co"># picture maximization via href to target _blank</span></span> <span><a aria-hidden="true" href="#cb24-205"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb24-206"></a> tag.insert_after(newtag)</span> <span><a aria-hidden="true" href="#cb24-207"></a> newtag.insert(<span class="dv">0</span>, tag)</span> <span><a aria-hidden="true" href="#cb24-208"></a> href <span class="op">=</span> tag[<span class="st">"src"</span>]</span> <span><a aria-hidden="true" href="#cb24-209"></a> <span class="co"># Special exception for licence icons</span></span> <span><a aria-hidden="true" href="#cb24-210"></a> creative_commons <span class="op">=</span> re.<span class="bu">compile</span>(</span> <span><a aria-hidden="true" href="#cb24-211"></a> <span class="vs">r".*CC-Icon.png"</span>)</span> <span><a aria-hidden="true" href="#cb24-212"></a> href 
<span class="op">=</span> creative_commons.sub(</span> <span><a aria-hidden="true" href="#cb24-213"></a> <span class="st">"creative-commons-cc0-1-0-universal.html"</span>,</span> <span><a aria-hidden="true" href="#cb24-214"></a> href)</span> <span><a aria-hidden="true" href="#cb24-215"></a> creative_commons_0 <span class="op">=</span> re.<span class="bu">compile</span>(</span> <span><a aria-hidden="true" href="#cb24-216"></a> <span class="vs">r".*CC0-Icon.png"</span>)</span> <span><a aria-hidden="true" href="#cb24-217"></a> href <span class="op">=</span> creative_commons_0.sub(</span> <span><a aria-hidden="true" href="#cb24-218"></a> <span class="st">"creative-commons-cc0-1-0-universal.html"</span>,</span> <span><a aria-hidden="true" href="#cb24-219"></a> href)</span> <span><a aria-hidden="true" href="#cb24-220"></a> newtag.attrs.update({<span class="st">"href"</span>: href, <span class="st">"target"</span>: <span class="st">"_blank"</span>})</span> <span><a aria-hidden="true" href="#cb24-221"></a></span> <span><a aria-hidden="true" href="#cb24-222"></a> <span class="co"># inject language information</span></span> <span><a aria-hidden="true" href="#cb24-223"></a> tag <span class="op">=</span> soup.find(<span class="st">"html"</span>)</span> <span><a aria-hidden="true" href="#cb24-224"></a> tag.attrs.update({<span class="st">"lang"</span>: article_data[PubMetaData.locale]})</span> <span><a aria-hidden="true" href="#cb24-225"></a> tag.attrs.update({<span class="st">"xml:lang"</span>: article_data[PubMetaData.locale]})</span> <span><a aria-hidden="true" href="#cb24-226"></a></span> <span><a aria-hidden="true" href="#cb24-227"></a> <span class="co"># inject stylesheet link</span></span> <span><a aria-hidden="true" href="#cb24-228"></a> <span class="co"># &lt;link rel="stylesheet" href="../website/css/fs.css"/&gt;</span></span> <span><a aria-hidden="true" href="#cb24-229"></a> tag <span class="op">=</span> soup.find(<span class="st">"head"</span>)</span> <span><a 
aria-hidden="true" href="#cb24-230"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"link"</span>)</span> <span><a aria-hidden="true" href="#cb24-231"></a> newtag.attrs.update(</span> <span><a aria-hidden="true" href="#cb24-232"></a> {<span class="st">"rel"</span>: <span class="st">"stylesheet"</span>, <span class="st">"href"</span>: <span class="st">"../website/css/fs.css"</span>})</span> <span><a aria-hidden="true" href="#cb24-233"></a> tag.insert(<span class="dv">6</span>, newtag)</span> <span><a aria-hidden="true" href="#cb24-234"></a></span> <span><a aria-hidden="true" href="#cb24-235"></a> <span class="cf">for</span> key <span class="kw">in</span> article_data.keys():</span> <span><a aria-hidden="true" href="#cb24-236"></a> <span class="cf">if</span> (key.startswith(<span class="st">'og:'</span>) <span class="kw">or</span> key.startswith(<span class="st">'article:'</span>)):</span> <span><a aria-hidden="true" href="#cb24-237"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"meta"</span>)</span> <span><a aria-hidden="true" href="#cb24-238"></a> newtag.attrs.update({<span class="st">"property"</span>: key,</span> <span><a aria-hidden="true" href="#cb24-239"></a> <span class="st">"content"</span>: article_data[key]})</span> <span><a aria-hidden="true" href="#cb24-240"></a> tag.insert(<span class="dv">6</span>, newtag)</span> <span><a aria-hidden="true" href="#cb24-241"></a></span> <span><a aria-hidden="true" href="#cb24-242"></a> <span class="co"># my own invention: article:urn</span></span> <span><a aria-hidden="true" href="#cb24-243"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"meta"</span>)</span> <span><a aria-hidden="true" href="#cb24-244"></a> newtag.attrs.update({<span class="st">"property"</span>: PubMetaData.urn,</span> <span><a aria-hidden="true" href="#cb24-245"></a> <span class="st">"content"</span>: article_data.name})</span> <span><a aria-hidden="true" href="#cb24-246"></a> 
tag.insert(<span class="dv">6</span>, newtag)</span> <span><a aria-hidden="true" href="#cb24-247"></a></span> <span><a aria-hidden="true" href="#cb24-248"></a> <span class="co"># http://www.gnuterrypratchett.com/</span></span> <span><a aria-hidden="true" href="#cb24-249"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"meta"</span>)</span> <span><a aria-hidden="true" href="#cb24-250"></a> newtag.attrs.update({<span class="st">"http-equiv"</span>: <span class="st">"X-Clacks-Overhead"</span>,</span> <span><a aria-hidden="true" href="#cb24-251"></a> <span class="st">"content"</span>: <span class="st">"Terry Pratchett"</span>})</span> <span><a aria-hidden="true" href="#cb24-252"></a></span> <span><a aria-hidden="true" href="#cb24-253"></a> <span class="co"># inject the generator meta information.</span></span> <span><a aria-hidden="true" href="#cb24-254"></a> <span class="co"># one exists already</span></span> <span><a aria-hidden="true" href="#cb24-255"></a> tag <span class="op">=</span> soup.find(<span class="st">"meta"</span>, attrs<span class="op">=</span>{<span class="st">"name"</span>: <span class="st">"generator"</span>})</span> <span><a aria-hidden="true" href="#cb24-256"></a> tag.attrs.update({<span class="st">"name"</span>: <span class="st">"generator"</span>, <span class="st">"content"</span>: gmc.generator})</span> <span><a aria-hidden="true" href="#cb24-257"></a></span> <span><a aria-hidden="true" href="#cb24-258"></a> <span class="co"># WikiLinks [https://webpage https//webpage]</span></span> <span><a aria-hidden="true" href="#cb24-259"></a> <span class="co"># leads to nested anchor tags.</span></span> <span><a aria-hidden="true" href="#cb24-260"></a> <span class="co"># The resulting page works in firefox, but it is no valid html.</span></span> <span><a aria-hidden="true" href="#cb24-261"></a> <span class="co"># We use soup for the correction.</span></span> <span><a aria-hidden="true" href="#cb24-262"></a> tags <span 
class="op">=</span> soup.find_all(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb24-263"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb24-264"></a> nested_a <span class="op">=</span> tag.find(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb24-265"></a> <span class="cf">if</span> nested_a:</span> <span><a aria-hidden="true" href="#cb24-266"></a> atext <span class="op">=</span> <span class="st">""</span> <span class="op">+</span> nested_a.text</span> <span><a aria-hidden="true" href="#cb24-267"></a> nested_a.decompose()</span> <span><a aria-hidden="true" href="#cb24-268"></a> tag.append(atext)</span> <span><a aria-hidden="true" href="#cb24-269"></a></span> <span><a aria-hidden="true" href="#cb24-270"></a> <span class="co"># use a better symbol for backreferences</span></span> <span><a aria-hidden="true" href="#cb24-271"></a> tags <span class="op">=</span> soup.find_all(<span class="st">"a"</span>, text<span class="op">=</span><span class="st">'↩︎'</span>)</span> <span><a aria-hidden="true" href="#cb24-272"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb24-273"></a> tag.clear()</span> <span><a aria-hidden="true" href="#cb24-274"></a> tag.append(<span class="st">'↑'</span>)</span> <span><a aria-hidden="true" href="#cb24-275"></a></span> <span><a aria-hidden="true" href="#cb24-276"></a> <span class="co"># Move the TOC to the correct location</span></span> <span><a aria-hidden="true" href="#cb24-277"></a> toc <span class="op">=</span> soup.find(<span class="st">"nav"</span>, <span class="bu">id</span><span class="op">=</span><span class="st">'TOC'</span>)</span> <span><a aria-hidden="true" href="#cb24-278"></a> tag <span class="op">=</span> soup.find(<span class="st">"p"</span>, text<span class="op">=</span><span class="st">'__TOC__'</span>)</span> <span><a aria-hidden="true" 
href="#cb24-279"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb24-280"></a> tag.replace_with(toc)</span> <span><a aria-hidden="true" href="#cb24-281"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb24-282"></a> tag <span class="op">=</span> soup.find(<span class="st">"p"</span>, text<span class="op">=</span><span class="st">'__NOTOC__'</span>)</span> <span><a aria-hidden="true" href="#cb24-283"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb24-284"></a> tag.decompose()</span> <span><a aria-hidden="true" href="#cb24-285"></a> toc.decompose()</span> <span><a aria-hidden="true" href="#cb24-286"></a></span> <span><a aria-hidden="true" href="#cb24-287"></a> <span class="co"># Footnotes are not placed at the location</span></span> <span><a aria-hidden="true" href="#cb24-288"></a> <span class="co"># of the &lt;references/&gt; tag.</span></span> <span><a aria-hidden="true" href="#cb24-289"></a></span> <span><a aria-hidden="true" href="#cb24-290"></a> <span class="co"># Footnotes are generated as section</span></span> <span><a aria-hidden="true" href="#cb24-291"></a> <span class="co"># &lt;section class="footnotes" role="doc-endnotes"&gt;</span></span> <span><a aria-hidden="true" href="#cb24-292"></a></span> <span><a aria-hidden="true" href="#cb24-293"></a> <span class="co"># Search Section and use it to replace References.</span></span> <span><a aria-hidden="true" href="#cb24-294"></a> footnotes <span class="op">=</span> soup.find(<span class="st">"section"</span>, class_<span class="op">=</span><span class="st">"footnotes"</span>)</span> <span><a aria-hidden="true" href="#cb24-295"></a> <span class="cf">if</span> footnotes:</span> <span><a aria-hidden="true" href="#cb24-296"></a> tag <span class="op">=</span> soup.find(<span class="st">"references"</span>)</span> <span><a aria-hidden="true" href="#cb24-297"></a> <span class="cf">if</span> tag:</span> <span><a 
aria-hidden="true" href="#cb24-298"></a> tag.replace_with(footnotes)</span> <span><a aria-hidden="true" href="#cb24-299"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb24-300"></a> <span class="bu">print</span>(<span class="st">"Provide a references tag as footnote target location."</span>)</span> <span><a aria-hidden="true" href="#cb24-301"></a> sys.exit(<span class="dv">1</span>)</span> <span><a aria-hidden="true" href="#cb24-302"></a></span> <span><a aria-hidden="true" href="#cb24-303"></a> <span class="co"># Category-Links get a title "wikilink"</span></span> <span><a aria-hidden="true" href="#cb24-304"></a> <span class="co"># Give those anchors a class "category" to hide them until</span></span> <span><a aria-hidden="true" href="#cb24-305"></a> <span class="co"># I decide to use them.</span></span> <span><a aria-hidden="true" href="#cb24-306"></a></span> <span><a aria-hidden="true" href="#cb24-307"></a> <span class="co"># But "Kategorie:Artikel" gets removed. 
These are all articles.</span></span> <span><a aria-hidden="true" href="#cb24-308"></a> tag <span class="op">=</span> soup.find(<span class="st">"a"</span>, href<span class="op">=</span><span class="st">"Kategorie:Artikel"</span>)</span> <span><a aria-hidden="true" href="#cb24-309"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb24-310"></a> tag.decompose()</span> <span><a aria-hidden="true" href="#cb24-311"></a></span> <span><a aria-hidden="true" href="#cb24-312"></a> tags <span class="op">=</span> soup.find_all(<span class="st">"a"</span>, title<span class="op">=</span><span class="st">"wikilink"</span>)</span> <span><a aria-hidden="true" href="#cb24-313"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb24-314"></a> tag.attrs.update({<span class="st">"class"</span>: <span class="st">"category"</span>})</span> <span><a aria-hidden="true" href="#cb24-315"></a></span> <span><a aria-hidden="true" href="#cb24-316"></a> site_r <span class="op">=</span> <span class="vs">r"https://idee\.frank-siebert\.de"</span></span> <span><a aria-hidden="true" href="#cb24-317"></a> date_r <span class="op">=</span> <span class="vs">r"[/]\d</span><span class="sc">{4}</span><span class="vs">[/]\d</span><span class="sc">{2}</span><span class="vs">[/]\d</span><span class="sc">{2}</span><span class="vs">[/]"</span> <span class="co"># '/yyyy/MM/dd'</span></span> <span><a aria-hidden="true" href="#cb24-318"></a> article_r <span class="op">=</span> <span class="vs">r"[/][a][r][t][i][c][l][e][/]"</span></span> <span><a aria-hidden="true" href="#cb24-319"></a></span> <span><a aria-hidden="true" href="#cb24-320"></a> <span class="co"># Links to own articles will be addressed by relative path,</span></span> <span><a aria-hidden="true" href="#cb24-321"></a> <span class="co"># In article migration we point to pages in the same location.</span></span> <span><a aria-hidden="true" 
href="#cb24-322"></a></span> <span><a aria-hidden="true" href="#cb24-323"></a> repattern <span class="op">=</span> re.<span class="bu">compile</span>(site_r <span class="op">+</span> date_r)</span> <span><a aria-hidden="true" href="#cb24-324"></a> tags <span class="op">=</span> soup.find_all(<span class="st">"a"</span>, href<span class="op">=</span>repattern)</span> <span><a aria-hidden="true" href="#cb24-325"></a></span> <span><a aria-hidden="true" href="#cb24-326"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb24-327"></a> <span class="co"># in case page internal id was addressed</span></span> <span><a aria-hidden="true" href="#cb24-328"></a> url <span class="op">=</span> tag[<span class="st">"href"</span>].split(<span class="st">"#"</span>)</span> <span><a aria-hidden="true" href="#cb24-329"></a> url[<span class="dv">0</span>] <span class="op">=</span> <span class="st">"./"</span> <span class="op">+</span> repattern.sub(<span class="st">""</span>, url[<span class="dv">0</span>].rstrip(<span class="st">"/"</span>)) <span class="op">+</span> <span class="st">".html"</span></span> <span><a aria-hidden="true" href="#cb24-330"></a> new_url <span class="op">=</span> <span class="st">'#'</span>.join(url)</span> <span><a aria-hidden="true" href="#cb24-331"></a> new_url <span class="op">=</span> new_url.lower() <span class="co"># change camel case to lower case</span></span> <span><a aria-hidden="true" href="#cb24-332"></a> tag.attrs.update({<span class="st">"href"</span>: new_url})</span> <span><a aria-hidden="true" href="#cb24-333"></a></span> <span><a aria-hidden="true" href="#cb24-334"></a> <span class="co"># Links to other resources will be also addressed by relative path,</span></span> <span><a aria-hidden="true" href="#cb24-335"></a> <span class="co"># Those resources need to be addressed by ../</span></span> <span><a aria-hidden="true" href="#cb24-336"></a></span> <span><a aria-hidden="true" 
href="#cb24-337"></a> repattern <span class="op">=</span> re.<span class="bu">compile</span>(site_r)</span> <span><a aria-hidden="true" href="#cb24-338"></a> tags <span class="op">=</span> soup.find_all(<span class="st">"a"</span>, href<span class="op">=</span>repattern)</span> <span><a aria-hidden="true" href="#cb24-339"></a></span> <span><a aria-hidden="true" href="#cb24-340"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb24-341"></a> new_url <span class="op">=</span> repattern.sub(<span class="st">".."</span>, tag[<span class="st">"href"</span>])</span> <span><a aria-hidden="true" href="#cb24-342"></a> new_url <span class="op">=</span> new_url.lower() <span class="co"># change camel case to lower case</span></span> <span><a aria-hidden="true" href="#cb24-343"></a> tag.attrs.update({<span class="st">"href"</span>: new_url})</span> <span><a aria-hidden="true" href="#cb24-344"></a></span> <span><a aria-hidden="true" href="#cb24-345"></a> <span class="co"># References to own articles in the new portal</span></span> <span><a aria-hidden="true" href="#cb24-346"></a> <span class="co"># shall be relative as well.</span></span> <span><a aria-hidden="true" href="#cb24-347"></a> reart <span class="op">=</span> re.<span class="bu">compile</span>(site_r<span class="op">+</span>article_r)</span> <span><a aria-hidden="true" href="#cb24-348"></a> tags <span class="op">=</span> soup.find_all(re.<span class="bu">compile</span>(<span class="vs">r"^a$"</span>), attrs<span class="op">=</span>{<span class="st">"href"</span>: reart})</span> <span><a aria-hidden="true" href="#cb24-349"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb24-350"></a> new_url <span class="op">=</span> <span class="st">"./"</span> <span class="op">+</span> reart.sub(<span class="st">""</span>, tag[<span class="st">"href"</span>])</span> <span><a aria-hidden="true" 
href="#cb24-351"></a> new_url <span class="op">=</span> new_url.lower()</span> <span><a aria-hidden="true" href="#cb24-352"></a> tag.attrs.update({<span class="st">"href"</span>: new_url})</span> <span><a aria-hidden="true" href="#cb24-353"></a></span> <span><a aria-hidden="true" href="#cb24-354"></a> <span class="co"># It's about articles, one article per page.</span></span> <span><a aria-hidden="true" href="#cb24-355"></a> <span class="co"># For later site function injection, we need a</span></span> <span><a aria-hidden="true" href="#cb24-356"></a> <span class="co"># container around the main content.</span></span> <span><a aria-hidden="true" href="#cb24-357"></a></span> <span><a aria-hidden="true" href="#cb24-358"></a> <span class="co"># After reading https://html.spec.whatwg.org/dev/sections.html</span></span> <span><a aria-hidden="true" href="#cb24-359"></a> <span class="co"># I go for this structure:</span></span> <span><a aria-hidden="true" href="#cb24-360"></a> <span class="co"># &lt;body&gt;</span></span> <span><a aria-hidden="true" href="#cb24-361"></a> <span class="co"># &lt;header&gt;</span></span> <span><a aria-hidden="true" href="#cb24-362"></a> <span class="co"># &lt;/header&gt; Injected by SSI module in nginx</span></span> <span><a aria-hidden="true" href="#cb24-363"></a> <span class="co"># &lt;main&gt; as semantic element for the main content</span></span> <span><a aria-hidden="true" href="#cb24-364"></a> <span class="co"># &lt;article&gt; as semantic element for the article</span></span> <span><a aria-hidden="true" href="#cb24-365"></a> <span class="co"># &lt;header&gt; an article header</span></span> <span><a aria-hidden="true" href="#cb24-366"></a> <span class="co"># &lt;h1&gt;</span></span> <span><a aria-hidden="true" href="#cb24-367"></a> <span class="co"># &lt;div&gt;</span></span> <span><a aria-hidden="true" href="#cb24-368"></a> <span class="co"># &lt;time pubdate="true" datetime=</span></span> <span><a aria-hidden="true" 
href="#cb24-369"></a> <span class="co"># "2022-01-19T13:03:08"&gt;</span></span> <span><a aria-hidden="true" href="#cb24-370"></a> <span class="co"># 2022-01-19</span></span> <span><a aria-hidden="true" href="#cb24-371"></a> <span class="co"># &lt;/time&gt;</span></span> <span><a aria-hidden="true" href="#cb24-372"></a> <span class="co"># &lt;address&gt;Author Name&lt;/address&gt;</span></span> <span><a aria-hidden="true" href="#cb24-373"></a> body <span class="op">=</span> soup.find(<span class="st">"body"</span>)</span> <span><a aria-hidden="true" href="#cb24-374"></a> newbody <span class="op">=</span> soup.new_tag(<span class="st">"body"</span>) <span class="co"># temporary container</span></span> <span><a aria-hidden="true" href="#cb24-375"></a></span> <span><a aria-hidden="true" href="#cb24-376"></a> <span class="co"># SSI header injection is a function of the language</span></span> <span><a aria-hidden="true" href="#cb24-377"></a> <span class="cf">if</span> article_data[PubMetaData.locale].startswith(<span class="st">"de"</span>):</span> <span><a aria-hidden="true" href="#cb24-378"></a> newtag <span class="op">=</span> Comment(<span class="st">'# include file="/portal/idee-header.html" '</span>)</span> <span><a aria-hidden="true" href="#cb24-379"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb24-380"></a> newtag <span class="op">=</span> Comment(<span class="st">'# include file="/portal/concept-header.html" '</span>)</span> <span><a aria-hidden="true" href="#cb24-381"></a> newbody.insert(<span class="dv">0</span>, newtag)</span> <span><a aria-hidden="true" href="#cb24-382"></a></span> <span><a aria-hidden="true" href="#cb24-383"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"main"</span>)</span> <span><a aria-hidden="true" href="#cb24-384"></a> newbody.insert(<span class="dv">1</span>, newtag)</span> <span><a aria-hidden="true" href="#cb24-385"></a></span> <span><a aria-hidden="true" href="#cb24-386"></a> 
tag <span class="op">=</span> newtag</span> <span><a aria-hidden="true" href="#cb24-387"></a> article <span class="op">=</span> soup.new_tag(<span class="st">"article"</span>)</span> <span><a aria-hidden="true" href="#cb24-388"></a> tag.insert(<span class="dv">0</span>, article)</span> <span><a aria-hidden="true" href="#cb24-389"></a></span> <span><a aria-hidden="true" href="#cb24-390"></a> <span class="co"># previous body content becomes article content</span></span> <span><a aria-hidden="true" href="#cb24-391"></a> <span class="co"># the new body replaces the old</span></span> <span><a aria-hidden="true" href="#cb24-392"></a> article.contents <span class="op">=</span> body.contents.copy()</span> <span><a aria-hidden="true" href="#cb24-393"></a> body.replace_with(newbody)</span> <span><a aria-hidden="true" href="#cb24-394"></a></span> <span><a aria-hidden="true" href="#cb24-395"></a> <span class="co"># inject article header information about</span></span> <span><a aria-hidden="true" href="#cb24-396"></a> <span class="co"># title, creation date and author</span></span> <span><a aria-hidden="true" href="#cb24-397"></a> tag <span class="op">=</span> article</span> <span><a aria-hidden="true" href="#cb24-398"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"header"</span>)</span> <span><a aria-hidden="true" href="#cb24-399"></a> tag.insert(<span class="dv">1</span>, newtag)</span> <span><a aria-hidden="true" href="#cb24-400"></a> tag <span class="op">=</span> newtag</span> <span><a aria-hidden="true" href="#cb24-401"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"h1"</span>)</span> <span><a aria-hidden="true" href="#cb24-402"></a> newtag.append(title)</span> <span><a aria-hidden="true" href="#cb24-403"></a> tag.insert(<span class="dv">0</span>, newtag)</span> <span><a aria-hidden="true" href="#cb24-404"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" 
href="#cb24-405"></a> tag.insert(<span class="dv">1</span>, newtag)</span> <span><a aria-hidden="true" href="#cb24-406"></a> tag <span class="op">=</span> newtag</span> <span><a aria-hidden="true" href="#cb24-407"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"time"</span>)</span> <span><a aria-hidden="true" href="#cb24-408"></a> newtag.append(article_data[PubMetaData.pubdate][:<span class="dv">10</span>])</span> <span><a aria-hidden="true" href="#cb24-409"></a> newtag.attrs.update({<span class="st">"datetime"</span>:</span> <span><a aria-hidden="true" href="#cb24-410"></a> article_data[PubMetaData.pubdate][:<span class="dv">19</span>]})</span> <span><a aria-hidden="true" href="#cb24-411"></a> <span class="co"># probably deprecated by itemprop alternative</span></span> <span><a aria-hidden="true" href="#cb24-412"></a> newtag.attrs.update({<span class="st">"pubdate"</span>: <span class="st">"true"</span>})</span> <span><a aria-hidden="true" href="#cb24-413"></a> tag.insert(<span class="dv">0</span>, newtag)</span> <span><a aria-hidden="true" href="#cb24-414"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"address"</span>)</span> <span><a aria-hidden="true" href="#cb24-415"></a> newtag.append(article_data.get(<span class="st">"article:author"</span>))</span> <span><a aria-hidden="true" href="#cb24-416"></a> tag.insert(<span class="dv">1</span>, newtag)</span> <span><a aria-hidden="true" href="#cb24-417"></a></span> <span><a aria-hidden="true" href="#cb24-418"></a> html_doc <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" href="#cb24-419"></a></span> <span><a aria-hidden="true" href="#cb24-420"></a> <span class="cf">with</span> <span class="bu">open</span>(<span class="va">self</span>.outpath, <span class="st">'w'</span>) <span class="im">as</span> outfile:</span> <span><a aria-hidden="true" href="#cb24-421"></a> <span class="bu">print</span>(html_doc, <span 
class="bu">file</span><span class="op">=</span>outfile)</span> <span><a aria-hidden="true" href="#cb24-422"></a> outfile.flush()</span> <span><a aria-hidden="true" href="#cb24-423"></a> outfile.close()</span> <span><a aria-hidden="true" href="#cb24-424"></a></span> <span><a aria-hidden="true" href="#cb24-425"></a> <span class="bu">print</span>(<span class="st">'wrote file </span><span class="sc">{0}</span><span class="st">'</span>.<span class="bu">format</span>(<span class="va">self</span>.outpath))</span> <span><a aria-hidden="true" href="#cb24-426"></a></span> <span><a aria-hidden="true" href="#cb24-427"></a> subprocess.run([<span class="st">"firefox"</span>, <span class="va">self</span>.outpath], capture_output<span class="op">=</span><span class="va">False</span>)</span> <span><a aria-hidden="true" href="#cb24-428"></a></span> <span><a aria-hidden="true" href="#cb24-429"></a> <span class="co"># Placing a worklist item for the PdfWorker</span></span> <span><a aria-hidden="true" href="#cb24-430"></a> <span class="cf">if</span> article_data[PubMetaData.pdfdraft] <span class="op">==</span> <span class="st">"true"</span>:</span> <span><a aria-hidden="true" href="#cb24-431"></a> <span class="va">self</span>.dispatcher.worklist.append(</span> <span><a aria-hidden="true" href="#cb24-432"></a> PdfWorker.make_pdf_worklist_item(</span> <span><a aria-hidden="true" href="#cb24-433"></a> article_data.name,</span> <span><a aria-hidden="true" href="#cb24-434"></a> html_doc,</span> <span><a aria-hidden="true" href="#cb24-435"></a> gmc.plainpath,</span> <span><a aria-hidden="true" href="#cb24-436"></a> MsgWorker.task_create,</span> <span><a aria-hidden="true" href="#cb24-437"></a> draft<span class="op">=</span><span class="va">True</span></span> <span><a aria-hidden="true" href="#cb24-438"></a> )</span> <span><a aria-hidden="true" href="#cb24-439"></a> )</span> <span><a aria-hidden="true" href="#cb24-440"></a></span> <span><a aria-hidden="true" href="#cb24-441"></a> <span 
class="kw">def</span> delete(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb24-442"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb24-443"></a><span class="co"> Delete the generated HTML.</span></span> <span><a aria-hidden="true" href="#cb24-444"></a></span> <span><a aria-hidden="true" href="#cb24-445"></a><span class="co"> Resources used by the HTML need additional care.</span></span> <span><a aria-hidden="true" href="#cb24-446"></a><span class="co"> If the delete was triggered by rename, no resources have to be deleted.</span></span> <span><a aria-hidden="true" href="#cb24-447"></a><span class="co"> If it was triggered by a delete, a check is required,</span></span> <span><a aria-hidden="true" href="#cb24-448"></a><span class="co"> whether the resources are used by other pages as well.</span></span> <span><a aria-hidden="true" href="#cb24-449"></a><span class="co"> But resources are placed anyhow in the final website location.</span></span> <span><a aria-hidden="true" href="#cb24-450"></a><span class="co"> They must not be deleted by the MwWorker.</span></span> <span><a aria-hidden="true" href="#cb24-451"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb24-452"></a></span> <span><a aria-hidden="true" href="#cb24-453"></a></span> <span><a aria-hidden="true" href="#cb24-454"></a><span class="cf">if</span> <span class="va">__name__</span> <span class="op">==</span> <span class="st">"__main__"</span>:</span> <span><a aria-hidden="true" href="#cb24-455"></a> <span class="im">from</span> gitmsgdispatcher <span class="im">import</span> GitMsgDispatcher</span> <span><a aria-hidden="true" href="#cb24-456"></a></span> <span><a aria-hidden="true" href="#cb24-457"></a> <span class="bu">print</span>(<span class="st">"Running Test-Cases"</span>)</span> <span><a aria-hidden="true" href="#cb24-458"></a></span> <span><a aria-hidden="true" href="#cb24-459"></a> mwworker <span class="op">=</span> 
MwWorker(<span class="vs">r".*(new file|modified).*author[/].*\.mediawiki"</span>)</span> <span><a aria-hidden="true" href="#cb24-460"></a> pdfworker <span class="op">=</span> PdfWorker(<span class="vs">r""</span> <span class="op">+</span> PdfWorker.pdfworkitem)</span> <span><a aria-hidden="true" href="#cb24-461"></a></span> <span><a aria-hidden="true" href="#cb24-462"></a> <span class="co"># MESSAGEFILE = "test/PDF-Icon-TestCase-1"</span></span> <span><a aria-hidden="true" href="#cb24-463"></a> <span class="co"># MESSAGEFILE = "test/mw_new_testcase"</span></span> <span><a aria-hidden="true" href="#cb24-464"></a> <span class="co"># MESSAGEFILE = "test/WordPress-testcase-1"</span></span> <span><a aria-hidden="true" href="#cb24-465"></a> <span class="co"># MESSAGEFILE = "test/ich-denke-TestCase-1"</span></span> <span><a aria-hidden="true" href="#cb24-466"></a> <span class="co"># MESSAGEFILE = "test/FragenSieIhrenArzt-TestCase1"</span></span> <span><a aria-hidden="true" href="#cb24-467"></a> <span class="co"># MESSAGEFILE = "test/PandemieBeenden-TestCase-1"</span></span> <span><a aria-hidden="true" href="#cb24-468"></a> <span class="co"># MESSAGEFILE = "test/LegalTribune-TestCase-1"</span></span> <span><a aria-hidden="true" href="#cb24-469"></a> MESSAGEFILE <span class="op">=</span> <span class="st">"test/TwoArticles-TestCase-1"</span></span> <span><a aria-hidden="true" href="#cb24-470"></a> disp <span class="op">=</span> GitMsgDispatcher(MESSAGEFILE, [mwworker, pdfworker])</span></code></pre> </div> <h4> Publishing Meta Data Management: pubmetadata.py </h4> <p> I have already written about the meta data export from WordPress and about the choice of Python module to manage the meta data in a CSV file. </p> <p> During migration it is vital to identify the correct meta data entry, so that the correct publishing date is shown in the article. Later, when we perform updates, we want to keep track of the original publishing date as well. 
</p> <p> The code above already shows that this is done in the PubMetaData class. Because the page's URN is the identifier in the stored publishing meta data, the Python file with the class PubMetaData also contains the function pageurn(pagename). It computes the Uniform Resource Name from the MediaWiki file name, which equals the article title used in MediaWiki. </p> <p> While mwworker.py does not trigger a meta data save, it is vital for the migration, and in the new scenario also for article updates, that the mwworker uses the meta data if it exists for the processed article. </p> <p> <strong> ~/projects/idee/generator/pubmetadata.py </strong> </p> <div class="sourceCode"> <pre class="sourceCode Python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb25-1"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb25-2"></a><span class="co">The PubMetaData manages the meta information about the publishings.</span></span> <span><a aria-hidden="true" href="#cb25-3"></a></span> <span><a aria-hidden="true" href="#cb25-4"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb25-5"></a><span class="co">@license: https://creativecommons.org/publicdomain/zero/1.0/deed.en</span></span> <span><a aria-hidden="true" href="#cb25-6"></a><span class="co">@date: 2022-03-15</span></span> <span><a aria-hidden="true" href="#cb25-7"></a></span> <span><a aria-hidden="true" href="#cb25-8"></a><span class="co">This includes the migrated data from WordPress as well as publishing data</span></span> <span><a aria-hidden="true" href="#cb25-9"></a><span class="co">created by new publishings with the new page generator.</span></span> <span><a aria-hidden="true" href="#cb25-10"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb25-11"></a><span class="im">import</span> datetime</span> <span><a aria-hidden="true" href="#cb25-12"></a><span class="im">import</span> pandas <span 
class="im">as</span> pd</span> <span><a aria-hidden="true" href="#cb25-13"></a></span> <span><a aria-hidden="true" href="#cb25-14"></a><span class="im">from</span> gitmsgconstants <span class="im">import</span> GitMsgConstants <span class="im">as</span> gmc</span> <span><a aria-hidden="true" href="#cb25-15"></a></span> <span><a aria-hidden="true" href="#cb25-16"></a></span> <span><a aria-hidden="true" href="#cb25-17"></a><span class="kw">def</span> pageurn(pagename):</span> <span><a aria-hidden="true" href="#cb25-18"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb25-19"></a><span class="co"> Create a browser friendly urn from the pagename.</span></span> <span><a aria-hidden="true" href="#cb25-20"></a></span> <span><a aria-hidden="true" href="#cb25-21"></a><span class="co"> German special characters are replaced by readable two-character</span></span> <span><a aria-hidden="true" href="#cb25-22"></a><span class="co"> alternatives, and spaces in the filename are replaced with '-'.</span></span> <span><a aria-hidden="true" href="#cb25-23"></a></span> <span><a aria-hidden="true" href="#cb25-24"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb25-25"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb25-26"></a><span class="co"> pagename : String</span></span> <span><a aria-hidden="true" href="#cb25-27"></a><span class="co"> The pagename, which is also the title of the article.</span></span> <span><a aria-hidden="true" href="#cb25-28"></a><span class="co"> All characters can appear, but we do not want all of them in the resulting URL.</span></span> <span><a aria-hidden="true" href="#cb25-29"></a></span> <span><a aria-hidden="true" href="#cb25-30"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb25-31"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb25-32"></a><span class="co"> :String</span></span> <span><a 
aria-hidden="true" href="#cb25-33"></a><span class="co"> Alternative URL friendly name.</span></span> <span><a aria-hidden="true" href="#cb25-34"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb25-35"></a> urn <span class="op">=</span> pagename.lower().strip() <span class="op">\</span></span> <span><a aria-hidden="true" href="#cb25-36"></a> .replace(<span class="st">' '</span>, <span class="st">'-'</span>).replace(<span class="st">'/'</span>, <span class="st">'-'</span>) <span class="op">\</span></span> <span><a aria-hidden="true" href="#cb25-37"></a> .replace(<span class="st">'ß'</span>, <span class="st">'ss'</span>).replace(<span class="st">'ä'</span>, <span class="st">'ae'</span>) <span class="op">\</span></span> <span><a aria-hidden="true" href="#cb25-38"></a> .replace(<span class="st">'ö'</span>, <span class="st">'oe'</span>).replace(<span class="st">'ü'</span>, <span class="st">'ue'</span>) <span class="op">\</span></span> <span><a aria-hidden="true" href="#cb25-39"></a> .replace(<span class="st">'&amp;'</span>, <span class="st">'and'</span>).replace(<span class="st">'</span><span class="ch">\\</span><span class="st">'</span>, <span class="st">'-'</span>) <span class="op">\</span></span> <span><a aria-hidden="true" href="#cb25-40"></a> .replace(<span class="st">'?'</span>, <span class="st">''</span>).replace(<span class="st">':'</span>, <span class="st">''</span>) <span class="op">\</span></span> <span><a aria-hidden="true" href="#cb25-41"></a> .replace(<span class="st">'.'</span>, <span class="st">'-'</span>).replace(<span class="st">','</span>, <span class="st">''</span>) <span class="op">\</span></span> <span><a aria-hidden="true" href="#cb25-42"></a> .replace(<span class="st">"("</span>, <span class="st">""</span>).replace(<span class="st">")"</span>, <span class="st">""</span>) <span class="op">\</span></span> <span><a aria-hidden="true" href="#cb25-43"></a> .replace(<span class="st">"</span><span class="ch">\"</span><span 
class="st">"</span>, <span class="st">""</span>).replace(<span class="st">"!"</span>, <span class="st">"-"</span>) <span class="op">\</span></span> <span><a aria-hidden="true" href="#cb25-44"></a> .replace(<span class="st">"„"</span>, <span class="st">""</span>).replace(<span class="st">"“"</span>, <span class="st">""</span>) <span class="op">\</span></span> <span><a aria-hidden="true" href="#cb25-45"></a> .replace(<span class="st">"#"</span>, <span class="st">""</span>).replace(<span class="st">"%"</span>, <span class="st">""</span>) <span class="op">\</span></span> <span><a aria-hidden="true" href="#cb25-46"></a> .replace(<span class="st">"'"</span>, <span class="st">""</span>)</span> <span><a aria-hidden="true" href="#cb25-47"></a></span> <span><a aria-hidden="true" href="#cb25-48"></a> <span class="co"># remove stacked hyphens</span></span> <span><a aria-hidden="true" href="#cb25-49"></a> <span class="cf">while</span> <span class="st">"--"</span> <span class="kw">in</span> urn:</span> <span><a aria-hidden="true" href="#cb25-50"></a> urn <span class="op">=</span> urn.replace(<span class="st">"--"</span>, <span class="st">"-"</span>)</span> <span><a aria-hidden="true" href="#cb25-51"></a></span> <span><a aria-hidden="true" href="#cb25-52"></a> urn <span class="op">=</span> urn.rstrip(<span class="st">'-'</span>)</span> <span><a aria-hidden="true" href="#cb25-53"></a></span> <span><a aria-hidden="true" href="#cb25-54"></a> <span class="cf">return</span> urn</span> <span><a aria-hidden="true" href="#cb25-55"></a></span> <span><a aria-hidden="true" href="#cb25-56"></a></span> <span><a aria-hidden="true" href="#cb25-57"></a><span class="kw">class</span> PubMetaData():</span> <span><a aria-hidden="true" href="#cb25-58"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb25-59"></a><span class="co"> The PubMetaData manages the meta information about the publishings.</span></span> <span><a aria-hidden="true" href="#cb25-60"></a></span> <span><a 
aria-hidden="true" href="#cb25-61"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb25-62"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb25-63"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb25-64"></a></span> <span><a aria-hidden="true" href="#cb25-65"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb25-66"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb25-67"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb25-68"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb25-69"></a></span> <span><a aria-hidden="true" href="#cb25-70"></a> instance <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb25-71"></a></span> <span><a aria-hidden="true" href="#cb25-72"></a> <span class="co"># used as column names as well as meta tag names</span></span> <span><a aria-hidden="true" href="#cb25-73"></a> <span class="co"># article:urn is my own invention, who cares? 
It serves as unique index.</span></span> <span><a aria-hidden="true" href="#cb25-74"></a> urn <span class="op">=</span> <span class="st">"article:urn"</span></span> <span><a aria-hidden="true" href="#cb25-75"></a> author <span class="op">=</span> <span class="st">"article:author"</span></span> <span><a aria-hidden="true" href="#cb25-76"></a> pubdate <span class="op">=</span> <span class="st">"article:published_time"</span></span> <span><a aria-hidden="true" href="#cb25-77"></a> revdate <span class="op">=</span> <span class="st">"article:modified_time"</span> <span class="co"># Updatedate sounds stupid</span></span> <span><a aria-hidden="true" href="#cb25-78"></a> commentcount <span class="op">=</span> <span class="st">"comments:count"</span> <span class="co"># of some interest during migration.</span></span> <span><a aria-hidden="true" href="#cb25-79"></a> title <span class="op">=</span> <span class="st">"og:title"</span></span> <span><a aria-hidden="true" href="#cb25-80"></a> site <span class="op">=</span> <span class="st">"og:site_name"</span></span> <span><a aria-hidden="true" href="#cb25-81"></a> locale <span class="op">=</span> <span class="st">"og:locale"</span></span> <span><a aria-hidden="true" href="#cb25-82"></a></span> <span><a aria-hidden="true" href="#cb25-83"></a> <span class="co"># not used in persistence</span></span> <span><a aria-hidden="true" href="#cb25-84"></a> pdfdraft <span class="op">=</span> <span class="st">"pdf:draft"</span></span> <span><a aria-hidden="true" href="#cb25-85"></a> <span class="co"># not used now in persistence</span></span> <span><a aria-hidden="true" href="#cb25-86"></a> deletion <span class="op">=</span> <span class="st">"deleted_time"</span></span> <span><a aria-hidden="true" href="#cb25-87"></a></span> <span><a aria-hidden="true" href="#cb25-88"></a> <span class="kw">class</span> _PubMetaData():</span> <span><a aria-hidden="true" href="#cb25-89"></a></span> <span><a aria-hidden="true" href="#cb25-90"></a> <span 
class="kw">def</span> <span class="fu">__len__</span>(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb25-91"></a> <span class="cf">return</span> <span class="bu">len</span>(<span class="va">self</span>._storage)</span> <span><a aria-hidden="true" href="#cb25-92"></a></span> <span><a aria-hidden="true" href="#cb25-93"></a> <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, disp_msgparam):</span> <span><a aria-hidden="true" href="#cb25-94"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb25-95"></a><span class="co"> Initialize only one publishing dictionary.</span></span> <span><a aria-hidden="true" href="#cb25-96"></a></span> <span><a aria-hidden="true" href="#cb25-97"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb25-98"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb25-99"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb25-100"></a></span> <span><a aria-hidden="true" href="#cb25-101"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb25-102"></a> <span class="co"># The msg parameters from the message dispatcher</span></span> <span><a aria-hidden="true" href="#cb25-103"></a> <span class="va">self</span>._msgparam <span class="op">=</span> disp_msgparam</span> <span><a aria-hidden="true" href="#cb25-104"></a></span> <span><a aria-hidden="true" href="#cb25-105"></a> <span class="co"># Registers for updates and deletions</span></span> <span><a aria-hidden="true" href="#cb25-106"></a> <span class="va">self</span>._updates <span class="op">=</span> []</span> <span><a aria-hidden="true" href="#cb25-107"></a> <span class="va">self</span>._deletions <span class="op">=</span> []</span> <span><a aria-hidden="true" href="#cb25-108"></a></span> <span><a aria-hidden="true" href="#cb25-109"></a> <span class="cf">if</span> <span class="kw">not</span> 
gmc.publishingdatapath.exists():</span> <span><a aria-hidden="true" href="#cb25-110"></a> <span class="va">self</span>._read_migration_list()</span> <span><a aria-hidden="true" href="#cb25-111"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb25-112"></a> <span class="va">self</span>._read()</span> <span><a aria-hidden="true" href="#cb25-113"></a></span> <span><a aria-hidden="true" href="#cb25-114"></a> <span class="kw">def</span> _read_migration_list(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb25-115"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb25-116"></a><span class="co"> Read the migration list.</span></span> <span><a aria-hidden="true" href="#cb25-117"></a></span> <span><a aria-hidden="true" href="#cb25-118"></a><span class="co"> The migrationlist.csv is one of two trusted sources</span></span> <span><a aria-hidden="true" href="#cb25-119"></a><span class="co"> for the correct publishing date.</span></span> <span><a aria-hidden="true" href="#cb25-120"></a></span> <span><a aria-hidden="true" href="#cb25-121"></a><span class="co"> The second one is the pubmetadata.csv.</span></span> <span><a aria-hidden="true" href="#cb25-122"></a></span> <span><a aria-hidden="true" href="#cb25-123"></a><span class="co"> This method moves the migration list entries to pubmetadata.</span></span> <span><a aria-hidden="true" href="#cb25-124"></a></span> <span><a aria-hidden="true" href="#cb25-125"></a><span class="co"> As soon as the pubmetadata has been saved once,</span></span> <span><a aria-hidden="true" href="#cb25-126"></a><span class="co"> this method is no longer required.</span></span> <span><a aria-hidden="true" href="#cb25-127"></a></span> <span><a aria-hidden="true" href="#cb25-128"></a><span class="co"> The data structure aligns to the planned pubmetadata</span></span> <span><a aria-hidden="true" href="#cb25-129"></a><span class="co"> data structure.</span></span> <span><a 
aria-hidden="true" href="#cb25-130"></a></span> <span><a aria-hidden="true" href="#cb25-131"></a><span class="co"> The urn is the stem part of the url the page will finally have.</span></span> <span><a aria-hidden="true" href="#cb25-132"></a><span class="co"> It serves as index in the pandas dataframe, which translates</span></span> <span><a aria-hidden="true" href="#cb25-133"></a><span class="co"> into the name of the respective Series of one article's data.</span></span> <span><a aria-hidden="true" href="#cb25-134"></a></span> <span><a aria-hidden="true" href="#cb25-135"></a><span class="co"> article:published_time 2022-02-15T14:41:13.367917</span></span> <span><a aria-hidden="true" href="#cb25-136"></a><span class="co"> article:modified_time 2022-02-15T14:41:13.367917</span></span> <span><a aria-hidden="true" href="#cb25-137"></a><span class="co"> comments:count 0</span></span> <span><a aria-hidden="true" href="#cb25-138"></a><span class="co"> og:site_name Idee</span></span> <span><a aria-hidden="true" href="#cb25-139"></a><span class="co"> og:locale de-DE</span></span> <span><a aria-hidden="true" href="#cb25-140"></a><span class="co"> article:author Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb25-141"></a><span class="co"> og:title Creative Commons CC0 1.0 Universal</span></span> <span><a aria-hidden="true" href="#cb25-142"></a><span class="co"> pdf:draft true</span></span> <span><a aria-hidden="true" href="#cb25-143"></a><span class="co"> Name: creative-commons-cc0-1-0-universal, dtype: object</span></span> <span><a aria-hidden="true" href="#cb25-144"></a></span> <span><a aria-hidden="true" href="#cb25-145"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb25-146"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb25-147"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb25-148"></a><span class="co"> """</span></span> <span><a aria-hidden="true" 
href="#cb25-149"></a> <span class="va">self</span>._storage <span class="op">=</span> pd.read_csv(gmc.migrationlistpath,</span> <span><a aria-hidden="true" href="#cb25-150"></a> delimiter<span class="op">=</span><span class="st">'</span><span class="ch">\t</span><span class="st">'</span>,</span> <span><a aria-hidden="true" href="#cb25-151"></a> index_col<span class="op">=</span>PubMetaData.urn)</span> <span><a aria-hidden="true" href="#cb25-152"></a></span> <span><a aria-hidden="true" href="#cb25-153"></a> <span class="kw">def</span> get_new_revision(<span class="va">self</span>, title<span class="op">=</span><span class="va">None</span>, urn<span class="op">=</span><span class="va">None</span>):</span> <span><a aria-hidden="true" href="#cb25-154"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb25-155"></a><span class="co"> Provide publishing dictionary data for the title.</span></span> <span><a aria-hidden="true" href="#cb25-156"></a></span> <span><a aria-hidden="true" href="#cb25-157"></a><span class="co"> A message worker may use this method to get information</span></span> <span><a aria-hidden="true" href="#cb25-158"></a><span class="co"> about the current publishing in progress.</span></span> <span><a aria-hidden="true" href="#cb25-159"></a></span> <span><a aria-hidden="true" href="#cb25-160"></a><span class="co"> To make this useful, the meta information from the current</span></span> <span><a aria-hidden="true" href="#cb25-161"></a><span class="co"> git message is incorporated into the article entry, if the meta</span></span> <span><a aria-hidden="true" href="#cb25-162"></a><span class="co"> information is not already present from previous publishings, bringing</span></span> <span><a aria-hidden="true" href="#cb25-163"></a><span class="co"> all metadata required into one place.</span></span> <span><a aria-hidden="true" href="#cb25-164"></a></span> <span><a aria-hidden="true" href="#cb25-165"></a><span class="co"> If the worker succeeds and 
his work was not a DRAFT publishing, the</span></span> <span><a aria-hidden="true" href="#cb25-166"></a><span class="co"> worker may provide the article_data to get it saved via update().</span></span> <span><a aria-hidden="true" href="#cb25-167"></a></span> <span><a aria-hidden="true" href="#cb25-168"></a><span class="co"> If the worker's task was the deletion of the publishing, the</span></span> <span><a aria-hidden="true" href="#cb25-169"></a><span class="co"> worker may provide the article_data to get the deletion</span></span> <span><a aria-hidden="true" href="#cb25-170"></a><span class="co"> information saved via delete().</span></span> <span><a aria-hidden="true" href="#cb25-171"></a></span> <span><a aria-hidden="true" href="#cb25-172"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb25-173"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb25-174"></a><span class="co"> title:</span></span> <span><a aria-hidden="true" href="#cb25-175"></a><span class="co"> The title of the article, whose data has to be updated. 
If</span></span> <span><a aria-hidden="true" href="#cb25-176"></a><span class="co"> provided, it is used to compute the urn of the article.</span></span> <span><a aria-hidden="true" href="#cb25-177"></a><span class="co"> urn:</span></span> <span><a aria-hidden="true" href="#cb25-178"></a><span class="co"> The unique resource name of the article, whose data has to be</span></span> <span><a aria-hidden="true" href="#cb25-179"></a><span class="co"> updated.</span></span> <span><a aria-hidden="true" href="#cb25-180"></a></span> <span><a aria-hidden="true" href="#cb25-181"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb25-182"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb25-183"></a><span class="co"> dict:</span></span> <span><a aria-hidden="true" href="#cb25-184"></a><span class="co"> The title's data dictionary with revised data entries.</span></span> <span><a aria-hidden="true" href="#cb25-185"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb25-186"></a> nowdate <span class="op">=</span> datetime.datetime.now().isoformat()</span> <span><a aria-hidden="true" href="#cb25-187"></a></span> <span><a aria-hidden="true" href="#cb25-188"></a> <span class="cf">if</span> <span class="kw">not</span> urn <span class="kw">and</span> <span class="kw">not</span> title:</span> <span><a aria-hidden="true" href="#cb25-189"></a> <span class="cf">return</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb25-190"></a> <span class="cf">if</span> <span class="kw">not</span> urn:</span> <span><a aria-hidden="true" href="#cb25-191"></a> urn <span class="op">=</span> pageurn(title)</span> <span><a aria-hidden="true" href="#cb25-192"></a></span> <span><a aria-hidden="true" href="#cb25-193"></a> <span class="cf">if</span> urn <span class="kw">in</span> <span class="va">self</span>._storage.index: <span class="co"># .to_list():</span></span> <span><a aria-hidden="true" 
href="#cb25-194"></a> article_data <span class="op">=</span> <span class="va">self</span>._storage.loc[urn]</span> <span><a aria-hidden="true" href="#cb25-195"></a> <span class="co"># Working copy</span></span> <span><a aria-hidden="true" href="#cb25-196"></a> article_data <span class="op">=</span> article_data.copy()</span> <span><a aria-hidden="true" href="#cb25-197"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb25-198"></a> <span class="cf">for</span> index, article_data <span class="kw">in</span> <span class="va">self</span>._storage.iterrows():</span> <span><a aria-hidden="true" href="#cb25-199"></a> article_data <span class="op">=</span> pd.Series(</span> <span><a aria-hidden="true" href="#cb25-200"></a> data<span class="op">=</span>{</span> <span><a aria-hidden="true" href="#cb25-201"></a> PubMetaData.title: title,</span> <span><a aria-hidden="true" href="#cb25-202"></a> PubMetaData.pubdate: nowdate,</span> <span><a aria-hidden="true" href="#cb25-203"></a> PubMetaData.commentcount: <span class="dv">0</span>,</span> <span><a aria-hidden="true" href="#cb25-204"></a> PubMetaData.site: <span class="va">None</span>,</span> <span><a aria-hidden="true" href="#cb25-205"></a> PubMetaData.locale: <span class="va">None</span>,</span> <span><a aria-hidden="true" href="#cb25-206"></a> PubMetaData.author: <span class="va">None</span>,</span> <span><a aria-hidden="true" href="#cb25-207"></a> },</span> <span><a aria-hidden="true" href="#cb25-208"></a> index<span class="op">=</span>article_data.index,</span> <span><a aria-hidden="true" href="#cb25-209"></a> dtype<span class="op">=</span>article_data.dtype,</span> <span><a aria-hidden="true" href="#cb25-210"></a> name<span class="op">=</span>urn)</span> <span><a aria-hidden="true" href="#cb25-211"></a> article_data <span class="op">=</span> article_data.copy()</span> <span><a aria-hidden="true" href="#cb25-212"></a> article_data.update({<span class="st">"Name"</span>: urn})</span> <span><a 
aria-hidden="true" href="#cb25-213"></a> <span class="cf">break</span></span> <span><a aria-hidden="true" href="#cb25-214"></a></span> <span><a aria-hidden="true" href="#cb25-215"></a> <span class="co"># Set the revision date</span></span> <span><a aria-hidden="true" href="#cb25-216"></a> article_data.update({PubMetaData.revdate: nowdate})</span> <span><a aria-hidden="true" href="#cb25-217"></a></span> <span><a aria-hidden="true" href="#cb25-218"></a> <span class="co"># Iterate the message parameter keys and add parameters and their</span></span> <span><a aria-hidden="true" href="#cb25-219"></a> <span class="co"># value, if data for this key is not present in the titles</span></span> <span><a aria-hidden="true" href="#cb25-220"></a> <span class="co"># data series.</span></span> <span><a aria-hidden="true" href="#cb25-221"></a></span> <span><a aria-hidden="true" href="#cb25-222"></a> <span class="co"># This also adds a key, if it is not part of the pubmetadata.csv.</span></span> <span><a aria-hidden="true" href="#cb25-223"></a> <span class="cf">for</span> key <span class="kw">in</span> <span class="va">self</span>._msgparam.keys():</span> <span><a aria-hidden="true" href="#cb25-224"></a> <span class="cf">if</span> <span class="kw">not</span> article_data.get(key):</span> <span><a aria-hidden="true" href="#cb25-225"></a> article_data.loc[key] <span class="op">=</span> <span class="va">self</span>._msgparam[key]</span> <span><a aria-hidden="true" href="#cb25-226"></a> <span class="cf">return</span> article_data</span> <span><a aria-hidden="true" href="#cb25-227"></a></span> <span><a aria-hidden="true" href="#cb25-228"></a> <span class="kw">def</span> update(<span class="va">self</span>, series):</span> <span><a aria-hidden="true" href="#cb25-229"></a> <span class="va">self</span>._updates.append(series)</span> <span><a aria-hidden="true" href="#cb25-230"></a></span> <span><a aria-hidden="true" href="#cb25-231"></a> <span class="kw">def</span> delete(<span 
class="va">self</span>, series):</span> <span><a aria-hidden="true" href="#cb25-232"></a> <span class="va">self</span>._deletions.append(series)</span> <span><a aria-hidden="true" href="#cb25-233"></a></span> <span><a aria-hidden="true" href="#cb25-234"></a> <span class="kw">def</span> save(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb25-235"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb25-236"></a><span class="co"> Save the publishing dict data.</span></span> <span><a aria-hidden="true" href="#cb25-237"></a></span> <span><a aria-hidden="true" href="#cb25-238"></a><span class="co"> Incorporates updates and new entries,</span></span> <span><a aria-hidden="true" href="#cb25-239"></a><span class="co"> and removes deleted entries (implementation</span></span> <span><a aria-hidden="true" href="#cb25-240"></a><span class="co"> pending; I will probably extend the</span></span> <span><a aria-hidden="true" href="#cb25-241"></a><span class="co"> data structure with a deleted column).</span></span> <span><a aria-hidden="true" href="#cb25-242"></a></span> <span><a aria-hidden="true" href="#cb25-243"></a><span class="co"> No deletion has taken place so far, so it</span></span> <span><a aria-hidden="true" href="#cb25-244"></a><span class="co"> might take a while until this is implemented.</span></span> <span><a aria-hidden="true" href="#cb25-245"></a></span> <span><a aria-hidden="true" href="#cb25-246"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb25-247"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb25-248"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb25-249"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb25-250"></a> <span class="cf">for</span> article_data <span class="kw">in</span> <span class="va">self</span>._updates:</span> <span><a aria-hidden="true" href="#cb25-251"></a> urn <span 
class="op">=</span> article_data.name</span> <span><a aria-hidden="true" href="#cb25-252"></a> <span class="va">self</span>._storage.loc[urn] <span class="op">=</span> article_data</span> <span><a aria-hidden="true" href="#cb25-253"></a></span> <span><a aria-hidden="true" href="#cb25-254"></a> <span class="cf">for</span> article_data <span class="kw">in</span> <span class="va">self</span>._deletions:</span> <span><a aria-hidden="true" href="#cb25-255"></a> <span class="cf">pass</span> <span class="co"># implementation pending</span></span> <span><a aria-hidden="true" href="#cb25-256"></a></span> <span><a aria-hidden="true" href="#cb25-257"></a> <span class="va">self</span>._storage.to_csv(gmc.publishingdatapath,</span> <span><a aria-hidden="true" href="#cb25-258"></a> sep<span class="op">=</span><span class="st">';'</span>, quotechar<span class="op">=</span><span class="st">'"'</span>)</span> <span><a aria-hidden="true" href="#cb25-259"></a></span> <span><a aria-hidden="true" href="#cb25-260"></a> <span class="kw">def</span> _read(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb25-261"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb25-262"></a><span class="co"> Read the publishing dict data from previous publishings.</span></span> <span><a aria-hidden="true" href="#cb25-263"></a></span> <span><a aria-hidden="true" href="#cb25-264"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb25-265"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb25-266"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb25-267"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb25-268"></a> <span class="va">self</span>._storage <span class="op">=</span> pd.read_csv(gmc.publishingdatapath,</span> <span><a aria-hidden="true" href="#cb25-269"></a> delimiter<span class="op">=</span><span class="st">';'</span>,</span> 
<span><a aria-hidden="true" href="#cb25-270"></a> index_col<span class="op">=</span>PubMetaData.urn)</span> <span><a aria-hidden="true" href="#cb25-271"></a></span> <span><a aria-hidden="true" href="#cb25-272"></a> <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, disp_msgparam):</span> <span><a aria-hidden="true" href="#cb25-273"></a> <span class="cf">if</span> <span class="kw">not</span> PubMetaData.instance:</span> <span><a aria-hidden="true" href="#cb25-274"></a> PubMetaData.instance <span class="op">=</span> PubMetaData._PubMetaData(disp_msgparam)</span> <span><a aria-hidden="true" href="#cb25-275"></a></span> <span><a aria-hidden="true" href="#cb25-276"></a> <span class="kw">def</span> <span class="fu">__getattr__</span>(<span class="va">self</span>, name):</span> <span><a aria-hidden="true" href="#cb25-277"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb25-278"></a><span class="co"> Get attribute value by name.</span></span> <span><a aria-hidden="true" href="#cb25-279"></a></span> <span><a aria-hidden="true" href="#cb25-280"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb25-281"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb25-282"></a><span class="co"> name : str</span></span> <span><a aria-hidden="true" href="#cb25-283"></a><span class="co"> Name of the attribute.</span></span> <span><a aria-hidden="true" href="#cb25-284"></a></span> <span><a aria-hidden="true" href="#cb25-285"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb25-286"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb25-287"></a><span class="co"> TYPE</span></span> <span><a aria-hidden="true" href="#cb25-288"></a><span class="co"> Value of the attribute.</span></span> <span><a aria-hidden="true" href="#cb25-289"></a></span> <span><a aria-hidden="true" href="#cb25-290"></a><span 
class="co"> """</span></span> <span><a aria-hidden="true" href="#cb25-291"></a> <span class="cf">return</span> <span class="bu">getattr</span>(<span class="va">self</span>.instance, name)</span> <span><a aria-hidden="true" href="#cb25-292"></a></span> <span><a aria-hidden="true" href="#cb25-293"></a> <span class="kw">def</span> <span class="fu">__len__</span>(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb25-294"></a> <span class="cf">return</span> <span class="bu">len</span>(PubMetaData.instance)</span> <span><a aria-hidden="true" href="#cb25-295"></a></span> <span><a aria-hidden="true" href="#cb25-296"></a></span> <span><a aria-hidden="true" href="#cb25-297"></a><span class="cf">if</span> <span class="va">__name__</span> <span class="op">==</span> <span class="st">"__main__"</span>:</span> <span><a aria-hidden="true" href="#cb25-298"></a> <span class="cf">pass</span></span></code></pre> </div> <h4> Constants </h4> <p> That's a rather strange decision. Why would someone create a class to store constants? </p> <p> I see these values less as real constants and more as future configuration entries, at least in part, once I decide to separate the generator code into a software package usable for more than one content project. </p> <p> The existence of this class and its current content is a strong signal of the unfinished nature of the project. It's just ready for first use, nothing more. 
</p> <p> <strong> ~/projects/idee/generator/gitmsgconstants.py </strong> </p> <div class="sourceCode"> <pre class="sourceCode Python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb26-1"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb26-2"></a><span class="co">GitMsgConstants provides project wide constants.</span></span> <span><a aria-hidden="true" href="#cb26-3"></a></span> <span><a aria-hidden="true" href="#cb26-4"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb26-5"></a><span class="co">@license: https://creativecommons.org/publicdomain/zero/1.0/deed.en</span></span> <span><a aria-hidden="true" href="#cb26-6"></a><span class="co">@date: 2022-03-15</span></span> <span><a aria-hidden="true" href="#cb26-7"></a></span> <span><a aria-hidden="true" href="#cb26-8"></a><span class="co">No Instance is required. Could leverage in future a config file.</span></span> <span><a aria-hidden="true" href="#cb26-9"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb26-10"></a><span class="im">from</span> pathlib <span class="im">import</span> Path</span> <span><a aria-hidden="true" href="#cb26-11"></a></span> <span><a aria-hidden="true" href="#cb26-12"></a></span> <span><a aria-hidden="true" href="#cb26-13"></a><span class="kw">class</span> GitMsgConstants():</span> <span><a aria-hidden="true" href="#cb26-14"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb26-15"></a><span class="co"> Dispatch the lines of the git message to registered workers.</span></span> <span><a aria-hidden="true" href="#cb26-16"></a></span> <span><a aria-hidden="true" href="#cb26-17"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb26-18"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb26-19"></a><span class="co"> gitmessagepath : Path</span></span> <span><a aria-hidden="true" 
href="#cb26-20"></a><span class="co"> Path as type str or type Path pointing to the git message.</span></span> <span><a aria-hidden="true" href="#cb26-21"></a></span> <span><a aria-hidden="true" href="#cb26-22"></a><span class="co"> msgworkers : List of MsgWorker</span></span> <span><a aria-hidden="true" href="#cb26-23"></a><span class="co"> The list of message workers is used as worker queue. Workers first</span></span> <span><a aria-hidden="true" href="#cb26-24"></a><span class="co"> in the queue get their workitems first.</span></span> <span><a aria-hidden="true" href="#cb26-25"></a></span> <span><a aria-hidden="true" href="#cb26-26"></a><span class="co"> Workers can return their work result to be picked up by</span></span> <span><a aria-hidden="true" href="#cb26-27"></a><span class="co"> later workers.</span></span> <span><a aria-hidden="true" href="#cb26-28"></a></span> <span><a aria-hidden="true" href="#cb26-29"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb26-30"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb26-31"></a><span class="co"> GitMsgConstants.</span></span> <span><a aria-hidden="true" href="#cb26-32"></a></span> <span><a aria-hidden="true" href="#cb26-33"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb26-34"></a></span> <span><a aria-hidden="true" href="#cb26-35"></a> generator <span class="op">=</span> <span class="st">"pandoc, fs-commit-msg-hook 1.0"</span></span> <span><a aria-hidden="true" href="#cb26-36"></a> website <span class="op">=</span> <span class="st">"https://idee.frank-siebert.de"</span></span> <span><a aria-hidden="true" href="#cb26-37"></a> pdfimage <span class="op">=</span> <span class="st">"3cd97bab8bb20288768b35fd72979ec3bbf4b2a8.png"</span></span> <span><a aria-hidden="true" href="#cb26-38"></a></span> <span><a aria-hidden="true" href="#cb26-39"></a> plainpath <span class="op">=</span> Path(<span class="st">"plain"</span>)</span> 
<span><a aria-hidden="true" href="#cb26-40"></a> confpath <span class="op">=</span> Path(<span class="st">"config"</span>)</span> <span><a aria-hidden="true" href="#cb26-41"></a> sitepath <span class="op">=</span> Path(<span class="st">"website"</span>)</span> <span><a aria-hidden="true" href="#cb26-42"></a> articlepath <span class="op">=</span> sitepath <span class="op">/</span> <span class="st">"article"</span></span> <span><a aria-hidden="true" href="#cb26-43"></a> audiopath <span class="op">=</span> sitepath <span class="op">/</span> <span class="st">"audio"</span></span> <span><a aria-hidden="true" href="#cb26-44"></a> csspath <span class="op">=</span> sitepath <span class="op">/</span> <span class="st">"css"</span> <span class="op">/</span> <span class="st">"fs.css"</span></span> <span><a aria-hidden="true" href="#cb26-45"></a> headerpath <span class="op">=</span> sitepath <span class="op">/</span> <span class="st">"portal"</span> <span class="op">/</span> <span class="st">"header.html"</span></span> <span><a aria-hidden="true" href="#cb26-46"></a> imagepath <span class="op">=</span> sitepath <span class="op">/</span> <span class="st">"image"</span></span> <span><a aria-hidden="true" href="#cb26-47"></a> pdfpath <span class="op">=</span> sitepath <span class="op">/</span> <span class="st">"pdf"</span></span> <span><a aria-hidden="true" href="#cb26-48"></a> qrpath <span class="op">=</span> sitepath <span class="op">/</span> <span class="st">"qrcode"</span></span> <span><a aria-hidden="true" href="#cb26-49"></a></span> <span><a aria-hidden="true" href="#cb26-50"></a> migrationlistpath <span class="op">=</span> confpath <span class="op">/</span> <span class="st">"migrationlist.csv"</span></span> <span><a aria-hidden="true" href="#cb26-51"></a> publishingdatapath <span class="op">=</span> sitepath <span class="op">/</span> <span class="st">"pubmetadata.csv"</span></span> <span><a aria-hidden="true" href="#cb26-52"></a></span> <span><a aria-hidden="true" 
href="#cb26-53"></a> pdfdraft <span class="op">=</span> <span class="st">"pdf:draft"</span></span> <span><a aria-hidden="true" href="#cb26-54"></a> locale <span class="op">=</span> <span class="st">"og:locale"</span></span> <span><a aria-hidden="true" href="#cb26-55"></a></span> <span><a aria-hidden="true" href="#cb26-56"></a> archivepath <span class="op">=</span> sitepath <span class="op">/</span> <span class="st">"archive"</span></span> <span><a aria-hidden="true" href="#cb26-57"></a> idee_archive <span class="op">=</span> archivepath <span class="op">/</span> Path(<span class="st">"idee-archive.html"</span>)</span> <span><a aria-hidden="true" href="#cb26-58"></a> concept_archive <span class="op">=</span> archivepath <span class="op">/</span> Path(<span class="st">"concept-archive.html"</span>)</span> <span><a aria-hidden="true" href="#cb26-59"></a> sitemap <span class="op">=</span> sitepath <span class="op">/</span> Path(<span class="st">"sitemap.xml"</span>)</span> <span><a aria-hidden="true" href="#cb26-60"></a> idee_map <span class="op">=</span> sitepath <span class="op">/</span> Path(<span class="st">"idee-map.xml"</span>)</span> <span><a aria-hidden="true" href="#cb26-61"></a> concept_map <span class="op">=</span> sitepath <span class="op">/</span> Path(<span class="st">"concept-map.xml"</span>)</span> <span><a aria-hidden="true" href="#cb26-62"></a> sitemappath <span class="op">=</span> sitepath <span class="op">/</span> <span class="st">"sitemap"</span></span> <span><a aria-hidden="true" href="#cb26-63"></a> map_template <span class="op">=</span> sitepath <span class="op">/</span> <span class="st">"portal"</span> <span class="op">/</span> <span class="st">"monthly-map.xml"</span></span> <span><a aria-hidden="true" href="#cb26-64"></a> archive_template <span class="op">=</span> sitepath <span class="op">/</span> <span class="st">"portal"</span> <span class="op">/</span> <span class="st">"monthly-archive.html"</span></span> <span><a aria-hidden="true" 
href="#cb26-65"></a></span> <span><a aria-hidden="true" href="#cb26-66"></a> idee_rss <span class="op">=</span> sitepath <span class="op">/</span> Path(<span class="st">"idee-rss.xml"</span>)</span> <span><a aria-hidden="true" href="#cb26-67"></a> concept_rss <span class="op">=</span> sitepath <span class="op">/</span> Path(<span class="st">"concept-rss.xml"</span>)</span> <span><a aria-hidden="true" href="#cb26-68"></a></span> <span><a aria-hidden="true" href="#cb26-69"></a> idee_index <span class="op">=</span> sitepath <span class="op">/</span> Path(<span class="st">"idee-index.html"</span>)</span> <span><a aria-hidden="true" href="#cb26-70"></a> concept_index <span class="op">=</span> sitepath <span class="op">/</span> Path(<span class="st">"concept-index.html"</span>)</span> <span><a aria-hidden="true" href="#cb26-71"></a></span> <span><a aria-hidden="true" href="#cb26-72"></a></span> <span><a aria-hidden="true" href="#cb26-73"></a><span class="cf">if</span> <span class="va">__name__</span> <span class="op">==</span> <span class="st">"__main__"</span>:</span> <span><a aria-hidden="true" href="#cb26-74"></a> <span class="cf">pass</span></span></code></pre> </div> <h4> HTML Formatting: fs.css </h4> <p> If we generate HTML, we also want it to look good. The CSS is critical for a good-looking result. 
</p> <p> <strong> ~/projects/idee/website/css/fs.css </strong> </p> <div class="sourceCode"> <pre class="sourceCode CSS"><code class="sourceCode css"><span><a aria-hidden="true" href="#cb27-1"></a><span class="co">/* ***************************************************************************</span></span> <span><a aria-hidden="true" href="#cb27-2"></a><span class="co"> * Frank Siebert's CSS </span></span> <span><a aria-hidden="true" href="#cb27-3"></a><span class="co"> +</span></span> <span><a aria-hidden="true" href="#cb27-4"></a><span class="co"> * Licence: CC0 </span></span> <span><a aria-hidden="true" href="#cb27-5"></a><span class="co"> * httpx://frank-siebert.de/article/creative-commons-cc0-1-0-universal.html </span></span> <span><a aria-hidden="true" href="#cb27-6"></a><span class="co"> * ***************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb27-7"></a></span> <span><a aria-hidden="true" href="#cb27-8"></a><span class="in">:root</span> { </span> <span><a aria-hidden="true" href="#cb27-9"></a> <span class="co">/* kind of blue */</span></span> <span><a aria-hidden="true" href="#cb27-10"></a> <span class="va">--theme-color</span>: <span class="cn">#006080</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-11"></a> <span class="co">/* black on white */</span></span> <span><a aria-hidden="true" href="#cb27-12"></a> <span class="va">--theme-text-color</span>: <span class="cn">#000000</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-13"></a> <span class="co">/* white background */</span></span> <span><a aria-hidden="true" href="#cb27-14"></a> <span class="va">--theme-background-color</span>: <span class="cn">#ffffff</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-15"></a> <span class="co">/* for minor meta information */</span></span> <span><a aria-hidden="true" href="#cb27-16"></a> <span 
class="va">--theme-meta-color</span>: <span class="cn">#999999</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-17"></a> <span class="co">/* Arial and Helvetica exist on my Computer */</span></span> <span><a aria-hidden="true" href="#cb27-18"></a> <span class="co">/* --theme-font-family: Arial, Helvetica, Verdana, Tahoma, sans-serif; */</span></span> <span><a aria-hidden="true" href="#cb27-19"></a> <span class="va">--theme-font-family</span>: Liberation Sans<span class="op">,</span> <span class="dv">sans-serif</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-20"></a> <span class="co">/* One theme font only, based on the theme font-family */</span></span> <span><a aria-hidden="true" href="#cb27-21"></a> <span class="va">--theme-font</span>: <span class="dv">16</span><span class="dt">px</span>/<span class="dv">1.4</span> Liberation Sans<span class="op">,</span> <span class="dv">sans-serif</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-22"></a> <span class="co">/* Improve readability */</span></span> <span><a aria-hidden="true" href="#cb27-23"></a> <span class="va">--theme-letter-spacing</span>: <span class="dv">0.05</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-24"></a>}</span> <span><a aria-hidden="true" href="#cb27-25"></a></span> <span><a aria-hidden="true" href="#cb27-26"></a>html {</span> <span><a aria-hidden="true" href="#cb27-27"></a> <span class="kw">padding</span>: <span class="dv">0</span><span class="dt">px</span> <span class="dv">5</span><span class="dt">px</span> <span class="dv">0</span><span class="dt">px</span> <span class="dv">0</span><span class="dt">px</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-28"></a> <span class="kw">margin</span>: <span class="dv">0</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-29"></a> <span 
class="kw">border</span>: <span class="dv">0</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-30"></a> <span class="kw">font</span>: <span class="fu">var(</span><span class="va">--theme-font</span><span class="fu">)</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-31"></a> <span class="kw">letter-spacing</span>: <span class="fu">var(</span><span class="va">--theme-letter-spacing</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-32"></a> <span class="kw">background-color</span>: <span class="cn">lightgray</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-33"></a>}</span> <span><a aria-hidden="true" href="#cb27-34"></a></span> <span><a aria-hidden="true" href="#cb27-35"></a>body {</span> <span><a aria-hidden="true" href="#cb27-36"></a> <span class="kw">width</span>: <span class="dv">100</span><span class="dt">%</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-37"></a> <span class="kw">height</span>: <span class="dv">100</span><span class="dt">%</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-38"></a> <span class="kw">min-width</span>: <span class="dv">280</span><span class="dt">px</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-39"></a> <span class="kw">max-width</span>:<span class="dv">1200</span><span class="dt">px</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-40"></a> <span class="kw">padding</span>: <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-41"></a> <span class="kw">margin-top</span>: <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-42"></a> <span class="kw">margin-bottom</span>: <span class="dv">0</span><span 
class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-43"></a> <span class="kw">margin-left</span>:<span class="bu">auto</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-44"></a> <span class="kw">margin-right</span>:<span class="bu">auto</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-45"></a> <span class="kw">border-right</span>: <span class="dv">1</span><span class="dt">px</span> <span class="dv">solid</span> <span class="fu">var(</span><span class="va">--theme-color</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-46"></a> <span class="kw">border-left</span>: <span class="dv">1</span><span class="dt">px</span> <span class="dv">solid</span> <span class="fu">var(</span><span class="va">--theme-color</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-47"></a></span> <span><a aria-hidden="true" href="#cb27-48"></a> <span class="kw">color</span>: <span class="fu">var(</span><span class="va">--theme-text-color</span><span class="fu">)</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-49"></a> <span class="kw">background-color</span>: <span class="fu">var(</span><span class="va">--theme-background-color</span><span class="fu">)</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-50"></a> <span class="kw">font-size</span>: <span class="dv">1</span><span class="dt">em</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-51"></a> <span class="kw">word-wrap</span>: break-word<span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-52"></a>}</span> <span><a aria-hidden="true" href="#cb27-53"></a></span> <span><a aria-hidden="true" href="#cb27-54"></a><span class="co">/* **************************************************************************</span></span> <span><a aria-hidden="true" 
href="#cb27-55"></a><span class="co"> * keep the two body elements in sync </span></span> <span><a aria-hidden="true" href="#cb27-56"></a><span class="co"> * **************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb27-57"></a></span> <span><a aria-hidden="true" href="#cb27-58"></a>div<span class="fu">.row</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-59"></a>body header<span class="op">,</span> </span> <span><a aria-hidden="true" href="#cb27-60"></a>body main { </span> <span><a aria-hidden="true" href="#cb27-61"></a> <span class="kw">min-height</span>: <span class="dv">100</span><span class="dt">px</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-62"></a> <span class="kw">padding</span>: <span class="dv">5</span><span class="dt">px</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-63"></a></span> <span><a aria-hidden="true" href="#cb27-64"></a> <span class="kw">background-color</span>: <span class="fu">var(</span><span class="va">--theme-background-color</span><span class="fu">)</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-65"></a> <span class="kw">background-repeat</span>: <span class="dv">no-repeat</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-66"></a> <span class="kw">background-position</span>: <span class="dv">top</span> <span class="dv">center</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-67"></a> <span class="kw">background-size</span>: <span class="bu">auto</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-68"></a>}</span> <span><a aria-hidden="true" href="#cb27-69"></a></span> <span><a aria-hidden="true" href="#cb27-70"></a>body header nav { </span> <span><a aria-hidden="true" href="#cb27-71"></a> <span class="kw">padding</span>: <span class="dv">0</span> <span 
class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-72"></a> <span class="co">/* background: #ddcc99; */</span></span> <span><a aria-hidden="true" href="#cb27-73"></a>}</span> <span><a aria-hidden="true" href="#cb27-74"></a></span> <span><a aria-hidden="true" href="#cb27-75"></a><span class="co">/* **************************************************************************</span></span> <span><a aria-hidden="true" href="#cb27-76"></a><span class="co"> * The tag &lt;figure&gt; comes with built-in padding,</span></span> <span><a aria-hidden="true" href="#cb27-77"></a><span class="co"> * but we have to have the same for the article.</span></span> <span><a aria-hidden="true" href="#cb27-78"></a><span class="co"> *</span></span> <span><a aria-hidden="true" href="#cb27-79"></a><span class="co"> * These styles keep the respective block elements horizontally aligned.</span></span> <span><a aria-hidden="true" href="#cb27-80"></a><span class="co"> *</span></span> <span><a aria-hidden="true" href="#cb27-81"></a><span class="co"> * ==== MEDIA SCREEN Variants ====</span></span> <span><a aria-hidden="true" href="#cb27-82"></a><span class="co"> * **************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb27-83"></a></span> <span><a aria-hidden="true" href="#cb27-84"></a><span class="im">@media</span> <span class="dv">screen</span> <span class="kw">and</span> (<span class="kw">min-width</span>: <span class="dv">641</span><span class="dt">px</span>) {</span> <span><a aria-hidden="true" href="#cb27-85"></a> body div div<span class="op">,</span> <span class="co">/* yacy search */</span></span> <span><a aria-hidden="true" href="#cb27-86"></a> header figure<span class="op">,</span> </span> <span><a aria-hidden="true" href="#cb27-87"></a> header nav<span class="op">,</span> </span> <span><a aria-hidden="true" href="#cb27-88"></a> 
header hr<span class="op">,</span> </span> <span><a aria-hidden="true" href="#cb27-89"></a> <span class="co">/* main&gt;h3 is used in the archive.html*/</span></span> <span><a aria-hidden="true" href="#cb27-90"></a> main article<span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-91"></a> main<span class="op">&gt;</span>h3 {</span> <span><a aria-hidden="true" href="#cb27-92"></a> <span class="kw">display</span>: <span class="dv">block</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-93"></a> <span class="kw">margin</span>: <span class="dv">1</span><span class="dt">em</span> <span class="dv">3</span><span class="dt">em</span> <span class="dv">1</span><span class="dt">em</span> <span class="dv">3</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-94"></a> <span class="co">/* border-style: dotted; </span></span> <span><a aria-hidden="true" href="#cb27-95"></a><span class="co"> * border-width: 2px; */</span></span> <span><a aria-hidden="true" href="#cb27-96"></a> }</span> <span><a aria-hidden="true" href="#cb27-97"></a> main<span class="op">&gt;</span>h1 {</span> <span><a aria-hidden="true" href="#cb27-98"></a> <span class="kw">display</span>: <span class="dv">block</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-99"></a> <span class="kw">margin</span>: <span class="dv">0.6</span><span class="dt">em</span> <span class="dv">1.8</span><span class="dt">em</span> <span class="dv">0.6</span><span class="dt">em</span> <span class="dv">1.8</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-100"></a> }</span> <span><a aria-hidden="true" href="#cb27-101"></a> <span class="fu">.searchinput</span> {</span> <span><a aria-hidden="true" href="#cb27-102"></a> <span class="kw">max-width</span>: <span class="dv">600</span><span class="dt">px</span><span class="op">;</span></span> <span><a 
aria-hidden="true" href="#cb27-103"></a> }</span> <span><a aria-hidden="true" href="#cb27-104"></a>}</span> <span><a aria-hidden="true" href="#cb27-105"></a></span> <span><a aria-hidden="true" href="#cb27-106"></a><span class="im">@media</span> <span class="dv">screen</span> <span class="kw">and</span> (<span class="kw">max-width</span>: <span class="dv">640</span><span class="dt">px</span>) {</span> <span><a aria-hidden="true" href="#cb27-107"></a> body div div<span class="op">,</span> <span class="co">/* yacy search */</span></span> <span><a aria-hidden="true" href="#cb27-108"></a> header figure<span class="op">,</span> </span> <span><a aria-hidden="true" href="#cb27-109"></a> header nav<span class="op">,</span> </span> <span><a aria-hidden="true" href="#cb27-110"></a> header hr<span class="op">,</span> </span> <span><a aria-hidden="true" href="#cb27-111"></a> <span class="co">/* main&gt;h3 is used in the archive.html*/</span></span> <span><a aria-hidden="true" href="#cb27-112"></a> main article<span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-113"></a> main<span class="op">&gt;</span>h3 {</span> <span><a aria-hidden="true" href="#cb27-114"></a> <span class="kw">display</span>: <span class="dv">block</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-115"></a> <span class="kw">margin</span>: <span class="dv">1</span><span class="dt">em</span> <span class="dv">1</span><span class="dt">em</span> <span class="dv">1</span><span class="dt">em</span> <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-116"></a> }</span> <span><a aria-hidden="true" href="#cb27-117"></a> main<span class="op">&gt;</span>h1 {</span> <span><a aria-hidden="true" href="#cb27-118"></a> <span class="kw">display</span>: <span class="dv">block</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-119"></a> <span class="kw">margin</span>: <span class="dv">0.6</span><span 
class="dt">em</span> <span class="dv">0.6</span><span class="dt">em</span> <span class="dv">0.6</span><span class="dt">em</span> <span class="dv">0.2</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-120"></a> }</span> <span><a aria-hidden="true" href="#cb27-121"></a> <span class="fu">.searchinput</span> {</span> <span><a aria-hidden="true" href="#cb27-122"></a> <span class="kw">max-width</span>: <span class="dv">260</span><span class="dt">px</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-123"></a> }</span> <span><a aria-hidden="true" href="#cb27-124"></a>} </span> <span><a aria-hidden="true" href="#cb27-125"></a></span> <span><a aria-hidden="true" href="#cb27-126"></a><span class="co">/* **************************************************************************</span></span> <span><a aria-hidden="true" href="#cb27-127"></a><span class="co"> * ==== </span><span class="re">END</span><span class="co"> OF MEDIA SCREEN Variants ====</span></span> <span><a aria-hidden="true" href="#cb27-128"></a><span class="co"> * **************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb27-129"></a></span> <span><a aria-hidden="true" href="#cb27-130"></a><span class="co">/* the main content is the article */</span></span> <span><a aria-hidden="true" href="#cb27-131"></a>article { </span> <span><a aria-hidden="true" href="#cb27-132"></a> <span class="kw">display</span>: <span class="dv">block</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-133"></a>}</span> <span><a aria-hidden="true" href="#cb27-134"></a></span> <span><a aria-hidden="true" href="#cb27-135"></a><span class="co">/* **************************************************************************</span></span> <span><a aria-hidden="true" href="#cb27-136"></a><span class="co"> * ==== all about headlines ====</span></span> <span><a 
aria-hidden="true" href="#cb27-137"></a><span class="co"> * **************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb27-138"></a><span class="co">/* I do not think I will nest a tags inside the headlines.</span></span> <span><a aria-hidden="true" href="#cb27-139"></a><span class="co"> * h1 a, h2 a, h3 a, h4 a, h5 a, h6 a { text-decoration: none; } */</span></span> <span><a aria-hidden="true" href="#cb27-140"></a>h1<span class="op">,</span> h2<span class="op">,</span> h3<span class="op">,</span> h4<span class="op">,</span> h5<span class="op">,</span> h6 </span> <span><a aria-hidden="true" href="#cb27-141"></a>{ </span> <span><a aria-hidden="true" href="#cb27-142"></a> <span class="kw">line-height</span>: <span class="dv">1.1</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-143"></a> <span class="kw">margin</span>: <span class="dv">0</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-144"></a> <span class="kw">padding</span>: <span class="dv">1</span><span class="dt">em</span> <span class="dv">0</span> <span class="dv">0.5</span><span class="dt">em</span> <span class="dv">0</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-145"></a></span> <span><a aria-hidden="true" href="#cb27-146"></a> <span class="kw">color</span>: <span class="fu">var(</span><span class="va">--theme-color</span><span class="fu">)</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-147"></a> <span class="kw">font-family</span>: <span class="fu">var(</span><span class="va">--theme-font-family</span><span class="fu">)</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-148"></a> <span class="kw">font-weight</span>: <span class="dv">bold</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-149"></a>}</span> <span><a aria-hidden="true" 
href="#cb27-150"></a></span> <span><a aria-hidden="true" href="#cb27-151"></a>h1 { <span class="kw">font-size</span>: <span class="dv">1.8</span><span class="dt">em</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-152"></a>h2 { <span class="kw">font-size</span>: <span class="dv">1.6</span><span class="dt">em</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-153"></a>h3 { <span class="kw">font-size</span>: <span class="dv">1.4</span><span class="dt">em</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-154"></a>h4 { <span class="kw">font-size</span>: <span class="dv">1.2</span><span class="dt">em</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-155"></a>h5<span class="op">,</span> h6 { <span class="kw">font-size</span>: <span class="dv">1</span><span class="dt">em</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-156"></a></span> <span><a aria-hidden="true" href="#cb27-157"></a><span class="co">/* Newspaper Style First Letter of First Paragraph Upper-Case */</span></span> <span><a aria-hidden="true" href="#cb27-158"></a>article<span class="op">&gt;</span>p<span class="in">:first-of-type::first-letter</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-159"></a>hr<span class="op">+</span>p<span class="in">::first-letter</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-160"></a>h2<span class="op">+</span>p<span class="in">::first-letter</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-161"></a>h3<span class="op">+</span>p<span class="in">::first-letter</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-162"></a>h4<span class="op">+</span>p<span class="in">::first-letter</span> { </span> <span><a aria-hidden="true" href="#cb27-163"></a> <span class="kw">font-family</span>: <span 
class="dv">serif</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-164"></a> <span class="kw">font-size</span>: <span class="dv">1.8</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-165"></a> <span class="kw">font-weight</span>: <span class="dv">bold</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-166"></a>}</span> <span><a aria-hidden="true" href="#cb27-167"></a></span> <span><a aria-hidden="true" href="#cb27-168"></a><span class="co">/* **************************************************************************</span></span> <span><a aria-hidden="true" href="#cb27-169"></a><span class="co"> * ==== Article Header ====</span></span> <span><a aria-hidden="true" href="#cb27-170"></a><span class="co"> * - h1 headline</span></span> <span><a aria-hidden="true" href="#cb27-171"></a><span class="co"> * - address information</span></span> <span><a aria-hidden="true" href="#cb27-172"></a><span class="co"> * - page qr-code</span></span> <span><a aria-hidden="true" href="#cb27-173"></a><span class="co"> * - licence information</span></span> <span><a aria-hidden="true" href="#cb27-174"></a><span class="co"> * - audio player</span></span> <span><a aria-hidden="true" href="#cb27-175"></a><span class="co"> * **************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb27-176"></a></span> <span><a aria-hidden="true" href="#cb27-177"></a>article header {</span> <span><a aria-hidden="true" href="#cb27-178"></a> <span class="kw">min-height</span>: <span class="dv">0</span> </span> <span><a aria-hidden="true" href="#cb27-179"></a>}</span> <span><a aria-hidden="true" href="#cb27-180"></a></span> <span><a aria-hidden="true" href="#cb27-181"></a>article header h1 {</span> <span><a aria-hidden="true" href="#cb27-182"></a> <span class="kw">padding</span>: <span class="dv">0</span> <span class="dv">0</span> <span 
class="dv">0.2</span><span class="dt">em</span> <span class="dv">0</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-183"></a>}</span> <span><a aria-hidden="true" href="#cb27-184"></a></span> <span><a aria-hidden="true" href="#cb27-185"></a>article header div {</span> <span><a aria-hidden="true" href="#cb27-186"></a> <span class="kw">color</span>: <span class="fu">var(</span><span class="va">--theme-meta-color</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-187"></a> <span class="kw">font-size</span>: <span class="dv">0.8</span><span class="dt">em</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-188"></a> <span class="kw">padding</span>: <span class="dv">0</span> <span class="dv">0</span> <span class="dv">1</span><span class="dt">em</span> <span class="dv">0</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-189"></a>}</span> <span><a aria-hidden="true" href="#cb27-190"></a></span> <span><a aria-hidden="true" href="#cb27-191"></a></span> <span><a aria-hidden="true" href="#cb27-192"></a><span class="co">/* The browser decided, that address gets rendered italic,</span></span> <span><a aria-hidden="true" href="#cb27-193"></a><span class="co"> * but we do not want this */</span></span> <span><a aria-hidden="true" href="#cb27-194"></a>article header time<span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-195"></a>article header address {</span> <span><a aria-hidden="true" href="#cb27-196"></a> <span class="kw">padding-right</span>: <span class="dv">20</span><span class="dt">px</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-197"></a> <span class="kw">display</span>: <span class="dv">inline</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-198"></a> <span class="kw">font</span>: <span class="fu">var(</span><span class="va">--theme-font</span><span 
class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-199"></a> <span class="kw">font-size</span>:<span class="bu">inherit</span></span> <span><a aria-hidden="true" href="#cb27-200"></a>}</span> <span><a aria-hidden="true" href="#cb27-201"></a></span> <span><a aria-hidden="true" href="#cb27-202"></a><span class="co">/* **************************************************************************</span></span> <span><a aria-hidden="true" href="#cb27-203"></a><span class="co"> * ==== Article Block Elements</span></span> <span><a aria-hidden="true" href="#cb27-204"></a><span class="co"> * **************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb27-205"></a>p {</span> <span><a aria-hidden="true" href="#cb27-206"></a> <span class="kw">margin</span>: <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-207"></a> <span class="kw">font-size</span>: <span class="dv">1</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-208"></a> <span class="kw">padding</span>: <span class="dv">0</span> <span class="dv">0</span> <span class="dv">1</span><span class="dt">em</span> <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-209"></a>}</span> <span><a aria-hidden="true" href="#cb27-210"></a></span> <span><a aria-hidden="true" href="#cb27-211"></a>p<span class="in">:last-child</span></span> <span><a aria-hidden="true" href="#cb27-212"></a>{</span> <span><a aria-hidden="true" href="#cb27-213"></a> <span class="kw">padding-bottom</span>: <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-214"></a>}</span> <span><a aria-hidden="true" href="#cb27-215"></a></span> <span><a aria-hidden="true" href="#cb27-216"></a>table th {</span> <span><a aria-hidden="true" href="#cb27-217"></a> <span 
class="kw">background</span>: <span class="cn">#ddd</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-218"></a> <span class="kw">border-right</span>: <span class="dv">1</span><span class="dt">px</span> <span class="dv">solid</span> <span class="cn">#fff</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-219"></a> <span class="kw">padding</span>: <span class="dv">10</span><span class="dt">px</span> <span class="dv">20</span><span class="dt">px</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-220"></a>}</span> <span><a aria-hidden="true" href="#cb27-221"></a></span> <span><a aria-hidden="true" href="#cb27-222"></a>table tr th<span class="in">:last-child</span> {</span> <span><a aria-hidden="true" href="#cb27-223"></a> <span class="kw">border-right</span>: <span class="dv">1</span><span class="dt">px</span> <span class="dv">solid</span> <span class="cn">#ddd</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-224"></a>}</span> <span><a aria-hidden="true" href="#cb27-225"></a></span> <span><a aria-hidden="true" href="#cb27-226"></a>table td {</span> <span><a aria-hidden="true" href="#cb27-227"></a> <span class="kw">padding</span>: <span class="dv">5</span><span class="dt">px</span> <span class="dv">20</span><span class="dt">px</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-228"></a> <span class="kw">border</span>: <span class="dv">1</span><span class="dt">px</span> <span class="dv">solid</span> <span class="cn">#ddd</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-229"></a>}</span> <span><a aria-hidden="true" href="#cb27-230"></a></span> <span><a aria-hidden="true" href="#cb27-231"></a><span class="co">/* **************************************************************************</span></span> <span><a aria-hidden="true" href="#cb27-232"></a><span class="co"> * ==== Figures in the header and in 
the article ====</span></span> <span><a aria-hidden="true" href="#cb27-233"></a><span class="co"> * **************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb27-234"></a></span> <span><a aria-hidden="true" href="#cb27-235"></a>figure img { <span class="kw">width</span>: <span class="dv">100</span><span class="dt">%</span><span class="op">;</span> <span class="kw">height</span>: <span class="bu">auto</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-236"></a>figure audio { <span class="kw">width</span>: <span class="dv">50</span><span class="dt">%</span><span class="op">;</span> <span class="kw">height</span>: <span class="bu">auto</span><span class="op">;</span> <span class="kw">min-height</span>:<span class="dv">2</span><span class="dt">em</span><span class="op">;</span>}</span> <span><a aria-hidden="true" href="#cb27-237"></a>header figure figcaption { <span class="kw">font</span>: <span class="fu">var(</span><span class="va">--theme-font</span><span class="fu">)</span><span class="op">;</span> <span class="kw">font-size</span>: <span class="dv">1</span><span class="dt">em</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-238"></a><span class="kw">color</span>: <span class="fu">var(</span><span class="va">--theme-color</span><span class="fu">)</span><span class="op">;</span> <span class="kw">font-weight</span>: <span class="dv">bold</span>}</span> <span><a aria-hidden="true" href="#cb27-239"></a>article figure { <span class="kw">margin</span>: <span class="dv">10</span><span class="dt">px</span> }</span> <span><a aria-hidden="true" href="#cb27-240"></a>figure figcaption { <span class="kw">font</span>: <span class="fu">var(</span><span class="va">--theme-font</span><span class="fu">)</span><span class="op">;</span> <span class="kw">font-size</span>: <span class="dv">0.8</span><span class="dt">em</span><span class="op">;</span> </span> 
<span><a aria-hidden="true" href="#cb27-241"></a><span class="kw">color</span>: <span class="fu">var(</span><span class="va">--theme-color</span><span class="fu">)</span><span class="op">;</span> <span class="kw">font-style</span>: <span class="dv">italic</span><span class="op">;</span> <span class="kw">padding</span>: <span class="dv">2</span><span class="dt">px</span><span class="op">;</span>}</span> <span><a aria-hidden="true" href="#cb27-242"></a></span> <span><a aria-hidden="true" href="#cb27-243"></a>article header div figure { <span class="kw">display</span>: <span class="dv">Inline</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-244"></a>article header div figure img { <span class="kw">width</span>: <span class="dv">50</span><span class="dt">px</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-245"></a>article header div figure figcaption { <span class="kw">display</span>: <span class="dv">Inline</span><span class="op">;</span> <span class="kw">width</span>: <span class="dv">150</span><span class="dt">px</span> }</span> <span><a aria-hidden="true" href="#cb27-246"></a>article header div figure audio { <span class="kw">margin</span>: <span class="dv">.5</span><span class="dt">em</span> <span class="dv">.5</span><span class="dt">em</span> <span class="dv">.5</span><span class="dt">em</span> <span class="dv">.5</span><span class="dt">em</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-247"></a></span> <span><a aria-hidden="true" href="#cb27-248"></a><span class="co">/* **************************************************************************</span></span> <span><a aria-hidden="true" href="#cb27-249"></a><span class="co"> * ==== Navigation in the header ====</span></span> <span><a aria-hidden="true" href="#cb27-250"></a><span class="co"> * **************************************************************************/</span></span> <span><a aria-hidden="true" 
href="#cb27-251"></a></span> <span><a aria-hidden="true" href="#cb27-252"></a>header<span class="op">&gt;</span>nav<span class="op">&gt;</span>a {</span> <span><a aria-hidden="true" href="#cb27-253"></a> <span class="kw">font-size</span>: <span class="dv">1.2</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-254"></a> <span class="kw">padding</span>: <span class="dv">0</span> <span class="dv">0.5</span><span class="dt">em</span> <span class="dv">0</span> <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-255"></a> <span class="kw">display</span>: inline-grid<span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-256"></a> <span class="kw">grid-template-columns</span>: <span class="dv">30</span><span class="dt">px</span> <span class="bu">auto</span> <span class="bu">auto</span> <span class="bu">auto</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-257"></a>}</span> <span><a aria-hidden="true" href="#cb27-258"></a></span> <span><a aria-hidden="true" href="#cb27-259"></a>header<span class="op">&gt;</span>nav<span class="op">&gt;</span>a<span class="op">&gt;</span>img {</span> <span><a aria-hidden="true" href="#cb27-260"></a> <span class="kw">width</span>: <span class="dv">24</span><span class="dt">px</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-261"></a> <span class="kw">vertical-align</span>: <span class="dv">sub</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-262"></a>}</span> <span><a aria-hidden="true" href="#cb27-263"></a></span> <span><a aria-hidden="true" href="#cb27-264"></a>header<span class="op">&gt;</span>nav<span class="op">&gt;</span>form {</span> <span><a aria-hidden="true" href="#cb27-265"></a> <span class="kw">display</span>: <span class="dv">inline</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-266"></a> 
<span class="kw">padding</span>: <span class="dv">0</span> <span class="dv">0.5</span><span class="dt">em</span> <span class="dv">0</span> <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-267"></a> <span class="kw">margin</span>: <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-268"></a>}</span> <span><a aria-hidden="true" href="#cb27-269"></a></span> <span><a aria-hidden="true" href="#cb27-270"></a>header<span class="op">&gt;</span>nav<span class="op">&gt;</span>form<span class="op">&gt;</span>input{</span> <span><a aria-hidden="true" href="#cb27-271"></a> <span class="kw">font</span>: <span class="fu">var(</span><span class="va">--theme-font</span><span class="fu">)</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-272"></a> <span class="kw">letter-spacing</span>: <span class="fu">var(</span><span class="va">--theme-letter-spacing</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-273"></a> <span class="kw">font-size</span>: <span class="dv">1</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-274"></a> <span class="kw">vertical-align</span>: <span class="dv">super</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-275"></a> <span class="kw">padding</span>: <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-276"></a> <span class="kw">margin</span>: <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-277"></a> <span class="kw">border-color</span>: <span class="fu">var(</span><span 
class="va">--theme-color</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-278"></a>}</span> <span><a aria-hidden="true" href="#cb27-279"></a></span> <span><a aria-hidden="true" href="#cb27-280"></a><span class="co">/* context break is meta information */</span></span> <span><a aria-hidden="true" href="#cb27-281"></a>hr {</span> <span><a aria-hidden="true" href="#cb27-282"></a> <span class="kw">height</span>:<span class="dv">1</span><span class="dt">px</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-283"></a> <span class="kw">border-width</span>:<span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-284"></a> <span class="kw">background-color</span>: <span class="fu">var(</span><span class="va">--theme-meta-color</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-285"></a>}</span> <span><a aria-hidden="true" href="#cb27-286"></a></span> <span><a aria-hidden="true" href="#cb27-287"></a><span class="co">/* **************************************************************************</span></span> <span><a aria-hidden="true" href="#cb27-288"></a><span class="co"> * inline HTML TAGS</span></span> <span><a aria-hidden="true" href="#cb27-289"></a><span class="co"> * **************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb27-290"></a></span> <span><a aria-hidden="true" href="#cb27-291"></a>pre {</span> <span><a aria-hidden="true" href="#cb27-292"></a> <span class="kw">background</span>: <span class="cn">#f5f5f5</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-293"></a> <span class="kw">border</span>: <span class="dv">1</span><span class="dt">px</span> <span class="dv">solid</span> <span class="cn">#ddd</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-294"></a> <span 
class="kw">padding</span>: <span class="dv">10</span><span class="dt">px</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-295"></a> <span class="kw">text-shadow</span>: <span class="dv">1</span><span class="dt">px</span> <span class="dv">1</span><span class="dt">px</span> <span class="fu">rgba(</span><span class="dv">255</span><span class="op">,</span> <span class="dv">255</span><span class="op">,</span> <span class="dv">255</span><span class="op">,</span> <span class="dv">0.4</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-296"></a> <span class="kw">font-size</span>: <span class="dv">0.8</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-297"></a> <span class="kw">line-height</span>: <span class="dv">1.25</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-298"></a> <span class="kw">margin</span>: <span class="dv">0</span> <span class="dv">0</span> <span class="dv">1</span><span class="dt">em</span> <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-299"></a> <span class="kw">overflow</span>: <span class="bu">auto</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-300"></a>}</span> <span><a aria-hidden="true" href="#cb27-301"></a></span> <span><a aria-hidden="true" href="#cb27-302"></a>sup<span class="op">,</span> sub { </span> <span><a aria-hidden="true" href="#cb27-303"></a> <span class="kw">font-size</span>: <span class="dv">0.75</span><span class="dt">em</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-304"></a> <span class="kw">height</span>: <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-305"></a> <span class="kw">line-height</span>: <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-306"></a> 
<span class="kw">position</span>: <span class="dv">relative</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-307"></a> <span class="kw">vertical-align</span>: <span class="dv">baseline</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-308"></a>}</span> <span><a aria-hidden="true" href="#cb27-309"></a></span> <span><a aria-hidden="true" href="#cb27-310"></a>sup {</span> <span><a aria-hidden="true" href="#cb27-311"></a> <span class="kw">bottom</span>: <span class="dv">1</span><span class="dt">ex</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-312"></a>}</span> <span><a aria-hidden="true" href="#cb27-313"></a></span> <span><a aria-hidden="true" href="#cb27-314"></a>sub {</span> <span><a aria-hidden="true" href="#cb27-315"></a> <span class="kw">top</span>: <span class="dv">1</span><span class="dt">ex</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-316"></a>}</span> <span><a aria-hidden="true" href="#cb27-317"></a></span> <span><a aria-hidden="true" href="#cb27-318"></a>small { </span> <span><a aria-hidden="true" href="#cb27-319"></a> <span class="kw">font-size</span>: <span class="dv">0.75</span><span class="dt">em</span> </span> <span><a aria-hidden="true" href="#cb27-320"></a>}</span> <span><a aria-hidden="true" href="#cb27-321"></a></span> <span><a aria-hidden="true" href="#cb27-322"></a></span> <span><a aria-hidden="true" href="#cb27-323"></a><span class="co">/* **************************************************************************</span></span> <span><a aria-hidden="true" href="#cb27-324"></a><span class="co"> * ==== Navigation and their targets ====</span></span> <span><a aria-hidden="true" href="#cb27-325"></a><span class="co"> * **************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb27-326"></a></span> <span><a aria-hidden="true" href="#cb27-327"></a><span 
class="op">*</span><span class="in">:target</span> {</span> <span><a aria-hidden="true" href="#cb27-328"></a> <span class="kw">border-bottom</span>: <span class="dv">0.3</span><span class="dt">em</span> <span class="dv">solid</span> <span class="fu">var(</span><span class="va">--theme-color</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-329"></a>}</span> <span><a aria-hidden="true" href="#cb27-330"></a></span> <span><a aria-hidden="true" href="#cb27-331"></a>a { </span> <span><a aria-hidden="true" href="#cb27-332"></a> <span class="kw">text-decoration</span>: <span class="dv">none</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-333"></a> <span class="kw">font</span>: <span class="fu">var(</span><span class="va">--theme-font</span><span class="fu">)</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-334"></a> <span class="kw">font-size</span>: <span class="dv">1</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-335"></a> <span class="kw">font-weight</span>: <span class="dv">bold</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-336"></a> <span class="kw">color</span>: <span class="fu">var(</span><span class="va">--theme-color</span><span class="fu">)</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-337"></a> <span class="kw">border-width</span>: <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-338"></a> <span class="kw">border-style</span>: <span class="dv">none</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-339"></a>}</span> <span><a aria-hidden="true" href="#cb27-340"></a>a<span class="in">:link</span> { <span class="kw">color</span>: <span class="fu">var(</span><span 
class="va">--theme-color</span><span class="fu">)</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-341"></a>a<span class="in">:visited</span> { <span class="kw">color</span>: <span class="fu">var(</span><span class="va">--theme-text-color</span><span class="fu">)</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-342"></a></span> <span><a aria-hidden="true" href="#cb27-343"></a><span class="co">/* figure:has(a:focus), */</span> <span class="co">/* Wait for CSS 4 */</span></span> <span><a aria-hidden="true" href="#cb27-344"></a>a<span class="in">:focus</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-345"></a>a<span class="in">:hover</span> <span class="co">/* ,</span></span> <span><a aria-hidden="true" href="#cb27-346"></a><span class="co">a:active */</span> { </span> <span><a aria-hidden="true" href="#cb27-347"></a><span class="kw">color</span>: <span class="fu">var(</span><span class="va">--theme-background-color</span><span class="fu">)</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-348"></a><span class="kw">background-color</span>: <span class="fu">var(</span><span class="va">--theme-color</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-349"></a><span class="kw">outline</span>: <span class="dv">none</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-350"></a>}</span> <span><a aria-hidden="true" href="#cb27-351"></a></span> <span><a aria-hidden="true" href="#cb27-352"></a>figure a<span class="in">:focus</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-353"></a>figure a<span class="in">:hover</span> { </span> <span><a aria-hidden="true" href="#cb27-354"></a> <span class="kw">color</span>: <span class="fu">var(</span><span class="va">--theme-background-color</span><span class="fu">)</span><span class="op">;</span></span> <span><a 
aria-hidden="true" href="#cb27-355"></a> <span class="kw">background-color</span>: <span class="fu">var(</span><span class="va">--theme-color</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-356"></a> <span class="kw">outline</span>: <span class="dv">none</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-357"></a> <span class="kw">border</span>: <span class="dv">none</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-358"></a>}</span> <span><a aria-hidden="true" href="#cb27-359"></a></span> <span><a aria-hidden="true" href="#cb27-360"></a>header<span class="op">&gt;</span>div<span class="op">&gt;</span>a<span class="in">:focus</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-361"></a>header<span class="op">&gt;</span>div<span class="op">&gt;</span>a<span class="in">:hover</span> {</span> <span><a aria-hidden="true" href="#cb27-362"></a> <span class="kw">background-color</span>: <span class="fu">var(</span><span class="va">--theme-background-color</span><span class="fu">)</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-363"></a> <span class="kw">color</span>: <span class="fu">var(</span><span class="va">--theme-color</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-364"></a> <span class="kw">outline</span>: <span class="dv">none</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-365"></a> <span class="kw">border</span>: <span class="dv">none</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-366"></a>}</span> <span><a aria-hidden="true" href="#cb27-367"></a></span> <span><a aria-hidden="true" href="#cb27-368"></a>a<span class="fu">.category</span> { <span class="kw">visibility</span>: <span class="dv">hidden</span><span class="op">;</span> }</span> <span><a aria-hidden="true" 
href="#cb27-369"></a></span> <span><a aria-hidden="true" href="#cb27-370"></a></span> <span><a aria-hidden="true" href="#cb27-371"></a><span class="co">/* **************************************************************************</span></span> <span><a aria-hidden="true" href="#cb27-372"></a><span class="co"> * ==== YaCy Search ====</span></span> <span><a aria-hidden="true" href="#cb27-373"></a><span class="co"> * **************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb27-374"></a></span> <span><a aria-hidden="true" href="#cb27-375"></a>p<span class="fu">.urlinfo</span> <span class="in">:nth-child(2)</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-376"></a>p<span class="fu">.urlinfo</span> <span class="in">:nth-child(3)</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-377"></a>p<span class="fu">.urlinfo</span> <span class="in">:nth-child(4)</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-378"></a>p<span class="fu">.urlinfo</span> <span class="in">:nth-child(5)</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-379"></a><span class="fu">.favicon</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-380"></a><span class="fu">.navbar</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-381"></a><span class="fu">.starter-template</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-382"></a><span class="fu">.hidden</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-383"></a><span class="fu">.urlactions</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-384"></a><span class="fu">.input-group-btn</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-385"></a><span class="fu">.sidebar</span><span class="op">,</span></span> 
<span><a aria-hidden="true" href="#cb27-386"></a><span class="pp">#datehistogram</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-387"></a><span class="pp">#api</span> {</span> <span><a aria-hidden="true" href="#cb27-388"></a> <span class="kw">display</span>: <span class="dv">none</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-389"></a>}</span> <span><a aria-hidden="true" href="#cb27-390"></a></span> <span><a aria-hidden="true" href="#cb27-391"></a>div {</span> <span><a aria-hidden="true" href="#cb27-392"></a> <span class="kw">min-height</span>: <span class="dv">10</span><span class="dt">px</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-393"></a> <span class="kw">margin</span>: <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-394"></a> <span class="kw">padding</span>: <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-395"></a>}</span> <span><a aria-hidden="true" href="#cb27-396"></a></span> <span><a aria-hidden="true" href="#cb27-397"></a>span<span class="pp">#resNav</span> ul li {</span> <span><a aria-hidden="true" href="#cb27-398"></a> <span class="kw">display</span>: <span class="dv">inline</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-399"></a> <span class="kw">font-size</span>: <span class="dv">1.4</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-400"></a>}</span> <span><a aria-hidden="true" href="#cb27-401"></a></span> <span><a aria-hidden="true" href="#cb27-402"></a><span class="fu">.searchinput</span> {</span> <span><a aria-hidden="true" href="#cb27-403"></a> <span class="kw">font</span>: <span class="fu">var(</span><span 
class="va">--theme-font</span><span class="fu">)</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-404"></a> <span class="kw">letter-spacing</span>: <span class="fu">var(</span><span class="va">--theme-letter-spacing</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-405"></a> <span class="kw">font-size</span>: <span class="dv">1</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-406"></a> <span class="kw">border-color</span>: <span class="fu">var(</span><span class="va">--theme-color</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-407"></a> <span class="kw">outline</span>: <span class="dv">5</span><span class="dt">px</span> <span class="dv">solid</span> <span class="fu">var(</span><span class="va">--theme-meta-color</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-408"></a>}</span> <span><a aria-hidden="true" href="#cb27-409"></a></span> <span><a aria-hidden="true" href="#cb27-410"></a><span class="fu">.linktitle</span><span class="op">,</span></span> <span><a aria-hidden="true" href="#cb27-411"></a><span class="fu">.pagination</span> {</span> <span><a aria-hidden="true" href="#cb27-412"></a> <span class="kw">font-size</span>: <span class="dv">1.4</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-413"></a> <span class="kw">border-top</span>: <span class="dv">2</span><span class="dt">px</span> <span class="dv">solid</span> <span class="fu">var(</span><span class="va">--theme-meta-color</span><span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-414"></a>}</span> <span><a aria-hidden="true" href="#cb27-415"></a></span> <span><a aria-hidden="true" href="#cb27-416"></a><span class="co">/* 
**************************************************************************</span></span> <span><a aria-hidden="true" href="#cb27-417"></a><span class="co"> * ==== syntaxhighlight ====</span></span> <span><a aria-hidden="true" href="#cb27-418"></a><span class="co"> * CSS as created in the html style-element by WeasyOrint for syntaxhighlight</span></span> <span><a aria-hidden="true" href="#cb27-419"></a><span class="co"> * Changes for the print version need to be applied in fspdf.css</span></span> <span><a aria-hidden="true" href="#cb27-420"></a><span class="co"> * Changes for the browser version need to be applied at the end of this file.</span></span> <span><a aria-hidden="true" href="#cb27-421"></a><span class="co"> * **************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb27-422"></a></span> <span><a aria-hidden="true" href="#cb27-423"></a>code{<span class="kw">white-space</span>: <span class="dv">pre-wrap</span><span class="op">;</span>}</span> <span><a aria-hidden="true" href="#cb27-424"></a>span<span class="fu">.smallcaps</span>{<span class="kw">font-variant</span>: <span class="dv">small-caps</span><span class="op">;</span>}</span> <span><a aria-hidden="true" href="#cb27-425"></a>span<span class="fu">.underline</span>{<span class="kw">text-decoration</span>: <span class="dv">underline</span><span class="op">;</span>}</span> <span><a aria-hidden="true" href="#cb27-426"></a>div<span class="fu">.column</span>{<span class="kw">display</span>: <span class="dv">inline-block</span><span class="op">;</span> <span class="kw">vertical-align</span>: <span class="dv">top</span><span class="op">;</span> <span class="kw">width</span>: <span class="dv">50</span><span class="dt">%</span><span class="op">;</span>}</span> <span><a aria-hidden="true" href="#cb27-427"></a>div<span class="fu">.hanging-indent</span>{<span class="kw">margin-left</span>: <span class="dv">1.5</span><span class="dt">em</span><span 
class="op">;</span> <span class="kw">text-indent</span>: <span class="dv">-1.5</span><span class="dt">em</span><span class="op">;</span>}</span> <span><a aria-hidden="true" href="#cb27-428"></a>ul<span class="fu">.task-list</span>{<span class="kw">list-style</span>: <span class="dv">none</span><span class="op">;</span>}</span> <span><a aria-hidden="true" href="#cb27-429"></a>pre <span class="op">&gt;</span> code<span class="fu">.sourceCode</span> { <span class="kw">white-space</span>: <span class="dv">pre</span><span class="op">;</span> <span class="kw">position</span>: <span class="dv">relative</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-430"></a>pre <span class="op">&gt;</span> code<span class="fu">.sourceCode</span> <span class="op">&gt;</span> span { <span class="kw">display</span>: <span class="dv">inline-block</span><span class="op">;</span> <span class="kw">line-height</span>: <span class="dv">1.25</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-431"></a>pre <span class="op">&gt;</span> code<span class="fu">.sourceCode</span> <span class="op">&gt;</span> span<span class="in">:empty</span> { <span class="kw">height</span>: <span class="dv">1.2</span><span class="dt">em</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-432"></a>code<span class="fu">.sourceCode</span> <span class="op">&gt;</span> span { <span class="kw">color</span>: <span class="bu">inherit</span><span class="op">;</span> <span class="kw">text-decoration</span>: <span class="bu">inherit</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-433"></a>div<span class="fu">.sourceCode</span> { <span class="kw">margin</span>: <span class="dv">1</span><span class="dt">em</span> <span class="dv">0</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-434"></a>pre<span class="fu">.sourceCode</span> { <span class="kw">margin</span>: <span 
class="dv">0</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-435"></a></span> <span><a aria-hidden="true" href="#cb27-436"></a><span class="im">@media</span> <span class="dv">screen</span> {</span> <span><a aria-hidden="true" href="#cb27-437"></a> div<span class="fu">.sourceCode</span> { <span class="kw">overflow</span>: <span class="bu">auto</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-438"></a>}</span> <span><a aria-hidden="true" href="#cb27-439"></a></span> <span><a aria-hidden="true" href="#cb27-440"></a></span> <span><a aria-hidden="true" href="#cb27-441"></a><span class="im">@media</span> <span class="dv">print</span> {</span> <span><a aria-hidden="true" href="#cb27-442"></a> pre <span class="op">&gt;</span> code<span class="fu">.sourceCode</span> { <span class="kw">white-space</span>: <span class="dv">pre-wrap</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-443"></a> pre <span class="op">&gt;</span> code<span class="fu">.sourceCode</span> <span class="op">&gt;</span> span { <span class="kw">text-indent</span>: <span class="dv">-5</span><span class="dt">em</span><span class="op">;</span> <span class="kw">padding-left</span>: <span class="dv">5</span><span class="dt">em</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-444"></a>}</span> <span><a aria-hidden="true" href="#cb27-445"></a></span> <span><a aria-hidden="true" href="#cb27-446"></a>pre<span class="fu">.numberSource</span> code</span> <span><a aria-hidden="true" href="#cb27-447"></a> { <span class="kw">counter-reset</span>: source-line <span class="dv">0</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-448"></a>pre<span class="fu">.numberSource</span> code <span class="op">&gt;</span> span</span> <span><a aria-hidden="true" href="#cb27-449"></a> { <span class="kw">position</span>: <span class="dv">relative</span><span class="op">;</span> 
<span class="kw">left</span>: <span class="dv">-4</span><span class="dt">em</span><span class="op">;</span> <span class="kw">counter-increment</span>: source-line<span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-450"></a>pre<span class="fu">.numberSource</span> code <span class="op">&gt;</span> span <span class="op">&gt;</span> a<span class="in">:first-child::before</span></span> <span><a aria-hidden="true" href="#cb27-451"></a> { <span class="kw">content</span>: counter<span class="fu">(</span>source-line<span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-452"></a> <span class="kw">position</span>: <span class="dv">relative</span><span class="op">;</span> <span class="kw">left</span>: <span class="dv">-1</span><span class="dt">em</span><span class="op">;</span> <span class="kw">text-align</span>: <span class="dv">right</span><span class="op">;</span> <span class="kw">vertical-align</span>: <span class="dv">baseline</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-453"></a> <span class="kw">border</span>: <span class="dv">none</span><span class="op">;</span> <span class="kw">display</span>: <span class="dv">inline-block</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-454"></a> -webkit-touch-callout: <span class="dv">none</span><span class="op">;</span> <span class="kw">-webkit-user-select</span>: <span class="dv">none</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-455"></a> -khtml-user-select: <span class="dv">none</span><span class="op">;</span> <span class="kw">-moz-user-select</span>: <span class="dv">none</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-456"></a> <span class="kw">-ms-user-select</span>: <span class="dv">none</span><span class="op">;</span> <span class="kw">user-select</span>: <span class="dv">none</span><span class="op">;</span></span> <span><a 
aria-hidden="true" href="#cb27-457"></a> <span class="kw">padding</span>: <span class="dv">0</span> <span class="dv">4</span><span class="dt">px</span><span class="op">;</span> <span class="kw">width</span>: <span class="dv">4</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-458"></a> <span class="kw">color</span>: <span class="cn">#aaaaaa</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb27-459"></a> }</span> <span><a aria-hidden="true" href="#cb27-460"></a>pre<span class="fu">.numberSource</span> { <span class="kw">margin-left</span>: <span class="dv">3</span><span class="dt">em</span><span class="op">;</span> <span class="kw">border-left</span>: <span class="dv">1</span><span class="dt">px</span> <span class="dv">solid</span> <span class="cn">#aaaaaa</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-461"></a> <span class="kw">padding-left</span>: <span class="dv">4</span><span class="dt">px</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-462"></a>div<span class="fu">.sourceCode</span></span> <span><a aria-hidden="true" href="#cb27-463"></a> { }</span> <span><a aria-hidden="true" href="#cb27-464"></a></span> <span><a aria-hidden="true" href="#cb27-465"></a><span class="im">@media</span> <span class="dv">screen</span> {</span> <span><a aria-hidden="true" href="#cb27-466"></a> pre <span class="op">&gt;</span> code<span class="fu">.sourceCode</span> <span class="op">&gt;</span> span <span class="op">&gt;</span> a<span class="in">:first-child::before</span> { </span> <span><a aria-hidden="true" href="#cb27-467"></a> <span class="kw">text-decoration</span>: <span class="dv">underline</span><span class="op">;</span> }</span> <span><a aria-hidden="true" href="#cb27-468"></a>}</span> <span><a aria-hidden="true" href="#cb27-469"></a></span> <span><a aria-hidden="true" href="#cb27-470"></a>code span<span class="fu">.al</span> { 
<span class="kw">color</span>: <span class="cn">#ff0000</span><span class="op">;</span> <span class="kw">font-weight</span>: <span class="dv">bold</span><span class="op">;</span> } <span class="co">/* Alert */</span></span> <span><a aria-hidden="true" href="#cb27-471"></a>code span<span class="fu">.an</span> { <span class="kw">color</span>: <span class="cn">#60a0b0</span><span class="op">;</span> <span class="kw">font-weight</span>: <span class="dv">bold</span><span class="op">;</span> <span class="kw">font-style</span>: <span class="dv">italic</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-472"></a> } <span class="co">/* Annotation */</span></span> <span><a aria-hidden="true" href="#cb27-473"></a>code span<span class="fu">.at</span> { <span class="kw">color</span>: <span class="cn">#7d9029</span><span class="op">;</span> } <span class="co">/* Attribute */</span></span> <span><a aria-hidden="true" href="#cb27-474"></a>code span<span class="fu">.bn</span> { <span class="kw">color</span>: <span class="cn">#40a070</span><span class="op">;</span> } <span class="co">/* BaseN */</span></span> <span><a aria-hidden="true" href="#cb27-475"></a>code span<span class="fu">.bu</span> { } <span class="co">/* BuiltIn */</span></span> <span><a aria-hidden="true" href="#cb27-476"></a>code span<span class="fu">.cf</span> { <span class="kw">color</span>: <span class="cn">#007020</span><span class="op">;</span> <span class="kw">font-weight</span>: <span class="dv">bold</span><span class="op">;</span> } <span class="co">/* ControlFlow */</span></span> <span><a aria-hidden="true" href="#cb27-477"></a>code span<span class="fu">.ch</span> { <span class="kw">color</span>: <span class="cn">#4070a0</span><span class="op">;</span> } <span class="co">/* Char */</span></span> <span><a aria-hidden="true" href="#cb27-478"></a>code span<span class="fu">.cn</span> { <span class="kw">color</span>: <span class="cn">#880000</span><span class="op">;</span> } <span 
class="co">/* Constant */</span></span> <span><a aria-hidden="true" href="#cb27-479"></a>code span<span class="fu">.co</span> { <span class="kw">color</span>: <span class="cn">#60a0b0</span><span class="op">;</span> <span class="kw">font-style</span>: <span class="dv">italic</span><span class="op">;</span> } <span class="co">/* Comment */</span></span> <span><a aria-hidden="true" href="#cb27-480"></a>code span<span class="fu">.cv</span> { <span class="kw">color</span>: <span class="cn">#60a0b0</span><span class="op">;</span> <span class="kw">font-weight</span>: <span class="dv">bold</span><span class="op">;</span> <span class="kw">font-style</span>: <span class="dv">italic</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-481"></a> } <span class="co">/* CommentVar */</span></span> <span><a aria-hidden="true" href="#cb27-482"></a>code span<span class="fu">.do</span> { <span class="kw">color</span>: <span class="cn">#ba2121</span><span class="op">;</span> <span class="kw">font-style</span>: <span class="dv">italic</span><span class="op">;</span> } <span class="co">/* Documentation */</span></span> <span><a aria-hidden="true" href="#cb27-483"></a>code span<span class="fu">.dt</span> { <span class="kw">color</span>: <span class="cn">#902000</span><span class="op">;</span> } <span class="co">/* DataType */</span></span> <span><a aria-hidden="true" href="#cb27-484"></a>code span<span class="fu">.dv</span> { <span class="kw">color</span>: <span class="cn">#40a070</span><span class="op">;</span> } <span class="co">/* DecVal */</span></span> <span><a aria-hidden="true" href="#cb27-485"></a>code span<span class="fu">.er</span> { <span class="kw">color</span>: <span class="cn">#ff0000</span><span class="op">;</span> <span class="kw">font-weight</span>: <span class="dv">bold</span><span class="op">;</span> } <span class="co">/* Error */</span></span> <span><a aria-hidden="true" href="#cb27-486"></a>code span<span class="fu">.ex</span> { } <span 
class="co">/* Extension */</span></span> <span><a aria-hidden="true" href="#cb27-487"></a>code span<span class="fu">.fl</span> { <span class="kw">color</span>: <span class="cn">#40a070</span><span class="op">;</span> } <span class="co">/* Float */</span></span> <span><a aria-hidden="true" href="#cb27-488"></a>code span<span class="fu">.fu</span> { <span class="kw">color</span>: <span class="cn">#06287e</span><span class="op">;</span> } <span class="co">/* Function */</span></span> <span><a aria-hidden="true" href="#cb27-489"></a>code span<span class="fu">.im</span> { } <span class="co">/* Import */</span></span> <span><a aria-hidden="true" href="#cb27-490"></a>code span<span class="fu">.in</span> { <span class="kw">color</span>: <span class="cn">#60a0b0</span><span class="op">;</span> <span class="kw">font-weight</span>: <span class="dv">bold</span><span class="op">;</span> <span class="kw">font-style</span>: <span class="dv">italic</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-491"></a> } <span class="co">/* Information */</span></span> <span><a aria-hidden="true" href="#cb27-492"></a>code span<span class="fu">.kw</span> { <span class="kw">color</span>: <span class="cn">#007020</span><span class="op">;</span> <span class="kw">font-weight</span>: <span class="dv">bold</span><span class="op">;</span> } <span class="co">/* Keyword */</span></span> <span><a aria-hidden="true" href="#cb27-493"></a>code span<span class="fu">.op</span> { <span class="kw">color</span>: <span class="cn">#666666</span><span class="op">;</span> } <span class="co">/* Operator */</span></span> <span><a aria-hidden="true" href="#cb27-494"></a>code span<span class="fu">.ot</span> { <span class="kw">color</span>: <span class="cn">#007020</span><span class="op">;</span> } <span class="co">/* Other */</span></span> <span><a aria-hidden="true" href="#cb27-495"></a>code span<span class="fu">.pp</span> { <span class="kw">color</span>: <span 
class="cn">#bc7a00</span><span class="op">;</span> } <span class="co">/* Preprocessor */</span></span> <span><a aria-hidden="true" href="#cb27-496"></a>code span<span class="fu">.sc</span> { <span class="kw">color</span>: <span class="cn">#4070a0</span><span class="op">;</span> } <span class="co">/* SpecialChar */</span></span> <span><a aria-hidden="true" href="#cb27-497"></a>code span<span class="fu">.ss</span> { <span class="kw">color</span>: <span class="cn">#bb6688</span><span class="op">;</span> } <span class="co">/* SpecialString */</span></span> <span><a aria-hidden="true" href="#cb27-498"></a>code span<span class="fu">.st</span> { <span class="kw">color</span>: <span class="cn">#4070a0</span><span class="op">;</span> } <span class="co">/* String */</span></span> <span><a aria-hidden="true" href="#cb27-499"></a>code span<span class="fu">.va</span> { <span class="kw">color</span>: <span class="cn">#19177c</span><span class="op">;</span> } <span class="co">/* Variable */</span></span> <span><a aria-hidden="true" href="#cb27-500"></a>code span<span class="fu">.vs</span> { <span class="kw">color</span>: <span class="cn">#4070a0</span><span class="op">;</span> } <span class="co">/* VerbatimString */</span></span> <span><a aria-hidden="true" href="#cb27-501"></a>code span<span class="fu">.wa</span> { <span class="kw">color</span>: <span class="cn">#60a0b0</span><span class="op">;</span> <span class="kw">font-weight</span>: <span class="dv">bold</span><span class="op">;</span> <span class="kw">font-style</span>: <span class="dv">italic</span><span class="op">;</span> </span> <span><a aria-hidden="true" href="#cb27-502"></a> } <span class="co">/* Warning */</span></span> <span><a aria-hidden="true" href="#cb27-503"></a></span> <span><a aria-hidden="true" href="#cb27-504"></a><span class="co">/* **************************************************************************</span></span> <span><a aria-hidden="true" href="#cb27-505"></a><span class="co"> * ==== 
syntaxhighlight ====</span></span> <span><a aria-hidden="true" href="#cb27-506"></a><span class="co"> * Own Part</span></span> <span><a aria-hidden="true" href="#cb27-507"></a><span class="co"> * **************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb27-508"></a>pre<span class="fu">.sourceCode</span> {</span> <span><a aria-hidden="true" href="#cb27-509"></a> <span class="kw">width</span>: <span class="dv">80</span><span class="dt">ch</span><span class="op">;</span> <span class="co">/* classic terminal width for code sections */</span></span> <span><a aria-hidden="true" href="#cb27-510"></a>}</span></code></pre> </div> <h4> MediaWiki to HTML Recapitulation </h4> <p> At this point you can copy a MediaWiki title and paste it into the command line: </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb28-1"></a><span class="ex">we</span> <span class="st">'MediaWiki title'</span></span></code></pre> </div> <p> Halt! We are missing something here. The command 'we' is unknown to your system. You can fix this by placing the following line into the file </p> <p> <strong> ~/.bash_aliases </strong> </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb29-1"></a><span class="bu">alias</span> we=<span class="st">'~/projects/wikitools/src/export.py'</span></span></code></pre> </div> <p> This simplifies your life a lot, since you only need to remember that <b>w</b>iki <b>e</b>xport is written as 'we' on the command line. </p> <p> If the default wiki is configured correctly in your configuration file, the export creates the file 'MediaWiki title.mediawiki' in the directory <strong> ~/projects/idee/author/ </strong>. </p> <p> It does not matter which working directory you are in when you invoke this command. 
</p> <p> You can also use </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb30-1"></a><span class="ex">~/projects/idee</span>$ <span class="fu">git</span> add .</span> <span><a aria-hidden="true" href="#cb30-2"></a><span class="ex">~/projects/idee</span>$ <span class="fu">git</span> commit</span></code></pre> </div> <p> This adds your new MediaWiki file to the staging area and starts the commit. The commit triggers the MWWorker, which places an HTML file named 'mediawiki-title.html' into the directory ~/projects/idee/plain/ and opens it in Firefox. </p> <p> Well, you might need to comment out some parts of the code, because it references parts that are not yet implemented. </p> <h3> HTML to PDF Conversion </h3> <p> I was pretty sure that I would be able to convert the plain HTML into a portal page. Therefore PDF generation got priority. </p> <p> Logically, I started the PDF generation with Pandoc. A large part of the later chapter [Migration#Migration] reports the various problems I ran into and how I managed to solve them. I keep these parts in the documentation, since they might help others to solve the same problems. </p> <p> In the end I found no way in Pandoc to get working links in the footnote section that point back to the footnote number in the text. </p> <p> This was too much loss of functionality for me to accept, so I ended up using WeasyPrint. Much of the code that is now commented out was required to make the Pandoc results look OK. </p> <p> For WeasyPrint I needed to create an extra CSS file, but the result looks good, at least to my taste. </p> <p> The WeasyPrint installation is described further down in this document, at the point where it happened in my project, nearly at the very end. 
</p> <h4> The PDFWorker </h4> <p> <strong> ~/projects/idee/generator/pdfworker.py </strong> </p> <div class="sourceCode"> <pre class="sourceCode Python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb31-1"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb31-2"></a><span class="co">PdfWorker is derived from the MsgWorker base class.</span></span> <span><a aria-hidden="true" href="#cb31-3"></a></span> <span><a aria-hidden="true" href="#cb31-4"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb31-5"></a><span class="co">@license: https://creativecommons.org/publicdomain/zero/1.0/deed.en</span></span> <span><a aria-hidden="true" href="#cb31-6"></a><span class="co">@date: 2022-03-15</span></span> <span><a aria-hidden="true" href="#cb31-7"></a></span> <span><a aria-hidden="true" href="#cb31-8"></a><span class="co">The PdfWorker takes care of a worklist item placed</span></span> <span><a aria-hidden="true" href="#cb31-9"></a><span class="co">by an earlier worker.</span></span> <span><a aria-hidden="true" href="#cb31-10"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb31-11"></a><span class="im">import</span> re</span> <span><a aria-hidden="true" href="#cb31-12"></a><span class="im">from</span> pathlib <span class="im">import</span> Path</span> <span><a aria-hidden="true" href="#cb31-13"></a></span> <span><a aria-hidden="true" href="#cb31-14"></a><span class="im">from</span> bs4 <span class="im">import</span> BeautifulSoup</span> <span><a aria-hidden="true" href="#cb31-15"></a><span class="im">from</span> bs4.builder._htmlparser <span class="im">import</span> HTMLParserTreeBuilder</span> <span><a aria-hidden="true" href="#cb31-16"></a></span> <span><a aria-hidden="true" href="#cb31-17"></a><span class="im">from</span> weasyprint <span class="im">import</span> HTML</span> <span><a aria-hidden="true" href="#cb31-18"></a><span class="im">from</span> 
weasyprint <span class="im">import</span> CSS</span> <span><a aria-hidden="true" href="#cb31-19"></a></span> <span><a aria-hidden="true" href="#cb31-20"></a><span class="im">from</span> gitmsgdispatcher <span class="im">import</span> MsgWorker</span> <span><a aria-hidden="true" href="#cb31-21"></a><span class="im">from</span> gitmsgconstants <span class="im">import</span> GitMsgConstants <span class="im">as</span> gmc</span> <span><a aria-hidden="true" href="#cb31-22"></a><span class="co"># from pubmetadata import pageurn</span></span> <span><a aria-hidden="true" href="#cb31-23"></a></span> <span><a aria-hidden="true" href="#cb31-24"></a></span> <span><a aria-hidden="true" href="#cb31-25"></a><span class="kw">class</span> PdfWorker(MsgWorker):</span> <span><a aria-hidden="true" href="#cb31-26"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb31-27"></a><span class="co"> The PdfWorker takes care of a worklist item placed by an earlier worker.</span></span> <span><a aria-hidden="true" href="#cb31-28"></a></span> <span><a aria-hidden="true" href="#cb31-29"></a><span class="co"> The static method make_pdf_worklist_item() creates a work item,</span></span> <span><a aria-hidden="true" href="#cb31-30"></a><span class="co"> which can be placed into the worklist.</span></span> <span><a aria-hidden="true" href="#cb31-31"></a></span> <span><a aria-hidden="true" href="#cb31-32"></a><span class="co"> The respective PDF is created from HTML and stored in the folder</span></span> <span><a aria-hidden="true" href="#cb31-33"></a><span class="co"> GITROOT/website/pdf/</span></span> <span><a aria-hidden="true" href="#cb31-34"></a></span> <span><a aria-hidden="true" href="#cb31-35"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb31-36"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb31-37"></a><span class="co"> super: MsgWorker</span></span> <span><a aria-hidden="true" 
href="#cb31-38"></a><span class="co"> The PdfWorker is derived from the MsgWorker.</span></span> <span><a aria-hidden="true" href="#cb31-39"></a></span> <span><a aria-hidden="true" href="#cb31-40"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb31-41"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb31-42"></a><span class="co"> PdfWorker.</span></span> <span><a aria-hidden="true" href="#cb31-43"></a></span> <span><a aria-hidden="true" href="#cb31-44"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb31-45"></a></span> <span><a aria-hidden="true" href="#cb31-46"></a> <span class="co"># keys</span></span> <span><a aria-hidden="true" href="#cb31-47"></a> pdfworkitem <span class="op">=</span> <span class="st">"pdfworkitem"</span></span> <span><a aria-hidden="true" href="#cb31-48"></a> urn <span class="op">=</span> <span class="st">"urn"</span></span> <span><a aria-hidden="true" href="#cb31-49"></a> title <span class="op">=</span> <span class="st">"title"</span></span> <span><a aria-hidden="true" href="#cb31-50"></a> workpath <span class="op">=</span> <span class="st">"workpath"</span></span> <span><a aria-hidden="true" href="#cb31-51"></a> html_doc <span class="op">=</span> <span class="st">"html_doc"</span></span> <span><a aria-hidden="true" href="#cb31-52"></a> draft <span class="op">=</span> <span class="st">"draft"</span></span> <span><a aria-hidden="true" href="#cb31-53"></a></span> <span><a aria-hidden="true" href="#cb31-54"></a> <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, pattern):</span> <span><a aria-hidden="true" href="#cb31-55"></a> <span class="bu">super</span>().<span class="fu">__init__</span>(pattern)</span> <span><a aria-hidden="true" href="#cb31-56"></a> <span class="va">self</span>.values <span class="op">=</span> {}</span> <span><a aria-hidden="true" href="#cb31-57"></a></span> <span><a aria-hidden="true" 
href="#cb31-58"></a> <span class="kw">def</span> process(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb31-59"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb31-60"></a><span class="co"> Create a PDF file for the HTML.</span></span> <span><a aria-hidden="true" href="#cb31-61"></a></span> <span><a aria-hidden="true" href="#cb31-62"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb31-63"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb31-64"></a><span class="co"> html_doc: Type String of HTML</span></span> <span><a aria-hidden="true" href="#cb31-65"></a></span> <span><a aria-hidden="true" href="#cb31-66"></a><span class="co"> workpath: Type Path, Folder of html file location (planned or factual)</span></span> <span><a aria-hidden="true" href="#cb31-67"></a></span> <span><a aria-hidden="true" href="#cb31-68"></a><span class="co"> Converts the HTML provided as String containing an article with updated</span></span> <span><a aria-hidden="true" href="#cb31-69"></a><span class="co"> publishing date into PDF.</span></span> <span><a aria-hidden="true" href="#cb31-70"></a></span> <span><a aria-hidden="true" href="#cb31-71"></a><span class="co"> It might be a draft for a new plain html or it might be</span></span> <span><a aria-hidden="true" href="#cb31-72"></a><span class="co"> the publishing version with updated publishing date but still</span></span> <span><a aria-hidden="true" href="#cb31-73"></a><span class="co"> without portal injection.</span></span> <span><a aria-hidden="true" href="#cb31-74"></a></span> <span><a aria-hidden="true" href="#cb31-75"></a><span class="co"> This makes no difference for the processing result.</span></span> <span><a aria-hidden="true" href="#cb31-76"></a></span> <span><a aria-hidden="true" href="#cb31-77"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb31-78"></a><span class="co"> 
-------</span></span> <span><a aria-hidden="true" href="#cb31-79"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb31-80"></a></span> <span><a aria-hidden="true" href="#cb31-81"></a><span class="co"> Implementation Notes</span></span> <span><a aria-hidden="true" href="#cb31-82"></a><span class="co"> --------------------</span></span> <span><a aria-hidden="true" href="#cb31-83"></a><span class="co"> The PDF generation fails if pictures in tables are embedded inside</span></span> <span><a aria-hidden="true" href="#cb31-84"></a><span class="co"> of a figure tag. To address this, we have to open the html file,</span></span> <span><a aria-hidden="true" href="#cb31-85"></a><span class="co"> look for figures inside of tables, and remove the figure without</span></span> <span><a aria-hidden="true" href="#cb31-86"></a><span class="co"> removing the figure's content.</span></span> <span><a aria-hidden="true" href="#cb31-87"></a></span> <span><a aria-hidden="true" href="#cb31-88"></a><span class="co"> Then we need to save the result in a temporary file and tell</span></span> <span><a aria-hidden="true" href="#cb31-89"></a><span class="co"> pandoc the correct working directory for the successful resolution</span></span> <span><a aria-hidden="true" href="#cb31-90"></a><span class="co"> of relative paths in href and src entries in the html.</span></span> <span><a aria-hidden="true" href="#cb31-91"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb31-92"></a> html_doc <span class="op">=</span> <span class="va">self</span>.item[PdfWorker.html_doc]</span> <span><a aria-hidden="true" href="#cb31-93"></a> workpath <span class="op">=</span> <span class="va">self</span>.item[PdfWorker.workpath]</span> <span><a aria-hidden="true" href="#cb31-94"></a> draft <span class="op">=</span> <span class="va">self</span>.item[PdfWorker.draft]</span> <span><a aria-hidden="true" href="#cb31-95"></a></span> <span><a aria-hidden="true" 
href="#cb31-96"></a> builder <span class="op">=</span> HTMLParserTreeBuilder()</span> <span><a aria-hidden="true" href="#cb31-97"></a> soup <span class="op">=</span> BeautifulSoup(html_doc, builder<span class="op">=</span>builder)</span> <span><a aria-hidden="true" href="#cb31-98"></a></span> <span><a aria-hidden="true" href="#cb31-99"></a> title <span class="op">=</span> soup.find(<span class="st">"title"</span>)</span> <span><a aria-hidden="true" href="#cb31-100"></a></span> <span><a aria-hidden="true" href="#cb31-101"></a> <span class="va">self</span>.outpath <span class="op">=</span> gmc.pdfpath <span class="op">/</span> (<span class="va">self</span>.item[PdfWorker.urn] <span class="op">+</span> <span class="st">".pdf"</span>)</span> <span><a aria-hidden="true" href="#cb31-102"></a> <span class="va">self</span>.outpath <span class="op">=</span> <span class="va">self</span>.outpath.resolve()</span> <span><a aria-hidden="true" href="#cb31-103"></a> workpath <span class="op">=</span> workpath.resolve()</span> <span><a aria-hidden="true" href="#cb31-104"></a></span> <span><a aria-hidden="true" href="#cb31-105"></a> <span class="cf">if</span> draft:</span> <span><a aria-hidden="true" href="#cb31-106"></a> newtitle <span class="op">=</span> title.text.strip() <span class="op">+</span> <span class="st">" - DRAFT"</span></span> <span><a aria-hidden="true" href="#cb31-107"></a> title.clear()</span> <span><a aria-hidden="true" href="#cb31-108"></a> title.append(newtitle)</span> <span><a aria-hidden="true" href="#cb31-109"></a></span> <span><a aria-hidden="true" href="#cb31-110"></a> <span class="co"># First we need to remove some things.</span></span> <span><a aria-hidden="true" href="#cb31-111"></a></span> <span><a aria-hidden="true" href="#cb31-112"></a> <span class="co"># The article header</span></span> <span><a aria-hidden="true" href="#cb31-113"></a> tag <span class="op">=</span> soup.find(<span class="st">"article"</span>)</span> <span><a aria-hidden="true" 
href="#cb31-114"></a> header <span class="op">=</span> tag.find(<span class="st">"header"</span>)</span> <span><a aria-hidden="true" href="#cb31-115"></a></span> <span><a aria-hidden="true" href="#cb31-116"></a> <span class="co"># tags = header.find_all("figcaption")</span></span> <span><a aria-hidden="true" href="#cb31-117"></a> <span class="co"># for tag in tags:</span></span> <span><a aria-hidden="true" href="#cb31-118"></a> <span class="co"># tag.decompose()</span></span> <span><a aria-hidden="true" href="#cb31-119"></a></span> <span><a aria-hidden="true" href="#cb31-120"></a> tags <span class="op">=</span> header.find_all(<span class="st">"figure"</span>)</span> <span><a aria-hidden="true" href="#cb31-121"></a> <span class="cf">if</span> <span class="bu">len</span>(tags) <span class="op">==</span> <span class="dv">3</span>:</span> <span><a aria-hidden="true" href="#cb31-122"></a> tags[<span class="dv">2</span>].decompose() <span class="co"># remove audio</span></span> <span><a aria-hidden="true" href="#cb31-123"></a> <span class="cf">if</span> <span class="bu">len</span>(tags) <span class="op">&gt;</span> <span class="dv">1</span>:</span> <span><a aria-hidden="true" href="#cb31-124"></a> tags[<span class="dv">1</span>].decompose() <span class="co"># remove PDF Icon in the PDF Version</span></span> <span><a aria-hidden="true" href="#cb31-125"></a> <span class="co"># if len(tags) &gt; 0:</span></span> <span><a aria-hidden="true" href="#cb31-126"></a> <span class="co"># # size the qrcode picture</span></span> <span><a aria-hidden="true" href="#cb31-127"></a> <span class="co"># tag = tags[0].find("img")</span></span> <span><a aria-hidden="true" href="#cb31-128"></a> <span class="co"># tag.attrs.update({</span></span> <span><a aria-hidden="true" href="#cb31-129"></a> <span class="co"># "height": "80px",</span></span> <span><a aria-hidden="true" href="#cb31-130"></a> <span class="co"># "width": "80px"</span></span> <span><a aria-hidden="true" href="#cb31-131"></a> 
<span class="co"># })</span></span> <span><a aria-hidden="true" href="#cb31-132"></a> <span class="co"># tags[0].unwrap()</span></span> <span><a aria-hidden="true" href="#cb31-133"></a></span> <span><a aria-hidden="true" href="#cb31-134"></a> <span class="co"># figures in tables do not work in pandoc</span></span> <span><a aria-hidden="true" href="#cb31-135"></a> <span class="co"># tables = soup.find_all("table")</span></span> <span><a aria-hidden="true" href="#cb31-136"></a> <span class="co"># for table in tables:</span></span> <span><a aria-hidden="true" href="#cb31-137"></a> <span class="co"># tags = table.find_all("figcaption")</span></span> <span><a aria-hidden="true" href="#cb31-138"></a> <span class="co"># for tag in tags:</span></span> <span><a aria-hidden="true" href="#cb31-139"></a> <span class="co"># tag.unwrap()</span></span> <span><a aria-hidden="true" href="#cb31-140"></a> <span class="co"># tags = table.find_all("figure")</span></span> <span><a aria-hidden="true" href="#cb31-141"></a> <span class="co"># for tag in tags:</span></span> <span><a aria-hidden="true" href="#cb31-142"></a> <span class="co"># tag.unwrap()</span></span> <span><a aria-hidden="true" href="#cb31-143"></a></span> <span><a aria-hidden="true" href="#cb31-144"></a> <span class="co"># tables = soup.find_all("table")</span></span> <span><a aria-hidden="true" href="#cb31-145"></a> <span class="co"># for table in tables:</span></span> <span><a aria-hidden="true" href="#cb31-146"></a> <span class="co"># figs = table.find_all("figure")</span></span> <span><a aria-hidden="true" href="#cb31-147"></a> <span class="co"># for fig in figs:</span></span> <span><a aria-hidden="true" href="#cb31-148"></a> <span class="co"># figcap = fig.find("figcaption")</span></span> <span><a aria-hidden="true" href="#cb31-149"></a> <span class="co"># if figcap:</span></span> <span><a aria-hidden="true" href="#cb31-150"></a> <span class="co"># figcap.unwrap()</span></span> <span><a aria-hidden="true" 
href="#cb31-151"></a> <span class="co"># fig.unwrap()</span></span> <span><a aria-hidden="true" href="#cb31-152"></a></span> <span><a aria-hidden="true" href="#cb31-153"></a> <span class="co"># headers = soup.find_all("header")</span></span> <span><a aria-hidden="true" href="#cb31-154"></a> <span class="co"># for header in headers:</span></span> <span><a aria-hidden="true" href="#cb31-155"></a> <span class="co"># figs = header.find_all("figure")</span></span> <span><a aria-hidden="true" href="#cb31-156"></a> <span class="co"># for fig in figs:</span></span> <span><a aria-hidden="true" href="#cb31-157"></a> <span class="co"># figcap = fig.find("figcaption")</span></span> <span><a aria-hidden="true" href="#cb31-158"></a> <span class="co"># if figcap:</span></span> <span><a aria-hidden="true" href="#cb31-159"></a> <span class="co"># figcap.unwrap()</span></span> <span><a aria-hidden="true" href="#cb31-160"></a> <span class="co"># fig.unwrap()</span></span> <span><a aria-hidden="true" href="#cb31-161"></a></span> <span><a aria-hidden="true" href="#cb31-162"></a> <span class="co"># We need to change relative paths to own articles into absolute</span></span> <span><a aria-hidden="true" href="#cb31-163"></a> <span class="co"># paths.</span></span> <span><a aria-hidden="true" href="#cb31-164"></a> rhref <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r"^\.\/"</span>)</span> <span><a aria-hidden="true" href="#cb31-165"></a> anchors <span class="op">=</span> soup.find_all(<span class="st">"a"</span>, href<span class="op">=</span>rhref)</span> <span><a aria-hidden="true" href="#cb31-166"></a> <span class="cf">for</span> anchor <span class="kw">in</span> anchors:</span> <span><a aria-hidden="true" href="#cb31-167"></a> url <span class="op">=</span> rhref.sub(<span class="st">"https://idee.frank-siebert.de/article/"</span>,</span> <span><a aria-hidden="true" href="#cb31-168"></a> anchor[<span class="st">"href"</span>])</span> <span><a 
aria-hidden="true" href="#cb31-169"></a> anchor.attrs.update({<span class="st">"href"</span>: url})</span> <span><a aria-hidden="true" href="#cb31-170"></a></span> <span><a aria-hidden="true" href="#cb31-171"></a> <span class="co"># On paper we need complete written URLs</span></span> <span><a aria-hidden="true" href="#cb31-172"></a> rhref <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r"^http.*"</span>)</span> <span><a aria-hidden="true" href="#cb31-173"></a> tag <span class="op">=</span> soup.find(<span class="st">"section"</span>, class_<span class="op">=</span><span class="st">"footnotes"</span>)</span> <span><a aria-hidden="true" href="#cb31-174"></a></span> <span><a aria-hidden="true" href="#cb31-175"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb31-176"></a> anchors <span class="op">=</span> tag.find_all(<span class="st">"a"</span>, href<span class="op">=</span>rhref)</span> <span><a aria-hidden="true" href="#cb31-177"></a> <span class="cf">for</span> anchor <span class="kw">in</span> anchors:</span> <span><a aria-hidden="true" href="#cb31-178"></a> url <span class="op">=</span> anchor[<span class="st">"href"</span>]</span> <span><a aria-hidden="true" href="#cb31-179"></a> anchor.parent.append(soup.new_tag(<span class="st">"br"</span>))</span> <span><a aria-hidden="true" href="#cb31-180"></a> anchor.parent.append(url)</span> <span><a aria-hidden="true" href="#cb31-181"></a></span> <span><a aria-hidden="true" href="#cb31-182"></a> csspath <span class="op">=</span> Path(<span class="vs">r"/home/frank/projects/idee/website/css/fspdf.css"</span>)</span> <span><a aria-hidden="true" href="#cb31-183"></a> csspath.resolve()</span> <span><a aria-hidden="true" href="#cb31-184"></a> <span class="co"># if csspath.exists():</span></span> <span><a aria-hidden="true" href="#cb31-185"></a> <span class="co"># print("css exists")</span></span> <span><a aria-hidden="true" href="#cb31-186"></a></span> <span><a 
aria-hidden="true" href="#cb31-187"></a> html_doc <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" href="#cb31-188"></a></span> <span><a aria-hidden="true" href="#cb31-189"></a> weasy_html <span class="op">=</span> HTML(string<span class="op">=</span>html_doc, base_url<span class="op">=</span><span class="bu">str</span>(workpath))</span> <span><a aria-hidden="true" href="#cb31-190"></a> weasy_html.write_pdf(target<span class="op">=</span><span class="va">self</span>.outpath,</span> <span><a aria-hidden="true" href="#cb31-191"></a> stylesheets<span class="op">=</span>[CSS(filename<span class="op">=</span><span class="bu">str</span>(csspath))]</span> <span><a aria-hidden="true" href="#cb31-192"></a> )</span> <span><a aria-hidden="true" href="#cb31-193"></a></span> <span><a aria-hidden="true" href="#cb31-194"></a> <span class="co"># subprocess.run(["pandoc",</span></span> <span><a aria-hidden="true" href="#cb31-195"></a> <span class="co"># # mediawiki markup as input format</span></span> <span><a aria-hidden="true" href="#cb31-196"></a> <span class="co"># "-f", "html",</span></span> <span><a aria-hidden="true" href="#cb31-197"></a> <span class="co"># # html as output forma</span></span> <span><a aria-hidden="true" href="#cb31-198"></a> <span class="co"># "-t", "pdf",</span></span> <span><a aria-hidden="true" href="#cb31-199"></a> <span class="co"># # input file</span></span> <span><a aria-hidden="true" href="#cb31-200"></a> <span class="co"># # "-i", inpath,</span></span> <span><a aria-hidden="true" href="#cb31-201"></a> <span class="co"># # output file</span></span> <span><a aria-hidden="true" href="#cb31-202"></a> <span class="co"># "-o", self.outpath,</span></span> <span><a aria-hidden="true" href="#cb31-203"></a> <span class="co"># # "--pdf-engine=xelatex",</span></span> <span><a aria-hidden="true" href="#cb31-204"></a> <span class="co"># "--pdf-engine=weasyprint",</span></span> <span><a aria-hidden="true" href="#cb31-205"></a> <span 
class="co"># "--variable=mainfont:Liberation Sans",</span></span> <span><a aria-hidden="true" href="#cb31-206"></a> <span class="co"># "--variable=sansfont:Liberation Sans",</span></span> <span><a aria-hidden="true" href="#cb31-207"></a> <span class="co"># "--variable=monofont:Liberation Mono",</span></span> <span><a aria-hidden="true" href="#cb31-208"></a> <span class="co"># "--css", csspath,</span></span> <span><a aria-hidden="true" href="#cb31-209"></a> <span class="co"># # "--variable=mainfont:DejaVu Serif",</span></span> <span><a aria-hidden="true" href="#cb31-210"></a> <span class="co"># # "--variable=sansfont:DejaVu Sans",</span></span> <span><a aria-hidden="true" href="#cb31-211"></a> <span class="co"># # "--variable=monofont:DejaVu Sans Mono",</span></span> <span><a aria-hidden="true" href="#cb31-212"></a> <span class="co"># # "--variable=geometry:a4paper",</span></span> <span><a aria-hidden="true" href="#cb31-213"></a> <span class="co"># # "--variable=geometry:margin=2.5cm",</span></span> <span><a aria-hidden="true" href="#cb31-214"></a> <span class="co"># # "--variable=linkcolor:blue"</span></span> <span><a aria-hidden="true" href="#cb31-215"></a> <span class="co"># ],</span></span> <span><a aria-hidden="true" href="#cb31-216"></a> <span class="co"># capture_output=False,</span></span> <span><a aria-hidden="true" href="#cb31-217"></a> <span class="co"># # the correct workdirectory to find the images</span></span> <span><a aria-hidden="true" href="#cb31-218"></a> <span class="co"># cwd=workpath,</span></span> <span><a aria-hidden="true" href="#cb31-219"></a> <span class="co"># # html string as stdin</span></span> <span><a aria-hidden="true" href="#cb31-220"></a> <span class="co"># input=html_doc.encode("utf-8"))</span></span> <span><a aria-hidden="true" href="#cb31-221"></a></span> <span><a aria-hidden="true" href="#cb31-222"></a> <span class="co"># print('wrote file {0}'.format(self.outpath))</span></span> <span><a aria-hidden="true" 
href="#cb31-223"></a> <span class="co"># subprocess.run(["firefox", pdfpath], capture_output=False)</span></span> <span><a aria-hidden="true" href="#cb31-224"></a></span> <span><a aria-hidden="true" href="#cb31-225"></a> <span class="kw">def</span> delete(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb31-226"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb31-227"></a><span class="co"> Delete the generated HTML.</span></span> <span><a aria-hidden="true" href="#cb31-228"></a></span> <span><a aria-hidden="true" href="#cb31-229"></a><span class="co"> Resources used by the HTML need additional care.</span></span> <span><a aria-hidden="true" href="#cb31-230"></a><span class="co"> If the delete was triggered by rename, no resources have to be deleted.</span></span> <span><a aria-hidden="true" href="#cb31-231"></a><span class="co"> If it was triggered by a delete, a check is required,</span></span> <span><a aria-hidden="true" href="#cb31-232"></a><span class="co"> whether the resources are used by other pages as well.</span></span> <span><a aria-hidden="true" href="#cb31-233"></a><span class="co"> But resources are placed in the final website location anyhow.</span></span> <span><a aria-hidden="true" href="#cb31-234"></a><span class="co"> They must not be deleted by the MwWorker.</span></span> <span><a aria-hidden="true" href="#cb31-235"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb31-236"></a></span> <span><a aria-hidden="true" href="#cb31-237"></a> <span class="at">@staticmethod</span></span> <span><a aria-hidden="true" href="#cb31-238"></a> <span class="kw">def</span> make_pdf_worklist_item(urn, html_doc, workpath, task_type,</span> <span><a aria-hidden="true" href="#cb31-239"></a> draft<span class="op">=</span><span class="va">False</span>):</span> <span><a aria-hidden="true" href="#cb31-240"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" 
href="#cb31-241"></a><span class="co"> Create a worklist item for the PdfWorker.</span></span> <span><a aria-hidden="true" href="#cb31-242"></a></span> <span><a aria-hidden="true" href="#cb31-243"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb31-244"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb31-245"></a><span class="co"> title : str</span></span> <span><a aria-hidden="true" href="#cb31-246"></a><span class="co"> Title of the article</span></span> <span><a aria-hidden="true" href="#cb31-247"></a><span class="co"> urn: str</span></span> <span><a aria-hidden="true" href="#cb31-248"></a><span class="co"> The unique resource name, also stem of the related files</span></span> <span><a aria-hidden="true" href="#cb31-249"></a><span class="co"> html_doc : str</span></span> <span><a aria-hidden="true" href="#cb31-250"></a><span class="co"> The generated HTML to transform.</span></span> <span><a aria-hidden="true" href="#cb31-251"></a><span class="co"> workpath : TYPE</span></span> <span><a aria-hidden="true" href="#cb31-252"></a><span class="co"> Where to work to have the relative links right.</span></span> <span><a aria-hidden="true" href="#cb31-253"></a><span class="co"> task_type : str, optional</span></span> <span><a aria-hidden="true" href="#cb31-254"></a><span class="co"> One of MsgWorker.task_*</span></span> <span><a aria-hidden="true" href="#cb31-255"></a><span class="co"> draft : TYPE, optional</span></span> <span><a aria-hidden="true" href="#cb31-256"></a><span class="co"> Flag whether this is a PDF draft work item. 
The default is False.</span></span> <span><a aria-hidden="true" href="#cb31-257"></a></span> <span><a aria-hidden="true" href="#cb31-258"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb31-259"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb31-260"></a><span class="co"> dict, the worklist item to place into the worklist.</span></span> <span><a aria-hidden="true" href="#cb31-261"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb31-262"></a> <span class="cf">return</span> {</span> <span><a aria-hidden="true" href="#cb31-263"></a> MsgWorker.task_worker_match: PdfWorker.pdfworkitem,</span> <span><a aria-hidden="true" href="#cb31-264"></a> PdfWorker.urn: urn,</span> <span><a aria-hidden="true" href="#cb31-265"></a> PdfWorker.html_doc: html_doc,</span> <span><a aria-hidden="true" href="#cb31-266"></a> PdfWorker.workpath: workpath,</span> <span><a aria-hidden="true" href="#cb31-267"></a> MsgWorker.task_type: task_type,</span> <span><a aria-hidden="true" href="#cb31-268"></a> PdfWorker.draft: draft</span> <span><a aria-hidden="true" href="#cb31-269"></a> }</span> <span><a aria-hidden="true" href="#cb31-270"></a></span> <span><a aria-hidden="true" href="#cb31-271"></a></span> <span><a aria-hidden="true" href="#cb31-272"></a><span class="cf">if</span> <span class="va">__name__</span> <span class="op">==</span> <span class="st">"__main__"</span>:</span> <span><a aria-hidden="true" href="#cb31-273"></a> <span class="cf">pass</span></span></code></pre> </div> <h4> The PDF Style Sheet </h4> <p> <strong> ~/projects/idee/website/css/fspdf.css </strong> </p> <div class="sourceCode"> <pre class="sourceCode CSS"><code class="sourceCode css"><span><a aria-hidden="true" href="#cb32-1"></a><span class="co">/* ***************************************************************************</span></span> <span><a aria-hidden="true" href="#cb32-2"></a><span class="co"> * Frank Siebert's PDF CSS </span></span> <span><a aria-hidden="true" 
href="#cb32-3"></a><span class="co"> +</span></span> <span><a aria-hidden="true" href="#cb32-4"></a><span class="co"> * Licence: CC0 </span></span> <span><a aria-hidden="true" href="#cb32-5"></a><span class="co"> * httpx://frank-siebert.de/article/creative-commons-cc0-1-0-universal.html </span></span> <span><a aria-hidden="true" href="#cb32-6"></a><span class="co"> * ***************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb32-7"></a></span> <span><a aria-hidden="true" href="#cb32-8"></a>html {</span> <span><a aria-hidden="true" href="#cb32-9"></a> <span class="kw">font-family</span>: Liberation Sans<span class="op">,</span> <span class="dv">sans-serif</span> <span class="at">!important</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb32-10"></a> <span class="kw">font</span>: <span class="dv">12</span><span class="dt">px</span>/<span class="dv">1.4</span> Liberation Sans<span class="op">,</span> <span class="dv">sans-serif</span> <span class="at">!important</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb32-11"></a> <span class="kw">background-color</span>: <span class="cn">#ffffff</span> <span class="at">!important</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb32-12"></a>}</span> <span><a aria-hidden="true" href="#cb32-13"></a></span> <span><a aria-hidden="true" href="#cb32-14"></a><span class="im">@page</span> {</span> <span><a aria-hidden="true" href="#cb32-15"></a> <span class="kw">size</span>: A4<span class="op">;</span> <span class="co">/* Change from the default size of A4 */</span></span> <span><a aria-hidden="true" href="#cb32-16"></a> <span class="kw">margin</span>: <span class="dv">1.5</span><span class="dt">cm</span><span class="op">;</span> <span class="co">/* Set margin on each page */</span></span> <span><a aria-hidden="true" href="#cb32-17"></a></span> <span><a aria-hidden="true" 
href="#cb32-18"></a> <span class="im">@top-right</span> {</span> <span><a aria-hidden="true" href="#cb32-19"></a> <span class="kw">content</span>: counter<span class="fu">(</span>page<span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb32-20"></a> <span class="kw">color</span>: <span class="cn">#006080</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb32-21"></a> <span class="kw">font-size</span>: <span class="dv">1.2</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb32-22"></a> }</span> <span><a aria-hidden="true" href="#cb32-23"></a></span> <span><a aria-hidden="true" href="#cb32-24"></a> <span class="im">@top-left</span> {</span> <span><a aria-hidden="true" href="#cb32-25"></a> <span class="kw">content</span>: string<span class="fu">(</span>pageheader<span class="fu">)</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb32-26"></a> <span class="kw">color</span>: <span class="cn">#006080</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb32-27"></a> <span class="kw">font-size</span>: <span class="dv">1.2</span><span class="dt">em</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb32-28"></a> }</span> <span><a aria-hidden="true" href="#cb32-29"></a>}</span> <span><a aria-hidden="true" href="#cb32-30"></a></span> <span><a aria-hidden="true" href="#cb32-31"></a>header h1 {</span> <span><a aria-hidden="true" href="#cb32-32"></a> <span class="kw">string-set</span>: pageheader content<span class="fu">()</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb32-33"></a>}</span> <span><a aria-hidden="true" href="#cb32-34"></a></span> <span><a aria-hidden="true" href="#cb32-35"></a>article header div figure img { <span class="kw">width</span>: <span class="dv">150</span><span class="dt">px</span> <span class="at">!important</span><span class="op">;</span> 
}</span> <span><a aria-hidden="true" href="#cb32-36"></a></span> <span><a aria-hidden="true" href="#cb32-37"></a><span class="co">/* **************************************************************************</span></span> <span><a aria-hidden="true" href="#cb32-38"></a><span class="co"> * ==== syntaxhighlight ====</span></span> <span><a aria-hidden="true" href="#cb32-39"></a><span class="co"> * **************************************************************************/</span></span> <span><a aria-hidden="true" href="#cb32-40"></a></span> <span><a aria-hidden="true" href="#cb32-41"></a><span class="co">/* Allow only intentional line breaks in source code */</span></span> <span><a aria-hidden="true" href="#cb32-42"></a>pre <span class="op">&gt;</span> code<span class="fu">.sourceCode</span> <span class="op">&gt;</span> span {</span> <span><a aria-hidden="true" href="#cb32-43"></a> <span class="kw">white-space</span>: <span class="dv">nowrap</span> <span class="at">!important</span><span class="op">;</span></span> <span><a aria-hidden="true" href="#cb32-44"></a>}</span> <span><a aria-hidden="true" href="#cb32-45"></a></span> <span><a aria-hidden="true" href="#cb32-46"></a>pre<span class="fu">.sourceCode</span> {</span> <span><a aria-hidden="true" href="#cb32-47"></a> <span class="kw">width</span>: <span class="dv">80</span><span class="dt">ch</span> <span class="at">!important</span><span class="op">;</span> <span class="co">/* classic terminal width for code sections */</span></span> <span><a aria-hidden="true" href="#cb32-48"></a>}</span></code></pre> </div> <h4> HTML to PDF Recapitulation </h4> <p> At this point of the implementation, the PdfWorker-related parts no longer need to be commented out, and you can request the generation of a draft PDF in the commit message when you commit a new MediaWiki file. </p> <p> Note that this is only a chapter in a much longer description. 
</p> <h3> Plain HTML to Portal Page Conversion </h3> <p> We have the HTML and we have a PDF, so it is time to create an article page that is ready to be used as a portal page. </p> <p> The portal page contains: </p> <ul class="incremental"> <li> A QR code pointing to its own URL </li> <li> A PDF <ul class="incremental"> <li> For low-content pages, PDF generation can be suppressed. </li> </ul> </li> <li> License information <ul class="incremental"> <li> For low-content pages, the license information can be suppressed. </li> </ul> </li> <li> Audio controls, if an audio file was created </li> <li> The portal header </li> </ul> <p> The audio is not generated; it needs to be recorded and saved in the folder <strong> ~/projects/idee/website/audio/ </strong> with the same filename as computed for the plain HTML file, but with the extension mp3. </p> <p> Note to myself: Consider allowing ogg as an alternative extension. </p> <p> The license information is given by license icons that link to an article about the license. Apart from the code that places the icons and links them to the article, this part is mainly content. </p> <p> The only missing pieces are the QR code generator, which already exists as a ready-to-use Python module, and the portal header, which still has to be integrated. </p> <p> The original plan was to copy the portal HTML fragment into each article HTML file, because HTML itself does not support includes, not even under the same-origin policy. Luckily, I discovered that the web server nginx supports such includes on the server side. The corresponding include instruction is already part of the plain HTML version. </p> <h4> Portal Header </h4> <p> The Portal Header is an HTML fragment file. 
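</p> <p> As a minimal sketch of how such a server-side include can be switched on, assuming nginx is built with its SSI module (<code>ngx_http_ssi_module</code>): the <code>ssi on;</code> directive makes nginx process include commands in the HTML it delivers. The document root, the <code>/article/</code> location and the URL path of the fragment are assumptions chosen for illustration and are not taken from the site's actual configuration. </p>
<div class="sourceCode"> <pre class="sourceCode"><code class="sourceCode"># nginx server block (sketch, hypothetical paths)
server {
    listen 80;
    server_name idee.frank-siebert.de;

    # Assumed document root: the generated website folder
    root /var/www/idee/website;

    location /article/ {
        # Process SSI commands in responses from this location
        # (by default only text/html responses are processed)
        ssi on;
    }
}</code></pre> </div>
<p> With a configuration along these lines, a command such as <code>&lt;!--# include virtual="/portal/idee-portal.html" --&gt;</code> in an article page would be replaced by the content of the fragment when nginx delivers the page, so the header only has to be maintained in one file. 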
</p> <p> <strong> ~/projects/idee/website/portal/idee-portal.html </strong> </p> <div class="sourceCode"> <pre class="sourceCode HTML"><code class="sourceCode html"><span><a aria-hidden="true" href="#cb33-1"></a> <span class="kw">&lt;header&gt;</span></span> <span><a aria-hidden="true" href="#cb33-2"></a> <span class="kw">&lt;figure&gt;</span></span> <span><a aria-hidden="true" href="#cb33-3"></a> <span class="kw">&lt;a</span><span class="ot"> href=</span><span class="st">"/idee-index.html"</span><span class="ot"> alt=</span><span class="st">"Home"</span><span class="ot"> tabindex=</span><span class="st">"1"</span><span class="kw">&gt;</span></span> <span><a aria-hidden="true" href="#cb33-4"></a> <span class="kw">&lt;img</span><span class="ot"> src=</span><span class="st">"../image/bookpress.jpg"</span><span class="ot"> alt=</span><span class="st">"Idee der eigenen Erkenntnis"</span> </span> <span><a aria-hidden="true" href="#cb33-5"></a><span class="ot"> srcset=</span><span class="st">"../image/bookpress.jpg 1600w, </span></span> <span><a aria-hidden="true" href="#cb33-6"></a><span class="st"> ../image/bookpress-300x43.jpg 300w, </span></span> <span><a aria-hidden="true" href="#cb33-7"></a><span class="st"> ../image/bookpress-768x110.jpg 768w,</span></span> <span><a aria-hidden="true" href="#cb33-8"></a><span class="st"> ../image/bookpress-1024x147.jpg 1024w, </span></span> <span><a aria-hidden="true" href="#cb33-9"></a><span class="st"> ../image/bookpress-1568x225.jpg 1568w"</span> </span> <span><a aria-hidden="true" href="#cb33-10"></a><span class="ot"> sizes=</span><span class="st">"(max-width: 1600px) 100vw, 1600px"</span><span class="ot"> width=</span><span class="st">"1600"</span><span class="ot"> height=</span><span class="st">"auto"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb33-11"></a> <span class="kw">&lt;/a&gt;</span></span> <span><a aria-hidden="true" href="#cb33-12"></a> <span 
class="kw">&lt;figcaption&gt;</span></span> <span><a aria-hidden="true" href="#cb33-13"></a> Idee der eigenen Erkenntnis</span> <span><a aria-hidden="true" href="#cb33-14"></a> <span class="kw">&lt;/figcaption&gt;</span></span> <span><a aria-hidden="true" href="#cb33-15"></a> <span class="kw">&lt;/figure&gt;</span></span> <span><a aria-hidden="true" href="#cb33-16"></a> <span class="kw">&lt;nav&gt;</span></span> <span><a aria-hidden="true" href="#cb33-17"></a> <span class="kw">&lt;form</span><span class="ot"> action=</span><span class="st">"../yacysearch.html"</span><span class="ot"> accept-charset=</span><span class="st">"UTF-8"</span><span class="ot"> method=</span><span class="st">"get"</span><span class="kw">&gt;</span></span> <span><a aria-hidden="true" href="#cb33-18"></a> <span class="kw">&lt;input</span><span class="ot"> type=</span><span class="st">"text"</span><span class="ot"> name=</span><span class="st">"query"</span><span class="ot"> placeholder=</span><span class="st">"Suche.."</span><span class="ot"> maxlength=</span><span class="st">"80"</span></span> <span><a aria-hidden="true" href="#cb33-19"></a><span class="ot"> autocomplete=</span><span class="st">"off"</span><span class="ot"> tabindex=</span><span class="st">"2"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb33-20"></a> <span class="kw">&lt;input</span><span class="ot"> type=</span><span class="st">"hidden"</span><span class="ot"> name=</span><span class="st">"verify"</span><span class="ot"> value=</span><span class="st">"cacheonly"</span> <span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb33-21"></a> <span class="kw">&lt;input</span><span class="ot"> type=</span><span class="st">"hidden"</span><span class="ot"> name=</span><span class="st">"maximumRecords"</span><span class="ot"> value=</span><span class="st">"10"</span> <span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb33-22"></a> <span 
class="kw">&lt;input</span><span class="ot"> type=</span><span class="st">"hidden"</span><span class="ot"> name=</span><span class="st">"meanCount"</span><span class="ot"> value=</span><span class="st">"5"</span> <span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb33-23"></a> <span class="kw">&lt;input</span><span class="ot"> type=</span><span class="st">"hidden"</span><span class="ot"> name=</span><span class="st">"resource"</span><span class="ot"> value=</span><span class="st">"local"</span> <span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb33-24"></a> <span class="kw">&lt;input</span><span class="ot"> type=</span><span class="st">"hidden"</span><span class="ot"> name=</span><span class="st">"urlmaskfilter"</span><span class="ot"> value=</span><span class="st">".*"</span> <span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb33-25"></a> <span class="kw">&lt;input</span><span class="ot"> type=</span><span class="st">"hidden"</span><span class="ot"> name=</span><span class="st">"prefermaskfilter"</span><span class="ot"> value=</span><span class="st">""</span> <span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb33-26"></a> <span class="kw">&lt;input</span><span class="ot"> type=</span><span class="st">"hidden"</span><span class="ot"> name=</span><span class="st">"display"</span><span class="ot"> value=</span><span class="st">"2"</span> <span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb33-27"></a> <span class="kw">&lt;input</span><span class="ot"> type=</span><span class="st">"hidden"</span><span class="ot"> name=</span><span class="st">"nav"</span><span class="ot"> value=</span><span class="st">"all"</span> <span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb33-28"></a> <span class="kw">&lt;input</span><span class="ot"> type=</span><span class="st">"submit"</span><span class="ot"> name=</span><span class="st">"Enter"</span><span 
class="ot"> value=</span><span class="st">"Search"</span><span class="ot"> title=</span><span class="st">"Suche"</span></span> <span><a aria-hidden="true" href="#cb33-29"></a><span class="ot"> alt=</span><span class="st">"Suche"</span><span class="ot"> hidden</span><span class="kw">&gt;</span></span> <span><a aria-hidden="true" href="#cb33-30"></a> <span class="dv">&amp;nbsp;</span></span> <span><a aria-hidden="true" href="#cb33-31"></a> <span class="kw">&lt;/form&gt;</span></span> <span><a aria-hidden="true" href="#cb33-32"></a> <span class="kw">&lt;a</span><span class="ot"> href=</span><span class="st">"../idee-rss.xml"</span><span class="ot"> tabindex=</span><span class="st">"3"</span><span class="kw">&gt;</span></span> <span><a aria-hidden="true" href="#cb33-33"></a> <span class="kw">&lt;img</span><span class="ot"> src=</span><span class="st">"../image/RSS.png"</span><span class="ot"> alt=</span><span class="st">"RSS-Feed"</span><span class="ot"> width=</span><span class="st">"1em"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb33-34"></a> RSS</span> <span><a aria-hidden="true" href="#cb33-35"></a> <span class="kw">&lt;/a&gt;</span></span> <span><a aria-hidden="true" href="#cb33-36"></a> <span class="kw">&lt;a</span><span class="ot"> href=</span><span class="st">"../article/rechtliches.html"</span><span class="ot"> rel=</span><span class="st">"nofollow"</span></span> <span><a aria-hidden="true" href="#cb33-37"></a><span class="ot"> alt=</span><span class="st">"Impressum, Urheberrecht und Datenschutz"</span><span class="ot"> tabindex=</span><span class="st">"4"</span><span class="kw">&gt;</span></span> <span><a aria-hidden="true" href="#cb33-38"></a> <span class="kw">&lt;img</span><span class="ot"> src=</span><span class="st">"../image/Legal.png"</span><span class="ot"> alt=</span><span class="st">"RSS-Feed"</span><span class="ot"> width=</span><span class="st">"1em"</span><span class="kw">/&gt;</span></span> <span><a 
aria-hidden="true" href="#cb33-39"></a> Rechtliches</span> <span><a aria-hidden="true" href="#cb33-40"></a> <span class="kw">&lt;/a&gt;</span></span> <span><a aria-hidden="true" href="#cb33-41"></a> <span class="kw">&lt;a</span><span class="ot"> href=</span><span class="st">"../archive/idee-archive.html"</span></span> <span><a aria-hidden="true" href="#cb33-42"></a><span class="ot"> alt=</span><span class="st">"Archiv"</span><span class="ot"> tabindex=</span><span class="st">"5"</span><span class="kw">&gt;</span></span> <span><a aria-hidden="true" href="#cb33-43"></a> <span class="kw">&lt;img</span><span class="ot"> src=</span><span class="st">"../image/Archive.png"</span><span class="ot"> alt=</span><span class="st">"Archiv"</span><span class="ot"> width=</span><span class="st">"1em"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb33-44"></a> Archiv</span> <span><a aria-hidden="true" href="#cb33-45"></a> <span class="kw">&lt;/a&gt;</span></span> <span><a aria-hidden="true" href="#cb33-46"></a> <span class="kw">&lt;/nav&gt;</span></span> <span><a aria-hidden="true" href="#cb33-47"></a> <span class="kw">&lt;hr/&gt;</span></span> <span><a aria-hidden="true" href="#cb33-48"></a> <span class="kw">&lt;script</span><span class="ot"> src=</span><span class="st">"../js/header.js"</span><span class="ot"> type=</span><span class="st">"text/javascript"</span><span class="ot"> defer</span><span class="kw">&gt;&lt;/script&gt;</span></span> <span><a aria-hidden="true" href="#cb33-49"></a> <span class="kw">&lt;/header&gt;</span></span></code></pre> </div> <h4> Portal Page Generation: The PlainWorker </h4> <p> <strong> ~/projects/idee/generator/plainworker.py </strong> </p> <div class="sourceCode"> <pre class="sourceCode Python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb34-1"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb34-2"></a><span class="co">PlainWorker is derived from the MsgWorker base 
class.</span></span> <span><a aria-hidden="true" href="#cb34-3"></a></span> <span><a aria-hidden="true" href="#cb34-4"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb34-5"></a><span class="co">@website: https://idee.frank-siebert.de</span></span> <span><a aria-hidden="true" href="#cb34-6"></a><span class="co">@license: https://creativecommons.org/publicdomain/zero/1.0/deed.en</span></span> <span><a aria-hidden="true" href="#cb34-7"></a><span class="co">@date: 2022-03-15</span></span> <span><a aria-hidden="true" href="#cb34-8"></a></span> <span><a aria-hidden="true" href="#cb34-9"></a><span class="co">The PlainWorker takes care of *.mediawiki files</span></span> <span><a aria-hidden="true" href="#cb34-10"></a><span class="co">in the author directory, if changes are committed</span></span> <span><a aria-hidden="true" href="#cb34-11"></a><span class="co">for them.</span></span> <span><a aria-hidden="true" href="#cb34-12"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb34-13"></a><span class="im">import</span> re</span> <span><a aria-hidden="true" href="#cb34-14"></a><span class="im">import</span> subprocess</span> <span><a aria-hidden="true" href="#cb34-15"></a><span class="im">import</span> qrcode</span> <span><a aria-hidden="true" href="#cb34-16"></a></span> <span><a aria-hidden="true" href="#cb34-17"></a><span class="im">from</span> bs4 <span class="im">import</span> BeautifulSoup</span> <span><a aria-hidden="true" href="#cb34-18"></a><span class="im">from</span> bs4.builder._htmlparser <span class="im">import</span> HTMLParserTreeBuilder</span> <span><a aria-hidden="true" href="#cb34-19"></a></span> <span><a aria-hidden="true" href="#cb34-20"></a></span> <span><a aria-hidden="true" href="#cb34-21"></a><span class="im">from</span> gitmsgdispatcher <span class="im">import</span> GitMsgDispatcher</span> <span><a aria-hidden="true" href="#cb34-22"></a><span class="im">from</span> 
gitmsgdispatcher <span class="im">import</span> MsgWorker</span> <span><a aria-hidden="true" href="#cb34-23"></a><span class="im">from</span> gitmsgconstants <span class="im">import</span> GitMsgConstants <span class="im">as</span> gmc</span> <span><a aria-hidden="true" href="#cb34-24"></a><span class="im">from</span> pdfworker <span class="im">import</span> PdfWorker</span> <span><a aria-hidden="true" href="#cb34-25"></a><span class="im">from</span> pubmetadata <span class="im">import</span> PubMetaData</span> <span><a aria-hidden="true" href="#cb34-26"></a></span> <span><a aria-hidden="true" href="#cb34-27"></a></span> <span><a aria-hidden="true" href="#cb34-28"></a><span class="kw">class</span> PlainWorker(MsgWorker):</span> <span><a aria-hidden="true" href="#cb34-29"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb34-30"></a><span class="co"> The PlainWorker takes care of *.mediawiki files in the author/ directory.</span></span> <span><a aria-hidden="true" href="#cb34-31"></a></span> <span><a aria-hidden="true" href="#cb34-32"></a><span class="co"> Example of a line that is taken care of:</span></span> <span><a aria-hidden="true" href="#cb34-33"></a><span class="co"> # modified: author/PDF-Icon.mediawiki</span></span> <span><a aria-hidden="true" href="#cb34-34"></a></span> <span><a aria-hidden="true" href="#cb34-35"></a><span class="co"> The line has to be from the git message section:</span></span> <span><a aria-hidden="true" href="#cb34-36"></a><span class="co"> # Changes to be committed:</span></span> <span><a aria-hidden="true" href="#cb34-37"></a></span> <span><a aria-hidden="true" href="#cb34-38"></a><span class="co"> The main output is an HTML file created from the mediawiki file,</span></span> <span><a aria-hidden="true" href="#cb34-39"></a><span class="co"> which is plain (without portal part) and stored in the</span></span> <span><a aria-hidden="true" href="#cb34-40"></a><span class="co"> folder GITROOT/plain/</span></span> 
<span><a aria-hidden="true" href="#cb34-41"></a></span> <span><a aria-hidden="true" href="#cb34-42"></a><span class="co"> A minor output, a PDF, might be requested via the message line:</span></span> <span><a aria-hidden="true" href="#cb34-43"></a><span class="co"> # pdf:draft=true</span></span> <span><a aria-hidden="true" href="#cb34-44"></a></span> <span><a aria-hidden="true" href="#cb34-45"></a><span class="co"> The respective PDF is created from HTML and stored in the folder</span></span> <span><a aria-hidden="true" href="#cb34-46"></a><span class="co"> GITROOT/website/pdf/</span></span> <span><a aria-hidden="true" href="#cb34-47"></a></span> <span><a aria-hidden="true" href="#cb34-48"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb34-49"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb34-50"></a><span class="co"> super: MsgWorker</span></span> <span><a aria-hidden="true" href="#cb34-51"></a><span class="co"> The PlainWorker is derived from the MsgWorker.</span></span> <span><a aria-hidden="true" href="#cb34-52"></a></span> <span><a aria-hidden="true" href="#cb34-53"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb34-54"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb34-55"></a><span class="co"> PlainWorker.</span></span> <span><a aria-hidden="true" href="#cb34-56"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb34-57"></a></span> <span><a aria-hidden="true" href="#cb34-58"></a> portal_header_fragment <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb34-59"></a> licence <span class="op">=</span> <span class="st">"./creative-commons-cc0-1-0-universal.html"</span></span> <span><a aria-hidden="true" href="#cb34-60"></a> ccimg <span class="op">=</span> <span class="st">"../image/CC-Icon.png"</span></span> <span><a aria-hidden="true" 
href="#cb34-61"></a> cc0img <span class="op">=</span> <span class="st">"../image/CC0-Icon.png"</span></span> <span><a aria-hidden="true" href="#cb34-62"></a></span> <span><a aria-hidden="true" href="#cb34-63"></a> <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, pattern):</span> <span><a aria-hidden="true" href="#cb34-64"></a> <span class="bu">super</span>().<span class="fu">__init__</span>(pattern)</span> <span><a aria-hidden="true" href="#cb34-65"></a> <span class="va">self</span>.values <span class="op">=</span> {}</span> <span><a aria-hidden="true" href="#cb34-66"></a></span> <span><a aria-hidden="true" href="#cb34-67"></a> <span class="at">@staticmethod</span></span> <span><a aria-hidden="true" href="#cb34-68"></a> <span class="kw">def</span> __make_qrcode__(stem):</span> <span><a aria-hidden="true" href="#cb34-69"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb34-70"></a><span class="co"> Create a qrcode for the page, whose stem name is provided.</span></span> <span><a aria-hidden="true" href="#cb34-71"></a></span> <span><a aria-hidden="true" href="#cb34-72"></a><span class="co"> The created qrcode is saved in the sites qrcode directory.</span></span> <span><a aria-hidden="true" href="#cb34-73"></a><span class="co"> We create a QR Code for each article, containing its URL</span></span> <span><a aria-hidden="true" href="#cb34-74"></a></span> <span><a aria-hidden="true" href="#cb34-75"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb34-76"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb34-77"></a><span class="co"> stem : String</span></span> <span><a aria-hidden="true" href="#cb34-78"></a></span> <span><a aria-hidden="true" href="#cb34-79"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb34-80"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" 
href="#cb34-81"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb34-82"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb34-83"></a> docurl <span class="op">=</span> gmc.website <span class="op">+</span> <span class="st">"/article/"</span> <span class="op">+</span> stem <span class="op">+</span> <span class="st">".html"</span></span> <span><a aria-hidden="true" href="#cb34-84"></a> image <span class="op">=</span> qrcode.make(data<span class="op">=</span>docurl)</span> <span><a aria-hidden="true" href="#cb34-85"></a> qrpath <span class="op">=</span> gmc.qrpath <span class="op">/</span> stem</span> <span><a aria-hidden="true" href="#cb34-86"></a> qrpath <span class="op">=</span> qrpath.with_suffix(<span class="st">".png"</span>)</span> <span><a aria-hidden="true" href="#cb34-87"></a> qrpath.resolve()</span> <span><a aria-hidden="true" href="#cb34-88"></a> image.save(qrpath)</span> <span><a aria-hidden="true" href="#cb34-89"></a> <span class="bu">print</span>(<span class="st">'wrote file </span><span class="sc">{0}</span><span class="st">'</span>.<span class="bu">format</span>(qrpath))</span> <span><a aria-hidden="true" href="#cb34-90"></a></span> <span><a aria-hidden="true" href="#cb34-91"></a> <span class="at">@staticmethod</span></span> <span><a aria-hidden="true" href="#cb34-92"></a> <span class="kw">def</span> __make_portal_page__(soup, urn, create_pdf):</span> <span><a aria-hidden="true" href="#cb34-93"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb34-94"></a><span class="co"> Inject the portal into prepared HTML.</span></span> <span><a aria-hidden="true" href="#cb34-95"></a></span> <span><a aria-hidden="true" href="#cb34-96"></a><span class="co"> Function:</span></span> <span><a aria-hidden="true" href="#cb34-97"></a><span class="co"> The tag &lt;header&gt; in the context of &lt;body&gt;</span></span> <span><a aria-hidden="true" href="#cb34-98"></a><span class="co"> is 
replaced with the portal header.</span></span> <span><a aria-hidden="true" href="#cb34-99"></a></span> <span><a aria-hidden="true" href="#cb34-100"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb34-101"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb34-102"></a><span class="co"> soup : BeautifulSoup, required</span></span> <span><a aria-hidden="true" href="#cb34-103"></a><span class="co"> HTML page as BeautifulSoup object.</span></span> <span><a aria-hidden="true" href="#cb34-104"></a><span class="co"> urn : str</span></span> <span><a aria-hidden="true" href="#cb34-105"></a><span class="co"> Unique Resource Identifier, also used as stem in related files</span></span> <span><a aria-hidden="true" href="#cb34-106"></a></span> <span><a aria-hidden="true" href="#cb34-107"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb34-108"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb34-109"></a><span class="co"> soup.</span></span> <span><a aria-hidden="true" href="#cb34-110"></a></span> <span><a aria-hidden="true" href="#cb34-111"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb34-112"></a> <span class="co"># include the favicon just behind the css link</span></span> <span><a aria-hidden="true" href="#cb34-113"></a> csslink <span class="op">=</span> soup.find(<span class="st">"link"</span>)</span> <span><a aria-hidden="true" href="#cb34-114"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"link"</span>)</span> <span><a aria-hidden="true" href="#cb34-115"></a> newtag.attrs.update({<span class="st">"rel"</span>: <span class="st">"icon"</span>,</span> <span><a aria-hidden="true" href="#cb34-116"></a> <span class="st">"href"</span>: <span class="vs">r"../image/favicon.ico"</span>,</span> <span><a aria-hidden="true" href="#cb34-117"></a> <span class="st">"type"</span>: <span 
class="st">"image/x-icon"</span></span> <span><a aria-hidden="true" href="#cb34-118"></a> })</span> <span><a aria-hidden="true" href="#cb34-119"></a> csslink.insert_after(newtag)</span> <span><a aria-hidden="true" href="#cb34-120"></a></span> <span><a aria-hidden="true" href="#cb34-121"></a> <span class="co"># inject article artefacts</span></span> <span><a aria-hidden="true" href="#cb34-122"></a> tag <span class="op">=</span> soup.find(<span class="st">"article"</span>)</span> <span><a aria-hidden="true" href="#cb34-123"></a> tag <span class="op">=</span> tag.find(<span class="st">"header"</span>)</span> <span><a aria-hidden="true" href="#cb34-124"></a></span> <span><a aria-hidden="true" href="#cb34-125"></a> headermedia <span class="op">=</span> soup.new_tag(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" href="#cb34-126"></a> tag.append(headermedia)</span> <span><a aria-hidden="true" href="#cb34-127"></a></span> <span><a aria-hidden="true" href="#cb34-128"></a> <span class="co"># Move Article -&gt; Div Artefacts to Article -&gt; Header -&gt; Div</span></span> <span><a aria-hidden="true" href="#cb34-129"></a> tag <span class="op">=</span> soup.find(<span class="st">'article'</span>)</span> <span><a aria-hidden="true" href="#cb34-130"></a> tag <span class="op">=</span> tag.find(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" href="#cb34-131"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb34-132"></a> headermedia.replace_with(tag)</span> <span><a aria-hidden="true" href="#cb34-133"></a> headermedia <span class="op">=</span> tag</span> <span><a aria-hidden="true" href="#cb34-134"></a></span> <span><a aria-hidden="true" href="#cb34-135"></a> tag <span class="op">=</span> soup.find(<span class="st">"article"</span>)</span> <span><a aria-hidden="true" href="#cb34-136"></a> tag <span class="op">=</span> tag.find(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" 
href="#cb34-137"></a></span> <span><a aria-hidden="true" href="#cb34-138"></a> <span class="cf">if</span> create_pdf:</span> <span><a aria-hidden="true" href="#cb34-139"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"figure"</span>)</span> <span><a aria-hidden="true" href="#cb34-140"></a> headermedia.insert(<span class="dv">1</span>, newtag)</span> <span><a aria-hidden="true" href="#cb34-141"></a> tag <span class="op">=</span> newtag</span> <span><a aria-hidden="true" href="#cb34-142"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb34-143"></a> tag.append(newtag)</span> <span><a aria-hidden="true" href="#cb34-144"></a></span> <span><a aria-hidden="true" href="#cb34-145"></a> newtag.attrs.update({<span class="st">"accesskey"</span>: <span class="st">"p"</span>,</span> <span><a aria-hidden="true" href="#cb34-146"></a> <span class="co"># "download": "",</span></span> <span><a aria-hidden="true" href="#cb34-147"></a> <span class="st">"href"</span>: <span class="vs">r"../pdf/"</span> <span class="op">+</span> urn <span class="op">+</span> <span class="st">".pdf"</span>,</span> <span><a aria-hidden="true" href="#cb34-148"></a> <span class="st">"target"</span>: <span class="st">"_blank"</span>,</span> <span><a aria-hidden="true" href="#cb34-149"></a> <span class="st">"type"</span>: <span class="st">"application/pdf"</span></span> <span><a aria-hidden="true" href="#cb34-150"></a> })</span> <span><a aria-hidden="true" href="#cb34-151"></a></span> <span><a aria-hidden="true" href="#cb34-152"></a> <span class="co"># Inject the PDF Icon</span></span> <span><a aria-hidden="true" href="#cb34-153"></a> tag <span class="op">=</span> newtag</span> <span><a aria-hidden="true" href="#cb34-154"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"img"</span>)</span> <span><a aria-hidden="true" href="#cb34-155"></a> tag.append(newtag)</span> <span><a aria-hidden="true" 
href="#cb34-156"></a> newtag.attrs.update({<span class="st">"src"</span>: <span class="st">"../image/"</span> <span class="op">+</span> gmc.pdfimage})</span> <span><a aria-hidden="true" href="#cb34-157"></a></span> <span><a aria-hidden="true" href="#cb34-158"></a> <span class="co"># Inject the Audio Player, if an audio does exist</span></span> <span><a aria-hidden="true" href="#cb34-159"></a> audio <span class="op">=</span> gmc.audiopath <span class="op">/</span> (urn <span class="op">+</span> <span class="st">".mp3"</span>)</span> <span><a aria-hidden="true" href="#cb34-160"></a> audio.resolve()</span> <span><a aria-hidden="true" href="#cb34-161"></a> <span class="cf">if</span> audio.exists():</span> <span><a aria-hidden="true" href="#cb34-162"></a> audio <span class="op">=</span> <span class="vs">r"../audio/"</span> <span class="op">+</span> urn <span class="op">+</span> <span class="st">".mp3"</span></span> <span><a aria-hidden="true" href="#cb34-163"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"figure"</span>)</span> <span><a aria-hidden="true" href="#cb34-164"></a> headermedia.append(newtag)</span> <span><a aria-hidden="true" href="#cb34-165"></a> tag <span class="op">=</span> newtag</span> <span><a aria-hidden="true" href="#cb34-166"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"audio"</span>)</span> <span><a aria-hidden="true" href="#cb34-167"></a> tag.append(newtag)</span> <span><a aria-hidden="true" href="#cb34-168"></a> newtag.attrs.update({<span class="st">"accesskey"</span>: <span class="st">"a"</span>,</span> <span><a aria-hidden="true" href="#cb34-169"></a> <span class="st">"type"</span>: <span class="st">"audio/mp3"</span>,</span> <span><a aria-hidden="true" href="#cb34-170"></a> <span class="st">"preload"</span>: <span class="st">"none"</span>,</span> <span><a aria-hidden="true" href="#cb34-171"></a> <span class="st">"controls"</span>: <span class="st">"true"</span>,</span> <span><a aria-hidden="true" 
href="#cb34-172"></a> <span class="st">"src"</span>: audio})</span> <span><a aria-hidden="true" href="#cb34-173"></a></span> <span><a aria-hidden="true" href="#cb34-174"></a> <span class="co"># Finally, no more additions expected,</span></span> <span><a aria-hidden="true" href="#cb34-175"></a> <span class="co"># We give every anchor a tabindex</span></span> <span><a aria-hidden="true" href="#cb34-176"></a> <span class="co"># 5 (or less) Tabindexes are in the portal header</span></span> <span><a aria-hidden="true" href="#cb34-177"></a> index <span class="op">=</span> <span class="dv">6</span></span> <span><a aria-hidden="true" href="#cb34-178"></a> tags <span class="op">=</span> soup.find_all(re.<span class="bu">compile</span>(<span class="vs">r"^a$|^audio$|^input$"</span>))</span> <span><a aria-hidden="true" href="#cb34-179"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb34-180"></a> tag.attrs.update({<span class="st">"tabindex"</span>: index})</span> <span><a aria-hidden="true" href="#cb34-181"></a> index <span class="op">+=</span> <span class="dv">1</span></span> <span><a aria-hidden="true" href="#cb34-182"></a></span> <span><a aria-hidden="true" href="#cb34-183"></a> <span class="cf">return</span> soup</span> <span><a aria-hidden="true" href="#cb34-184"></a></span> <span><a aria-hidden="true" href="#cb34-185"></a> <span class="kw">def</span> process(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb34-186"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb34-187"></a><span class="co"> Process the plain HTML files into article HTML files.</span></span> <span><a aria-hidden="true" href="#cb34-188"></a></span> <span><a aria-hidden="true" href="#cb34-189"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb34-190"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb34-191"></a><span 
class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb34-192"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb34-193"></a> <span class="co"># inject meta information from commit message</span></span> <span><a aria-hidden="true" href="#cb34-194"></a> <span class="co"># Creates the single instance of PubMetaData</span></span> <span><a aria-hidden="true" href="#cb34-195"></a> PubMetaData(<span class="va">self</span>.dispatcher.parameters.values)</span> <span><a aria-hidden="true" href="#cb34-196"></a></span> <span><a aria-hidden="true" href="#cb34-197"></a> <span class="co"># compose the output path</span></span> <span><a aria-hidden="true" href="#cb34-198"></a> <span class="va">self</span>.outpath <span class="op">=</span> gmc.articlepath <span class="op">/</span> <span class="va">self</span>.inpath.stem</span> <span><a aria-hidden="true" href="#cb34-199"></a> <span class="va">self</span>.outpath <span class="op">=</span> <span class="va">self</span>.outpath.with_suffix(<span class="st">".html"</span>)</span> <span><a aria-hidden="true" href="#cb34-200"></a> <span class="va">self</span>.outpath.resolve()</span> <span><a aria-hidden="true" href="#cb34-201"></a></span> <span><a aria-hidden="true" href="#cb34-202"></a> <span class="co"># The plain html contains a publising date.</span></span> <span><a aria-hidden="true" href="#cb34-203"></a> <span class="co"># But this might be the date the plain html was created,</span></span> <span><a aria-hidden="true" href="#cb34-204"></a> <span class="co"># and not the real publishing date, if no previous publishing</span></span> <span><a aria-hidden="true" href="#cb34-205"></a> <span class="co"># took place.</span></span> <span><a aria-hidden="true" href="#cb34-206"></a> <span class="co"># We need to read the plain html and use the title to search</span></span> <span><a aria-hidden="true" href="#cb34-207"></a> <span class="co"># for a publishing date of previous 
publishings.</span></span> <span><a aria-hidden="true" href="#cb34-208"></a> <span class="co"># If we do not find a previous publishing date, we need</span></span> <span><a aria-hidden="true" href="#cb34-209"></a> <span class="co"># to change the publishing date entries to the current date.</span></span> <span><a aria-hidden="true" href="#cb34-210"></a></span> <span><a aria-hidden="true" href="#cb34-211"></a> <span class="cf">with</span> <span class="bu">open</span>(<span class="va">self</span>.inpath, <span class="st">'r'</span>) <span class="im">as</span> infile:</span> <span><a aria-hidden="true" href="#cb34-212"></a> html_doc <span class="op">=</span> infile.read()</span> <span><a aria-hidden="true" href="#cb34-213"></a> infile.flush()</span> <span><a aria-hidden="true" href="#cb34-214"></a> infile.close()</span> <span><a aria-hidden="true" href="#cb34-215"></a></span> <span><a aria-hidden="true" href="#cb34-216"></a> builder <span class="op">=</span> HTMLParserTreeBuilder()</span> <span><a aria-hidden="true" href="#cb34-217"></a> soup <span class="op">=</span> BeautifulSoup(html_doc, builder<span class="op">=</span>builder)</span> <span><a aria-hidden="true" href="#cb34-218"></a></span> <span><a aria-hidden="true" href="#cb34-219"></a> <span class="co"># Own magic words:</span></span> <span><a aria-hidden="true" href="#cb34-220"></a> <span class="co"># __NOPDF__ Do not create PDF</span></span> <span><a aria-hidden="true" href="#cb34-221"></a> <span class="co"># __NOLIC__ Place no own CC0 license information</span></span> <span><a aria-hidden="true" href="#cb34-222"></a> <span class="co"># Noting but whitespaces and magic word in one line</span></span> <span><a aria-hidden="true" href="#cb34-223"></a> create_pdf <span class="op">=</span> <span class="va">True</span></span> <span><a aria-hidden="true" href="#cb34-224"></a> tag <span class="op">=</span> soup.find(<span class="st">"p"</span>, string<span class="op">=</span>re.<span class="bu">compile</span>(<span 
class="vs">r'^\s*__NOPDF__\s*$'</span>))</span> <span><a aria-hidden="true" href="#cb34-225"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb34-226"></a> create_pdf <span class="op">=</span> <span class="va">False</span></span> <span><a aria-hidden="true" href="#cb34-227"></a> tag.decompose()</span> <span><a aria-hidden="true" href="#cb34-228"></a></span> <span><a aria-hidden="true" href="#cb34-229"></a> show_lic <span class="op">=</span> <span class="va">True</span></span> <span><a aria-hidden="true" href="#cb34-230"></a> tag <span class="op">=</span> soup.find(<span class="st">"p"</span>, string<span class="op">=</span>re.<span class="bu">compile</span>(<span class="vs">r'^\s*__NOLIC__\s*$'</span>))</span> <span><a aria-hidden="true" href="#cb34-231"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb34-232"></a> show_lic <span class="op">=</span> <span class="va">False</span></span> <span><a aria-hidden="true" href="#cb34-233"></a> tag.decompose()</span> <span><a aria-hidden="true" href="#cb34-234"></a></span> <span><a aria-hidden="true" href="#cb34-235"></a> title <span class="op">=</span> soup.find(<span class="st">"title"</span>).text.strip()</span> <span><a aria-hidden="true" href="#cb34-236"></a></span> <span><a aria-hidden="true" href="#cb34-237"></a> article_data <span class="op">=</span> PubMetaData.instance.get_new_revision(</span> <span><a aria-hidden="true" href="#cb34-238"></a> title<span class="op">=</span>title,</span> <span><a aria-hidden="true" href="#cb34-239"></a> urn<span class="op">=</span><span class="va">self</span>.inpath.stem <span class="co"># takes preference before title</span></span> <span><a aria-hidden="true" href="#cb34-240"></a> )</span> <span><a aria-hidden="true" href="#cb34-241"></a></span> <span><a aria-hidden="true" href="#cb34-242"></a> tag <span class="op">=</span> soup.find(<span class="st">"meta"</span>, attrs<span class="op">=</span>{<span 
class="st">"property"</span>: PubMetaData.pubdate})</span> <span><a aria-hidden="true" href="#cb34-243"></a> tag.attrs.update({</span> <span><a aria-hidden="true" href="#cb34-244"></a> <span class="st">"property"</span>: PubMetaData.pubdate,</span> <span><a aria-hidden="true" href="#cb34-245"></a> <span class="st">"content"</span>: article_data[PubMetaData.pubdate]})</span> <span><a aria-hidden="true" href="#cb34-246"></a></span> <span><a aria-hidden="true" href="#cb34-247"></a> tag <span class="op">=</span> soup.find(<span class="st">"time"</span>)</span> <span><a aria-hidden="true" href="#cb34-248"></a> tag.clear()</span> <span><a aria-hidden="true" href="#cb34-249"></a> tag.append(article_data[PubMetaData.pubdate][:<span class="dv">10</span>])</span> <span><a aria-hidden="true" href="#cb34-250"></a> tag.attrs.update({<span class="st">"datetime"</span>: article_data[PubMetaData.pubdate][:<span class="dv">19</span>]})</span> <span><a aria-hidden="true" href="#cb34-251"></a> <span class="co"># probably deprecated by itemprop alternative</span></span> <span><a aria-hidden="true" href="#cb34-252"></a> tag.attrs.update({<span class="st">"pubdate"</span>: <span class="st">"true"</span>})</span> <span><a aria-hidden="true" href="#cb34-253"></a></span> <span><a aria-hidden="true" href="#cb34-254"></a> tag <span class="op">=</span> soup.find(</span> <span><a aria-hidden="true" href="#cb34-255"></a> <span class="st">"meta"</span>, attrs<span class="op">=</span>{<span class="st">"property"</span>: PubMetaData.revdate})</span> <span><a aria-hidden="true" href="#cb34-256"></a> <span class="cf">if</span> <span class="kw">not</span> tag:</span> <span><a aria-hidden="true" href="#cb34-257"></a> <span class="co"># inject the modified_time as meta tag</span></span> <span><a aria-hidden="true" href="#cb34-258"></a> head <span class="op">=</span> soup.find(<span class="st">"head"</span>)</span> <span><a aria-hidden="true" href="#cb34-259"></a> tag <span class="op">=</span> 
soup.new_tag(<span class="st">"meta"</span>)</span> <span><a aria-hidden="true" href="#cb34-260"></a> head.insert(<span class="dv">6</span>, tag)</span> <span><a aria-hidden="true" href="#cb34-261"></a> tag.attrs.update({</span> <span><a aria-hidden="true" href="#cb34-262"></a> <span class="st">"property"</span>: PubMetaData.revdate,</span> <span><a aria-hidden="true" href="#cb34-263"></a> <span class="st">"content"</span>: article_data[PubMetaData.revdate]})</span> <span><a aria-hidden="true" href="#cb34-264"></a></span> <span><a aria-hidden="true" href="#cb34-265"></a> <span class="co"># take care for links</span></span> <span><a aria-hidden="true" href="#cb34-266"></a> <span class="co"># For a start we know, that "../website/" becomes "../".</span></span> <span><a aria-hidden="true" href="#cb34-267"></a> tags <span class="op">=</span> soup.find_all(re.<span class="bu">compile</span>(<span class="st">"link|a"</span>),</span> <span><a aria-hidden="true" href="#cb34-268"></a> attrs<span class="op">=</span>{<span class="st">"href"</span>: re.<span class="bu">compile</span>(<span class="vs">r"../website/"</span>)})</span> <span><a aria-hidden="true" href="#cb34-269"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb34-270"></a> shref <span class="op">=</span> tag[<span class="st">"href"</span>]</span> <span><a aria-hidden="true" href="#cb34-271"></a> shref <span class="op">=</span> shref.replace(<span class="st">"../website/"</span>, <span class="st">"../"</span>)</span> <span><a aria-hidden="true" href="#cb34-272"></a> tag.attrs.update({<span class="st">"href"</span>: shref})</span> <span><a aria-hidden="true" href="#cb34-273"></a></span> <span><a aria-hidden="true" href="#cb34-274"></a> tags <span class="op">=</span> soup.find_all(<span class="st">"img"</span>,</span> <span><a aria-hidden="true" href="#cb34-275"></a> attrs<span class="op">=</span>{<span class="st">"src"</span>: re.<span 
class="bu">compile</span>(<span class="vs">r"../website/"</span>)})</span> <span><a aria-hidden="true" href="#cb34-276"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb34-277"></a> shref <span class="op">=</span> tag[<span class="st">"src"</span>]</span> <span><a aria-hidden="true" href="#cb34-278"></a> shref <span class="op">=</span> shref.replace(<span class="st">"../website/"</span>, <span class="st">"../"</span>)</span> <span><a aria-hidden="true" href="#cb34-279"></a> tag.attrs.update({<span class="st">"src"</span>: shref})</span> <span><a aria-hidden="true" href="#cb34-280"></a></span> <span><a aria-hidden="true" href="#cb34-281"></a> <span class="co"># Insert header div for article artefacts</span></span> <span><a aria-hidden="true" href="#cb34-282"></a> <span class="co"># Embedd it into the article.</span></span> <span><a aria-hidden="true" href="#cb34-283"></a> tag <span class="op">=</span> soup.find(<span class="st">"article"</span>)</span> <span><a aria-hidden="true" href="#cb34-284"></a> headerdiv <span class="op">=</span> soup.new_tag(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" href="#cb34-285"></a> tag.insert(<span class="dv">1</span>, headerdiv)</span> <span><a aria-hidden="true" href="#cb34-286"></a></span> <span><a aria-hidden="true" href="#cb34-287"></a> <span class="co"># Create QR code for the document and the site.</span></span> <span><a aria-hidden="true" href="#cb34-288"></a> <span class="co"># Embedd it into header div.</span></span> <span><a aria-hidden="true" href="#cb34-289"></a> <span class="va">self</span>.__make_qrcode__(<span class="va">self</span>.inpath.stem)</span> <span><a aria-hidden="true" href="#cb34-290"></a> qruri <span class="op">=</span> <span class="st">"../qrcode/"</span> <span class="op">+</span> <span class="va">self</span>.inpath.stem <span class="op">+</span> <span class="st">".png"</span></span> <span><a aria-hidden="true" 
href="#cb34-291"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"figure"</span>)</span> <span><a aria-hidden="true" href="#cb34-292"></a> headerdiv.append(newtag)</span> <span><a aria-hidden="true" href="#cb34-293"></a> tag <span class="op">=</span> newtag</span> <span><a aria-hidden="true" href="#cb34-294"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"figcaption"</span>)</span> <span><a aria-hidden="true" href="#cb34-295"></a> <span class="co"># Decided in the end to get rid of text for the RQ Code</span></span> <span><a aria-hidden="true" href="#cb34-296"></a> <span class="co"># newtag.append(soup.new_string("URL"))</span></span> <span><a aria-hidden="true" href="#cb34-297"></a> tag.insert(<span class="dv">0</span>, newtag)</span> <span><a aria-hidden="true" href="#cb34-298"></a></span> <span><a aria-hidden="true" href="#cb34-299"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb34-300"></a> newtag.attrs.update({<span class="st">"href"</span>: qruri})</span> <span><a aria-hidden="true" href="#cb34-301"></a> tag.insert(<span class="dv">0</span>, newtag)</span> <span><a aria-hidden="true" href="#cb34-302"></a> tag <span class="op">=</span> newtag</span> <span><a aria-hidden="true" href="#cb34-303"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"img"</span>)</span> <span><a aria-hidden="true" href="#cb34-304"></a> newtag.attrs.update({<span class="st">"width"</span>: <span class="st">"150px"</span>, <span class="st">"height"</span>: <span class="st">"150px"</span>})</span> <span><a aria-hidden="true" href="#cb34-305"></a> newtag.attrs.update({<span class="st">"src"</span>: qruri})</span> <span><a aria-hidden="true" href="#cb34-306"></a> newtag.attrs.update({<span class="st">"alt"</span>: <span class="st">"QR Code"</span>})</span> <span><a aria-hidden="true" href="#cb34-307"></a> tag.insert(<span class="dv">0</span>, 
newtag)</span> <span><a aria-hidden="true" href="#cb34-308"></a></span> <span><a aria-hidden="true" href="#cb34-309"></a> <span class="cf">if</span> show_lic:</span> <span><a aria-hidden="true" href="#cb34-310"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb34-311"></a> headerdiv.append(newtag)</span> <span><a aria-hidden="true" href="#cb34-312"></a> newtag.attrs.update({<span class="st">"href"</span>: PlainWorker.licence})</span> <span><a aria-hidden="true" href="#cb34-313"></a> tag <span class="op">=</span> newtag</span> <span><a aria-hidden="true" href="#cb34-314"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"img"</span>)</span> <span><a aria-hidden="true" href="#cb34-315"></a> <span class="co"># The following scaling is for the PDF</span></span> <span><a aria-hidden="true" href="#cb34-316"></a> <span class="co"># In the browser the CSS overwrites this scaling:</span></span> <span><a aria-hidden="true" href="#cb34-317"></a> newtag.attrs.update({<span class="st">"width"</span>: <span class="st">"28px"</span>, <span class="st">"height"</span>: <span class="st">"28px"</span>})</span> <span><a aria-hidden="true" href="#cb34-318"></a> newtag.attrs.update({<span class="st">"src"</span>: PlainWorker.ccimg})</span> <span><a aria-hidden="true" href="#cb34-319"></a> newtag.attrs.update({<span class="st">"alt"</span>: <span class="st">"Creative Commons"</span>})</span> <span><a aria-hidden="true" href="#cb34-320"></a> tag.insert(<span class="dv">0</span>, newtag)</span> <span><a aria-hidden="true" href="#cb34-321"></a></span> <span><a aria-hidden="true" href="#cb34-322"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb34-323"></a> headerdiv.append(newtag)</span> <span><a aria-hidden="true" href="#cb34-324"></a> newtag.attrs.update({<span class="st">"href"</span>: PlainWorker.licence})</span> <span><a 
aria-hidden="true" href="#cb34-325"></a> tag <span class="op">=</span> newtag</span> <span><a aria-hidden="true" href="#cb34-326"></a> newtag <span class="op">=</span> soup.new_tag(<span class="st">"img"</span>)</span> <span><a aria-hidden="true" href="#cb34-327"></a> <span class="co"># The following scaling is for the PDF</span></span> <span><a aria-hidden="true" href="#cb34-328"></a> <span class="co"># In the browser the CSS overwrites this scaling:</span></span> <span><a aria-hidden="true" href="#cb34-329"></a> newtag.attrs.update({<span class="st">"width"</span>: <span class="st">"28px"</span>, <span class="st">"height"</span>: <span class="st">"28px"</span>})</span> <span><a aria-hidden="true" href="#cb34-330"></a> newtag.attrs.update({<span class="st">"src"</span>: PlainWorker.cc0img})</span> <span><a aria-hidden="true" href="#cb34-331"></a> newtag.attrs.update({<span class="st">"alt"</span>: <span class="st">"Zero"</span>})</span> <span><a aria-hidden="true" href="#cb34-332"></a> tag.insert(<span class="dv">0</span>, newtag)</span> <span><a aria-hidden="true" href="#cb34-333"></a></span> <span><a aria-hidden="true" href="#cb34-334"></a> <span class="co"># Make a portal page from the html</span></span> <span><a aria-hidden="true" href="#cb34-335"></a> soup <span class="op">=</span> <span class="va">self</span>.__make_portal_page__(soup, <span class="va">self</span>.inpath.stem, create_pdf)</span> <span><a aria-hidden="true" href="#cb34-336"></a> html_doc <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" href="#cb34-337"></a></span> <span><a aria-hidden="true" href="#cb34-338"></a> <span class="co"># Save the article.</span></span> <span><a aria-hidden="true" href="#cb34-339"></a> <span class="cf">with</span> <span class="bu">open</span>(<span class="va">self</span>.outpath, <span class="st">'w'</span>) <span class="im">as</span> outfile:</span> <span><a aria-hidden="true" href="#cb34-340"></a> <span class="bu">print</span>(html_doc, 
<span class="bu">file</span><span class="op">=</span>outfile)</span> <span><a aria-hidden="true" href="#cb34-341"></a> outfile.flush()</span> <span><a aria-hidden="true" href="#cb34-342"></a> outfile.close()</span> <span><a aria-hidden="true" href="#cb34-343"></a> <span class="bu">print</span>(<span class="st">'wrote file </span><span class="sc">{0}</span><span class="st">'</span>.<span class="bu">format</span>(<span class="va">self</span>.outpath))</span> <span><a aria-hidden="true" href="#cb34-344"></a> subprocess.run([<span class="st">"firefox"</span>, <span class="va">self</span>.outpath], capture_output<span class="op">=</span><span class="va">False</span>)</span> <span><a aria-hidden="true" href="#cb34-345"></a></span> <span><a aria-hidden="true" href="#cb34-346"></a> <span class="co"># Flag a metadata update</span></span> <span><a aria-hidden="true" href="#cb34-347"></a> PubMetaData.instance.update(article_data)</span> <span><a aria-hidden="true" href="#cb34-348"></a></span> <span><a aria-hidden="true" href="#cb34-349"></a> <span class="cf">if</span> create_pdf:</span> <span><a aria-hidden="true" href="#cb34-350"></a> <span class="co"># Placing a worklist item for the PdfWorker</span></span> <span><a aria-hidden="true" href="#cb34-351"></a> <span class="va">self</span>.dispatcher.worklist.append(</span> <span><a aria-hidden="true" href="#cb34-352"></a> PdfWorker.make_pdf_worklist_item(</span> <span><a aria-hidden="true" href="#cb34-353"></a> article_data.name,</span> <span><a aria-hidden="true" href="#cb34-354"></a> html_doc,</span> <span><a aria-hidden="true" href="#cb34-355"></a> gmc.articlepath,</span> <span><a aria-hidden="true" href="#cb34-356"></a> MsgWorker.task_create,</span> <span><a aria-hidden="true" href="#cb34-357"></a> draft<span class="op">=</span><span class="va">False</span></span> <span><a aria-hidden="true" href="#cb34-358"></a> )</span> <span><a aria-hidden="true" href="#cb34-359"></a> )</span> <span><a aria-hidden="true" 
href="#cb34-360"></a></span> <span><a aria-hidden="true" href="#cb34-361"></a> <span class="kw">def</span> delete(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb34-362"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb34-363"></a><span class="co"> Delete the generated HTML.</span></span> <span><a aria-hidden="true" href="#cb34-364"></a></span> <span><a aria-hidden="true" href="#cb34-365"></a><span class="co"> Resources used by the HTML need additional care.</span></span> <span><a aria-hidden="true" href="#cb34-366"></a><span class="co"> If the delete was triggered by rename, no resources have to be deleted.</span></span> <span><a aria-hidden="true" href="#cb34-367"></a><span class="co"> If it was triggered by a delete, a check is required,</span></span> <span><a aria-hidden="true" href="#cb34-368"></a><span class="co"> whether the resources are used by other pages as well.</span></span> <span><a aria-hidden="true" href="#cb34-369"></a><span class="co"> But resources are place anyhow in the final website location.</span></span> <span><a aria-hidden="true" href="#cb34-370"></a><span class="co"> They must not be deleted by the PlainWorker.</span></span> <span><a aria-hidden="true" href="#cb34-371"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb34-372"></a></span> <span><a aria-hidden="true" href="#cb34-373"></a></span> <span><a aria-hidden="true" href="#cb34-374"></a><span class="cf">if</span> <span class="va">__name__</span> <span class="op">==</span> <span class="st">"__main__"</span>:</span> <span><a aria-hidden="true" href="#cb34-375"></a> <span class="im">from</span> mwworker <span class="im">import</span> MwWorker</span> <span><a aria-hidden="true" href="#cb34-376"></a> <span class="bu">print</span>(<span class="st">"Running Test-Cases"</span>)</span> <span><a aria-hidden="true" href="#cb34-377"></a></span> <span><a aria-hidden="true" href="#cb34-378"></a> mwworker <span 
class="op">=</span> MwWorker(<span class="vs">r".*modified.*author[/].*\.mediawiki"</span>)</span> <span><a aria-hidden="true" href="#cb34-379"></a> plainworker <span class="op">=</span> PlainWorker(<span class="vs">r".*(?:modified|new file).*plain[/].*\.html"</span>)</span> <span><a aria-hidden="true" href="#cb34-380"></a> pdfworker <span class="op">=</span> PdfWorker(<span class="vs">r""</span> <span class="op">+</span> PdfWorker.pdfworkitem)</span> <span><a aria-hidden="true" href="#cb34-381"></a></span> <span><a aria-hidden="true" href="#cb34-382"></a> <span class="co"># MESSAGEFILE = "test/PDF-Icon-TestCase-2"</span></span> <span><a aria-hidden="true" href="#cb34-383"></a> <span class="co"># MESSAGEFILE = "test/cc-plain-testcase"</span></span> <span><a aria-hidden="true" href="#cb34-384"></a> <span class="co"># MESSAGEFILE = "test/englands-gesamttodesraten-TestCase-2"</span></span> <span><a aria-hidden="true" href="#cb34-385"></a> <span class="co"># MESSAGEFILE = "test/endlich-TestCase-2"</span></span> <span><a aria-hidden="true" href="#cb34-386"></a> <span class="co"># MESSAGEFILE = "test/ich-denke-TestCase-2"</span></span> <span><a aria-hidden="true" href="#cb34-387"></a> <span class="co"># MESSAGEFILE = "test/astravacz-TestCase-2"</span></span> <span><a aria-hidden="true" href="#cb34-388"></a> MESSAGEFILE <span class="op">=</span> <span class="st">"test/allesaufdentisch-TestCase-2"</span></span> <span><a aria-hidden="true" href="#cb34-389"></a> disp <span class="op">=</span> GitMsgDispatcher(MESSAGEFILE, [mwworker, plainworker, pdfworker])</span></code></pre> </div> <h4> Portal Page Conversion Recapitulation </h4> <p> With this part of the code in place, we can create the final article HTML. We can also view it in the browser with its QR code, PDF link, license information and audio player. Thanks to relative paths, everything works in a locally viewed HTML file. But to see it as a portal page, we need to set up nginx to perform the include. 
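</p> <p> Why the relative paths work in both places can be checked with a short sketch. It resolves the PDF link the way a browser would, once against the published article URL and once against a locally opened file. The helper name resolve_pdf_link and the local path are hypothetical; the site URL and the "../pdf/" prefix are taken from the article. </p>

```python
from urllib.parse import urljoin


def resolve_pdf_link(article_url, urn):
    """Resolve the relative PDF href of an article against the article's own URL."""
    # Same shape as the href built in the portal page code: "../pdf/" + urn + ".pdf"
    return urljoin(article_url, "../pdf/" + urn + ".pdf")


urn = "vim-plugin-for-web-publishing"

# Published article on the web server
print(resolve_pdf_link("https://idee.frank-siebert.de/article/" + urn + ".html", urn))

# Locally viewed file (hypothetical checkout location, an assumption for illustration)
print(resolve_pdf_link("file:///home/user/idee/website/article/" + urn + ".html", urn))
```

<p> Because the link only climbs one level up from the article folder, it does not matter where the site root itself is located. 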
</p> <p> To trigger this conversion, another git add and git commit sequence is required. This makes sense, since the scenario treats the plain HTML version as the base for copy-editing and audio recording. </p> <h3> Idee Website Server Setup </h3> <h4> User and Group git </h4> <p> A user named git is used, and the server git repository resides in /home/git/idee.git/. </p> <h4> Create git </h4> <p> The following command creates an empty git repository without a working directory (--bare), which is supposed to be shared between multiple users (--share=group, which git accepts as an abbreviation of --shared=group). </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb35-1"></a><span class="ex">git@sol</span>:~$ git init --bare --share=group idee.git</span></code></pre> </div> <p> Git initializes the folders with the setgid group permission flag, so that files and folders created further down the directory tree inherit the group. </p> <h4> Push from client git </h4> <p> Since I started without a server git, I need to connect my client git with the server. I did this by changing the config file in the client's .git/ directory, providing information about the remote "origin". 
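</p> <p> The same entries can presumably also be written with git's own commands instead of a text editor. The following is a minimal sketch that only assembles the command lines and does not run them. The helper name remote_setup_commands is hypothetical; the URL and the branch name are the ones used in this setup. </p>

```python
def remote_setup_commands(url, remote="origin", branch="master"):
    """Return git command lines that write the remote and branch entries."""
    return [
        # Writes the url and fetch lines of the [remote "origin"] section
        ["git", "remote", "add", remote, url],
        # Writes the remote and merge lines of the [branch "master"] section
        ["git", "config", "branch." + branch + ".remote", remote],
        ["git", "config", "branch." + branch + ".merge", "refs/heads/" + branch],
    ]


for command in remote_setup_commands("ssh://git@sol/home/git/idee.git"):
    print(" ".join(command))
```

<p> Run inside the client repository, these three commands should leave the same remote and branch sections as the manual edit. 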
</p> <h5> .git/config </h5> <div class="sourceCode"> <pre class="sourceCode ini"><code class="sourceCode ini"><span><a aria-hidden="true" href="#cb36-1"></a><span class="kw">[core]</span></span> <span><a aria-hidden="true" href="#cb36-2"></a><span class="dt"> repositoryformatversion </span><span class="ot">=</span><span class="st"> </span><span class="dv">0</span></span> <span><a aria-hidden="true" href="#cb36-3"></a><span class="dt"> filemode </span><span class="ot">=</span><span class="st"> </span><span class="kw">true</span></span> <span><a aria-hidden="true" href="#cb36-4"></a><span class="dt"> bare </span><span class="ot">=</span><span class="st"> </span><span class="kw">false</span></span> <span><a aria-hidden="true" href="#cb36-5"></a><span class="dt"> logallrefupdates </span><span class="ot">=</span><span class="st"> </span><span class="kw">true</span></span> <span><a aria-hidden="true" href="#cb36-6"></a><span class="dt"> hooksPath </span><span class="ot">=</span><span class="st"> ./config/hooks</span></span> <span><a aria-hidden="true" href="#cb36-7"></a><span class="dt"> quotepath </span><span class="ot">=</span><span class="st"> </span><span class="kw">off</span></span> <span><a aria-hidden="true" href="#cb36-8"></a><span class="kw">[remote "origin"]</span></span> <span><a aria-hidden="true" href="#cb36-9"></a><span class="dt"> url </span><span class="ot">=</span><span class="st"> ssh://git@sol/home/git/idee.git</span></span> <span><a aria-hidden="true" href="#cb36-10"></a><span class="dt"> fetch </span><span class="ot">=</span><span class="st"> +refs/heads/*:refs/remotes/origin/*</span></span> <span><a aria-hidden="true" href="#cb36-11"></a><span class="kw">[branch "master"]</span></span> <span><a aria-hidden="true" href="#cb36-12"></a><span class="dt"> remote </span><span class="ot">=</span><span class="st"> origin</span></span> <span><a aria-hidden="true" href="#cb36-13"></a><span class="dt"> merge </span><span class="ot">=</span><span class="st"> 
refs/heads/master</span></span> <span><a aria-hidden="true" href="#cb36-14"></a><span class="kw">[commit]</span></span> <span><a aria-hidden="true" href="#cb36-15"></a><span class="dt"> template </span><span class="ot">=</span><span class="st"> ./config/commit-message</span></span> <span><a aria-hidden="true" href="#cb36-16"></a><span class="kw">[status]</span></span> <span><a aria-hidden="true" href="#cb36-17"></a><span class="dt"> relativePaths </span><span class="ot">=</span><span class="st"> </span><span class="kw">false</span></span></code></pre> </div> <p> As can be seen, I use the user git for ssh access. </p> <h4> initial push </h4> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb37-1"></a><span class="ex">frank</span> @Asimov:~/projects/idee$ git push</span> <span><a aria-hidden="true" href="#cb37-2"></a><span class="ex">Enter</span> passphrase for key <span class="st">'/home/frank/.ssh/id_rsa'</span>: </span> <span><a aria-hidden="true" href="#cb37-3"></a><span class="ex">Enumerating</span> objects: 1156, done.</span> <span><a aria-hidden="true" href="#cb37-4"></a><span class="ex">Counting</span> objects: 100% (1156/1156), <span class="kw">done</span><span class="ex">.</span></span> <span><a aria-hidden="true" href="#cb37-5"></a><span class="ex">Delta</span> compression using up to 4 threads</span> <span><a aria-hidden="true" href="#cb37-6"></a><span class="ex">Compressing</span> objects: 100% (496/496), <span class="kw">done</span><span class="ex">.</span></span> <span><a aria-hidden="true" href="#cb37-7"></a><span class="ex">Writing</span> objects: 100% (1156/1156), <span class="ex">28.32</span> MiB <span class="kw">|</span> <span class="ex">7.89</span> MiB/s, done.</span> <span><a aria-hidden="true" href="#cb37-8"></a><span class="ex">Total</span> 1156 (delta 675), <span class="ex">reused</span> 1064 (delta 616), <span class="ex">pack-reused</span> 0</span> <span><a 
aria-hidden="true" href="#cb37-9"></a><span class="ex">remote</span>: Resolving deltas: 100% (675/675), <span class="kw">done</span><span class="ex">.</span></span> <span><a aria-hidden="true" href="#cb37-10"></a><span class="ex">To</span> ssh://sol/home/git/idee.git</span> <span><a aria-hidden="true" href="#cb37-11"></a> <span class="ex">*</span> [new branch] master -<span class="op">&gt;</span> master</span></code></pre> </div> <h4> /home/git/idee.git/hooks/post-receive </h4> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb38-1"></a><span class="co">#!/bin/bash</span></span> <span><a aria-hidden="true" href="#cb38-2"></a><span class="co">#</span></span> <span><a aria-hidden="true" href="#cb38-3"></a><span class="co"># The hook "post-receive" takes care of the</span></span> <span><a aria-hidden="true" href="#cb38-4"></a><span class="co"># deployment after all pushed files were</span></span> <span><a aria-hidden="true" href="#cb38-5"></a><span class="co"># successfully stored.</span></span> <span><a aria-hidden="true" href="#cb38-6"></a><span class="co">#</span></span> <span><a aria-hidden="true" href="#cb38-7"></a><span class="co"># The deployment is implemented as a pull</span></span> <span><a aria-hidden="true" href="#cb38-8"></a><span class="co"># from a client git in the server's www folder.</span></span> <span><a aria-hidden="true" href="#cb38-9"></a></span> <span><a aria-hidden="true" href="#cb38-10"></a><span class="co"># prevent message: "fatal: Not a git repository: '.'"</span></span> <span><a aria-hidden="true" href="#cb38-11"></a><span class="bu">unset</span> <span class="va">$(</span><span class="fu">git</span> rev-parse --local-env-vars<span class="va">)</span></span> <span><a aria-hidden="true" href="#cb38-12"></a></span> <span><a aria-hidden="true" href="#cb38-13"></a><span class="bu">cd</span> /var/www/idee/</span> <span><a aria-hidden="true" href="#cb38-14"></a><span 
class="fu">git</span> pull</span></code></pre> </div> <p> I found the fix for this error message in "Git Hook Pull After Push - remote: fatal: Not a git repository: '.' · Joe Januszkiewicz" <sup> ( 19 ) </sup> . </p> <h4> /var/www/idee </h4> <p> I also create the server-side client repository as a shared repository, so that www-data, as a member of the group git, has sufficient rights to read everything. </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb39-1"></a><span class="ex">git@sol</span>:/var/www$ git init --share=group idee</span> <span><a aria-hidden="true" href="#cb39-2"></a><span class="ex">Initialized</span> empty shared Git repository in /mnt/data/www/idee/.git/</span> <span><a aria-hidden="true" href="#cb39-3"></a><span class="ex">git@sol</span>:/var/www/idee/.git$ git remote add origin /home/git/idee.git</span></code></pre> </div> <p> The branch master was added to the repository's config file with a text editor. </p> <div class="sourceCode"> <pre class="sourceCode ini"><code class="sourceCode ini"><span><a aria-hidden="true" href="#cb40-1"></a><span class="kw">[core]</span></span> <span><a aria-hidden="true" href="#cb40-2"></a><span class="dt"> repositoryformatversion </span><span class="ot">=</span><span class="st"> </span><span class="dv">0</span></span> <span><a aria-hidden="true" href="#cb40-3"></a><span class="dt"> filemode </span><span class="ot">=</span><span class="st"> </span><span class="kw">true</span></span> <span><a aria-hidden="true" href="#cb40-4"></a><span class="dt"> bare </span><span class="ot">=</span><span class="st"> </span><span class="kw">false</span></span> <span><a aria-hidden="true" href="#cb40-5"></a><span class="dt"> logallrefupdates </span><span class="ot">=</span><span class="st"> </span><span class="kw">true</span></span> <span><a aria-hidden="true" href="#cb40-6"></a><span class="dt"> sharedrepository </span><span class="ot">=</span><span class="st"> </span><span 
class="dv">1</span></span> <span><a aria-hidden="true" href="#cb40-7"></a><span class="kw">[receive]</span></span> <span><a aria-hidden="true" href="#cb40-8"></a><span class="dt"> denyNonFastforwards </span><span class="ot">=</span><span class="st"> </span><span class="kw">true</span></span> <span><a aria-hidden="true" href="#cb40-9"></a><span class="kw">[remote "origin"]</span></span> <span><a aria-hidden="true" href="#cb40-10"></a><span class="dt"> url </span><span class="ot">=</span><span class="st"> /home/git/idee.git</span></span> <span><a aria-hidden="true" href="#cb40-11"></a><span class="dt"> fetch </span><span class="ot">=</span><span class="st"> +refs/heads/*:refs/remotes/origin/*</span></span> <span><a aria-hidden="true" href="#cb40-12"></a><span class="kw">[branch "master"]</span></span> <span><a aria-hidden="true" href="#cb40-13"></a><span class="dt"> remote </span><span class="ot">=</span><span class="st"> origin</span></span> <span><a aria-hidden="true" href="#cb40-14"></a><span class="dt"> merge </span><span class="ot">=</span><span class="st"> refs/heads/master</span></span></code></pre> </div> <h4> Testing the pull </h4> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb41-1"></a><span class="ex">git@sol</span>:/var/www/idee$ git pull</span> <span><a aria-hidden="true" href="#cb41-2"></a><span class="ex">git@sol</span>:/var/www/idee$ ls -la</span> <span><a aria-hidden="true" href="#cb41-3"></a><span class="ex">total</span> 32</span> <span><a aria-hidden="true" href="#cb41-4"></a><span class="ex">drwxrwxr-x</span> 8 www-data www-data 4096 Feb 3 20:11 .</span> <span><a aria-hidden="true" href="#cb41-5"></a><span class="ex">drwxr-xr-x</span> 10 root root 4096 Jan 12 23:33 ..</span> <span><a aria-hidden="true" href="#cb41-6"></a><span class="ex">drwxr-xr-x</span> 2 git git 4096 Feb 3 20:11 author</span> <span><a aria-hidden="true" href="#cb41-7"></a><span 
class="ex">drwxr-xr-x</span> 3 git git 4096 Feb 3 20:11 config</span> <span><a aria-hidden="true" href="#cb41-8"></a><span class="ex">drwxr-xr-x</span> 2 git git 4096 Feb 3 20:11 generator</span> <span><a aria-hidden="true" href="#cb41-9"></a><span class="ex">drwxrwsr-x</span> 8 git git 4096 Feb 3 20:11 .git</span> <span><a aria-hidden="true" href="#cb41-10"></a><span class="ex">drwxr-xr-x</span> 2 git git 4096 Feb 3 20:11 plain</span> <span><a aria-hidden="true" href="#cb41-11"></a><span class="ex">drwxr-xr-x</span> 11 git git 4096 Feb 3 20:11 website</span></code></pre> </div> <p> Since the pull runs as the same user and the remote is in reality a local path, no password is requested, and nothing has to be set up to answer a password prompt. </p> <h4> Providing www-data with group permission </h4> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb42-1"></a><span class="ex">root</span> @sol:/home/git/idee.git/hooks# adduser www-data git</span> <span><a aria-hidden="true" href="#cb42-2"></a><span class="ex">Adding</span> user <span class="kw">`</span><span class="ex">www-data</span><span class="st">' to group `git'</span> ...</span> <span><a aria-hidden="true" href="#cb42-3"></a><span class="ex">Adding</span> user www-data to group git</span> <span><a aria-hidden="true" href="#cb42-4"></a><span class="ex">Done.</span></span></code></pre> </div> <h4> Creating an nginx site </h4> <p> The following server definition for nginx uses HTTP instead of HTTPS. That is not a problem, since it is only used for testing and migration in the local network. 
</p> <p> <strong> /etc/nginx/sites-available/idee_88 </strong> </p> <pre class="nginx"><code># Idee Server Configuration
#
server {
    listen 88 default_server;
    listen [::]:88 default_server;

    disable_symlinks off;

    root /var/www/idee/website;

    # Add index.php to the list if you are using PHP
    index index.html index.htm index.nginx-debian.html;

    server_name _;

    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri.html $uri/ =404;
    }

    location /yacysearch.html {
        set $myquery '';
        set $other '';
        if ($args ~* query=([^&amp;]*)(.*)){
            set $myquery $1;
            set $other $2;
        }
        if ($myquery !~* (site(%3a|:)idee\.frank-siebert\.de)) {
            set $args query=$myquery+site:idee.frank-siebert.de$other;
        }
        proxy_pass https://yacy.frank-siebert.de/yacysearch.html;
    }
}</code></pre> <p> This configuration also does the heavy lifting for the YaCy search integration. Most of the effort went into the part that enforces that the site: filter is passed on to YaCy, restricting search results to my own web site. 
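</p> <p> The rewrite rule is easier to reason about when it is mirrored outside of nginx. The following is a minimal sketch in Python, not part of the actual setup: the function name <em> enforce_site_filter </em> is hypothetical, and the two regular expressions are copied from the location block above. It only illustrates the intended effect on the query string. </p>

```python
import re

SITE_FILTER = "site:idee.frank-siebert.de"

# Same patterns as in the nginx location block, matched case-insensitively
# like the nginx operators ~* and !~*.
QUERY_RE = re.compile(r"query=([^&]*)(.*)", re.IGNORECASE)
SITE_RE = re.compile(r"site(%3a|:)idee\.frank-siebert\.de", re.IGNORECASE)


def enforce_site_filter(args):
    """Return the query string with the site: filter appended if missing.

    Hypothetical helper, written only to illustrate the nginx rule.
    """
    match = QUERY_RE.search(args)
    if match:
        myquery, other = match.group(1), match.group(2)
    else:
        # nginx keeps the empty defaults set before the first "if".
        myquery, other = "", ""

    if SITE_RE.search(myquery):
        # The filter is already present, leave the arguments untouched.
        return args

    return "query=" + myquery + "+" + SITE_FILTER + other
```

<p> For example, the arguments query=vim&amp;startRecord=10 come out as query=vim+site:idee.frank-siebert.de&amp;startRecord=10, while a query that already carries the site: filter passes through unchanged. 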
</p> <h5> Enabling the new site </h5> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb44-1"></a><span class="ex">root</span> @sol:/etc/nginx/sites-enabled# ln -s ../sites-available/idee_88 .</span> <span><a aria-hidden="true" href="#cb44-2"></a><span class="ex">root</span> @sol:/etc/nginx/sites-enabled# nginx -t</span> <span><a aria-hidden="true" href="#cb44-3"></a><span class="ex">nginx</span>: the configuration file /etc/nginx/nginx.conf syntax is ok</span> <span><a aria-hidden="true" href="#cb44-4"></a><span class="ex">nginx</span>: configuration file /etc/nginx/nginx.conf test is successful</span> <span><a aria-hidden="true" href="#cb44-5"></a><span class="ex">root</span> @sol:/etc/nginx/sites-enabled# nginx -s reload</span></code></pre> </div> <h5> Test-URL </h5> <p> <a href="http://sol:88/article/verstehen.html"> http://sol:88/article/verstehen.html </a> </p> <p> The server works and the YaCy search works as well, but naturally the links still point to the WordPress instance. A redirect from the old to the new URL pattern is required, and the migration of the content is still pending. </p> <p> But the sitemap, the RSS feed and the index page are the next most important parts to implement. </p> <h4> Test run on this article </h4> <p> A test run on this article, while it is obviously still a work in progress, shows that it renders nicely. Even the source code sections look very pretty, without any time invested in styling them. </p> <h5> Every source code line is a reference </h5> <p> That is really nice for a number of use cases. </p> <p> <em> TODO: I have to take care that these source code references do not each get a tabindex, or blind people will start to hate me. </em> </p> <h5> Source code in the PDF </h5> <p> Source code in the PDF gets colored very nicely. DONE: I have to take care that the source code does not flow out of the page. 
</p> <p> After refactoring the program <em> export.py </em> , where I took care to restrict the code to 80 characters per line, the PDF print of this program stays inside the page borders. </p> <h3> Sitemap Implementation </h3> <p> The "Sitemaps XML format" <sup> ( 20 ) </sup> description explains the concept and the XML document structure of sitemaps. It looks quite simple if I just write some code that creates the respective XML elements and persists the document afterwards. </p> <p> Sometimes knowledge makes everything a bit more complicated. I know that I should validate the resulting XML against its schema, and, committed to high quality, I started to dig into the question of how this validation has to be set up on a Linux system to work, let's say, first of all in Vim. </p> <p> This topic turns out to be quite complex, and it is independent enough to get its own article. </p> <p> Fortunately, the setup done for validation in Vim also provides everything required for validation without Vim. </p> <h4> Python3-lxml </h4> <p> The module lxml is required to create XML with Beautiful Soup. </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb45-1"></a><span class="ex">frank</span> @Asimov:~$ sudo apt-get install python3-lxml</span> <span><a aria-hidden="true" href="#cb45-2"></a><span class="ex">Reading</span> package lists... Done</span> <span><a aria-hidden="true" href="#cb45-3"></a><span class="ex">Building</span> dependency tree... Done</span> <span><a aria-hidden="true" href="#cb45-4"></a><span class="ex">Reading</span> state information... 
Done</span> <span><a aria-hidden="true" href="#cb45-5"></a><span class="ex">python3-lxml</span> is already the newest version (4.6.3+dfsg-0.1+deb11u1)<span class="ex">.</span></span> <span><a aria-hidden="true" href="#cb45-6"></a><span class="ex">python3-lxml</span> set to manually installed.</span> <span><a aria-hidden="true" href="#cb45-7"></a><span class="ex">0</span> upgraded, 0 newly installed, 0 to remove and 1 not upgraded.</span></code></pre> </div> <h4> Requirements Draft </h4> <p> Sitemaps will be split into monthly maps. Content will be listed in the month of its first publication. If content from earlier months needs an update (e.g. when I migrate the content), the respective older sitemaps are updated accordingly. </p> <p> I will not implement the hreflang links, since I do not expect much overlap between English and German content. However, since I plan to use two different site names, "Concept" in English and "Idee" in German, I think I should have two different sitemap trees. </p> <p> My sitemap tree will start with one sitemap.xml, referencing idee-map.xml and concept-map.xml, which in turn reference the idee-yyyy-MM.xml and concept-yyyy-MM.xml files. Since the sitemap specification itself does not provide any language information, Google may figure out the page languages from the content on its own. </p> <p> Since the content on my German site is heavily suppressed by Google anyway, I do not really care to optimize much to make Google's life easier. </p> <h4> Solution Specification </h4> <p> Every sitemap update involves three sitemap files: the monthly file, the site file and the top file. The information about the required sitemap changes is found in PubMetaData.instance._updates and PubMetaData.instance._deletions. 
</p> <div class="sourceCode"> <pre class="sourceCode xml"><code class="sourceCode xml"><span><a aria-hidden="true" href="#cb46-1"></a><span class="kw">&lt;?xml</span> version="1.0" encoding="UTF-8"<span class="kw">?&gt;</span></span> <span><a aria-hidden="true" href="#cb46-2"></a><span class="kw">&lt;sitemapindex</span><span class="ot"> xmlns=</span><span class="st">"http://www.sitemaps.org/schemas/sitemap/0.9"</span><span class="kw">&gt;</span></span> <span><a aria-hidden="true" href="#cb46-3"></a> <span class="kw">&lt;sitemap&gt;</span></span> <span><a aria-hidden="true" href="#cb46-4"></a> <span class="kw">&lt;loc&gt;</span>https://idee.frank-siebert.de/idee-map.xml<span class="kw">&lt;/loc&gt;</span></span> <span><a aria-hidden="true" href="#cb46-5"></a> <span class="kw">&lt;lastmod&gt;</span>2021-03-31T18:23:17+00:00<span class="kw">&lt;/lastmod&gt;</span></span> <span><a aria-hidden="true" href="#cb46-6"></a> <span class="kw">&lt;/sitemap&gt;</span></span> <span><a aria-hidden="true" href="#cb46-7"></a> <span class="kw">&lt;sitemap&gt;</span></span> <span><a aria-hidden="true" href="#cb46-8"></a> <span class="kw">&lt;loc&gt;</span>https://idee.frank-siebert.de/concept-map.xml<span class="kw">&lt;/loc&gt;</span></span> <span><a aria-hidden="true" href="#cb46-9"></a> <span class="kw">&lt;lastmod&gt;</span>2005-01-01<span class="kw">&lt;/lastmod&gt;</span></span> <span><a aria-hidden="true" href="#cb46-10"></a> <span class="kw">&lt;/sitemap&gt;</span></span> <span><a aria-hidden="true" href="#cb46-11"></a><span class="kw">&lt;/sitemapindex&gt;</span></span></code></pre> </div> <p> The modification of the sitemaps starts at the leaves of the sitemap tree. This is easy, since the respective map can be found from the article:modified_time and og:site_name information. og:site_name is either "idee" or "concept". </p> <p> The monthly sitemaps are stored in a dedicated folder named sitemaps to keep the root directory clean. 
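</p> <p> The lookup of the three files involved in one update can be sketched as follows. This is an illustration only, under the assumption that the month is taken from the date of first publication, as stated in the requirements draft. The function name <em> sitemap_files_for </em> and the constant <em> SITEMAP_DIR </em> are hypothetical and do not appear in the generator. </p>

```python
from pathlib import PurePosixPath

# Assumption: the monthly maps live in a folder called "sitemaps",
# the site map and the top map live in the web root.
SITEMAP_DIR = PurePosixPath("sitemaps")


def sitemap_files_for(site, pubdate):
    """Return (monthly file, site file, top file) for one article.

    site    -- site name, e.g. "Idee" or "Concept"
    pubdate -- ISO 8601 date or timestamp of the first publication
    """
    prefix = site.lower()
    month = pubdate[0:7]  # the "yyyy-MM" part of the ISO date
    monthly = SITEMAP_DIR / (prefix + "-" + month + ".xml")
    site_map = PurePosixPath(prefix + "-map.xml")
    top_map = PurePosixPath("sitemap.xml")
    return monthly, site_map, top_map
```

<p> Keeping the derivation in one place means that the monthly file, the site file and the top file always agree on the naming scheme. 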
</p> <p> A class SiteMap applies all the changes. The timestamps for all changes made to the three sitemaps during one publishing commit will always be the same. </p> <p> The files sitemap.xml, idee-map.xml and concept-map.xml live in the root directory and are pre-created with a text editor to provide the general structure. </p> <p> A monthly-map.xml template is created with a text editor and placed in the portal folder next to the other existing templates. It simply contains the top-level element and the xmlns information. </p> <h4> Implementation Result </h4> <p> <strong> ~/project/idee/generator/sitemap.py </strong> </p> <div class="sourceCode"> <pre class="sourceCode Python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb47-1"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb47-2"></a><span class="co">Update the sitemap of the website.</span></span> <span><a aria-hidden="true" href="#cb47-3"></a></span> <span><a aria-hidden="true" href="#cb47-4"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb47-5"></a><span class="co">@license: https://creativecommons.org/publicdomain/zero/1.0/deed.en</span></span> <span><a aria-hidden="true" href="#cb47-6"></a><span class="co">@date: 2022-03-15</span></span> <span><a aria-hidden="true" href="#cb47-7"></a></span> <span><a aria-hidden="true" href="#cb47-8"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb47-9"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb47-10"></a><span class="im">import</span> re</span> <span><a aria-hidden="true" href="#cb47-11"></a><span class="im">import</span> datetime</span> <span><a aria-hidden="true" href="#cb47-12"></a><span class="im">from</span> pubmetadata <span class="im">import</span> PubMetaData</span> <span><a aria-hidden="true" href="#cb47-13"></a><span class="im">from</span> gitmsgconstants <span 
class="im">import</span> GitMsgConstants <span class="im">as</span> gmc</span> <span><a aria-hidden="true" href="#cb47-14"></a></span> <span><a aria-hidden="true" href="#cb47-15"></a><span class="im">from</span> bs4 <span class="im">import</span> BeautifulSoup</span> <span><a aria-hidden="true" href="#cb47-16"></a><span class="im">from</span> bs4.builder._lxml <span class="im">import</span> LXMLTreeBuilderForXML</span> <span><a aria-hidden="true" href="#cb47-17"></a></span> <span><a aria-hidden="true" href="#cb47-18"></a>URLSET_TAG <span class="op">=</span> <span class="st">"urlset"</span></span> <span><a aria-hidden="true" href="#cb47-19"></a>URL_TAG <span class="op">=</span> <span class="st">"url"</span></span> <span><a aria-hidden="true" href="#cb47-20"></a>LOC_TAG <span class="op">=</span> <span class="st">"loc"</span></span> <span><a aria-hidden="true" href="#cb47-21"></a>LASTMOD_TAG <span class="op">=</span> <span class="st">"lastmod"</span></span> <span><a aria-hidden="true" href="#cb47-22"></a><span class="co"># Not used:</span></span> <span><a aria-hidden="true" href="#cb47-23"></a><span class="co"># CHANGEFREQ_TAG = "changefreq"</span></span> <span><a aria-hidden="true" href="#cb47-24"></a><span class="co"># PRIORITY_TAG = "priority"</span></span> <span><a aria-hidden="true" href="#cb47-25"></a></span> <span><a aria-hidden="true" href="#cb47-26"></a>INDEX_TAG <span class="op">=</span> <span class="st">"sitemapindex"</span></span> <span><a aria-hidden="true" href="#cb47-27"></a>SIDEMAP_TAG <span class="op">=</span> <span class="st">"sitemap"</span></span> <span><a aria-hidden="true" href="#cb47-28"></a></span> <span><a aria-hidden="true" href="#cb47-29"></a></span> <span><a aria-hidden="true" href="#cb47-30"></a><span class="kw">class</span> SiteMap():</span> <span><a aria-hidden="true" href="#cb47-31"></a> <span class="co">"""Manage all changes in the sitemaps."""</span></span> <span><a aria-hidden="true" href="#cb47-32"></a></span> <span><a 
aria-hidden="true" href="#cb47-33"></a> <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb47-34"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb47-35"></a><span class="co"> Initialize changelists.</span></span> <span><a aria-hidden="true" href="#cb47-36"></a></span> <span><a aria-hidden="true" href="#cb47-37"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb47-38"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb47-39"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb47-40"></a></span> <span><a aria-hidden="true" href="#cb47-41"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb47-42"></a> <span class="co"># map information for German page changes on site "Idee".</span></span> <span><a aria-hidden="true" href="#cb47-43"></a> <span class="va">self</span>.de_list <span class="op">=</span> []</span> <span><a aria-hidden="true" href="#cb47-44"></a> <span class="co"># map information for English page changes on site "Concept".</span></span> <span><a aria-hidden="true" href="#cb47-45"></a> <span class="va">self</span>.en_list <span class="op">=</span> []</span> <span><a aria-hidden="true" href="#cb47-46"></a> <span class="co"># The time of the update</span></span> <span><a aria-hidden="true" href="#cb47-47"></a> <span class="va">self</span>._nowdate <span class="op">=</span> datetime.datetime.now().isoformat()</span> <span><a aria-hidden="true" href="#cb47-48"></a></span> <span><a aria-hidden="true" href="#cb47-49"></a> <span class="kw">def</span> update(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb47-50"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb47-51"></a><span class="co"> Iterate over changes and update respective sitemaps.</span></span> <span><a aria-hidden="true" 
href="#cb47-52"></a></span> <span><a aria-hidden="true" href="#cb47-53"></a><span class="co"> Add the respective sitemaps to their respective change list.</span></span> <span><a aria-hidden="true" href="#cb47-54"></a><span class="co"> The information about the changed html pages comes from</span></span> <span><a aria-hidden="true" href="#cb47-55"></a><span class="co"> PubMetaData.instance._updates and</span></span> <span><a aria-hidden="true" href="#cb47-56"></a><span class="co"> PubMetaData.instance._deletions .</span></span> <span><a aria-hidden="true" href="#cb47-57"></a></span> <span><a aria-hidden="true" href="#cb47-58"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb47-59"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb47-60"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb47-61"></a></span> <span><a aria-hidden="true" href="#cb47-62"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb47-63"></a> <span class="cf">for</span> article_data <span class="kw">in</span> PubMetaData.instance._updates:</span> <span><a aria-hidden="true" href="#cb47-64"></a> creation_month <span class="op">=</span> article_data[PubMetaData.pubdate][<span class="dv">0</span>:<span class="dv">7</span>]</span> <span><a aria-hidden="true" href="#cb47-65"></a> site <span class="op">=</span> article_data[PubMetaData.site]</span> <span><a aria-hidden="true" href="#cb47-66"></a> sitemap_path <span class="op">=</span> site.lower() <span class="op">+</span> <span class="st">"-"</span> <span class="op">+</span> creation_month <span class="op">+</span> <span class="st">".xml"</span></span> <span><a aria-hidden="true" href="#cb47-67"></a> sitemap_path <span class="op">=</span> gmc.sitemappath <span class="op">/</span> sitemap_path</span> <span><a aria-hidden="true" href="#cb47-68"></a></span> <span><a aria-hidden="true" href="#cb47-69"></a> <span class="cf">if</span> site 
<span class="op">==</span> <span class="st">"Idee"</span>:</span> <span><a aria-hidden="true" href="#cb47-70"></a> <span class="cf">if</span> sitemap_path <span class="kw">not</span> <span class="kw">in</span> <span class="va">self</span>.de_list:</span> <span><a aria-hidden="true" href="#cb47-71"></a> <span class="va">self</span>.de_list.append(sitemap_path)</span> <span><a aria-hidden="true" href="#cb47-72"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb47-73"></a> <span class="cf">if</span> sitemap_path <span class="kw">not</span> <span class="kw">in</span> <span class="va">self</span>.en_list:</span> <span><a aria-hidden="true" href="#cb47-74"></a> <span class="va">self</span>.en_list.append(sitemap_path)</span> <span><a aria-hidden="true" href="#cb47-75"></a></span> <span><a aria-hidden="true" href="#cb47-76"></a> <span class="cf">if</span> article_data.name <span class="op">!=</span> <span class="st">"rechtliches"</span> \</span> <span><a aria-hidden="true" href="#cb47-77"></a> <span class="kw">and</span> article_data.name <span class="op">!=</span> <span class="st">"legal"</span>:</span> <span><a aria-hidden="true" href="#cb47-78"></a> <span class="va">self</span>._update(sitemap_path, article_data)</span> <span><a aria-hidden="true" href="#cb47-79"></a></span> <span><a aria-hidden="true" href="#cb47-80"></a> <span class="cf">for</span> article_data <span class="kw">in</span> PubMetaData.instance._deletions:</span> <span><a aria-hidden="true" href="#cb47-81"></a> <span class="co"># </span><span class="al">TODO</span></span> <span><a aria-hidden="true" href="#cb47-82"></a> <span class="cf">pass</span></span> <span><a aria-hidden="true" href="#cb47-83"></a></span> <span><a aria-hidden="true" href="#cb47-84"></a> <span class="va">self</span>._update_de()</span> <span><a aria-hidden="true" href="#cb47-85"></a> <span class="va">self</span>._update_en()</span> <span><a aria-hidden="true" href="#cb47-86"></a> <span 
class="va">self</span>._update_main()</span> <span><a aria-hidden="true" href="#cb47-87"></a></span> <span><a aria-hidden="true" href="#cb47-88"></a> <span class="kw">def</span> _update_de(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb47-89"></a> <span class="co">"""Update idee-map.xml."""</span></span> <span><a aria-hidden="true" href="#cb47-90"></a> <span class="cf">if</span> <span class="bu">len</span>(<span class="va">self</span>.de_list) <span class="op">==</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" href="#cb47-91"></a> <span class="cf">return</span></span> <span><a aria-hidden="true" href="#cb47-92"></a></span> <span><a aria-hidden="true" href="#cb47-93"></a> <span class="cf">with</span> <span class="bu">open</span>(gmc.idee_map, <span class="st">'r'</span>) <span class="im">as</span> sitemap_file:</span> <span><a aria-hidden="true" href="#cb47-94"></a> xml_doc <span class="op">=</span> sitemap_file.read()</span> <span><a aria-hidden="true" href="#cb47-95"></a> sitemap_file.flush()</span> <span><a aria-hidden="true" href="#cb47-96"></a> sitemap_file.close()</span> <span><a aria-hidden="true" href="#cb47-97"></a></span> <span><a aria-hidden="true" href="#cb47-98"></a> builder <span class="op">=</span> LXMLTreeBuilderForXML</span> <span><a aria-hidden="true" href="#cb47-99"></a> soup <span class="op">=</span> BeautifulSoup(xml_doc, builder<span class="op">=</span>builder, features<span class="op">=</span><span class="st">'xml'</span>)</span> <span><a aria-hidden="true" href="#cb47-100"></a></span> <span><a aria-hidden="true" href="#cb47-101"></a> <span class="cf">for</span> sitemap_path <span class="kw">in</span> <span class="va">self</span>.de_list:</span> <span><a aria-hidden="true" href="#cb47-102"></a> url <span class="op">=</span> gmc.website <span class="op">+</span> <span class="st">"/"</span> <span class="op">+</span> sitemap_path.name</span> <span><a aria-hidden="true" href="#cb47-103"></a> tag 
<span class="op">=</span> soup.find(LOC_TAG, text<span class="op">=</span>re.<span class="bu">compile</span>(<span class="vs">r""</span> <span class="op">+</span> url))</span> <span><a aria-hidden="true" href="#cb47-104"></a></span> <span><a aria-hidden="true" href="#cb47-105"></a> <span class="cf">if</span> <span class="kw">not</span> tag:</span> <span><a aria-hidden="true" href="#cb47-106"></a> tag <span class="op">=</span> soup.find(INDEX_TAG)</span> <span><a aria-hidden="true" href="#cb47-107"></a> new_tag <span class="op">=</span> soup.new_tag(SIDEMAP_TAG)</span> <span><a aria-hidden="true" href="#cb47-108"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb47-109"></a> tag <span class="op">=</span> new_tag</span> <span><a aria-hidden="true" href="#cb47-110"></a> new_tag <span class="op">=</span> soup.new_tag(LOC_TAG)</span> <span><a aria-hidden="true" href="#cb47-111"></a> new_tag.string <span class="op">=</span> url</span> <span><a aria-hidden="true" href="#cb47-112"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb47-113"></a> new_tag <span class="op">=</span> soup.new_tag(LASTMOD_TAG)</span> <span><a aria-hidden="true" href="#cb47-114"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb47-115"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb47-116"></a> tag <span class="op">=</span> tag.parent</span> <span><a aria-hidden="true" href="#cb47-117"></a></span> <span><a aria-hidden="true" href="#cb47-118"></a> <span class="co"># tag holds now the correct SIDEMAP_TAG.</span></span> <span><a aria-hidden="true" href="#cb47-119"></a> <span class="co"># Either it had been found or created.</span></span> <span><a aria-hidden="true" href="#cb47-120"></a> <span class="co"># All used child tags exist also.</span></span> <span><a aria-hidden="true" href="#cb47-121"></a></span> <span><a aria-hidden="true" href="#cb47-122"></a> tag <span class="op">=</span> tag.find(LASTMOD_TAG)</span> 
<span><a aria-hidden="true" href="#cb47-123"></a> tag.string <span class="op">=</span> <span class="va">self</span>._nowdate</span> <span><a aria-hidden="true" href="#cb47-124"></a></span> <span><a aria-hidden="true" href="#cb47-125"></a> xml_doc <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" href="#cb47-126"></a></span> <span><a aria-hidden="true" href="#cb47-127"></a> <span class="cf">with</span> <span class="bu">open</span>(gmc.idee_map, <span class="st">'w'</span>) <span class="im">as</span> sitemap_file:</span> <span><a aria-hidden="true" href="#cb47-128"></a> <span class="bu">print</span>(xml_doc, <span class="bu">file</span><span class="op">=</span>sitemap_file)</span> <span><a aria-hidden="true" href="#cb47-129"></a> sitemap_file.flush()</span> <span><a aria-hidden="true" href="#cb47-130"></a> sitemap_file.close()</span> <span><a aria-hidden="true" href="#cb47-131"></a></span> <span><a aria-hidden="true" href="#cb47-132"></a> <span class="kw">def</span> _update_en(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb47-133"></a> <span class="co">"""Update concept-map.xml."""</span></span> <span><a aria-hidden="true" href="#cb47-134"></a> <span class="cf">if</span> <span class="bu">len</span>(<span class="va">self</span>.en_list) <span class="op">==</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" href="#cb47-135"></a> <span class="cf">return</span></span> <span><a aria-hidden="true" href="#cb47-136"></a></span> <span><a aria-hidden="true" href="#cb47-137"></a> <span class="cf">with</span> <span class="bu">open</span>(gmc.concept_map, <span class="st">'r'</span>) <span class="im">as</span> sitemap_file:</span> <span><a aria-hidden="true" href="#cb47-138"></a> xml_doc <span class="op">=</span> sitemap_file.read()</span> <span><a aria-hidden="true" href="#cb47-139"></a> sitemap_file.flush()</span> <span><a aria-hidden="true" href="#cb47-140"></a> sitemap_file.close()</span> <span><a 
aria-hidden="true" href="#cb47-141"></a></span> <span><a aria-hidden="true" href="#cb47-142"></a> builder <span class="op">=</span> LXMLTreeBuilderForXML</span> <span><a aria-hidden="true" href="#cb47-143"></a> soup <span class="op">=</span> BeautifulSoup(xml_doc, builder<span class="op">=</span>builder, features<span class="op">=</span><span class="st">'xml'</span>)</span> <span><a aria-hidden="true" href="#cb47-144"></a></span> <span><a aria-hidden="true" href="#cb47-145"></a> <span class="cf">for</span> sitemap_path <span class="kw">in</span> <span class="va">self</span>.en_list:</span> <span><a aria-hidden="true" href="#cb47-146"></a> url <span class="op">=</span> gmc.website <span class="op">+</span> <span class="st">"/"</span> <span class="op">+</span> sitemap_path.name</span> <span><a aria-hidden="true" href="#cb47-147"></a> tag <span class="op">=</span> soup.find(LOC_TAG, text<span class="op">=</span>re.<span class="bu">compile</span>(<span class="vs">r""</span> <span class="op">+</span> url))</span> <span><a aria-hidden="true" href="#cb47-148"></a></span> <span><a aria-hidden="true" href="#cb47-149"></a> <span class="cf">if</span> <span class="kw">not</span> tag:</span> <span><a aria-hidden="true" href="#cb47-150"></a> tag <span class="op">=</span> soup.find(INDEX_TAG)</span> <span><a aria-hidden="true" href="#cb47-151"></a> new_tag <span class="op">=</span> soup.new_tag(SIDEMAP_TAG)</span> <span><a aria-hidden="true" href="#cb47-152"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb47-153"></a> tag <span class="op">=</span> new_tag</span> <span><a aria-hidden="true" href="#cb47-154"></a> new_tag <span class="op">=</span> soup.new_tag(LOC_TAG)</span> <span><a aria-hidden="true" href="#cb47-155"></a> new_tag.string <span class="op">=</span> url</span> <span><a aria-hidden="true" href="#cb47-156"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb47-157"></a> new_tag <span class="op">=</span> 
soup.new_tag(LASTMOD_TAG)</span> <span><a aria-hidden="true" href="#cb47-158"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb47-159"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb47-160"></a> tag <span class="op">=</span> tag.parent</span> <span><a aria-hidden="true" href="#cb47-161"></a></span> <span><a aria-hidden="true" href="#cb47-162"></a> <span class="co"># tag holds now the correct SIDEMAP_TAG.</span></span> <span><a aria-hidden="true" href="#cb47-163"></a> <span class="co"># Either it had been found or created.</span></span> <span><a aria-hidden="true" href="#cb47-164"></a> <span class="co"># All used child tags exist also.</span></span> <span><a aria-hidden="true" href="#cb47-165"></a></span> <span><a aria-hidden="true" href="#cb47-166"></a> tag <span class="op">=</span> tag.find(LASTMOD_TAG)</span> <span><a aria-hidden="true" href="#cb47-167"></a> tag.string <span class="op">=</span> <span class="va">self</span>._nowdate</span> <span><a aria-hidden="true" href="#cb47-168"></a></span> <span><a aria-hidden="true" href="#cb47-169"></a> xml_doc <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" href="#cb47-170"></a></span> <span><a aria-hidden="true" href="#cb47-171"></a> <span class="cf">with</span> <span class="bu">open</span>(gmc.concept_map, <span class="st">'w'</span>) <span class="im">as</span> sitemap_file:</span> <span><a aria-hidden="true" href="#cb47-172"></a> <span class="bu">print</span>(xml_doc, <span class="bu">file</span><span class="op">=</span>sitemap_file)</span> <span><a aria-hidden="true" href="#cb47-173"></a> sitemap_file.flush()</span> <span><a aria-hidden="true" href="#cb47-174"></a> sitemap_file.close()</span> <span><a aria-hidden="true" href="#cb47-175"></a></span> <span><a aria-hidden="true" href="#cb47-176"></a> <span class="kw">def</span> _update_main(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb47-177"></a> <span 
class="co">"""Update sitemap.xml."""</span></span> <span><a aria-hidden="true" href="#cb47-178"></a> <span class="cf">if</span> <span class="bu">len</span>(<span class="va">self</span>.de_list) <span class="op">==</span> <span class="dv">0</span> <span class="kw">and</span> <span class="bu">len</span>(<span class="va">self</span>.en_list) <span class="op">==</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" href="#cb47-179"></a> <span class="cf">return</span></span> <span><a aria-hidden="true" href="#cb47-180"></a></span> <span><a aria-hidden="true" href="#cb47-181"></a> <span class="cf">with</span> <span class="bu">open</span>(gmc.sitemap, <span class="st">'r'</span>) <span class="im">as</span> sitemap_file:</span> <span><a aria-hidden="true" href="#cb47-182"></a> xml_doc <span class="op">=</span> sitemap_file.read()</span> <span><a aria-hidden="true" href="#cb47-183"></a> sitemap_file.flush()</span> <span><a aria-hidden="true" href="#cb47-184"></a> sitemap_file.close()</span> <span><a aria-hidden="true" href="#cb47-185"></a></span> <span><a aria-hidden="true" href="#cb47-186"></a> builder <span class="op">=</span> LXMLTreeBuilderForXML</span> <span><a aria-hidden="true" href="#cb47-187"></a> soup <span class="op">=</span> BeautifulSoup(xml_doc, builder<span class="op">=</span>builder, features<span class="op">=</span><span class="st">'xml'</span>)</span> <span><a aria-hidden="true" href="#cb47-188"></a></span> <span><a aria-hidden="true" href="#cb47-189"></a> <span class="cf">if</span> <span class="bu">len</span>(<span class="va">self</span>.de_list) <span class="op">&gt;</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" href="#cb47-190"></a> url <span class="op">=</span> gmc.website <span class="op">+</span> <span class="st">"/"</span> <span class="op">+</span> gmc.idee_map.name</span> <span><a aria-hidden="true" href="#cb47-191"></a> tag <span class="op">=</span> soup.find(LOC_TAG, text<span class="op">=</span>re.<span 
class="bu">compile</span>(<span class="vs">r""</span> <span class="op">+</span> url))</span> <span><a aria-hidden="true" href="#cb47-192"></a> <span class="co"># We know in this case, that the tag exists</span></span> <span><a aria-hidden="true" href="#cb47-193"></a> tag <span class="op">=</span> tag.parent</span> <span><a aria-hidden="true" href="#cb47-194"></a> tag <span class="op">=</span> tag.find(LASTMOD_TAG)</span> <span><a aria-hidden="true" href="#cb47-195"></a> tag.string <span class="op">=</span> <span class="va">self</span>._nowdate</span> <span><a aria-hidden="true" href="#cb47-196"></a></span> <span><a aria-hidden="true" href="#cb47-197"></a> <span class="cf">if</span> <span class="bu">len</span>(<span class="va">self</span>.en_list) <span class="op">&gt;</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" href="#cb47-198"></a> url <span class="op">=</span> gmc.website <span class="op">+</span> <span class="st">"/"</span> <span class="op">+</span> gmc.concept_map.name</span> <span><a aria-hidden="true" href="#cb47-199"></a> tag <span class="op">=</span> soup.find(LOC_TAG, text<span class="op">=</span>re.<span class="bu">compile</span>(<span class="vs">r""</span> <span class="op">+</span> url))</span> <span><a aria-hidden="true" href="#cb47-200"></a> <span class="co"># We know in this case, that the tag exists</span></span> <span><a aria-hidden="true" href="#cb47-201"></a> tag <span class="op">=</span> tag.parent</span> <span><a aria-hidden="true" href="#cb47-202"></a> tag <span class="op">=</span> tag.find(LASTMOD_TAG)</span> <span><a aria-hidden="true" href="#cb47-203"></a> tag.string <span class="op">=</span> <span class="va">self</span>._nowdate</span> <span><a aria-hidden="true" href="#cb47-204"></a></span> <span><a aria-hidden="true" href="#cb47-205"></a> xml_doc <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" href="#cb47-206"></a></span> <span><a aria-hidden="true" href="#cb47-207"></a> <span 
class="cf">with</span> <span class="bu">open</span>(gmc.sitemap, <span class="st">'w'</span>) <span class="im">as</span> sitemap_file:</span> <span><a aria-hidden="true" href="#cb47-208"></a> <span class="bu">print</span>(xml_doc, <span class="bu">file</span><span class="op">=</span>sitemap_file)</span> <span><a aria-hidden="true" href="#cb47-209"></a> sitemap_file.flush()</span> <span><a aria-hidden="true" href="#cb47-210"></a> sitemap_file.close()</span> <span><a aria-hidden="true" href="#cb47-211"></a></span> <span><a aria-hidden="true" href="#cb47-212"></a> <span class="at">@staticmethod</span></span> <span><a aria-hidden="true" href="#cb47-213"></a> <span class="kw">def</span> _update(sitemap_path, article_data):</span> <span><a aria-hidden="true" href="#cb47-214"></a> sitemap_path.resolve()</span> <span><a aria-hidden="true" href="#cb47-215"></a> <span class="cf">if</span> sitemap_path.exists():</span> <span><a aria-hidden="true" href="#cb47-216"></a> <span class="cf">with</span> <span class="bu">open</span>(sitemap_path, <span class="st">'r'</span>) <span class="im">as</span> sitemap_file:</span> <span><a aria-hidden="true" href="#cb47-217"></a> xml_doc <span class="op">=</span> sitemap_file.read()</span> <span><a aria-hidden="true" href="#cb47-218"></a> sitemap_file.flush()</span> <span><a aria-hidden="true" href="#cb47-219"></a> sitemap_file.close()</span> <span><a aria-hidden="true" href="#cb47-220"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb47-221"></a> gmc.map_template.resolve()</span> <span><a aria-hidden="true" href="#cb47-222"></a> <span class="cf">with</span> <span class="bu">open</span>(gmc.map_template, <span class="st">'r'</span>) <span class="im">as</span> sitemap_file:</span> <span><a aria-hidden="true" href="#cb47-223"></a> xml_doc <span class="op">=</span> sitemap_file.read()</span> <span><a aria-hidden="true" href="#cb47-224"></a> sitemap_file.flush()</span> <span><a aria-hidden="true" 
href="#cb47-225"></a> sitemap_file.close()</span> <span><a aria-hidden="true" href="#cb47-226"></a></span> <span><a aria-hidden="true" href="#cb47-227"></a> builder <span class="op">=</span> LXMLTreeBuilderForXML</span> <span><a aria-hidden="true" href="#cb47-228"></a> soup <span class="op">=</span> BeautifulSoup(xml_doc, builder<span class="op">=</span>builder, features<span class="op">=</span><span class="st">'xml'</span>)</span> <span><a aria-hidden="true" href="#cb47-229"></a></span> <span><a aria-hidden="true" href="#cb47-230"></a> article_url <span class="op">=</span> gmc.website <span class="op">+</span> <span class="st">"/"</span>\</span> <span><a aria-hidden="true" href="#cb47-231"></a> <span class="op">+</span> <span class="st">"article"</span> <span class="op">+</span> <span class="st">"/"</span> <span class="op">+</span> article_data.name <span class="op">+</span> <span class="st">".html"</span></span> <span><a aria-hidden="true" href="#cb47-232"></a></span> <span><a aria-hidden="true" href="#cb47-233"></a> tag <span class="op">=</span> soup.find(LOC_TAG, text<span class="op">=</span>re.<span class="bu">compile</span>(<span class="vs">r""</span> <span class="op">+</span> article_url))</span> <span><a aria-hidden="true" href="#cb47-234"></a> <span class="cf">if</span> <span class="kw">not</span> tag:</span> <span><a aria-hidden="true" href="#cb47-235"></a> tag <span class="op">=</span> soup.find(URLSET_TAG)</span> <span><a aria-hidden="true" href="#cb47-236"></a> new_tag <span class="op">=</span> soup.new_tag(URL_TAG)</span> <span><a aria-hidden="true" href="#cb47-237"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb47-238"></a> tag <span class="op">=</span> new_tag</span> <span><a aria-hidden="true" href="#cb47-239"></a> new_tag <span class="op">=</span> soup.new_tag(LOC_TAG)</span> <span><a aria-hidden="true" href="#cb47-240"></a> new_tag.string <span class="op">=</span> article_url</span> <span><a aria-hidden="true" 
href="#cb47-241"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb47-242"></a> new_tag <span class="op">=</span> soup.new_tag(LASTMOD_TAG)</span> <span><a aria-hidden="true" href="#cb47-243"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb47-244"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb47-245"></a> tag <span class="op">=</span> tag.parent</span> <span><a aria-hidden="true" href="#cb47-246"></a></span> <span><a aria-hidden="true" href="#cb47-247"></a> <span class="co"># tag holds now the correct URL_TAG.</span></span> <span><a aria-hidden="true" href="#cb47-248"></a> <span class="co"># Either it had been found or created.</span></span> <span><a aria-hidden="true" href="#cb47-249"></a> <span class="co"># All used child tags exist also.</span></span> <span><a aria-hidden="true" href="#cb47-250"></a></span> <span><a aria-hidden="true" href="#cb47-251"></a> tag <span class="op">=</span> tag.find(LASTMOD_TAG)</span> <span><a aria-hidden="true" href="#cb47-252"></a> tag.string <span class="op">=</span> article_data[PubMetaData.revdate]</span> <span><a aria-hidden="true" href="#cb47-253"></a></span> <span><a aria-hidden="true" href="#cb47-254"></a> xml_doc <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" href="#cb47-255"></a></span> <span><a aria-hidden="true" href="#cb47-256"></a> <span class="cf">with</span> <span class="bu">open</span>(sitemap_path, <span class="st">'w'</span>) <span class="im">as</span> sitemap_file:</span> <span><a aria-hidden="true" href="#cb47-257"></a> <span class="bu">print</span>(xml_doc, <span class="bu">file</span><span class="op">=</span>sitemap_file)</span> <span><a aria-hidden="true" href="#cb47-258"></a> sitemap_file.flush()</span> <span><a aria-hidden="true" href="#cb47-259"></a> sitemap_file.close()</span></code></pre> </div> <h3> RSS - Really Simple Syndication </h3> <p> The RSS will be based on the standard described by the "Feed 
Validation Service" <sup> ( 21 ) </sup> and "RSS 2.0 Specification" <sup> ( 22 ) </sup> . </p> <div class="sourceCode"> <pre class="sourceCode xml"><code class="sourceCode xml"><span><a aria-hidden="true" href="#cb48-1"></a><span class="kw">&lt;?xml</span> version="1.0" encoding="UTF-8"<span class="kw">?&gt;</span></span> <span><a aria-hidden="true" href="#cb48-2"></a><span class="kw">&lt;rss</span><span class="ot"> version=</span><span class="st">"2.0"</span></span> <span><a aria-hidden="true" href="#cb48-3"></a><span class="ot"> xmlns:content=</span><span class="st">"http://purl.org/rss/1.0/modules/content/"</span></span> <span><a aria-hidden="true" href="#cb48-4"></a><span class="ot"> xmlns:atom=</span><span class="st">"http://www.w3.org/2005/Atom"</span></span> <span><a aria-hidden="true" href="#cb48-5"></a> <span class="kw">&gt;</span></span> <span><a aria-hidden="true" href="#cb48-6"></a> <span class="kw">&lt;channel&gt;</span></span> <span><a aria-hidden="true" href="#cb48-7"></a> <span class="kw">&lt;title&gt;</span>Idee der eigenen Erkenntnis<span class="kw">&lt;/title&gt;</span></span> <span><a aria-hidden="true" href="#cb48-8"></a> <span class="kw">&lt;atom:link</span><span class="ot"> href=</span><span class="st">"https://idee.frank-siebert.de/idee-rss.xml"</span><span class="ot"> rel=</span><span class="st">"self"</span> </span> <span><a aria-hidden="true" href="#cb48-9"></a><span class="ot"> type=</span><span class="st">"application/rss+xml"</span> <span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb48-10"></a> <span class="kw">&lt;link&gt;</span>https://idee.frank-siebert.de<span class="kw">&lt;/link&gt;</span></span> <span><a aria-hidden="true" href="#cb48-11"></a> <span class="kw">&lt;description&gt;</span>Idee<span class="kw">&lt;/description&gt;</span></span> <span><a aria-hidden="true" href="#cb48-12"></a> <span class="kw">&lt;lastBuildDate&gt;</span>Tue, 11 Jan 2022 07:54:24 +0000<span 
class="kw">&lt;/lastBuildDate&gt;</span></span> <span><a aria-hidden="true" href="#cb48-13"></a> <span class="kw">&lt;language&gt;</span>de-DE<span class="kw">&lt;/language&gt;</span></span> <span><a aria-hidden="true" href="#cb48-14"></a> <span class="kw">&lt;generator&gt;</span>pandoc, fs-commit-msg-hook 1.0<span class="kw">&lt;/generator&gt;</span></span> <span><a aria-hidden="true" href="#cb48-15"></a> <span class="kw">&lt;image&gt;</span></span> <span><a aria-hidden="true" href="#cb48-16"></a> <span class="kw">&lt;url&gt;</span>https://idee.frank-siebert.de/image/favicon-256x256-150x150.png<span class="kw">&lt;/url&gt;</span></span> <span><a aria-hidden="true" href="#cb48-17"></a> <span class="kw">&lt;title&gt;</span>Idee der eigenen Erkenntnis<span class="kw">&lt;/title&gt;</span></span> <span><a aria-hidden="true" href="#cb48-18"></a> <span class="kw">&lt;link&gt;</span>https://idee.frank-siebert.de<span class="kw">&lt;/link&gt;</span></span> <span><a aria-hidden="true" href="#cb48-19"></a> <span class="kw">&lt;width&gt;</span>32<span class="kw">&lt;/width&gt;</span></span> <span><a aria-hidden="true" href="#cb48-20"></a> <span class="kw">&lt;height&gt;</span>32<span class="kw">&lt;/height&gt;</span></span> <span><a aria-hidden="true" href="#cb48-21"></a> <span class="kw">&lt;/image&gt;</span> </span> <span><a aria-hidden="true" href="#cb48-22"></a> <span class="kw">&lt;item&gt;</span></span> <span><a aria-hidden="true" href="#cb48-23"></a> <span class="kw">&lt;title&gt;</span>Best Article Ever Written<span class="kw">&lt;/title&gt;</span></span> <span><a aria-hidden="true" href="#cb48-24"></a> <span class="kw">&lt;link&gt;</span></span> <span><a aria-hidden="true" href="#cb48-25"></a> https://idee.frank-siebert.de/article/best-article-ever-written.html</span> <span><a aria-hidden="true" href="#cb48-26"></a> <span class="kw">&lt;/link&gt;</span></span> <span><a aria-hidden="true" href="#cb48-27"></a> <span class="kw">&lt;pubDate&gt;</span>Tue, 11 Jan 2022 
07:50:11 +0000<span class="kw">&lt;/pubDate&gt;</span></span> <span><a aria-hidden="true" href="#cb48-28"></a> <span class="kw">&lt;category&gt;</span><span class="bn">&lt;![CDATA[</span>Uncategorized<span class="bn">]]&gt;</span><span class="kw">&lt;/category&gt;</span></span> <span><a aria-hidden="true" href="#cb48-29"></a> <span class="kw">&lt;guid</span><span class="ot"> isPermaLink=</span><span class="st">"false"</span><span class="kw">&gt;</span></span> <span><a aria-hidden="true" href="#cb48-30"></a>https://idee.frank-siebert.de/article/best-article-ever-written.html-2022-01-11T07:50:11</span> <span><a aria-hidden="true" href="#cb48-31"></a> <span class="kw">&lt;/guid&gt;</span></span> <span><a aria-hidden="true" href="#cb48-32"></a> <span class="kw">&lt;description&gt;</span></span> <span><a aria-hidden="true" href="#cb48-33"></a> <span class="bn">&lt;![CDATA[</span>First 406 characters of the article, followed by ...<span class="bn">]]&gt;</span></span> <span><a aria-hidden="true" href="#cb48-34"></a> <span class="kw">&lt;/description&gt;</span></span> <span><a aria-hidden="true" href="#cb48-35"></a> <span class="kw">&lt;content:encoded&gt;</span><span class="bn">&lt;![CDATA[</span>&lt;article&gt;......&lt;/article&gt;<span class="bn">]]&gt;</span><span class="kw">&lt;/content:encoded&gt;</span></span> <span><a aria-hidden="true" href="#cb48-36"></a> <span class="kw">&lt;enclosure</span> </span> <span><a aria-hidden="true" href="#cb48-37"></a><span class="ot"> url=</span><span class="st">"https://idee.frank-siebert.de/audio/best-article-ever-written.mp3"</span></span> <span><a aria-hidden="true" href="#cb48-38"></a><span class="ot"> length=</span><span class="st">"9090090"</span><span class="ot"> type=</span><span class="st">"audio/mpeg"</span> <span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb48-39"></a> <span class="kw">&lt;/item&gt;</span></span> <span><a aria-hidden="true" href="#cb48-40"></a> <span 
class="kw">&lt;/channel&gt;</span></span> <span><a aria-hidden="true" href="#cb48-41"></a><span class="kw">&lt;/rss&gt;</span></span></code></pre> </div> <p> The article content will be embedded completely into the RSS, enclosed in a CDATA section and encoded in UTF-8. To be able to include the complete content, the extension "RDF Site Summary 1.0 Modules: Content" <sup> ( 23 ) </sup> with the namespace declaration <em> xmlns:content=" <a href="http://purl.org/rss/1.0/modules/content/"> http://purl.org/rss/1.0/modules/content/ </a> " </em> needs to be used. </p> <p> The RSS file will reference its web location via atom:link, which needs the inclusion of the namespace entry <em> xmlns:atom=" <a href="http://www.w3.org/2005/Atom"> http://www.w3.org/2005/Atom </a> " </em> for the line <em> &lt;atom:link href="https://idee.frank-siebert.de/idee-rss.xml" rel="self" type="application/rss+xml" /&gt; </em> . This is a brief excursion into "The Atom Syndication Format" <sup> ( 24 ) </sup> , which is itself a competing syndication format specification. The RSS currently generated by WordPress indicates that mixing the two competing specifications at least does not create problems with feed consumers. </p> <p> To make sure existing feed consumers are served the RSS feed without any need to change the link, the nginx configuration needs a location /feed/ that redirects to the RSS file. </p> <p> Since the RSS consumers will most likely use the channel's title to present the feed items, and this title is the site title, two different RSS XML files are required, one for the site <em> Idee </em> and one for the site <em> Concept </em> . An additional reason to create two RSS XML files is the language information, which can be provided only once in the language tag of the channel. </p> <p> The specification, according to the post "Multiple channels in a single RSS xml - is it ever appropriate?" 
<sup> ( 25 ) </sup> , does not allow more than one channel in one RSS XML file. </p> <p> The remaining question: what is the unit of the length attribute in the enclosure tag? </p> <p> I found that WordPress provides the size of the file in bytes as the value for this attribute, which was also the most probable answer to this question. </p> <h4> RSS feeds to create </h4> <ul class="incremental"> <li> idee-rss.xml <ul class="incremental"> <li> title: Idee der eigenen Erkenntnis </li> <li> link: <a href=".."> https://idee.frank-siebert.de </a> </li> <li> description: Idee </li> <li> language: de (or de-DE) </li> </ul> </li> <li> concept-rss.xml <ul class="incremental"> <li> title: Concept of new cognition elicitation personally thinking </li> <li> link: <a href=".."> https://idee.frank-siebert.de </a> </li> <li> description: Concept </li> <li> language: en (or en-US) </li> </ul> </li> </ul> <h4> Article Updates </h4> <p> If articles are updated after publishing, RSS does not provide any way to communicate the date of revision. The best idea for how such an update could be communicated to consumers is described in the post "RSS update single item" <sup> ( 26 ) </sup> . </p> <p> The idea is to change the guid of the item to signal that the item contains changed content. The answer was not marked as correct, but it was the only answer provided. </p> <p> The implementation choice is to use the link and the timestamp of the update as the combined guid string. </p> <p> The GUID change sometimes resulted in GPodder showing duplicate entries for one article, which is not the intended result. However, GPodder recognized changes in the content without any additional signaling. At least that's the current impression. </p> <h4> Number of RSS items </h4> <p> The RSS files will contain the last 10 articles, the most recently created or updated first. 
Since I plan to migrate articles in the order of their original publication, I'll come out of the migration with my latest articles automatically being featured in the RSS feed, with the only difference that I will have two feeds in the new solution. </p> <h4> Templates </h4> <p> RSS feed implementation will start off with two templates, one for the English and one for the German version, in the folder portal, containing only the channel information, with the items still to be added. </p> <p> After initial feed creation the templates are no longer required, but I'll keep them anyhow. Presumably the implementation will be very similar to the sitemap implementation. </p> <h4> Implementation </h4> <p> The implementation of the RSS feed generator turned out to be much more cumbersome than expected. Python's BeautifulSoup module offers the LXMLTreeBuilderForXML, which nicely writes CDATA sections, but removes them and HTML-encodes their content (you know, &amp;gt; and such) when it reads the XML. </p> <p> The alternative HTMLParserTreeBuilder works nicely for XML as well, as long as all XML tags are lower-case; mixed-case tags such as lastBuildDate or pubDate do not survive it. But since this was not mentioned anywhere I looked while searching for solutions to the first problem, I had to discover this second problem by myself. </p> <p> Just by luck I found out, before I tried it in an implementation, that using the lxml package without BeautifulSoup would not solve problem number one. </p> <p> After careful reading I based my third implementation on the module xml.dom.minidom. This is a pretty low-level implementation requiring some more lines of code, but it provides the required control over the CDATA sections and does not override my implementation decision when it reads the XML. </p> <p> It just has the annoying habit of adding lines containing only white-space with its method toprettyxml(). Every time you read and save, it adds an additional line between otherwise untouched lines of the XML. 
But this is at least easily solved by two regex pattern substitutions, without any risk of mistakenly altering the HTML fragments embedded via CDATA. </p> <p> The following code shows the current implementation. The result has been tested with GPodder, Liferea and Tidings, where GPodder only considers items with a media reference in the enclosure tag, while Tidings and Liferea show items regardless of the presence of an enclosure. </p> <p> <strong> ~/projects/idee/generator/rssbuilder.py </strong> </p> <div class="sourceCode"> <pre class="sourceCode Python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb49-1"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb49-2"></a><span class="co">Update the rss feed of the website.</span></span> <span><a aria-hidden="true" href="#cb49-3"></a></span> <span><a aria-hidden="true" href="#cb49-4"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb49-5"></a><span class="co">@license: https://creativecommons.org/publicdomain/zero/1.0/deed.en</span></span> <span><a aria-hidden="true" href="#cb49-6"></a><span class="co">@date: 2022-03-15</span></span> <span><a aria-hidden="true" href="#cb49-7"></a></span> <span><a aria-hidden="true" href="#cb49-8"></a><span class="co">All links provided relative to the /article/ folder</span></span> <span><a aria-hidden="true" href="#cb49-9"></a></span> <span><a aria-hidden="true" href="#cb49-10"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb49-11"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb49-12"></a><span class="im">import</span> re</span> <span><a aria-hidden="true" href="#cb49-13"></a><span class="im">import</span> datetime</span> <span><a aria-hidden="true" href="#cb49-14"></a><span class="im">import</span> xml.dom.minidom</span> <span><a aria-hidden="true" href="#cb49-15"></a><span class="im">from</span> bs4 <span 
class="im">import</span> BeautifulSoup</span> <span><a aria-hidden="true" href="#cb49-16"></a><span class="im">from</span> bs4.builder._htmlparser <span class="im">import</span> HTMLParserTreeBuilder</span> <span><a aria-hidden="true" href="#cb49-17"></a><span class="im">from</span> pubmetadata <span class="im">import</span> PubMetaData</span> <span><a aria-hidden="true" href="#cb49-18"></a><span class="im">from</span> pubmetadata <span class="im">import</span> pageurn</span> <span><a aria-hidden="true" href="#cb49-19"></a><span class="im">from</span> gitmsgconstants <span class="im">import</span> GitMsgConstants <span class="im">as</span> gmc</span> <span><a aria-hidden="true" href="#cb49-20"></a></span> <span><a aria-hidden="true" href="#cb49-21"></a>CHANNEL_TAG <span class="op">=</span> <span class="st">"channel"</span></span> <span><a aria-hidden="true" href="#cb49-22"></a>LASTBUILD_TAG <span class="op">=</span> <span class="st">"lastBuildDate"</span></span> <span><a aria-hidden="true" href="#cb49-23"></a>ITEM_TAG <span class="op">=</span> <span class="st">"item"</span></span> <span><a aria-hidden="true" href="#cb49-24"></a>TITLE_TAG <span class="op">=</span> <span class="st">"title"</span></span> <span><a aria-hidden="true" href="#cb49-25"></a>LINK_TAG <span class="op">=</span> <span class="st">"link"</span></span> <span><a aria-hidden="true" href="#cb49-26"></a>PUBDATE_TAG <span class="op">=</span> <span class="st">"pubDate"</span></span> <span><a aria-hidden="true" href="#cb49-27"></a>GUID_TAG <span class="op">=</span> <span class="st">"guid"</span></span> <span><a aria-hidden="true" href="#cb49-28"></a>DESCRIPTION_TAG <span class="op">=</span> <span class="st">"description"</span></span> <span><a aria-hidden="true" href="#cb49-29"></a>CONTENT_TAG <span class="op">=</span> <span class="st">"content:encoded"</span></span> <span><a aria-hidden="true" href="#cb49-30"></a>ENCLOSURE_TAG <span class="op">=</span> <span class="st">"enclosure"</span></span> <span><a 
aria-hidden="true" href="#cb49-31"></a>AUDIO_TAG <span class="op">=</span> <span class="st">"audio"</span></span> <span><a aria-hidden="true" href="#cb49-32"></a></span> <span><a aria-hidden="true" href="#cb49-33"></a><span class="co"># for the testing on server sol</span></span> <span><a aria-hidden="true" href="#cb49-34"></a><span class="co"># HOST = "http://sol:88/"</span></span> <span><a aria-hidden="true" href="#cb49-35"></a><span class="co"># for the website</span></span> <span><a aria-hidden="true" href="#cb49-36"></a>HOST <span class="op">=</span> gmc.website <span class="op">+</span> <span class="st">"/"</span></span> <span><a aria-hidden="true" href="#cb49-37"></a></span> <span><a aria-hidden="true" href="#cb49-38"></a><span class="co"># Number of items to include in the RSS feed</span></span> <span><a aria-hidden="true" href="#cb49-39"></a>ITEM_COUNT <span class="op">=</span> <span class="dv">15</span></span> <span><a aria-hidden="true" href="#cb49-40"></a></span> <span><a aria-hidden="true" href="#cb49-41"></a></span> <span><a aria-hidden="true" href="#cb49-42"></a><span class="kw">def</span> by_pub_date(article_data):</span> <span><a aria-hidden="true" href="#cb49-43"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb49-44"></a><span class="co"> Return the publishing date as sort criterion.</span></span> <span><a aria-hidden="true" href="#cb49-45"></a></span> <span><a aria-hidden="true" href="#cb49-46"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb49-47"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb49-48"></a><span class="co"> article_data : Series</span></span> <span><a aria-hidden="true" href="#cb49-49"></a><span class="co"> The article data.</span></span> <span><a aria-hidden="true" href="#cb49-50"></a></span> <span><a aria-hidden="true" href="#cb49-51"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb49-52"></a><span 
class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb49-53"></a><span class="co"> TYPE</span></span> <span><a aria-hidden="true" href="#cb49-54"></a><span class="co"> Date as Str</span></span> <span><a aria-hidden="true" href="#cb49-55"></a></span> <span><a aria-hidden="true" href="#cb49-56"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb49-57"></a> <span class="cf">return</span> article_data[PubMetaData.pubdate]</span> <span><a aria-hidden="true" href="#cb49-58"></a></span> <span><a aria-hidden="true" href="#cb49-59"></a></span> <span><a aria-hidden="true" href="#cb49-60"></a><span class="kw">class</span> RSSBuilder():</span> <span><a aria-hidden="true" href="#cb49-61"></a> <span class="co">"""Manage all changes in the RSS feeds."""</span></span> <span><a aria-hidden="true" href="#cb49-62"></a></span> <span><a aria-hidden="true" href="#cb49-63"></a> <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb49-64"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb49-65"></a><span class="co"> Initialize changelists.</span></span> <span><a aria-hidden="true" href="#cb49-66"></a></span> <span><a aria-hidden="true" href="#cb49-67"></a><span class="co"> The information about the changed html pages comes from</span></span> <span><a aria-hidden="true" href="#cb49-68"></a><span class="co"> PubMetaData.instance._updates and</span></span> <span><a aria-hidden="true" href="#cb49-69"></a><span class="co"> PubMetaData.instance._deletions .</span></span> <span><a aria-hidden="true" href="#cb49-70"></a></span> <span><a aria-hidden="true" href="#cb49-71"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb49-72"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb49-73"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb49-74"></a></span> 
<span><a aria-hidden="true" href="#cb49-75"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb49-76"></a> <span class="co"># information for German page changes on site "Idee".</span></span> <span><a aria-hidden="true" href="#cb49-77"></a> <span class="va">self</span>.de_list <span class="op">=</span> []</span> <span><a aria-hidden="true" href="#cb49-78"></a> <span class="co"># information for English page changes on site "Concept".</span></span> <span><a aria-hidden="true" href="#cb49-79"></a> <span class="va">self</span>.en_list <span class="op">=</span> []</span> <span><a aria-hidden="true" href="#cb49-80"></a> <span class="co"># The time of the update</span></span> <span><a aria-hidden="true" href="#cb49-81"></a> <span class="va">self</span>._nowdate <span class="op">=</span> datetime.datetime.now().isoformat()</span> <span><a aria-hidden="true" href="#cb49-82"></a> <span class="co"># soup of currently processed RSS xml</span></span> <span><a aria-hidden="true" href="#cb49-83"></a> <span class="va">self</span>._rss_xml <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb49-84"></a> <span class="co"># soup tag of currently processed article</span></span> <span><a aria-hidden="true" href="#cb49-85"></a> <span class="va">self</span>._article_tag <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb49-86"></a></span> <span><a aria-hidden="true" href="#cb49-87"></a> <span class="cf">for</span> article_data <span class="kw">in</span> PubMetaData.instance._updates:</span> <span><a aria-hidden="true" href="#cb49-88"></a> <span class="cf">if</span> article_data[PubMetaData.site] <span class="op">==</span> <span class="st">"Idee"</span> \</span> <span><a aria-hidden="true" href="#cb49-89"></a> <span class="kw">and</span> article_data.name <span class="op">!=</span> <span class="st">"rechtliches"</span>:</span> <span><a aria-hidden="true" 
href="#cb49-90"></a> <span class="va">self</span>.de_list.append(article_data)</span> <span><a aria-hidden="true" href="#cb49-91"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb49-92"></a> <span class="cf">if</span> article_data.name <span class="op">!=</span> <span class="st">"legal"</span>:</span> <span><a aria-hidden="true" href="#cb49-93"></a> <span class="va">self</span>.en_list.append(article_data)</span> <span><a aria-hidden="true" href="#cb49-94"></a></span> <span><a aria-hidden="true" href="#cb49-95"></a> <span class="cf">for</span> article_data <span class="kw">in</span> PubMetaData.instance._deletions:</span> <span><a aria-hidden="true" href="#cb49-96"></a> <span class="co"># </span><span class="al">TODO</span></span> <span><a aria-hidden="true" href="#cb49-97"></a> <span class="cf">pass</span></span> <span><a aria-hidden="true" href="#cb49-98"></a></span> <span><a aria-hidden="true" href="#cb49-99"></a> <span class="co"># Default sort is ascending, oldest posts first in list</span></span> <span><a aria-hidden="true" href="#cb49-100"></a> <span class="va">self</span>.de_list.sort(key<span class="op">=</span>by_pub_date)</span> <span><a aria-hidden="true" href="#cb49-101"></a> <span class="va">self</span>.en_list.sort(key<span class="op">=</span>by_pub_date)</span> <span><a aria-hidden="true" href="#cb49-102"></a></span> <span><a aria-hidden="true" href="#cb49-103"></a> <span class="kw">def</span> update(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb49-104"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb49-105"></a><span class="co"> Iterate over changes and update respective rss files.</span></span> <span><a aria-hidden="true" href="#cb49-106"></a></span> <span><a aria-hidden="true" href="#cb49-107"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb49-108"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" 
href="#cb49-109"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb49-110"></a></span> <span><a aria-hidden="true" href="#cb49-111"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb49-112"></a> <span class="co"># Update idee-rss.xml.</span></span> <span><a aria-hidden="true" href="#cb49-113"></a> <span class="cf">if</span> <span class="bu">len</span>(<span class="va">self</span>.de_list) <span class="op">&gt;</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" href="#cb49-114"></a> <span class="va">self</span>._update(<span class="va">self</span>.de_list, gmc.idee_rss)</span> <span><a aria-hidden="true" href="#cb49-115"></a></span> <span><a aria-hidden="true" href="#cb49-116"></a> <span class="co"># Update concept-rss.xml.</span></span> <span><a aria-hidden="true" href="#cb49-117"></a> <span class="cf">if</span> <span class="bu">len</span>(<span class="va">self</span>.en_list) <span class="op">&gt;</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" href="#cb49-118"></a> <span class="va">self</span>._update(<span class="va">self</span>.en_list, gmc.concept_rss)</span> <span><a aria-hidden="true" href="#cb49-119"></a></span> <span><a aria-hidden="true" href="#cb49-120"></a> <span class="kw">def</span> _read_article_tag(<span class="va">self</span>, article_data):</span> <span><a aria-hidden="true" href="#cb49-121"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb49-122"></a><span class="co"> Read the article tag of the processed article.</span></span> <span><a aria-hidden="true" href="#cb49-123"></a></span> <span><a aria-hidden="true" href="#cb49-124"></a><span class="co"> The article tag gets assigned to self._article_tag</span></span> <span><a aria-hidden="true" href="#cb49-125"></a></span> <span><a aria-hidden="true" href="#cb49-126"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb49-127"></a><span class="co"> 
-------</span></span> <span><a aria-hidden="true" href="#cb49-128"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb49-129"></a></span> <span><a aria-hidden="true" href="#cb49-130"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb49-131"></a> articlepath <span class="op">=</span> gmc.articlepath <span class="op">/</span> article_data.name</span> <span><a aria-hidden="true" href="#cb49-132"></a> articlepath <span class="op">=</span> articlepath.with_suffix(<span class="st">".html"</span>)</span> <span><a aria-hidden="true" href="#cb49-133"></a> articlepath.resolve()</span> <span><a aria-hidden="true" href="#cb49-134"></a></span> <span><a aria-hidden="true" href="#cb49-135"></a> <span class="cf">with</span> <span class="bu">open</span>(articlepath, <span class="st">'r'</span>) <span class="im">as</span> infile:</span> <span><a aria-hidden="true" href="#cb49-136"></a> html_doc <span class="op">=</span> infile.read()</span> <span><a aria-hidden="true" href="#cb49-137"></a> infile.flush()</span> <span><a aria-hidden="true" href="#cb49-138"></a> infile.close()</span> <span><a aria-hidden="true" href="#cb49-139"></a></span> <span><a aria-hidden="true" href="#cb49-140"></a> builder <span class="op">=</span> HTMLParserTreeBuilder()</span> <span><a aria-hidden="true" href="#cb49-141"></a> soup <span class="op">=</span> BeautifulSoup(html_doc, builder<span class="op">=</span>builder)</span> <span><a aria-hidden="true" href="#cb49-142"></a></span> <span><a aria-hidden="true" href="#cb49-143"></a> <span class="va">self</span>._article_tag <span class="op">=</span> soup.find(<span class="st">"article"</span>)</span> <span><a aria-hidden="true" href="#cb49-144"></a></span> <span><a aria-hidden="true" href="#cb49-145"></a> <span class="co"># RSS is downloaded, there is no use case for relative links</span></span> <span><a aria-hidden="true" href="#cb49-146"></a> <span class="co"># even if an RSS consumer theoretically could
compute them</span></span> <span><a aria-hidden="true" href="#cb49-147"></a> <span class="co"># to absolute links</span></span> <span><a aria-hidden="true" href="#cb49-148"></a></span> <span><a aria-hidden="true" href="#cb49-149"></a> <span class="co"># "../" becomes "https://idee.frank-siebert.de/"</span></span> <span><a aria-hidden="true" href="#cb49-150"></a> tags <span class="op">=</span> <span class="va">self</span>._article_tag.find_all(re.<span class="bu">compile</span>(<span class="vs">r".*"</span>), attrs<span class="op">=</span>{</span> <span><a aria-hidden="true" href="#cb49-151"></a> <span class="st">"href"</span>: re.<span class="bu">compile</span>(<span class="vs">r"^\.\./"</span>)})</span> <span><a aria-hidden="true" href="#cb49-152"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb49-153"></a> href <span class="op">=</span> tag.attrs[<span class="st">"href"</span>]</span> <span><a aria-hidden="true" href="#cb49-154"></a> href <span class="op">=</span> href.replace(<span class="st">"../"</span>, HOST)</span> <span><a aria-hidden="true" href="#cb49-155"></a> tag.attrs.update({<span class="st">"href"</span>: href})</span> <span><a aria-hidden="true" href="#cb49-156"></a></span> <span><a aria-hidden="true" href="#cb49-157"></a> tags <span class="op">=</span> <span class="va">self</span>._article_tag.find_all(re.<span class="bu">compile</span>(<span class="vs">r".*"</span>), attrs<span class="op">=</span>{</span> <span><a aria-hidden="true" href="#cb49-158"></a> <span class="st">"src"</span>: re.<span class="bu">compile</span>(<span class="vs">r"^\.\./"</span>)})</span> <span><a aria-hidden="true" href="#cb49-159"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb49-160"></a> href <span class="op">=</span> tag.attrs[<span class="st">"src"</span>]</span> <span><a aria-hidden="true" href="#cb49-161"></a> href <span 
class="op">=</span> href.replace(<span class="st">"../"</span>, HOST)</span> <span><a aria-hidden="true" href="#cb49-162"></a> tag.attrs.update({<span class="st">"src"</span>: href})</span> <span><a aria-hidden="true" href="#cb49-163"></a></span> <span><a aria-hidden="true" href="#cb49-164"></a> <span class="co"># "./" becomes "https://idee.frank-siebert.de/article/"</span></span> <span><a aria-hidden="true" href="#cb49-165"></a> tags <span class="op">=</span> <span class="va">self</span>._article_tag.find_all(<span class="st">"a"</span>, attrs<span class="op">=</span>{</span> <span><a aria-hidden="true" href="#cb49-166"></a> <span class="st">"href"</span>: re.<span class="bu">compile</span>(<span class="vs">r"^\./"</span>)})</span> <span><a aria-hidden="true" href="#cb49-167"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb49-168"></a> href <span class="op">=</span> tag.attrs[<span class="st">"href"</span>]</span> <span><a aria-hidden="true" href="#cb49-169"></a> href <span class="op">=</span> href.replace(<span class="st">"./"</span>, HOST <span class="op">+</span> <span class="st">"article/"</span>)</span> <span><a aria-hidden="true" href="#cb49-170"></a> tag.attrs.update({<span class="st">"href"</span>: href})</span> <span><a aria-hidden="true" href="#cb49-171"></a></span> <span><a aria-hidden="true" href="#cb49-172"></a> <span class="va">self</span>._article_tag.prettify()</span> <span><a aria-hidden="true" href="#cb49-173"></a></span> <span><a aria-hidden="true" href="#cb49-174"></a> <span class="kw">def</span> _article_cleanup(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb49-175"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb49-176"></a><span class="co"> Remove some things from the article's BeautifulSoup model.</span></span> <span><a aria-hidden="true" href="#cb49-177"></a></span> <span><a aria-hidden="true" href="#cb49-178"></a><span
class="co"> Remove those things, which are not rendered nicely in the</span></span> <span><a aria-hidden="true" href="#cb49-179"></a><span class="co"> RSS feed consumer, or which are simply dysfunctional there.</span></span> <span><a aria-hidden="true" href="#cb49-180"></a></span> <span><a aria-hidden="true" href="#cb49-181"></a><span class="co"> Changes are applied to the currently processed article</span></span> <span><a aria-hidden="true" href="#cb49-182"></a><span class="co"> referenced by self._article_tag</span></span> <span><a aria-hidden="true" href="#cb49-183"></a></span> <span><a aria-hidden="true" href="#cb49-184"></a><span class="co"> Consumers tested: GPodder, Liferea, Tidings</span></span> <span><a aria-hidden="true" href="#cb49-185"></a></span> <span><a aria-hidden="true" href="#cb49-186"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb49-187"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb49-188"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb49-189"></a></span> <span><a aria-hidden="true" href="#cb49-190"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb49-191"></a> <span class="co"># peel out sections</span></span> <span><a aria-hidden="true" href="#cb49-192"></a> sections <span class="op">=</span> <span class="va">self</span>._article_tag.find_all(<span class="st">"section"</span>)</span> <span><a aria-hidden="true" href="#cb49-193"></a> <span class="cf">for</span> section <span class="kw">in</span> sections:</span> <span><a aria-hidden="true" href="#cb49-194"></a> section.unwrap()</span> <span><a aria-hidden="true" href="#cb49-195"></a></span> <span><a aria-hidden="true" href="#cb49-196"></a> <span class="co"># fallback to more common tags</span></span> <span><a aria-hidden="true" href="#cb49-197"></a> tag <span class="op">=</span> <span class="va">self</span>._article_tag.find(<span class="st">"header"</span>)</span> 
<span><a aria-hidden="true" href="#cb49-198"></a> tag.name <span class="op">=</span> <span class="st">"div"</span></span> <span><a aria-hidden="true" href="#cb49-199"></a> <span class="va">self</span>._article_tag.name <span class="op">=</span> <span class="st">"div"</span></span> <span><a aria-hidden="true" href="#cb49-200"></a></span> <span><a aria-hidden="true" href="#cb49-201"></a> <span class="co"># Remove toc</span></span> <span><a aria-hidden="true" href="#cb49-202"></a> nav <span class="op">=</span> <span class="va">self</span>._article_tag.find(<span class="st">"nav"</span>)</span> <span><a aria-hidden="true" href="#cb49-203"></a> <span class="cf">if</span> nav:</span> <span><a aria-hidden="true" href="#cb49-204"></a> nav.decompose()</span> <span><a aria-hidden="true" href="#cb49-205"></a></span> <span><a aria-hidden="true" href="#cb49-206"></a> <span class="co"># Remove footnote-back anchors.</span></span> <span><a aria-hidden="true" href="#cb49-207"></a> tags <span class="op">=</span> <span class="va">self</span>._article_tag.find_all(<span class="st">"a"</span>, class_<span class="op">=</span><span class="st">"footnote-back"</span>)</span> <span><a aria-hidden="true" href="#cb49-208"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb49-209"></a> tag.decompose()</span> <span><a aria-hidden="true" href="#cb49-210"></a></span> <span><a aria-hidden="true" href="#cb49-211"></a> <span class="co"># Remove footnote-ref anchors, preserve the footnote.</span></span> <span><a aria-hidden="true" href="#cb49-212"></a> tags <span class="op">=</span> <span class="va">self</span>._article_tag.find_all(<span class="st">"a"</span>, class_<span class="op">=</span><span class="st">"footnote-ref"</span>)</span> <span><a aria-hidden="true" href="#cb49-213"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb49-214"></a> suptag <span 
class="op">=</span> tag.find(<span class="st">"sup"</span>)</span> <span><a aria-hidden="true" href="#cb49-215"></a> <span class="co"># make footnotes more visible</span></span> <span><a aria-hidden="true" href="#cb49-216"></a> suptag.string.replace_with(<span class="st">"("</span> <span class="op">+</span> suptag.text <span class="op">+</span> <span class="st">")"</span>)</span> <span><a aria-hidden="true" href="#cb49-217"></a> tag.unwrap()</span> <span><a aria-hidden="true" href="#cb49-218"></a></span> <span><a aria-hidden="true" href="#cb49-219"></a> <span class="co"># Remove category anchors</span></span> <span><a aria-hidden="true" href="#cb49-220"></a> tags <span class="op">=</span> <span class="va">self</span>._article_tag.find_all(<span class="st">"a"</span>, class_<span class="op">=</span><span class="st">"category"</span>)</span> <span><a aria-hidden="true" href="#cb49-221"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb49-222"></a> tag.decompose()</span> <span><a aria-hidden="true" href="#cb49-223"></a></span> <span><a aria-hidden="true" href="#cb49-224"></a> <span class="co"># Remove attributes from images that prevent them</span></span> <span><a aria-hidden="true" href="#cb49-225"></a> <span class="co"># from being shown in gpodder</span></span> <span><a aria-hidden="true" href="#cb49-226"></a> images <span class="op">=</span> <span class="va">self</span>._article_tag.find_all(<span class="st">"img"</span>)</span> <span><a aria-hidden="true" href="#cb49-227"></a> <span class="cf">for</span> img <span class="kw">in</span> images:</span> <span><a aria-hidden="true" href="#cb49-228"></a> img.attrs <span class="op">=</span> {<span class="st">"src"</span>: img.attrs[<span class="st">"src"</span>]}</span> <span><a aria-hidden="true" href="#cb49-229"></a></span> <span><a aria-hidden="true" href="#cb49-230"></a> <span class="co"># Remove id attributes or some tags might not</span></span> <span><a
aria-hidden="true" href="#cb49-231"></a> <span class="co"># render nicely</span></span> <span><a aria-hidden="true" href="#cb49-232"></a> idtags <span class="op">=</span> <span class="va">self</span>._article_tag.find_all(re.<span class="bu">compile</span>(<span class="vs">r".*"</span>), attrs<span class="op">=</span>{</span> <span><a aria-hidden="true" href="#cb49-233"></a> <span class="st">"id"</span>: <span class="va">True</span>})</span> <span><a aria-hidden="true" href="#cb49-234"></a> <span class="cf">for</span> tag <span class="kw">in</span> idtags:</span> <span><a aria-hidden="true" href="#cb49-235"></a> tag.attrs.pop(<span class="st">"id"</span>)</span> <span><a aria-hidden="true" href="#cb49-236"></a></span> <span><a aria-hidden="true" href="#cb49-237"></a> <span class="co"># Remove role attributes or some tags might not</span></span> <span><a aria-hidden="true" href="#cb49-238"></a> <span class="co"># render nicely</span></span> <span><a aria-hidden="true" href="#cb49-239"></a> idtags <span class="op">=</span> <span class="va">self</span>._article_tag.find_all(re.<span class="bu">compile</span>(<span class="vs">r".*"</span>), attrs<span class="op">=</span>{</span> <span><a aria-hidden="true" href="#cb49-240"></a> <span class="st">"role"</span>: <span class="va">True</span>})</span> <span><a aria-hidden="true" href="#cb49-241"></a> <span class="cf">for</span> tag <span class="kw">in</span> idtags:</span> <span><a aria-hidden="true" href="#cb49-242"></a> tag.attrs.pop(<span class="st">"role"</span>)</span> <span><a aria-hidden="true" href="#cb49-243"></a></span> <span><a aria-hidden="true" href="#cb49-244"></a> <span class="co"># Remove tabindex attributes not working anyhow in gpodder</span></span> <span><a aria-hidden="true" href="#cb49-245"></a> idtags <span class="op">=</span> <span class="va">self</span>._article_tag.find_all(re.<span class="bu">compile</span>(<span class="vs">r".*"</span>), attrs<span class="op">=</span>{</span> <span><a 
aria-hidden="true" href="#cb49-246"></a> <span class="st">"tabindex"</span>: <span class="va">True</span>})</span> <span><a aria-hidden="true" href="#cb49-247"></a> <span class="cf">for</span> tag <span class="kw">in</span> idtags:</span> <span><a aria-hidden="true" href="#cb49-248"></a> tag.attrs.pop(<span class="st">"tabindex"</span>)</span> <span><a aria-hidden="true" href="#cb49-249"></a></span> <span><a aria-hidden="true" href="#cb49-250"></a> <span class="kw">def</span> _get_item_tag(<span class="va">self</span>, channel_tag, url, article_data):</span> <span><a aria-hidden="true" href="#cb49-251"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb49-252"></a><span class="co"> Find the item tag based on the url information.</span></span> <span><a aria-hidden="true" href="#cb49-253"></a></span> <span><a aria-hidden="true" href="#cb49-254"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb49-255"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb49-256"></a><span class="co"> channel_tag : xml.dom.minidom.Tag</span></span> <span><a aria-hidden="true" href="#cb49-257"></a><span class="co"> The &lt;channel&gt; tag from the minidom document model.</span></span> <span><a aria-hidden="true" href="#cb49-258"></a><span class="co"> url : Str</span></span> <span><a aria-hidden="true" href="#cb49-259"></a><span class="co"> The url of the article, whose item tag is to be returned.</span></span> <span><a aria-hidden="true" href="#cb49-260"></a><span class="co"> article_data : Dict</span></span> <span><a aria-hidden="true" href="#cb49-261"></a><span class="co"> Data dictionary of the currently processed article.</span></span> <span><a aria-hidden="true" href="#cb49-262"></a></span> <span><a aria-hidden="true" href="#cb49-263"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb49-264"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" 
href="#cb49-265"></a><span class="co"> item_tag : xml.dom.minidom.Tag</span></span> <span><a aria-hidden="true" href="#cb49-266"></a><span class="co"> The pre-existing or created &lt;item&gt; tag for the currently</span></span> <span><a aria-hidden="true" href="#cb49-267"></a><span class="co"> processed article.</span></span> <span><a aria-hidden="true" href="#cb49-268"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb49-269"></a> item_tag <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb49-270"></a> tag <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb49-271"></a> links <span class="op">=</span> channel_tag.getElementsByTagName(LINK_TAG)</span> <span><a aria-hidden="true" href="#cb49-272"></a></span> <span><a aria-hidden="true" href="#cb49-273"></a> <span class="cf">for</span> link <span class="kw">in</span> links:</span> <span><a aria-hidden="true" href="#cb49-274"></a> savedurl <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb49-275"></a> <span class="cf">if</span> <span class="bu">len</span>(link.childNodes) <span class="op">&gt;</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" href="#cb49-276"></a> savedurl <span class="op">=</span> link.childNodes[<span class="dv">0</span>].data.strip()</span> <span><a aria-hidden="true" href="#cb49-277"></a> <span class="cf">if</span> url <span class="op">==</span> savedurl:</span> <span><a aria-hidden="true" href="#cb49-278"></a> tag <span class="op">=</span> link</span> <span><a aria-hidden="true" href="#cb49-279"></a> <span class="cf">break</span></span> <span><a aria-hidden="true" href="#cb49-280"></a></span> <span><a aria-hidden="true" href="#cb49-281"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb49-282"></a> item_tag <span class="op">=</span> tag.parentNode</span> <span><a 
aria-hidden="true" href="#cb49-283"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb49-284"></a> item_tag <span class="op">=</span> <span class="va">self</span>._rss_xml.createElement(ITEM_TAG)</span> <span><a aria-hidden="true" href="#cb49-285"></a></span> <span><a aria-hidden="true" href="#cb49-286"></a> new_tag <span class="op">=</span> <span class="va">self</span>._rss_xml.createElement(TITLE_TAG)</span> <span><a aria-hidden="true" href="#cb49-287"></a> nodetext <span class="op">=</span> article_data[PubMetaData.title]</span> <span><a aria-hidden="true" href="#cb49-288"></a> textnode <span class="op">=</span> <span class="va">self</span>._rss_xml.createTextNode(nodetext)</span> <span><a aria-hidden="true" href="#cb49-289"></a> new_tag.appendChild(textnode)</span> <span><a aria-hidden="true" href="#cb49-290"></a> item_tag.appendChild(new_tag)</span> <span><a aria-hidden="true" href="#cb49-291"></a></span> <span><a aria-hidden="true" href="#cb49-292"></a> new_tag <span class="op">=</span> <span class="va">self</span>._rss_xml.createElement(LINK_TAG)</span> <span><a aria-hidden="true" href="#cb49-293"></a> nodetext <span class="op">=</span> url</span> <span><a aria-hidden="true" href="#cb49-294"></a> textnode <span class="op">=</span> <span class="va">self</span>._rss_xml.createTextNode(nodetext)</span> <span><a aria-hidden="true" href="#cb49-295"></a> new_tag.appendChild(textnode)</span> <span><a aria-hidden="true" href="#cb49-296"></a> item_tag.appendChild(new_tag)</span> <span><a aria-hidden="true" href="#cb49-297"></a></span> <span><a aria-hidden="true" href="#cb49-298"></a> new_tag <span class="op">=</span> <span class="va">self</span>._rss_xml.createElement(PUBDATE_TAG)</span> <span><a aria-hidden="true" href="#cb49-299"></a> pubdatetime <span class="op">=</span> datetime.datetime.fromisoformat(</span> <span><a aria-hidden="true" href="#cb49-300"></a> article_data[PubMetaData.pubdate])</span> <span><a aria-hidden="true" 
href="#cb49-301"></a> <span class="co"># strftime's %a and %b follow the locale, so an English locale</span></span> <span><a aria-hidden="true" href="#cb49-302"></a> <span class="co"># is needed for valid RSS day and month names in the next line.</span></span> <span><a aria-hidden="true" href="#cb49-303"></a> nodetext <span class="op">=</span> pubdatetime.strftime(</span> <span><a aria-hidden="true" href="#cb49-304"></a> <span class="st">"%a, </span><span class="sc">%d</span><span class="st"> %b %Y %H:%M:%S +0000"</span>)</span> <span><a aria-hidden="true" href="#cb49-305"></a> textnode <span class="op">=</span> <span class="va">self</span>._rss_xml.createTextNode(nodetext)</span> <span><a aria-hidden="true" href="#cb49-306"></a> new_tag.appendChild(textnode)</span> <span><a aria-hidden="true" href="#cb49-307"></a> item_tag.appendChild(new_tag)</span> <span><a aria-hidden="true" href="#cb49-308"></a></span> <span><a aria-hidden="true" href="#cb49-309"></a> new_tag <span class="op">=</span> <span class="va">self</span>._rss_xml.createElement(GUID_TAG)</span> <span><a aria-hidden="true" href="#cb49-310"></a> new_tag.setAttribute(<span class="st">"isPermaLink"</span>, <span class="st">"false"</span>)</span> <span><a aria-hidden="true" href="#cb49-311"></a> item_tag.appendChild(new_tag)</span> <span><a aria-hidden="true" href="#cb49-312"></a></span> <span><a aria-hidden="true" href="#cb49-313"></a> new_tag <span class="op">=</span> <span class="va">self</span>._rss_xml.createElement(DESCRIPTION_TAG)</span> <span><a aria-hidden="true" href="#cb49-314"></a> item_tag.appendChild(new_tag)</span> <span><a aria-hidden="true" href="#cb49-315"></a></span> <span><a aria-hidden="true" href="#cb49-316"></a> new_tag <span class="op">=</span> <span class="va">self</span>._rss_xml.createElement(CONTENT_TAG)</span> <span><a aria-hidden="true" href="#cb49-317"></a> item_tag.appendChild(new_tag)</span> <span><a aria-hidden="true" href="#cb49-318"></a></span> <span><a aria-hidden="true" href="#cb49-319"></a> <span class="co">#
Processing oldest first, and always inserting the items</span></span> <span><a aria-hidden="true" href="#cb49-320"></a> <span class="co"># before the first childNode, we get newest first in the XML.</span></span> <span><a aria-hidden="true" href="#cb49-321"></a> <span class="co"># To become specification compliant, we finalize by moving</span></span> <span><a aria-hidden="true" href="#cb49-322"></a> <span class="co"># all item tags to the end of the channel tag later.</span></span> <span><a aria-hidden="true" href="#cb49-323"></a> channel_tag.insertBefore(item_tag,</span> <span><a aria-hidden="true" href="#cb49-324"></a> channel_tag.childNodes[<span class="dv">0</span>])</span> <span><a aria-hidden="true" href="#cb49-325"></a></span> <span><a aria-hidden="true" href="#cb49-326"></a> <span class="cf">return</span> item_tag</span> <span><a aria-hidden="true" href="#cb49-327"></a></span> <span><a aria-hidden="true" href="#cb49-328"></a> <span class="kw">def</span> _finalize_channel(<span class="va">self</span>, channel_tag):</span> <span><a aria-hidden="true" href="#cb49-329"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb49-330"></a><span class="co"> Move the items behind the other channel tags.</span></span> <span><a aria-hidden="true" href="#cb49-331"></a></span> <span><a aria-hidden="true" href="#cb49-332"></a><span class="co"> Take care that the number of items does not exceed ITEM_COUNT.</span></span> <span><a aria-hidden="true" href="#cb49-333"></a><span class="co"> Update the lastBuildDate.</span></span> <span><a aria-hidden="true" href="#cb49-334"></a></span> <span><a aria-hidden="true" href="#cb49-335"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb49-336"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb49-337"></a><span class="co"> channel_tag : xml.dom.minidom.Tag</span></span> <span><a aria-hidden="true" href="#cb49-338"></a><span class="co"> The
&lt;channel&gt; tag from the minidom document model.</span></span> <span><a aria-hidden="true" href="#cb49-339"></a></span> <span><a aria-hidden="true" href="#cb49-340"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb49-341"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb49-342"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb49-343"></a></span> <span><a aria-hidden="true" href="#cb49-344"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb49-345"></a> tags <span class="op">=</span> channel_tag.getElementsByTagName(ITEM_TAG)</span> <span><a aria-hidden="true" href="#cb49-346"></a> item_count <span class="op">=</span> <span class="dv">0</span></span> <span><a aria-hidden="true" href="#cb49-347"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb49-348"></a> <span class="cf">if</span> item_count <span class="op">&lt;</span> ITEM_COUNT:</span> <span><a aria-hidden="true" href="#cb49-349"></a> channel_tag.appendChild(tag)</span> <span><a aria-hidden="true" href="#cb49-350"></a> item_count <span class="op">+=</span> <span class="dv">1</span></span> <span><a aria-hidden="true" href="#cb49-351"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb49-352"></a> channel_tag.removeChild(tag)</span> <span><a aria-hidden="true" href="#cb49-353"></a></span> <span><a aria-hidden="true" href="#cb49-354"></a> <span class="co"># change last build date</span></span> <span><a aria-hidden="true" href="#cb49-355"></a> <span class="co"># strftime's %a and %b follow the locale, so an English locale</span></span> <span><a aria-hidden="true" href="#cb49-356"></a> <span class="co"># is needed for valid RSS day and month names.</span></span> <span><a aria-hidden="true" href="#cb49-357"></a> tag <span class="op">=</span> channel_tag.getElementsByTagName(LASTBUILD_TAG)[<span class="dv">0</span>]</span> <span><a
aria-hidden="true" href="#cb49-358"></a> pubdatetime <span class="op">=</span> datetime.datetime.fromisoformat(</span> <span><a aria-hidden="true" href="#cb49-359"></a> <span class="va">self</span>._nowdate)</span> <span><a aria-hidden="true" href="#cb49-360"></a> nodetext <span class="op">=</span> pubdatetime.strftime(</span> <span><a aria-hidden="true" href="#cb49-361"></a> <span class="st">"%a, </span><span class="sc">%d</span><span class="st"> %b %Y %H:%M:%S +0000"</span>)</span> <span><a aria-hidden="true" href="#cb49-362"></a> tag.childNodes[<span class="dv">0</span>].nodeValue <span class="op">=</span> nodetext</span> <span><a aria-hidden="true" href="#cb49-363"></a></span> <span><a aria-hidden="true" href="#cb49-364"></a> <span class="at">@staticmethod</span></span> <span><a aria-hidden="true" href="#cb49-365"></a> <span class="kw">def</span> _remove_empty_lines(xml_doc):</span> <span><a aria-hidden="true" href="#cb49-366"></a> <span class="co">"""Remove empty lines with and without whitespaces."""</span></span> <span><a aria-hidden="true" href="#cb49-367"></a> pattern <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r"^\s*$"</span>, re.MULTILINE)</span> <span><a aria-hidden="true" href="#cb49-368"></a> xml_doc <span class="op">=</span> pattern.sub(<span class="st">""</span>, xml_doc)</span> <span><a aria-hidden="true" href="#cb49-369"></a> pattern <span class="op">=</span> re.<span class="bu">compile</span>(<span class="vs">r"\n\n"</span>, re.MULTILINE)</span> <span><a aria-hidden="true" href="#cb49-370"></a> xml_doc <span class="op">=</span> pattern.sub(<span class="st">"</span><span class="ch">\n</span><span class="st">"</span>, xml_doc)</span> <span><a aria-hidden="true" href="#cb49-371"></a> <span class="cf">return</span> xml_doc</span> <span><a aria-hidden="true" href="#cb49-372"></a></span> <span><a aria-hidden="true" href="#cb49-373"></a> <span class="kw">def</span> _update(<span class="va">self</span>, article_list, 
rss_path):</span> <span><a aria-hidden="true" href="#cb49-374"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb49-375"></a><span class="co"> Update the RSS file based on the list of changed or added articles.</span></span> <span><a aria-hidden="true" href="#cb49-376"></a></span> <span><a aria-hidden="true" href="#cb49-377"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb49-378"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb49-379"></a><span class="co"> article_list : List</span></span> <span><a aria-hidden="true" href="#cb49-380"></a><span class="co"> The list of article_data entries of changed or added articles.</span></span> <span><a aria-hidden="true" href="#cb49-381"></a><span class="co"> Oldest posts are first in the list.</span></span> <span><a aria-hidden="true" href="#cb49-382"></a><span class="co"> rss_path : Path</span></span> <span><a aria-hidden="true" href="#cb49-383"></a><span class="co"> The Path to the RSS file.</span></span> <span><a aria-hidden="true" href="#cb49-384"></a></span> <span><a aria-hidden="true" href="#cb49-385"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb49-386"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb49-387"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb49-388"></a></span> <span><a aria-hidden="true" href="#cb49-389"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb49-390"></a> <span class="cf">with</span> <span class="bu">open</span>(rss_path, <span class="st">'r'</span>) <span class="im">as</span> rss_file:</span> <span><a aria-hidden="true" href="#cb49-391"></a> <span class="va">self</span>._rss_xml <span class="op">=</span> xml.dom.minidom.parse(rss_file)</span> <span><a aria-hidden="true" href="#cb49-392"></a> channel_tag <span class="op">=</span> <span 
class="va">self</span>._rss_xml.getElementsByTagName(CHANNEL_TAG)[<span class="dv">0</span>]</span> <span><a aria-hidden="true" href="#cb49-393"></a></span> <span><a aria-hidden="true" href="#cb49-394"></a> <span class="cf">for</span> article_data <span class="kw">in</span> article_list:</span> <span><a aria-hidden="true" href="#cb49-395"></a> <span class="va">self</span>._read_article_tag(article_data)</span> <span><a aria-hidden="true" href="#cb49-396"></a></span> <span><a aria-hidden="true" href="#cb49-397"></a> url <span class="op">=</span> HOST <span class="op">+</span> <span class="st">"article"</span> <span class="op">+\</span></span> <span><a aria-hidden="true" href="#cb49-398"></a> <span class="co">"/"</span> <span class="op">+</span> article_data.name <span class="op">+</span> <span class="st">".html"</span></span> <span><a aria-hidden="true" href="#cb49-399"></a></span> <span><a aria-hidden="true" href="#cb49-400"></a> item_tag <span class="op">=</span> <span class="va">self</span>._get_item_tag(channel_tag, url, article_data)</span> <span><a aria-hidden="true" href="#cb49-401"></a></span> <span><a aria-hidden="true" href="#cb49-402"></a> tag <span class="op">=</span> item_tag.getElementsByTagName(GUID_TAG)[<span class="dv">0</span>]</span> <span><a aria-hidden="true" href="#cb49-403"></a> <span class="co"># Changing the guid on update creates problems with some</span></span> <span><a aria-hidden="true" href="#cb49-404"></a> <span class="co"># consumers</span></span> <span><a aria-hidden="true" href="#cb49-405"></a> nodetext <span class="op">=</span> url <span class="co"># + "-" + self._nowdate</span></span> <span><a aria-hidden="true" href="#cb49-406"></a> <span class="cf">if</span> <span class="kw">not</span> tag.hasChildNodes():</span> <span><a aria-hidden="true" href="#cb49-407"></a> textnode <span class="op">=</span> <span class="va">self</span>._rss_xml.createTextNode(nodetext)</span> <span><a aria-hidden="true" href="#cb49-408"></a> 
tag.appendChild(textnode)</span> <span><a aria-hidden="true" href="#cb49-409"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb49-410"></a> tag.childNodes[<span class="dv">0</span>].nodeValue <span class="op">=</span> nodetext</span> <span><a aria-hidden="true" href="#cb49-411"></a></span> <span><a aria-hidden="true" href="#cb49-412"></a> tag <span class="op">=</span> item_tag.getElementsByTagName(DESCRIPTION_TAG)[<span class="dv">0</span>]</span> <span><a aria-hidden="true" href="#cb49-413"></a> nodetext <span class="op">=</span> <span class="st">" "</span>.join(</span> <span><a aria-hidden="true" href="#cb49-414"></a> <span class="va">self</span>._article_tag.find(<span class="st">"p"</span>).text.split())[:<span class="dv">406</span>] <span class="op">+</span> <span class="st">" ..."</span></span> <span><a aria-hidden="true" href="#cb49-415"></a> <span class="cf">if</span> <span class="kw">not</span> tag.hasChildNodes():</span> <span><a aria-hidden="true" href="#cb49-416"></a> textnode <span class="op">=</span> <span class="va">self</span>._rss_xml.createCDATASection(nodetext)</span> <span><a aria-hidden="true" href="#cb49-417"></a> tag.appendChild(textnode)</span> <span><a aria-hidden="true" href="#cb49-418"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb49-419"></a> tag.childNodes[<span class="dv">0</span>].nodeValue <span class="op">=</span> nodetext</span> <span><a aria-hidden="true" href="#cb49-420"></a></span> <span><a aria-hidden="true" href="#cb49-421"></a> <span class="co"># save the audio uri before the removal</span></span> <span><a aria-hidden="true" href="#cb49-422"></a> <span class="co"># of the header tag</span></span> <span><a aria-hidden="true" href="#cb49-423"></a> url <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb49-424"></a> tag <span class="op">=</span> <span class="va">self</span>._article_tag.find(AUDIO_TAG)</span> <span><a 
aria-hidden="true" href="#cb49-425"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb49-426"></a> url <span class="op">=</span> tag.attrs[<span class="st">"src"</span>]</span> <span><a aria-hidden="true" href="#cb49-427"></a></span> <span><a aria-hidden="true" href="#cb49-428"></a> <span class="va">self</span>._article_cleanup()</span> <span><a aria-hidden="true" href="#cb49-429"></a></span> <span><a aria-hidden="true" href="#cb49-430"></a> tag <span class="op">=</span> item_tag.getElementsByTagName(CONTENT_TAG)[<span class="dv">0</span>]</span> <span><a aria-hidden="true" href="#cb49-431"></a> <span class="cf">if</span> tag.hasChildNodes():</span> <span><a aria-hidden="true" href="#cb49-432"></a> tag.removeChild(tag.childNodes[<span class="dv">0</span>])</span> <span><a aria-hidden="true" href="#cb49-433"></a> nodetext <span class="op">=</span> <span class="va">self</span>._article_tag.prettify()</span> <span><a aria-hidden="true" href="#cb49-434"></a> nodetext <span class="op">=</span> <span class="st">" "</span>.join(nodetext.split())</span> <span><a aria-hidden="true" href="#cb49-435"></a> textnode <span class="op">=</span> <span class="va">self</span>._rss_xml.createCDATASection(nodetext)</span> <span><a aria-hidden="true" href="#cb49-436"></a> tag.appendChild(textnode)</span> <span><a aria-hidden="true" href="#cb49-437"></a></span> <span><a aria-hidden="true" href="#cb49-438"></a> <span class="co"># An update might add or update the audio</span></span> <span><a aria-hidden="true" href="#cb49-439"></a> tags <span class="op">=</span> item_tag.getElementsByTagName(ENCLOSURE_TAG)</span> <span><a aria-hidden="true" href="#cb49-440"></a> tag <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb49-441"></a> <span class="cf">if</span> url <span class="kw">and</span> <span class="bu">len</span>(tags) <span class="op">==</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" 
href="#cb49-442"></a> tag <span class="op">=</span> <span class="va">self</span>._rss_xml.createElement(ENCLOSURE_TAG)</span> <span><a aria-hidden="true" href="#cb49-443"></a> item_tag.appendChild(tag)</span> <span><a aria-hidden="true" href="#cb49-444"></a> <span class="cf">elif</span> <span class="bu">len</span>(tags) <span class="op">&gt;</span> <span class="dv">0</span> <span class="kw">and</span> <span class="kw">not</span> url:</span> <span><a aria-hidden="true" href="#cb49-445"></a> item_tag.removeChild(tags[<span class="dv">0</span>])</span> <span><a aria-hidden="true" href="#cb49-446"></a></span> <span><a aria-hidden="true" href="#cb49-447"></a> <span class="co"># Update enclosure tag</span></span> <span><a aria-hidden="true" href="#cb49-448"></a> <span class="cf">if</span> tag:</span> <span><a aria-hidden="true" href="#cb49-449"></a> audio <span class="op">=</span> gmc.audiopath <span class="op">/</span> (pageurn(</span> <span><a aria-hidden="true" href="#cb49-450"></a> article_data[PubMetaData.title]) <span class="op">+</span> <span class="st">".mp3"</span>)</span> <span><a aria-hidden="true" href="#cb49-451"></a> filelength <span class="op">=</span> <span class="dv">0</span></span> <span><a aria-hidden="true" href="#cb49-452"></a> audio.resolve()</span> <span><a aria-hidden="true" href="#cb49-453"></a> <span class="cf">if</span> audio.exists():</span> <span><a aria-hidden="true" href="#cb49-454"></a> filelength <span class="op">=</span> audio.stat().st_size</span> <span><a aria-hidden="true" href="#cb49-455"></a> tag.setAttribute(<span class="st">"url"</span>, url)</span> <span><a aria-hidden="true" href="#cb49-456"></a> tag.setAttribute(<span class="st">"length"</span>, <span class="st">"</span><span class="sc">{}</span><span class="st">"</span>.<span class="bu">format</span>(filelength))</span> <span><a aria-hidden="true" href="#cb49-457"></a> tag.setAttribute(<span class="st">"type"</span>, <span class="st">"audio/mpeg"</span>)</span> <span><a 
aria-hidden="true" href="#cb49-458"></a></span> <span><a aria-hidden="true" href="#cb49-459"></a> <span class="va">self</span>._finalize_channel(channel_tag)</span> <span><a aria-hidden="true" href="#cb49-460"></a></span> <span><a aria-hidden="true" href="#cb49-461"></a> xml_doc <span class="op">=</span> <span class="va">self</span>._rss_xml.toprettyxml(indent<span class="op">=</span><span class="st">" "</span>, encoding<span class="op">=</span><span class="st">"utf-8"</span>)</span> <span><a aria-hidden="true" href="#cb49-462"></a> xml_doc <span class="op">=</span> <span class="va">self</span>._remove_empty_lines(xml_doc.decode(<span class="st">"utf-8"</span>))</span> <span><a aria-hidden="true" href="#cb49-463"></a></span> <span><a aria-hidden="true" href="#cb49-464"></a> <span class="cf">with</span> <span class="bu">open</span>(rss_path, <span class="st">'w'</span>) <span class="im">as</span> rss_file:</span> <span><a aria-hidden="true" href="#cb49-465"></a> <span class="bu">print</span>(xml_doc, <span class="bu">file</span><span class="op">=</span>rss_file)</span> <span><a aria-hidden="true" href="#cb49-466"></a> rss_file.flush()</span> <span><a aria-hidden="true" href="#cb49-467"></a> rss_file.close()</span></code></pre> </div> <h4> Error Search </h4> <p> The initial code worked nicely, but my audio episodes did not show up in my favorite podcast catcher GPodder on my Sailfish OS device. I found out that GPodder contains a lot of Python as well, and that the module it uses to parse the RSS feed is named podcastparser. </p> <p> Since I could not spot the cause of the error myself, I ended up investigating it with the following test code.
</p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb50-1"></a><span class="co">#!/usr/bin/env python3</span></span> <span><a aria-hidden="true" href="#cb50-2"></a><span class="co"># -*- coding: utf-8 -*-</span></span> <span><a aria-hidden="true" href="#cb50-3"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb50-4"></a><span class="co">Created on Tue Feb 22 13:02:51 2022</span></span> <span><a aria-hidden="true" href="#cb50-5"></a></span> <span><a aria-hidden="true" href="#cb50-6"></a><span class="co">@author: frank</span></span> <span><a aria-hidden="true" href="#cb50-7"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb50-8"></a></span> <span><a aria-hidden="true" href="#cb50-9"></a><span class="im">import</span> podcastparser</span> <span><a aria-hidden="true" href="#cb50-10"></a><span class="im">import</span> urllib.request</span> <span><a aria-hidden="true" href="#cb50-11"></a></span> <span><a aria-hidden="true" href="#cb50-12"></a>feedurl <span class="op">=</span> <span class="st">'http://sol:88/idee-rss.xml'</span></span> <span><a aria-hidden="true" href="#cb50-13"></a></span> <span><a aria-hidden="true" href="#cb50-14"></a>parsed <span class="op">=</span> podcastparser.parse(feedurl, urllib.request.urlopen(feedurl))</span> <span><a aria-hidden="true" href="#cb50-15"></a></span> <span><a aria-hidden="true" href="#cb50-16"></a><span class="co"># parsed is a dict</span></span> <span><a aria-hidden="true" href="#cb50-17"></a><span class="im">import</span> pprint</span> <span><a aria-hidden="true" href="#cb50-18"></a>pprint.pprint(parsed)</span></code></pre> </div> <p> This excursion taught me the following things: </p> <ul class="incremental"> <li> The cause of the error was that I used a uri attribute instead of a url attribute in the enclosure tag.
</li> <li> podcastparser supports relative links in the RSS file, so most probably other parsers support them as well. </li> <li> The test cases in its git repository indicate that CDATA sections should work nicely. </li> </ul> <h4> Relative Links in RSS </h4> <p> The RSS Advisory Board states its opinion that relative links should be supported. The discussion documented on that page also proposes how this should be done, which seems to fit the podcastparser.py implementation <sup> ( 27 ) </sup> . </p> <p> The proposal boils down to the notion that the channel's link element should provide the location to which the links are relative. If required, this can be overridden with the xml:base attribute. </p> <p> Since the link element of the channel should point to the HTML of the channel's index (or entry) page, I felt more comfortable with the dedicated xml:base attribute. </p> <p> Making everything else in the channel elements relative, the RSS template for my German channel should look like this: </p> <div class="sourceCode"> <pre class="sourceCode xml"><code class="sourceCode xml"><span><a aria-hidden="true" href="#cb51-1"></a><span class="kw">&lt;?xml</span> version="1.0" encoding="UTF-8"<span class="kw">?&gt;</span></span> <span><a aria-hidden="true" href="#cb51-2"></a><span class="kw">&lt;rss</span><span class="ot"> version=</span><span class="st">"2.0"</span></span> <span><a aria-hidden="true" href="#cb51-3"></a><span class="ot"> xml:base=</span><span class="st">"http://sol:88/"</span></span> <span><a aria-hidden="true" href="#cb51-4"></a><span class="ot"> xmlns:content=</span><span class="st">"http://purl.org/rss/1.0/modules/content/"</span></span> <span><a aria-hidden="true" href="#cb51-5"></a><span class="ot"> xmlns:atom=</span><span class="st">"http://www.w3.org/2005/Atom"</span></span> <span><a aria-hidden="true" href="#cb51-6"></a> <span class="kw">&gt;</span></span> <span><a aria-hidden="true" href="#cb51-7"></a> <span 
class="kw">&lt;channel&gt;</span></span> <span><a aria-hidden="true" href="#cb51-8"></a> <span class="kw">&lt;title&gt;</span>Idee der eigenen Erkenntnis<span class="kw">&lt;/title&gt;</span></span> <span><a aria-hidden="true" href="#cb51-9"></a> <span class="kw">&lt;atom:link</span><span class="ot"> href=</span><span class="st">"idee-rss.xml"</span><span class="ot"> rel=</span><span class="st">"self"</span> </span> <span><a aria-hidden="true" href="#cb51-10"></a><span class="ot"> type=</span><span class="st">"application/rss+xml"</span> <span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb51-11"></a> <span class="kw">&lt;link&gt;</span>idee.html<span class="kw">&lt;/link&gt;</span></span> <span><a aria-hidden="true" href="#cb51-12"></a> <span class="kw">&lt;description&gt;</span>Idee<span class="kw">&lt;/description&gt;</span></span> <span><a aria-hidden="true" href="#cb51-13"></a> <span class="kw">&lt;lastBuildDate&gt;</span>Tue, 11 Jan 2022 07:54:24 +0000<span class="kw">&lt;/lastBuildDate&gt;</span></span> <span><a aria-hidden="true" href="#cb51-14"></a> <span class="kw">&lt;language&gt;</span>de<span class="kw">&lt;/language&gt;</span></span> <span><a aria-hidden="true" href="#cb51-15"></a> <span class="kw">&lt;generator&gt;</span>pandoc, fs-commit-msg-hook 1.0<span class="kw">&lt;/generator&gt;</span></span> <span><a aria-hidden="true" href="#cb51-16"></a> <span class="kw">&lt;image&gt;</span></span> <span><a aria-hidden="true" href="#cb51-17"></a> <span class="kw">&lt;url&gt;</span>image/favicon-256x256-150x150.png<span class="kw">&lt;/url&gt;</span></span> <span><a aria-hidden="true" href="#cb51-18"></a> <span class="kw">&lt;title&gt;</span>Idee der eigenen Erkenntnis<span class="kw">&lt;/title&gt;</span></span> <span><a aria-hidden="true" href="#cb51-19"></a> <span class="kw">&lt;link&gt;</span>idee.html<span class="kw">&lt;/link&gt;</span></span> <span><a aria-hidden="true" href="#cb51-20"></a> <span 
class="kw">&lt;width&gt;</span>64<span class="kw">&lt;/width&gt;</span></span> <span><a aria-hidden="true" href="#cb51-21"></a> <span class="kw">&lt;height&gt;</span>64<span class="kw">&lt;/height&gt;</span></span> <span><a aria-hidden="true" href="#cb51-22"></a> <span class="kw">&lt;/image&gt;</span> </span> <span><a aria-hidden="true" href="#cb51-23"></a> <span class="kw">&lt;/channel&gt;</span></span> <span><a aria-hidden="true" href="#cb51-24"></a><span class="kw">&lt;/rss&gt;</span></span></code></pre> </div> <p> The current xml:base value is for the testing period only. </p> <p> Thinking further about this, all my article pages are in the folder article. If I use that folder as the base, all relative links used in the content part should resolve nicely. </p> <p> The channel part should then look as follows: </p> <div class="sourceCode"> <pre class="sourceCode xml"><code class="sourceCode xml"><span><a aria-hidden="true" href="#cb52-1"></a><span class="kw">&lt;?xml</span> version="1.0" encoding="UTF-8"<span class="kw">?&gt;</span></span> <span><a aria-hidden="true" href="#cb52-2"></a><span class="kw">&lt;rss</span><span class="ot"> version=</span><span class="st">"2.0"</span></span> <span><a aria-hidden="true" href="#cb52-3"></a><span class="ot"> xml:base=</span><span class="st">"http://sol:88/article/"</span></span> <span><a aria-hidden="true" href="#cb52-4"></a><span class="ot"> xmlns:content=</span><span class="st">"http://purl.org/rss/1.0/modules/content/"</span></span> <span><a aria-hidden="true" href="#cb52-5"></a><span class="ot"> xmlns:atom=</span><span class="st">"http://www.w3.org/2005/Atom"</span></span> <span><a aria-hidden="true" href="#cb52-6"></a> <span class="kw">&gt;</span></span> <span><a aria-hidden="true" href="#cb52-7"></a> <span class="kw">&lt;channel&gt;</span></span> <span><a aria-hidden="true" href="#cb52-8"></a> <span class="kw">&lt;title&gt;</span>Idee der eigenen Erkenntnis<span class="kw">&lt;/title&gt;</span></span> <span><a 
aria-hidden="true" href="#cb52-9"></a> <span class="kw">&lt;atom:link</span><span class="ot"> href=</span><span class="st">"../idee-rss.xml"</span><span class="ot"> rel=</span><span class="st">"self"</span> </span> <span><a aria-hidden="true" href="#cb52-10"></a><span class="ot"> type=</span><span class="st">"application/rss+xml"</span> <span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb52-11"></a> <span class="kw">&lt;link&gt;</span>../idee.html<span class="kw">&lt;/link&gt;</span></span> <span><a aria-hidden="true" href="#cb52-12"></a> <span class="kw">&lt;description&gt;</span>Idee<span class="kw">&lt;/description&gt;</span></span> <span><a aria-hidden="true" href="#cb52-13"></a> <span class="kw">&lt;lastBuildDate&gt;</span>Tue, 11 Jan 2022 07:54:24 +0000<span class="kw">&lt;/lastBuildDate&gt;</span></span> <span><a aria-hidden="true" href="#cb52-14"></a> <span class="kw">&lt;language&gt;</span>de<span class="kw">&lt;/language&gt;</span></span> <span><a aria-hidden="true" href="#cb52-15"></a> <span class="kw">&lt;generator&gt;</span>pandoc, fs-commit-msg-hook 1.0<span class="kw">&lt;/generator&gt;</span></span> <span><a aria-hidden="true" href="#cb52-16"></a> <span class="kw">&lt;image&gt;</span></span> <span><a aria-hidden="true" href="#cb52-17"></a> <span class="kw">&lt;url&gt;</span>../image/favicon-256x256-150x150.png<span class="kw">&lt;/url&gt;</span></span> <span><a aria-hidden="true" href="#cb52-18"></a> <span class="kw">&lt;title&gt;</span>Idee der eigenen Erkenntnis<span class="kw">&lt;/title&gt;</span></span> <span><a aria-hidden="true" href="#cb52-19"></a> <span class="kw">&lt;link&gt;</span>../idee.html<span class="kw">&lt;/link&gt;</span></span> <span><a aria-hidden="true" href="#cb52-20"></a> <span class="kw">&lt;width&gt;</span>64<span class="kw">&lt;/width&gt;</span></span> <span><a aria-hidden="true" href="#cb52-21"></a> <span class="kw">&lt;height&gt;</span>64<span class="kw">&lt;/height&gt;</span></span> <span><a 
aria-hidden="true" href="#cb52-22"></a> <span class="kw">&lt;/image&gt;</span> </span> <span><a aria-hidden="true" href="#cb52-23"></a> <span class="kw">&lt;/channel&gt;</span></span> <span><a aria-hidden="true" href="#cb52-24"></a><span class="kw">&lt;/rss&gt;</span></span></code></pre> </div> <p> A nice idea, but it does not work as intended. You can use relative links in the channel tags, and that works fine as far as I tested it. But you cannot rely on them for links in the content tag. How the consumer resolves these links, or whether it even tries to, is nothing you can count on. To be fair, the specification is really unspecific in this respect. </p> <p> According to the official specification even CDATA sections would not work, as it states that all special characters in the content must be HTML-escaped. Using a CDATA section instead is much more convenient and, luckily, turns out to be supported by the feed consumers. But CDATA by definition means "Character Data" that is not to be parsed (character data to be parsed would be PCDATA). </p> <p> Implementers can now argue that relative links inside a CDATA section must not be parsed and resolved in the context of xml:base, and that would be correct. But they could also argue that CDATA is not to be parsed or processed in any way and is just to be displayed, and that would be correct as well. </p> <p> I had a hard time getting my article images shown using relative links. In the end I found that the images were not shown because of tag attributes like alt and title, not because of issues with relative links. Headline tags are also not rendered as headlines in GPodder if, for example, the h2 tag carries an id attribute. </p> <p> I extended the code to perform a number of attribute removals and some tag replacements, which resolved my issues. I did this with a code version that used fully qualified links, and I did not go back to give the relative links one more try.
Fully qualified links are in any case the least likely to cause problems with any feed reader. </p> <h2> Include the Portal Fragment </h2> <p> I go back to an earlier discussion. Obviously I failed to find any HTML means to separate the content of the article from the content of the portal, and it also does not look as if something like html-include will become part of HTML and be supported by browsers. </p> <p> But it turns out that it can be done by the web server, using one of its extension modules. Some examples exist where the directives add_before_body and add_after_body from "Module ngx_http_addition_module" <sup> ( 28 ) </sup> are used to inject a header and a footer. </p> <p> The article "nginx: Mitigating the BREACH Vulnerability with Perl and SSI or Addition or Substitution Modules — Wild Wild Wolf" <sup> ( 29 ) </sup> is not really about this topic, but it does show that using these two directives we would end up with invalid HTML. Not a big problem, if it works and that is all you care about. </p> <p> The same article shows that the "Module ngx_http_ssi_module" <sup> ( 30 ) </sup> does exactly what is required to perform such an include on the server side. </p> <p> You could now argue that this is a step back from the goal of being completely plain HTML only. But it is, so my argument goes, close enough to the feature I had hoped to see included in the HTML standard. I'm willing to accept that the feature is now provided by the web server. </p> <p> For the implementation this means that I have to go a few steps back and modify the code that turns my plain HTML into portal HTML. This part will no longer include the header itself, but only a comment line with the instruction to include the header. </p> <p> Since the nginx site configuration now becomes an essential part of the implementation, I'll move it into the git repository as well.
</p> <h3> Relocate nginx Site Configuration </h3> <p> In this first step the content of the site configuration stays the same. It is just copied via copy+paste from the file sol:/etc/nginx/sites-available/idee_88 into the new file in the git repository. </p> <p> The file name and the used port will change, when I go live. </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb53-1"></a><span class="ex">frank</span> @Asimov:~/projects/idee$ mkdir nginx</span> <span><a aria-hidden="true" href="#cb53-2"></a><span class="ex">frank</span> @Asimov:~/projects/idee$ cd nginx/</span> <span><a aria-hidden="true" href="#cb53-3"></a><span class="ex">frank</span> @Asimov:~/projects/idee/nginx$ vim idee_88</span> <span><a aria-hidden="true" href="#cb53-4"></a><span class="ex">frank</span> @Asimov:~/projects/idee/nginx$ git add .</span> <span><a aria-hidden="true" href="#cb53-5"></a><span class="ex">frank</span> @Asimov:~/projects/idee/nginx$ git commit</span> <span><a aria-hidden="true" href="#cb53-6"></a><span class="ex">frank</span> @Asimov:~/projects/idee/nginx$ git push</span> <span><a aria-hidden="true" href="#cb53-7"></a><span class="ex">Enter</span> passphrase for key <span class="st">'/home/frank/.ssh/id_rsa'</span>: </span> <span><a aria-hidden="true" href="#cb53-8"></a><span class="ex">Enumerating</span> objects: 5, done.</span> <span><a aria-hidden="true" href="#cb53-9"></a><span class="ex">Counting</span> objects: 100% (5/5), <span class="kw">done</span><span class="ex">.</span></span> <span><a aria-hidden="true" href="#cb53-10"></a><span class="ex">Delta</span> compression using up to 4 threads</span> <span><a aria-hidden="true" href="#cb53-11"></a><span class="ex">Compressing</span> objects: 100% (3/3), <span class="kw">done</span><span class="ex">.</span></span> <span><a aria-hidden="true" href="#cb53-12"></a><span class="ex">Writing</span> objects: 100% (4/4), <span class="ex">750</span> bytes 
<span class="kw">|</span> <span class="ex">750.00</span> KiB/s, done.</span> <span><a aria-hidden="true" href="#cb53-13"></a><span class="ex">Total</span> 4 (delta 1), <span class="ex">reused</span> 0 (delta 0), <span class="ex">pack-reused</span> 0</span> <span><a aria-hidden="true" href="#cb53-14"></a><span class="ex">remote</span>: From /home/git/idee</span> <span><a aria-hidden="true" href="#cb53-15"></a><span class="ex">remote</span>: 8316cca..eabae10 master -<span class="op">&gt;</span> origin/master</span> <span><a aria-hidden="true" href="#cb53-16"></a><span class="ex">remote</span>: Updating 8316cca..eabae10</span> <span><a aria-hidden="true" href="#cb53-17"></a><span class="ex">remote</span>: Fast-forward</span> <span><a aria-hidden="true" href="#cb53-18"></a><span class="ex">remote</span>: nginx/idee_88 <span class="kw">|</span> <span class="ex">36</span> ++++++++++++++++++++++++++++++++++++</span> <span><a aria-hidden="true" href="#cb53-19"></a><span class="ex">remote</span>: 1 file changed, 36 insertions(+)</span> <span><a aria-hidden="true" href="#cb53-20"></a><span class="ex">remote</span>: create mode 100644 nginx/idee_88</span> <span><a aria-hidden="true" href="#cb53-21"></a><span class="ex">To</span> ssh://sol/home/git/idee.git</span> <span><a aria-hidden="true" href="#cb53-22"></a> <span class="ex">8316cca..eabae10</span> master -<span class="op">&gt;</span> master</span></code></pre> </div> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb54-1"></a><span class="ex">root</span> @sol:/etc/nginx/sites-available# rm idee_88 </span> <span><a aria-hidden="true" href="#cb54-2"></a><span class="ex">root</span> @sol:/etc/nginx/sites-available# ln -s /var/www/idee/nginx/idee_88 </span> <span><a aria-hidden="true" href="#cb54-3"></a><span class="ex">root</span> @sol:/etc/nginx/sites-available# nginx -t</span> <span><a aria-hidden="true" href="#cb54-4"></a><span class="ex">nginx</span>: 
the configuration file /etc/nginx/nginx.conf syntax is ok</span> <span><a aria-hidden="true" href="#cb54-5"></a><span class="ex">nginx</span>: configuration file /etc/nginx/nginx.conf test is successful</span> <span><a aria-hidden="true" href="#cb54-6"></a><span class="ex">root</span> @sol:/etc/nginx/sites-available# nginx -s reload</span></code></pre> </div> <h3> SSI Installation </h3> <p> SSI stands for Server Side Includes. </p> <p> The required SSI module is included in the nginx-extras package. But it turns out to be part of the nginx-full package as well, which I already have installed. </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb55-1"></a><span class="ex">root</span> @sol:/etc/nginx/sites-available# apt-cache show nginx-full</span> <span><a aria-hidden="true" href="#cb55-2"></a>[<span class="ex">...</span>]</span> <span><a aria-hidden="true" href="#cb55-3"></a> <span class="ex">OPTIONAL</span> HTTP MODULES: Addition, Auth Request, Charset, WebDAV, GeoIP, Gunzip,</span> <span><a aria-hidden="true" href="#cb55-4"></a> <span class="ex">Gzip</span>, Gzip Precompression, Headers, HTTP/2, Image Filter, Index, Log, Real IP,</span> <span><a aria-hidden="true" href="#cb55-5"></a> <span class="ex">Slice</span>, SSI, SSL, Stream, SSL Preread, Stub Status, Substitution, Thread Pool,</span> <span><a aria-hidden="true" href="#cb55-6"></a> <span class="ex">Upstream</span>, User ID, XSLT.</span> <span><a aria-hidden="true" href="#cb55-7"></a>[<span class="ex">...</span>]</span></code></pre> </div> <p> No further installation is required. </p> <h3> SSI Configuration </h3> <p> The nginx site configuration needs one additional line to activate SSI for the location. </p> <pre class="nginx"><code> location / { ssi on; # First attempt to serve request as file, then # as directory, then fall back to displaying a 404.
try_files $uri $uri.html $uri/ =404; }</code></pre> <h3> Include Instruction in the HTML </h3> <div class="sourceCode"> <pre class="sourceCode html"><code class="sourceCode html"><span><a aria-hidden="true" href="#cb57-1"></a><span class="kw">&lt;html</span><span class="ot"> lang=</span><span class="st">"de-DE"</span><span class="ot"> xml:lang=</span><span class="st">"de-DE"</span><span class="ot"> xmlns=</span><span class="st">"http://www.w3.org/1999/xhtml"</span><span class="kw">&gt;</span></span> <span><a aria-hidden="true" href="#cb57-2"></a> <span class="kw">&lt;head&gt;</span></span> <span><a aria-hidden="true" href="#cb57-3"></a> <span class="kw">&lt;meta</span><span class="ot"> charset=</span><span class="st">"utf-8"</span><span class="kw">/&gt;</span></span> <span><a aria-hidden="true" href="#cb57-4"></a> [...]</span> <span><a aria-hidden="true" href="#cb57-5"></a> <span class="kw">&lt;/head&gt;</span></span> <span><a aria-hidden="true" href="#cb57-6"></a> <span class="kw">&lt;body&gt;</span></span> <span><a aria-hidden="true" href="#cb57-7"></a><span class="co">&lt;!--# include file="portal/idee_header.html" --&gt;</span></span> <span><a aria-hidden="true" href="#cb57-8"></a> <span class="kw">&lt;main&gt;</span></span> <span><a aria-hidden="true" href="#cb57-9"></a> <span class="kw">&lt;article&gt;</span></span> <span><a aria-hidden="true" href="#cb57-10"></a> [...]</span> <span><a aria-hidden="true" href="#cb57-11"></a> <span class="kw">&lt;/article&gt;</span></span> <span><a aria-hidden="true" href="#cb57-12"></a> <span class="kw">&lt;/main&gt;</span></span> <span><a aria-hidden="true" href="#cb57-13"></a> <span class="kw">&lt;/body&gt;</span></span> <span><a aria-hidden="true" href="#cb57-14"></a><span class="kw">&lt;/html&gt;</span></span></code></pre> </div> <p> For English pages the include will reference the file portal/concept_header.html. The documentation does not explain the reference directory for the include path.
Is it simply the webroot, or is it the HTML document location? As you can see, I guessed it is the webroot, which is also simpler for my implementation. </p> <p> First tests ahead of the implementation show that this assumption is not correct. The above sample leads to error 404 (the included header page was not found). Defining the path relative to the article is correct, but is considered unsafe: </p> <p> 2022/03/02 11:07:32 [error] 13045#13045: *396623 unsafe URI "/article/../portal/idee_header.html" was detected while sending response to client, client: 10.19.67.21, server: _, request: "GET /article/endlich.html HTTP/1.1", host: "sol:88" </p> <p> Since there was only one choice left, that one turned out to work. </p> <div class="sourceCode"> <pre class="sourceCode html"><code class="sourceCode html"><span><a aria-hidden="true" href="#cb58-1"></a>[...]</span> <span><a aria-hidden="true" href="#cb58-2"></a> <span class="kw">&lt;body&gt;</span></span> <span><a aria-hidden="true" href="#cb58-3"></a><span class="co">&lt;!--# include file="/portal/idee-header.html" --&gt;</span></span> <span><a aria-hidden="true" href="#cb58-4"></a> <span class="kw">&lt;main&gt;</span></span> <span><a aria-hidden="true" href="#cb58-5"></a>[...]</span></code></pre> </div> <p> You might also notice that I decided to rename the header file to use a hyphen instead of an underscore. This was just for consistency in my file names. Note that this is only a chapter in a much longer description. </p> <p> see </p> <p> <onlyinclude> == Create Article Archive Pages == The article archive will be separated into an English and a German archive and be organized by year and month. The archive will not be presented as a list in a drop-down, as is the case in WordPress; instead, archive pages will be created.
</onlyinclude> </p> <p> <strong> ~/projects/idee/generator/archive.py </strong> </p> <div class="sourceCode"> <pre class="sourceCode Python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb59-1"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb59-2"></a><span class="co">Update the archive of the website.</span></span> <span><a aria-hidden="true" href="#cb59-3"></a></span> <span><a aria-hidden="true" href="#cb59-4"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb59-5"></a><span class="co">@license: https://creativecommons.org/publicdomain/zero/1.0/deed.en</span></span> <span><a aria-hidden="true" href="#cb59-6"></a><span class="co">@date: 2022-03-15</span></span> <span><a aria-hidden="true" href="#cb59-7"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb59-8"></a><span class="im">import</span> re</span> <span><a aria-hidden="true" href="#cb59-9"></a><span class="im">import</span> datetime</span> <span><a aria-hidden="true" href="#cb59-10"></a></span> <span><a aria-hidden="true" href="#cb59-11"></a><span class="im">from</span> bs4 <span class="im">import</span> BeautifulSoup</span> <span><a aria-hidden="true" href="#cb59-12"></a><span class="im">from</span> bs4 <span class="im">import</span> Comment</span> <span><a aria-hidden="true" href="#cb59-13"></a><span class="im">from</span> bs4.builder._htmlparser <span class="im">import</span> HTMLParserTreeBuilder</span> <span><a aria-hidden="true" href="#cb59-14"></a></span> <span><a aria-hidden="true" href="#cb59-15"></a><span class="im">from</span> pubmetadata <span class="im">import</span> PubMetaData</span> <span><a aria-hidden="true" href="#cb59-16"></a><span class="im">from</span> gitmsgconstants <span class="im">import</span> GitMsgConstants <span class="im">as</span> gmc</span> <span><a aria-hidden="true" href="#cb59-17"></a></span> <span><a aria-hidden="true" href="#cb59-18"></a><span 
class="kw">class</span> Archive():</span> <span><a aria-hidden="true" href="#cb59-19"></a> <span class="co">"""Manage all changes in the archive."""</span></span> <span><a aria-hidden="true" href="#cb59-20"></a></span> <span><a aria-hidden="true" href="#cb59-21"></a> <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb59-22"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb59-23"></a><span class="co"> Initialize changelists.</span></span> <span><a aria-hidden="true" href="#cb59-24"></a></span> <span><a aria-hidden="true" href="#cb59-25"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb59-26"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb59-27"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb59-28"></a></span> <span><a aria-hidden="true" href="#cb59-29"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb59-30"></a> <span class="co"># map information for German page changes on site "Idee".</span></span> <span><a aria-hidden="true" href="#cb59-31"></a> <span class="va">self</span>.de_list <span class="op">=</span> []</span> <span><a aria-hidden="true" href="#cb59-32"></a> <span class="co"># map information for English page changes on site "Concept".</span></span> <span><a aria-hidden="true" href="#cb59-33"></a> <span class="va">self</span>.en_list <span class="op">=</span> []</span> <span><a aria-hidden="true" href="#cb59-34"></a> <span class="co"># The time of the update</span></span> <span><a aria-hidden="true" href="#cb59-35"></a> <span class="va">self</span>._nowdate <span class="op">=</span> datetime.datetime.now().isoformat()</span> <span><a aria-hidden="true" href="#cb59-36"></a></span> <span><a aria-hidden="true" href="#cb59-37"></a> <span class="kw">def</span> update(<span class="va">self</span>):</span> <span><a 
aria-hidden="true" href="#cb59-38"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb59-39"></a><span class="co"> Iterate over changes and update respective archive pages.</span></span> <span><a aria-hidden="true" href="#cb59-40"></a></span> <span><a aria-hidden="true" href="#cb59-41"></a><span class="co"> Add the archive pages to their respective change list.</span></span> <span><a aria-hidden="true" href="#cb59-42"></a></span> <span><a aria-hidden="true" href="#cb59-43"></a><span class="co"> The information about the changed html pages comes from</span></span> <span><a aria-hidden="true" href="#cb59-44"></a><span class="co"> PubMetaData.instance._updates and</span></span> <span><a aria-hidden="true" href="#cb59-45"></a><span class="co"> PubMetaData.instance._deletions .</span></span> <span><a aria-hidden="true" href="#cb59-46"></a></span> <span><a aria-hidden="true" href="#cb59-47"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb59-48"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb59-49"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb59-50"></a></span> <span><a aria-hidden="true" href="#cb59-51"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb59-52"></a> <span class="cf">for</span> article_data <span class="kw">in</span> PubMetaData.instance._updates:</span> <span><a aria-hidden="true" href="#cb59-53"></a> creation_month <span class="op">=</span> article_data[PubMetaData.pubdate][<span class="dv">0</span>:<span class="dv">7</span>]</span> <span><a aria-hidden="true" href="#cb59-54"></a> site <span class="op">=</span> article_data[PubMetaData.site]</span> <span><a aria-hidden="true" href="#cb59-55"></a> archive_path <span class="op">=</span> site.lower() <span class="op">+</span> <span class="st">"-"</span> <span class="op">+</span> creation_month <span class="op">+</span> <span class="st">".html"</span></span> 
<span><a aria-hidden="true" href="#cb59-56"></a> archive_path <span class="op">=</span> gmc.archivepath <span class="op">/</span> archive_path</span> <span><a aria-hidden="true" href="#cb59-57"></a></span> <span><a aria-hidden="true" href="#cb59-58"></a> <span class="cf">if</span> site <span class="op">==</span> <span class="st">"Idee"</span>:</span> <span><a aria-hidden="true" href="#cb59-59"></a> <span class="cf">if</span> archive_path <span class="kw">not</span> <span class="kw">in</span> <span class="va">self</span>.de_list:</span> <span><a aria-hidden="true" href="#cb59-60"></a> <span class="va">self</span>.de_list.append(archive_path)</span> <span><a aria-hidden="true" href="#cb59-61"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb59-62"></a> <span class="cf">if</span> archive_path <span class="kw">not</span> <span class="kw">in</span> <span class="va">self</span>.en_list:</span> <span><a aria-hidden="true" href="#cb59-63"></a> <span class="va">self</span>.en_list.append(archive_path)</span> <span><a aria-hidden="true" href="#cb59-64"></a></span> <span><a aria-hidden="true" href="#cb59-65"></a> <span class="cf">if</span> article_data.name <span class="op">!=</span> <span class="st">"rechtliches"</span> \</span> <span><a aria-hidden="true" href="#cb59-66"></a> <span class="kw">and</span> article_data.name <span class="op">!=</span> <span class="st">"legal"</span>:</span> <span><a aria-hidden="true" href="#cb59-67"></a> soup <span class="op">=</span> <span class="va">self</span>._update(archive_path, article_data)</span> <span><a aria-hidden="true" href="#cb59-68"></a> html_doc <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" href="#cb59-69"></a></span> <span><a aria-hidden="true" href="#cb59-70"></a> <span class="cf">with</span> <span class="bu">open</span>(archive_path, <span class="st">'w'</span>) <span class="im">as</span> archive_file:</span> <span><a aria-hidden="true" href="#cb59-71"></a> <span 
class="bu">print</span>(html_doc, <span class="bu">file</span><span class="op">=</span>archive_file)</span> <span><a aria-hidden="true" href="#cb59-72"></a> archive_file.flush()</span> <span><a aria-hidden="true" href="#cb59-73"></a> archive_file.close()</span> <span><a aria-hidden="true" href="#cb59-74"></a></span> <span><a aria-hidden="true" href="#cb59-75"></a> <span class="cf">for</span> article_data <span class="kw">in</span> PubMetaData.instance._deletions:</span> <span><a aria-hidden="true" href="#cb59-76"></a> <span class="co"># </span><span class="al">TODO</span></span> <span><a aria-hidden="true" href="#cb59-77"></a> <span class="cf">pass</span></span> <span><a aria-hidden="true" href="#cb59-78"></a></span> <span><a aria-hidden="true" href="#cb59-79"></a> <span class="va">self</span>._update_de()</span> <span><a aria-hidden="true" href="#cb59-80"></a> <span class="va">self</span>._update_en()</span> <span><a aria-hidden="true" href="#cb59-81"></a></span> <span><a aria-hidden="true" href="#cb59-82"></a> <span class="kw">def</span> _update_de(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb59-83"></a> <span class="co">"""Update idee-archive.html."""</span></span> <span><a aria-hidden="true" href="#cb59-84"></a> <span class="cf">if</span> <span class="bu">len</span>(<span class="va">self</span>.de_list) <span class="op">==</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" href="#cb59-85"></a> <span class="cf">return</span></span> <span><a aria-hidden="true" href="#cb59-86"></a></span> <span><a aria-hidden="true" href="#cb59-87"></a> <span class="cf">with</span> <span class="bu">open</span>(gmc.idee_archive, <span class="st">'r'</span>) <span class="im">as</span> archive_file:</span> <span><a aria-hidden="true" href="#cb59-88"></a> html_doc <span class="op">=</span> archive_file.read()</span> <span><a aria-hidden="true" href="#cb59-89"></a> archive_file.flush()</span> <span><a aria-hidden="true" 
href="#cb59-90"></a> archive_file.close()</span> <span><a aria-hidden="true" href="#cb59-91"></a></span> <span><a aria-hidden="true" href="#cb59-92"></a> builder <span class="op">=</span> HTMLParserTreeBuilder</span> <span><a aria-hidden="true" href="#cb59-93"></a> soup <span class="op">=</span> BeautifulSoup(html_doc, builder<span class="op">=</span>builder)</span> <span><a aria-hidden="true" href="#cb59-94"></a></span> <span><a aria-hidden="true" href="#cb59-95"></a> <span class="cf">for</span> archive_path <span class="kw">in</span> <span class="va">self</span>.de_list:</span> <span><a aria-hidden="true" href="#cb59-96"></a> url <span class="op">=</span> <span class="st">'./'</span> <span class="op">+</span> archive_path.name</span> <span><a aria-hidden="true" href="#cb59-97"></a> tag <span class="op">=</span> soup.find(<span class="st">"a"</span>, href<span class="op">=</span>re.<span class="bu">compile</span>(<span class="vs">r""</span> <span class="op">+</span> url))</span> <span><a aria-hidden="true" href="#cb59-98"></a></span> <span><a aria-hidden="true" href="#cb59-99"></a> <span class="cf">if</span> <span class="kw">not</span> tag:</span> <span><a aria-hidden="true" href="#cb59-100"></a> tag <span class="op">=</span> soup.find(<span class="st">"main"</span>)</span> <span><a aria-hidden="true" href="#cb59-101"></a> new_tag <span class="op">=</span> soup.new_tag(<span class="st">"h3"</span>)</span> <span><a aria-hidden="true" href="#cb59-102"></a> tag.insert(<span class="dv">0</span>, new_tag)</span> <span><a aria-hidden="true" href="#cb59-103"></a> tag <span class="op">=</span> new_tag</span> <span><a aria-hidden="true" href="#cb59-104"></a> new_tag <span class="op">=</span> soup.new_tag(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb59-105"></a> new_tag.attrs.update({<span class="st">"href"</span>: url})</span> <span><a aria-hidden="true" href="#cb59-106"></a> new_tag.string <span class="op">=</span> archive_path.name</span> 
<span><a aria-hidden="true" href="#cb59-107"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb59-108"></a></span> <span><a aria-hidden="true" href="#cb59-109"></a> html_doc <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" href="#cb59-110"></a></span> <span><a aria-hidden="true" href="#cb59-111"></a> <span class="cf">with</span> <span class="bu">open</span>(gmc.idee_archive, <span class="st">'w'</span>) <span class="im">as</span> archive_file:</span> <span><a aria-hidden="true" href="#cb59-112"></a> <span class="bu">print</span>(html_doc, <span class="bu">file</span><span class="op">=</span>archive_file)</span> <span><a aria-hidden="true" href="#cb59-113"></a> archive_file.flush()</span> <span><a aria-hidden="true" href="#cb59-114"></a> archive_file.close()</span> <span><a aria-hidden="true" href="#cb59-115"></a></span> <span><a aria-hidden="true" href="#cb59-116"></a> <span class="kw">def</span> _update_en(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb59-117"></a> <span class="co">"""Update concept-archive.html."""</span></span> <span><a aria-hidden="true" href="#cb59-118"></a> <span class="cf">if</span> <span class="bu">len</span>(<span class="va">self</span>.en_list) <span class="op">==</span> <span class="dv">0</span>:</span> <span><a aria-hidden="true" href="#cb59-119"></a> <span class="cf">return</span></span> <span><a aria-hidden="true" href="#cb59-120"></a></span> <span><a aria-hidden="true" href="#cb59-121"></a> <span class="cf">with</span> <span class="bu">open</span>(gmc.concept_archive, <span class="st">'r'</span>) <span class="im">as</span> archive_file:</span> <span><a aria-hidden="true" href="#cb59-122"></a> html_doc <span class="op">=</span> archive_file.read()</span> <span><a aria-hidden="true" href="#cb59-123"></a> archive_file.flush()</span> <span><a aria-hidden="true" href="#cb59-124"></a> archive_file.close()</span> <span><a aria-hidden="true" 
href="#cb59-125"></a></span> <span><a aria-hidden="true" href="#cb59-126"></a> builder <span class="op">=</span> HTMLParserTreeBuilder</span> <span><a aria-hidden="true" href="#cb59-127"></a> soup <span class="op">=</span> BeautifulSoup(html_doc, builder<span class="op">=</span>builder)</span> <span><a aria-hidden="true" href="#cb59-128"></a></span> <span><a aria-hidden="true" href="#cb59-129"></a> <span class="cf">for</span> archive_path <span class="kw">in</span> <span class="va">self</span>.en_list:</span> <span><a aria-hidden="true" href="#cb59-130"></a> url <span class="op">=</span> <span class="st">'./'</span> <span class="op">+</span> archive_path.name</span> <span><a aria-hidden="true" href="#cb59-131"></a> tag <span class="op">=</span> soup.find(<span class="st">"a"</span>, href<span class="op">=</span>re.<span class="bu">compile</span>(<span class="vs">r""</span> <span class="op">+</span> url))</span> <span><a aria-hidden="true" href="#cb59-132"></a></span> <span><a aria-hidden="true" href="#cb59-133"></a> <span class="cf">if</span> <span class="kw">not</span> tag:</span> <span><a aria-hidden="true" href="#cb59-134"></a> tag <span class="op">=</span> soup.find(<span class="st">"main"</span>)</span> <span><a aria-hidden="true" href="#cb59-135"></a> new_tag <span class="op">=</span> soup.new_tag(<span class="st">"h3"</span>)</span> <span><a aria-hidden="true" href="#cb59-136"></a> tag.insert(<span class="dv">0</span>, new_tag)</span> <span><a aria-hidden="true" href="#cb59-137"></a> tag <span class="op">=</span> new_tag</span> <span><a aria-hidden="true" href="#cb59-138"></a> new_tag <span class="op">=</span> soup.new_tag(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb59-139"></a> new_tag.attrs.update({<span class="st">"href"</span>: url})</span> <span><a aria-hidden="true" href="#cb59-140"></a> new_tag.string <span class="op">=</span> archive_path.name</span> <span><a aria-hidden="true" href="#cb59-141"></a> 
tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb59-142"></a></span> <span><a aria-hidden="true" href="#cb59-143"></a> html_doc <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" href="#cb59-144"></a></span> <span><a aria-hidden="true" href="#cb59-145"></a> <span class="cf">with</span> <span class="bu">open</span>(gmc.concept_archive, <span class="st">'w'</span>) <span class="im">as</span> archive_file:</span> <span><a aria-hidden="true" href="#cb59-146"></a> <span class="bu">print</span>(html_doc, <span class="bu">file</span><span class="op">=</span>archive_file)</span> <span><a aria-hidden="true" href="#cb59-147"></a> archive_file.flush()</span> <span><a aria-hidden="true" href="#cb59-148"></a> archive_file.close()</span> <span><a aria-hidden="true" href="#cb59-149"></a></span> <span><a aria-hidden="true" href="#cb59-150"></a> <span class="at">@staticmethod</span></span> <span><a aria-hidden="true" href="#cb59-151"></a> <span class="kw">def</span> _get_abstract(article_data):</span> <span><a aria-hidden="true" href="#cb59-152"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb59-153"></a><span class="co"> Read the abstract of the processed article.</span></span> <span><a aria-hidden="true" href="#cb59-154"></a></span> <span><a aria-hidden="true" href="#cb59-155"></a><span class="co"> The abstract consists of the first 406 characters of the first</span></span> <span><a aria-hidden="true" href="#cb59-156"></a><span class="co"> &lt;p&gt; tag, or less, if the respective string is shorter.</span></span> <span><a aria-hidden="true" href="#cb59-157"></a></span> <span><a aria-hidden="true" href="#cb59-158"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb59-159"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb59-160"></a><span class="co"> Str.</span></span> <span><a aria-hidden="true" href="#cb59-161"></a></span> <span><a aria-hidden="true" 
href="#cb59-162"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb59-163"></a> articlepath <span class="op">=</span> gmc.articlepath <span class="op">/</span> article_data.name</span> <span><a aria-hidden="true" href="#cb59-164"></a> articlepath <span class="op">=</span> articlepath.with_suffix(<span class="st">".html"</span>)</span> <span><a aria-hidden="true" href="#cb59-165"></a> articlepath.resolve()</span> <span><a aria-hidden="true" href="#cb59-166"></a></span> <span><a aria-hidden="true" href="#cb59-167"></a> <span class="cf">with</span> <span class="bu">open</span>(articlepath, <span class="st">'r'</span>) <span class="im">as</span> infile:</span> <span><a aria-hidden="true" href="#cb59-168"></a> html_doc <span class="op">=</span> infile.read()</span> <span><a aria-hidden="true" href="#cb59-169"></a> infile.flush()</span> <span><a aria-hidden="true" href="#cb59-170"></a> infile.close()</span> <span><a aria-hidden="true" href="#cb59-171"></a></span> <span><a aria-hidden="true" href="#cb59-172"></a> builder <span class="op">=</span> HTMLParserTreeBuilder()</span> <span><a aria-hidden="true" href="#cb59-173"></a> soup <span class="op">=</span> BeautifulSoup(html_doc, builder<span class="op">=</span>builder)</span> <span><a aria-hidden="true" href="#cb59-174"></a></span> <span><a aria-hidden="true" href="#cb59-175"></a> tag <span class="op">=</span> soup.find(<span class="st">"p"</span>)</span> <span><a aria-hidden="true" href="#cb59-176"></a> <span class="cf">return</span> <span class="st">" "</span>.join(tag.text.split())[<span class="dv">0</span>:<span class="dv">406</span>]</span> <span><a aria-hidden="true" href="#cb59-177"></a></span> <span><a aria-hidden="true" href="#cb59-178"></a> <span class="at">@staticmethod</span></span> <span><a aria-hidden="true" href="#cb59-179"></a> <span class="kw">def</span> _update(archive_path, article_data, article_loc<span class="op">=</span><span class="st">"../article/"</span>):</span> 
<span><a aria-hidden="true" href="#cb59-180"></a> is_new <span class="op">=</span> <span class="va">None</span></span> <span><a aria-hidden="true" href="#cb59-181"></a> archive_path.resolve()</span> <span><a aria-hidden="true" href="#cb59-182"></a> <span class="cf">if</span> archive_path.exists():</span> <span><a aria-hidden="true" href="#cb59-183"></a> <span class="cf">with</span> <span class="bu">open</span>(archive_path, <span class="st">'r'</span>) <span class="im">as</span> archive_file:</span> <span><a aria-hidden="true" href="#cb59-184"></a> html_doc <span class="op">=</span> archive_file.read()</span> <span><a aria-hidden="true" href="#cb59-185"></a> archive_file.flush()</span> <span><a aria-hidden="true" href="#cb59-186"></a> archive_file.close()</span> <span><a aria-hidden="true" href="#cb59-187"></a> is_new <span class="op">=</span> <span class="va">False</span></span> <span><a aria-hidden="true" href="#cb59-188"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb59-189"></a> gmc.archive_template.resolve()</span> <span><a aria-hidden="true" href="#cb59-190"></a> <span class="cf">with</span> <span class="bu">open</span>(gmc.archive_template, <span class="st">'r'</span>) <span class="im">as</span> archive_file:</span> <span><a aria-hidden="true" href="#cb59-191"></a> html_doc <span class="op">=</span> archive_file.read()</span> <span><a aria-hidden="true" href="#cb59-192"></a> archive_file.flush()</span> <span><a aria-hidden="true" href="#cb59-193"></a> archive_file.close()</span> <span><a aria-hidden="true" href="#cb59-194"></a> is_new <span class="op">=</span> <span class="va">True</span></span> <span><a aria-hidden="true" href="#cb59-195"></a></span> <span><a aria-hidden="true" href="#cb59-196"></a> builder <span class="op">=</span> HTMLParserTreeBuilder</span> <span><a aria-hidden="true" href="#cb59-197"></a> soup <span class="op">=</span> BeautifulSoup(html_doc, builder<span class="op">=</span>builder)</span> <span><a 
aria-hidden="true" href="#cb59-198"></a></span> <span><a aria-hidden="true" href="#cb59-199"></a> <span class="cf">if</span> is_new:</span> <span><a aria-hidden="true" href="#cb59-200"></a> tag <span class="op">=</span> soup.find(<span class="st">"body"</span>)</span> <span><a aria-hidden="true" href="#cb59-201"></a> <span class="co"># SSI header injection is a function of the language</span></span> <span><a aria-hidden="true" href="#cb59-202"></a> <span class="cf">if</span> article_data[PubMetaData.locale].startswith(<span class="st">"de"</span>):</span> <span><a aria-hidden="true" href="#cb59-203"></a> new_tag <span class="op">=</span> Comment(<span class="st">'# include file="/portal/idee-header.html" '</span>)</span> <span><a aria-hidden="true" href="#cb59-204"></a> language <span class="op">=</span> <span class="st">"de"</span></span> <span><a aria-hidden="true" href="#cb59-205"></a> site_name <span class="op">=</span> <span class="st">"Idee"</span></span> <span><a aria-hidden="true" href="#cb59-206"></a> title_prefix <span class="op">=</span> <span class="st">"Archiv"</span></span> <span><a aria-hidden="true" href="#cb59-207"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb59-208"></a> new_tag <span class="op">=</span> Comment(</span> <span><a aria-hidden="true" href="#cb59-209"></a> <span class="st">'# include file="/portal/concept-header.html" '</span>)</span> <span><a aria-hidden="true" href="#cb59-210"></a> language <span class="op">=</span> <span class="st">"en"</span></span> <span><a aria-hidden="true" href="#cb59-211"></a> site_name <span class="op">=</span> <span class="st">"Concept"</span></span> <span><a aria-hidden="true" href="#cb59-212"></a> title_prefix <span class="op">=</span> <span class="st">"Archive"</span></span> <span><a aria-hidden="true" href="#cb59-213"></a> tag.insert(<span class="dv">0</span>, new_tag)</span> <span><a aria-hidden="true" href="#cb59-214"></a></span> <span><a aria-hidden="true" 
href="#cb59-215"></a> tag <span class="op">=</span> soup.find(<span class="st">"html"</span>)</span> <span><a aria-hidden="true" href="#cb59-216"></a> tag.attrs.update({<span class="st">"lang"</span>: language, <span class="st">"xml:lang"</span>: language})</span> <span><a aria-hidden="true" href="#cb59-217"></a> tag <span class="op">=</span> soup.find(<span class="st">"meta"</span>, <span class="bu">property</span><span class="op">=</span><span class="st">"og:site_name"</span>)</span> <span><a aria-hidden="true" href="#cb59-218"></a> tag.attrs.update({<span class="st">"Content"</span>: site_name})</span> <span><a aria-hidden="true" href="#cb59-219"></a></span> <span><a aria-hidden="true" href="#cb59-220"></a> tag <span class="op">=</span> soup.find(<span class="st">"title"</span>)</span> <span><a aria-hidden="true" href="#cb59-221"></a> tag.string <span class="op">=</span> <span class="st">" "</span>.join([title_prefix,</span> <span><a aria-hidden="true" href="#cb59-222"></a> article_data[PubMetaData.pubdate][<span class="dv">0</span>:<span class="dv">7</span>]])</span> <span><a aria-hidden="true" href="#cb59-223"></a></span> <span><a aria-hidden="true" href="#cb59-224"></a> tag <span class="op">=</span> soup.find(<span class="st">"h1"</span>)</span> <span><a aria-hidden="true" href="#cb59-225"></a> tag.string <span class="op">=</span> <span class="st">" "</span>.join([title_prefix,</span> <span><a aria-hidden="true" href="#cb59-226"></a> article_data[PubMetaData.pubdate][<span class="dv">0</span>:<span class="dv">7</span>]])</span> <span><a aria-hidden="true" href="#cb59-227"></a></span> <span><a aria-hidden="true" href="#cb59-228"></a> article_url <span class="op">=</span> article_loc <span class="op">+</span> article_data.name <span class="op">+</span> <span class="st">".html"</span></span> <span><a aria-hidden="true" href="#cb59-229"></a></span> <span><a aria-hidden="true" href="#cb59-230"></a> tag <span class="op">=</span> soup.find(<span 
class="st">"a"</span>, href<span class="op">=</span>article_url)</span> <span><a aria-hidden="true" href="#cb59-231"></a> <span class="cf">if</span> <span class="kw">not</span> tag:</span> <span><a aria-hidden="true" href="#cb59-232"></a> tag <span class="op">=</span> soup.find(<span class="st">"h1"</span>)</span> <span><a aria-hidden="true" href="#cb59-233"></a></span> <span><a aria-hidden="true" href="#cb59-234"></a> new_tag <span class="op">=</span> soup.new_tag(<span class="st">"article"</span>)</span> <span><a aria-hidden="true" href="#cb59-235"></a> <span class="cf">if</span> tag: <span class="co"># true in archive, false in index page</span></span> <span><a aria-hidden="true" href="#cb59-236"></a> tag.insert_after(new_tag)</span> <span><a aria-hidden="true" href="#cb59-237"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb59-238"></a> tag <span class="op">=</span> soup.find(<span class="st">"main"</span>)</span> <span><a aria-hidden="true" href="#cb59-239"></a> tag.insert(<span class="dv">0</span>, new_tag)</span> <span><a aria-hidden="true" href="#cb59-240"></a> tag <span class="op">=</span> new_tag</span> <span><a aria-hidden="true" href="#cb59-241"></a></span> <span><a aria-hidden="true" href="#cb59-242"></a> new_tag <span class="op">=</span> soup.new_tag(<span class="st">"header"</span>)</span> <span><a aria-hidden="true" href="#cb59-243"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb59-244"></a> tag <span class="op">=</span> new_tag</span> <span><a aria-hidden="true" href="#cb59-245"></a></span> <span><a aria-hidden="true" href="#cb59-246"></a> new_tag <span class="op">=</span> soup.new_tag(<span class="st">"h2"</span>)</span> <span><a aria-hidden="true" href="#cb59-247"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb59-248"></a> tag <span class="op">=</span> new_tag</span> <span><a aria-hidden="true" href="#cb59-249"></a></span> <span><a aria-hidden="true" href="#cb59-250"></a> 
new_tag <span class="op">=</span> soup.new_tag(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb59-251"></a> new_tag.attrs.update({<span class="st">"href"</span>: article_url,</span> <span><a aria-hidden="true" href="#cb59-252"></a> <span class="st">"alt"</span>: article_data[PubMetaData.title]})</span> <span><a aria-hidden="true" href="#cb59-253"></a> new_tag.string <span class="op">=</span> article_data[PubMetaData.title]</span> <span><a aria-hidden="true" href="#cb59-254"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb59-255"></a> tag <span class="op">=</span> tag.parent <span class="co"># header</span></span> <span><a aria-hidden="true" href="#cb59-256"></a></span> <span><a aria-hidden="true" href="#cb59-257"></a> new_tag <span class="op">=</span> soup.new_tag(<span class="st">"div"</span>)</span> <span><a aria-hidden="true" href="#cb59-258"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb59-259"></a> tag <span class="op">=</span> new_tag</span> <span><a aria-hidden="true" href="#cb59-260"></a></span> <span><a aria-hidden="true" href="#cb59-261"></a> new_tag <span class="op">=</span> soup.new_tag(<span class="st">"time"</span>)</span> <span><a aria-hidden="true" href="#cb59-262"></a> new_tag.attrs.update({<span class="st">"datetime"</span>:</span> <span><a aria-hidden="true" href="#cb59-263"></a> article_data[PubMetaData.pubdate][:<span class="dv">19</span>],</span> <span><a aria-hidden="true" href="#cb59-264"></a> <span class="st">"pubdate"</span>: <span class="st">"true"</span>})</span> <span><a aria-hidden="true" href="#cb59-265"></a> new_tag.string <span class="op">=</span> article_data[PubMetaData.pubdate][:<span class="dv">10</span>]</span> <span><a aria-hidden="true" href="#cb59-266"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb59-267"></a></span> <span><a aria-hidden="true" href="#cb59-268"></a> new_tag <span class="op">=</span> soup.new_tag(<span 
class="st">"address"</span>)</span> <span><a aria-hidden="true" href="#cb59-269"></a> new_tag.string <span class="op">=</span> article_data[PubMetaData.author]</span> <span><a aria-hidden="true" href="#cb59-270"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb59-271"></a> tag <span class="op">=</span> tag.parent.parent <span class="co"># article</span></span> <span><a aria-hidden="true" href="#cb59-272"></a></span> <span><a aria-hidden="true" href="#cb59-273"></a> new_tag <span class="op">=</span> soup.new_tag(<span class="st">"p"</span>)</span> <span><a aria-hidden="true" href="#cb59-274"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb59-275"></a> tag <span class="op">=</span> new_tag</span> <span><a aria-hidden="true" href="#cb59-276"></a></span> <span><a aria-hidden="true" href="#cb59-277"></a> new_tag <span class="op">=</span> soup.new_tag(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb59-278"></a> new_tag.attrs.update({<span class="st">"href"</span>: article_url,</span> <span><a aria-hidden="true" href="#cb59-279"></a> <span class="st">"alt"</span>: article_data[PubMetaData.title]})</span> <span><a aria-hidden="true" href="#cb59-280"></a> new_tag.string <span class="op">=</span> <span class="st">"..."</span></span> <span><a aria-hidden="true" href="#cb59-281"></a> tag.append(<span class="st">"placeholder"</span>) <span class="co"># for the article abstract</span></span> <span><a aria-hidden="true" href="#cb59-282"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb59-283"></a> tag <span class="op">=</span> tag.parent <span class="co"># article</span></span> <span><a aria-hidden="true" href="#cb59-284"></a></span> <span><a aria-hidden="true" href="#cb59-285"></a> new_tag <span class="op">=</span> soup.new_tag(<span class="st">"hr"</span>)</span> <span><a aria-hidden="true" href="#cb59-286"></a> tag.append(new_tag)</span> <span><a aria-hidden="true" href="#cb59-287"></a> <span 
class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb59-288"></a> tag <span class="op">=</span> tag.parent.parent.parent <span class="co"># article</span></span> <span><a aria-hidden="true" href="#cb59-289"></a></span> <span><a aria-hidden="true" href="#cb59-290"></a> <span class="co"># tag holds now the article tag.</span></span> <span><a aria-hidden="true" href="#cb59-291"></a> <span class="co"># Either it had been found or created.</span></span> <span><a aria-hidden="true" href="#cb59-292"></a> <span class="co"># All used child tags exist also.</span></span> <span><a aria-hidden="true" href="#cb59-293"></a></span> <span><a aria-hidden="true" href="#cb59-294"></a> <span class="co"># Write or update the article abstract</span></span> <span><a aria-hidden="true" href="#cb59-295"></a> tag <span class="op">=</span> tag.find(<span class="st">"p"</span>)</span> <span><a aria-hidden="true" href="#cb59-296"></a> tag <span class="op">=</span> tag.find(<span class="st">"a"</span>)</span> <span><a aria-hidden="true" href="#cb59-297"></a> tag.previousSibling.replace_with(Archive._get_abstract(article_data))</span> <span><a aria-hidden="true" href="#cb59-298"></a></span> <span><a aria-hidden="true" href="#cb59-299"></a> <span class="co"># We give every anchor a tabindex</span></span> <span><a aria-hidden="true" href="#cb59-300"></a> <span class="co"># 5 Tabindexes are in the portal header</span></span> <span><a aria-hidden="true" href="#cb59-301"></a> index <span class="op">=</span> <span class="dv">6</span></span> <span><a aria-hidden="true" href="#cb59-302"></a> tags <span class="op">=</span> soup.find_all(re.<span class="bu">compile</span>(<span class="vs">r"^a$|^audio$|^input$"</span>))</span> <span><a aria-hidden="true" href="#cb59-303"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb59-304"></a> tag.attrs.update({<span class="st">"tabindex"</span>: index})</span> <span><a aria-hidden="true" 
href="#cb59-305"></a> index <span class="op">+=</span> <span class="dv">1</span></span> <span><a aria-hidden="true" href="#cb59-306"></a></span> <span><a aria-hidden="true" href="#cb59-307"></a> <span class="cf">return</span> soup</span></code></pre> </div> <h2> Migration </h2> <p> The migration, which I had hoped would be a quick one, needs to be done manually. Not only do I need to supervise the result step by step, I also used to put my own comments below articles to update or amend them, and these now need to be incorporated into the article text. </p> <p> And, as will be seen, slight adjustments to the wiki text need to be made in some cases to get the desired result. </p> <h3> Migration issue: double-byte Unicode characters break PDF generation </h3> <p> The standard pandoc installation does not support double-byte Unicode characters, as it uses LaTeX (pdflatex by default) for the PDF generation. </p> <p> In my case this happened with the code point U+03BA, the Greek character κ. Not knowing when and why PDF generation will break next time is not an option. And it is not possible to solve the issue just by removing the character, since it is surely used for a reason. </p> <p> The Stack Overflow discussion "Pandoc and foreign characters" <sup> ( 31 ) </sup> explains that the problem can be solved by specifying a different PDF engine via --pdf-engine=xelatex. </p> <p> However, this is only part of the answer, since this engine first needs to be installed, and a font that contains the character needs to be chosen. 
</p> <p> The engine can be installed from the Debian repository with: </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb60-1"></a><span class="ex">frank</span> @Asimov:~/projects/idee$ sudo apt-get install texlive-xetex</span></code></pre> </div> <p> A search for fonts supporting the character can be done with: </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb61-1"></a><span class="ex">frank</span> @Asimov:~/projects/idee$ fc-list <span class="st">':charset=03BA'</span></span></code></pre> </div> <p> This list is quite long and, if you think about it, helpful only in the most exotic cases. My best guess for a suitable font to render in PDF everything I use in my wiki pages is the font used by my web browser. </p> <p> Was it in Firefox or in Chromium? In one of my browsers I found the default to be DejaVu Sans. How can the font be specified? That can be done via command line parameters. </p> <p> I did find a number of pages describing how this can be done, but none of them worked as expected. Only the "Pandoc User’s Guide" <sup> ( 32 ) </sup> helped in the end. </p> <blockquote> <p> <strong> -V KEY[=VAL], --variable=KEY[:VAL] </strong> </p> <p> Set the template variable KEY to the value VAL when rendering the document in standalone mode. If no VAL is specified, the key will be given the value true. </p> <p> <strong> mainfont, sansfont, monofont, mathfont, CJKmainfont </strong> </p> <p> font families for use with xelatex or lualatex: take the name of any system font, using the fontspec package. CJKmainfont uses the xecjk package. </p> </blockquote> <p> These two pieces of information combined explained to me when to use the ":" symbol and when the "=" symbol, which, for whatever reason, was not done correctly in the examples I found, or perhaps I failed to understand them correctly.
</p> <p> The working code to call pandoc from Python, naming the fonts to use: </p> <div class="sourceCode"> <pre class="sourceCode python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb62-1"></a> subprocess.run([<span class="st">"pandoc"</span>,</span> <span><a aria-hidden="true" href="#cb62-2"></a> <span class="co"># html as input format</span></span> <span><a aria-hidden="true" href="#cb62-3"></a> <span class="st">"-f"</span>, <span class="st">"html"</span>,</span> <span><a aria-hidden="true" href="#cb62-4"></a> <span class="co"># pdf as output format</span></span> <span><a aria-hidden="true" href="#cb62-5"></a> <span class="st">"-t"</span>, <span class="st">"pdf"</span>,</span> <span><a aria-hidden="true" href="#cb62-6"></a> <span class="co"># input file</span></span> <span><a aria-hidden="true" href="#cb62-7"></a> <span class="co"># "-i", inpath,</span></span> <span><a aria-hidden="true" href="#cb62-8"></a> <span class="co"># output file</span></span> <span><a aria-hidden="true" href="#cb62-9"></a> <span class="st">"-o"</span>, <span class="va">self</span>.outpath,</span> <span><a aria-hidden="true" href="#cb62-10"></a> <span class="st">"--pdf-engine=xelatex"</span>,</span> <span><a aria-hidden="true" href="#cb62-11"></a> <span class="st">"--variable=mainfont:DejaVu Serif"</span>,</span> <span><a aria-hidden="true" href="#cb62-12"></a> <span class="st">"--variable=sansfont:DejaVu Sans"</span>,</span> <span><a aria-hidden="true" href="#cb62-13"></a> <span class="st">"--variable=monofont:DejaVu Sans Mono"</span>,</span> <span><a aria-hidden="true" href="#cb62-14"></a> <span class="st">"--variable=geometry:a4paper"</span>,</span> <span><a aria-hidden="true" href="#cb62-15"></a> <span class="st">"--variable=geometry:margin=2.5cm"</span>,</span> <span><a aria-hidden="true" href="#cb62-16"></a> <span class="st">"--variable=linkcolor:blue"</span></span> <span><a aria-hidden="true" href="#cb62-17"></a> ],<span 
class="op">\</span></span> <span><a aria-hidden="true" href="#cb62-18"></a> capture_output<span class="op">=</span><span class="va">False</span>,<span class="op">\</span></span> <span><a aria-hidden="true" href="#cb62-19"></a> <span class="co"># the correct working directory to find the images</span></span> <span><a aria-hidden="true" href="#cb62-20"></a> cwd<span class="op">=</span>workpath,<span class="op">\</span></span> <span><a aria-hidden="true" href="#cb62-21"></a> <span class="co"># html string as stdin</span></span> <span><a aria-hidden="true" href="#cb62-22"></a> <span class="bu">input</span><span class="op">=</span>html_doc.encode(<span class="st">"utf-8"</span>))</span></code></pre> </div> <p> A resource worth visiting for further beautification: "Customizing pandoc to generate beautiful pdf and epub from markdown" <sup> ( 33 ) </sup> </p> <p> For a start I'm happy if the PDF is generated correctly, but I'm sure I'll revisit the topic to get from good results to perfect results. </p> <h4> Missing Character </h4> <p> Greek characters were no problem, and for CJK (Chinese, Japanese, Korean) fonts a separate variable can be set. But now I got problems with Hebrew characters, and what would it look like if Arabic characters were required? </p> <p> Funnily enough, having the characters nicely rendered in the web page doesn't tell you anything about your success during PDF creation.
</p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb63-1"></a>[<span class="ex">WARNING</span>] Missing character: There is no א (U+05D0) </span> <span><a aria-hidden="true" href="#cb63-2"></a><span class="kw">in</span> <span class="ex">font</span> DejaVu Serif/OT:script=latn<span class="kw">;</span><span class="va">language=</span>d</span> <span><a aria-hidden="true" href="#cb63-3"></a>[<span class="ex">WARNING</span>] Missing character: There is no ָ (U+05B8) </span> <span><a aria-hidden="true" href="#cb63-4"></a><span class="kw">in</span> <span class="ex">font</span> DejaVu Serif/OT:script=latn<span class="kw">;</span><span class="va">language=</span>d</span> <span><a aria-hidden="true" href="#cb63-5"></a>[<span class="ex">WARNING</span>] Missing character: There is no ד (U+05D3) </span> <span><a aria-hidden="true" href="#cb63-6"></a><span class="kw">in</span> <span class="ex">font</span> DejaVu Serif/OT:script=latn<span class="kw">;</span><span class="va">language=</span>d</span> <span><a aria-hidden="true" href="#cb63-7"></a>[<span class="ex">WARNING</span>] Missing character: There is no ָ (U+05B8) </span> <span><a aria-hidden="true" href="#cb63-8"></a><span class="kw">in</span> <span class="ex">font</span> DejaVu Serif/OT:script=latn<span class="kw">;</span><span class="va">language=</span>d</span> <span><a aria-hidden="true" href="#cb63-9"></a>[<span class="ex">WARNING</span>] Missing character: There is no ם (U+05DD) </span> <span><a aria-hidden="true" href="#cb63-10"></a><span class="kw">in</span> <span class="ex">font</span> DejaVu Serif/OT:script=latn<span class="kw">;</span><span class="va">language=</span>d</span> <span><a aria-hidden="true" href="#cb63-11"></a>[<span class="ex">WARNING</span>] Missing character: There is no א (U+05D0) </span> <span><a aria-hidden="true" href="#cb63-12"></a><span class="kw">in</span> <span class="ex">font</span> DejaVu Serif/OT:script=latn<span 
class="kw">;</span><span class="va">language=</span>d</span> <span><a aria-hidden="true" href="#cb63-13"></a>[<span class="ex">WARNING</span>] Missing character: There is no ֲ (U+05B2) </span> <span><a aria-hidden="true" href="#cb63-14"></a><span class="kw">in</span> <span class="ex">font</span> DejaVu Serif/OT:script=latn<span class="kw">;</span><span class="va">language=</span>d</span> <span><a aria-hidden="true" href="#cb63-15"></a>[<span class="ex">WARNING</span>] Missing character: There is no ד (U+05D3) </span> <span><a aria-hidden="true" href="#cb63-16"></a><span class="kw">in</span> <span class="ex">font</span> DejaVu Serif/OT:script=latn<span class="kw">;</span><span class="va">language=</span>d</span> <span><a aria-hidden="true" href="#cb63-17"></a>[<span class="ex">WARNING</span>] Missing character: There is no ָ (U+05B8) </span> <span><a aria-hidden="true" href="#cb63-18"></a><span class="kw">in</span> <span class="ex">font</span> DejaVu Serif/OT:script=latn<span class="kw">;</span><span class="va">language=</span>d</span> <span><a aria-hidden="true" href="#cb63-19"></a>[<span class="ex">WARNING</span>] Missing character: There is no מ (U+05DE) </span> <span><a aria-hidden="true" href="#cb63-20"></a><span class="kw">in</span> <span class="ex">font</span> DejaVu Serif/OT:script=latn<span class="kw">;</span><span class="va">language=</span>d</span> <span><a aria-hidden="true" href="#cb63-21"></a>[<span class="ex">WARNING</span>] Missing character: There is no ָ (U+05B8) </span> <span><a aria-hidden="true" href="#cb63-22"></a><span class="kw">in</span> <span class="ex">font</span> DejaVu Serif/OT:script=latn<span class="kw">;</span><span class="va">language=</span>d</span> <span><a aria-hidden="true" href="#cb63-23"></a>[<span class="ex">WARNING</span>] Missing character: There is no ה (U+05D4) </span> <span><a aria-hidden="true" href="#cb63-24"></a><span class="kw">in</span> <span class="ex">font</span> DejaVu Serif/OT:script=latn<span 
class="kw">;</span><span class="va">language=</span>d</span></code></pre> </div> <p> The command fc-list does not show any installed font for these character codes, but the browser does show them. This means that the browser gets its fonts from somewhere else if it needs them. </p> <p> Curious which font would be reported by the browser, I used the inspection tool and got the answer "Liberation Sans", which convinced me to switch the fonts used for the PDF generation to the Liberation fonts. </p> <p> This is one of the fonts installed by default on Debian. And guess what, it worked! Probably I did something wrong with the fc-list command. I think the font looks better balanced in the PDF; it is definitely a good change, not only for that article. </p> <h3> Tables flow out of the PDF page </h3> <p> There are a lot of web pages out there about this topic, and all of them, at least those I found, were about how to change the markdown to prevent this from happening. </p> <p> Two basic solutions are: </p> <ol class="incremental"> <li> scaling the table down together with the font size </li> <li> using the markdown syntax for multiline tables </li> </ol> <p> Well, since I write the articles not in markdown but in MediaWiki markup, and create HTML first and then PDF from the HTML, I had a hard time figuring out the solution. Since I process things in multiple steps, I could probably find a solution by editing intermediate results, but that is cumbersome and not desirable. </p> <p> I even started to question my whole solution. Shouldn't I use markdown for the PDF as well as for the HTML generation? Should I throw big parts of my implementation away and start over again? </p> <p> However, reading about the solutions helped in the end. How can I convince Pandoc to make a multiline table from my markup? I need to enforce a multiline header cell in my MediaWiki markup.
</p> <p> Note the </p> <pre><code>&lt;br/&gt;</code></pre> <p> in the third column: </p> <div class="sourceCode"> <pre class="sourceCode mediawiki"><code class="sourceCode mediawiki"><span><a aria-hidden="true" href="#cb65-1"></a><span class="dv">{|</span><span class="ot"> class=</span><span class="st">"wikitable"</span><span class="ot"> style=</span><span class="st">"text-align:left;"</span><span class="ot"> cellpadding=</span><span class="st">"2px"</span> </span> <span><a aria-hidden="true" href="#cb65-2"></a><span class="dv">!</span> Hersteller </span> <span><a aria-hidden="true" href="#cb65-3"></a><span class="dv">!</span> Impfstoff </span> <span><a aria-hidden="true" href="#cb65-4"></a><span class="dv">!</span> Primary<span class="kw">&lt;br/&gt;</span>Completion</span> <span><a aria-hidden="true" href="#cb65-5"></a><span class="dv">!</span> Completion </span> <span><a aria-hidden="true" href="#cb65-6"></a><span class="dv">|-</span></span> <span><a aria-hidden="true" href="#cb65-7"></a><span class="dv">|</span> BioNTech / Pfizer</span> <span><a aria-hidden="true" href="#cb65-8"></a><span class="dv">|</span> BNT162b2</span> <span><a aria-hidden="true" href="#cb65-9"></a><span class="dv">|</span> 2021-11-30</span> <span><a aria-hidden="true" href="#cb65-10"></a><span class="dv">|</span> 2021-11-30</span> <span><a aria-hidden="true" href="#cb65-11"></a><span class="dv">|}</span></span></code></pre> </div> <h3> Relative URLs to own articles do not work in PDF </h3> <p> Nothing surprising, but I stumbled upon it nonetheless. There is no way around it: before generating the PDF I have to convert the relative URLs back into absolute URLs pointing to my web site. </p> <p> DONE </p> <h3> Half Way Migrated - Checkpoint </h3> <p> At this point, having migrated all articles up to May 2021, I have 8 audio articles in my RSS feed. On the way I had to take care of a number of bugs, e.g. in the part of the code where the urn of the article is created from the title.
It is a critical detail that the urn matches the stem of the current WordPress article URL, or I will not be able to set up automatic redirects from the old URL to the new URL without an extensive matching list. </p> <p> I also learned what I have to take care of with regard to article titles in MediaWiki. E.g. in some titles I have used quotes. If I use the standard quote ("), I get a problem with the article filename on disk. When the file is written, the quote char is correctly escaped. But the attempt to read the file in Python leads to a path in which the escape char is itself escaped, leading to a file not found error. I now change these titles to use the quote chars („) and (“) instead. </p> <p> I'm also editing the articles to use &lt; ref &gt; tags for links to my own articles as well, and to put quotes into &lt; blockquote &gt; tags. </p> <p> The filenames of the audio files are now also different than before, using the urn of the article as the stem of the audio filename. </p> <p> I wouldn't need to care, but I'd like to have all audio articles on my phone in their new representation in my GPodder app. This is not critical for other consumers; it is something I just want to have. Other consumers most probably have no problem with the changed appearance of the articles, as long as their URL for the feed does not stop working. </p> <p> But for me this special requirement (my wish) leads to the conclusion that I either need to allow a very big RSS feed at the start, or I have to prepare the further article migration, implement the index page generation, and follow the go-live with a rapid migration. </p> <p> Whichever way I decide, I have to implement the index page generation sooner rather than later, because the go-live is near. Not that a fixed day exists for this, but the progress indicates that it cannot be too far away.
</p> <p> TODOs I must not forget: </p> <ul class="incremental"> <li> Prevent search engine indexing of my legal page (in German and English) <ul class="incremental"> <li> Prevent the legal page from appearing in the sitemap (Done) </li> <li> Prevent the legal page from appearing in the RSS feed (Done) </li> <li> Prevent the legal page from appearing in the archive (Done) </li> <li> nofollow information at the anchor in the headers (Done) </li> <li> disallow English and German legal page explicitly in the robots.txt (Done) </li> </ul> </li> <li> /feed/ redirect </li> <li> I have to take care that source code references do not each get a tabindex </li> <li> Find out how to get backlinks from the references to their text working in the PDF. (Done) <ul class="incremental"> <li> Included readable http links in the PDF to provide useful footnotes even when printed. (footnote section only) </li> </ul> </li> </ul> <h2> Enabling Backlinks in PDF </h2> <p> I found a list of options to investigate in the post: "How to convert HTML to PDF using pandoc?"
<sup> ( 34 ) </sup> </p> <h3> wkhtmltopdf </h3> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb66-1"></a><span class="ex">frank</span> @Asimov:~/projects/idee$ sudo apt-cache search wkhtmltopdf</span> <span><a aria-hidden="true" href="#cb66-2"></a>[<span class="ex">sudo</span>] password for frank: </span> <span><a aria-hidden="true" href="#cb66-3"></a><span class="ex">python3-django-wkhtmltopdf</span> - Django module with views for HTML to PDF </span> <span><a aria-hidden="true" href="#cb66-4"></a> <span class="ex">conversions</span> (Python 3)</span> <span><a aria-hidden="true" href="#cb66-5"></a><span class="ex">pandoc</span> - general markup converter</span> <span><a aria-hidden="true" href="#cb66-6"></a><span class="ex">python3-pdfkit</span> - Python wrapper for wkhtmltopdf to convert HTML to PDF </span> <span><a aria-hidden="true" href="#cb66-7"></a> <span class="kw">(</span><span class="ex">Python</span> 3<span class="kw">)</span></span> <span><a aria-hidden="true" href="#cb66-8"></a><span class="ex">wkhtmltopdf</span> - Command line utilities to convert html to pdf or image using</span> <span><a aria-hidden="true" href="#cb66-9"></a> <span class="ex">WebKit</span></span> <span><a aria-hidden="true" href="#cb66-10"></a></span> <span><a aria-hidden="true" href="#cb66-11"></a><span class="ex">frank</span> @Asimov:~/projects/idee$ sudo apt-get install wkhtmltopdf </span> <span><a aria-hidden="true" href="#cb66-12"></a></span> <span><a aria-hidden="true" href="#cb66-13"></a><span class="ex">frank</span> @Asimov:~/projects/idee/plain$ wkhtmltopdf --enable-local-file-access <span class="kw">\</span></span> <span><a aria-hidden="true" href="#cb66-14"></a><span class="ex">--enable-external-links</span> --enable-internal-links --keep-relative-links <span class="kw">\</span></span> <span><a aria-hidden="true" href="#cb66-15"></a><span 
class="ex">astrazeneca-vaxzevria-verunreinigungen-thromozytopenie-thrombose.html</span> <span class="kw">\</span></span> <span><a aria-hidden="true" href="#cb66-16"></a><span class="ex">astrazeneca-vaxzevria-verunreinigungen-thromozytopenie-thrombose.pdf</span></span> <span><a aria-hidden="true" href="#cb66-17"></a></span> <span><a aria-hidden="true" href="#cb66-18"></a><span class="ex">The</span> switch --enable-external-links, is not support using unpatched qt, and will </span> <span><a aria-hidden="true" href="#cb66-19"></a><span class="ex">be</span> ignored.The switch --enable-internal-links, is not support using unpatched </span> <span><a aria-hidden="true" href="#cb66-20"></a><span class="ex">qt</span>, and will be ignored.The switch --keep-relative-links, is not support using </span> <span><a aria-hidden="true" href="#cb66-21"></a><span class="ex">unpatched</span> qt, and will be ignored.Loading page (1/2)</span> <span><a aria-hidden="true" href="#cb66-22"></a><span class="ex">Printing</span> pages (2/2) </span> <span><a aria-hidden="true" href="#cb66-23"></a><span class="ex">Done</span> </span></code></pre> </div> <p> I'm not yet willing to install a patched Qt for this purpose, not even being sure about the result. Because of this, no links at all work in the PDF. The PDF shows the HTML exactly as it is rendered in the browser, and that is not really what I want either. </p> <p> But I'll keep this in mind. It might be useful in other use cases. </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb67-1"></a><span class="ex">frank</span> @Asimov:~/projects/idee/plain$ sudo apt-get purge wkhtmltopdf </span></code></pre> </div> <h3> WeasyPrint </h3> <p> The quite impressive list of packages to be installed for WeasyPrint made me think twice about pressing yes.
It even made me read the documentation first: "WeasyPrint" <sup> ( 35 ) </sup> </p> <p> I learned from this documentation that CSS2 contains style elements for paged media layout. <sup> ( 36 ) </sup> </p> <p> From the reading I get the impression that it does everything required to lay out the HTML nicely for PDF and to make all links work. </p> <p> And from the post which made me aware of this tool I already know that it can be named as the PDF engine for pandoc. The question certainly has to be asked whether this makes sense: calling a program written in Haskell to call a program written in Python, when I'm already in a Python program. </p> <p> However, I'll try exactly that setup for a start, and probably later I'll kick out Pandoc for PDF generation and use WeasyPrint directly via its API, if it works nicely. </p> <p> I guess if I go that route, I'll develop a second CSS for page layout details and to override some CSS formatting that is used in HTML but does not look nice in PDF. </p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb68-1"></a><span class="ex">frank</span> @Asimov:~/projects/idee/plain$ sudo apt-get install weasyprint</span> <span><a aria-hidden="true" href="#cb68-2"></a>[<span class="ex">sudo</span>] password for frank: </span> <span><a aria-hidden="true" href="#cb68-3"></a><span class="ex">Reading</span> package lists... Done</span> <span><a aria-hidden="true" href="#cb68-4"></a><span class="ex">Building</span> dependency tree... Done</span> <span><a aria-hidden="true" href="#cb68-5"></a><span class="ex">Reading</span> state information... 
Done</span> <span><a aria-hidden="true" href="#cb68-6"></a><span class="ex">The</span> following additional packages will be installed:</span> <span><a aria-hidden="true" href="#cb68-7"></a> <span class="ex">libblkid-dev</span> libbrotli-dev libcairo-script-interpreter2 libcairo2-dev</span> <span><a aria-hidden="true" href="#cb68-8"></a> <span class="ex">libdatrie-dev</span> libfontconfig-dev libfontconfig1-dev libfreetype-dev</span> <span><a aria-hidden="true" href="#cb68-9"></a> <span class="ex">libfreetype6-dev</span> libfribidi-dev libglib2.0-dev libglib2.0-dev-bin</span> <span><a aria-hidden="true" href="#cb68-10"></a> <span class="ex">libgraphite2-dev</span> libharfbuzz-dev libharfbuzz-gobject0 libice-dev libmount-dev</span> <span><a aria-hidden="true" href="#cb68-11"></a> <span class="ex">libpango1.0-dev</span> libpcre2-32-0 libpcre2-dev libpcre2-posix2 libpixman-1-dev</span> <span><a aria-hidden="true" href="#cb68-12"></a> <span class="ex">libpng-dev</span> libpng-tools libpthread-stubs0-dev libselinux1-dev libsepol1-dev</span> <span><a aria-hidden="true" href="#cb68-13"></a> <span class="ex">libsm-dev</span> libthai-dev libx11-dev libxau-dev libxcb-render0-dev libxcb-shm0-dev</span> <span><a aria-hidden="true" href="#cb68-14"></a> <span class="ex">libxcb1-dev</span> libxdmcp-dev libxext-dev libxft-dev libxrender-dev pango1.0-tools</span> <span><a aria-hidden="true" href="#cb68-15"></a> <span class="ex">python-tinycss2-common</span> python3-cairocffi python3-cairosvg python3-cffi</span> <span><a aria-hidden="true" href="#cb68-16"></a> <span class="ex">python3-cssselect2</span> python3-pycparser python3-pyphen python3-tinycss2</span> <span><a aria-hidden="true" href="#cb68-17"></a> <span class="ex">python3-xcffib</span> uuid-dev x11proto-dev x11proto-xext-dev xorg-sgml-doctools</span> <span><a aria-hidden="true" href="#cb68-18"></a> <span class="ex">xtrans-dev</span></span> <span><a aria-hidden="true" href="#cb68-19"></a><span class="ex">Suggested</span> 
packages:</span> <span><a aria-hidden="true" href="#cb68-20"></a> <span class="ex">libcairo2-doc</span> libdatrie-doc freetype2-doc libgirepository1.0-dev libglib2.0-doc</span> <span><a aria-hidden="true" href="#cb68-21"></a> <span class="ex">libgraphite2-utils</span> libice-doc libpango1.0-doc libsm-doc libthai-doc libx11-doc</span> <span><a aria-hidden="true" href="#cb68-22"></a> <span class="ex">libxcb-doc</span> libxext-doc python-cairocffi-doc python-cssselect2-doc</span> <span><a aria-hidden="true" href="#cb68-23"></a> <span class="ex">python-tinycss2-doc</span></span> <span><a aria-hidden="true" href="#cb68-24"></a><span class="ex">The</span> following NEW packages will be installed:</span> <span><a aria-hidden="true" href="#cb68-25"></a> <span class="ex">libblkid-dev</span> libbrotli-dev libcairo-script-interpreter2 libcairo2-dev</span> <span><a aria-hidden="true" href="#cb68-26"></a> <span class="ex">libdatrie-dev</span> libfontconfig-dev libfontconfig1-dev libfreetype-dev</span> <span><a aria-hidden="true" href="#cb68-27"></a> <span class="ex">libfreetype6-dev</span> libfribidi-dev libglib2.0-dev libglib2.0-dev-bin</span> <span><a aria-hidden="true" href="#cb68-28"></a> <span class="ex">libgraphite2-dev</span> libharfbuzz-dev libharfbuzz-gobject0 libice-dev libmount-dev</span> <span><a aria-hidden="true" href="#cb68-29"></a> <span class="ex">libpango1.0-dev</span> libpcre2-32-0 libpcre2-dev libpcre2-posix2 libpixman-1-dev</span> <span><a aria-hidden="true" href="#cb68-30"></a> <span class="ex">libpng-dev</span> libpng-tools libpthread-stubs0-dev libselinux1-dev libsepol1-dev</span> <span><a aria-hidden="true" href="#cb68-31"></a> <span class="ex">libsm-dev</span> libthai-dev libx11-dev libxau-dev libxcb-render0-dev libxcb-shm0-dev</span> <span><a aria-hidden="true" href="#cb68-32"></a> <span class="ex">libxcb1-dev</span> libxdmcp-dev libxext-dev libxft-dev libxrender-dev pango1.0-tools</span> <span><a aria-hidden="true" href="#cb68-33"></a> <span 
class="ex">python-tinycss2-common</span> python3-cairocffi python3-cairosvg python3-cffi</span> <span><a aria-hidden="true" href="#cb68-34"></a> <span class="ex">python3-cssselect2</span> python3-pycparser python3-pyphen python3-tinycss2</span> <span><a aria-hidden="true" href="#cb68-35"></a> <span class="ex">python3-xcffib</span> uuid-dev weasyprint x11proto-dev x11proto-xext-dev</span> <span><a aria-hidden="true" href="#cb68-36"></a> <span class="ex">xorg-sgml-doctools</span> xtrans-dev</span> <span><a aria-hidden="true" href="#cb68-37"></a><span class="ex">0</span> upgraded, 54 newly installed, 0 to remove and 1 not upgraded.</span> <span><a aria-hidden="true" href="#cb68-38"></a><span class="ex">Need</span> to get 13.3 MB of archives.</span> <span><a aria-hidden="true" href="#cb68-39"></a><span class="ex">After</span> this operation, 47.1 MB of additional disk space will be used.</span> <span><a aria-hidden="true" href="#cb68-40"></a><span class="ex">Do</span> you want to continue? [Y/n]</span></code></pre> </div> <p> Naming weasyprint instead of xelatex as pdf-engine works right away. However, the font setting from the CSS is not used, and the headline color is not applied as defined in the CSS either. Probably the CSS is not found at all. </p> <p> The CSS is found, however, when the program is called from the command line: the headlines then use the color defined in the CSS. Font settings are still ignored, but this time with a warning message saying so.
</p> <div class="sourceCode"> <pre class="sourceCode bash"><code class="sourceCode bash"><span><a aria-hidden="true" href="#cb69-1"></a><span class="ex">frank</span> @Asimov:~/projects/idee/website/article$ weasyprint -f pdf <span class="kw">\</span></span> <span><a aria-hidden="true" href="#cb69-2"></a><span class="ex">astrazeneca-vaxzevria-verunreinigungen-thromozytopenie-thrombose.html</span> <span class="kw">\</span></span> <span><a aria-hidden="true" href="#cb69-3"></a><span class="ex">astrazeneca-vaxzevria-verunreinigungen-thromozytopenie-thrombose.pdf</span></span> <span><a aria-hidden="true" href="#cb69-4"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">font</span>: var(--theme-font)<span class="kw">`</span> at 29:2, invalid value.</span> <span><a aria-hidden="true" href="#cb69-5"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">border-right</span>: 1px solid var(--theme-color)<span class="kw">`</span> at 44:2, invalid </span> <span><a aria-hidden="true" href="#cb69-6"></a> <span class="ex">value.</span></span> <span><a aria-hidden="true" href="#cb69-7"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">border-left</span>: 1px solid var(--theme-color)<span class="kw">`</span> at 45:2, invalid </span> <span><a aria-hidden="true" href="#cb69-8"></a> <span class="ex">value.</span></span> <span><a aria-hidden="true" href="#cb69-9"></a><span class="ex">WARNING</span>: Expected a media type, got screen/**/and/**/(min-width: 641px)</span> <span><a aria-hidden="true" href="#cb69-10"></a><span class="ex">WARNING</span>: Invalid media type <span class="st">" screen and (min-width: 641px) "</span> the whole @media </span> <span><a aria-hidden="true" href="#cb69-11"></a> <span class="ex">rule</span> was ignored at 83:1.</span> <span><a aria-hidden="true" href="#cb69-12"></a><span class="ex">WARNING</span>: Expected a media type, got 
screen/**/and/**/(max-width: 640px)</span> <span><a aria-hidden="true" href="#cb69-13"></a><span class="ex">WARNING</span>: Invalid media type <span class="st">" screen and (max-width: 640px) "</span> the whole @media </span> <span><a aria-hidden="true" href="#cb69-14"></a> <span class="ex">rule</span> was ignored at 105:1.</span> <span><a aria-hidden="true" href="#cb69-15"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">font</span>: var(--theme-font)<span class="kw">`</span> at 197:2, invalid value.</span> <span><a aria-hidden="true" href="#cb69-16"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">font</span>: var(--theme-font)<span class="kw">`</span> at 236:29, invalid value.</span> <span><a aria-hidden="true" href="#cb69-17"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">font</span>: var(--theme-font)<span class="kw">`</span> at 239:21, invalid value.</span> <span><a aria-hidden="true" href="#cb69-18"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">display</span>: inline-grid<span class="kw">`</span> at 254:2, invalid value.</span> <span><a aria-hidden="true" href="#cb69-19"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">grid-template-columns</span>: 30px auto auto auto<span class="kw">`</span> at 255:2, unknown </span> <span><a aria-hidden="true" href="#cb69-20"></a> <span class="ex">property.</span></span> <span><a aria-hidden="true" href="#cb69-21"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">font</span>: var(--theme-font)<span class="kw">`</span> at 270:2, invalid value.</span> <span><a aria-hidden="true" href="#cb69-22"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">text-shadow</span>: 1px 1px rgba(255, 255, 255, 0.4)<span class="kw">`</span> at 294:2, </span> <span><a 
aria-hidden="true" href="#cb69-23"></a> <span class="ex">unknown</span> property.</span> <span><a aria-hidden="true" href="#cb69-24"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">border-bottom</span>: 0.3em solid var(--theme-color)<span class="kw">`</span> at 327:2, </span> <span><a aria-hidden="true" href="#cb69-25"></a> <span class="ex">invalid</span> value.</span> <span><a aria-hidden="true" href="#cb69-26"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">font</span>: var(--theme-font)<span class="kw">`</span> at 332:2, invalid value.</span> <span><a aria-hidden="true" href="#cb69-27"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">font</span>: var(--theme-font)<span class="kw">`</span> at 402:2, invalid value.</span> <span><a aria-hidden="true" href="#cb69-28"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">outline</span>: 5px solid var(--theme-meta-color)<span class="kw">`</span> at 406:2, invalid </span> <span><a aria-hidden="true" href="#cb69-29"></a> <span class="ex">value.</span></span> <span><a aria-hidden="true" href="#cb69-30"></a><span class="ex">WARNING</span>: Ignored <span class="kw">`</span><span class="ex">border-top</span>: 2px solid var(--theme-meta-color)<span class="kw">`</span> at 412:2, </span> <span><a aria-hidden="true" href="#cb69-31"></a> <span class="ex">invalid</span> value.</span></code></pre> </div> <p> Links pointing backward inside the document work as they should. Obviously I'll now look into optimizing the CSS for the PDF generation before I proceed with my migration. </p> <h4> fspdf.css </h4> <p> Creating a completely new CSS for PDF generation is not helpful, since this would introduce a lot of double maintenance if the style is changed in the future. But a separate CSS that overrides just a few specific things is quite simple.
</p> <p> See the earlier chapter "The PDF Style Sheet". </p> <p> This small initial CSS also reveals that removing the figures around images is no longer required. On the contrary, these figures are now an important means of laying out the images the way we need them. However, anchor tags inside the figures around an image do nothing: clicking the image in the PDF does not open it in the web browser. I see this as a minor issue, since every generated document carries a QR code with the URL of the article for those who wish to use the web version. </p> <p> I was able to add a header line with the article's title and a page number at the top of each page. Over time the page layout might change to get perfect results, but for now good is good enough. </p> <h4> pdfworker.py </h4> <p> The following code shows just the essential parts of the new code. Many more lines have been removed, e.g. the pandoc system call and the removal of figures from tables or from the article header. 
</p> <div class="sourceCode"> <pre class="sourceCode Python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb70-1"></a>from weasyprint import HTML</span> <span><a aria-hidden="true" href="#cb70-2"></a>from weasyprint import CSS</span> <span><a aria-hidden="true" href="#cb70-3"></a>[...]</span> <span><a aria-hidden="true" href="#cb70-4"></a> csspath = Path(r"/home/frank/projects/idee/website/css/fspdf.css")</span> <span><a aria-hidden="true" href="#cb70-5"></a> csspath.resolve()</span> <span><a aria-hidden="true" href="#cb70-6"></a> html_doc = soup.prettify()</span> <span><a aria-hidden="true" href="#cb70-7"></a></span> <span><a aria-hidden="true" href="#cb70-8"></a> weasy_html = HTML(string=html_doc, base_url=str(workpath))</span> <span><a aria-hidden="true" href="#cb70-9"></a> weasy_html.write_pdf(target=self.outpath,</span> <span><a aria-hidden="true" href="#cb70-10"></a> stylesheets=[CSS(filename=str(csspath))]</span> <span><a aria-hidden="true" href="#cb70-11"></a> )</span> <span><a aria-hidden="true" href="#cb70-12"></a>[...]</span></code></pre> </div> <h4> WeasyPrint Bug? </h4> <p> I'm perfectly satisfied with the PDFs generated by WeasyPrint, but only after generating quite a lot of PDF documents did I discover that German special characters (ÄäÖöÜüß) in headlines lead to broken links in the table of contents. </p> <p> The TOC used is not a real PDF TOC; it is the TOC generated for the HTML, and it should work in the PDF just as it does in HTML. </p> <p> The HTML is generated by Pandoc, and until now I have not meddled with the id and href names generated for internal navigation. 
Indeed, I like it very much that Pandoc does not escape the German umlauts in those. </p> <p> In the near future I need to investigate this issue more closely. Does PDF allow full UTF-8 in references? Where is the bug in the WeasyPrint implementation? Would the fix lie in escaping special characters or in enabling UTF-8? </p> <p> And then, once the issue is solved, I'll have to trigger a re-creation of the PDFs. </p> <h4> Knowledge Resources about CSS Paged Media </h4> <ul class="incremental"> <li> "Revisting HTML To PDF Conversion with CSS Paged Media" <sup> ( 37 ) </sup> </li> <li> "CSS Paged Media Module Level 3" <sup> ( 38 ) </sup> </li> </ul> <h2> Index Page Implementation </h2> <p> The subdomain idee.frank-siebert.de will serve two index pages, one in English and one in German. </p> <ul class="incremental"> <li> idee.html <ul class="incremental"> <li> Main index page in German </li> </ul> </li> <li> concept.html <ul class="incremental"> <li> English index page </li> </ul> </li> </ul> <p> There will not be many English articles, as far as I can foresee. That's the main reason not to give those articles their own subdomain. And most probably the English articles will not be translations of German articles. </p> <p> I'm undecided whether the search should be restricted to the current site's language. For a start, I'll not implement such a restriction. </p> <p> Under these circumstances it seems to make no sense to have a language switch somewhere on the site, because visitors would then assume that they can switch the language of the current article, which will not be the case. And I'll certainly refrain from faking a multi-language page via Google Translate just to be able to show a language switch button. </p> <h3> Index Page Content </h3> <p> <del> The index page content for the respective language will be generated from the RSS file created for that language. 
The language-specific portal header will be injected. </del> I based the index page generation on the archive generation. It is a hastily stitched-together implementation with a lot left to refactor, but it works. </p> <p> This implementation allows a separate item count to be defined for the RSS feed and for the index page. </p> <h3> The Index Builder </h3> <p> <strong> ~/projects/idee/generator/idxbuilder.py </strong> </p> <div class="sourceCode"> <pre class="sourceCode Python"><code class="sourceCode python"><span><a aria-hidden="true" href="#cb71-1"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb71-2"></a><span class="co">Update the index pages of the website.</span></span> <span><a aria-hidden="true" href="#cb71-3"></a></span> <span><a aria-hidden="true" href="#cb71-4"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb71-5"></a><span class="co">@license: https://creativecommons.org/publicdomain/zero/1.0/deed.en</span></span> <span><a aria-hidden="true" href="#cb71-6"></a><span class="co">@date: 2022-03-15</span></span> <span><a aria-hidden="true" href="#cb71-7"></a></span> <span><a aria-hidden="true" href="#cb71-8"></a><span class="co">All links provided relative to the /article/ folder</span></span> <span><a aria-hidden="true" href="#cb71-9"></a></span> <span><a aria-hidden="true" href="#cb71-10"></a><span class="co">@author: Frank Siebert</span></span> <span><a aria-hidden="true" href="#cb71-11"></a><span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb71-12"></a><span class="im">import</span> datetime</span> <span><a aria-hidden="true" href="#cb71-13"></a><span class="im">from</span> pubmetadata <span class="im">import</span> PubMetaData</span> <span><a aria-hidden="true" href="#cb71-14"></a><span class="im">from</span> gitmsgconstants <span class="im">import</span> GitMsgConstants <span class="im">as</span> gmc</span> <span><a aria-hidden="true" href="#cb71-15"></a><span 
class="im">from</span> archive <span class="im">import</span> Archive</span> <span><a aria-hidden="true" href="#cb71-16"></a></span> <span><a aria-hidden="true" href="#cb71-17"></a><span class="co"># Number of items to include on the index page</span></span> <span><a aria-hidden="true" href="#cb71-18"></a>ITEM_COUNT <span class="op">=</span> <span class="dv">15</span></span> <span><a aria-hidden="true" href="#cb71-19"></a></span> <span><a aria-hidden="true" href="#cb71-20"></a></span> <span><a aria-hidden="true" href="#cb71-21"></a><span class="kw">def</span> by_pub_date(article_data):</span> <span><a aria-hidden="true" href="#cb71-22"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb71-23"></a><span class="co"> Return the publishing date as sort criterion.</span></span> <span><a aria-hidden="true" href="#cb71-24"></a></span> <span><a aria-hidden="true" href="#cb71-25"></a><span class="co"> Parameters</span></span> <span><a aria-hidden="true" href="#cb71-26"></a><span class="co"> ----------</span></span> <span><a aria-hidden="true" href="#cb71-27"></a><span class="co"> article_data : Series</span></span> <span><a aria-hidden="true" href="#cb71-28"></a><span class="co"> Metadata of a single article.</span></span> <span><a aria-hidden="true" href="#cb71-29"></a></span> <span><a aria-hidden="true" href="#cb71-30"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb71-31"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb71-32"></a><span class="co"> str</span></span> <span><a aria-hidden="true" href="#cb71-33"></a><span class="co"> Publishing date as string.</span></span> <span><a aria-hidden="true" href="#cb71-34"></a></span> <span><a aria-hidden="true" href="#cb71-35"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb71-36"></a> <span class="cf">return</span> article_data[PubMetaData.pubdate]</span> <span><a aria-hidden="true" href="#cb71-37"></a></span> <span><a aria-hidden="true" 
href="#cb71-38"></a></span> <span><a aria-hidden="true" href="#cb71-39"></a><span class="kw">class</span> IDXBuilder():</span> <span><a aria-hidden="true" href="#cb71-40"></a> <span class="co">"""Manage all changes in the index pages."""</span></span> <span><a aria-hidden="true" href="#cb71-41"></a></span> <span><a aria-hidden="true" href="#cb71-42"></a> <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb71-43"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb71-44"></a><span class="co"> Initialize changelists.</span></span> <span><a aria-hidden="true" href="#cb71-45"></a></span> <span><a aria-hidden="true" href="#cb71-46"></a><span class="co"> The information about the changed html pages comes from</span></span> <span><a aria-hidden="true" href="#cb71-47"></a><span class="co"> PubMetaData.instance._updates and</span></span> <span><a aria-hidden="true" href="#cb71-48"></a><span class="co"> PubMetaData.instance._deletions .</span></span> <span><a aria-hidden="true" href="#cb71-49"></a></span> <span><a aria-hidden="true" href="#cb71-50"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb71-51"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb71-52"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb71-53"></a></span> <span><a aria-hidden="true" href="#cb71-54"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb71-55"></a> <span class="co"># information for German page changes on site "Idee".</span></span> <span><a aria-hidden="true" href="#cb71-56"></a> <span class="va">self</span>.de_list <span class="op">=</span> []</span> <span><a aria-hidden="true" href="#cb71-57"></a> <span class="co"># information for English page changes on site "Concept".</span></span> <span><a aria-hidden="true" href="#cb71-58"></a> <span 
class="va">self</span>.en_list <span class="op">=</span> []</span> <span><a aria-hidden="true" href="#cb71-59"></a> <span class="co"># The time of the update</span></span> <span><a aria-hidden="true" href="#cb71-60"></a> <span class="va">self</span>._nowdate <span class="op">=</span> datetime.datetime.now().isoformat()</span> <span><a aria-hidden="true" href="#cb71-61"></a> <span class="co"># soup of currently processed Index html</span></span> <span><a aria-hidden="true" href="#cb71-62"></a></span> <span><a aria-hidden="true" href="#cb71-63"></a> <span class="cf">for</span> article_data <span class="kw">in</span> PubMetaData.instance._updates:</span> <span><a aria-hidden="true" href="#cb71-64"></a> <span class="cf">if</span> article_data[PubMetaData.site] <span class="op">==</span> <span class="st">"Idee"</span> \</span> <span><a aria-hidden="true" href="#cb71-65"></a> <span class="kw">and</span> article_data.name <span class="op">!=</span> <span class="st">"rechtliches"</span>:</span> <span><a aria-hidden="true" href="#cb71-66"></a> <span class="va">self</span>.de_list.append(article_data)</span> <span><a aria-hidden="true" href="#cb71-67"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb71-68"></a> <span class="cf">if</span> article_data.name <span class="op">!=</span> <span class="st">"legal"</span>:</span> <span><a aria-hidden="true" href="#cb71-69"></a> <span class="va">self</span>.en_list.append(article_data)</span> <span><a aria-hidden="true" href="#cb71-70"></a></span> <span><a aria-hidden="true" href="#cb71-71"></a> <span class="cf">for</span> article_data <span class="kw">in</span> PubMetaData.instance._deletions:</span> <span><a aria-hidden="true" href="#cb71-72"></a> <span class="co"># </span><span class="al">TODO</span></span> <span><a aria-hidden="true" href="#cb71-73"></a> <span class="cf">pass</span></span> <span><a aria-hidden="true" href="#cb71-74"></a></span> <span><a aria-hidden="true" href="#cb71-75"></a> <span 
class="co"># Default sort is ascending, oldest posts first in list</span></span> <span><a aria-hidden="true" href="#cb71-76"></a> <span class="va">self</span>.de_list.sort(key<span class="op">=</span>by_pub_date)</span> <span><a aria-hidden="true" href="#cb71-77"></a> <span class="va">self</span>.en_list.sort(key<span class="op">=</span>by_pub_date)</span> <span><a aria-hidden="true" href="#cb71-78"></a></span> <span><a aria-hidden="true" href="#cb71-79"></a> <span class="kw">def</span> update(<span class="va">self</span>):</span> <span><a aria-hidden="true" href="#cb71-80"></a> <span class="co">"""</span></span> <span><a aria-hidden="true" href="#cb71-81"></a><span class="co"> Iterate over changes and update respective index pages.</span></span> <span><a aria-hidden="true" href="#cb71-82"></a></span> <span><a aria-hidden="true" href="#cb71-83"></a><span class="co"> The information about the changed html pages comes from</span></span> <span><a aria-hidden="true" href="#cb71-84"></a><span class="co"> PubMetaData.instance._updates and</span></span> <span><a aria-hidden="true" href="#cb71-85"></a><span class="co"> PubMetaData.instance._deletions .</span></span> <span><a aria-hidden="true" href="#cb71-86"></a></span> <span><a aria-hidden="true" href="#cb71-87"></a><span class="co"> Returns</span></span> <span><a aria-hidden="true" href="#cb71-88"></a><span class="co"> -------</span></span> <span><a aria-hidden="true" href="#cb71-89"></a><span class="co"> None.</span></span> <span><a aria-hidden="true" href="#cb71-90"></a></span> <span><a aria-hidden="true" href="#cb71-91"></a><span class="co"> """</span></span> <span><a aria-hidden="true" href="#cb71-92"></a> <span class="cf">for</span> article_data <span class="kw">in</span> <span class="va">self</span>.de_list:</span> <span><a aria-hidden="true" href="#cb71-93"></a> soup <span class="op">=</span> Archive._update(gmc.idee_index, article_data,</span> <span><a aria-hidden="true" href="#cb71-94"></a> article_loc<span 
class="op">=</span><span class="st">"./article/"</span>)</span> <span><a aria-hidden="true" href="#cb71-95"></a> soup <span class="op">=</span> IDXBuilder._limit_entries(soup)</span> <span><a aria-hidden="true" href="#cb71-96"></a> html_doc <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" href="#cb71-97"></a></span> <span><a aria-hidden="true" href="#cb71-98"></a> <span class="cf">with</span> <span class="bu">open</span>(gmc.idee_index, <span class="st">'w'</span>) <span class="im">as</span> index_file:</span> <span><a aria-hidden="true" href="#cb71-99"></a> <span class="bu">print</span>(html_doc, <span class="bu">file</span><span class="op">=</span>index_file)</span> <span><a aria-hidden="true" href="#cb71-100"></a> index_file.flush()</span> <span><a aria-hidden="true" href="#cb71-101"></a> index_file.close()</span> <span><a aria-hidden="true" href="#cb71-102"></a></span> <span><a aria-hidden="true" href="#cb71-103"></a> <span class="cf">for</span> article_data <span class="kw">in</span> <span class="va">self</span>.en_list:</span> <span><a aria-hidden="true" href="#cb71-104"></a> soup <span class="op">=</span> Archive._update(gmc.concept_index, article_data,</span> <span><a aria-hidden="true" href="#cb71-105"></a> article_loc<span class="op">=</span><span class="st">"./article/"</span>)</span> <span><a aria-hidden="true" href="#cb71-106"></a> soup <span class="op">=</span> IDXBuilder._limit_entries(soup)</span> <span><a aria-hidden="true" href="#cb71-107"></a></span> <span><a aria-hidden="true" href="#cb71-108"></a> html_doc <span class="op">=</span> soup.prettify()</span> <span><a aria-hidden="true" href="#cb71-109"></a></span> <span><a aria-hidden="true" href="#cb71-110"></a> <span class="cf">with</span> <span class="bu">open</span>(gmc.concept_index, <span class="st">'w'</span>) <span class="im">as</span> index_file:</span> <span><a aria-hidden="true" href="#cb71-111"></a> <span class="bu">print</span>(html_doc, <span 
class="bu">file</span><span class="op">=</span>index_file)</span> <span><a aria-hidden="true" href="#cb71-112"></a> index_file.flush()</span> <span><a aria-hidden="true" href="#cb71-113"></a> index_file.close()</span> <span><a aria-hidden="true" href="#cb71-114"></a></span> <span><a aria-hidden="true" href="#cb71-115"></a> <span class="at">@staticmethod</span></span> <span><a aria-hidden="true" href="#cb71-116"></a> <span class="kw">def</span> _limit_entries(soup):</span> <span><a aria-hidden="true" href="#cb71-117"></a> tags <span class="op">=</span> soup.find_all(<span class="st">"article"</span>)</span> <span><a aria-hidden="true" href="#cb71-118"></a> count <span class="op">=</span> <span class="dv">0</span></span> <span><a aria-hidden="true" href="#cb71-119"></a> <span class="cf">for</span> tag <span class="kw">in</span> tags:</span> <span><a aria-hidden="true" href="#cb71-120"></a> <span class="cf">if</span> count <span class="op">&gt;=</span> ITEM_COUNT:</span> <span><a aria-hidden="true" href="#cb71-121"></a> tag.decompose()</span> <span><a aria-hidden="true" href="#cb71-122"></a> <span class="cf">else</span>:</span> <span><a aria-hidden="true" href="#cb71-123"></a> count <span class="op">+=</span> <span class="dv">1</span></span> <span><a aria-hidden="true" href="#cb71-124"></a> <span class="cf">return</span> soup</span></code></pre> </div> <p> Since the template for archive pages is used for the index page, it is necessary to remove the h1 tag with the text "Archive" after initial creation. </p> <p> That's a one-time intervention, and I did not see any need to implement something to avoid it. As can easily be seen, the index pages are created with the Archive._update() function. Probably not very elegantly implemented, but effective reuse. </p> <p> There is obviously room for improvement. </p> <h2> Own Magic Words </h2> <p> I introduced my own so-called magic words to control the production of the PDF and the display of the CC0 license information. 
</p> <ul class="incremental"> <li> __NOPDF__ prevents the PDF creation and the placement of the PDF icon. </li> <li> __NOLIC__ prevents the placement of the license icons. </li> </ul> <p> The rationale is quite simple. If I post just a simple video, audio or reading recommendation, it does not make any sense to place license information for intellectual work of my own that does not exist. </p> <p> Indeed, it only raises the risk that readers misunderstand the license information as applying to the recommended content. </p> <p> The magic words are ignored in MediaWiki and processed by Pandoc into content placed in &lt;p&gt; tags. The plainworker.py checks for their existence and changes the output accordingly. </p> <h2> German Literals </h2> <p> When I found out that I cannot use "normal" quotation marks in titles, I learned today how to enter German quotation marks via the German keyboard in front of me. </p> <p> Which year is it? 2022. When did I start working in the IT business? I think it was called EDV in Germany in those days, „elektronische Datenverarbeitung“. It was in December 1988. </p> <p> It took 33 years and a bit more to learn how to enter the German quotation marks. Time to note it down, or I will probably forget it again. </p> <ul class="incremental"> <li> „ [AltGr]+[Fn]+v </li> <li> “ [AltGr]+[Fn]+b </li> </ul> <h2> Final Recapitulation </h2> <p> The documentation shown follows the implementation sequence, without showing the code evolution in detail. This is probably not the best possible sequence for documentation, but I tried to combine it with the implementation story. </p> <p> The code itself leaves room for improvement. I would not consider the code shown here to be best practice for any purpose. </p> <p> However, the code is stable enough to go live with the solution on my own site, and I already did. 
This is the first article, apart from the legal page and the page about the PDF logo, that is published natively on this site. </p> <p> It is a very long article, and I hope the formatting in the new article HTML keeps it readable in spite of its length. </p> <p> I learned a lot from this project, and I hope the description is helpful for someone. </p> <h2> Footnotes </h2> <hr/> <ol> <li> <a href="https://golb.hplar.ch/2020/05/gitblog.html"> Gitblog - the software that powers my blog </a> ; 2020-05-07 </li> <li> <a href="https://docs.gitlab.com/ee/user/markdown.html#wiki-specific-markdown"> GitLab Flavored Markdown </a> </li> <li> <a href="https://www.sitemaps.org/protocol.html"> sitemaps.org </a> ; www.sitemaps.org </li> <li> <a href="https://www.gyford.com/phil/writing/2015/03/25/wikipedia-parsing/"> Parsing a Wikipedia page's content with python </a> </li> <li> <a href="https://bart.degoe.de/building-a-full-text-search-engine-150-lines-of-code/"> Building a full-text search engine in 150 lines of Python code </a> ; Bart de Goede; bart.degoe.de; 2021-03-24 </li> <li> <a href="https://en.wikipedia.org/wiki/Gensim"> Gensim </a> ; Wikipedia </li> <li> <a href="https://whoosh.readthedocs.io/en/latest/searching.html"> Whoosh - How to search </a> ; whoosh.readthedocs.io </li> <li> <a href="https://pypi.org/project/rank-bm25/"> rank-bm25 0.2.1 </a> ; pypi.org; 2020-06-04 </li> <li> <a href="https://dl.acm.org/doi/10.1145/2682862.2682863"> Improvements to BM25 and Language Models Examined </a> ; Andrew Trotman, Antti Puurula, Blake Burgess; Association for Computing Machinery; DOI: <a href="https://doi.org/10.1145/2682862.2682863"> https://doi.org/10.1145/2682862.2682863 </a> ; <a href="https://www.cs.otago.ac.nz/homepages/andrew/papers/2014-2.pdf"> PDF </a> ; 2014-11-26 </li> <li> <a href="https://datascience.stackexchange.com/questions/89435/what-is-the-difference-between-okapi-bm25-and-nmslib"> What 
is the difference between Okapi bm25 and NMSLIB? </a> ; Data Science Stack Exchange; 2021-03-01 </li> <li> <a href="https://github.com/mwclient/mwclient/issues/272"> expandtemplates should use "post" instead of "get" · Issue #272 · mwclient/mwclient </a> ; github.com </li> <li> <a href="https://en.wikipedia.org/wiki/Somebody_elses_problem#Douglas_Adams_SEP_field"> Somebody elses problem - Wikipedia </a> ; en.wikipedia.org </li> <li> <a href="https://mariadb.com/kb/en/configuring-mariadb-for-remote-client-access/"> Configuring MariaDB for Remote Client Access </a> ; mariadb.com </li> <li> <a href="https://agate.readthedocs.io/en/latest/index.html"> agate 1.6.3 </a> ; agate.readthedocs.io </li> <li> <a href="https://pandas.pydata.org/pandas-docs/stable/index.html"> pandas documentation </a> ; pandas.pydata.org </li> <li> <a href="https://kanoki.org/2019/08/03/add-new-rows-and-columns-to-pandas-dataframe/"> Add new rows and columns to Pandas dataframe </a> ; kanoki; 2019-08-03 </li> <li> <a href="https://www.w3schools.com/python/pandas/default.asp"> Pandas Tutorial </a> ; www.w3schools.com </li> <li> <a href="https://bioconductor.org/news/bioc_3_7_release/"> Getting Started with Bioconductor 3.7 </a> ; bioconductor.org </li> <li> <a href="http://joejanuszk.com/blog/git-hook-pull-after-push-remote-fatal-not-a-git-repository/"> Git Hook Pull After Push - remote: fatal: Not a git repository: '.' 
· Joe Januszkiewicz </a> ; Joe Januszkiewicz; 2014-04-03 </li> <li> <a href="https://www.sitemaps.org/protocol.html"> sitemaps.org </a> ; www.sitemaps.org </li> <li> <a href="https://validator.w3.org/feed/docs/rss2.html"> Feed Validation Service </a> ; validator.w3.org </li> <li> <a href="https://www.rssboard.org/rss-specification"> RSS 2.0 Specification </a> ; www.rssboard.org </li> <li> <a href="https://web.resource.org/rss/1.0/modules/content/"> RDF Site Summary 1.0 Modules: Content </a> ; web.resource.org </li> <li> <a href="https://www.rfc-editor.org/rfc/rfc4287.html"> The Atom Syndication Format </a> ; M. Nottingham, R. Sayre; www.rfc-editor.org; DOI: <a href="https://doi.org/10.17487/RFC4287"> https://doi.org/10.17487/RFC4287 </a> ; December, </li> <li> <a href="https://stackoverflow.com/questions/3798863/multiple-channels-in-a-single-rss-xml-is-it-ever-appropriate"> Multiple channels in a single RSS xml - is it ever appropriate? </a> ; aoeu; Stack Overflow; 2010-10-18 </li> <li> <a href="https://stackoverflow.com/questions/15245896/rss-update-single-item"> RSS update single item </a> ; lou; Stack Overflow; 2013-03-18 </li> <li> <a href="https://www.rssboard.org/news/151/relative-links"> RSS Advisory Board - Relative links </a> ; www.rssboard.org </li> <li> <a href="https://nginx.org/en/docs/http/ngx_http_addition_module.html"> Module ngx_http_addition_module </a> ; nginx.org </li> <li> <a href="https://wildwolf.name/nginx-mitigating-the-breach-vulnerability-with-perl-and-ssi-add-sub-modules/"> nginx: Mitigating the BREACH Vulnerability with Perl and SSI or Addition or Substitution Modules — Wild Wild Wolf </a> ; wwa; Wild Wild Wolf; 2018-09-04 </li> <li> <a href="https://nginx.org/en/docs/http/ngx_http_ssi_module.html"> Module ngx_http_ssi_module </a> ; nginx.org </li> <li> <a href="https://stackoverflow.com/questions/18178084/pandoc-and-foreign-characters"> Pandoc and foreign characters </a> ; Mike Thomsen; Stack Overflow; 
2013-09-05 </li> <li> <a href="https://pandoc.org/MANUAL.html#option--variable"> Pandoc User’s Guide </a> ; pandoc.org </li> <li> <a href="https://learnbyexample.github.io/customizing-pandoc/"> Customizing pandoc to generate beautiful pdf and epub from markdown </a> ; learnbyexample.github.io </li> <li> <a href="https://stackoverflow.com/questions/44177555/how-to-convert-html-to-pdf-using-pandoc"> How to convert HTML to PDF using pandoc? </a> ; Chris Stryczynski; Stack Overflow; 2017-06-08 </li> <li> <a href="https://doc.courtbouillon.org/weasyprint/stable/"> WeasyPrint </a> ; doc.courtbouillon.org </li> <li> <a href="https://doc.courtbouillon.org/weasyprint/stable/going_further.html"> Going Further </a> ; doc.courtbouillon.org </li> <li> <a href="https://publishing-project.rivendellweb.net/revisting-html-to-pdf-conversion-with-css-paged-media/"> Revisting HTML To PDF Conversion with CSS Paged Media </a> ; carlos; The Publishing Project; 2021-11-15 </li> <li> <a href="https://www.w3.org/TR/css-page-3/#page-context"> CSS Paged Media Module Level 3 </a> ; www.w3.org; 2018-10-18 </li> </ol> <p> </p> </div>]]></content:encoded>
  </item>
  <item>
   <title>How do you translate a recursive acronym?</title>
   <link>https://idee.frank-siebert.de/article/how-do-you-translate-a-recursive-acronym.html</link>
   <pubDate>Sun, 13 Mar 2022 15:51:01 +0000</pubDate>
   <guid isPermaLink="false">https://idee.frank-siebert.de/article/how-do-you-translate-a-recursive-acronym.html</guid>
   <description><![CDATA[I started my page intentionally in German and named it Idee, English Idea, which is the recursive acronym for Idee der eigenen Erkenntnis. As is easy to see, the four words, abbreviated to their first letters, read Idee, which is also the first of the four words. ...]]></description>
   <content:encoded><![CDATA[<div> <div> <h1> How do you translate a recursive acronym? </h1> <div> <time datetime="2022-03-13T15:51:01" pubdate="true"> 2022-03-13 </time> <address> Frank Siebert </address> </div> <div> <figure> <a href="https://idee.frank-siebert.de/qrcode/how-do-you-translate-a-recursive-acronym.png"> <img src="https://idee.frank-siebert.de/qrcode/how-do-you-translate-a-recursive-acronym.png"/> </a> <figcaption> </figcaption> </figure> <figure> <a accesskey="p" href="https://idee.frank-siebert.de/pdf/how-do-you-translate-a-recursive-acronym.pdf" target="_blank" type="application/pdf"> <img src="https://idee.frank-siebert.de/image/3cd97bab8bb20288768b35fd72979ec3bbf4b2a8.png"/> </a> </figure> <a href="https://idee.frank-siebert.de/article/creative-commons-cc0-1-0-universal.html"> <img src="https://idee.frank-siebert.de/image/CC-Icon.png"/> </a> <a href="https://idee.frank-siebert.de/article/creative-commons-cc0-1-0-universal.html"> <img src="https://idee.frank-siebert.de/image/CC0-Icon.png"/> </a> </div> </div> <p> I started my page intentionally in German and named it <strong> Idee </strong> , English <strong> Idea </strong> , which is the recursive acronym for <strong> Idee der eigenen Erkenntnis </strong> . As is easy to see, the four words, abbreviated to their first letters, read <strong> Idee </strong> , which is also the first of the four words. </p> <p> Now I have to face the fact that some articles might be written in English. When that happens, the header of the website should also be in English. But I can't simply translate every word into English, because the result would no longer be a recursive acronym. </p> <p> How do you translate a recursive acronym? Recursive acronyms are also called by some bacronym or backronym, a questionable language invention. </p> <p> I tried, but I could not derive any meaningful long form from the abbreviation <strong> Idea </strong> that comes near to the meaning it should have. 
</p> <p> It took me quite a while to figure out what the name of the English version would be. It is <strong> Concept </strong> . </p> <p> <strong> Concept of new cognition elicitation personally thinking </strong> , which translates back into German as <strong> Konzept der neuen Erkenntnisgewinnung durch persönliches Denken </strong> . </p> <p> As you can see, the translation back into German does not abbreviate into a recursive acronym either. Recursive acronyms do not translate very well. But if a recursive acronym is not translated into a recursive acronym, then the translation is wrong. </p> <p> I'm quite happy with the translation I found. Depending on the context, <strong> Idee </strong> translates well into <strong> Concept </strong> instead of <strong> Idea </strong> , and the long form of the abbreviation <strong> Concept </strong> contains the meaning of the long form of the abbreviation <strong> Idee </strong> . </p> <p> I'm aware that the English version might not sound very natural, but this is also the case for the German version. Do you have a better translation? Please let me know. </p> <p> </p> </div>]]></content:encoded>
  </item>
 </channel>
</rss>

