
Implementing group_by count in Liquid

See the magic happening


Liquid is an open-source template language written in Ruby, created by Shopify back in 2006. Several frameworks use it, Jekyll being one of the best known.

This website is built with Jekyll, specifically with my own Jekyll template, Chulapa.

Some time ago, @cargocultprogramming opened dieghernan/chulapa#29 because one of the components of the theme was broken in Jekyll >= 4.1.0. Digging a bit, I found jekyll/jekyll#8214, which reports the same issue. What seemed to be a feature was in fact a bug that some developers were exploiting.

The change is that applying the group_by Liquid filter to an array used to produce a “grouped” version of the array, while on Jekyll >= 4.1.0 it produces a different result that can’t be used in the same way:



{% assign alldocs = site.exercises %}
{% assign grouptag = alldocs | map: 'tags' | join: ',' | split: ',' | group_by: tag %}

{{ grouptag }}

<!-- Jekyll < 4.1.0 result -->
{"name"=>"tag A", "items"=>["tag A"], "size"=>1}{"name"=>"Tag B", "items"=>["Tag B"], "size"=>1}{"name"=>"Virtualbox", "items"=>["Virtualbox"], "size"=>1}{"name"=>"netcat", "items"=>["netcat"], "size"=>1}{"name"=>"whois", "items"=>["whois"], "size"=>1}{"name"=>"dig", "items"=>["dig"], "size"=>1} ... {"name"=>"Hydra", "items"=>["Hydra"], "size"=>1}

<!-- Jekyll >= 4.1.0 result -->
{"name"=>"", "items"=>["tag A", "Tag B", "Virtualbox", "netcat", "whois", "dig", ... , "Hydra"], "size"=>26}


So basically, counting items was not easy anymore. I developed a solution in pure Liquid (a language that gets quite verbose as soon as you step outside its predefined filters) that is compatible with any Jekyll version.

The algorithm is now implemented in Chulapa. You can check the results on my /tags page.

Note that the tables in the examples are generated from my live site, so they may change as I add more posts. The order and counts in the tables should match the tags displayed on the /tags page.

Alternative group_by with Liquid

First, we define an array of all the tags included in the documents of my site:


{% assign alldocs = site.documents %}
{% assign alltags = alldocs | map: 'tags' | join: ',' | split: ',' %}
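Before counting, it can help to check what actually ended up in alltags. This is only a minimal sketch, assuming the page is built by Jekyll (jsonify is a Jekyll filter, not core Liquid); the <p> wrappers are just for illustration:

```liquid
{% assign alldocs = site.documents %}
{% assign alltags = alldocs | map: 'tags' | join: ',' | split: ',' %}

{% comment %} Every tag occurrence, duplicates included {% endcomment %}
<p>Tag occurrences: {{ alltags | size }}</p>

{% comment %} Distinct tags only {% endcomment %}
<p>Distinct tags: {{ alltags | uniq | size }}</p>

{% comment %} Raw contents of the flattened array {% endcomment %}
<p>{{ alltags | jsonify }}</p>
```

Because the array is built with join and split on a comma, a tag that itself contains a comma would be split into two entries.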

Cool! Now we can count how many times each unique tag occurs in alltags:

<!-- Allocating array to group_by: replacement -->

<!-- Unique values -->
{% assign single_tags = alltags | uniq %}

<!-- Arrays to populate -->
{% assign count_tags = '' | split: ',' %}

<!-- Iterator 0 to number of unique tags - 1 (size = number of unique tags) -->
{% assign n_tags = single_tags | size | minus: 1 %}

{% for i in (0..n_tags) %}
<!-- Populate -->
  {% assign count_this_tag = alltags | where_exp:"item", "item == single_tags[i]" | size %}
  {% assign count_tags = count_tags | push: count_this_tag %}
{% endfor %}

<!-- Display single_tags and count_tags as a table -->
<table>
  <caption>Display count of tags on this site </caption>
  <tr>
    <th>Tag</th>
    <th>Count</th>
  </tr>
  {% for i in (0..n_tags) %}
    <tr>
      <td>{{ single_tags[i] }}</td>
      <td>{{ count_tags[i] }}</td>
    </tr>
  {% endfor %}
</table>
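To convince yourself that the counting loop does what it should, here is a toy version that uses a hard-coded list instead of site.documents. The names demo_tags, demo_unique, demo_counts and this_tag are made up for this sketch, and it still needs Jekyll because where_exp and push are Jekyll filters:

```liquid
{% comment %} Toy input: "maps" appears 3 times, "sf" twice, "terra" once {% endcomment %}
{% assign demo_tags = "maps,sf,maps,terra,sf,maps" | split: "," %}
{% assign demo_unique = demo_tags | uniq %}

{% comment %} Empty array that will hold one count per unique tag {% endcomment %}
{% assign demo_counts = "" | split: "," %}

{% for this_tag in demo_unique %}
  {% comment %} Keep only the entries equal to the current tag and count them {% endcomment %}
  {% assign n = demo_tags | where_exp: "item", "item == this_tag" | size %}
  {% assign demo_counts = demo_counts | push: n %}
{% endfor %}

<ul>
  {% for this_tag in demo_unique %}
    <li>{{ this_tag }}: {{ demo_counts[forloop.index0] }}</li>
  {% endfor %}
</ul>
```

This should list maps: 3, sf: 2 and terra: 1. Both arrays share the same index, which is why the second loop can read the count with forloop.index0.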

See results
Display count of tags on this site
Tag Count
r_bloggers 20
rstats 21
rspatial 21
sf 12
maps 25
vignette 1
rnaturalearth 3
function 4
leaflet 5
jekyll 2
html 2
beautiful_maps 7
giscoR 9
raster 1
flags 4
mapSpain 3
Wikipedia 1
cartography 4
svg 1
inset 3
r_package 5
classInt 1
terra 6
rasterpic 1
ggplot2 8
tmap 1
mapsf 1
discontinued 8
project 11
R 5
python 2
guest-author 1
COVID19 3
ggridges 1
tidyterra 4
maptiles 1
s2 1
astronomy 2
celestial 2
geojson 2
gpkg 2
resmush 1
liquid 1
chulapa 1
pebble 4
watchface 4
javascript 4
C 4
webscrapping 1
dataset 2
csv 2
json 1
twitter 1

Sorting

How do we rank the tags by number of occurrences? We take the highest count and loop from it down to 1. On each pass, every tag whose count equals the current value is appended to the ranked arrays, so they end up in descending order:

<!-- Used in https://github.com/mmistakes/minimal-mistakes/blob/master/_includes/posts-taxonomy.html -->

{% assign items_max = count_tags | sort | last %}
{% assign sorted_tags = '' | split: ',' %}
{% assign sorted_count_tags = '' | split: ',' %}

{% for i in (1..items_max) reversed %}
  {% for j in (0..n_tags) %}
    {% if count_tags[j] == i %}
     {% assign sorted_tags = sorted_tags | push: single_tags[j] %}
     {% assign sorted_count_tags = sorted_count_tags | push: i %}
    {% endif %}
  {% endfor %}
{% endfor %}

{% assign sorted_tags = sorted_tags | uniq %}

<table>
  <caption>Display sorted count of tags on this site </caption>
  <tr>
    <th>Tag</th>
    <th>Count (desc sorted)</th>
  </tr>
  {%- for i in (0..n_tags) %}
    <tr>
      <td>{{ sorted_tags[i] }}</td>
      <td>{{ sorted_count_tags[i] }}</td>
    </tr>
  {%- endfor -%}
</table>
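The ranking trick is easier to follow on a tiny example. The sketch below uses three hard-coded tags and counts; demo_tags, demo_counts, demo_max and last_index are hypothetical names for illustration only, and push is a Jekyll filter:

```liquid
{% comment %} Parallel arrays: demo_counts[j] is the count of demo_tags[j] {% endcomment %}
{% assign demo_tags = "sf,maps,terra" | split: "," %}
{% assign demo_counts = "" | split: "," | push: 2 | push: 3 | push: 1 %}

{% assign demo_max = demo_counts | sort | last %}
{% assign last_index = demo_tags | size | minus: 1 %}

<ul>
  {% comment %} Outer loop walks the possible counts from highest to lowest {% endcomment %}
  {% for i in (1..demo_max) reversed %}
    {% comment %} Inner loop emits every tag whose count equals i {% endcomment %}
    {% for j in (0..last_index) %}
      {% if demo_counts[j] == i %}
        <li>{{ demo_tags[j] }}: {{ i }}</li>
      {% endif %}
    {% endfor %}
  {% endfor %}
</ul>
```

This should list maps: 3, then sf: 2, then terra: 1. Tags with the same count keep their original relative order, because the inner loop always scans the tags from first to last. The cost is one full pass over the tags for every value between 1 and the highest count, which is fine for a blog but worth keeping in mind for very large sites.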

See results
Display sorted count of tags on this site
Tag Count (desc sorted)
maps 25
rstats 21
rspatial 21
r_bloggers 20
sf 12
project 11
giscoR 9
ggplot2 8
discontinued 8
beautiful_maps 7
terra 6
leaflet 5
r_package 5
R 5
function 4
flags 4
cartography 4
tidyterra 4
pebble 4
watchface 4
javascript 4
C 4
rnaturalearth 3
mapSpain 3
inset 3
COVID19 3
jekyll 2
html 2
python 2
astronomy 2
celestial 2
geojson 2
gpkg 2
dataset 2
csv 2
vignette 1
raster 1
Wikipedia 1
svg 1
classInt 1
rasterpic 1
tmap 1
mapsf 1
guest-author 1
ggridges 1
maptiles 1
s2 1
resmush 1
liquid 1
chulapa 1
webscrapping 1
json 1
twitter 1

Bottom line

Done! Here is a clean version of the algorithm:


{% assign alldocs = site.documents %}
{% assign alltags = alldocs | map: 'tags' | join: ',' | split: ',' %}
{% assign single_tags = alltags | uniq %}

<!-- Counting -->
{% assign count_tags = '' | split: ',' %}
{% assign n_tags = single_tags | size | minus: 1 %}
{% for i in (0..n_tags) %}
  {% assign count_this_tag = alltags | where_exp:"item", "item == single_tags[i]" | size %}
  {% assign count_tags = count_tags | push: count_this_tag %}
{% endfor %}

<!-- Extra: sort -->
{% assign items_max = count_tags | sort | last %}
{% assign sorted_tags = '' | split: ',' %}
{% assign sorted_count_tags = '' | split: ',' %}

{% for i in (1..items_max) reversed %}
  {% for j in (0..n_tags) %}
    {% if count_tags[j] == i %}
     {% assign sorted_tags = sorted_tags | push: single_tags[j] %}
     {% assign sorted_count_tags = sorted_count_tags | push: i %}
    {% endif %}
  {% endfor %}
{% endfor %}

{% assign sorted_tags = sorted_tags | uniq %}

