panthema

Welcome to panthema.net

This website is a diverse collection of interesting ideas, thus it is panthematic. It contains free open-source software and projects (FOSS), computer science research results, blog articles and more, all created by myself, Timo Bingmann. Over the years, the amount of information, source code and other content has grown rather large. All entries are ordered chronologically in the weblog, with some special projects highlighted in the following summary:

Computer Science Research

since 2018
COBS (in development)
COBS: Our Compact Bit-Sliced Signature Index is a cross-over between inverted indices and Bloom filters for fast approximate search.
since 2015-08-22
Thrill (in development)
A C++ framework for distributed Big Data computations with emphasis on high performance and a convenient interface like Apache Spark or Flink.
since 2018-05-28
TLX
Collection of Sophisticated C++ Data Structures, Algorithms, and Miscellaneous Helpers
2015-04-03
External Memory Algorithms
In addition to maintaining STXXL, we also developed a bulk-parallel priority queue for EM.
2014-03-09
Parallel String Sorting
Experimental implementations of many string sorting algorithms, including Parallel Super Scalar String Sample Sort (pS5) and Parallel Multiway LCP-Mergesort
2012-11-19
External Memory Suffix Sorting
Experimental implementation of eSAIS and DC3, two suffix and LCP array construction algorithms for external memory, using STXXL.

Open-Source Projects

2020-06-26
disk-filltest 0.8
Tool to detect bad disks by filling with random data
2020-01-20
digup 0.6.50
Console tools to verify file integrity by updating MD5 and SHA digest files
2019-08-01
SqlPlotTools
Automatically import key=value experimental RESULTS into SQL tables and generate plots from them.
2014-10-29
STXXL 1.4
Library of external memory algorithms, including external block paging and efficient external sorting
2014-09-19
malloc_count 0.7.1
Simple tool for run-time memory usage analysis and profiling under Linux
2014-05-15
The Sound of Sorting 0.6.5
Viral "audibilization" and visualization of sorting algorithms
2013-12-12
pmbw 0.6.2
Benchmark tool for parallel memory bandwidth measurement under Linux
2013-05-05
STX B+ Tree 0.9
Main memory B+ tree implementation with STL compatible interfaces
2011-01-30
CryptoTE 0.5.390
Text editor with transparent strong encryption, useful for password lists and more.
2009-09-05
Flex Bison C++ Example 0.1.4
Example of using GNU flex and bison in a C++ program.

Miscellaneous Weblog Posts

2019-08-02  BlinkenSort with Sound - The Sound of LED Sorting Algorithms with Raspberry Pi 3 and APA102 or SK9822 LEDs
2019-08-01  BlinkenSort - Sorting Algorithms on LEDs with ESP8266 and SK6812 or WS2812B
2019-06-19  Uniserv Research-Prize "Algorithms for Efficient Data-Processing" for my Dissertation
2018-07-03  Dissertation "Scalable String and Suffix Sorting: Algorithms, Techniques, and Tools"
2016-01-14  "On the Structure of the Graph of Unique Symmetric Base Exchanges of Bispanning Graphs" - Diploma Thesis in Mathematics
2014-10-26  1,000,000 Views of Sound of Sorting YouTube Video
2014-06-22  Recording of a Talk "STXXL 1.4.0 and Beyond"
2013-10-24  Sound of Sorting: Viral Video on KIT Informatik Webpage
2013-05-06  STX B+ Tree Speed Test Measurements on Raspberry Pi (Model B)
2013-05-05  STX B+ Tree Measuring Memory Usage with malloc_count
2013-01-24  Coding Tricks 101: How to Save the Assembler Code Generated by GCC
2008-09-01  C++ Code Snippet - Print Stack Backtrace Programmatically with Demangled Function Names
2007-03-28  C++ Code Snippet - Compressing STL Strings with zlib

Weblog

Thrill Tutorial Title Slide

Thrill YouTube Tutorial: High-Performance Algorithmic Distributed Computing with C++

Posted on 2020-06-01 11:13 by Timo Bingmann at Permlink with 0 Comments. Tags: talk university thrill

This post announces the completion of my new tutorial presentation and YouTube video: "Thrill Tutorial: High-Performance Algorithmic Distributed Computing with C++".

YouTube video link: https://youtu.be/UxW5YyETLXo (2h 41min)

Slide Deck: slides-20200601-thrill-tutorial.pdf (21.3 MiB) (114 slides)

In this tutorial we present our new distributed Big Data processing framework called Thrill (https://project-thrill.org). It is a C++ framework consisting of a set of basic scalable algorithmic primitives like mapping, reducing, sorting, merging, joining, and additional MPI-like collectives. This set of primitives can be combined into larger more complex algorithms, such as WordCount, PageRank, and suffix sorting. Such compounded algorithms can then be run on very large inputs using a distributed computing cluster with external memory.

After introducing the audience to Thrill, we guide participants through the initial steps of downloading and compiling the software package. The tutorial then gives an overview of the challenges of programming real distributed machines, and of the models and frameworks available for meeting them. With these foundations, Thrill's DIA programming model is introduced with an extensive listing of DIA operations and how to actually use them. The participants are then given a set of small example tasks to gain hands-on experience with DIAs.

After the hands-on session, the tutorial continues with more details on how to run Thrill programs on clusters and how to generate execution profiles. Then, deeper details of Thrill's internal software layers are discussed to advance the participants' mental model of how Thrill executes DIA operations. The final hands-on tutorial is designed as a concerted group effort to implement K-means clustering for 2D points.

The video on YouTube (https://youtu.be/UxW5YyETLXo) contains both presentation and live-coding sessions. It has a high information density and covers many topics.

Distributed Merge String Sort

IPDPS Paper "Communication-Efficient String Sorting" and Talk Recording

Posted on 2020-05-18 15:30 by Timo Bingmann at Permlink with 0 Comments. Tags: talk university

Due to the coronavirus, this year's IPDPS conference is being held virtually, and we sadly missed the chance to visit New Orleans. Instead, I recorded a YouTube video of our conference talk, because the slides are mostly illustrations that need more explanation.

The paper "Communication-Efficient String Sorting", which I coauthored with Peter Sanders and Matthias Schimek, will still be published in the IEEE proceedings. A preprint of the full paper is available on arXiv:2001.08516 and also from this webpage:

2001.08516v1-Communication-Efficient-String-Sorting.pdf

There are two versions of the slides: the longer presentation version slides-20200518-distributed-string-sorting-ipdps.pdf, which was used for the YouTube recording, and a short ten-page teaser version slides-20200518-distributed-string-sorting-ipdps-short.pdf for the virtual conference.

Download slides-20200518-distributed-string-sorting-ipdps.pdf

The source code and more documentation about the implementations of our communication-efficient distributed string sorting algorithms can be found in my GitHub repository https://github.com/bingmann/distributed-string-sorting.

Matthias Schimek's master's thesis, on which this paper and presentation are based, can be downloaded from this website as well: 2019_Schimek_Distributed_String_Sorting_Algorithms.pdf.

Below you can watch the video recording of my presentation, or head over to YouTube: https://youtu.be/uWro8fsfs5I.

Cover images of Youtube lecture series

List of Recordings of Lectures and Exercises on YouTube

Posted on 2020-01-28 01:05 by Timo Bingmann at Permlink with 0 Comments. Tags: university

This post features a list of YouTube videos of lectures and exercises which I have given at the KIT. The recordings are all in German and were produced semi-automatically by the Center for Media-Learning (Zentrum für Mediales Lernen) at the KIT.

I have listed various entry time points for topics in the lectures for an easier overview. These entry points cover only those parts of the lectures or exercises which I personally presented. The lecture series are mainly presented by Prof. Peter Sanders, and the exercises were given jointly with colleagues.

The main reason for this article is that I can never seem to find the lectures on YouTube when I am looking for them. I can now also point people to this post when I want to reference a video explanation of a topic I previously gave.

Plot of Lines of Code

Lines of Code Plotted over Time of Some Large and Some Small Projects

Posted on 2019-11-22 10:30 by Timo Bingmann at Permlink with 0 Comments. Tags: utilities

This article contains a chart of code line statistics over time, extracted from a set of popular and lesser-known git repositories. The git repositories are read with a script, and each tag (release/version) is checked out and analyzed. If there are too few tags or versions (< 20), then for each beginning of a month the current commit is checked out and analyzed. For each of these commit trees, the number of lines of code is determined with cloc (version 1.85).
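The selection rule described above can be sketched in a few lines. This is only an illustration under assumptions: the function name `pick_snapshots` and its input format are hypothetical and not taken from the actual script, and the git checkout and cloc steps are left out.

```python
from datetime import date


def pick_snapshots(tags, commits, min_tags=20):
    """Return the revisions to analyze (hypothetical helper).

    tags:    list of tag names
    commits: non-empty list of (sha, date) pairs, oldest first
    """
    # Enough tags: analyze every tagged release/version.
    if len(tags) >= min_tags:
        return list(tags)

    # Too few tags: take the current commit at each beginning of a month.
    first_day, last_day = commits[0][1], commits[-1][1]
    year, month = first_day.year, first_day.month
    picked = []
    while True:
        # Advance to the first day of the next month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        month_start = date(year, month, 1)
        if month_start > last_day:
            break
        # The "current" commit is the newest one not after the month start.
        current = [sha for sha, day in commits if day <= month_start][-1]
        picked.append(current)
    return picked
```

A month without new commits yields the same commit again in this sketch, so the resulting series has one data point per month.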

Of course one can argue about whether lines of code is a useful metric at all. Depending on programming language and style, the density of code will vary a lot. Furthermore, many projects have over time accidentally or purposefully included external library code (e.g. boost or similar packages), which greatly inflates their metric. And of course long source code may not actually accomplish much compared to a small piece of algorithmic code. Nevertheless, I believe lines of code are still somewhat valuable for comparing the amount of developer time spent on open-source projects.

COBS data structure

Presentation "COBS: A Compact Bit-Sliced Signature Index" at SPIRE 2019 (Best Paper Award)

Posted on 2019-10-08 15:30 by Timo Bingmann at Permlink with 0 Comments. Tags: talk university

Today, I presented our paper "COBS: A Compact Bit-Sliced Signature Index" at the 26th International Symposium on String Processing and Information Retrieval (SPIRE) 2019 in Segovia, Spain. The full paper is available from this webpage:

paper-COBS-A-Compact-Bit-Sliced-Signature-Index.pdf,

from the Springer proceedings in LNCS (same content, somewhat differently typeset), and also from arXiv:1905.09624.

I am honored that our paper won the best paper award at SPIRE'19: bestPaperAward.pdf

The slides of my presentation at the SPIRE conference are available here: slides-20191008-cobs-spire.pdf.

The source code and more documentation about COBS can be found by following this link.

Update 2019-11-06: COBS now has a Python interface and can be downloaded from PyPI.

Download slides-20191008-cobs-spire.pdf

Abstract

We present COBS, a COmpact Bit-sliced Signature index, which is a cross-over between an inverted index and Bloom filters. Our target application is to index k-mers of DNA samples or q-grams from text documents and process approximate pattern matching queries on the corpus with a user-chosen coverage threshold. Query results may contain a number of false positives which decreases exponentially with the query length. We compare COBS to seven other index software packages on 100000 microbial DNA samples. COBS' compact but simple data structure outperforms the other indexes in construction time and query performance with Mantis by Pandey et al. in second place. However, unlike Mantis and other previous work, COBS does not need the complete index in RAM and is thus designed to scale to larger document sets.
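To make the idea in the abstract more concrete, here is a toy sketch of a bit-sliced signature index. It is one possible reading of "a cross-over between an inverted index and Bloom filters" and not the actual COBS implementation: all function names, the SHA-256 based hashing, and the parameter defaults are assumptions chosen for illustration. Each document gets a Bloom filter, and the filters are stored row-wise, so that one row holds the same bit position for all documents.

```python
import hashlib


def qgrams(text, q):
    """All overlapping substrings of length q."""
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def positions(term, num_rows, num_hashes):
    """Bloom filter bit positions of a term (hashing scheme is an assumption)."""
    out = []
    for seed in range(num_hashes):
        digest = hashlib.sha256(f"{seed}:{term}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "big") % num_rows)
    return out


def build_index(docs, q=3, num_rows=1024, num_hashes=3):
    """rows[r] is a bit vector over documents: bit d is set if doc d set bit r."""
    rows = [0] * num_rows
    for doc_id, text in enumerate(docs):
        for term in set(qgrams(text, q)):
            for row in positions(term, num_rows, num_hashes):
                rows[row] |= 1 << doc_id
    return rows


def query(rows, num_docs, pattern, q=3, num_hashes=3, threshold=0.8):
    """Documents in which at least `threshold` of the pattern's q-grams may occur."""
    terms = qgrams(pattern, q)
    if not terms:
        return []
    counts = [0] * num_docs
    for term in terms:
        # AND the rows of all hash positions: one bit per candidate document.
        hits = (1 << num_docs) - 1
        for row in positions(term, len(rows), num_hashes):
            hits &= rows[row]
        for doc_id in range(num_docs):
            counts[doc_id] += (hits >> doc_id) & 1
    needed = threshold * len(terms)
    return [d for d in range(num_docs) if counts[d] >= needed]
```

As with any Bloom filter, a document that really contains the q-grams is always reported, while unrelated documents may show up as false positives; the `threshold` parameter plays the role of the user-chosen coverage threshold.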


BlinkenSort with Sound on Raspberry Pi 3 with APA102 or SK9822 LEDs

BlinkenSort with Sound - The Sound of LED Sorting Algorithms with Raspberry Pi 3 and APA102 or SK9822 LEDs

Posted on 2019-08-02 19:00 by Timo Bingmann at Permlink with 0 Comments. Tags: sorting electronics LEDs sound of sorting frontpage blinken-algorithms

Once BlinkenSort successfully showed fascinatingly complex sorting algorithms on an LED strip using an ESP8266 (see other article), I naturally ventured to add the sound output from the Sound of Sorting program. This turned out to be much harder than initially thought, because the sound generation software required more processing power than is available in the cheap standard microcontrollers. After much trial and error, I found out that a sufficiently new Raspberry Pi (I happened to have a Pi 3 model B) offers a rare combination: it has enough compute power, can drive an LED strip directly via SPI, and has an audio line-out.

The Raspberry Pi, however, cannot reliably drive the same type of LED strip as the ESP8266, because the Raspberry Pi runs a time-shared Linux system instead of a real-time program. But it can drive the more expensive APA102 or SK9822 LEDs, which have a separate clock line. These are 4-pin LED strips with clock and data, which can easily be attached to the Pi's SPI output pins. Furthermore, the APA102 LEDs can be driven at a much higher refresh rate than the WS2812B and SK6812, due to the extra clock signal. This ultimately makes the sorting animations even smoother than with the ESP8266 (where individual frames are already imperceptible). I could not find any off-the-shelf library to drive the APA102 with a C++ program on the Pi, but it was trivial to write a frame buffer class and access the /dev/spidev0.0 device directly.
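For illustration, the byte stream such a frame buffer has to produce can be sketched as follows. This is a sketch in Python rather than the C++ class mentioned above, and the frame layout (a start frame of four zero bytes, four bytes per LED in brightness, blue, green, red order, then trailing zero bytes) follows commonly published descriptions of the APA102 protocol; the exact amount of trailing padding is an assumption and may need adjusting for a particular strip.

```python
def apa102_frame(pixels, brightness=31):
    """Encode a list of (red, green, blue) tuples as an APA102 byte stream."""
    if not 0 <= brightness <= 31:
        raise ValueError("brightness is a 5-bit value (0..31)")

    frame = bytearray(4)  # start frame: 32 zero bits
    for red, green, blue in pixels:
        # Three marker bits '111' plus 5-bit global brightness,
        # followed by the colors in blue, green, red order.
        frame += bytes([0xE0 | brightness, blue, green, red])

    # Trailing zero bytes (assumed layout): 32 bits, plus at least one
    # extra clock pulse per two LEDs so the last pixels receive their data.
    frame += bytes(4 + (len(pixels) + 15) // 16)
    return bytes(frame)
```

The resulting bytes would then be written to the Pi's SPI device file, which is not shown here because it depends on the SPI mode and clock speed configured on the system.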

Then there was the question of adding a display. After more experimentation, I found that the MAX7219 dot LED matrix modules work well. These can also be driven via SPI from the Pi, which actually has two SPI outputs on the models with 40 pins. And as a bonus, these dot LED matrix modules can pull off an amazing frame rate. That means that besides showing the algorithm name, the super-fast refresh rate makes it possible to display (nearly) every comparison counter increment, even though a human viewer can only see around 25 changes per second. Simply adding an HDMI display to the Pi would have also worked, but the LED matrix is cooler and has a retro feeling to it.

As you can see in the following YouTube video, each algorithm does something quite different, which makes this a very interesting art installation with deep connections to informatics. BlinkenSort currently contains the following eighteen sorting algorithms (listed in the same order as in the video): MergeSort, Insertion Sort, QuickSort (LR) Hoare, QuickSort (LL) Lomuto, QuickSort Dual Pivot, ShellSort, HeapSort, CycleSort, RadixSort-MSD (High First), RadixSort-LSD (Low First), std::sort, std::stable_sort, WikiSort, TimSort, Selection Sort, Bubble Sort, Cocktail-Shaker Sort, and BozoSort. Besides sorting algorithms, the collection also contains four hash table implementations: Linear Probing Hash Table, Quadratic Probing Hash Table, Cuckoo-Hashing with two places, and Cuckoo-Hashing with three places.

The video above (https://youtu.be/kAjQ8shElP8) shows the LED strip with sound in action and I added a voice-over commentary about the algorithms. There is also a second YouTube video available, without my commentary.

All source code for the sorting algorithms and other Neopixel animations is available from GitHub:
https://github.com/bingmann/BlinkenAlgorithms.git

BlinkenSort with ESP8266 and SK6812 LEDs

BlinkenSort - Sorting Algorithms on LEDs with ESP8266 and SK6812 or WS2812B

Posted on 2019-08-01 19:00 by Timo Bingmann at Permlink with 2 Comments. Tags: sorting electronics LEDs sound of sorting frontpage blinken-algorithms

About two years ago I first got my hands on one of the fancy addressable "Neopixel" LED strips, which can be programmed to show colorful animations. I quickly saw there was one project I just had to do with these strips: use them to display sorting algorithms as animations. Since my Sound of Sorting project already contained the source code for many algorithms, the step to using an addressable LED strip as a "display" was not a large one. The strips can be animated using microcontrollers such as the Arduino or Espressif ESP chips, which have more than enough compute power for running sorting algorithms on a few hundred items. Since all sorting algorithms run on random input data and may make random decisions themselves, the animations shown vary almost endlessly and are fascinatingly complex.

My project of displaying sorting algorithms on LED strips resulted in two pieces of art:

  • "BlinkenSort", which are sorting algorithms animated on SK6812 RGBW LEDs using an ESP8266 microcontroller, is described in this article.
  • "BlinkenSort with Sound", which are sorting algorithms animated using APA102 RGB LEDs and a Raspberry Pi 3 with real-time sound, is detailed in a second blog article!

For both projects I created a YouTube video and provide a construction manual on this website if you are interested in the internals or want to build something similar.

Photo of the Uniserv prize and my dissertation

(Photo: Andreas Drollinger, KIT)

Uniserv Research-Prize "Algorithms for Efficient Data-Processing" for my Dissertation

Posted on 2019-06-19 19:00 by Timo Bingmann at Permlink with 1 Comments. Tags: dissertation university frontpage

Today was the ceremonial graduation day on which new bachelors, masters, and also Dr.s (PhDs in the Anglo-Saxon world) were celebrated who received their degrees from the department of informatics at the Karlsruhe Institute of Technology (KIT) within the last year. This year's graduation day coincided with the 50th anniversary of the introduction of computer science as a field of study and as a diploma degree.

I was among those honoured for completing the Dr. last year (see my dissertation page). In total 44 freshly minted Dr.s, 292 masters, and 261 bachelors were celebrated today.

Furthermore, I was awarded the Uniserv Research-Prize "Algorithms for Efficient Data-Processing" for the best dissertation in the field of fast algorithms in the academic year 2017/2018 at the KIT department of informatics by the talent committee of the department.

I would like to thank Uniserv for endowing the department and my dissertation with this prize and especially the talent committee for selecting my work. It was a great and pleasant surprise to receive such a notification in the morning email inbox, which as usual contained many "prize and distant inheritance" emails. But on that day one of them was real, and I nearly overlooked it.

My dissertation was the central purpose in my life for many years. Receiving the prize helps me feel the time was well spent, and I take it as confirmation of having furthered the field of informatics a tiny bit. It is rare to receive such honours and I am truly grateful.

Uniserv wrote a much longer honorific press release in German.


YouTube Video: "Animation of the US Treasury Yield Curve with Inversions from 1962-01-01 to 2019-04-01"

Posted on 2019-04-03 23:30 by Timo Bingmann at Permlink with 0 Comments. Tags: market fun

Today I published on YouTube the first animation of market data made with my new charting tool. In light of recent events, this was a video of inversions of the US Treasury Yield Curve. The video is created with QCustomPlot and a large Qt program to process data.

The dancing green line plots the yields of all constant maturity treasury notes. The trailing blue line shows the effective 30 day Federal Funds rate. The orange line is the broad stock market SPX index. The background of each day is painted red if an inversion of the 1y/5y, 1y/10y, or 2y/10y yields occurs. The darker the red color, the greater the inversion.
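The background rule can be written down as a small function. This is a sketch under assumptions, not the code of the Qt program: the name `inversion_depth`, the dictionary keys, and the example yields in the usage note are made up for illustration. A pair counts as inverted when the shorter maturity yields more than the longer one.

```python
# Maturity pairs checked for an inversion, as named in the text.
PAIRS = (("1y", "5y"), ("1y", "10y"), ("2y", "10y"))


def inversion_depth(yields):
    """Largest short-minus-long yield difference over the pairs, or 0.0.

    yields: dict mapping a maturity label to that day's yield in percent.
    Pairs with a missing maturity are skipped.
    """
    depth = 0.0
    for short, long in PAIRS:
        if short in yields and long in yields:
            depth = max(depth, yields[short] - yields[long])
    return depth
```

For example, with invented yields of 2.40 for 1y, 2.30 for 2y, 2.20 for 5y, and 2.45 for 10y, only the 1y/5y pair is inverted and the depth is about 0.2 percentage points. A day with depth 0.0 would keep a plain background, and larger values would map to a darker red.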

Watch the yield curve and the stock market index change over the decades, and notice their behaviour in times of crisis. The video ends with the current inversion around April 2019. More about the yield curve and inversions: https://en.wikipedia.org/wiki/Yield_curve#Inverted_yield_curve

The Federal Funds rate data was taken from FRED (Federal Reserve Bank of St. Louis), series DFF: https://fred.stlouisfed.org/series/DFF. The US Treasury yields data was also taken from FRED, series DGS1MO, DGS3MO, DGS1, DGS2, DGS5, DGS7, DGS10, DGS20, DGS30, e.g. https://fred.stlouisfed.org/series/DGS30. Over the decades, some maturities were at times not issued, which is visible as points being added to or removed from the green curve. SPX data was taken from Stooq.com, series ^SPX.


Photo of a Samsung NVMe SSD

NVMe "Disk" Bandwidth and Latency for Batched Block Requests

Posted on 2019-03-22 16:00 by Timo Bingmann at Permlink with 0 Comments. Tags: c++ stxxl thrill

Last week I had the pleasure of being invited to the Dagstuhl seminar 19111 on Theoretical Models of Storage Systems. I gave a talk on the history of STXXL and Thrill, but also wanted to include some current developments. What I found most interesting is the closing gap between RAM and disk bandwidth due to the (relatively) new Non-Volatile Memory Express (NVMe) storage devices.

Since I am involved in many projects using external memory, I decided to perform a simple set of fundamental experiments to compare rotational disks and newer solid-state devices (SSDs). The results were interesting enough to write this blog article about.

Among the tools of STXXL/FOXXLL there are two benchmarks which perform two distinct access patterns: Scan (benchmark_disks) and Random (benchmark_disks_random).

  • In Scan a batch of k sequential blocks of size B are read or written in order.
    Batched Scanning Pattern
  • In Random a batch of k randomly selected blocks of size B from a span of size N are read or written.
    Batched Random Access Pattern

The Scan experiment is probably the fastest access method as it reads or writes the disk (actually: storage device) sequentially. The Random experiment is good to determine the access latency of the disk as it first has to seek to the block and then transfer the data. Notice that the Random experiment does batched block accesses like one would perform in a query/answering system where the next set of random blocks depends on calculations performed with the preceding blocks (like in a B-Tree). This is a different experiment than done by most "throughput" measurement tools which issue a continuous stream of random block accesses.
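The two access patterns can be sketched as generators of block offsets. This is only an illustration of the patterns described above, not the code of benchmark_disks or benchmark_disks_random; the function names and the fixed random seed are assumptions.

```python
import random


def scan_batches(num_batches, k, block_size):
    """Scan: each batch holds k consecutive block offsets, in order."""
    offset = 0
    for _ in range(num_batches):
        yield [offset + i * block_size for i in range(k)]
        offset += k * block_size


def random_batches(num_batches, k, block_size, span, rng=None):
    """Random: each batch holds k block-aligned offsets drawn from a span of size N."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    blocks_in_span = span // block_size
    for _ in range(num_batches):
        yield [rng.randrange(blocks_in_span) * block_size for _ in range(k)]
```

In a benchmark along these lines, all k requests of a batch would be issued together, and the next batch would only start once the whole batch has completed. That waiting step is what distinguishes the batched Random pattern from a continuous stream of random accesses.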
