Monday, April 15, 2019

accessing namespaces of a docker/podman container (nsenter)

There is a nice utility `nsenter` that allows you to switch to the namespaces of another process. It took me considerable time to find it today, so I thought I'd write a short blog post about it.
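In its simplest form (PID 1234 is a placeholder), `nsenter` joins the selected namespaces of a target process and runs a program there, defaulting to a shell:

$ sudo nsenter --target 1234 --mount --uts --ipc --net --pid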

Now I have a Podman container (for Docker, just use the `docker` command instead of `podman` below). I started that container with:

$ sudo podman run -t -a STDIN -a STDOUT -a STDERR --rm=true --entrypoint /bin/bash quay.io/example/image:version

I've been running some tests in it, but it turned out I wanted to increase resource limits without destroying my preparations by exiting the process. So the first thing is to figure out the PID namespace of my container:

$ sudo podman ps --ns
CONTAINER ID  NAMES                PID   CGROUPNS    IPC         MNT         NET         PIDNS       USERNS      UTS
a147a3a5b35f  fervent_stonebraker  1408  4026531835  4026532431  4026532429  4026532360  4026532432  4026531837  4026532430
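Alternatively, the PID is also available directly from `podman inspect` (using the container ID from the listing above), which should print 1408 here:

$ sudo podman inspect --format '{{.State.Pid}}' a147a3a5b35f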

I see the different namespaces, but `nsenter` requires a file name to switch to a PID namespace, so I will use the PID from the output above.

$ sudo nsenter --pid=/proc/1408/ns/pid

The above starts a shell for me in the PID namespace of my container. Now I want to change the limits. It is interesting to note that I target PID 1, as inside the container's PID namespace that is the PID of my bash shell:

$ sudo prlimit --rss=-1 --memlock=33554432 --pid 1
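Incidentally, `prlimit` acts on a PID directly, so the same change could also have been made straight from the host using the host PID from `podman ps` (1408 above), without `nsenter`:

$ sudo prlimit --rss=-1 --memlock=33554432 --pid 1408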

Finally, I verify the limits in my container shell:

bash-4.2$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 23534
max locked memory       (kbytes, -l) 32768
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 16384
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1048576
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

One interesting thing is how `ps` behaves inside the namespace. If I run these two commands:

$ ps -ef
$ sudo nsenter --pid=/proc/1408/ns/pid ps -ef

They will show exactly the same output. That is because I still have the same `/proc` mounted even though my PID namespace changed, and `/proc` is what `ps` looks at.
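If I additionally enter the container's mount namespace (assuming the image ships a `ps` binary, which not every image does), `ps` reads the container's own `/proc` and shows only the container's processes:

$ sudo nsenter --pid=/proc/1408/ns/pid --mount=/proc/1408/ns/mnt ps -ef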

With `nsenter` you can switch to any namespace, not only PID. I hope this was a useful short demonstration of how to do fun things with Linux namespaces.

Some links:
  • https://lwn.net/Articles/531114/ - namespaces overview series

Saturday, January 19, 2019

Install OKD 3.11 with source version of openshift-ansible installer

To install OpenShift with openshift-ansible built from source, one needs to build the openshift-ansible RPMs and install them as a repo on the machine performing the installation. For 3.11, in CI this is done by a YAML job definition (referred to as "the YAML" below).

First clone the openshift-ansible repo.

$ git clone --depth=1 --branch=release-3.11 https://github.com/openshift/openshift-ansible.git

Then build the base image as described in the YAML.

$ cd openshift-ansible
$ BUILDAH_LAYERS=false sudo podman build -f images/installer/Dockerfile -t ocp-ansible --layers=false .

Run the image and prepare for RPM building:

$ sudo podman run -t -a STDIN -a STDOUT -a STDERR --rm=true -u root ocp-ansible /bin/bash
# yum install tito createrepo
# git clone https://github.com/openshift/openshift-ansible.git --depth=1 --branch=release-3.11
# cd openshift-ansible
# git config --add user.email myemail@example.com
# git config --add user.name myname

Build the RPMs as described in the RPM building section of the YAML, with slight modifications.

# tito tag --offline --no-auto-changelog
# tito build --output="_output/local/releases" --rpm --test --offline --quiet
# createrepo _output/local/releases/noarch

Now the RPM repo is under `_output/local/releases/noarch/` inside the container; it needs to end up on a web server or locally on the machine where you will run the installation.
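Since the container was started with `--rm=true`, copy the repo out while it is still running. A sketch, where the container name and the in-container clone path are placeholders (check `sudo podman ps` and the directory you cloned into):

$ sudo podman cp <container>:<clone-dir>/_output/local/releases/noarch ./noarch

Then create a file /etc/yum.repos.d/my-ocp-ansible.repo: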

[tmp-openshift-ansible]
baseurl = <file:// or http:// url of RPM repo>
enabled = 1
gpgcheck = 0
name = Custom built OpenShift Ansible repo
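If you opt for an http:// baseurl, a quick way to serve the repo directory for testing (assuming Python 3 is available on the serving machine; port 8000 is arbitrary) is:

$ cd noarch && python3 -m http.server 8000

The baseurl would then point at http://<server>:8000/.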

Finally, perform the installation as described in the official docs.

$ ansible-playbook ....

Make sure that you see your RPMs in the install log under `List all openshift ansible packages`.

Thursday, January 10, 2019

Building a debug Firefox from the source RPM on Red Hat Enterprise Linux

In short:
  • Create an account on https://access.redhat.com.
  • Get Red Hat Enterprise Linux (RHEL)
    • Download and install RHEL server on a local physical or virtual machine (it is free with a developer subscription).
    • Or spawn a RHEL machine in some cloud service.
    • Important: you will need a large machine. For me 4GB failed [*] and I used a 16GB one. I didn't check what the minimum requirement is.
  • If you installed your own RHEL, then you need to subscribe the machine.
    • subscription-manager register # use your access.redhat.com credentials
    • subscription-manager attach
      • if the above does not work automatically, try the below
      • subscription-manager list --available
      • subscription-manager attach --pool=<whatever you find useful above>
  • sudo yum install yum-utils rpm-build
  • yumdownloader --source firefox
  • rpm -ivh firefox-*.rpm
  • sudo yum-builddep rpmbuild/SPECS/firefox.spec
    • on a vanilla system you will see missing dependencies
    • if you wanted to figure that out by yourself, you'd go to https://access.redhat.com and search for the packages to see what repos they come from (or maybe use some clever yum command that I don't know atm)
  • yum-config-manager --enable rhel-7-server-devtools-rpms rhel-7-server-optional-rpms
    • or edit /etc/yum.repos.d/redhat.repo
  • sudo yum-builddep rpmbuild/SPECS/firefox.spec # this time it will succeed
  • rpmbuild -ba --with=debug_build rpmbuild/SPECS/firefox.spec
  • find the built RPMs at
    • ~/rpmbuild/RPMS/x86_64/firefox-60.4.0-1.el7.x86_64.rpm
    • ~/rpmbuild/RPMS/x86_64/firefox-debuginfo-60.4.0-1.el7.x86_64.rpm
    • ~/rpmbuild/SRPMS/firefox-60.4.0-1.el7.src.rpm
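To actually run the debug build, install the freshly built packages (file names as produced above; this replaces the stock Firefox package):

$ sudo yum localinstall ~/rpmbuild/RPMS/x86_64/firefox-60.4.0-1.el7.x86_64.rpm \
      ~/rpmbuild/RPMS/x86_64/firefox-debuginfo-60.4.0-1.el7.x86_64.rpm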

[*] It is really sad; in the past one could learn to be a developer on a budget machine. Nowadays it seems like even compiling your code takes a beefy one :/

Friday, November 23, 2018

Running Logstash container under OpenShift

What is the issue?

The main problem with running arbitrary images under OpenShift is that OpenShift starts containers as a random user ID. This is done for security reasons (isolation of workloads). A user can be given permission to run `privileged` containers, but this is not recommended if it can be avoided.

You can check my earlier blog post about building an SSH container image for OpenShift for more information and a more complicated example.

Logstash official container image

The official Logstash image can be found on Docker Hub and is built from the logstash-docker GitHub project. It is not specifically built to run in OpenShift, but it is still straightforward to run it unmodified. There are only two issues:
  • it tries to run as user 1000 and expects to find the Logstash code in that user's home directory
  • some configuration files lack the permissions needed for modification by a random user ID

Getting it running

Depending on what you're trying to do, you can approach this in a somewhat different way. I will give a specific example that mostly retains the original configuration (Beats input and stdout output) but adds a `config` file with the Kubernetes audit setup and disables Elasticsearch monitoring, as I don't have an Elasticsearch backend. I hope this provides enough of an example so you can set up your instance the way you desire.

Creating configuration

To store our custom configuration files, we will create a config map with the file content:
$ cat logstash-cfgmap.yml
apiVersion: v1
data:
  logstash-wrapper.sh: |-
      set -x -e
      rm -vf "/usr/share/logstash/config/logstash.yml"
      echo "xpack.monitoring.enabled: false" > "/usr/share/logstash/config/logstash.yml"
      exec /usr/local/bin/docker-entrypoint "$@"
  config: |-
    input{
        http{
            #TODO, figure out a way to use kubeconfig file to authenticate to logstash
            #https://www.elastic.co/guide/en/logstash/current/plugins-inputs-http.html#plugins-inputs-http-ssl
            port=>8888
            host=>"0.0.0.0"
        }
    }
    filter{
        split{
            # Webhook audit backend sends several events together with EventList
            # split each event here.
            field=>[items]
            # We only need event subelement, remove others.
            remove_field=>[headers, metadata, apiVersion, "@timestamp", kind, "@version", host]
        }
        mutate{
            rename => {items=>event}
        }
    }
    output{
        file{
            # Audit events from different users will be saved into different files.
            path=>"/var/log/kube-audit-%{[event][user][username]}/audit"
        }
    }
kind: ConfigMap
metadata:
  name: logstash
$ oc create -f logstash-cfgmap.yml
configmap/logstash created

With the above config map we have two files.
  • logstash-wrapper.sh - we need this to run some custom commands before delegating back to the image's original entry point: namely, to replace the original `logstash.yml`, which lacks group write permissions, and to disable Elasticsearch monitoring, which is enabled by default. The write permissions are needed because the image's startup script may detect environment variables that have to be converted to configuration entries and written into that file. See env2yaml.go and the docker-config docs.
  • config - this file contains the Logstash pipeline configuration and is a copy of what I presently see in the Kubernetes auditing docs.
Note that at this step you could create the full Logstash configuration inside the config map, together with `logstash.yml`, `log4j2.properties`, `pipelines.yml`, etc. Then we could ignore the default config from the image entirely.
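If you go that route, the config map can also be created straight from local files instead of hand-writing the YAML (a sketch; the file names are the ones mentioned above and assumed to exist in the current directory):

$ oc create configmap logstash --from-file=logstash-wrapper.sh --from-file=config \
      --from-file=logstash.yml --from-file=log4j2.properties --from-file=pipelines.yml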

Creating deployment config

$ oc run logstash  --image=logstash:6.5.0 --env=LOGSTASH_HOME\=/usr/share/logstash --command=true bash -- /etc/logstash/logstash-wrapper.sh -f /etc/logstash/config
deploymentconfig.apps.openshift.io/logstash created

A few things to explain:
  • we are setting the LOGSTASH_HOME environment variable to `/usr/share/logstash` because we are running as a random user, so the user's home directory will not work
  • we override the container start command with our wrapper script
    • we add `-f /etc/logstash/config` to point at our custom config
    • in case we wanted to put all our configuration in the config map, then we can set instead `--path.settings /etc/logstash/`
    • once pull/113 is merged, the custom startup script wrapper will not be needed, but we may still want to provide additional arguments like `-f` and `--path.settings`
Further, we need to make sure our custom configuration is mounted under `/etc/logstash`:
$ oc set volume --add=true --configmap-name=logstash --mount-path=/etc/logstash dc/logstash
deploymentconfig.apps.openshift.io/logstash volume updated

Finally, because our custom config wants to write under /var/log, we need to mount a volume on that path.
$ oc set volume --add=true --mount-path=/var/log dc/logstash

What we did creates an emptyDir volume that will go away when the pod dies. If you want to persist these logs, a Persistent Volume needs to be used instead.
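A sketch of the persistent variant, which you could run instead of the emptyDir command above (the claim name and size are placeholders, and the cluster must be able to provision the claim):

$ oc set volume dc/logstash --add --mount-path=/var/log \
      --type=persistentVolumeClaim --claim-name=logstash-logs --claim-size=1Gi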

Exposing logstash service to the world

First we need to create a service that will allow other project pods and Kubernetes to reach Logstash.
$ oc expose dc logstash --port=8888
service/logstash exposed
Port 8888 is what we set as the HTTP endpoint in `config`. If you expose other ports, you'd have to create one service for each port you care about.

We can easily expose HTTP endpoints to the wide Internet so that we can collect logs from services external to the OpenShift environment. We can also expose non-HTTP endpoints to the Internet with the NodePort service type, but there are more limitations.
$ oc expose service logstash --name=logstash-http-input
route.route.openshift.io/logstash-http-input exposed
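To sanity-check the route from outside (the hostname is a placeholder; find yours with `oc get route logstash-http-input`), you can POST a minimal payload shaped like what the audit webhook sends:

$ curl -i -X POST http://<route-host>/ -H 'Content-Type: application/json' \
      -d '{"kind":"EventList","items":[{"verb":"get","user":{"username":"tester"}}]}'

With the pipeline above, this should end up in /var/log/kube-audit-tester/audit inside the pod.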

Important: Only expose secured endpoints to the Internet! In the above example the endpoint is insecure and no authentication is required, so somebody could easily DoS your Logstash service.

That's all.

    Friday, December 29, 2017

    Why am I a necromancer?


    Some forum zealots bully poor souls who answer or correct threads that are 2, 3, or 5 years old. At the same time, the same zealots usually scold users who ask without searching first.

    Now my question is: what is the point of searching 5-year-old posts that have never been updated? If we are going to have a canonical source of truth for every question, then we would have to update those. Or, if we consider old threads uninteresting and thus not worth updating, then why don't we delete them after some time to stop polluting Internet search engine results?

    Personally, I find it makes the most sense to keep old threads and, when there is an update, to put it in. If I reached a thread, it had a pretty high search rating, so it is likely other users will hit it too. Why create a new thread and make the information harder to reach? And why delete old posts that might be useful? Even outdated, they often provide the clues needed to get closer to the desired result.

    My track record so far is some 22 necromancer badges on Stack Overflow, so I think other people also appreciate my approach. In fact, most of my answers are to old questions that I reached through Internet search engines and decided to update.

    Now, there is the dark side: clueless users who leave useless comments in old threads, or who don't understand what has already been written (or didn't read it) and ask stupid questions [*]. The thing is that, unfortunately, they can't be avoided, and they spam new threads too. I don't think useless bumping of old threads should be treated the same as useful updates to old threads.

    In summary:
    • Old thread
      • useful post
        • upvote
        • clap
        • thank
        • like
        • etc.
      • useless comment/stupid question
        • downvote
        • remove post
        • send angry face
        • ban the user
        • remove account
        • report to police
        • etc.
    • Recent thread
      • useful post
        • upvote
        • clap
        • thank
        • like
        • etc.
      • useless comment/stupid question
        • downvote
        • remove post
        • send angry face
        • ban the user
        • remove account
        • report to police
        • etc.
    Happy necromancing.

    [*] I'm not immune to asking stupid questions. I'm somewhat exaggerating; the point is that one shouldn't attack every post in an old thread regardless of its quality.

    Wednesday, December 20, 2017

    Debugging input devices

    Trouble with input devices like mice, touchpads, keyboards, or even cameras is hard to debug. Usually one is not sure whether the device is misbehaving, or whether the desktop environment or the application is mishandling the events from the input device.

    First check whether the driver used for your device is what you expect. For example, I had my X11 libinput driver removed by `dnf autoremove`, and my touchpad was taken over by `evdev` and thus stopped working.

    $ xinput list-props "SynPS/2 Synaptics TouchPad" 
    Device 'SynPS/2 Synaptics TouchPad': 
        Device Enabled (140):    1 
        Coordinate Transformation Matrix (142):    1.000000, 0.000000, 0.000000,  0.000000, 1.000000, 0.000000, 0.000000, 0.000000, 1.000000 
        Device Accel Profile (275):    0 
        Device Accel Constant Deceleration (276):    1.000000 
        Device Accel Adaptive Deceleration (277):    1.000000 
        Device Accel Velocity Scaling (278):    10.000000 
        Device Product ID (262):    2, 7 
        Device Node (263):    "/dev/input/event4" 
        Evdev Axis Inversion (279):    0, 0 
        Evdev Axis Calibration (280):    <no items> 
        Evdev Axes Swap (281):    0 
        Axis Labels (282):    "Abs MT Position X" (302), "Abs MT Position Y"  (303), "Abs MT Pressure" (304), "Abs Tool Width" (301), "None" (0),  "None" (0), "None" (0) 
        Button Labels (283):    "Button Left" (143), "Button Unknown" (265),  "Button Unknown" (265), "Button Wheel Up" (146), "Button Wheel Down" (147) 
        Evdev Scrolling Distance (284):    0, 0, 0 
        Evdev Middle Button Emulation (285):    0 
        Evdev Middle Button Timeout (286):    50 
        Evdev Middle Button Button (287):    2 
        Evdev Third Button Emulation (288):    0 
        Evdev Third Button Emulation Timeout (289):    1000 
        Evdev Third Button Emulation Button (290):    3 
        Evdev Third Button Emulation Threshold (291):    20 
        Evdev Wheel Emulation (292):    0 
        Evdev Wheel Emulation Axes (293):    0, 0, 4, 5 
        Evdev Wheel Emulation Inertia (294):    10 
        Evdev Wheel Emulation Timeout (295):    200 
        Evdev Wheel Emulation Button (296):    4 
        Evdev Drag Lock Buttons (297):    0
    

    Usually you'd expect to see `libinput` (synaptics is now abandoned).

    ...
        libinput Send Events Mode Enabled (266):    0, 0
        libinput Send Events Mode Enabled Default (267):    0, 0
    ...
    

    Fortunately, there is a tool that helps to understand what the device is sending to the computer. This works for libinput devices.

    $ sudo dnf install evemu
    

    Then we can see:
    $ ls /usr/bin/evemu-*
    /usr/bin/evemu-describe  /usr/bin/evemu-event  /usr/bin/evemu-record
    /usr/bin/evemu-device    /usr/bin/evemu-play
    

    These executable files can be used to inspect, record and replay the events sent by any connected device.
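    For example, to capture a session from a device into a file and replay it later (the device node here matches my mouse from the session below; `mouse.events` is just a file name I picked):

    $ sudo evemu-record /dev/input/event8 > mouse.events
    $ sudo evemu-play /dev/input/event8 < mouse.events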

    $ sudo evemu-record
    Available devices:
    /dev/input/event0: Lid Switch
    /dev/input/event1: Sleep Button
    /dev/input/event2: Power Button
    /dev/input/event3: AT Translated Set 2 keyboard
    /dev/input/event4: SynPS/2 Synaptics TouchPad
    /dev/input/event5: Video Bus
    /dev/input/event6: Video Bus
    /dev/input/event7: TPPS/2 IBM TrackPoint
    /dev/input/event8: Logitech MX Anywhere 2
    /dev/input/event9: ThinkPad Extra Buttons
    /dev/input/event10: HDA Intel PCH Dock Mic
    /dev/input/event11: HDA Intel PCH Mic
    /dev/input/event12: HDA Intel PCH Dock Headphone
    /dev/input/event13: HDA Intel PCH Headphone
    /dev/input/event14: HDA Intel PCH HDMI/DP,pcm=3
    /dev/input/event15: HDA Intel PCH HDMI/DP,pcm=7
    /dev/input/event16: HDA Intel PCH HDMI/DP,pcm=8
    /dev/input/event17: HDA Intel PCH HDMI/DP,pcm=9
    /dev/input/event18: HDA Intel PCH HDMI/DP,pcm=10
    /dev/input/event19: Integrated Camera: Integrated C
    Select the device event number [0-19]: 8 
    # EVEMU 1.3
    # Kernel: 4.14.5-300.fc27.x86_64
    # DMI: dmi:bvnLENOVO:bvrR07ET63W(2.03):bd03/15/2016:svnLENOVO:pn20FXS0BB14:pvrThinkPadT460p:rvnLENOVO:rn20FXS0BB14:rvrNotDefined:cvnLENOVO:ct10:cvrNone:
    # Input device name: "Logitech MX Anywhere 2"
    # Input device ID: bus 0x03 vendor 0x46d product 0x4063 version 0x111
    # Supported events:
    #   Event type 0 (EV_SYN)
    #     Event code 0 (SYN_REPORT)
    #     Event code 1 (SYN_CONFIG)
    ...
    B: 15 00 00 00 00 00 00 00 00
    A: 20 1 652 0 0 0
    ################################
    #      Waiting for events      #
    ################################
    E: 0.000001 0002 0000 0001 # EV_REL / REL_X                1
    E: 0.000001 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +0ms
    E: 0.013561 0002 0000 0001 # EV_REL / REL_X                1
    E: 0.013561 0002 0001 0001 # EV_REL / REL_Y                1
    E: 0.013561 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +13ms
    E: 0.039808 0002 0000 0001 # EV_REL / REL_X                1
    E: 0.039808 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +26ms
    E: 0.063578 0002 0000 0001 # EV_REL / REL_X                1
    E: 0.063578 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +24ms
    E: 0.071790 0002 0000 0001 # EV_REL / REL_X                1
    E: 0.071790 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +8ms
    E: 0.087586 0002 0000 0001 # EV_REL / REL_X                1
    E: 0.087586 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +16ms
    E: 0.111578 0002 0001 0001 # EV_REL / REL_Y                1
    E: 0.111578 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +24ms
    ...
    

    Decoding those is left for another post, or as an exercise for the reader. At the very least, one can capture logs while things are misbehaving and then report bugs to the affected projects with the logs attached. Make sure to read `man evemu-record` to check for common issues preventing event capture.

    -- thanks to Peter Hutterer for pointing me at this tool