
Terminator 1 is the best Terminator movie


And now for something completely different.

I’ve thought about this off and on for years, so it needed to be written down.

Terminator 1 is the best Terminator movie

Obviously SPOILERS, for basically all Terminator movies.

Summary of reasons

  • The robot is really not human.
  • It’s a proper time loop, with a bonus that none of the players in the movie know it.

I’m aware of The Terminator Wiki, but I don’t care about it. My opinions are on the movies as movies.

The behavior of the terminator

In Terminator 1 (T1) Arnold is clearly a robot in human skin. At no point do you believe it’s a human. The only reason people don’t stop and scream and point is that they think “I’m being silly, that’s clearly impossible”. But Arnold spends the whole movie in the uncanny valley, the kind in 2022 reserved for realistically generated CGI characters.

It’s very nearly a perfect movie. Just take his first dialog. “Nice night for a walk”, the punks say. They are saying this to a machine that has never talked to a human before, so its response is complete nonsense. It just repeats the words back to them.

It’s a complete alien.

And then “Your clothes. Give them to me. Now”. It’s completely broken grammar from something that has never used the language. It’s a stranger in a strange land.

Everything is so obviously foreign. Effectively the terminator was just born.

At no point do I believe Arnold is human. Not even at the end, when the flesh is stripped off, does it become more machine-like. Throughout the movie it’s always been pure mechanism.

It’s not even clear that T1’s terminator is conscious. It’s just like those assembly robots at the end, but with a more complex goal.

Robert Patrick (the actor playing the evil T1000) in T2 is great and all, and is attempting to “feel weird”. He does a great job in T2. But he’s never one realization of “oh, robots exist” away from being spotted. Arnold (in T1) is. Even if you knew killer robots existed you’d never suspect the T1000. He’s gone beyond the uncanny valley, which to me makes him less horrifying, not more.

This goes for every other cyborg in movies or TV series. Except for T1 I can’t think of a single performance where someone actually went into the uncanny valley.

In T2 (extra scene in Special Edition, the best edition) they flip Arnold’s switch to learning mode. Or should I say “anthropomorphize mode”? Except even before that Arnold is showing human traits. Such as the look he gives Sarah in the car when she says “So what’s your story?”.

The terminator in T1 is rotting during the movie. It is flesh slapped on top of a metal skeleton. As the flesh becomes injured it gets in the way, and is just cut off and discarded (wrist, eye). With more “mere flesh wounds” it starts smelling, attracting flies that lay eggs in the rotting flesh. It smells like a dead body, and it even attracts the attention of the cleaner at the hotel (“you got a dead cat in there or what?”).

It doesn’t really matter that the terminator can’t last forever. It has a mission, and only needs to survive long enough to complete it. It doesn’t really matter that the flesh is coming off. That just makes it a worse disguise.

Likely the flesh would not have lasted long even uninjured. It’s never needed to. In the future it only needed to get through the door to start shooting. And once the shooting starts it’ll inevitably start showing metal, losing its disguise.

In the other movies they explicitly say that terminators heal. But there’s no reason for it.

The sunglasses in T1 are utilitarian. Its red eye is exposed otherwise. A red glowing eye is a risk to the mission, and a logical workaround is sunglasses. In the other movies it’s… fashion?

T3… sigh. The reason the T1 terminator says “get out”, and also why the T1000 says it in the helicopter, is that it’s faster than throwing them out. In T3 it’s a catchphrase, and completely needlessly Arnold says “get off”, while throwing a guy off a motorcycle.

Even in T2, the “that’s a nice bike” line anthropomorphizes too much. The Terminator wiki has a page on emotions. There’s not one example from T1.

There’s no reason a terminator would say anything while tossing someone. He says nothing in T1 while killing Matt and Ginger.

The T1 terminator doesn’t die screaming, thrashing around. It doesn’t wag its finger in its moment of apparent triumph. It communicates nothing except to further its mission. Nothing verbally, and no body language.

T1’s terminator can’t be bargained with and can’t be reasoned with. I feel like the T1000 kinda can. Its mouth is open in pain and surprise from the final shot. It seems like more than mere mechanism.

T1’s terminator is as soulless and mechanistic as the pneumatic press that ultimately destroys it.

The time travel theory

The T1 story is a perfect time loop. None of the other movies have that. In T1 the machines build a time machine in order to change the past, not knowing that it’s not possible. They send a terminator back, thus creating themselves. Them sending the terminator back causes humans to send Reese back, creating their enemy.

The script even says that as Sarah gets put in the ambulance the camera tilts up and sees “Cyber Dynamics Corporation”. So you can fill in the same time loop “breakthrough” that they explain in T2.

The time machine is created because of the war, and both sides of the war are created by the use of the time machine.

Even if the machines realized this, they’d have no choice but to go through with it. If they don’t, it’s a double grandfather paradox.

It reminds me of my favourite explanation of why you can’t go back in time and kill Hitler: “because that’s not what happened”.

Reese is light on details, but his account does include the fact that Sarah went into hiding before the war. Meaning he was always there to warn her about it. (Also, you know, John would not exist without him.)

The only thing that speaks against this is that Reese says “one possible future, from your point of view”, but his next sentence is “I don’t know tech stuff”. Sounds like neither human nor machine understands that it’s a closed time loop. That makes sense because the humans need to fulfill their destiny too. They can’t not send Reese back, because he was sent back.

T2 can in this regard, at best, be seen as an eventually converging time loop. It’s not as “clean”.

Examples of why T2 can’t be a neat time loop:

  • They actually do stop Miles Dyson from “inventing” the chip.
  • Wouldn’t John have told Reese about the second terminator?
  • The quote from Reese is different between T1 and T2. Though I’ll excuse that as an artistic choice.

But even ignoring that, it’s clear that T1 is the time loop. T1 has the fridge horror (or is it Fridge Brilliance?), T2 does not.

T3 is a mess, of course. But I do like the ending. It’s compatible with the T2 theory of time travel, even if not T1. So are the others.

An eventually converging time loop raises the question of how it started. How many loops have already happened before we reach the events of T1? How many more loops before the events in T2?

At what point does the converging loop stabilize? Clearly not with T2, since Dark Fate has HUGE changes.

Maybe it will never stabilize? Each way around the causality loop causes another butterfly to flap its wings, eventually changing major plot points, or even the name of Skynet.

How did the loop start? And if the loop never converges, then why should we care about this particular iteration of the loop?

Only T1 has the proper neat horrifying closed time loop.

Actions make sense

Sarah does all the right things. She goes to a public space, calls the police, and warns her friend. She reluctantly goes along with the guy who at least is trying to get her away from someone who’s trying to kill her. She probably would still prefer the police, though. When she is taken in by the police she has no reason to not feel safe in a police station.

In the police station, from Sarah’s point of view, she’s given a plausible enough explanation, and assured that there are over 30 police officers in the building protecting her. She feels safe enough to go to sleep.

She accepts the explanation that the attacker was wearing a bulletproof vest, but no way would that be enough against 10-11 shotgun hits (I counted 11 shots, one may have missed). The police also suggest, and she accepts, that a human can punch through a car windscreen. It’s barely believable that someone can punch through a side window, and clearly not through a windscreen.

But they all believe it, because the alternative is preposterous.

When she wakes up and comes out of the room she’s in, the safe police station she entered has turned into a nightmare on fire. Dead police everywhere, furniture blown up and on fire, and all defenses failed, leaving escape as the only option.

She does tell her mother where they are, but there was no reason for her to know that the terminator would be able to impersonate a voice. And her character is 19, after all. But if she hadn’t then the movie would be pretty short.

T2

Other than not being as good as T1, it’s pretty great.

T3

Ugh, this movie makes no sense. It’s just one big plot hole that it tries to fill with references and memberberries.

It takes a couple of viewings to understand just how bad this is. I’d tear it down, but for a complete list I’d have to watch it again. No thanks.

There’s even a fan edit, I believe called the T-HOPE Edition, that cuts away as much crap as possible. It’s better, but still terrible. And you can tell that there are missing pieces. But the pieces it cut are shit, so it’s still better without them.

The less said about this one the better.

The only good thing about T3 is the commentary track. Arnold is unintentionally hilarious.

Terminator Salvation

A brave attempt at expanding the Terminator universe. Yes, there are comics, books, and TV series, but for Terminator the movies are the center.

There isn’t much time travelling here, but I dislike the anthropomorphizing. Skynet does monologuing, ffs!

But it’s not so much a Terminator movie as it is a movie in the Terminator universe. And that’s fine.

Genisys

Terminator Genisys is basically non-canonical fan fiction. I can enjoy it, but it’s absolutely silly and makes no sense at all.

It goes so over the top as to become a farce, which is why it can be enjoyable and T3 just isn’t.

Dark Fate

This clearly should have been Terminator 3. Nowhere near as good as T1 and T2, of course, but a fine sequel.

It doubles down on T2’s (and especially T3’s) eventually converging time loop theory.

And again Arnold gets anthropomorphized. He’s not really a terminator in this movie, but instead a really strong autistic guy.

The evil terminator is pretty “correct”, though.

The three best Terminator movies, ranked

T1, T2, and neo-T3 (Dark Fate). In that order.

Perfect movie?

T1 does have flaws. I don’t want to list them, since once you’ve seen them, you can’t unsee them. And I want you to enjoy The Terminator for the masterpiece that it is.

Side note: Harlan’s claims are bullshit

I’ve seen both “Demon with the glass hand” and “Soldier”, and they are absolutely not The Terminator. Their only similarity is “a time traveller comes”. From when? The future, obviously. There’s a war? Ok, that also matches. Both sides show up? Well, for a movie you’d kinda have to. There are lasers in the future? Of course there would be.

There the similarities end. There’s no point in listing the differences, because they’re just completely different.


Can ChatGPT debug my code? Can it write better code?


Everyone’s talking about it, so why shouldn’t I add something too?

Interview question

ChatGPT is pretty amazing. I asked it an interview question, and it just plain solved it. Well, there was one real world imperfection. I asked it to fix that problem, and then it went off the rails. The more I hinted, the worse the code got.

I won’t give the details here, because I want to keep using the question / topic.

Finding bugs

So I wondered if it could find bugs. First an easy one:

Q:

What's the problem with this C++ code:
int* foo() {
  auto ret = new int;
  int r2 = *ret * 2;
  return &r2;
}

A:
There are several problems with the code you provided:

1. The function foo is returning a pointer to a local variable r2,
   which is not safe because the memory for r2 is deallocated when the
   function returns. This can cause undefined behavior when you try to
   access the returned pointer after the function has returned.
2. The function foo is allocating memory for an integer with new int,
   but it is not deallocating that memory before the function
   returns. This can lead to a memory leak, as the allocated memory
   will remain allocated until the program ends.
3. The function does not have a way to handle the case where the
   memory allocation with new int fails. In this case, new int will
   throw an exception, which will not be caught by the function. This
   can cause the program to terminate unexpectedly.

To fix these problems, you could consider the following changes:

1. Instead of returning a pointer to a local variable, you could
   return a pointer to the dynamically allocated memory. This will
   ensure that the returned pointer remains valid after the function
   returns.
2. You should deallocate the memory that was allocated with new int
   when it is no longer needed. This can be done using the delete
   operator, like this: delete ret;.
3. You should catch any exceptions that might be thrown by new int and
   handle them appropriately. This can be done using a try-catch
   block, like this:

try {
  auto ret = new int;
  // ...
} catch (std::bad_alloc& e) {
  // Handle the exception here
}

That’s… pretty amazing. I’d argue with the last point, since an overall strategy for OOM is not something for just one little function to solve, but not bad.

How about a tricky one?

Let’s take my safe integer conversion tool from a previous post.

Q:

Is there anything wrong with this C++ code:

template<typename To, typename From>
std::optional<To> cast_int(const From from)
{
  // If casting from signed to unsigned then reject negative inputs.
  if (std::is_signed_v<From> && std::is_unsigned_v<To> && from < 0) {
    return {};
  }

  const To to = static_cast<To>(from);

  // If casting from unsigned to signed then the result must be positive.
  if (std::is_unsigned_v<From> && std::is_signed_v<To> && to < 0) {
    return {};
  }

  // If the number fits then it'll be the same number when cast back.
  if (from != static_cast<From>(to)) {
    return {};
  }
  return to;
}

A:
There are a few potential problems with the code you provided:

1. The function uses the std::optional type to represent a value that
   may or may not be present. However, std::optional is not available
   until C++17, so the code will not compile with earlier versions of
   C++.
2. The function uses the std::is_signed_v and std::is_unsigned_v type
   traits to check the signedness and unsignedness of the From and To
   template arguments. However, these type traits are not available
   until C++20, so the code will not compile with earlier versions of
   C++.
3. The function checks whether the value of from is negative before
   casting it to an unsigned type. However, this check will not catch
   all cases where the value of from is too large to fit in the target
   type. For example, if From is int and To is unsigned short, the
   value of from could be 65537, which is too large to fit in an
   unsigned short, but it is not negative.

To fix these problems, you could consider the following changes:

1. If you need to support earlier versions of C++, you could use a
   different mechanism to represent optional values. For example, you
   could use a std::pair<To, bool> where the second element is a
   boolean indicating whether the value is present or not.
2. If you need to support earlier versions of C++, you could use a
   different mechanism to check the signedness and unsignedness of the
   template arguments. For example, you could use the std::is_signed
   and std::is_unsigned type traits, which are available in C++11 and
   later.
3. To catch all cases where the value of from is too large to fit in
   the target type, you could use the std::numeric_limits class to
   check the maximum and minimum values that can be represented by the
   target type. For example:

if (from > std::numeric_limits<To>::max()) {
  return {};
}

if (from < std::numeric_limits<To>::min()) {
  return {};
}

Again, I’m impressed. It’s wrong, but I’m impressed. The first two comments are technically correct, but they are not actually problems. Or at least I would not call them problems.

The third one, though, is wrong. It’ll fail to convert from unsigned 0 to int.

#include <iostream>
#include <limits>
#include <optional>

template <typename To, typename From>
std::optional<To> cast_int(const From from)
{
  if (from > std::numeric_limits<To>::max()) {
    std::cerr << "Too big: " << from << "\n";
    return {};
  }

  if (from < std::numeric_limits<To>::min()) {
    std::cerr << "Too small: " << from
              << " < " << std::numeric_limits<To>::min() << "\n";
    return {};
  }
  return static_cast<To>(from);
}
int main()
{
  // Prints whether the conversion succeeded (it should, but doesn't).
  std::cout << cast_int<int>(unsigned(0)).has_value() << "\n";
}

This produces output:

Too small: 0 < -2147483648
0

Let’s follow up:

Q: The third suggestion would fail a conversion into type int, because
   of integer promotion rules.

A:
You are correct that the third suggestion I provided would not catch
all cases where the value of from is too large to fit in the target
type.

Integer promotion is a form of type coercion that occurs in C++ when
an expression involving integers of small types is used in a context
where an integer of a larger type is expected. During integer
promotion, the value of the small integer is first converted to a
larger integer type, and then the operation is performed using the
larger type. This means that if from is an integer of a small type
(e.g. char, short) and To is a larger integer type (e.g. int, long
long), then the value of from will be promoted to the larger type
before the comparison with std::numeric_limits<To>::max() or
std::numeric_limits<To>::min() is performed.

To catch all cases where the value of from is too large to fit in the
target type, you can use a combination of integer promotion and the
std::numeric_limits class. Here is an example of how you can do this:

if (from > static_cast<decltype(from)>(std::numeric_limits<To>::max())) {
  return {};
}

if (from < static_cast<decltype(from)>(std::numeric_limits<To>::min())) {
  return {};
}

This code first converts the maximum and minimum values of the target
type to the type of from using static_cast, and then compares the
value of from to these converted values. This ensures that the
comparison is performed using the correct types, and that integer
promotion is correctly applied.

And like with my interview followups, this is where it goes off the rails.

First of all, the type of from is already available: it’s From. Second, casting the min and max of the type To into From is the root problem; you can’t just cast and expect it to work.

I guess ChatGPT has not read my blog post. :-)

Q: You can't just cast from To to From. If the signedness doesn't
   match then it may wrap at best, or undefined behavior at worst.

A:
You are correct that simply casting from To to From is not sufficient
to properly check whether the value of from is within the range of the
target type To.

If the signedness of From and To do not match, then a simple cast from
To to From can lead to undefined behavior, as you mentioned. This is
because casting a negative value to an unsigned type is undefined
behavior in C++.

To properly check whether the value of from is within the range of the
target type To, you can use a combination of integer promotion and the
std::numeric_limits class, as I suggested in my previous
response. This will ensure that the comparison is performed using the
correct types, and that integer promotion is correctly applied.

Alternatively, you can use the std::is_signed_v and std::is_unsigned_v
type traits to check the signedness and unsignedness of From and To,
and use different logic depending on the signedness of the types. For
example:

if constexpr (std::is_signed_v<From> == std::is_signed_v<To>) {
  // Both types have the same signedness
  if (from > std::numeric_limits<To>::max()) {
    return {};
  }

  if (from < std::numeric_limits<To>::min()) {
    return {};
  }
} else if constexpr (std::is_signed_v<From>) {
  // From is signed, To is unsigned
  if (from < 0) {
    return {};
  }

  if (from > static_cast<decltype(from)>(std::numeric_limits<To>::max())) {
    return {};
  }
} else {
  // From is unsigned, To is signed
  if (from > static_cast<decltype(from)>(std::numeric_limits<To>::max())) {
    return {};
  }
}

This code uses different logic depending on the signedness of the
types, which ensures that the correct comparisons are performed and
that integer promotion is correctly applied.

Wow, that is… almost correct. The static casts to decltypes need to be removed, to instead rely on integer promotions. Otherwise e.g. converting 0 in a signed 32bit int into an unsigned 64bit int will fail. The unsigned 64bit max will be converted into a signed 32bit int, and get the bit pattern for -1.

But damn, after that I think it’s correct.

Q: The decltype casts cause a bug with that solution.

A:
You are correct that the use of decltype in the code I provided can
cause a bug when the value of from is too large to fit in the target
type To.

The problem occurs because decltype(from) is the type of from, which
may be larger than the target type To. This means that when the
maximum or minimum value of To is converted to the type of from, the
result may be larger than the maximum or minimum value of To,
respectively. This can cause the comparison to always evaluate to
false, even if the value of from is within the range of To.

To fix this problem, you can use a different mechanism to ensure that
the comparison is performed using the correct types. One option is to
use a helper function that compares the value of from to the maximum
and minimum values of To using the correct types. Here is an example
of how you could do this:

template <typename From, typename To>
bool in_range(From from) {
  using common_type = std::common_type_t<From, To>;
  return static_cast<common_type>(from) >= static_cast<common_type>(std::numeric_limits<To>::min()) &&
         static_cast<common_type>(from) <= static_cast<common_type>(std::numeric_limits<To>::max());
}

template<typename To, typename From>
std::optional<To> cast_int(const From from)
{
  // If casting from signed to unsigned then reject negative inputs.
  if (std::is_signed_v<From> && std::is_unsigned_v<To> && from < 0) {
    return {};
  }

  const To to = static_cast<To>(from);

  // If casting from unsigned to signed then the result must be positive.
  if (std::is_unsigned_v<From> && std::is_signed_v<To> && to < 0) {
    return {};
  }

  // If the number fits then it'll be the same number when cast back.
  if (!in_range<From, To>(from)) {
    return {};
  }
  return to;
}

This code uses the std::common_type type trait to determine the common
type of From and To, and then uses static_cast to convert both from
and the minimum and maximum values of To to this common type. This
ensures that the comparison is performed using the correct types, and
that integer promotion is correctly applied.

Again, I’m impressed. But again, it’s wrong! common_type for int32_t and uint32_t will (just like integer promotion rules) become uint32_t, and the comparison will reject good input such as this:

std::cout << cast_int<int32_t>(uint32_t(0)).value() << "\n";
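For what it’s worth, C++20 added sign-safe comparison helpers (std::cmp_less_equal and friends, in <utility>) that sidestep the promotion problem entirely. Purely as an illustration of the range check built on those (this is not the code from my earlier post):

#include <cstdint>
#include <iostream>
#include <limits>
#include <optional>
#include <utility>

// Range check using C++20's sign-safe integer comparisons.
template <typename To, typename From>
bool fits(const From from)
{
  return std::cmp_greater_equal(from, std::numeric_limits<To>::min()) &&
         std::cmp_less_equal(from, std::numeric_limits<To>::max());
}

template <typename To, typename From>
std::optional<To> cast_int(const From from)
{
  if (!fits<To>(from)) {
    return {};
  }
  return static_cast<To>(from);
}

int main()
{
  // unsigned 0 into int32_t now succeeds, printing 0.
  std::cout << cast_int<int32_t>(uint32_t(0)).value() << "\n";
  // A value above INT32_MAX is correctly rejected, printing 0 (false).
  std::cout << cast_int<int32_t>(uint32_t(2147483648u)).has_value() << "\n";
}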

Conclusion

ChatGPT is really impressive. I’m not at risk of getting replaced yet, but for code challenges I think it does a much better job than most programmers. And certainly much faster.

But if it gets it slightly wrong then it seems unable to fix it.

For job security in the future as this gets better you should probably focus on turning business value into what to solve, not perfecting implementation. Because in the future the implementation will just be telling the computer to “add the ability for customers to upload their own profile photos via the web or the API, deduplicate and store them in GCS/S3. Write a blog post about it”.

In writing this I’ve gone back and forth about whether I’m in danger in the next few years. But considering the amount of time I spend in meetings and on problem anticipation, as opposed to implementation, maybe I have that to lean back on?

I guess until my employer trains on our code base and our specific customer needs. Then I’m fucked.

Better pipes


In my blog post The uselessness of bash I made a tool to improve pipes in shell, to assemble a better pipeline.

It solves the problem, but it’s a bit too different, with its own language.

While complaining with some people at work that one of the main features of shell (the pipe operator) is broken, someone joked that it should be replaced by a protobuf based protocol.

But on second thought it’s not really a joke.

How about instead of this:

$ goodpipe <<EOF
[
  ["gsutil", "cat", "gs://example/input-unsorted.txt"],
  ["sort", "-S300M", "-n"],
  ["gzip", "-9"],
  ["gsutil", "cp", "-", "gs://example/input-sorted-numerically.txt.gz"]
]
EOF

how about this:

$ wp -o gsutil cat gs://example/input-unsorted.txt \
  | wp -io sort -S300M -n \
  | wp -io gzip -9 \
  | wp -i gsutil cp - gs://example/input-sorted-numerically.txt.gz

It doesn’t use protobufs, but a simpler regular protocol, in order to avoid well-known bug types. Before implementing any protocol, also see formal theory and the science of insecurity.

First I hacked it together in Go, but I think the main implementation I’ll maintain is the one I made while porting it to Rust, as a way to learn Rust. The two implementations are compatible with each other.
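Purely as an illustration of what I mean by a simple regular protocol (this is not necessarily the exact wire format wp uses), a one-byte tag plus a length-prefixed payload is enough to pass data and, say, exit statuses unambiguously:

use std::io::{self, Read, Write};

// Write one frame: a 1-byte type tag followed by a length-prefixed payload.
fn write_frame<W: Write>(w: &mut W, tag: u8, payload: &[u8]) -> io::Result<()> {
    w.write_all(&[tag])?;
    w.write_all(&(payload.len() as u32).to_be_bytes())?;
    w.write_all(payload)
}

// Read one frame back. Returns None on clean EOF.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<(u8, Vec<u8>)>> {
    let mut tag = [0u8; 1];
    match r.read_exact(&mut tag) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let mut len = [0u8; 4];
    r.read_exact(&mut len)?;
    let mut payload = vec![0u8; u32::from_be_bytes(len) as usize];
    r.read_exact(&mut payload)?;
    Ok(Some((tag[0], payload)))
}

fn main() -> io::Result<()> {
    let mut buf = Vec::new();
    write_frame(&mut buf, b'd', b"hello")?; // 'd' for data; the tag values are made up.
    let mut rd = io::Cursor::new(buf);
    while let Some((tag, payload)) = read_frame(&mut rd)? {
        println!("tag={} payload={:?}", tag as char, payload);
    }
    Ok(())
}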

I’ll write up my thoughts on Rust in a future blog post, but the summary is that it’s great, and ChatGPT was a helpful mentor in learning a new language.

Learning Rust, assisted by ChatGPT


I finally got around to learn Rust. Well, starting to.

It’s amazing.

I’m comparing this to learning Basic, C, C++, Erlang, Fortran, Go, Javascript, Pascal, Perl, PHP, Prolog and Python. I wouldn’t say I know all these languages well, but I do know C, C++, Go, and Python pretty well.

I can critique all these languages, but I’ve not found anything frustrating or stupid in Rust yet.

Rust is like taking C++, but all the tricky parts are now the default, and checked at compile time.

Copies

With C++11 we got move semantics. So now we have to carefully disable the (usually) default-created copy constructor and copy assignment, or, if we do have to allow copies, every single use of the type has to be very careful not to trigger copies unless absolutely necessary.

And you have to consider exactly when RVO kicks in. And even the best of us will sometimes get it wrong, especially with refactors. E.g. who would have thought that adding this optimization would suddenly trigger a copy of the potentially very heavy Obj object, a copy that was not there before?

--- before.cc   2022-12-28 10:32:50.969273274 +0000
+++ after.cc    2022-12-28 10:32:50.969273274 +0000
@@ -1,5 +1,8 @@
 Obj make_string(int i)
 {
+    if (!i) [[unlikely]] {
+        return "zero as input to function";
+    }
     Obj ret = foo();
     return ret;
  }

But it does:

// Before
make_string(int):
        pushq   %rbx
        movq    %rdi, %rbx
        call    foo()
        movq    %rbx, %rax
        popq    %rbx
        ret

// After
.LC0:
        .string "zero as input to function"
make_string(int):
        pushq   %rbx
        movq    %rdi, %rbx
        subq    $16, %rsp
        testl   %esi, %esi
        je      .L8
        leaq    15(%rsp), %rdi
        call    foo()
        leaq    15(%rsp), %rsi
        movq    %rbx, %rdi
        call    Obj::Obj(Obj const&) // <---- New copy in the common case.
        addq    $16, %rsp
        movq    %rbx, %rax
        popq    %rbx
        ret
.L8:
        movl    $.LC0, %esi
        call    Obj::Obj(char const*) // <--- Expected construction for optimization.
        addq    $16, %rsp
        movq    %rbx, %rax
        popq    %rbx
        ret

Sure, this is not as bad if Obj is movable, but sometimes it isn’t. Sometimes that fact is forgotten. Especially since the move may be disabled in the future due to an added member variable to a transitive dependency not being movable. Hell, the triggering change may actually be:

--- before.cc   2022-12-28 11:00:02.196662142 +0000
+++ after.cc    2022-12-28 11:00:02.196662142 +0000
@@ -7,6 +7,9 @@
     Obj(Obj&&) = default;
     Obj& operator=(const Obj&);
     Obj& operator=(Obj&&) = default;
+
+private:
+    std::array<char, 4096> arr_;
};

It would be fine if make_string() did not optimize for 0 (because RVO). It would be fine if Obj were movable (because move is still cheap, probably), but the combination could blow everything up through a no-copy cascade.

Rust just does the right thing. A copy has to be explicitly cloned.
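Here’s roughly what that looks like (Heavy is just a made-up stand-in for an expensive type):

#[derive(Clone)]
struct Heavy {
    data: Vec<u8>,
}

fn consume(h: Heavy) -> usize {
    h.data.len()
}

fn main() {
    let a = Heavy { data: vec![0u8; 1 << 20] };

    // This moves `a` into the function. No copy happens.
    let n = consume(a);

    // println!("{}", a.data.len()); // Compile error: `a` has been moved.

    // If you really want a copy, you have to spell it out.
    let b = Heavy { data: vec![1u8; 4] };
    let c = b.clone();
    println!("{} {} {}", n, b.data.len(), c.data.len());
}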

Sure, you could do this in C++. You could have all your types disable copy assignment and copy constructor, and create a .clone() member function, creating an rvalue that you then use move constructor and move assignment from. But even then you’ll sometimes accidentally copy due to some code you didn’t write, like std::vector.

With Rust there’s no need to be clever with std::move, std::forward, decltype(auto), and other fancy stuff. The only downside is that you won’t feel as proud of yourself for getting it right.

Aliases

Aliases can be tricky to get right. Rust enforces it at compile time. Awesome.
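A minimal example of the kind of thing that gets rejected, which C++ would happily compile and then maybe crash on:

fn main() {
    let mut v = vec![1, 2, 3];

    let first = &v[0]; // Shared borrow of v.
    v.push(4);         // Compile error: v can't be borrowed as mutable while it
                       // is also borrowed as shared. (push may reallocate and
                       // leave `first` dangling, which is exactly the C++ bug.)
    println!("{}", first);
}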

Const

The C++ equivalent of the const qualifier is the default in Rust, like god intended. C++ (including my code) is full of things that should be const, but you just can’t be bothered. Like value function args. You know you should write int foo(const int a, const std::string_view b) { … }, but it’s easy to skip, and hard to insist on in code reviews.

I’m pretty strict about “const all the things”, but I’ll admit to not always using it for value function args.

And it can’t even be used for for loops. E.g. for (const int i = 0; i < 10; i++) { does not compile. In Rust i is immutable in for i in 1..10 {.

Ownership

The object ownership is incredibly thought through. Rust really takes in the 50 years of C/C++ of sloppy manual ownership and… solves it.

In large code bases you have to be careful to have a good ownership model, and maintain it. C++11 makes it easier to have a right way. E.g. std::unique_ptr<> means owned, raw pointer means not owned. But those are conventions only.

GC’d languages like Go and Java just give up on thinking about ownership, and it easily becomes a big mess.

In C++ some people give up and use std::shared_ptr<>, which makes as much of a mess as it does in Go and Java.

Ownership anarchy in Go also makes slices… surprising. Basically Go is a bunch of puzzle pieces that don’t fit together, and ownership is just one place this bites you.

Unicode

C++ just has not solved this. At least it has the honesty to basically call it out of scope. Python 3 seems to solve it nicely with str. Why… just why did Go half-ass this? string and []byte are kind of the same, but iterate differently, and indexing is wrong. Go strings are like a database without a schema. It’s weak typing pretending to be strong typing. []rune? Really?

A thought through language

The one strength Go has is the standard library. It does most things, in a pretty good API. But like with other aspects of Go, for every year that passes it becomes more obvious that it has a shitty foundation. E.g. Context is bolted on. It’s great as a feature, but clearly bolted on. Arch-specific code? Bolted on. Generics? Bolted on.

Learning Rust makes everything else about Go seem like a missed opportunity. This is what Go could have been. Something actually thought through, and good.

Like there’s no excuse for having null pointers in a language designed this century. Go has two types of nil (the second being an interface that has a type but is a nil pointer). And pretending to design for parallelism, but not solving data races is a pretty big WTF.

Go is clearly a design from someone who’s never looked at anything anybody else has been doing for the last 30 years. It doesn’t matter how smart you are, nobody is smarter than everybody.

Go feels like a language “designed” by someone who doesn’t care about making the best language for everyone, but just one that works according to the way they happen to already think.

I’ve complained about Go before, but that was 9 years ago and I now find those complaints superficial. Go’s problems are much deeper. I still stand by the old complaints, though.

C++ was designed in a different time, with different requirements. C compatibility simply prevented these modern innovations. C++ is C done right. Rust is programming done right.

I didn’t mean for this to be a rant about other languages, but the way to talk about one language is to compare it to others.

So far at least Rust feels like coming home.

ChatGPT

The way I’ve been learning Rust is to read the rust programming language while writing the wp tool, and asking followup questions to ChatGPT.

ChatGPT has been a very helpful mentor. Like with everything else it’s mostly correct, but sometimes gives you something plausible looking but wrong. E.g. it tells me that I can close a child process with:

child.stdin.as_mut().unwrap().close().expect("failed to close stdin");

but you can’t (maybe I’m missing something, but ChatGPT’s example code does not compile). And it takes me a while to even figure out that it’s wrong. Google to the rescue, finding this stackoverflow question.
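What does work, as far as I can tell, is taking the handle out of the Child and dropping it; dropping the ChildStdin is what closes the pipe. A small standalone example (wc -c is just a convenient stand-in for a child process):

use std::io::Write;
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    let mut child = Command::new("wc")
        .arg("-c")
        .stdin(Stdio::piped())
        .stdout(Stdio::inherit())
        .spawn()?;

    child
        .stdin
        .as_mut()
        .expect("stdin not piped")
        .write_all(b"hello\n")?;

    // There is no .close(). Take the handle out of the Child and drop it;
    // that closes the pipe, which lets `wc` see EOF and finish.
    drop(child.stdin.take());

    child.wait()?;
    Ok(())
}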

But it’s been very helpful for other questions. Like example syntax for struct lifetimes, or how to convert from Vec<u8> to &[u8]. Or if I kind of remember something from the book, but not exactly. ChatGPT has usually gotten it right, and is faster than finding the right chapter.

It’s like having a private tutor, that only lies like 10% of the time. Like here:

ChatGPT getting C++ wrong

If I were in school now I’d use ChatGPT as a tutor for any subject. It’s a force multiplier for learning things.

And yes, I’d probably use it as a co-author, too. But while it creates risk of cheating, we should not dismiss it as a learning tool. In this aspect it’s the next generation’s Wikipedia.

VisionFive 2 quickstart


RISC-V small computer

For a long time I’ve wanted something Raspberry-pi-like but with RISC-V. And finally there is one, and a defensible price! Especially with the Raspberry Pi 4 shortage this seemed like a good idea.

This post is my first impressions and setup steps.

It’s just like when I was a kid!

When I was in my late teens I was playing with different architectures, mostly using discarded university computers. It was fun to have such different types of computers. Back then it was SPARC (And UltraSparc), Alpha, and x86. Maybe access to some HPPA. I even had a MIPS (SGI Indigo 2).

Nowadays instead of SPARC, Alpha, and x86 it’s ARM, RISC-V, and x64.

Luckily they can be smaller nowadays. Before I left home my room had more towers of computers than it had furniture. In my first flat I had a full size rack!

Write SD card

pv starfive-jh7110-VF2_515_v2.5.0-69-minimal-desktop.img \
   | sudo dd of=/dev/sda

Repartition SD card

We need to repartition, because the boot partition is way too small. It only fits one kernel/initrd, which is a problem I ran into.

Unfortunately gparted doesn’t seem to work on disk images. It wants access to the individual partitions. I’m sure I could mess with partitioned loop devices, but it tends to get messy. So I just worked against the SD card directly.

  1. gparted /dev/sda
  2. Move /dev/sda3 (root) one gig to the right.
  3. Expand /dev/sda2 (boot) to fill the new space.
  4. Click green button to apply, and wait.

Resizing vfat (boot filesystem) apparently doesn’t work, so we’ll have to do that manually.

sudo mount /dev/sda2 /mnt/tmp
cd /mnt/tmp
tar cf ~/tmp/bo.tar .
cd ~/tmp
sudo umount /mnt/tmp
sudo mkfs.vfat -F 16 /dev/sda2
sudo mount /dev/sda2 /mnt/tmp
cd /mnt/tmp
sudo tar xf ~/tmp/bo.tar
cd
sudo umount /mnt/tmp

I tried fatresize -s max /dev/sda2, but it didn’t seem to actually do anything.

Maybe I don’t actually need a gig for /boot, once the correct kernel upgrades happen (see below), but 100MB is too small.

Boot up

I’m connected on console using the same cable I use for Raspberry Pi console access. Just make sure to not connect the 5V lead. Only ground, RX, and TX. Just like with Raspberry Pi.

Login as root / starfive.

I add a user, enable sudo for that user, and SSH in.

adduser thomas
echo 'thomas ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/thomas

Seems the hostname is not in /etc/hosts, so I added it there to remove warnings.

I also have an HDMI video capture device, something like this one, and interestingly even the minimal image seems to have an X server.

It seems that the best way to get clean low latency output from this capture card is:

cvlc v4l2:///dev/video4:chroma=mjpg --live-caching=0

Upgrade firmware

A forum post I found via reddit says that I should update SPL and U-Boot. So let’s do that.

flashcp -v u-boot-spl.bin.normal.out /dev/mtd0
flashcp -v visionfive2_fw_payload.img /dev/mtd1

Running

$ cat /proc/cpuinfo
processor       : 0
hart            : 1
isa             : rv64imafdc
mmu             : sv39
uarch           : sifive,u74-mc

processor       : 1
hart            : 2
isa             : rv64imafdc
mmu             : sv39
uarch           : sifive,u74-mc

processor       : 2
hart            : 3
isa             : rv64imafdc
mmu             : sv39
uarch           : sifive,u74-mc

processor       : 3
hart            : 4
isa             : rv64imafdc
mmu             : sv39
uarch           : sifive,u74-mc
$ free -m
               total        used        free      shared  buff/cache   available
Mem:            7927         326        7078           6         522        7509
Swap:              0           0           0

I built some stuff in Go, C, C++, and Rust. It seems to “just work”. It’s a Linux system, I know this.

Wireguard and the kernel

This is where I ran into trouble on my first attempt, since apt install wireguard will install a new kernel into /boot, which won’t fit.

It also seems to use the wrong directory structure. The original kernel (5.15.0) is in /boot/boot/ for some reason, but the new kernel (5.18.0) gets put into /boot.

It’s also not compressed, for some reason, and the initrd is huge!

-rwxr-xr-x 1 root root 7.7M Dec 21 08:56 /boot/boot/vmlinuz-5.15.0-starfive
-rwxr-xr-x 1 root root  20M Jun  6  2022 /boot/vmlinux-5.18.0-1-riscv64
-rwxr-xr-x 1 root root 9.3M Dec 21 09:31 /boot/boot/initrd.img-5.15.0-starfive
-rwxr-xr-x 1 root root  60M Jan 25 19:38 /boot/initrd.img-5.18.0-1-riscv64

Maybe I should just stay clear of that kernel, if it’s not customized for starfive? Will it boot? Not tried yet.

The default kernel does not support wireguard:

$ sudo wg-quick up ./foowg.conf
[#] ip link add foowg type wireguard
Error: Unknown device type.
Unable to access interface: Protocol not supported
[#] ip link delete dev foowg
Cannot find device "foowg"

Weird things

There’s stuff in /usr/local/, including an old version of openssl? Wat? That doesn’t seem right. Let’s just move that out of the way:

mv /usr/local{,.orig}
mkdir /usr/local

Future

I have an M.2 NVMe drive on the way, that aside from much more space, will also presumably be faster and more reliable.

The USB wifi dongle that came with the VisionFive doesn’t seem to be detected. Maybe I need to install some driver.

There’s plenty of benchmarking and such left to do, as well. But before I do that maybe I’ll need active cooling? I should read up on this.

So far though I can say that it’s fast enough for lab purposes. I don’t really care if it’s the best processing per watt. That’s not what it’s for, for me. But I would be disappointed if it were annoyingly slow to use. It’s not.

It doesn’t have AES instructions, so be deliberate in how you compare this benchmark to anything, but here’s openssl speed:

Doing sha256 for 3s on 16 size blocks: 843430 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 561138 sha256's in 2.99s
Doing sha256 for 3s on 256 size blocks: 279993 sha256's in 3.00s
Doing sha256 for 3s on 1024 size blocks: 93370 sha256's in 3.00s
Doing sha256 for 3s on 8192 size blocks: 12927 sha256's in 3.00s
Doing sha256 for 3s on 16384 size blocks: 6490 sha256's in 2.99s
Doing sha512 for 3s on 16 size blocks: 822824 sha512's in 3.00s
Doing sha512 for 3s on 64 size blocks: 820624 sha512's in 3.00s
Doing sha512 for 3s on 256 size blocks: 398446 sha512's in 2.99s
Doing sha512 for 3s on 1024 size blocks: 155672 sha512's in 3.00s
Doing sha512 for 3s on 8192 size blocks: 23303 sha512's in 3.00s
Doing sha512 for 3s on 16384 size blocks: 11786 sha512's in 3.00s
Doing aes-256-cbc for 3s on 16 size blocks: 3385626 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 1054135 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 281246 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 71503 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 8980 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 4487 aes-256-cbc's in 3.00s
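If you want to compare against your own hardware, output in this format can be generated with a plain openssl speed run, along the lines of:

openssl speed sha256 sha512 aes-256-cbc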

It has four cores, and each core then seems approximately one tenth the speed of a core on my laptop on SHA. Not bad. Don’t know if the above thermothrottled, though.

For future RISC-V hardware I’m looking forward to playing with the vector instructions. They are more truly different than the rest of the instruction set.

RISC-V optimization and -mtune


I’ve been getting into RISC-V optimization recently. Partly because I got my StarFive VisionFive 2, and partly because unlike x86 the number of RISC-V instructions is so manageable that I may actually have a chance at beating the compiler.

I’m optimizing the inner loops of GNURadio, or in other words the volk library. I’ve been getting up to about a doubling of the speed compared to the compiled C code, depending on the function.

But it got me thinking how far I could tweak the compiler and its options, too.

Yes, I should have done this much sooner.

Many years ago now I built some data processing thing in C++, and thought it ran too slowly. Sure, I did a debug build, but how much slower could that be? Half speed? Nope. 20x slower.

Of course this time I never compared to a debug build, so don’t expect that kind of difference. Don’t expect that it’ll reach my hand optimized assembly either, imperfect as it may be.

The test code

This may look like a synthetic benchmark, in simplified C++:

complex volk_32fc_x2_dot_prod_32fc_generic(const vector<complex>& in1,
                                           const vector<complex>& in2)
{
  complex res;
  for (unsigned int i = 0; i < in1.size(); i++) {
    res += in1[i] * in2[i];
  }
  return res;
}

The actual C code is a bit more complex, because it’s been unrolled. Whether that’s needed or not, or indeed makes things worse I don’t know.

This is not a contrived benchmark example I’m optimizing. A doubling in performance directly corresponds to a doubling of the signal bandwidth that can be handled by a FIR filter without needing to drop samples.

Ok, so in most cases I would use an FFT filter, whose performance is more dominated by the FFT and inverse FFT than by the volk parts.

Maybe optimizing the FFT library should be on my list…

GCC 12.2.0

First let’s see what good ol’ GCC will do:

Default options:

RUN_VOLK_TESTS: volk_32fc_x2_dot_prod_32fc(131071,1987)
sifive_u74 completed in 1938.08 ms  <--- Hand coded in assembly.
generic completed in 2718.17 ms    <--- C compiled version.
a_generic completed in 2700.02 ms
Best aligned arch: sifive_u74
Best unaligned arch: sifive_u74

Supposedly optimized for my CPU:

$ CXX=g++-12 \
CC=gcc-12 \
CXXFLAGS="-O3 -march=rv64gc -mtune=sifive-u74" \
CFLAGS="-O3 -march=rv64gc -mtune=sifive-u74" \
cmake -DCMAKE_INSTALL_PREFIX=$HOME/opt/volk ..
$ make -j4
$ make install
$ LD_LIBRARY_PATH=$HOME/opt/volk/lib  ~/opt/volk/bin/volk_profile -R 32fc_x2_dot_prod_32fc
RUN_VOLK_TESTS: volk_32fc_x2_dot_prod_32fc(131071,1987)
sifive_u74 completed in 2001.66 ms
generic completed in 2637.79 ms    <--- C
a_generic completed in 2630.31 ms
Best aligned arch: sifive_u74
Best unaligned arch: sifive_u74

3% better. Within the margin of error. Let’s just call it no difference.

clang 13.0.1

This is the clang that the VisionFive 2 root filesystem came with.

Default options, except CC/CXX set to clang/clang++.

RUN_VOLK_TESTS: volk_32fc_x2_dot_prod_32fc(131071,1987)
sifive_u74 completed in 1996.03 ms
generic completed in 5559.72 ms   <-- Yikes!
a_generic completed in 5534.04 ms
Best aligned arch: sifive_u74
Best unaligned arch: sifive_u74

That’s less than half the speed of GCC!

But with tuning:

RUN_VOLK_TESTS: volk_32fc_x2_dot_prod_32fc(131071,1987)
sifive_u74 completed in 2013.59 ms
generic completed in 2987.79 ms  <---
a_generic completed in 2939.02 ms
Best aligned arch: sifive_u74
Best unaligned arch: sifive_u74

~10% worse than GCC. That’s much better than without -mtune, though. A huge difference, even though clang compiled for the same architecture.

Clang trunk

Commit: 73c258048e048b8dff0579b8621aa995aab408d4
Date: 2023-04-17

Build clang trunk

I followed the LLVM Getting Started guide.

$ git clone https://github.com/llvm/llvm-project.git
[…]
$ cd llvm-project
$ cmake \
  -S llvm \
  -B build \
  -G Ninja \
  -DLLVM_ENABLE_PROJECTS='clang;lld' \
  -DCMAKE_INSTALL_PREFIX=$HOME/opt/clang-trunk \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_PARALLEL_COMPILE_JOBS=$(nproc) \
  -DLLVM_PARALLEL_LINK_JOBS=$(nproc)
$ time ninja -C build -j$(nproc)
real    586m8.006s     <-- aka ~10h
user    2225m44.714s   <-- 37 CPU-hours
sys     95m51.166s
$ ninja -C build install

Yeah that took a while. Sure, I could have cross-compiled it, but I just started it in the morning before work, and it finished by the time I needed it.

Default settings:

RUN_VOLK_TESTS: volk_32fc_x2_dot_prod_32fc(131071,1987)
sifive_u74 completed in 1996.35 ms
generic completed in 5485.2 ms   <--- Yikes, still.
a_generic completed in 5473.75 ms
Best aligned arch: sifive_u74
Best unaligned arch: sifive_u74

Tuned:

RUN_VOLK_TESTS: volk_32fc_x2_dot_prod_32fc(131071,1987)
sifive_u74 completed in 1957.5 ms
generic completed in 2297.44 ms  <---- Yay!
a_generic completed in 2276.93 ms
Best aligned arch: sifive_u74
Best unaligned arch: sifive_u74

Oh wow, that’s pretty good. My hand coded assembly is just ~15% better. Clang trunk beat GCC 12.2.0.

But also: phew, I didn’t waste my time. It would have sucked to see clang beating the hand coded assembly.

But I am a bit surprised. The U74 is not a complex implementation. I’m surprised there’s anything to tune. But looking at the assembly, the untuned code is crap. Like, what’s this shit?

        fadd.s  fa3,fa3,fa1
        fneg.s  fa1,ft0
        fmul.s  fa1,ft1,fa1
        fmadd.s fa1,fa0,ft2,fa1
        fadd.s  fa4,fa4,fa1
        fmul.s  fa1,ft2,ft0
        fmadd.s fa1,fa0,ft1,fa1
        fadd.s  fa5,fa5,fa1

Additions? Negation? Non-fused multiplies? Of course that’s less efficient. More instructions, too:

   87 clang-trunk-default.txt
  243 clang-trunk-tuned.txt
   51 hand-written.txt

The tuned version also has a bunch of needless instructions. This function should inherently boil down to only fused multiply-adds (fmadd.s and fnmsub.s), for the floating point.
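For one complex multiply-accumulate (acc += a*b) the floating point work can be exactly four fused instructions. The register assignment here is illustrative, not my actual hand-written code:

# fa0/fa1 = Re(a)/Im(a), fa2/fa3 = Re(b)/Im(b), fa4/fa5 = Re(acc)/Im(acc).
fmadd.s  fa4, fa0, fa2, fa4   # acc_re += a_re * b_re
fnmsub.s fa4, fa1, fa3, fa4   # acc_re -= a_im * b_im   (fnmsub: -(rs1*rs2)+rs3)
fmadd.s  fa5, fa0, fa3, fa5   # acc_im += a_re * b_im
fmadd.s  fa5, fa1, fa2, fa5   # acc_im += a_im * b_re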

Summary

For my tiny sample here I can say that -mtune-ing for the sifive-u74 did nothing on GCC, but doubled the speed on clang.

Interesting.

Vector instructions

I’m really looking forward to the next generation of RISC-V hardware, that should have the vector instruction set. That’s likely going to give much more than doubling of CPU speed for DSP (digital signals processing).

Vector instructions are like SIMD, but more general. In short, SIMD instructions let you take four input elements, do the same operation on these four at the same time, then store all four back to memory. Then the next generation of SIMD increases that to eight. But because it’s a new set of instructions to do “eight at a time”, all software needs to be rewritten to take advantage of the newest SIMD.

Vector instructions instead let the programmer tell the CPU to “take as many as you can” at a time. As new CPUs get the ability to do more, they automatically do, without any need for software updates.

Clang trunk seems to be able to generate the instructions already, which is great! I expect some optimization to still be possible manually, but there will likely be diminishing returns.

I’ll start experimenting with these vector instructions as soon as I get hardware for it.

Comments also on this reddit post.

Tracing function calls


Sometimes you want to see functions of a library, as they’re called. I know of two ways of doing this.

Let’s have a super simple test program:

#include <iostream>
#include <unistd.h>

void func1() {}
void func2() {}

int main()
{
  std::cout << "Hello world\n";
  func1();
  func2();
  func1();

  // Wait a bit for bpftrace to be able to acquire the function name.
  // Not applicable for something that doesn't exist.
  sleep(1);
}

bpftrace

Start a bpftrace in one terminal, and run the program in another.

$ sudo bpftrace -e 'uprobe:./a.out:func* { print(func); }'
Attaching 2 probes...
func1()
func2()
func1()

GDB

$ gdb a.out
[…]
(gdb) rbreak func.*
[…]
(gdb) commands
Type commands for breakpoint(s) 1-3, one per line.
End with a line saying just "end".
>silent
>bt 1
>cont
>end
(gdb) r
Starting program: […]/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
#0  0x0000555555555215 in _GLOBAL__sub_I__Z5func1v ()
Hello world
#0  0x000055555555516d in func1() ()
#0  0x0000555555555174 in func2() ()
#0  0x000055555555516d in func1() ()
[Inferior 1 (process 424744) exited normally]
(gdb)

Which to use?

bpftrace is lower (but not zero) overhead, but in my experience sometimes just fails to trace some things. It also has a maximum number of functions it can trace.

If the function is in a library then you may need to add filtering on the pid, or you’ll get every process’s use of this library.
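Something like this, where the library path and pid are just examples:

$ sudo bpftrace -e 'uprobe:/usr/lib/x86_64-linux-gnu/libssl.so.3:SSL_* /pid == 12345/ { print(func); }'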

gdb to me seems to always just work. But it’s higher overhead.

Counting current live readers


Once upon a time it was popular to put a counter on your web page, to show how many people had visited the site before you. I thought it’d be more fun, and less bragging about how long the page has existed, if it just showed who’s reading it now.

As I mentioned in a previous post, I’m learning Rust. My teaching project has been to make this web widget that shows the current number of browsers that have the page open.

You see this counter here on the blog in the top right.

The idea is pretty simple. Have some javascript open a websocket to a server, and stream down messages with the current count, as it changes. When a client connects or disconnects, inform all other clients of the new total.
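The client side can be tiny. Something along these lines (the URL and element id are made up; the real widget differs in details):

const ws = new WebSocket("wss://example.com/livecount");
ws.onmessage = (ev) => {
  // The server streams the current count as a plain number.
  document.getElementById("livecount").textContent = ev.data;
};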

This does mean that it needs to keep one TCP connection open per client, which may be too costly for some sites. Especially since I’m putting it behind an nginx, so the machine needs to keep 3x the state.

I’m not logging anything to disk, nor sharing anything between the clients except for the current count. It’s just an amusing publicly visible presence counter.

Actually, because it can go up or down, I should really call it a gauge instead of counter.

SNI multiplexing

I plan to move this to its own port, so that I can stop involving nginx. But I don’t have a dedicated IPv4 address to give to livecount, unfortunately. The downside to a nonstandard port, of course, is that there will be many corp firewalls and such that will block it.

Maybe I can find some nice SNI multiplexer solution, and pass the file descriptor on to livecount for its hostname, and nginx for everything else. But I also want it to work with encrypted SNI, so that may be tricky.

Oh well, that project is for another day.

Continued Rust thoughts

I really like how Rust works. Well, I guess I don’t know if everything here is Rust, or also the Warp framework. But it gives an almost functional-programming-style HTTP handler, where you don’t call out to extract the various parts of the request you’re interested in. Instead you tell the HTTP framework at callback registration time what you want, and it shows up as a function argument to your handler.

The language will not tolerate sloppiness.

nginx config

location /livecount {
    proxy_pass  http://127.0.0.1:8000;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
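The $connection_upgrade variable isn’t built in; it comes from the standard websocket map, defined once in the http block:

map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}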

Building & running

For reasons I go into elsewhere, I couldn’t just cargo build --release.

$ rustup target add x86_64-unknown-linux-musl
[…]
$ cargo build --release --target=x86_64-unknown-linux-musl
[…]
$ ./livecount -t sec -vvvvvv
[… lots of debug output]

Linking statically, and glibc breaking userspace for fun


glibc is annoyingly eager to break userspace. You can’t just build something that only depends on libc and expect it to work on all linux systems of that architecture.

I don’t know why Linus Torvalds keeps insisting “we do not break userspace” as a policy for the kernel when libc seems to make that exact thing a hobby. And either way the userspace programs break.

Compiling statically (including libc) is frowned upon, and has even had known breakages left unaddressed.

E.g. setlocale() had a strange bug where for new threads you had to first set the locale to the wrong locale, and then call it again to set it to the right one. Otherwise the new thread would be in a weird state where the locale is wrong, but it thought it was right, so it wouldn’t allow you to change it to what it thought it already was.

I can’t find the bug now (I ran into this around 2004-2005), but the official response was basically “well don’t compile statically, then”.

And DNS can be broken with static glibc. “a statically linked glibc can’t use NSS (Name Service Switch) modules from a different glibc version, so if you statically link glibc you can’t reliably resolve names with getaddrinfo”.

On other operating systems, like Solaris and OpenBSD, support for building statically has actually been withdrawn.

Here’s building some Rust against glibc, and trying to run it elsewhere:

build-machine$ cargo build --release
[…]
build-machine$ ldd target/release/livecount
        linux-vdso.so.1 (0x00007fff191a0000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc8f6077000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc8f5f90000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc8f5d68000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fc8f6567000)
build-machine$ scp target/release/livecount prod-machine:
[…]
prod-machine$ ./livecount -h
./livecount: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by ./livecount)
./livecount: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by ./livecount)
./livecount: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./livecount)
./livecount: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.25' not found (required by ./livecount)
./livecount: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./livecount)

It’s libc! Not like you need to maintain some complex C++ vtable ABI. How can you not be backwards compatible for libc?!

So what do you do?

glibc is broken when you compile statically, and ABI breaks regularly so you can’t link dynamically either. So what do you do?

You’ll need to install another libc, and link statically against that:

$ rustup target add x86_64-unknown-linux-musl
[…]
$ cargo build --release --target=x86_64-unknown-linux-musl
[…]
$ ldd target/x86_64-unknown-linux-musl/release/livecount
        statically linked
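To avoid typing the target on every build, this can presumably also be set once in .cargo/config.toml:

[build]
target = "x86_64-unknown-linux-musl"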

ZModem over amateur radio


While I have built a file transfer protocol for AX.25, I also wanted to try old BBS era protocols. They’re more tested and standardized.

The easiest way is probably to take lrzsz and let it talk over AX.25 connected sockets. Ideally socat should be enough, but it seems that it does not support AX.25.

That’s actually fine, because ideally I want to run over my authenticated, wrapped AX.25 (while encryption, which obscures the meaning, is banned, signatures are not).

So I had to make an adapter that bridges stdin/stdout to AX.25. Simple enough.

The setup is two Kenwood TH-D74s, set up the same way as before, at 9600bps.

D74 -> rfcomm -> kissattach -> axpipe -> socat -> lrzsz
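
A rough sketch of wiring up the left half of that chain (hedged: the Bluetooth address is a placeholder, and “radio1” is assumed to be an AX.25 port defined in /etc/ax25/axports, matching axpipe’s -r argument):

# Bluetooth serial link to the D74's built-in TNC.
sudo rfcomm connect /dev/rfcomm0 AA:BB:CC:DD:EE:FF 2 &
# Attach the KISS TNC as AX.25 port "radio1".
sudo kissattach /dev/rfcomm0 radio1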

The D74 is a great radio. It has the best text entry, menu system, and APRS support of any handheld radio I’ve seen, and it also has a built-in TNC (“modem”) that works both in 1200bps and 9600bps.

First, just for fun, let’s try YModem.

YModem

socat EXEC:'sz --ymodem axpipe.cc' EXEC:'./axpipe -r radio1 -l 0'
socat EXEC:'rz --ymodem -t 100'    EXEC:'./axpipe -r radio6 -s M6VMB-12 -c M0THC-1'

On ARM and RISC-V I have to specify a packet length (-l) of 0, to avoid setting AX25_PACLEN at all. Otherwise it tries to set it to a reasonable number, which doesn’t work on ARM and RISC-V because of some kernel bug.

Speed: best attempt 720 bps when transferring ~3kB, counting all the overhead such as handshakes.

I got some timeouts, so I thought I could fix that by adding -O, to turn off all timeout handling. But the YModem implementation doesn’t actually work without timeouts, because then the receiver never sends a single “ready to receive” message.

This seems like a bug in lrzsz. Surely it should send the first probe?

ZModem

The king of transfer protocols in the BBS era was ZModem.

The fastest I could make this go for a 3112 byte file was 2095 bps using:

# Sender
socat EXEC:'sz -b axpipe.cc' EXEC:'./axpipe -r radio1 -e -w 63 --t1=2 --t2=2'

# Receiver
socat EXEC:'rz -b -t 1000'   EXEC:'./axpipe -r radio6 -c M0THC-1 -e -w 63 --t1=2 --t2=2'

Ok, so 2095bps is not so much “king” for this use case. It’s a bit too chatty. Every “over” and ACK spends precious bps.

I don’t know that this is optimally tuned, but it’s as far as I’m going with the kernel implementation today, since the kernel is prone to panic if you use AX.25 sockets.

The -e and -w options only occasionally work on RISC-V. There appears to be some sort of race condition in the kernel. I’ve never seen it work on ARM, but it works approximately one in five times on RISC-V.

Clearly I should be using my own AX.25 implementation in user space, since the kernel one is so bad. But for this blog post I wanted a more well known implementation.

ZModem on ax25ms (my AX.25 implementation)

I’ve not even implemented extended sequence numbers, and still, with the same settings, I get 2294bps on those same 3684 bytes. For a 100kB file I managed 4878bps. The limiting factor is now the lack of extended sequence numbers, which caps the window so that it’s only just over one second long before it needs to leave room for an ACK.

My previous blog post and the README explain how to run the different parts, but the zmodem-specific parts are:

Sender:

export LD_PRELOAD="$HOME/opt/ax25ms/lib/libpreload.so"
export PATH="$PATH:$HOME/opt/axsh/bin"
exec socat \
     EXEC:"sz -O --start-8k -b -O -l 8000 rand.bin" \
     EXEC:"axpipe -C 220 -s M0THC-1"

Receiver:

export LD_PRELOAD="$HOME/opt/ax25ms/lib/libpreload.so"
export PATH="$PATH:$HOME/opt/axsh/bin"
exec socat \
     EXEC:"rz -y -b -O" \
     EXEC:"axpipe -s M6VMB-1 -c M0THC-1"

The Linux AX.25 implementation is completely broken

The more I use the Linux kernel AX.25 implementation, the more I think it should just be destroyed. It’s mostly behaved well on x86_64, but today I got kernel panics pretty consistently for some tests.

I’m sure there are exploitable problems, but I think they’re only exposed if root has configured an AX.25 interface. So basically “don’t do that” on a production machine.

I’m not saying my implementation is perfect, but I would say that it’s better than the kernel even in this early stage.

Fun quote from the section 4 ax25 manpage: `BUGS: Too numerous to list in full currently’.

Yeah… I believe that.

They did fix the simultaneous read/write deadlock I reported a while back, though.

The D74 firmware is not much better

When testing speeds using my own AX.25 stack I tried increasing the packet size. At 800 byte packets the D74 corrupted enough packets that the AX.25 state machine never recovered into the good state. It kept making progress, but never reset the error counter.

Maybe not getting out of the recovery state is a bug in the AX.25 specs (I double checked and in this aspect I do follow the specs). The specs seem to say that the error counter should increase until there is a time when all sent packets have been ACKed. After the error counter reaches a fixed value (like 11), it cuts the connection.

Maybe all new incoming ACKs should reset the error counter?

But in any case corrupted packets are wasted airtime, so the right solution is to not have them.

At 400 byte packets it handled it better, except the D74 on the sending side kept crashing after a few minutes. At 220 byte (payload) packets it seems stable.

I could switch to Direwolf as the modem, instead of the D74’s built in TNC. This way I would be able to step through all the code end to end. But then I’d have to settle for 1200bps, which obviously is slower. 9600bps can generally not be sent to a radio as audio, which is why I want to keep using the D74 TNC.

Ham radio file transfer

It’s becoming more clear to me that amateur radio needs a modern file transfer program. I’ve not yet found one that does this one thing, and does it well.

There are plenty of tools that let you send emails and attach files and such (e.g. WinLink), but I’d like a file transfer primitive.

Probably the best thing would be “the bittorrent of ham radio”. But it would look very different, since it would need to take advantage of the effective “multicast” of radio.

This also means UI frames (AX.25’s version of UDP), so it won’t need the buggiest and trickiest parts of the AX.25 stack. On the flip side, it’ll need to reinvent them.

Obviously such a protocol would not be usable for pirated material, both because of the low speed and because everyone uses registered callsigns.

Multichannel fast file transfers over AX.25


Lately I’ve been thinking about a better data protocol for amateur radio.

“Better” is, of course, relative. And the space is so big. Are we talking HF or VHF/UHF? Should it work with existing radios (just working the audio spectrum), or be its own radio? Should it be just RF improvements, or higher networking layers?

File transfers on the application layer

In my previous post I started off trying ZMODEM, but was fairly disappointed. The Linux AX.25 implementation sucks, and ZMODEM is too chatty. Every roundtrip is expensive. But even tuning the parameters, there are better ways to avoid needless retransmits and roundtrips.

I’ve started a tool called hamtransfer. The implementation is currently only point-to-point, but the protocol will work for more “bittorrent” style too.

It uses Raptor codes, but I’ll save you some time: it encodes the file (which it calls a “block”) into smaller chunks (which it calls “symbols”). It then sends the symbols to the receiver, which will be able to reassemble the original block.

The trick is that the set of symbols is infinite, and the block can be assembled by almost any subset of symbols. If the block is 10kB, then with more than 99% probability sending just 10kB worth of symbols is enough.

Or you can send just two extra symbols (though remember, symbols here should be read as “chunks” or “packets”) and get 99.9999% probability of decode.

So you can send just two extra packets, and if any two packets are lost then you still have 99% chance of a successful reception.

Great. So now you can get actual 9600bps throughput, without a bunch of ACKs.

But 9600bps is not enough. To go beyond it we need to go down to the RF layer, and build a new modem.

How can we make it faster?

First some background on 1200 and 9600bps AX.25 and its limitations.

1200bps is two audio tones for 0 and 1, which can be sent to any radio. It “just works”. 9600bps G3RUH (the most standard 9600bps AX.25) is real FSK. You can’t send it to a radio as audio. (this is true enough, without making hardware modifications to the radios)

We could try to use a better protocol than G3RUH 9600bps. The drawback to that is that we would no longer be able to use existing TNCs, ones built into radios like the Kenwood D74 (and upcoming D75).

We could go the VARA FM route, which tops out at about 25kbps. That’s pretty great, just going via audio to an FM radio.

But of course VARA is out, since it’s proprietary, Windows-only software. Apparently it can be run in Wine, but the closed source is enough for me to say no.

It also looks like it won’t be compatible with my more “multicast” future ideas, since it adjusts its modulation according to point-to-point conditions. But more on that in a future post.

So there’s plenty of room for speed improvements in the modulation space. But that’s also not the topic of today. An open source modulation scheme will also have to wait.

If not modulation changes, then what?

If we’re not (yet) going to improve the channel, then let’s just use more channels!

We’re sending from an infinite pool of chunks (“symbols”). Ordering doesn’t matter. All we need to do is send enough of them.

At the end of the day an FM radio just sends one modulated carrier, so it can’t send more than one channel. SSB radios can, but only up to at most 4kHz of bandwidth. And similarly for receive.

But an SDR can.

So I made a multichannel modem in GNURadio. Each channel uses G3RUH 9600bps, and packets to be sent (chunks, or symbols) are load balanced round robin across the channels.

My proof of concept achieved 12.6kbps over three channels. Much of the time was spent waiting for the roundtrip of half duplex, so for larger datasets it should stabilize just under 3*9600=28.8kbps. Hey, I remember those modem speeds!

Of course the SDR doesn’t have a channel limit. It could do 384kbps per MHz (that’s 40 channels at 25kHz spacing, each carrying 9600bps).

SDR? That sounds expensive and low power

Yes, SDRs that can transmit do still cost a bit. But we don’t need high transmit speeds from the downloader. Only the uploader needs to send on multiple channels.

I use the terms “uploader” and “downloader” instead of “sender” and “receiver”, to not confuse it with the hardware that transmits and receives.

So the downloader just needs a way to send download commands, after which it can receive using a cheap $30 RTL SDR. Receivers don’t need to transmit with high power.

The radio sending these download request commands doesn’t even need to support 9600bps. They can be sent at 1200bps.

So the communication would be:

  1. Cheap HT -> Expensive SDR: Give me file X.
     a. The expensive SDR here can be a cheap HT, but since the uploader needs the expensive SDR anyway, might as well use it.
  2. Expensive SDR -> Cheap RTL SDR: high speed bulk data

Comms diagram

Proof of concept

I documented my proof of concept. It has a couple of dependencies, all written by me.

  • ax25ms & ax25ms-rust, for the AX.25 stack.
  • hamtransfer, the file transfer application.
  • radiostuff, where I keep the GNURadio-coded modems.

I made a video:

I meant to do some voiceover annotation for the video, but I won’t have time for that for a while, so better to post it as-is.

Future work

I want to amplify the signal from the SDR, to work outside just my lab. I thought I’d use something like this amplifier, followed by this filter.

The transmit side of the modem also needs its mux replaced. Right now it’s a leaky abstraction: the modem needs equal sized packets, in multiples of 3. That’s fine for now, but it doesn’t scale to “many” channels.

The downloader should tell the uploader which frequencies it should use. E.g. don’t use busy frequencies. The uploader of course should also check which frequencies seem busy from where it is.

The unreasonable effectiveness of radio


Light and radio transmissions are the same thing, just using different frequencies.

How is it reasonable that I can transmit with a radio using the energy of a low energy light bulb (10 watts), and easily chat with someone 1800km away?

Even if you imagine a perfectly dark world, where the only light bulb is a 10W bulb in Sweden; How is it even possible that this could be seen in Italy? It’s not even a spotlight! It’s only vaguely aimed away from east-west, in favour of up, north, and south.

My little light bulb (radio) could be seen in (approximately) all of Europe. ~750 million people potentially could have received it at the same time. With 10 watts.

“Blink blink” — All of Europe can see my little lightbulb.

And this isn’t some specialized super duper antenna, nor was it set up by an expert, fine tuning everything. I just put up the antenna in a PVC pipe and connected it. We can’t even credit fancy computers digging signals out of the noise. This was not FT8, this was PSK31. I’m sure voice would also have worked.

I also used FT8 with these same 10W to check off country after country. Not even a challenge.

This is not close to a record, or even a personal record. I’ve been heard in the US, and even in Brazil. 10W has even been used by amateurs to bounce signals off the moon! (though that took more effort with directional antennas)

This is completely unreasonable, and it’s great. Amateur radio is fun!

If I can do this with amateur radio, then it’s more understandable (but of course even more amazing) that NASA can talk to a probe more than 22 light-hours away.

This is the modest setup I used to chat over 1800km using 10 watts:

My modest setup

Downloading web resources


Last time I went to the dentist, they offered to use a fancy scanner to better be able to show me my teeth.

Who can say no to that? I already for fun got a 3D scan of my brain, so why not teeth too?

I requested the data, and got a link to a web UI. Unfortunately it was just a user friendly 3D viewer, without any download button.

Here’s how I extracted the 3D data:

  1. Open Chrome developer console, e.g. by pressing Ctrl-Shift-C (I hate it that Chrome hijacked this. Every single day I press Ctrl-Shift-C to copy, and it throws up this thing)
  2. Close the stupid “what’s new” spam, that nobody in the history of ever has wanted to see.
  3. Go to the ‘Network’ tab.
  4. Reload the page.
  5. Right click on any item in the list, and choose “Save all as HAR with content”. No, I don’t know why I can’t just save that one resource.
  6. A HAR file is a JSON file archive, essentially.
    $ jq '.log | keys' foo.har
    [
      "creator",
      "entries",
      "pages",
      "version"
    ]
    $ jq '.log | .entries[0].request | keys' foo.har
    [
      "bodySize",
      "cookies",
      "headers",
      "headersSize",
      "httpVersion",
      "method",
      "queryString",
      "url"
    ]
    $ jq -r  '.log | .entries[].request.url' foo.har
    [… list of the URLs …]
    $ jq -r '.log | .entries[4].response.content.text' foo.har | base64 -d > file.tmp
    $ file file.tmp
    Zip archive data, at least v2.0 to extract, compression method=store
    $ unzip file.tmp
    [… a bunch of .ctm files …]
    $ ctmconv Tooth08.{ctm,obj}
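
To dump every response at once instead of picking out entry 4 by hand, a loop like this should do it (hedged: it assumes the bodies are base64 encoded, like the zip was here):

# Dump every response body in the HAR to a numbered file.
n=$(jq '.log.entries | length' foo.har)
for i in $(seq 0 $((n - 1))); do
  jq -r ".log.entries[$i].response.content.text // empty" foo.har \
    | base64 -d > "resource-$i.bin"
done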
    

RustRadio, and Roast My Rust


I’m learning Rust. And I like playing with software defined radio (SDR). So the natural project to take on to learn Rust is to write a crate for making SDR applications. I call it RustRadio.

I have something that works, and seems pretty OK. But before marking a 1.0.0 release I want to see if I can get some opinions on my use of the Rust language. Both in terms of design, and more clippy-like suggestions.

Hence: Roast My Rust. File a github issue, email me, or tweet at me. Tell me I’m doing it wrong.

  • RustRadio code: https://github.com/ThomasHabets/rustradio
  • RustRadio docs: https://docs.rs/rustradio/latest/rustradio/
  • The first application: https://github.com/ThomasHabets/sparslog

What my priorities are

There are two API surfaces in RustRadio: the Block API (for writing blocks) and the Application API (for writing applications that use blocks). I want them to be good and future proof, so that I don’t have to change every block and every application after adding a feature or improving the API.

The blocks will need to be thread safe, even though the scheduler is currently single threaded.

For the streams between blocks I’ll eventually want to make a fancier, but unsafe, circular buffer that hands out readable or writable slices. But that should all be hideable behind the API, which currently just locks the whole stream.

Things I’m currently not happy with

Block API

  • Using a macro (e.g. in AddConst) for one-in-one-out blocks seems like it’s a workaround, instead of getting something nice to work using traits directly. But I’ve not gotten a default trait implementation to work, to add work(), out(), and block_name().

  • Max stream output capacity is not enforced, or even respected. I’m thinking that there may be a nice trade-off where a block can hand off generated output samples, that don’t fit, onto the framework, instead of needing to duplicate the buffering in all blocks that need it. Once multithreaded they can also just block.

  • Once multithreaded, it’s tricky for a block to correctly handle its input stream growing while being read. Maybe remove the .clear() method, and it’ll be up to the block to get the length before calling iter(), doing it all under a lock, or mandating a take(n) after checking the length?

Application API

  • Also here I’ve resorted to a macro, where surely a non-macro API could do it? It’s complicated by the fact that blocks have different numbers and types of streams. Maybe the block could return a stream-provider block? But that sounds like it could mean a lot of boilerplate for the Block API.

Future work, that I think should fit nicely

  • Blocks should, in addition to providing the output streams, provide an mpsc channel to update settings.

  • A block that acts as the UI, by spinning up a webserver.

IKEA Sparsnäs logger

In addition to the “hello world” of SDR, the FM receiver included in RustRadio as an example, I’ve also made an IKEA Sparsnäs decoder.

IKEA Sparsnäs is a light sensor that you stick onto your electricity meter’s blinking light. It then sends electricity consumption data every 15 seconds on 868MHz, in a very easy to decode protocol.

I wrote a logger, and it works great.

I can receive it anywhere in my house without problems. If you can get hold of an IKEA Sparsnäs then I highly recommend it.

Two alternate implementations

I actually have three proof of concept implementations. One that is more block oriented, like GNU Radio. That’s the main one.

Another branch is iterator based. I don’t think it’ll work, because of how it would block with locks held, when one tries to read from an empty stream. Even if I started messing with condition variables and such, Rust’s borrow checker would probably (rightly) get in the way, and I think it would end up as a big mess.

A second alternative implementation is stream based, to go async. It has a higher chance of working than the iterator based one, but would likely be much more complex than the standard thread based one.

async’s great selling point is when you need to scale to hundreds or thousands of “tasks”, or many I/O bound tasks. But radio flow graphs will mostly have block counts in the double digits, and the blocks will be CPU bound. So the task switching overhead that async saves isn’t really there in the first place.

There’s also plenty of criticism of async Rust, e.g. Async Rust Is A Bad Language.

So while these alternative implementations may be interesting, they’re not the ones I’m aiming for for 1.0.

Maybe the Streams based one can be 2.0, if it turns out I’m wrong…

Resources I’ve found helpful for learning Rust

  • https://doc.rust-lang.org/book/
  • https://google.github.io/comprehensive-rust
  • https://www.lurklurk.org/effective-rust/
  • All books listed on https://lborb.github.io/book/official.html
  • And sometimes ChatGPT and Bard. They are both pretty bad at answering the actual question, or even writing code that compiles, but they will sometimes spit out syntax or libraries that a beginner is not aware of.

RP-SMA detour

I was stumped briefly about why sparslog was running fine on my laptop, but decoded very poorly on a raspberry pi and a RISC-V VisionFive 2 board, only occasionally decoding a packet.

At first I thought it could be due to the architecture, or problems with my interfacing with librtlsdr, dropping samples. But then it turned out to be a hardware bug. The 868Mhz antennas I bought were RP-SMA, not SMA. No wonder it had a hard time decoding; the center pin of the antenna was not connected!

RP-SMA should be illegal. It’s so stupid. It was made “wrong” for devices where you’re not supposed to change the antenna to a better one, but of course you can just buy an RP-SMA antenna, or an RP-SMA/SMA adapter.

But now we’re stuck with two extremely similar but physically incompatible standards. In my case it just meant that the antenna was disconnected. Which at least on a receive-only device like the RTL SDR is safe. But with an RP-SMA radio and SMA antenna you risk physically damaging the radio, the antenna, or both. Transmitting with a disconnected antenna can damage a radio, too.

RP-SMA should die in a fire.

Setting up secure wifi


If you don’t set a password on your wifi, then not only can anyone connect, but it’s not even encrypted. This means that even when an open network gives you a captive portal, that could actually be an attacker giving you a fake portal. Even if the portal is HTTPS, because you may be connected to https://evil-fake-portal.com.

That is solved in WPA3, where even open networks become encrypted.

Of course, the attacker can just set up a fake access point, and you’ll connect, none the wiser. Even if the network has a password, the attacker only needs to know that password in order to fake it.

Before WPA3, passwords can easily be brute forced offline. A few years ago I calculated that it would cost about $70 to crack the default generated 8 character random passwords used by a popular ISP here in London, using some GPUs in Google Cloud. I’m sure it’s cheaper now.

That’s potentially years of free use of your neighbour’s wifi, for just the cost of a couple of months of paying for your own.

But that’s illegal, of course. This post is about protecting you against these attacks, not performing them.

If you don’t want to mandate WPA3, but still don’t want a password, then you can do what some hacker conferences do: set up a WPA Enterprise network with username/password, and accept any username and password provided.

And if you do want a password, WPA Enterprise can be set up to make them not vulnerable to brute force.

WPA Enterprise also makes the network authenticate to the client, so password or not, nobody can impersonate the network.

Android phones (and probably others) let you save the network’s CA (certificate authority) on first connect, so you only need that first connection to be secure. Alternatively you can download the CA and install it manually, if you can’t even trust that first connect.

Before you say that no way will you install some random CA; no this time it’s actually fine. It’ll only be used for the WPA Enterprise networks you explicitly use it for, and not for HTTPS.

But an even better solution is to not use passwords at all, but certificates.

So here’s my walkthrough about how to enable WPA Enterprise, with certificates, on a Ubiquiti network, using FreeRADIUS running on a raspberry pi.

Architecture summary

Access points themselves can’t run a user- and certificate database. For WPA Enterprise they are just dumb agents, asking a RADIUS server who should be allowed in or who should not.

EAP-TLS is the mode you want to use. It uses the RADIUS protocol as encapsulation to let the wifi client talk directly to the RADIUS server using TLS. This TLS session uses mTLS, meaning both sides provide certificates for the other side to verify.

After the wifi client and RADIUS server have had this tunneled conversation, the AP is told to let the client in.

After that the RADIUS server is no longer involved.

Configure Unifi APs for RADIUS

This is the simple part, so if you have another brand of AP, then it shouldn’t be very different, assuming it supports RADIUS.

Under Settings -> Profiles, create a new RADIUS profile for authentication.

Do not check the Enable RADIUS assigned VLAN for wired/wireless network boxes, unless you want to do more fancy things.

Under Settings -> Wireless Networks, create a new network with WPA2 Enterprise, using this RADIUS profile.

I had trouble running with PMF turned on (optional or required), which is a requirement for WPA3, so I guess I’m not running WPA3, and disabled PMF.

That’s a shame, because WPA3 has several security improvements, including mandatory protection of management frames.

Install and configure FreeRADIUS

1. apt install freeradius

Check that this is at least 3.0.26! It needs to support TLS 1.3, or certificates won’t work on modern devices. 3.0.17 (what was packaged with my Raspbian, based on Debian 10 (buster)) was not enough.
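
To see which version you actually got (hedged: Debian’s package names the binary freeradius, while a from-source install typically calls it radiusd):

freeradius -v | head -1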

Username/password worked with 3.0.17, but not certificates.

There are prebuilt packages at http://packages.networkradius.com, or you can build from source.

If you installed from source then the path may be /etc/raddb/ instead of the /etc/freeradius/3.0/ that Debian uses.

2. Create CA

Edit [certificate_authority] section of /etc/freeradius/3.0/certs/ca.cnf with your details.

Bump default_days to 3650, to be valid for 10 years.

Then run make ca.pem in that directory.
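
In other words, using the paths above:

cd /etc/freeradius/3.0/certs
make ca.pem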

3. Generate radius server certificate

This will prove to the wifi client that the access point is real.

Edit [server] in /etc/freeradius/3.0/certs/server.cnf. In particular, set the commonName to something like wifi.example.com. For example:

[server]
countryName             = GB
stateOrProvinceName     = Radius
localityName            = Somewhere
organizationName        = Example Inc.
emailAddress            = admin@example.org
commonName              = wifi.example.com

Under [req] set these:

input_password = myserverkeypassword
output_password = myserverkeypassword

The default_days doesn’t really have to be raised on the server cert. You can always regenerate the server cert without messing with all wifi clients. But you do have to regenerate the cert when it expires.

Then run make server.pem.

4. Enable clients to talk to the server

Add this to /etc/freeradius/3.0/clients.conf:

client wifi_aps {
       ipaddr = 192.168.123.0/24
       secret = radiuspasswordhere
}

5. Set FreeRADIUS cert/key

Find these keys, and set them like so, in /etc/freeradius/3.0/mods-enabled/eap:

private_key_password = myserverkeypassword
private_key_file = ${certdir}/server.key
certificate_file = ${certdir}/server.pem
ca_file = ${certdir}/ca.pem

6. Create a test user

In /etc/freeradius/3.0/users, create a test user:

bob    Cleartext-Password := "hello"

7. Restart FreeRADIUS

systemctl restart freeradius
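
Before touching any wifi settings, it can be worth sanity checking the RADIUS side locally. A hedged check: radtest ships with FreeRADIUS, and the Debian package’s stock clients.conf contains a localhost client with secret testing123. If something fails, running the daemon in the foreground with -X is the easiest way to see why.

radtest bob hello localhost 0 testing123   # look for "Received Access-Accept"

# If that fails, stop the service and run FreeRADIUS in debug mode:
sudo systemctl stop freeradius
sudo freeradius -X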

You should now be able to connect to the wifi using:

  • EAP method: PEAP
  • Phase 2 authentication: MSCHAPV2
  • CA certificate: Either trust on first use, or import it manually.
  • Domain: wifi.example.com, per above
  • Identity: bob
  • Anonymous identity: anonymous (default)
  • Password: hello

Hopefully this worked for you. Don’t proceed until this works. But that’s not enough, we want cert based auth.

8. Generate a client cert

Edit /etc/freeradius/3.0/certs/client.cnf, setting:

  • default_days for the lifetime.
  • [client] -> commonName to identify the client.
  • [req] -> input_password and output_password with a unique password.
  • make client.pem

This also creates a client.p12, which you can load on an Android phone by searching settings for Wi-Fi certificate, and adding it e.g. via Google Drive.
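
For the Linux client in step 11 below, the same make run should also leave client.pem and client.key next to the CA files. A hedged sketch of getting them into place (destination names are just examples matching that step):

scp /etc/freeradius/3.0/certs/client.pem pi@client:/home/pi/test.pem
scp /etc/freeradius/3.0/certs/client.key pi@client:/home/pi/test.key
scp /etc/freeradius/3.0/certs/ca.pem     pi@client:/home/pi/CA.crt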

9. Switch to certificates

In /etc/freeradius/3.0/mods-enabled/eap:

  • default_eap_type = tls
  • private_key_password
  • private_key_file
  • certificate_file
  • ca_file
  • Set use_tunneled_reply = yes everywhere the option exists in this file, and make sure it’s under the tls-config too

Username/password login will still work, so unless you disable the other methods, the test user you created will still get in. So it’s probably best to remove it from the users file.

Then restart FreeRADIUS.

10. Change wifi settings to use the cert

On Android:

  • EAP method: TLS
  • CA certificate: The CA cert you imported earlier.
  • Minimum TLS version: 1.3, to avoid getting downgrade attacks.
  • Online certificate status: Do not verify (I’ve not enabled OCSP).
  • Domain: wifi.example.com
  • Identity: Doesn’t matter, but it’ll be visible in clear text over the wire.

Your Android should now connect.

11. Connect with a Linux client

nmcli c add type wifi ifname wlan0 con-name "My-Wifi" \
      802-11-wireless.ssid "WIFI-SSID" \
      802-11-wireless-security.key-mgmt wpa-eap \
      802-1x.eap tls \
      802-1x.identity anonymous \
      802-1x.ca-cert /home/pi/CA.crt \
      802-1x.client-cert /home/pi/test.pem \
      802-1x.private-key /home/pi/test.key \
      802-1x.private-key-password testing1234

That adds a wifi with certificate auth. Then switch to this network using

nmcli c up My-Wifi

Your client is still not safe

This is all for nothing, of course, because your phone and laptop will just connect anyway to any hotel or other free wifi you’ve ever connected to in the past. An attacker can just set up a BA-Lounge wifi and wait for you to connect.

The one good thing for clients is that when they say they’re connected to “My-Wifi”, you can know that they really are connected to the real one, and not an attacker’s fake AP.

This setup does protect the network though. Users without certificates can’t connect, and can’t brute force the password.

  • https://www.titley.com/index.php/about-us/nigel/creating-self-signed-certificates-for-a-wpa-enterprise-wifi-using-freeradius/
  • https://wiki.alpinelinux.org/wiki/FreeRadius_EAP-TLS_configuration
  • https://www.tp-link.com/uk/support/faq/3456/
  • https://s55ma.radioamater.si/2020/10/28/raspberry-pi-eap-tls-wi-fi-with-nmcli-network-manager/

RISC-V optimization and -mtune


I’ve been getting into RISC-V optimization recently. Partly because I got my StarFive VisionFive 2, and partly because unlike x86 the number of RISC-V instructions is so manageable that I may actually have a chance at beating the compiler.

I’m optimizing the inner loops of GNURadio, or in other words the volk library. I’ve been getting up to about a doubling of the speed compared to the compiled C code, depending on the function.

But it got me thinking how far I could tweak the compiler and its options, too.

Yes, I should have done this much sooner.

Many years ago now I built some data processing thing in C++, and thought it ran too slowly. Sure, I did a debug build, but how much slower could that be? Half speed? Nope. 20x slower.

Of course this time I never compared to a debug build, so don’t expect that kind of difference. Don’t expect that it’ll reach my hand optimized assembly either, imperfect as it may be.

The test code

This may look like a synthetic benchmark, in simplified C++:

complex volk_32fc_x2_dot_prod_32fc_generic(const vector<complex>& in1,
                                           const vector<complex>& in2) {
  complex res;
  for (unsigned int i = 0; i < in1.size(); i++) {
    res += in1[i] * in2[i];
  }
  return res;
}

The actual C code is a bit more complex, because it’s been unrolled. Whether that’s needed or not, or indeed makes things worse I don’t know.

This is not a contrived benchmark example I’m optimizing. A doubling in performance directly corresponds to a doubling of the signal bandwidth that can be handled by a FIR filter without needing to drop samples.

Ok, so in most cases I would use an FFT filter, whose performance is more dominated by the FFT and inverse FFT than by the volk parts.

Maybe optimizing the FFT library should be on my list…

GCC 12.2.0

First let’s see what good ol’ GCC will do:

Default options:

RUN_VOLK_TESTS: volk_32fc_x2_dot_prod_32fc(131071,1987)
sifive_u74 completed in 1938.08 ms  <--- Hand coded in assembly.
generic completed in 2718.17 ms    <--- C compiled version.
a_generic completed in 2700.02 ms
Best aligned arch: sifive_u74
Best unaligned arch: sifive_u74

Supposedly optimized for my CPU:

$ CXX=g++-12 \
CC=gcc-12 \
CXXFLAGS="-O3 -march=rv64gc -mtune=sifive-u74" \
CFLAGS="-O3 -march=rv64gc -mtune=sifive-u74" \
cmake -DCMAKE_INSTALL_PREFIX=$HOME/opt/volk ..
$ make -j4
$ make install
$ LD_LIBRARY_PATH=$HOME/opt/volk/lib  ~/opt/volk/bin/volk_profile -R 32fc_x2_dot_prod_32fc
RUN_VOLK_TESTS: volk_32fc_x2_dot_prod_32fc(131071,1987)
sifive_u74 completed in 2001.66 ms
generic completed in 2637.79 ms    <--- C
a_generic completed in 2630.31 ms
Best aligned arch: sifive_u74
Best unaligned arch: sifive_u74

3% better. Within the margin of error. Let’s just call it no difference.

clang 13.0.1

This is the clang that the VisionFive 2 root filesystem came with.

Default options, except CC/CXX set to clang/clang++.

RUN_VOLK_TESTS: volk_32fc_x2_dot_prod_32fc(131071,1987)
sifive_u74 completed in 1996.03 ms
generic completed in 5559.72 ms   <-- Yikes!
a_generic completed in 5534.04 ms
Best aligned arch: sifive_u74
Best unaligned arch: sifive_u74

That’s less than half the speed of GCC!

But with tuning:

RUN_VOLK_TESTS: volk_32fc_x2_dot_prod_32fc(131071,1987)
sifive_u74 completed in 2013.59 ms
generic completed in 2987.79 ms  <---
a_generic completed in 2939.02 ms
Best aligned arch: sifive_u74
Best unaligned arch: sifive_u74

~10% worse than GCC. That’s much better than without -mtune, though. A huge difference, even though clang compiled for the same architecture.

Clang trunk

Commit: 73c258048e048b8dff0579b8621aa995aab408d4
Date: 2023-04-17

Build clang trunk

I followed the LLVM Getting Started guide.

$ git clone https://github.com/llvm/llvm-project.git
[…]
$ cd llvm-project
$ cmake \
  -S llvm \
  -B build \
  -G Ninja \
  -DLLVM_ENABLE_PROJECTS='clang;lld' \
  -DCMAKE_INSTALL_PREFIX=$HOME/opt/clang-trunk \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_PARALLEL_COMPILE_JOBS=$(nproc) \
  -DLLVM_PARALLEL_LINK_JOBS=$(nproc)
$ time ninja -C build -j$(nproc)
real    586m8.006s     <-- aka ~10h
user    2225m44.714s   <-- 37 CPU-hours
sys     95m51.166s
$ ninja -C build install

Yeah that took a while. Sure, I could have cross-compiled it, but I just started it in the morning before work, and it finished by the time I needed it.

Default settings:

RUN_VOLK_TESTS: volk_32fc_x2_dot_prod_32fc(131071,1987)
sifive_u74 completed in 1996.35 ms
generic completed in 5485.2 ms   <--- Yikes, still.
a_generic completed in 5473.75 ms
Best aligned arch: sifive_u74
Best unaligned arch: sifive_u74

Tuned:

RUN_VOLK_TESTS: volk_32fc_x2_dot_prod_32fc(131071,1987)
sifive_u74 completed in 1957.5 ms
generic completed in 2297.44 ms  <---- Yay!
a_generic completed in 2276.93 ms
Best aligned arch: sifive_u74
Best unaligned arch: sifive_u74

Oh wow, that’s pretty good. My hand coded assembly is just ~15% better. Clang trunk beat GCC 12.2.0.

But also: phew, I didn’t waste my time. It would have sucked to see clang beating the hand coded assembly.

But I am a bit surprised. The U74 is not a complex implementation. I’m surprised there’s anything to tune. But looking at the assembly, the untuned code is crap. Like, what’s this shit?

        fadd.s  fa3,fa3,fa1
        fneg.s  fa1,ft0
        fmul.s  fa1,ft1,fa1
        fmadd.s fa1,fa0,ft2,fa1
        fadd.s  fa4,fa4,fa1
        fmul.s  fa1,ft2,ft0
        fmadd.s fa1,fa0,ft1,fa1
        fadd.s  fa5,fa5,fa1

Additions? Negation? Non-fused multiplies? Of course that’s less efficient. More instructions, too:

   87 clang-trunk-default.txt
  243 clang-trunk-tuned.txt
   51 hand-written.txt

The tuned version also has a bunch of needless instructions. This function should inherently boil down to only fused multiply-adds (fmadd.s and fnmsub.s), for the floating point.
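
For reference, something like this is how I’d dump and roughly count the generated instructions myself (hedged: the source file name is an example, and the grep is only an approximate instruction count):

g++-12  -O3 -march=rv64gc -mtune=sifive-u74 -S -o gcc-tuned.s   dotprod.cc
clang++ -O3 -march=rv64gc -mtune=sifive-u74 -S -o clang-tuned.s dotprod.cc
grep -c -E '^[[:space:]]+[a-z]' clang-tuned.s   # counts instruction lines, skips labels and directives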

Summary

For my tiny sample here I can say that -mtune-ing for the sifive-u74 did nothing on GCC, but doubled the speed on clang.

Interesting.

Vector instructions

I’m really looking forward to the next generation of RISC-V hardware, that should have the vector instruction set. That’s likely going to give much more than doubling of CPU speed for DSP (digital signals processing).

Vector instructions are like SIMD, but more general. In short, SIMD instructions let you take four input elements, do the same operation on these four at the same time, then store all four back to memory. Then the next generation of SIMD increases that to eight. But because it’s a new set of instructions to do “eight at a time”, all software needs to be rewritten to take advantage of the newest SIMD.

Vector instructions instead let the programmer tell the CPU to “take as many as you can” at a time. As new CPUs get the ability to do more, they automatically do, without any need for software updates.

Clang trunk seems to be able to generate the instructions already, which is great! I expect some optimization to still be possible manually, but there will likely be diminishing returns.

I’ll start experimenting with these vector instructions as soon as I get hardware for it.

Comments also on this reddit post.

AX.25 and 9600bps G3RUH decoding


I’ve been coding more on my rust SDR framework, and want to improve my ability to send/receive data packets efficiently and reliably.

There are two main ways I learn to do this better: designing a new protocol, and making the best implementation possible for an existing one. This post is about refining the latter.

AX.25 and APRS

First a detour, or background.

AX.25 is the standard amateur radio data protocol. It’s mostly an OSI layer 2-4 protocol, mashing the layers together into one. Contrast this with IP, which just encapsulates the next layer.

Layer 3 (IP stack equivalent: IP itself) consists of the ability to add, in addition to source and destination, a variable number of intermediate repeaters. This allows limited source routing. In APRS the repeaters are usually not named; instead the path uses “virtual” hops like WIDE1-1.

Layer 4 (IP stack equivalent: TCP and UDP) allows both connected and disconnected communication channels. In my experience connected AX.25 works better over slow simplex radio than TCP. If TCP was ever optimized for high delay low bandwidth, it’s not anymore.

For the physical layer, there are three main “modems”:

  1. 300 baud Bell 103, used on HF. Partly because until a few days ago, Americans could not use more than 300 baud on HF.

  2. 1200 baud Bell 202, used on 144.800MHz in region 1, and 144.390MHz in region 2, for APRS. It’s used by BBSs and other applications too. This is by far the most common amateur radio modem, and is often just called “packet”. Anything that supports “APRS” will support 1200 baud.

  3. 9600 baud G3RUH. This is implemented in some radios that already support 1200 baud, such as the Kenwood TH-D74 and Yaesu FT5D. There are also dedicated hardware TNCs for it.

I say “baud” (symbols per second), but since all these use one bit per symbol, you can equally call it “bps”.

300 and 1200 bps both use two audio tones. They’re almost exactly as simple as you think they are. The current tone is FM modulated, and transmitted. It has a very distinct sound if you hear it over the air.

The receiver then FM demodulates as usual, and then does another demodulation, this time of the audio frequency. Because this second demodulation outputs binary, it’s usually called FSK demodulation instead of FM demodulation. So it’s FSK inside FM.

It’s worth digging into detail about how this works, to see how it’s different from 9600bps.

An FM radio can be seen as a device that takes an analog signal X as input, and the tuned frequency Y, and outputs a solid carrier at frequency Y+X.

If the analog signal X is, say, the constant positive number 6, then it outputs a solid carrier (not an FM modulated signal) on frequency Y+6. Units don’t matter for this explanation; what matters is that a constant number as input means a constant pure carrier at some frequency.

When the analog input is speech, which it usually is, or a sine wave (solid tone), this means that the transmitted carrier frequency Y+X is something like Y + amplitude*sin(t) (for sine wave), or Y + s[n] (for the speech sound wave s).

So 300/1200 baud modems are:

[ 0 or 1 ]
    |
    V
[ audio frequency A or B ]
    |
    V
[ FM modulator ]
    |
    V
[signal around Y Hz]

9600 bps is different. It’s direct FSK over the air. This can’t be sent as audio to most radios, since there is no audio spectrum in the first place.

[ 0 or 1 ]
    |
    V
[ FM modulator ]
    |
    V
[signal either at Y+X or Y-X Hz]

Of course it has to be filtered a bit before going out, since instant transitions between two frequencies would splatter too much, but that’s a detail not worth getting into further here.

This may explain why 9600 baud is much more rare. You have to trick the radio into accepting an input signal that is definitely not audio, and treat it as audio, without filtering away what looks like high frequencies. The input doesn’t have “frequencies”; it’s not audio!

For example, if you send a bunch of +X values in a row to the radio, the radio will just “think” that the microphone has a different reference to ground, and it’ll treat it as 0. I say “think”, but in hardware this could just be a capacitor, which will filter this offset. (I’m glossing over details)

Put another way: Just because something is values that can be put inside a .wav file, that doesn’t make it audio.

Some people do hardware modifications to their radios to make them accept this direct binary input to the FM modulator. Most of the documentation out there about 9600 packet radio assumes you’ll be doing that, which makes this a bit confusing.

This is not what interests me. Either I’ll use a radio with 9600 built in, or I’ll use a software defined radio which already gives me full spectrum access.

The lovely Kenwood TH-D74 provides a whole 9600 baud modem (TNC), so you can send it whole AX.25 frames and it’ll modulate and transmit them for you. No modding or fake analog signal trickery needed.

My goal today

I want to improve my 9600bps receiver code. In order to know when it gets better, I have to have sample inputs. For 1200bps there’s a standard CD with over a thousand packets captured over the air, from various radios.

Even if this CD didn’t exist, it’d be easy to create one. Just ask someone in San Francisco to record 144.390MHz for a bit. It was basically fully saturated with various beacons, when I was last there.

For 9600, there’s nothing. Nothing that I could find, at least.

WB2OSZ has a great doc on demodulating 9600bps, where he seems to have started a recording at home, and then driven around beaconing data.

So let’s do that.

  • Step 1: Record data.
  • Step 2: compare and iterate on improving implementations.

Transmitter

First I generated some packets, using my ax25ms project:

mkdir data
for i in $(seq -w 10000); do
  ~/ax25ms/generate M0THC-1 aprs msg M0THC-2 "Decode test 2023-11-18 seq $i" > data/$i.bin
done
mkdir kiss
for f in data/*; do
  ./kiss_encode.py < "$f" > "kiss/$(basename "$f")"
done

Another detour: The kiss_encode.py script was, except for the last line, all written by GPT-4. Not like I can’t write it, or haven’t written it before. But you know what’s faster than looking up the constants and writing it? Not doing it.

Now I have the data ready to send to a modem that takes packets using KISS. As you may guess by now, I used the D74.

To send the packets I put a raspberry pi zero and a battery pack in my backpack, and sent packets to the radio via bluetooth.

Photo of radio, battery pack, and raspberry pi zero

Yet another detour: when in the field, the easiest way to SSH to a raspberry pi is to SSH over bluetooth. I have it set up on all my raspberry pies.

To connect to the radio and send packets I started two loops, in case bluetooth decided to be a problem:

# Keep reconnecting if it disconnects.
while true; do
  sudo rfcomm connect /dev/rfcomm0 24:71:89:XX:XX:XX 2
  sleep 2
done

# Send packets every 10 seconds.
for i in kiss/*.bin; do
  cat "$i" > /dev/rfcomm0
  sleep 10
done

I arbitrarily chose 144.390MHz to send on, because I didn’t want everyone on 144.800MHz to interfere with my test. I also made sure to put the volume up, so I could hear anyone asking me to stop.

Then, because I don’t live in a car dependent dystopia, I took a walk.

Receiver

I set up a simple GNU Radio receiver using a USRP B200 connected to a Diamond VX30 on my roof, filtering/downsampling it down to 50ksps.

With SDRs, remember to avoid tuning exactly onto the signal you’re interested in. The B200 is very good in this regard, but it’s still better to tune a bit off frequency, and then frequency translate in software.

Problems capturing

Did I mention Bluetooth sucks?

The bluetooth randomly disconnected sometimes. And the radio sometimes rebooted. And sometimes the radio appeared fine, but ignored all requests for transmissions.

So while walking around I had to keep an eye on the radio, making sure the TX lamp lit approximately every 10 seconds. If it didn’t, then I restarted the radio and waited for it to start working again.

A perfect recording would have a packet exactly every 10 seconds, to make it easier to zoom in where there should be a packet, and see why it’s not decoding. Maybe another time…

Captured data

I created two big captures. One 51 minutes, one 38 minutes. The latter suffers less from packets failing to send (since I’d spotted the problem), so it’s the more interesting one.

I don’t know how many packets are actually decodable. Since I tried to send one every 10 seconds, it’s at most 306 and 233, respectively. But because of the aforementioned problems, it’s likely much less.

$ ls -l aprs_capture_test*c32
-rw-r--r-- 1 thomas thomas 1225467648 Nov 18 15:30 aprs_capture_test1.c32
-rw-r--r-- 1 thomas thomas  932427072 Nov 18 16:24 aprs_capture_test2.c32
$ sha1sum aprs_capture_test*
315b15a97e7d7a63205cd840802b84bc7fda07f6  aprs_capture_test1.c32
ba5f6526cdfe67938b4c000b415fc0ef32bf02a7  aprs_capture_test2.c32

If you want the data, I created a torrent for it.

Decoders

I tested four decoders, two of which I’d written myself.

  1. GNU Radio with grsat
  2. Direwolf, the state of the art for these modems.
  3. My streamed implementation, which has a very primitive clock recovery.
  4. My Whole Packet Clock Recovery (WPCR) implementation. Check out this video on WPCR for an explanation. It’s pretty amazing.

Decoding performance

This is the performance of the four implementations, both in terms of CPU usage, and ability to decode.

These results are true as of commit 339affb506a96e9633bb28349d166b198ff72223 of RustRadio, and Direwolf dev at commit 2260df15a554131b3c24209a7ed17ed509009fec.

Summary

Sorted from best to worst decoder, ignoring CPU performance.

Method                    File 1  File 1 CPU time  File 2  File 2 CPU time
Direwolf -F 1 -P +            78            2m40s     174             2m0s
Direwolf -F 1                 77            24.6s     169            18.5s
Direwolf                      76            12.8s     169             9.8s
WPCR threshold 0.000001       73             6.2s     134             5.4s
GNU Radio with grsat          39              50s      84              37s
Streamed                      33             3.9s      32             3.0s

WPCR is surprisingly good! Its main downside is that it requires a magic value: the burst power threshold. But because this is software, we could just run many decoders, each with a different threshold. After all, running multiple decoders at once is what Direwolf does with -P +.
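
A hedged sketch of what that could look like, reusing the exact command line from the details section below (the threshold values are just examples):

# Run the WPCR decoder once per threshold, keeping each run's output separate.
for t in 0.01 0.001 0.0001 0.00001 0.000001; do
  ./target/release/examples/ax25-9600-wpcr \
      -o "packets-$t" \
      -r aprs_capture_test1.c32 \
      --sample_rate 50000 \
      --threshold "$t"
done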

I’m sure the main problem with Streamed is the clock recovery. It needs to be WAY smarter. I want to write something like the one described by Andy Walls.

But now I have something to iterate on!

Details: GRSat

$ time ./grsat_decode.py > decoded1
[… wait for CPU to die down, then press enter …]

real    1m21.621s
user    0m47.201s
sys     0m2.479s
$ grep -c '0000: 82 a0 b4 60 60 62 60 9a 60 a8 90 86 40 e3 03 f0' decoded1
39

Details: WPCR

Most of the time goes to juggling PDUs. The initial FftFilter helps marginally (adds two decodes for file2).

$ cargo build \
    -F fast-math \
    --release \
    --example ax25-9600-rx \
    --example ax25-9600-wpcr
$ time ./target/release/examples/ax25-9600-wpcr \
    -o packets \
    -r aprs_capture_test1.c32 \
    -v 2 \
    --sample_rate 50000 \
    --threshold 0.000001
[…]
Block name                Seconds  Percent
------------------------------------------
FileSource                  0.510    8.26%
FftFilter                   1.060   17.15%
RationalResampler           0.278    4.49%
Tee                         0.084    1.37%
ComplexToMag2               0.048    0.78%
SinglePoleIIRFilter<T>      0.313    5.06%
QuadratureDemod             0.398    6.43%
Burst Tagger                0.467    7.56%
StreamToPdu                 2.807   45.42%
Midpointer                  0.070    1.13%
WPCR                        0.087    1.41%
VecToStream                 0.002    0.03%
BinarySlicer                0.002    0.03%
NrziDecode                  0.001    0.02%
Descrambler                 0.001    0.02%
HDLC Deframer               0.004    0.06%
PDU Writer                  0.048    0.77%
------------------------------------------
All blocks                  6.181   99.96%
Non-block time              0.002    0.04%
Elapsed seconds             6.183  100.00%

2023-11-18T19:23:17+00:00 - INFO - HDLC Deframer: Decoded 75 (incl 0 bitfixes), CRC error 1
2023-11-18T19:23:17+00:00 - INFO - PDU Writer: wrote 75

real    0m6.188s
user    0m5.837s
sys     0m0.312s

(then some manual confirmation of correct decode, which is why it’s 73 and not 75 in the summary. Similar below for 33 instead of 42)

Streaming

$ time ./target/release/examples/ax25-9600-rx \
    -o packets \
    -r aprs_capture_test1.c32 \
    -v 2 \
    --sample_rate 50000
[…]
Block name           Seconds  Percent
-------------------------------------
FileSource             0.545   14.10%
FftFilter              1.044   27.02%
RationalResampler      0.278    7.20%
QuadratureDemod        0.400   10.36%
ZeroCrossing           1.153   29.84%
BinarySlicer           0.008    0.21%
NrziDecode             0.004    0.11%
Descrambler            0.082    2.12%
HDLC Deframer          0.337    8.72%
PDU Writer             0.012    0.31%
-------------------------------------
All blocks             3.865   99.97%
Non-block time         0.001    0.03%
Elapsed seconds        3.866  100.00%

2023-11-18T19:28:10+00:00 - INFO - HDLC Deframer: Decoded 42 (incl 9 bitfixes), CRC error 4892
2023-11-18T19:28:10+00:00 - INFO - PDU Writer: wrote 42

real    0m3.869s
user    0m3.626s
sys     0m0.236s

Direwolf

First the input needs to be converted to wav, using iq_to_wav.grc.

Then I used atest from Direwolf:

$ src/atest -F 1 -B 9600 aprs_capture_test2.wav
[…]
169 packets decoded in 18.511 seconds.  125.9 x realtime

-F 1 means try to fix one bit error. -P + means try multiple decoders, per this doc.

SDR transmit and clean signals


If you have a transmit capable SDR, you may have heard that you need to filter its output, before transmitting to the world. Certainly before amplifying the signal.

I have a TinySA Ultra spectrum analyzer, and will here show you some screenshots about just how true that is.

I tested this with my USRP B200, transmitting a pure carrier around 145MHz and 435MHz.

Oh, and a word of caution: If you want to replicate this, make sure to add an inline attenuator, to not damage your spectrum analyzer. I had a cheap 40dB one, but the values in the graphs have been adjusted to show the real signal strength, as if I hadn’t.

tl;dr

  1. Harmonics can be almost as strong as the fundamental. You need to filter these.
  2. Transmitting at maximum output gain may cause lots of unwanted signals right around your fundamental. You cannot filter these. You need to not generate them.

Harmonics

Harmonics for 145MHz
Harmonics for 435MHz

Reducing the output gain did not meaningfully fix the problem. The best I saw from using half output gain was to make the strongest harmonic 9dB less than the fundamental. That’s way too strong.

I added a cheap band pass filter (FBP-144), which made it all look great:

Harmonics filtered away

I’ve been unable to find such a quick, easy, and cheap solution for the 440MHz band.

Too high an output gain

For my signal generator I set the Gain Type to Normalized, meaning gain between 0.0 and 1.0. I started with full power.

Zooming into the signal, everything looked mostly fine at full power:

145MHz at full power
435MHz at full power

Well, the noise floor looks uneven. Still, it’s low.

But now, while keeping the tuned frequency the same, I start generating a sine wave at 100kHz instead of 0Hz.

145MHz at full power with offset
435MHz at full power with offset

Yuck!

It’s the offset that triggers it. Sending a 0Hz signal at 145.1MHz directly is super clean.

Pulling down to half power fixed it. Specifically for me it disappeared at an output gain of 0.53 and lower.

145MHz at half power with offset
435MHz at half power with offset

So apparently it’s perfectly possible to generate a clean signal with full power exactly at the tuned frequency of my SDR, but not 100kHz off. So unless you’re doing morse code, this may mean you can’t actually get the “>10dBm” that the data sheet specifies.

Or maybe my B200 is a bit wonky?

Conclusion

  1. When you start planning a project that involves amplifying your SDR output, don’t assume you can start from the max power of your SDR.
  2. Adding a filter for harmonics isn’t just a good idea; it’s mandatory.

RustRadio improved API 0.4


Since last time, I’ve improved the API a bit. That last post was about API version 0.3. Now it’s on 0.4, and I think it’s getting pretty decent.

0.3 could never have worked very well. The API was VecDeque-based, which means it could not provide a linear view (a slice) of all the data in the buffer.

The 0.4 API is simpler. You get a typed slice, and you read or write to it as appropriate. Because all streams are currently single writer, single reader, the code is simple, and requires a minimal amount of locking.

It’s simpler even though, under the hood, I switched to memory mapped circular buffers, with a slice as the stream interface. This means that the buffer is allocated only once, yet both reader and writer can use all space available to them, linearly, without having to worry about wrapping around.

The code is still at https://github.com/ThomasHabets/rustradio. I registered the github org rustyradio, too. rustradio was taken. I sent a message to the owner, since it seems to not have any real content, but have not heard back.

Unsafe code

To make this multiuser stream I did have to write some unsafe code, though. There’s definitely a risk that I made a mistake. unsafe code means that the safety is off.

But that’s every day in the life of a C++ programmer. On its trickiest day, Rust is still less tricky than C++ to get right.

I found that there are two ways to get unsafe code wrong, in Rust. One is directly, corrupting memory right then and there. The other is more subtle. If you get it wrong in this other way, you’re not creating a bug, really, but you do allow other “safe” code to be buggy.

Like what if you allow handing off two mutable references to the same range?

So you have two jobs, when writing unsafe code:

  1. Make it safe, coding very carefully.
  2. Make it impossible to use incorrectly. You have to collude with the borrow checker police, so that it can do its job.

For example, I need to prevent accidentally opening a stream for writing (requesting the mutable slice range) twice.

For example, this must not be allowed:

let out = self.dst.write_buf()?;
self.dst.write_buf()?.produce(10, &[]);

I did that with a simple one item refcount. Sure, it’s a runtime check, but if the programmer makes this mistake then they’ll probably find out the first time they run it.

I wonder if there’s a more clever way to do it, to have the borrow checker enforce it at compile time.

Another thing I need to prevent is a block continuing to write, even after it calls produce(). Because calling produce(10) means the slice is no longer valid. The writer must no longer write to the first 10 elements.

let out = self.dst.write_buf()?;
out.produce(10, &[]);
out.slice()[0] = 1;

Having produce() update the range pointers seems doable, but would make for a very surprising API.

That’s where Rust really helps. I made the produce() method consume the object. It’s a compile time error to use the object after calling produce()! And this includes any and all aliases or borrows.

What’s fixed since 0.3?

  • Max stream output is now enforced.
  • Multithread capable.
  • Circular buffers are 2-10x faster than the VecDeque in 0.3.

What could be better?

  • Uncopiable streams (PDUs) should probably get their own stream type, to avoid misuse. This would not really be a problem if Rust allowed negative trait bounds (!Copy), but it doesn’t.

  • It’d be nice to make it a compile time error to open an input stream for write, and vice versa.

    • I tried to do this in the ReadStream branch, but I’m not sure how to deal with the compiler complaining that it isn’t Send when using the multithreaded scheduler. I’ll need to understand Send better instead of just unsafely slapping it onto the stream.
  • Multiple readers for a stream, to remove the need for the Tee block. This doesn’t seem hard, but it does mean complicating the unsafe code. And I don’t need it at the moment, so I’ll leave it for now. In any case, adding that feature won’t break API compatibility.

  • Streams use Arc to let both sides of the stream access them (see the sketch after this list). It would be nicer if I could just hand out references. I’m trying it out in the borrow-instead-of-refcount branch, but it has two drawbacks:
    1. The application API becomes a bit worse, because the application now has to own all the streams, not just the blocks.
    2. The blocks need a lot of lifetime annotations, in order to hold on to the stream references. Possibly I could have the graph own the streams, solving at least one of these problems.
  • A macro (or code repetition) is still needed for block API and graph assembly.
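
To illustrate the Arc point above, this is roughly the current shape of stream ownership (hypothetical types, not the actual rustradio API):

use std::sync::{Arc, Mutex};

// Both the producing and the consuming block hold an Arc to the same stream,
// so neither needs lifetime annotations and the graph only owns the blocks.
struct Stream<T> {
    buf: Mutex<Vec<T>>, // stand-in for the real circular buffer
}

struct SourceBlock<T> {
    dst: Arc<Stream<T>>,
}

struct SinkBlock<T> {
    src: Arc<Stream<T>>,
}

fn main() {
    let stream = Arc::new(Stream { buf: Mutex::new(Vec::<f32>::new()) });
    let _source = SourceBlock { dst: Arc::clone(&stream) };
    let _sink = SinkBlock { src: Arc::clone(&stream) };
    // The stream stays alive for as long as any block holds an Arc to it.
}

Replacing those Arcs with plain references is what the borrow-instead-of-refcount branch tries, and that’s where the lifetime annotations come from.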

Apollo 11 notes


I was re-reading the Apollo 11 mission reports, as one does, and decided to take some notes along the way.

If you’re interested in these things, I also highly recommend curiousmarc’s series on the Apollo comms hardware.

Notes

First time I’ve seen the word “doff”. Can’t wait to use it in daily conversation.

The rocket equation is a beast. The LM descent stage had 8’210kg of propellant. The ascent stage only 2’365kg.
– Volume 1, Page 50

In total, 10’849kg out of 15’061kg (72%) of the LM was propellant (excluding the astronauts themselves).
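
For reference, the rocket equation is Δv = Isp · g0 · ln(m0 / mf), where m0 is the mass before the burn and mf the mass after. For a broadly similar Δv (descent and ascent are both on the order of 2km/s), the propellant needed scales with the total mass being pushed, and the descent stage has to push the entire LM, fully fuelled ascent stage included, while the ascent stage only has to lift itself.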

The LM flown on Apollo 10 did not have the landing program in its computer. To prevent the temptation to land?
– Volume 1, Page 62

Armstrong’s parents were “Mr. and Mrs. Stephen Armstrong”. Michael Collins’ mother is mentioned, but her name is also lost to history, as she’s referred to as “Mrs. James L. Collins”. Only Buzz Aldrin’s mother is named (and what a name!), as Marion Moon Aldrin.

All three were born in 1930, so they all turned 39 in 1969.
– Volume 1, Page 76-78

“High speed” data mode is 2400bps, divided into 240 bit blocks.
– Volume 1, Page 93

Aside from the ground stations, the communication network for Apollo 11 when near the Earth also included 4 ships and 6-8 aircraft. The aircraft were used during translunar injection and during reentry.
– Volume 1, Page 94, 98, 99

The NASA communication network has radio links way beyond the horizon. Multi-hop links, or HF?
– Volume 1, Page 97

Downlink telemetry apparently runs at “51.2 kilobits (12,800 binary digits) per-second”. Huh? They mean 12.8kbps payload, FEC’d up 4x to 51.2kbps?
– Volume 1, Page 98

After re-docking, the SM was overpressurised a bit, and the LM allowed to leak a bit, to maintain positive pressure and not get the SM dirty.
– Volume 1, Page 178

Most burns were longer than I expected.

  • TLI from the S-IVB was 347.3 seconds, with delta-V of 3182m/s.
  • Lunar orbit insertion 362 seconds, 889.3m/s.
  • Powered Descent Initiation 712.6s, 2065.3m/s
  • TEI was 159.0 seconds and 999.4m/s.
    – Volume 1, Page 226-227

Almost all consumables ended with more remaining than planned. The only real exception was (as you might expect, having watched the landing video) the descent stage propellant, which had 2.5% (201kg) left instead of the planned 5.1% (413kg). The LM also finished light on RCS fuel.

Some others had twice as much left as planned.

A minor problem was that “the crew reported that the knob on the ascent engine arm circuit breaker was broken”. Ok, only the ascent engine, nothing important.
– Volume 1, Page 229

During ascent from the lunar surface, there was quite a bit of oscillation, around 5 degrees, due to the center of mass shifting as fuel was used.
– Volume 1, Page 237

Roughly half of the oxygen and water in the suits was left in the end. They wanted a big margin, this being the first time.
– Volume 1, Page 239

The thing they were “least prepared to handle” was the post-mission tour.
– Volume 1, Page 240

Q: Based on your own experience in space, do you or any of you feel that there will even be an opportunity for a woman to become an astronaut in our space program?
Armstrong: Gosh, I hope so.
– Volume 1, Page 243

The display on the Abort Guidance System had one of the strokes broken, so you couldn’t tell if it was showing 3 or 9. They used an 8 segment display where 9 has the bottom segment lit.
– Volume 2, Page 52. Volume 3, page 191 and 204

“Distances are deceiving. When we looked at this fairly large boulder field off to the right, it didn’t look very far away at all before we went out. […] Tendency is to think that things are good bit closer than they actually are. This says they are probably a good bit larger than what we might have initially estimated.”
– Volume 2, page 69

“I didn’t notice any temperature thermal difference in and out of the shadow. There were significant light differences and visibility changes but no thermal differences”
– Volume 2, page 75

I did some fairly high jumps and found that there was a tendency to tip over backward on a high jump. One time I came close to falling and decided that was enough of that.
– Volume 2, page 76

On earth you only worry about one or two steps ahead; on the moon, you have to keep a good eye out four or five steps ahead.
– Volume 1, page 234. Volume 2, page 77

“The other problem we had with the camera was that it was falling over all the time. I think this was the result of a little bit of difficulty in figuring out the local vertical”
– Volume 2, Page 79

The flag’s telescoping top rod did not extend fully. Apollo 12 had trouble with the latch mechanism. Funny how you can get all the way there, but then have multiple problems with the flag deployment.
– Volume 2, page 82.

When re-entering the LM, Armstrong held on to the handrails and jumped about as high as he could. He ended up on the third step of the ladder, which is about “5-6 feet” (1.5-1.8m) up.
– Volume 2, page 89.

“COLLINS: […] When the LM is on the surface, the command module should act like a good child and be seen and not heard.”
– Volume 2, page 97

“ALDRIN: There is some discomfort when you swallow a fair amount of gas, but the biggest thing, I guess, is the fact that you just pass more gas. Of course, that’s a big odor problem in the spacecraft.
COLLINS: I beg your pardon.
ALDRIN: I beg yours.
ARMSTRONG: Let’s go on to water glycol system.”
– Volume 2, page 130.

Hatch opening for egress to hatch closing for ingress was 2h31m40s.
– Volume 3, page 15

They created a 66 by 54 mile orbit, with the estimate that irregularities in the lunar gravitational field would circularize it at 60 miles. “However, the onboard estimate of the orbit during the rendezvous was 63.2 by 56.8 miles, indicating the ellipticity decay rate was less than expected”.
– Volume 3, page 27

The landed lunar module was perfectly able to track the orbiting command module using radar.
– Volume 3, page 42, with a nice graph on page 63.

Volume 2, page 50 has an interesting graph of altitude versus altitude rate.

Engine firing time remaining at landing: 43 seconds. Expected RCS usage for descent: 18kg (40 lb). Actual: about 43kg (95 lb).
– Volume 3, page 60

The DC bus was 28.8 volts, and max current during the mission was 81 Amps. That’s 2’333 watts.
– Volume 3, page 109

Oxygen leak rate was about 22 grams (0.05 pounds) per hour, compared to the specced rate of 90 grams per hour.
– Volume 3, page 125

Average heart rates during the entire mission were 71, 60, and 67 beats/min for the Commander, Command Module Pilot, and the Lunar Module Pilot, respectively. During the powered descent and ascent phases, the only data planned to be available were the Commander’s heart rates, which ranged from 100 to 150 beats/min during descent and from 68 to 120 during ascent.
– Volume 3, page 169

The total dose for each crewman is estimated to have been less than 0.2 rad, which is well below the medically significant level.
– Volume 3, page 170

Armstrong’s heart rate was above 120bpm from approximately 600m above the surface, peaking at about 150bpm at landing two minutes later, and went back below 120 about four minutes after that.
– Volume 3, page 175
