# `Instant::now()` what?

until Rust 1.60, `Instant::now()` included a [heavyweight hammer](https://github.com/rust-lang/rust/blob/bb1e42599d0062b9a43e83b5486d61eb1fcf0771/src/libstd/time.rs#L153-L194) to enforce the standard library's guarantee that `Instant::now()` monotonically increases. that is, it does not go backwards. some hardware has clocks that go backwards, and Rust sees fit to guard against such ridiculousness.
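the shape of that hammer is easy to sketch: remember the latest `Instant` handed out, and clamp anything that reads as earlier. this is a simplified, hypothetical reconstruction for illustration (the real standard library code used platform-specific synchronization), not the actual implementation:

```rust
use std::sync::Mutex;
use std::time::Instant;

// hypothetical sketch of a monotonicity guard: `now()` never returns a
// value earlier than anything it has returned before.
struct MonotonicClock {
    latest: Mutex<Option<Instant>>,
}

impl MonotonicClock {
    fn new() -> Self {
        MonotonicClock { latest: Mutex::new(None) }
    }

    fn now(&self) -> Instant {
        let raw = Instant::now();
        let mut latest = self.latest.lock().unwrap();
        match *latest {
            // the underlying clock went backwards: clamp to the last
            // value we handed out instead of exposing the regression
            Some(prev) if prev > raw => prev,
            _ => {
                *latest = Some(raw);
                raw
            }
        }
    }
}
```

the interesting part is only the clamp in the `Some(prev) if prev > raw` arm; everything else is bookkeeping.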

[tl;dr at the bottom](./now_what.html#tldr)

so, which hardware/software pairs are `actually_monotonic()`? according to [this comment](https://github.com/rust-lang/rust/blob/bb1e42599d0062b9a43e83b5486d61eb1fcf0771/src/libstd/time.rs#L153-L194):  

* (`OpenBSD`, `x86_64`) is [not monotonic](https://github.com/rust-lang/rust/issues/48514)
* (`linux`, `arm64`) is [not monotonic](https://github.com/rust-lang/rust/issues/49281) ([and again](https://github.com/rust-lang/rust/issues/56940))
* (`linux`, `s390x`) is [not monotonic](https://github.com/rust-lang/rust/issues/49281#issuecomment-375469099)
* (`windows`, `x86`) is [not monotonic](https://github.com/rust-lang/rust/issues/51648)
  - hardware here might be a haswell chip, but under xen (details at the bottom of OP: `Intel64 Family 6 Model 63 Stepping 2 GenuineIntel ~2400 Mhz`, [lookup](https://en.wikichip.org/wiki/intel/cpuid#Big_Cores_.28Server.29))
* (`windows`, `x86_64`) is [not monotonic](https://github.com/rust-lang/rust/issues/56560)
  - unknown hardware, also aws
* (`windows`, `x86`) is [not monotonic](https://github.com/rust-lang/rust/issues/56612)

and Firefox has a [similar hammer](https://bugzilla.mozilla.org/show_bug.cgi?id=1487778) to force apparent monotonicity of "now".

i've seen people talk about this before, with shock and awe and horror. i've [tweeted about this before](https://twitter.com/iximeow/status/1114677717897580544). there have been [sharp words](https://lwn.net/Articles/388286/) about x86 TSCs in linux discussions.

what's interesting to me today is that Rust concludes that on the same inconsistent hardware, windows and openbsd get clocks wrong in a way linux does not.

so: on windows, Rust uses [`QueryPerformanceCounter`](https://github.com/rust-lang/rust/blob/bb1e42599d0062b9a43e83b5486d61eb1fcf0771/src/libstd/sys/windows/time.rs#L35-L43). for macos, [`mach_absolute_time`](https://github.com/rust-lang/rust/blob/bb1e42599d0062b9a43e83b5486d61eb1fcf0771/src/libstd/sys/unix/time.rs#L150-L155), and linux, [`clock_gettime(CLOCK_MONOTONIC)`](https://github.com/rust-lang/rust/blob/bb1e42599d0062b9a43e83b5486d61eb1fcf0771/src/libstd/sys/unix/time.rs#L301-L303). these all seem like the reasonable hardware-abstracting ways to get a monotonic clock, letting the OS paper over broken hardware when possible.
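to make the linux path concrete, here's a sketch that calls `clock_gettime(CLOCK_MONOTONIC)` directly, roughly what `Instant::now()` boils down to there. the binding is declared by hand for self-containment (a real project would use the `libc` crate), and the struct layout assumes 64-bit linux:

```rust
// hand-declared binding for clock_gettime; use the `libc` crate in real
// code. layout assumes 64-bit linux (tv_sec and tv_nsec are both i64).
#[repr(C)]
struct Timespec {
    tv_sec: i64,
    tv_nsec: i64,
}

const CLOCK_MONOTONIC: i32 = 1;

extern "C" {
    fn clock_gettime(clockid: i32, tp: *mut Timespec) -> i32;
}

// roughly what std's Instant::now() reduces to on linux
fn monotonic_now() -> (i64, i64) {
    let mut ts = Timespec { tv_sec: 0, tv_nsec: 0 };
    let ret = unsafe { clock_gettime(CLOCK_MONOTONIC, &mut ts) };
    assert_eq!(ret, 0, "clock_gettime(CLOCK_MONOTONIC) should not fail");
    (ts.tv_sec, ts.tv_nsec)
}
```

on hardware with a usable TSC, this call never even enters the kernel: it goes through the vdso fast path linked above.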

and they do ([linux](https://github.com/torvalds/linux/blob/b91c8c42ffdd5c983923edb38b3c3e112bfe6263/lib/vdso/gettimeofday.c#L105-L107), [windows](./now_what.html#ntdll_RtlQueryPerformanceCounter)). i didn't care to figure out what openbsd does, but it certainly also tries to fast-path time checks on reasonable hardware.

so how does linux get steady monotonically increasing times on hardware that windows can't make consistent? [a random comment on stackoverflow](https://stackoverflow.com/questions/28921328/why-does-windows-switch-processes-between-processors#comment101077103_28921779) believes that windows aggressively moves processes between cores, where linux tightly couples processes and cores, which might mean that windows happens to expose inconsistent timestamps more often. (it also veers into unsupported product claims for no good reason.) linux just [does what Rust also now does](https://github.com/torvalds/linux/blob/1831fed559732b132aef0ea8261ac77e73f7eadf/arch/x86/include/asm/vdso/gettimeofday.h#L294-L318) - "if the clock looks wrong just saturate and say [it didn't change](https://github.com/torvalds/linux/blob/65c61de9d090edb8a3cfb3f45541e268eb2cdb13/lib/vdso/gettimeofday.c#L78)" (as of [1.60 anyway](https://github.com/rust-lang/rust/commit/9d8ef1160747a4d033f21803770641f2deb32b25)), generally. on x86 specifically, linux [falls back to the kernel](https://github.com/torvalds/linux/blob/65c61de9d090edb8a3cfb3f45541e268eb2cdb13/lib/vdso/gettimeofday.c#L258-L261) if it has decided a clock is no longer trustworthy, as determined by the last vdso data update.
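that saturating behavior is observable in the standard library's public API: since 1.60, `Instant::duration_since` clamps to zero instead of panicking when the "earlier" instant is actually later.

```rust
use std::time::{Duration, Instant};

fn main() {
    let later = Instant::now();
    // simulate a broken clock handing us a timestamp from the future
    let earlier = later + Duration::from_secs(1);

    // since Rust 1.60 this saturates to zero rather than panicking
    // with "supplied instant is later than self"
    assert_eq!(later.duration_since(earlier), Duration::ZERO);
}
```

before 1.60, the same call on the same inputs would have panicked, which is exactly the crash the hammer existed to prevent.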

`clock_gettime(CLOCK_MONOTONIC)` also makes stronger claims than `QueryPerformanceCounter`, asserting that the returned time is relative to the system's startup, whereas QPC only says that it's independent of any external time source (so, not comparable to wallclock times). according to microsoft's documentation for QPC, windows XP may be tripped up by hardware incorrectly reporting TSC variance, Vista chose to use the HPET instead of the TSC, windows 7 was back to using the TSC if available (modulo incorrect hardware reporting), and windows 8+ use the TSC. i didn't bother looking to see what windows 10 does in the kernel, but in `ntdll.dll!RtlQueryPerformanceCounter`, on x86, it certainly does rely on `rdtsc` with appropriate barriers for serialization.

but then linux developers report that some hardware will change TSCs and lie about the current time, which may lead to incorrect time reports from `clock_gettime` in the fallback kernel code anyway.

so why did Rust decide that windows is untrustworthy due to the presence of broken hardware, while linux is trusted to not give totally bogus times? idk, probably because there were reports of broken windows times on x86, and no reports of broken linux times on x86. maybe linux's attempt at monotonization is sufficient for the worst cases of wacky hardware. maybe windows has a particularly bad time migrating between VMs, as might be hinted by a section from this [high-resolution time stamps](https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps) document: `; and on Hyper-V, the performance counter frequency is always 10 MHz when the guest virtual machine runs under a hypervisor that implements the hypervisor version 1.0 interface`. the windows issues all have some evidence of being related to times gathered in a VM (maybe even AWS specifically). the Firefox issue seems to relate to older hardware, but [some comments](https://bugzilla.mozilla.org/show_bug.cgi?id=1487778#c5) suggest they actually saw instability on linux as well. i can't see the old crash reports, so i don't have any hope of seeing the implicated hardware.

even if windows was penalized for what might be a primarily-in-VMs time issue, the hammer turns what was an uncontrolled, unpredictable crash due to hardware-level behavior into just a performance issue. that's a good improvement.

<h1 id="tldr">tl;dr? is rust bad?</h1>

given that this was a fix for crashes under murky circumstances, where the only clear information - especially the easily available information - was that the circumstance should be impossible and that buggy hardware is prevalent, the technical decisions made here were reasonable given what the parties knew at the time and the constraints they were subject to. it's fine.

<a href="/index.html">index</a>

<h3 id="ntdll_RtlQueryPerformanceCounter">ps: some windows stuff</h3>

windows is closed source. so, to know how it handles hardware differences in TSC consistency, we get to read compiled code.

so here's `ntdll.dll!RtlQueryPerformanceCounter`.

<div class="codebox">
#eval radare2 -q -c 'pd 43 @ 0x180040150' ./now_what/ntdll.dll | aha --no-header --stylesheet
</div>

first, `mov r8b, byte [0x7ffe03c6]` loads a byte that will be used to check which way we should read time counters. `r8b` will be reused several times in this function.

all early `je` checks are to branch off to some cold code far away from this function. the happy path is to fall through to `0x18004019d` where either we believe `rdtscp` is sufficient to read timers, or we should `lfence; rdtsc` and come back. either way this loads the TSC into `edx:eax`, which is reassembled into a 64-bit number before being offset and scaled (?) by some core-local (?) information in `r9`. and if this compares less than something (?), branch back and see if we should use the cold path anyway. the cold path code returns here, where we eventually write to the out-pointer parameter in the `mov` at `0x1800401cf`.
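the `lfence; rdtsc` pattern in that happy path is easy to reproduce with Rust's stable x86_64 intrinsics. this is a sketch of the same idea, not a call into ntdll:

```rust
// sketch of the `lfence; rdtsc` pattern ntdll uses on its happy path.
// x86_64 only; these intrinsics don't exist on other architectures.
#[cfg(target_arch = "x86_64")]
fn serialized_tsc() -> u64 {
    use std::arch::x86_64::{_mm_lfence, _rdtsc};
    unsafe {
        // the fence keeps earlier loads from reordering past the
        // timestamp read, so the TSC value isn't observed "too early"
        _mm_lfence();
        _rdtsc()
    }
}
```

what ntdll adds on top of the raw counter is the offset-and-scale step against per-processor data, which is what turns a core-local cycle count into something resembling a shared timeline.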

the cold path is interesting and worth looking at too:

<div class="codebox">
#eval radare2 -q -c 'pd 21 @ 0x1800b6a3e' ./now_what/ntdll.dll | aha --no-header --stylesheet
</div>

again we're consulting `r8b` for which mechanism we can safely use. down at `0x1800b6a6a` is the worst case, calling into `NtQueryPerformanceCounter` - a wrapper to make the syscall into the kernel for whatever fallback mechanism it has available. this is how windows eventually falls back to HPET if something is seriously wrong.

all in all, not dissimilar from linux's implementation of `tsc`-based timers.