QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU/reviews/2022-01-package-jqCnIw.proof.crev - crev-proofs blob - Chris Morgan’s Git repositories

   1 ----- BEGIN CREV PROOF -----
   2 kind: package review
   3 version: -1
   4 date: "2022-01-08T02:03:54.105679370+11:00"
   5 from:
   6   id-type: crev
   7   id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
   8   url: "https://git.chrismorgan.info/crev-proofs"
   9 package:
  10   source: "https://crates.io"
  11   name: verhoeff
  12   version: 1.0.0
  13   revision: 67417456c81ccc241cb0e7257d6ca3a955e1d29e
  14   digest: mM9owAVRQvXct8MS3WF2Fw8WZPgSv6lpM1WdjxWwwyQ
  15 review:
  16   thoroughness: high
  17   understanding: high
  18   rating: strong
  19 comment: |-
  20   I wrote this carefully. It’s well-documented, well-tested, and robust against
  21   panicking except as documented (Verhoeff::calculate_verhoeff_check_digit or
  22   verhoeff::calculate, on invalid input). Actually, I realised while writing
  23   this review that VerhoeffMut::push_verhoeff_check_digit is also panicky
  24   because it calculates a check digit, but I don’t think that warrants reducing
  25   the rating from strong to positive, so here we are. (I’ve pushed an
  26   appropriate change, but I don’t think it warrants even a 1.0.1 release.)
  27 ----- SIGN CREV PROOF -----
  28 LJ2SYWIPHSNC49lmO1Bd4hGDAvT8lm-I4heOGrtdZPIRcDoOw2e9P4YxtKT_72r1Qwm3UfnpkCJqtM1UBkfkCw
  29 ----- END CREV PROOF -----
  30
  31 ----- BEGIN CREV PROOF -----
  32 kind: package review
  33 version: -1
  34 date: "2022-01-08T02:06:31.277120223+11:00"
  35 from:
  36   id-type: crev
  37   id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
  38   url: "https://git.chrismorgan.info/crev-proofs"
  39 package:
  40   source: "https://crates.io"
  41   name: sanitise-file-name
  42   version: 1.0.0
  43   revision: 2f75e28d1f5d097ae5931fd05dd1b4fe39ec8c08
  44   digest: egqrezU_0kaH9V4t1GOLIZe9W2aKoGc-isplQ-9l3d0
  45 review:
  46   thoroughness: high
  47   understanding: high
  48   rating: strong
  49 comment: |-
  50   I wrote this crate, with great care. (Sometimes it needed it!)
  51
  52   It has unfortunately ended up with rather a lot of code, but a fair bit of
  53   that is related to Doing Things Right. The calculating of *exactly* how many
  54   bytes may be needed is painfully involved (again a part of Doing Things
  55   Right, especially as regards extension cleverness), but I’ve reviewed it
  56   multiple times on different days and each time convinced myself that it is in
  57   fact correct.
  58
  59   I believe that there are only two ways of causing this to panic, both easily
  60   avoided by following instructions, and otherwise harmless. Firstly, giving
  61   `sanitise_to` a `tinyvec_string::ArrayString` that doesn’t have enough space
  62   remaining. That’s definite careless user error: `max_alloc_size_const` was
  63   right there and clearly identified in the docs with explanation about the
  64   whys and wherefores. Secondly, a slight variant of that: implementing
  65   `Stringy` on some other type in a panicky fashion.
  66
  67   (Writing this review led me to add a mention of the possible panic to
  68   sanitise_to’s docs, but I don’t think it warrants a 1.0.1 release.)
  69 ----- SIGN CREV PROOF -----
  70 ymZLNHdzGamyaPo5JyBuSNUY398pyzpjsFMsup1knXtscSZumHP3ZgzNC8KPO7VCQ-23XnQxapenIX27x_DdDw
  71 ----- END CREV PROOF -----
  72
  73 ----- BEGIN CREV PROOF -----
  74 kind: package review
  75 version: -1
  76 date: "2022-01-08T02:07:39.869637389+11:00"
  77 from:
  78   id-type: crev
  79   id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
  80   url: "https://git.chrismorgan.info/crev-proofs"
  81 package:
  82   source: "https://crates.io"
  83   name: human-string-filler
  84   version: 1.0.0
  85   revision: a02907505ed1b65bb08d6a77c3b9676ab6c07d02
  86   digest: ZfyKwrp12UPlfOiQPJ9qhbKYeoXy5Il9oJgL_ePwUlA
  87 review:
  88   thoroughness: high
  89   understanding: high
  90   rating: strong
  91 comment: |-
  92   I wrote this crate, carefully. It’s robust, well documented, well tested, and
  93   entirely devoid of panics.
  94 ----- SIGN CREV PROOF -----
  95 uGwLNFZ_p6EV3AXkJYjJle9N4joGx7LkwEMkdONuNWlolx0UWuz9GVM9IqQrBbmdMorpDYkYSPW1wDxJJBuaAA
  96 ----- END CREV PROOF -----
  97
  98 ----- BEGIN CREV PROOF -----
  99 kind: package review
 100 version: -1
 101 date: "2022-01-11T20:07:22.213361263+11:00"
 102 from:
 103   id-type: crev
 104   id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
 105   url: "https://git.chrismorgan.info/crev-proofs"
 106 package:
 107   source: "https://crates.io"
 108   name: sanitize-filename
 109   version: 0.3.0
 110   revision: ece3260eb1f7b959125e5ca8f25781b31e875934
 111   digest: WpQbem5BHnp6FnMIwNny-oY5xqc_7W0x4anVu2N-gnw
 112 review:
 113   thoroughness: high
 114   understanding: high
 115   rating: neutral
 116 alternatives:
 117   - source: "https://crates.io"
 118     name: sanitise-file-name
 119 comment: |-
 120   This crate is straightforward and simple in a fairly good way, but it’s not
 121   as efficient as it could be, has a couple of small but significant bugs,
 122   has some defaults and behaviours that I reckon aren’t ideal.
 123
 124   The good:
 125
 126   • The code is simple. The actual sanitisation code is 27 lines
 127     (not including the regular expression definitions).
 128
 129   • If nothing needs to be changed about the name (which is probably the most
 130     common case), exactly one precisely-sized allocation is made. (This is
 131     definitely a positive side-effect of using Regex::replace_all rather than
 132     the obvious standard library alternative, str::replace, which would have
 133     led to a minimum of three or five allocations.)
 134
 135   The not-so-good (that is, things that aren’t ideal but aren’t real problems):
 136
 137   • Unnecessary dependencies: regex and lazy_static are used, but I reckon
 138     they’re not warranted. A character iterator and some push(char) or
 139     push_str(replacement), plus some trim_end_matches() at the end in lieu of
 140     WINDOWS_TRAILING_RE, would do the same job probably faster—though I will
 141     admit that such a naive implementation would be prone to allocate slightly
 142     more than this does when no replacements are made (though starting with
 143     `String::with_capacity(name.len())` would nigh completely counter this),
 144     and a smarter implementation that deferred allocation would be a bit longer
 145     and not quite so “obviously correct”.
 146
 147     Mitigating this claim of unnecessary dependencies and heaviness, regex’s
 148     default features have been turned off, which is good.
 149
 150   • It strips more than is necessary: U+0080–U+009F are not reserved by any
 151     file system or operating system I know of, but this crate is stripping
 152     them; and the names COM0 and LPT0 are not reserved by Windows, but this
 153     crate is stripping them.
 154
 155   • Poor style on Regex and RegexBuilder: ILLEGAL_RE includes colon twice;
 156     and WINDOWS_RESERVED_RE sets case insensitivity in two ways, `(?i)` and
 157     `RegexBuilder::case_insensitive(true)` (so it could have just used
 158     `Regex::new`).
 159
 160   • It unnecessarily enables the unicode-case feature (which is heavy) on regex:
 161     In its favour it did disable the default features, but unicode-case isn’t
 162     needed, because WINDOWS_RESERVED_RE only matches ASCII. It would be better
 163     to change (?i) to (?i-u) and disable the feature again.
 164
 165   • The defaults depend on your platform: the default value for
 166     Options::windows is true on Windows and false elsewhere; varying things
 167     like this is risky, and I think a poor choice in this particular case,
 168     because you could be writing to a device subject to Windows’ limitations
 169     (e.g. a local NTFS partition, a FAT32 or exFAT USB disk, a SMB network
 170     share), and the difference in what Windows requires is so slight (assuming
 171     the excessive stripping mentioned in the next point).
 172
 173   • With Options::windows false, it strips much more than is necessary:
 174     ILLEGAL_RE is still applied, and it corresponds to a mix of NTFS/FAT32 and
 175     Windows limitations; other platforms pretty much only need to worry about ␀
 176     (0x00) and / (0x2F).
 177
 178   • Truncation operates on the entire name, not the base name: this is a
 179     flaw found in almost all libraries like this; although it’s not
 180     intrinsically a security vulnerability, it contributes to the critical
 181     security vulnerability where you thought you checked the extension of a
 182     file was OK, but then you sanitised it and thte extension changed. But
 183     quite apart from the security aspect, file extensions changing unexpectedly
 184     is a usability hazard.
 185
 186   • Documentation-wise, it’s kind of a black box: the README is fair, but
 187     doesn’t explain in detail any of what’s done, and there are no doc comments
 188     on anything. When it turns "aux.h" into "", you’ll say, “wait, *what*!?” if
 189     you don’t know the deal.
 190
 191   The bad (that is, it’ll cause actual problems):
 192
 193   • It misses a character that Microsoft says isn’t allowed: U+007F is supposed
 194     to be stripped, but isn’t.
 195
 196   • WINDOWS_TRAILING_RE is wrong: it should be stripping trailing spaces and
 197     dots (`[. ]+$`), but is instead a duplicate of RESERVED_RE (`^\.+$`) and so
 198     does nothing.
 199
 200   My opinions:
 201
 202   • Options::truncate shouldn’t be a bool with the number 255 hard-coded,
 203     because it’s common to want to append an extension afterwards (especially
 204     if you know about the truncation hazard!).
 205
 206   • I think "" is a mildly poor default for Options::replacement; I think "_"
 207     would be a better choice, so that “1 < 0? A treatise”
 208     becomes “1 _ 0_ A treatise” rather than “1  0 A treatise”.
 209
 210   • The treatment of reserved names is poor: they’re entirely replaced with the
 211     replacement string, so by default "aux.h" will become "" (eh, I suppose
 212     that’s not too awful), and with an underscore "aux.h" will become "_" (uh
 213     oh, it lost its extension). I think a better choice is to alter the base
 214     name; I’ve seen a double underscore prefix and an underscore prefix, and I
 215     chose an underscore suffix in my crate.
 216
 217   • There should also be an option to remove leading dots, which hide files on
 218     most platforms, and are fairly problematic under Windows.
 219
 220   • “Sanitise” looks so much better than “sanitize”. 😛
 221
 222   My recommendation: use my sanitise-file-name crate instead of this.
 223   My crate’s code is certainly longer-winded because of its performance focus,
 224   but it’s more thorough in its processing, with better filtering and better
 225   defaults, quite apart from the (unbenchmarked) performance thing.
 226
 227   sanitise-file-name doesn’t cover *precisely* the same functionality as this
 228   crate, but the differences are generally minor and fairly clearly described.
 229   Here are the option translations:
 230
 231     sanitize_filename::Options.truncate == true (the default)
 232     sanitise_file_name::Options.length_limit == 255 (the default)
 233
 234     sanitize_filename::Options.truncate == false
 235     sanitise_file_name::Options.length_limit == usize::MAX
 236
 237     sanitize_filename::Options.windows (default cfg!(windows))
 238     sanitise_file_name::Options.windows_safe (default true)
 239
 240     sanitize_filename::Options.replacement == "" (the default)
 241     sanitise_file_name::Options.replace_with == None (NOT the default)
 242
 243     sanitize_filename::Options.replacement == "_" (any one character)
 244     sanitise_file_name::Options.replace_with == Some('_') (the default)
 245
 246     sanitize_filename::Options.replacement == "string" (longer than one character)
 247     (not supported in sanitise-file-name)
 248
 249   I haven’t factored the sanitize-filename binary into my reckoning here.
 250   Some may find it useful, but I don’t think it will often be useful as a
 251   standalone binary. And the binary’s usage string is found in the README, but
 252   is not emitted by the binary itself, which is odd. Kinda fits in with not
 253   documenting anything inside the code files!
 254 ----- SIGN CREV PROOF -----
 255 16uMNqMzp_nrGOi5JyJro5meslEjUV7coVZ5qxyobQ6So-H8DrkbjBSOntYC-1fP2VthjzzdfvvB__i6KakhCw
 256 ----- END CREV PROOF -----
 257
 258 ----- BEGIN CREV PROOF -----
 259 kind: package review
 260 version: -1
 261 date: "2022-01-12T05:37:47.367354687+11:00"
 262 from:
 263   id-type: crev
 264   id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
 265   url: "https://git.chrismorgan.info/crev-proofs"
 266 package:
 267   source: "https://crates.io"
 268   name: sanitize-filename-reader-friendly
 269   version: 2.2.1
 270   revision: 245eca36c588cb70f907e2167ecee93b0fd54d33
 271   digest: f6Pot3fE2Kp4y5hrg6gx34D4YFt6oJGexGqMNwT2yko
 272 review:
 273   thoroughness: high
 274   understanding: high
 275   rating: negative
 276 alternatives:
 277   - source: "https://crates.io"
 278     name: sanitise-file-name
 279 comment: |-
 280   This crate does approximately what it’s supposed to, but it’s overengineered
 281   in a bad way (complicated for no good reason), confusingly and insufficiently
 282   documented, contains some perplexing functionality that I can’t think of any
 283   good reason for the existence of, is not as efficient as it could be, and
 284   misses a couple of fairly significant things in a file name sanitiser that
 285   mean that what it returns might still not work.
 286
 287   I recommend my own crate sanitise-file-name instead, which is unfortunately
 288   at least as complex, but for the good reasons of runtime efficiency and it
 289   actually caring about extensions and correctness, and far better documented
 290   and explained.
 291
 292   The good:
 293
 294   • It does what it says on the tin. The sanitisation it performs *is* fairly
 295     reader-friendly (though its rules are more convoluted than I think is
 296     warranted).
 297
 298   The not-so-good:
 299
 300   • There’s no name length limitation: this is actually a little subjective,
 301     because implemented and used carelessly it leads to file extension change
 302     attacks (which in the right environment could allow remote code execution),
 303     but it’s generally expected.
 304
 305   • It strips or replaces some characters it probably shouldn’t:
 306
 307     • For example, U+200B is branded a non-printing character and removed. For
 308       starters, there’s no such concept in Unicode; the closest is control
 309       characters, and U+200B ZERO-WIDTH SPACE isn’t one of those. Removing it
 310       serves no security purpose (unlike most of the Bidi_Control characters),
 311       and will mangle some Unicode text where ZWSP is significant. You could
 312       argue over it being a fingerprinting vector, but in that case why aren’t
 313       ZWNJ and ZWJ filtered out, and after that how about confusables, &c.
 314       So I say there’s just no good reason to remove U+200B.
 315
 316     • It replaces various characters that no file system objects to, such as ~
 317       (replaced with _) and #, %, {, }, ^ and ` (replaced with space).
 318       Most of those can be genuinely useful characters in file names.
 319
 320   • Some of the rules are fairly arbitrary, and some excessive: the decisions
 321     about what gets turned into a space, what gets turned into an underscore,
 322     and what gets removed are idiosyncratic with no explanation, and a couple
 323     of them seem a little odd. Not harmful in any way, just… why those
 324     particular rules? FILTER_ORIG_AFTER_LAST_PROCESSED_WAS_WHITESPACE is
 325     perhaps the most bizarre and troubling: drop a few characters, including
 326     `.`, the extension separator, if they come after a space. Remember my
 327     earlier point about length limitation potentially opening a security
 328     vulnerability? Well, this is a similar sort of issue, though unlikely to be
 329     exploitable, that you can change the extension significantly:
 330     `foo.exe .jpg` becomes `foo.exe jpg`.
 331
 332     I think it’s telling, philosophically, that the library completely ignores
 333     extensions. It’s not worrying about security. (In the tests, only two or
 334     three test cases have extensions, and those are ones copied from elsewhere.)
 335
 336   The bad:
 337
 338   • It’s inefficient: an absolute minimum of five allocations, and in practice
 339     it’ll tend to be a few more. (It wouldn’t be *terribly* difficult to make
 340     it perform exactly one allocation in all cases, like I did. A couple of the
 341     allocations are almost trivially removable.) Also it uses str.find(char) to
 342     check if characters can be removed, on strings as long as 10 chars/30 code
 343     units, which is much less efficient than a match block and probably less
 344     efficient even than using [char], while also writing boolean conditions in
 345     a way that doesn’t take advantage of short-circuiting
 346     (`STRING.find(c_orig).is_some() && last_processed_chr == '&'`, which would
 347     be better flipped to do the cheap char comparison first).
 348
 349   • It’s confusing with some surprising scope: it’s not sanitising simple
 350     names, but rather some kind of thing where you can put parts on multiple
 351     lines and it’ll sanitise them individually and join them up and I can’t
 352     think of any serious reason why you’d actually want this—the potential
 353     benefits are negligible, and it makes it much more confusing.
 354
 355   • Its code style is inconsistent: useful comments occasionally, superfluous
 356     comments sometimes but not other times, erroneous comments sometimes, a doc
 357     comment with a spurious asterisk in the middle.
 358
 359   • It overuses constants and its documentation is poorly prepared: it’s
 360     written in an idiosyncratic private const-heavy style that makes reading
 361     the code hard and doesn’t translate to rustdoc at all, so you have to read
 362     the code to understand what’s actually going on, and the code’s overly
 363     complicated, so it takes quite some time to figure out what’s going on,
 364     constantly having to jump between the code and the consts as well.
 365     Actually, it looks like the author *tried* to make it rustdoc-friendly by
 366     pulling in const_format::concatcp! (clever compile-time magic) and exposing
 367     a couple of public consts, but quite apart from them being incomplete (as
 368     documented), it doesn’t even work because rustdoc doesn’t include the
 369     values of the consts in its output. (In theory perhaps the constants could
 370     be used by dependers as well, but I can’t imagine any genuine use case.)
 371
 372   • It doesn’t do all it needs to: Windows reserved names are a TODO in the
 373     code. I contributed the essence of a patch, but no action has been taken
 374     after some weeks. (Actually, I started patching it further to make it more
 375     efficient, then decided I just wanted to start from scratch, and so I made
 376     sanitise-file-name.)
 377
 378   • I have a problem with its entire philosphical approach, given the domain:
 379     as noted in the previous section, length limitations, not even
 380     acknowledging extensions are a thing and doing dangerous things in
 381     consequence, it’s just not the right approach for sanitising file names, an
 382     area where you should be caring about security. Ultimately, this is why I’m
 383     giving a rating of negative rather than neutral as I originally intended.
 384
 385   My recommendation: avoid this crate. It’s not awful by any means, but it’s
 386   not well-written either, and my crate sanitise-file-name provides much the
 387   same functionality, but better. Its default options are quite similar to this
 388   crate’s, but more thorough. It doesn’t offer the strangne multi-line thing
 389   this crate does.
 390 ----- SIGN CREV PROOF -----
 391 C30SLGF6ibx4QaLXo6uazHuIio1iP_rSmab7Vo6nLEeFZfuoaFTqiaMhPYflhvcGHEu6jImRBBIvmworq13bBg
 392 ----- END CREV PROOF -----
 393
 394 ----- BEGIN CREV PROOF -----
 395 kind: package review
 396 version: -1
 397 date: "2022-01-13T00:42:12.174512813+11:00"
 398 from:
 399   id-type: crev
 400   id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
 401   url: "https://git.chrismorgan.info/crev-proofs"
 402 package:
 403   source: "https://crates.io"
 404   name: cfg-if
 405   version: 1.0.0
 406   revision: e60fa1efeab0ec6e90c50d93ec526e1410459c23
 407   digest: YmRVto2mw7U7GJhrOe13k_5Sgg9TcQWK1NuLN2Qiz5E
 408 review:
 409   thoroughness: high
 410   understanding: high
 411   rating: positive
 412 comment: |-
 413   This is a robust crate, clear, simple and well-documented within and without,
 414   powered by a well-designed tt gobbler, and written by Alex Crichton of
 415   exemplar reputation.
 416
 417   I’m quite fond of the ``$(if … { … })else* else { … }`` trick, using the
 418   keyword ``else`` as the repetition delimiter. I knew you could put a variety
 419   of things there, not just characters like comma or semicolon, but I hadn’t
 420   connected the dots that that would extended to keywords. Hey, you can even
 421   use *string literals*! Crazy stuff. Anyway, that’s a neat way of keeping the
 422   public API to two rules rather than three as I would probably have done.
 423
 424   I found one surprise in this: ``cfg_if! { else { true } }`` works, yielding
 425   true, because that repetition I just showed used ``*`` instead of ``+``.
 426   It’s not covered by the test suite, but it’s harmless.
 427   <https://github.com/alexcrichton/cfg-if/issues/23> confirms it’s unintended.
 428
 429   Since the 1.0.0 release, there have been [a few commits
 430   <https://github.com/alexcrichton/cfg-if/commits/main>]. Most are minor
 431   trivialities (yeah, I’ll even call Dependabot that, though I hate it and it’s
 432   not even useful here because this crate has no dependencies), but
 433   <https://github.com/alexcrichton/cfg-if/commit/349913689695130c2c63d1075e57b0917c98ee6e>
 434   is significant and very mildly concerning: as well as twiddling some
 435   formatting (for the good), it changes that ``*`` to ``+`` to deny bare else,
 436   which is a breaking change, for unlikely but reasonable and plausible code.
 437   I’ve commented on #23 and #39 to this effect; we’ll see what comes of it.
 438   (It could be reverted, it could be ignored, it could be released as 2.0.0.)
 439
 440   There are a couple of extraneous files in the package (.gitignore,
 441   .github/workflows/main.yml), but they’re small enough that I won’t complain.
 442
 443   I think this crate is overused and most places it’s used would be better
 444   served by writing the negations themselves, which are often no longer and
 445   often clearer.
 446 ----- SIGN CREV PROOF -----
 447 ex_TABmVr_CTEY6B5C-NDKk7ciKdCK1bLf6ltjpXyoVknvyAC2AnRjA5rlPX_5ORQMeHmzJ2UxWUR0RdgBjYAQ
 448 ----- END CREV PROOF -----
 449