----- BEGIN CREV PROOF ----- kind: package review version: -1 date: "2022-01-08T02:03:54.105679370+11:00" from: id-type: crev id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU url: "https://git.chrismorgan.info/crev-proofs" package: source: "https://crates.io" name: verhoeff version: 1.0.0 revision: 67417456c81ccc241cb0e7257d6ca3a955e1d29e digest: mM9owAVRQvXct8MS3WF2Fw8WZPgSv6lpM1WdjxWwwyQ review: thoroughness: high understanding: high rating: strong comment: |- I wrote this carefully. It’s well-documented, well-tested, and robust against panicking except as documented (Verhoeff::calculate_verhoeff_check_digit or verhoeff::calculate, on invalid input). Actually, I realised while writing this review that VerhoeffMut::push_verhoeff_check_digit is also panicky because it calculates a check digit, but I don’t think that warrants reducing the rating from strong to positive, so here we are. (I’ve pushed an appropriate change, but I don’t think it warrants even a 1.0.1 release.) ----- SIGN CREV PROOF ----- LJ2SYWIPHSNC49lmO1Bd4hGDAvT8lm-I4heOGrtdZPIRcDoOw2e9P4YxtKT_72r1Qwm3UfnpkCJqtM1UBkfkCw ----- END CREV PROOF ----- ----- BEGIN CREV PROOF ----- kind: package review version: -1 date: "2022-01-08T02:06:31.277120223+11:00" from: id-type: crev id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU url: "https://git.chrismorgan.info/crev-proofs" package: source: "https://crates.io" name: sanitise-file-name version: 1.0.0 revision: 2f75e28d1f5d097ae5931fd05dd1b4fe39ec8c08 digest: egqrezU_0kaH9V4t1GOLIZe9W2aKoGc-isplQ-9l3d0 review: thoroughness: high understanding: high rating: strong comment: |- I wrote this crate, with great care. (Sometimes it needed it!) It has unfortunately ended up with rather a lot of code, but a fair bit of that is related to Doing Things Right. The calculating of *exactly* how many bytes may be needed is painfully involved (again a part of Doing Things Right, especially as regards extension cleverness), but I’ve reviewed it multiple times on different days and each time convinced myself that it is in fact correct. I believe that there are only two ways of causing this to panic, both easily avoided by following instructions, and otherwise harmless. Firstly, giving `sanitise_to` a `tinyvec_string::ArrayString` that doesn’t have enough space remaining. That’s definite careless user error: `max_alloc_size_const` was right there and clearly identified in the docs with explanation about the whys and wherefores. Secondly, a slight variant of that: implementing `Stringy` on some other type in a panicky fashion. (Writing this review led me to add a mention of the possible panic to sanitise_to’s docs, but I don’t think it warrants a 1.0.1 release.) ----- SIGN CREV PROOF ----- ymZLNHdzGamyaPo5JyBuSNUY398pyzpjsFMsup1knXtscSZumHP3ZgzNC8KPO7VCQ-23XnQxapenIX27x_DdDw ----- END CREV PROOF ----- ----- BEGIN CREV PROOF ----- kind: package review version: -1 date: "2022-01-08T02:07:39.869637389+11:00" from: id-type: crev id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU url: "https://git.chrismorgan.info/crev-proofs" package: source: "https://crates.io" name: human-string-filler version: 1.0.0 revision: a02907505ed1b65bb08d6a77c3b9676ab6c07d02 digest: ZfyKwrp12UPlfOiQPJ9qhbKYeoXy5Il9oJgL_ePwUlA review: thoroughness: high understanding: high rating: strong comment: |- I wrote this crate, carefully. It’s robust, well documented, well tested, and entirely devoid of panics. ----- SIGN CREV PROOF ----- uGwLNFZ_p6EV3AXkJYjJle9N4joGx7LkwEMkdONuNWlolx0UWuz9GVM9IqQrBbmdMorpDYkYSPW1wDxJJBuaAA ----- END CREV PROOF ----- ----- BEGIN CREV PROOF ----- kind: package review version: -1 date: "2022-01-11T20:07:22.213361263+11:00" from: id-type: crev id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU url: "https://git.chrismorgan.info/crev-proofs" package: source: "https://crates.io" name: sanitize-filename version: 0.3.0 revision: ece3260eb1f7b959125e5ca8f25781b31e875934 digest: WpQbem5BHnp6FnMIwNny-oY5xqc_7W0x4anVu2N-gnw review: thoroughness: high understanding: high rating: neutral alternatives: - source: "https://crates.io" name: sanitise-file-name comment: |- This crate is straightforward and simple in a fairly good way, but it’s not as efficient as it could be, has a couple of small but significant bugs, has some defaults and behaviours that I reckon aren’t ideal. The good: • The code is simple. The actual sanitisation code is 27 lines (not including the regular expression definitions). • If nothing needs to be changed about the name (which is probably the most common case), exactly one precisely-sized allocation is made. (This is definitely a positive side-effect of using Regex::replace_all rather than the obvious standard library alternative, str::replace, which would have led to a minimum of three or five allocations.) The not-so-good (that is, things that aren’t ideal but aren’t real problems): • Unnecessary dependencies: regex and lazy_static are used, but I reckon they’re not warranted. A character iterator and some push(char) or push_str(replacement), plus some trim_end_matches() at the end in lieu of WINDOWS_TRAILING_RE, would do the same job probably faster—though I will admit that such a naive implementation would be prone to allocate slightly more than this does when no replacements are made (though starting with `String::with_capacity(name.len())` would nigh completely counter this), and a smarter implementation that deferred allocation would be a bit longer and not quite so “obviously correct”. Mitigating this claim of unnecessary dependencies and heaviness, regex’s default features have been turned off, which is good. • It strips more than is necessary: U+0080–U+009F are not reserved by any file system or operating system I know of, but this crate is stripping them; and the names COM0 and LPT0 are not reserved by Windows, but this crate is stripping them. • Poor style on Regex and RegexBuilder: ILLEGAL_RE includes colon twice; and WINDOWS_RESERVED_RE sets case insensitivity in two ways, `(?i)` and `RegexBuilder::case_insensitive(true)` (so it could have just used `Regex::new`). • It unnecessarily enables the unicode-case feature (which is heavy) on regex: In its favour it did disable the default features, but unicode-case isn’t needed, because WINDOWS_RESERVED_RE only matches ASCII. It would be better to change (?i) to (?i-u) and disable the feature again. • The defaults depend on your platform: the default value for Options::windows is true on Windows and false elsewhere; varying things like this is risky, and I think a poor choice in this particular case, because you could be writing to a device subject to Windows’ limitations (e.g. a local NTFS partition, a FAT32 or exFAT USB disk, a SMB network share), and the difference in what Windows requires is so slight (assuming the excessive stripping mentioned in the next point). • With Options::windows false, it strips much more than is necessary: ILLEGAL_RE is still applied, and it corresponds to a mix of NTFS/FAT32 and Windows limitations; other platforms pretty much only need to worry about ␀ (0x00) and / (0x2F). • Truncation operates on the entire name, not the base name: this is a flaw found in almost all libraries like this; although it’s not intrinsically a security vulnerability, it contributes to the critical security vulnerability where you thought you checked the extension of a file was OK, but then you sanitised it and thte extension changed. But quite apart from the security aspect, file extensions changing unexpectedly is a usability hazard. • Documentation-wise, it’s kind of a black box: the README is fair, but doesn’t explain in detail any of what’s done, and there are no doc comments on anything. When it turns "aux.h" into "", you’ll say, “wait, *what*!?” if you don’t know the deal. The bad (that is, it’ll cause actual problems): • It misses a character that Microsoft says isn’t allowed: U+007F is supposed to be stripped, but isn’t. • WINDOWS_TRAILING_RE is wrong: it should be stripping trailing spaces and dots (`[. ]+$`), but is instead a duplicate of RESERVED_RE (`^\.+$`) and so does nothing. My opinions: • Options::truncate shouldn’t be a bool with the number 255 hard-coded, because it’s common to want to append an extension afterwards (especially if you know about the truncation hazard!). • I think "" is a mildly poor default for Options::replacement; I think "_" would be a better choice, so that “1 < 0? A treatise” becomes “1 _ 0_ A treatise” rather than “1 0 A treatise”. • The treatment of reserved names is poor: they’re entirely replaced with the replacement string, so by default "aux.h" will become "" (eh, I suppose that’s not too awful), and with an underscore "aux.h" will become "_" (uh oh, it lost its extension). I think a better choice is to alter the base name; I’ve seen a double underscore prefix and an underscore prefix, and I chose an underscore suffix in my crate. • There should also be an option to remove leading dots, which hide files on most platforms, and are fairly problematic under Windows. • “Sanitise” looks so much better than “sanitize”. 😛 My recommendation: use my sanitise-file-name crate instead of this. My crate’s code is certainly longer-winded because of its performance focus, but it’s more thorough in its processing, with better filtering and better defaults, quite apart from the (unbenchmarked) performance thing. sanitise-file-name doesn’t cover *precisely* the same functionality as this crate, but the differences are generally minor and fairly clearly described. Here are the option translations: sanitize_filename::Options.truncate == true (the default) sanitise_file_name::Options.length_limit == 255 (the default) sanitize_filename::Options.truncate == false sanitise_file_name::Options.length_limit == usize::MAX sanitize_filename::Options.windows (default cfg!(windows)) sanitise_file_name::Options.windows_safe (default true) sanitize_filename::Options.replacement == "" (the default) sanitise_file_name::Options.replace_with == None (NOT the default) sanitize_filename::Options.replacement == "_" (any one character) sanitise_file_name::Options.replace_with == Some('_') (the default) sanitize_filename::Options.replacement == "string" (longer than one character) (not supported in sanitise-file-name) I haven’t factored the sanitize-filename binary into my reckoning here. Some may find it useful, but I don’t think it will often be useful as a standalone binary. And the binary’s usage string is found in the README, but is not emitted by the binary itself, which is odd. Kinda fits in with not documenting anything inside the code files! ----- SIGN CREV PROOF ----- 16uMNqMzp_nrGOi5JyJro5meslEjUV7coVZ5qxyobQ6So-H8DrkbjBSOntYC-1fP2VthjzzdfvvB__i6KakhCw ----- END CREV PROOF ----- ----- BEGIN CREV PROOF ----- kind: package review version: -1 date: "2022-01-12T05:37:47.367354687+11:00" from: id-type: crev id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU url: "https://git.chrismorgan.info/crev-proofs" package: source: "https://crates.io" name: sanitize-filename-reader-friendly version: 2.2.1 revision: 245eca36c588cb70f907e2167ecee93b0fd54d33 digest: f6Pot3fE2Kp4y5hrg6gx34D4YFt6oJGexGqMNwT2yko review: thoroughness: high understanding: high rating: negative alternatives: - source: "https://crates.io" name: sanitise-file-name comment: |- This crate does approximately what it’s supposed to, but it’s overengineered in a bad way (complicated for no good reason), confusingly and insufficiently documented, contains some perplexing functionality that I can’t think of any good reason for the existence of, is not as efficient as it could be, and misses a couple of fairly significant things in a file name sanitiser that mean that what it returns might still not work. I recommend my own crate sanitise-file-name instead, which is unfortunately at least as complex, but for the good reasons of runtime efficiency and it actually caring about extensions and correctness, and far better documented and explained. The good: • It does what it says on the tin. The sanitisation it performs *is* fairly reader-friendly (though its rules are more convoluted than I think is warranted). The not-so-good: • There’s no name length limitation: this is actually a little subjective, because implemented and used carelessly it leads to file extension change attacks (which in the right environment could allow remote code execution), but it’s generally expected. • It strips or replaces some characters it probably shouldn’t: • For example, U+200B is branded a non-printing character and removed. For starters, there’s no such concept in Unicode; the closest is control characters, and U+200B ZERO-WIDTH SPACE isn’t one of those. Removing it serves no security purpose (unlike most of the Bidi_Control characters), and will mangle some Unicode text where ZWSP is significant. You could argue over it being a fingerprinting vector, but in that case why aren’t ZWNJ and ZWJ filtered out, and after that how about confusables, &c. So I say there’s just no good reason to remove U+200B. • It replaces various characters that no file system objects to, such as ~ (replaced with _) and #, %, {, }, ^ and ` (replaced with space). Most of those can be genuinely useful characters in file names. • Some of the rules are fairly arbitrary, and some excessive: the decisions about what gets turned into a space, what gets turned into an underscore, and what gets removed are idiosyncratic with no explanation, and a couple of them seem a little odd. Not harmful in any way, just… why those particular rules? FILTER_ORIG_AFTER_LAST_PROCESSED_WAS_WHITESPACE is perhaps the most bizarre and troubling: drop a few characters, including `.`, the extension separator, if they come after a space. Remember my earlier point about length limitation potentially opening a security vulnerability? Well, this is a similar sort of issue, though unlikely to be exploitable, that you can change the extension significantly: `foo.exe .jpg` becomes `foo.exe jpg`. I think it’s telling, philosophically, that the library completely ignores extensions. It’s not worrying about security. (In the tests, only two or three test cases have extensions, and those are ones copied from elsewhere.) The bad: • It’s inefficient: an absolute minimum of five allocations, and in practice it’ll tend to be a few more. (It wouldn’t be *terribly* difficult to make it perform exactly one allocation in all cases, like I did. A couple of the allocations are almost trivially removable.) Also it uses str.find(char) to check if characters can be removed, on strings as long as 10 chars/30 code units, which is much less efficient than a match block and probably less efficient even than using [char], while also writing boolean conditions in a way that doesn’t take advantage of short-circuiting (`STRING.find(c_orig).is_some() && last_processed_chr == '&'`, which would be better flipped to do the cheap char comparison first). • It’s confusing with some surprising scope: it’s not sanitising simple names, but rather some kind of thing where you can put parts on multiple lines and it’ll sanitise them individually and join them up and I can’t think of any serious reason why you’d actually want this—the potential benefits are negligible, and it makes it much more confusing. • Its code style is inconsistent: useful comments occasionally, superfluous comments sometimes but not other times, erroneous comments sometimes, a doc comment with a spurious asterisk in the middle. • It overuses constants and its documentation is poorly prepared: it’s written in an idiosyncratic private const-heavy style that makes reading the code hard and doesn’t translate to rustdoc at all, so you have to read the code to understand what’s actually going on, and the code’s overly complicated, so it takes quite some time to figure out what’s going on, constantly having to jump between the code and the consts as well. Actually, it looks like the author *tried* to make it rustdoc-friendly by pulling in const_format::concatcp! (clever compile-time magic) and exposing a couple of public consts, but quite apart from them being incomplete (as documented), it doesn’t even work because rustdoc doesn’t include the values of the consts in its output. (In theory perhaps the constants could be used by dependers as well, but I can’t imagine any genuine use case.) • It doesn’t do all it needs to: Windows reserved names are a TODO in the code. I contributed the essence of a patch, but no action has been taken after some weeks. (Actually, I started patching it further to make it more efficient, then decided I just wanted to start from scratch, and so I made sanitise-file-name.) • I have a problem with its entire philosphical approach, given the domain: as noted in the previous section, length limitations, not even acknowledging extensions are a thing and doing dangerous things in consequence, it’s just not the right approach for sanitising file names, an area where you should be caring about security. Ultimately, this is why I’m giving a rating of negative rather than neutral as I originally intended. My recommendation: avoid this crate. It’s not awful by any means, but it’s not well-written either, and my crate sanitise-file-name provides much the same functionality, but better. Its default options are quite similar to this crate’s, but more thorough. It doesn’t offer the strangne multi-line thing this crate does. ----- SIGN CREV PROOF ----- C30SLGF6ibx4QaLXo6uazHuIio1iP_rSmab7Vo6nLEeFZfuoaFTqiaMhPYflhvcGHEu6jImRBBIvmworq13bBg ----- END CREV PROOF ----- ----- BEGIN CREV PROOF ----- kind: package review version: -1 date: "2022-01-13T00:42:12.174512813+11:00" from: id-type: crev id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU url: "https://git.chrismorgan.info/crev-proofs" package: source: "https://crates.io" name: cfg-if version: 1.0.0 revision: e60fa1efeab0ec6e90c50d93ec526e1410459c23 digest: YmRVto2mw7U7GJhrOe13k_5Sgg9TcQWK1NuLN2Qiz5E review: thoroughness: high understanding: high rating: positive comment: |- This is a robust crate, clear, simple and well-documented within and without, powered by a well-designed tt gobbler, and written by Alex Crichton of exemplar reputation. I’m quite fond of the ``$(if … { … })else* else { … }`` trick, using the keyword ``else`` as the repetition delimiter. I knew you could put a variety of things there, not just characters like comma or semicolon, but I hadn’t connected the dots that that would extended to keywords. Hey, you can even use *string literals*! Crazy stuff. Anyway, that’s a neat way of keeping the public API to two rules rather than three as I would probably have done. I found one surprise in this: ``cfg_if! { else { true } }`` works, yielding true, because that repetition I just showed used ``*`` instead of ``+``. It’s not covered by the test suite, but it’s harmless. confirms it’s unintended. Since the 1.0.0 release, there have been [a few commits ]. Most are minor trivialities (yeah, I’ll even call Dependabot that, though I hate it and it’s not even useful here because this crate has no dependencies), but is significant and very mildly concerning: as well as twiddling some formatting (for the good), it changes that ``*`` to ``+`` to deny bare else, which is a breaking change, for unlikely but reasonable and plausible code. I’ve commented on #23 and #39 to this effect; we’ll see what comes of it. (It could be reverted, it could be ignored, it could be released as 2.0.0.) There are a couple of extraneous files in the package (.gitignore, .github/workflows/main.yml), but they’re small enough that I won’t complain. I think this crate is overused and most places it’s used would be better served by writing the negations themselves, which are often no longer and often clearer. ----- SIGN CREV PROOF ----- ex_TABmVr_CTEY6B5C-NDKk7ciKdCK1bLf6ltjpXyoVknvyAC2AnRjA5rlPX_5ORQMeHmzJ2UxWUR0RdgBjYAQ ----- END CREV PROOF ----- ----- BEGIN CREV PROOF ----- kind: package review version: -1 date: "2022-01-14T02:22:52.039384447+11:00" from: id-type: crev id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU url: "https://git.chrismorgan.info/crev-proofs" package: source: "https://crates.io" name: same-file version: 1.0.6 revision: 5799cd323b8eefd17a089c950dac113f66c89c9e digest: Yf7qyYw3W3WoHK4t74epRoJkZehfhWh-Hh-R0LfB8UQ review: thoroughness: high understanding: high rating: neutral comment: |- The good: ✓ Decently tested, including CI coverage of the declared MSRV (1.34.0). (As noted in src/lib.rs lines 485–491, edge case tests are difficult to synthesise; Andrew doesn’t know how to create those environmental conditions and I don’t really either. I suppose it’d entail reading file system specs and creating suitable file systems from scratch, and mounting them—not terribly difficult, but tedious work for limited gain.) ✓ The crate only supports its *functionality* on cfg(any(unix, windows)), but still provides a sane fallback on cfg(not(any(unix, windows))) so that all methods returns an explanatory I/O error. ✓ It’s unfortunate that false positives can arise, but given that it’s unavoidable, it is pleasingly well-document. ✓ Cargo.toml excludes CI files from the package (keep things small!) The neutral: • The I/O error returned on unknown platforms is of kind Other. I look forward to an MSRV bump to 1.53.0 so that Unsupported can be used. • The code has some cfg(target_os = "redox") that can be removed when the MSRV reaches Rust 1.38.0 (that’s when redox was moved to the unix family, in ). • Andrew uses his preferred Unlicense/MIT combo; [I have problems with Unlicense ]. • src/win.rs lines 17–32 acknowledge a straightforwardly-remediable gap that, if resolved, would by the sound of it eliminate false positives on at least the Windows Server + ReFS combo. This is something that someone that cared could contribute fairly easily, with access to Windows Server 2012 or later. (Microsoft’s documentation of FILE_ID_INFO *says* it’s Server-only, but given that ReFS is supported from Windows 8.1 onwards as well, I wonder if it might be *de facto* available elsewhere too.) • Handle::{stdin, stdout, stderr} bypass std::io::{Stdin, Stdout, Stderr} locking. I don’t much like this, but given that avoiding it would be very painful to the API, and that locking isn’t a memory-safety matter, already not being inviolable due to foreign code (though the std::io docs don’t seem to mention that), so I think this was the pragmatic choice. The bad: ✗ The file size comparison that is documented in two places as being there on Windows as a false positive avoidance heuristic is accidentally missing. This was reported and acknowledged as an error on 2021-05-19 in , but it still hasn’t been reinstated or the comments removed. ✗ Handle::{dev, ino} are Unix-specific methods; it would be better not to expose them on the type itself, but to move them to a platform-specific extension trait to make it more explicit, e.g. HandleUnixExt. ✗ std::os::unix::io::IntoRawFd implementation panics on Handle::{stdin, stdout, stderr}, due to careless Option handling (and the unwraps are accompanied by comments assuring you that it won’t panic 🙁). ManuallyDrop would be a better choice than Option here. Demonstration: ``std::os::unix::io::IntoRawFd::into_raw_fd(same_file::Handle::stdin().unwrap())``. Reported at https://github.com/BurntSushi/same-file/issues/55. I was going to rate it positive until I found this panic (the first two bads don’t warrant a downgrade). Fix just that, and I’ll update it to positive. ----- SIGN CREV PROOF ----- GlyWB-6rwNvY0qOppjx4AcRh1ta-btcpZ6OBaqPUg_Nr5WGCYWiNsfNWYENFHDLZpnPViTt7YaW78oPepAHhCA ----- END CREV PROOF -----