From: Chris Morgan Date: Wed, 12 Jan 2022 05:00:08 +0000 (+1100) Subject: Add review for sanitize-filename-reader-friendly v2.2.1 X-Git-Url: https://git.chrismorgan.info/crev-proofs/commitdiff_plain/97a29736275f23864bee3d838225c2db29305889 Add review for sanitize-filename-reader-friendly v2.2.1 --- diff --git a/QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU/reviews/2022-01-package-jqCnIw.proof.crev b/QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU/reviews/2022-01-package-jqCnIw.proof.crev index af932fd..e71be1a 100644 --- a/QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU/reviews/2022-01-package-jqCnIw.proof.crev +++ b/QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU/reviews/2022-01-package-jqCnIw.proof.crev @@ -255,3 +255,139 @@ comment: |- 16uMNqMzp_nrGOi5JyJro5meslEjUV7coVZ5qxyobQ6So-H8DrkbjBSOntYC-1fP2VthjzzdfvvB__i6KakhCw ----- END CREV PROOF ----- +----- BEGIN CREV PROOF ----- +kind: package review +version: -1 +date: "2022-01-12T05:37:47.367354687+11:00" +from: + id-type: crev + id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU + url: "https://git.chrismorgan.info/crev-proofs" +package: + source: "https://crates.io" + name: sanitize-filename-reader-friendly + version: 2.2.1 + revision: 245eca36c588cb70f907e2167ecee93b0fd54d33 + digest: f6Pot3fE2Kp4y5hrg6gx34D4YFt6oJGexGqMNwT2yko +review: + thoroughness: high + understanding: high + rating: negative +alternatives: + - source: "https://crates.io" + name: sanitise-file-name +comment: |- + This crate does approximately what it’s supposed to, but it’s overengineered + in a bad way (complicated for no good reason), confusingly and insufficiently + documented, contains some perplexing functionality that I can’t think of any + good reason for the existence of, is not as efficient as it could be, and + misses a couple of fairly significant things in a file name sanitiser that + mean that what it returns might still not work. + + I recommend my own crate sanitise-file-name instead, which is unfortunately + at least as complex, but for the good reasons of runtime efficiency and it + actually caring about extensions and correctness, and far better documented + and explained. + + The good: + + • It does what it says on the tin. The sanitisation it performs *is* fairly + reader-friendly (though its rules are more convoluted than I think is + warranted). + + The not-so-good: + + • There’s no name length limitation: this is actually a little subjective, + because implemented and used carelessly it leads to file extension change + attacks (which in the right environment could allow remote code execution), + but it’s generally expected. + + • It strips or replaces some characters it probably shouldn’t: + + • For example, U+200B is branded a non-printing character and removed. For + starters, there’s no such concept in Unicode; the closest is control + characters, and U+200B ZERO-WIDTH SPACE isn’t one of those. Removing it + serves no security purpose (unlike most of the Bidi_Control characters), + and will mangle some Unicode text where ZWSP is significant. You could + argue over it being a fingerprinting vector, but in that case why aren’t + ZWNJ and ZWJ filtered out, and after that how about confusables, &c. + So I say there’s just no good reason to remove U+200B. + + • It replaces various characters that no file system objects to, such as ~ + (replaced with _) and #, %, {, }, ^ and ` (replaced with space). + Most of those can be genuinely useful characters in file names. + + • Some of the rules are fairly arbitrary, and some excessive: the decisions + about what gets turned into a space, what gets turned into an underscore, + and what gets removed are idiosyncratic with no explanation, and a couple + of them seem a little odd. Not harmful in any way, just… why those + particular rules? FILTER_ORIG_AFTER_LAST_PROCESSED_WAS_WHITESPACE is + perhaps the most bizarre and troubling: drop a few characters, including + `.`, the extension separator, if they come after a space. Remember my + earlier point about length limitation potentially opening a security + vulnerability? Well, this is a similar sort of issue, though unlikely to be + exploitable, that you can change the extension significantly: + `foo.exe .jpg` becomes `foo.exe jpg`. + + I think it’s telling, philosophically, that the library completely ignores + extensions. It’s not worrying about security. (In the tests, only two or + three test cases have extensions, and those are ones copied from elsewhere.) + + The bad: + + • It’s inefficient: an absolute minimum of five allocations, and in practice + it’ll tend to be a few more. (It wouldn’t be *terribly* difficult to make + it perform exactly one allocation in all cases, like I did. A couple of the + allocations are almost trivially removable.) Also it uses str.find(char) to + check if characters can be removed, on strings as long as 10 chars/30 code + units, which is much less efficient than a match block and probably less + efficient even than using [char], while also writing boolean conditions in + a way that doesn’t take advantage of short-circuiting + (`STRING.find(c_orig).is_some() && last_processed_chr == '&'`, which would + be better flipped to do the cheap char comparison first). + + • It’s confusing with some surprising scope: it’s not sanitising simple + names, but rather some kind of thing where you can put parts on multiple + lines and it’ll sanitise them individually and join them up and I can’t + think of any serious reason why you’d actually want this—the potential + benefits are negligible, and it makes it much more confusing. + + • Its code style is inconsistent: useful comments occasionally, superfluous + comments sometimes but not other times, erroneous comments sometimes, a doc + comment with a spurious asterisk in the middle. + + • It overuses constants and its documentation is poorly prepared: it’s + written in an idiosyncratic private const-heavy style that makes reading + the code hard and doesn’t translate to rustdoc at all, so you have to read + the code to understand what’s actually going on, and the code’s overly + complicated, so it takes quite some time to figure out what’s going on, + constantly having to jump between the code and the consts as well. + Actually, it looks like the author *tried* to make it rustdoc-friendly by + pulling in const_format::concatcp! (clever compile-time magic) and exposing + a couple of public consts, but quite apart from them being incomplete (as + documented), it doesn’t even work because rustdoc doesn’t include the + values of the consts in its output. (In theory perhaps the constants could + be used by dependers as well, but I can’t imagine any genuine use case.) + + • It doesn’t do all it needs to: Windows reserved names are a TODO in the + code. I contributed the essence of a patch, but no action has been taken + after some weeks. (Actually, I started patching it further to make it more + efficient, then decided I just wanted to start from scratch, and so I made + sanitise-file-name.) + + • I have a problem with its entire philosphical approach, given the domain: + as noted in the previous section, length limitations, not even + acknowledging extensions are a thing and doing dangerous things in + consequence, it’s just not the right approach for sanitising file names, an + area where you should be caring about security. Ultimately, this is why I’m + giving a rating of negative rather than neutral as I originally intended. + + My recommendation: avoid this crate. It’s not awful by any means, but it’s + not well-written either, and my crate sanitise-file-name provides much the + same functionality, but better. Its default options are quite similar to this + crate’s, but more thorough. It doesn’t offer the strangne multi-line thing + this crate does. +----- SIGN CREV PROOF ----- +C30SLGF6ibx4QaLXo6uazHuIio1iP_rSmab7Vo6nLEeFZfuoaFTqiaMhPYflhvcGHEu6jImRBBIvmworq13bBg +----- END CREV PROOF ----- +