1 ----- BEGIN CREV PROOF -----
4 date: "2022-01-08T02:03:54.105679370+11:00"
7 id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
8 url: "https://git.chrismorgan.info/crev-proofs"
10 source: "https://crates.io"
13 revision: 67417456c81ccc241cb0e7257d6ca3a955e1d29e
14 digest: mM9owAVRQvXct8MS3WF2Fw8WZPgSv6lpM1WdjxWwwyQ
20 I wrote this carefully. It’s well-documented, well-tested, and robust against
21 panicking except as documented (Verhoeff::calculate_verhoeff_check_digit or
22 verhoeff::calculate, on invalid input). Actually, I realised while writing
23 this review that VerhoeffMut::push_verhoeff_check_digit is also panicky
24 because it calculates a check digit, but I don’t think that warrants reducing
25 the rating from strong to positive, so here we are. (I’ve pushed an
26 appropriate change, but I don’t think it warrants even a 1.0.1 release.)
27 ----- SIGN CREV PROOF -----
28 LJ2SYWIPHSNC49lmO1Bd4hGDAvT8lm-I4heOGrtdZPIRcDoOw2e9P4YxtKT_72r1Qwm3UfnpkCJqtM1UBkfkCw
29 ----- END CREV PROOF -----
31 ----- BEGIN CREV PROOF -----
34 date: "2022-01-08T02:06:31.277120223+11:00"
37 id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
38 url: "https://git.chrismorgan.info/crev-proofs"
40 source: "https://crates.io"
41 name: sanitise-file-name
43 revision: 2f75e28d1f5d097ae5931fd05dd1b4fe39ec8c08
44 digest: egqrezU_0kaH9V4t1GOLIZe9W2aKoGc-isplQ-9l3d0
50 I wrote this crate, with great care. (Sometimes it needed it!)
52 It has unfortunately ended up with rather a lot of code, but a fair bit of
53 that is related to Doing Things Right. The calculating of *exactly* how many
54 bytes may be needed is painfully involved (again a part of Doing Things
55 Right, especially as regards extension cleverness), but I’ve reviewed it
56 multiple times on different days and each time convinced myself that it is in
59 I believe that there are only two ways of causing this to panic, both easily
60 avoided by following instructions, and otherwise harmless. Firstly, giving
61 `sanitise_to` a `tinyvec_string::ArrayString` that doesn’t have enough space
62 remaining. That’s definite careless user error: `max_alloc_size_const` was
63 right there and clearly identified in the docs with explanation about the
64 whys and wherefores. Secondly, a slight variant of that: implementing
65 `Stringy` on some other type in a panicky fashion.
67 (Writing this review led me to add a mention of the possible panic to
68 sanitise_to’s docs, but I don’t think it warrants a 1.0.1 release.)
69 ----- SIGN CREV PROOF -----
70 ymZLNHdzGamyaPo5JyBuSNUY398pyzpjsFMsup1knXtscSZumHP3ZgzNC8KPO7VCQ-23XnQxapenIX27x_DdDw
71 ----- END CREV PROOF -----
73 ----- BEGIN CREV PROOF -----
76 date: "2022-01-08T02:07:39.869637389+11:00"
79 id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
80 url: "https://git.chrismorgan.info/crev-proofs"
82 source: "https://crates.io"
83 name: human-string-filler
85 revision: a02907505ed1b65bb08d6a77c3b9676ab6c07d02
86 digest: ZfyKwrp12UPlfOiQPJ9qhbKYeoXy5Il9oJgL_ePwUlA
92 I wrote this crate, carefully. It’s robust, well documented, well tested, and
93 entirely devoid of panics.
94 ----- SIGN CREV PROOF -----
95 uGwLNFZ_p6EV3AXkJYjJle9N4joGx7LkwEMkdONuNWlolx0UWuz9GVM9IqQrBbmdMorpDYkYSPW1wDxJJBuaAA
96 ----- END CREV PROOF -----
98 ----- BEGIN CREV PROOF -----
101 date: "2022-01-11T20:07:22.213361263+11:00"
104 id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
105 url: "https://git.chrismorgan.info/crev-proofs"
107 source: "https://crates.io"
108 name: sanitize-filename
110 revision: ece3260eb1f7b959125e5ca8f25781b31e875934
111 digest: WpQbem5BHnp6FnMIwNny-oY5xqc_7W0x4anVu2N-gnw
117 - source: "https://crates.io"
118 name: sanitise-file-name
120 This crate is straightforward and simple in a fairly good way, but it’s not
121 as efficient as it could be, has a couple of small but significant bugs,
122 has some defaults and behaviours that I reckon aren’t ideal.
126 • The code is simple. The actual sanitisation code is 27 lines
127 (not including the regular expression definitions).
129 • If nothing needs to be changed about the name (which is probably the most
130 common case), exactly one precisely-sized allocation is made. (This is
131 definitely a positive side-effect of using Regex::replace_all rather than
132 the obvious standard library alternative, str::replace, which would have
133 led to a minimum of three or five allocations.)
135 The not-so-good (that is, things that aren’t ideal but aren’t real problems):
137 • Unnecessary dependencies: regex and lazy_static are used, but I reckon
138 they’re not warranted. A character iterator and some push(char) or
139 push_str(replacement), plus some trim_end_matches() at the end in lieu of
140 WINDOWS_TRAILING_RE, would do the same job probably faster—though I will
141 admit that such a naive implementation would be prone to allocate slightly
142 more than this does when no replacements are made (though starting with
143 `String::with_capacity(name.len())` would nigh completely counter this),
144 and a smarter implementation that deferred allocation would be a bit longer
145 and not quite so “obviously correct”.
147 Mitigating this claim of unnecessary dependencies and heaviness, regex’s
148 default features have been turned off, which is good.
150 • It strips more than is necessary: U+0080–U+009F are not reserved by any
151 file system or operating system I know of, but this crate is stripping
152 them; and the names COM0 and LPT0 are not reserved by Windows, but this
153 crate is stripping them.
155 • Poor style on Regex and RegexBuilder: ILLEGAL_RE includes colon twice;
156 and WINDOWS_RESERVED_RE sets case insensitivity in two ways, `(?i)` and
157 `RegexBuilder::case_insensitive(true)` (so it could have just used
160 • It unnecessarily enables the unicode-case feature (which is heavy) on regex:
161 In its favour it did disable the default features, but unicode-case isn’t
162 needed, because WINDOWS_RESERVED_RE only matches ASCII. It would be better
163 to change (?i) to (?i-u) and disable the feature again.
165 • The defaults depend on your platform: the default value for
166 Options::windows is true on Windows and false elsewhere; varying things
167 like this is risky, and I think a poor choice in this particular case,
168 because you could be writing to a device subject to Windows’ limitations
169 (e.g. a local NTFS partition, a FAT32 or exFAT USB disk, a SMB network
170 share), and the difference in what Windows requires is so slight (assuming
171 the excessive stripping mentioned in the next point).
173 • With Options::windows false, it strips much more than is necessary:
174 ILLEGAL_RE is still applied, and it corresponds to a mix of NTFS/FAT32 and
175 Windows limitations; other platforms pretty much only need to worry about ␀
178 • Truncation operates on the entire name, not the base name: this is a
179 flaw found in almost all libraries like this; although it’s not
180 intrinsically a security vulnerability, it contributes to the critical
181 security vulnerability where you thought you checked the extension of a
182 file was OK, but then you sanitised it and thte extension changed. But
183 quite apart from the security aspect, file extensions changing unexpectedly
184 is a usability hazard.
186 • Documentation-wise, it’s kind of a black box: the README is fair, but
187 doesn’t explain in detail any of what’s done, and there are no doc comments
188 on anything. When it turns "aux.h" into "", you’ll say, “wait, *what*!?” if
189 you don’t know the deal.
191 The bad (that is, it’ll cause actual problems):
193 • It misses a character that Microsoft says isn’t allowed: U+007F is supposed
194 to be stripped, but isn’t.
196 • WINDOWS_TRAILING_RE is wrong: it should be stripping trailing spaces and
197 dots (`[. ]+$`), but is instead a duplicate of RESERVED_RE (`^\.+$`) and so
202 • Options::truncate shouldn’t be a bool with the number 255 hard-coded,
203 because it’s common to want to append an extension afterwards (especially
204 if you know about the truncation hazard!).
206 • I think "" is a mildly poor default for Options::replacement; I think "_"
207 would be a better choice, so that “1 < 0? A treatise”
208 becomes “1 _ 0_ A treatise” rather than “1 0 A treatise”.
210 • The treatment of reserved names is poor: they’re entirely replaced with the
211 replacement string, so by default "aux.h" will become "" (eh, I suppose
212 that’s not too awful), and with an underscore "aux.h" will become "_" (uh
213 oh, it lost its extension). I think a better choice is to alter the base
214 name; I’ve seen a double underscore prefix and an underscore prefix, and I
215 chose an underscore suffix in my crate.
217 • There should also be an option to remove leading dots, which hide files on
218 most platforms, and are fairly problematic under Windows.
220 • “Sanitise” looks so much better than “sanitize”. 😛
222 My recommendation: use my sanitise-file-name crate instead of this.
223 My crate’s code is certainly longer-winded because of its performance focus,
224 but it’s more thorough in its processing, with better filtering and better
225 defaults, quite apart from the (unbenchmarked) performance thing.
227 sanitise-file-name doesn’t cover *precisely* the same functionality as this
228 crate, but the differences are generally minor and fairly clearly described.
229 Here are the option translations:
231 sanitize_filename::Options.truncate == true (the default)
232 sanitise_file_name::Options.length_limit == 255 (the default)
234 sanitize_filename::Options.truncate == false
235 sanitise_file_name::Options.length_limit == usize::MAX
237 sanitize_filename::Options.windows (default cfg!(windows))
238 sanitise_file_name::Options.windows_safe (default true)
240 sanitize_filename::Options.replacement == "" (the default)
241 sanitise_file_name::Options.replace_with == None (NOT the default)
243 sanitize_filename::Options.replacement == "_" (any one character)
244 sanitise_file_name::Options.replace_with == Some('_') (the default)
246 sanitize_filename::Options.replacement == "string" (longer than one character)
247 (not supported in sanitise-file-name)
249 I haven’t factored the sanitize-filename binary into my reckoning here.
250 Some may find it useful, but I don’t think it will often be useful as a
251 standalone binary. And the binary’s usage string is found in the README, but
252 is not emitted by the binary itself, which is odd. Kinda fits in with not
253 documenting anything inside the code files!
254 ----- SIGN CREV PROOF -----
255 16uMNqMzp_nrGOi5JyJro5meslEjUV7coVZ5qxyobQ6So-H8DrkbjBSOntYC-1fP2VthjzzdfvvB__i6KakhCw
256 ----- END CREV PROOF -----
258 ----- BEGIN CREV PROOF -----
261 date: "2022-01-12T05:37:47.367354687+11:00"
264 id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
265 url: "https://git.chrismorgan.info/crev-proofs"
267 source: "https://crates.io"
268 name: sanitize-filename-reader-friendly
270 revision: 245eca36c588cb70f907e2167ecee93b0fd54d33
271 digest: f6Pot3fE2Kp4y5hrg6gx34D4YFt6oJGexGqMNwT2yko
277 - source: "https://crates.io"
278 name: sanitise-file-name
280 This crate does approximately what it’s supposed to, but it’s overengineered
281 in a bad way (complicated for no good reason), confusingly and insufficiently
282 documented, contains some perplexing functionality that I can’t think of any
283 good reason for the existence of, is not as efficient as it could be, and
284 misses a couple of fairly significant things in a file name sanitiser that
285 mean that what it returns might still not work.
287 I recommend my own crate sanitise-file-name instead, which is unfortunately
288 at least as complex, but for the good reasons of runtime efficiency and it
289 actually caring about extensions and correctness, and far better documented
294 • It does what it says on the tin. The sanitisation it performs *is* fairly
295 reader-friendly (though its rules are more convoluted than I think is
300 • There’s no name length limitation: this is actually a little subjective,
301 because implemented and used carelessly it leads to file extension change
302 attacks (which in the right environment could allow remote code execution),
303 but it’s generally expected.
305 • It strips or replaces some characters it probably shouldn’t:
307 • For example, U+200B is branded a non-printing character and removed. For
308 starters, there’s no such concept in Unicode; the closest is control
309 characters, and U+200B ZERO-WIDTH SPACE isn’t one of those. Removing it
310 serves no security purpose (unlike most of the Bidi_Control characters),
311 and will mangle some Unicode text where ZWSP is significant. You could
312 argue over it being a fingerprinting vector, but in that case why aren’t
313 ZWNJ and ZWJ filtered out, and after that how about confusables, &c.
314 So I say there’s just no good reason to remove U+200B.
316 • It replaces various characters that no file system objects to, such as ~
317 (replaced with _) and #, %, {, }, ^ and ` (replaced with space).
318 Most of those can be genuinely useful characters in file names.
320 • Some of the rules are fairly arbitrary, and some excessive: the decisions
321 about what gets turned into a space, what gets turned into an underscore,
322 and what gets removed are idiosyncratic with no explanation, and a couple
323 of them seem a little odd. Not harmful in any way, just… why those
324 particular rules? FILTER_ORIG_AFTER_LAST_PROCESSED_WAS_WHITESPACE is
325 perhaps the most bizarre and troubling: drop a few characters, including
326 `.`, the extension separator, if they come after a space. Remember my
327 earlier point about length limitation potentially opening a security
328 vulnerability? Well, this is a similar sort of issue, though unlikely to be
329 exploitable, that you can change the extension significantly:
330 `foo.exe .jpg` becomes `foo.exe jpg`.
332 I think it’s telling, philosophically, that the library completely ignores
333 extensions. It’s not worrying about security. (In the tests, only two or
334 three test cases have extensions, and those are ones copied from elsewhere.)
338 • It’s inefficient: an absolute minimum of five allocations, and in practice
339 it’ll tend to be a few more. (It wouldn’t be *terribly* difficult to make
340 it perform exactly one allocation in all cases, like I did. A couple of the
341 allocations are almost trivially removable.) Also it uses str.find(char) to
342 check if characters can be removed, on strings as long as 10 chars/30 code
343 units, which is much less efficient than a match block and probably less
344 efficient even than using [char], while also writing boolean conditions in
345 a way that doesn’t take advantage of short-circuiting
346 (`STRING.find(c_orig).is_some() && last_processed_chr == '&'`, which would
347 be better flipped to do the cheap char comparison first).
349 • It’s confusing with some surprising scope: it’s not sanitising simple
350 names, but rather some kind of thing where you can put parts on multiple
351 lines and it’ll sanitise them individually and join them up and I can’t
352 think of any serious reason why you’d actually want this—the potential
353 benefits are negligible, and it makes it much more confusing.
355 • Its code style is inconsistent: useful comments occasionally, superfluous
356 comments sometimes but not other times, erroneous comments sometimes, a doc
357 comment with a spurious asterisk in the middle.
359 • It overuses constants and its documentation is poorly prepared: it’s
360 written in an idiosyncratic private const-heavy style that makes reading
361 the code hard and doesn’t translate to rustdoc at all, so you have to read
362 the code to understand what’s actually going on, and the code’s overly
363 complicated, so it takes quite some time to figure out what’s going on,
364 constantly having to jump between the code and the consts as well.
365 Actually, it looks like the author *tried* to make it rustdoc-friendly by
366 pulling in const_format::concatcp! (clever compile-time magic) and exposing
367 a couple of public consts, but quite apart from them being incomplete (as
368 documented), it doesn’t even work because rustdoc doesn’t include the
369 values of the consts in its output. (In theory perhaps the constants could
370 be used by dependers as well, but I can’t imagine any genuine use case.)
372 • It doesn’t do all it needs to: Windows reserved names are a TODO in the
373 code. I contributed the essence of a patch, but no action has been taken
374 after some weeks. (Actually, I started patching it further to make it more
375 efficient, then decided I just wanted to start from scratch, and so I made
378 • I have a problem with its entire philosphical approach, given the domain:
379 as noted in the previous section, length limitations, not even
380 acknowledging extensions are a thing and doing dangerous things in
381 consequence, it’s just not the right approach for sanitising file names, an
382 area where you should be caring about security. Ultimately, this is why I’m
383 giving a rating of negative rather than neutral as I originally intended.
385 My recommendation: avoid this crate. It’s not awful by any means, but it’s
386 not well-written either, and my crate sanitise-file-name provides much the
387 same functionality, but better. Its default options are quite similar to this
388 crate’s, but more thorough. It doesn’t offer the strangne multi-line thing
390 ----- SIGN CREV PROOF -----
391 C30SLGF6ibx4QaLXo6uazHuIio1iP_rSmab7Vo6nLEeFZfuoaFTqiaMhPYflhvcGHEu6jImRBBIvmworq13bBg
392 ----- END CREV PROOF -----
394 ----- BEGIN CREV PROOF -----
397 date: "2022-01-13T00:42:12.174512813+11:00"
400 id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
401 url: "https://git.chrismorgan.info/crev-proofs"
403 source: "https://crates.io"
406 revision: e60fa1efeab0ec6e90c50d93ec526e1410459c23
407 digest: YmRVto2mw7U7GJhrOe13k_5Sgg9TcQWK1NuLN2Qiz5E
413 This is a robust crate, clear, simple and well-documented within and without,
414 powered by a well-designed tt gobbler, and written by Alex Crichton of
417 I’m quite fond of the ``$(if … { … })else* else { … }`` trick, using the
418 keyword ``else`` as the repetition delimiter. I knew you could put a variety
419 of things there, not just characters like comma or semicolon, but I hadn’t
420 connected the dots that that would extended to keywords. Hey, you can even
421 use *string literals*! Crazy stuff. Anyway, that’s a neat way of keeping the
422 public API to two rules rather than three as I would probably have done.
424 I found one surprise in this: ``cfg_if! { else { true } }`` works, yielding
425 true, because that repetition I just showed used ``*`` instead of ``+``.
426 It’s not covered by the test suite, but it’s harmless.
427 <https://github.com/alexcrichton/cfg-if/issues/23> confirms it’s unintended.
429 Since the 1.0.0 release, there have been [a few commits
430 <https://github.com/alexcrichton/cfg-if/commits/main>]. Most are minor
431 trivialities (yeah, I’ll even call Dependabot that, though I hate it and it’s
432 not even useful here because this crate has no dependencies), but
433 <https://github.com/alexcrichton/cfg-if/commit/349913689695130c2c63d1075e57b0917c98ee6e>
434 is significant and very mildly concerning: as well as twiddling some
435 formatting (for the good), it changes that ``*`` to ``+`` to deny bare else,
436 which is a breaking change, for unlikely but reasonable and plausible code.
437 I’ve commented on #23 and #39 to this effect; we’ll see what comes of it.
438 (It could be reverted, it could be ignored, it could be released as 2.0.0.)
440 There are a couple of extraneous files in the package (.gitignore,
441 .github/workflows/main.yml), but they’re small enough that I won’t complain.
443 I think this crate is overused and most places it’s used would be better
444 served by writing the negations themselves, which are often no longer and
446 ----- SIGN CREV PROOF -----
447 ex_TABmVr_CTEY6B5C-NDKk7ciKdCK1bLf6ltjpXyoVknvyAC2AnRjA5rlPX_5ORQMeHmzJ2UxWUR0RdgBjYAQ
448 ----- END CREV PROOF -----