1 ----- BEGIN CREV PROOF -----
4 date: "2022-01-08T02:03:54.105679370+11:00"
7 id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
8 url: "https://git.chrismorgan.info/crev-proofs"
10 source: "https://crates.io"
13 revision: 67417456c81ccc241cb0e7257d6ca3a955e1d29e
14 digest: mM9owAVRQvXct8MS3WF2Fw8WZPgSv6lpM1WdjxWwwyQ
20 I wrote this carefully. It’s well-documented, well-tested, and robust against
21 panicking except as documented (Verhoeff::calculate_verhoeff_check_digit or
22 verhoeff::calculate, on invalid input). Actually, I realised while writing
23 this review that VerhoeffMut::push_verhoeff_check_digit is also panicky
24 because it calculates a check digit, but I don’t think that warrants reducing
25 the rating from strong to positive, so here we are. (I’ve pushed an
26 appropriate change, but I don’t think it warrants even a 1.0.1 release.)
27 ----- SIGN CREV PROOF -----
28 LJ2SYWIPHSNC49lmO1Bd4hGDAvT8lm-I4heOGrtdZPIRcDoOw2e9P4YxtKT_72r1Qwm3UfnpkCJqtM1UBkfkCw
29 ----- END CREV PROOF -----
31 ----- BEGIN CREV PROOF -----
34 date: "2022-01-08T02:06:31.277120223+11:00"
37 id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
38 url: "https://git.chrismorgan.info/crev-proofs"
40 source: "https://crates.io"
41 name: sanitise-file-name
43 revision: 2f75e28d1f5d097ae5931fd05dd1b4fe39ec8c08
44 digest: egqrezU_0kaH9V4t1GOLIZe9W2aKoGc-isplQ-9l3d0
50 I wrote this crate, with great care. (Sometimes it needed it!)
52 It has unfortunately ended up with rather a lot of code, but a fair bit of
53 that is related to Doing Things Right. The calculating of *exactly* how many
54 bytes may be needed is painfully involved (again a part of Doing Things
55 Right, especially as regards extension cleverness), but I’ve reviewed it
56 multiple times on different days and each time convinced myself that it is in
59 I believe that there are only two ways of causing this to panic, both easily
60 avoided by following instructions, and otherwise harmless. Firstly, giving
61 `sanitise_to` a `tinyvec_string::ArrayString` that doesn’t have enough space
62 remaining. That’s definite careless user error: `max_alloc_size_const` was
63 right there and clearly identified in the docs with explanation about the
64 whys and wherefores. Secondly, a slight variant of that: implementing
65 `Stringy` on some other type in a panicky fashion.
67 (Writing this review led me to add a mention of the possible panic to
68 sanitise_to’s docs, but I don’t think it warrants a 1.0.1 release.)
69 ----- SIGN CREV PROOF -----
70 ymZLNHdzGamyaPo5JyBuSNUY398pyzpjsFMsup1knXtscSZumHP3ZgzNC8KPO7VCQ-23XnQxapenIX27x_DdDw
71 ----- END CREV PROOF -----
73 ----- BEGIN CREV PROOF -----
76 date: "2022-01-08T02:07:39.869637389+11:00"
79 id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
80 url: "https://git.chrismorgan.info/crev-proofs"
82 source: "https://crates.io"
83 name: human-string-filler
85 revision: a02907505ed1b65bb08d6a77c3b9676ab6c07d02
86 digest: ZfyKwrp12UPlfOiQPJ9qhbKYeoXy5Il9oJgL_ePwUlA
92 I wrote this crate, carefully. It’s robust, well documented, well tested, and
93 entirely devoid of panics.
94 ----- SIGN CREV PROOF -----
95 uGwLNFZ_p6EV3AXkJYjJle9N4joGx7LkwEMkdONuNWlolx0UWuz9GVM9IqQrBbmdMorpDYkYSPW1wDxJJBuaAA
96 ----- END CREV PROOF -----
98 ----- BEGIN CREV PROOF -----
101 date: "2022-01-11T20:07:22.213361263+11:00"
104 id: QE5OVlHZ4QyOcMqdjXhS1MgsoZHvUqxOHNZwyfpsDIU
105 url: "https://git.chrismorgan.info/crev-proofs"
107 source: "https://crates.io"
108 name: sanitize-filename
110 revision: ece3260eb1f7b959125e5ca8f25781b31e875934
111 digest: WpQbem5BHnp6FnMIwNny-oY5xqc_7W0x4anVu2N-gnw
117 - source: "https://crates.io"
118 name: sanitise-file-name
120 This crate is straightforward and simple in a fairly good way, but it’s not
121 as efficient as it could be, has a couple of small but significant bugs,
122 has some defaults and behaviours that I reckon aren’t ideal.
126 • The code is simple. The actual sanitisation code is 27 lines
127 (not including the regular expression definitions).
129 • If nothing needs to be changed about the name (which is probably the most
130 common case), exactly one precisely-sized allocation is made. (This is
131 definitely a positive side-effect of using Regex::replace_all rather than
132 the obvious standard library alternative, str::replace, which would have
133 led to a minimum of three or five allocations.)
135 The not-so-good (that is, things that aren’t ideal but aren’t real problems):
137 • Unnecessary dependencies: regex and lazy_static are used, but I reckon
138 they’re not warranted. A character iterator and some push(char) or
139 push_str(replacement), plus some trim_end_matches() at the end in lieu of
140 WINDOWS_TRAILING_RE, would do the same job probably faster—though I will
141 admit that such a naive implementation would be prone to allocate slightly
142 more than this does when no replacements are made (though starting with
143 `String::with_capacity(name.len())` would nigh completely counter this),
144 and a smarter implementation that deferred allocation would be a bit longer
145 and not quite so “obviously correct”.
147 Mitigating this claim of unnecessary dependencies and heaviness, regex’s
148 default features have been turned off, which is good.
150 • It strips more than is necessary: U+0080–U+009F are not reserved by any
151 file system or operating system I know of, but this crate is stripping
152 them; and the names COM0 and LPT0 are not reserved by Windows, but this
153 crate is stripping them.
155 • Poor style on Regex and RegexBuilder: ILLEGAL_RE includes colon twice;
156 and WINDOWS_RESERVED_RE sets case insensitivity in two ways, `(?i)` and
157 `RegexBuilder::case_insensitive(true)` (so it could have just used
160 • It unnecessarily enables the unicode-case feature (which is heavy) on regex:
161 In its favour it did disable the default features, but unicode-case isn’t
162 needed, because WINDOWS_RESERVED_RE only matches ASCII. It would be better
163 to change (?i) to (?i-u) and disable the feature again.
165 • The defaults depend on your platform: the default value for
166 Options::windows is true on Windows and false elsewhere; varying things
167 like this is risky, and I think a poor choice in this particular case,
168 because you could be writing to a device subject to Windows’ limitations
169 (e.g. a local NTFS partition, a FAT32 or exFAT USB disk, a SMB network
170 share), and the difference in what Windows requires is so slight (assuming
171 the excessive stripping mentioned in the next point).
173 • With Options::windows false, it strips much more than is necessary:
174 ILLEGAL_RE is still applied, and it corresponds to a mix of NTFS/FAT32 and
175 Windows limitations; other platforms pretty much only need to worry about ␀
178 • Truncation operates on the entire name, not the base name: this is a
179 flaw found in almost all libraries like this; although it’s not
180 intrinsically a security vulnerability, it contributes to the critical
181 security vulnerability where you thought you checked the extension of a
182 file was OK, but then you sanitised it and thte extension changed. But
183 quite apart from the security aspect, file extensions changing unexpectedly
184 is a usability hazard.
186 • Documentation-wise, it’s kind of a black box: the README is fair, but
187 doesn’t explain in detail any of what’s done, and there are no doc comments
188 on anything. When it turns "aux.h" into "", you’ll say, “wait, *what*!?” if
189 you don’t know the deal.
191 The bad (that is, it’ll cause actual problems):
193 • It misses a character that Microsoft says isn’t allowed: U+007F is supposed
194 to be stripped, but isn’t.
196 • WINDOWS_TRAILING_RE is wrong: it should be stripping trailing spaces and
197 dots (`[. ]+$`), but is instead a duplicate of RESERVED_RE (`^\.+$`) and so
202 • Options::truncate shouldn’t be a bool with the number 255 hard-coded,
203 because it’s common to want to append an extension afterwards (especially
204 if you know about the truncation hazard!).
206 • I think "" is a mildly poor default for Options::replacement; I think "_"
207 would be a better choice, so that “1 < 0? A treatise”
208 becomes “1 _ 0_ A treatise” rather than “1 0 A treatise”.
210 • The treatment of reserved names is poor: they’re entirely replaced with the
211 replacement string, so by default "aux.h" will become "" (eh, I suppose
212 that’s not too awful), and with an underscore "aux.h" will become "_" (uh
213 oh, it lost its extension). I think a better choice is to alter the base
214 name; I’ve seen a double underscore prefix and an underscore prefix, and I
215 chose an underscore suffix in my crate.
217 • There should also be an option to remove leading dots, which hide files on
218 most platforms, and are fairly problematic under Windows.
220 • “Sanitise” looks so much better than “sanitize”. 😛
222 My recommendation: use my sanitise-file-name crate instead of this.
223 My crate’s code is certainly longer-winded because of its performance focus,
224 but it’s more thorough in its processing, with better filtering and better
225 defaults, quite apart from the (unbenchmarked) performance thing.
227 sanitise-file-name doesn’t cover *precisely* the same functionality as this
228 crate, but the differences are generally minor and fairly clearly described.
229 Here are the option translations:
231 sanitize_filename::Options.truncate == true (the default)
232 sanitise_file_name::Options.length_limit == 255 (the default)
234 sanitize_filename::Options.truncate == false
235 sanitise_file_name::Options.length_limit == usize::MAX
237 sanitize_filename::Options.windows (default cfg!(windows))
238 sanitise_file_name::Options.windows_safe (default true)
240 sanitize_filename::Options.replacement == "" (the default)
241 sanitise_file_name::Options.replace_with == None (NOT the default)
243 sanitize_filename::Options.replacement == "_" (any one character)
244 sanitise_file_name::Options.replace_with == Some('_') (the default)
246 sanitize_filename::Options.replacement == "string" (longer than one character)
247 (not supported in sanitise-file-name)
249 I haven’t factored the sanitize-filename binary into my reckoning here.
250 Some may find it useful, but I don’t think it will often be useful as a
251 standalone binary. And the binary’s usage string is found in the README, but
252 is not emitted by the binary itself, which is odd. Kinda fits in with not
253 documenting anything inside the code files!
254 ----- SIGN CREV PROOF -----
255 16uMNqMzp_nrGOi5JyJro5meslEjUV7coVZ5qxyobQ6So-H8DrkbjBSOntYC-1fP2VthjzzdfvvB__i6KakhCw
256 ----- END CREV PROOF -----