1 These are miscellaneous tests of my own division.
4 The quick brown fox jumps over the lazy doggerel.txt
5 well _ then _ . __ how about this _
6 Once upon a time there was a file name sanitiser; it was a good file name sanitiser, and never exposed security vulnerabilities to the World. “It’s a dangerous place,” its grand. “If a wolf should come out of the forest, then what would you do_”
7 (Some Peter and the Wolf snuck in there.)
9 C__WINDOWS_system32_driver_etc_hosts
10 %WINDIR%_system32_driver_etc_hosts
11 Kinda funny how Windows has a _etc_hosts.
18 I’m basically just typing random stuff here.
19 OK, time for some more serious stuff.
21 For Unicode paths, some file systems limit paths to roughly 255 UTF-8 code units, others to roughly 255 UTF-16 code units. UTF-8 is the tighter of these restrictions in all circumstances_ UTF-16 uses one code unit until U+F. Now then_ one-byte characters_
22 # One-byte characters_
23 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
24 12345678901234567890.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
25 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678.abcdefghijklmnopqrstuvwxyz
26 # Two-byte characters_
27 áɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠ
28 áɓç.°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³
29 áɓçđéƒɠ.°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³
30 áɓçđéƒɠɦïķá.°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³
31 áɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓç.°¹²³
32 # Three-byte characters_
33 ‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“
34 ‐‑‒–—.₁₂₃₄₅₆₇₈₉₀₁₂₃₄₅₆₇₈₉₀₁₂₃₄₅₆₇₈₉₀₁₂₃₄₅₆₇₈₉₀₁₂₃₄₅₆₇₈₉₀₁₂₃₄₅₆₇₈₉₀₁₂₃₄₅₆₇₈₉₀₁₂₃₄₅₆₇₈₉
35 ‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑.₁₂₃₄₅₆₇₈₉₀
36 # Four-byte characters_
37 𐀀𐀁𐀂𐀃𐀄𐀅𐀆𐀇𐀈𐀉𐀊𐀋𐀍𐀎𐀏𐀐𐀑𐀒𐀓𐀔𐀕𐀖𐀗𐀘𐀙𐀚𐀛𐀜𐀝𐀞𐀟𐀠𐀡𐀢𐀣𐀤𐀥𐀦𐀨𐀩𐀪𐀫𐀬𐀭𐀮𐀯𐀰𐀱𐀲𐀳𐀴𐀵𐀶𐀷𐀸𐀹𐀺𐀼𐀽𐀿𐁀𐁁𐁂
38 𐀀𐀁𐀂𐀃𐀄𐀅𐀆.𐂀𐂁𐂂𐂃𐂄𐂅𐂆𐂇𐂈𐂉𐂊𐂋𐂌𐂍𐂀𐂁𐂂𐂃𐂄𐂅𐂆𐂇𐂈𐂉𐂊𐂋𐂌𐂍𐂀𐂁𐂂𐂃𐂄𐂅𐂆𐂇𐂈𐂉𐂊𐂋𐂌𐂍𐂀𐂁𐂂𐂃𐂄𐂅𐂆𐂇𐂈𐂉𐂊𐂋𐂌𐂍
39 𐀀𐀁𐀂𐀃𐀄𐀅𐀆𐀇𐀈𐀉𐀊𐀋𐀍𐀎𐀏𐀐𐀑𐀒𐀓𐀔𐀕𐀖𐀗𐀘𐀙𐀚𐀛𐀜𐀝𐀞𐀟𐀠𐀡𐀢𐀣𐀤𐀥𐀦𐀨𐀩𐀪𐀫𐀬𐀭𐀮𐀯𐀰𐀱𐀲.𐂀𐂁𐂂𐂃𐂄𐂅𐂆𐂇𐂈𐂉𐂊𐂋𐂌𐂍
53 Some sanitisers try stripping out ZWSP (), which can be used as a fingerprinting vector and has no particularly legitimate purpose in a file name; I’m not, because removing it doesn’t solve the fingerprinting risk, as you can use ZWNJ and ZWJ (.)