1.0.0
[sanitise-file-name] / tests / misc.no-extension_cleverness.sanitised
1 These are miscellaneous tests of my own division
2 _
3 hello_world
4 The quick brown fox jumps over the lazy doggerel.txt
5 well _ then . how about this
6 Once upon a time there was a file name sanitiser; it was a good file name sanitiser, and never exposed security vulnerabilities to the World. “It’s a dangerous place,” its grandfather said. “If a wolf should come out of the forest, then what would
7 (Some Peter and the Wolf snuck in there.)
8 hidden
9 C__WINDOWS_system32_driver_etc_hosts
10 %WINDIR%_system32_driver_etc_hosts
11 Kinda funny how Windows has a _etc_hosts
12 _
13 _
14 _
15 _
16 _
17 _
18 I’m basically just typing random stuff here
19 OK, time for some more serious stuff
20 _
21 For Unicode paths, some file systems limit paths to roughly 255 UTF-8 code units, others to roughly 255 UTF-16 code units. UTF-8 is the tighter of these restrictions in all circumstances_ UTF-16 uses one code unit until U+FFFF, and two beyond that, while
22 # One-byte characters
23 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
24 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
25 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
26 # Two-byte characters
27 áɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠ
28 áɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠ
29 áɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠ
30 áɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠ
31 áɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠ
32 # Three-byte characters
33 ‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“
34 ‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“
35 ‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“
36 # Four-byte characters
37 𐀀𐀁𐀂𐀃𐀄𐀅𐀆𐀇𐀈𐀉𐀊𐀋𐀍𐀎𐀏𐀐𐀑𐀒𐀓𐀔𐀕𐀖𐀗𐀘𐀙𐀚𐀛𐀜𐀝𐀞𐀟𐀠𐀡𐀢𐀣𐀤𐀥𐀦𐀨𐀩𐀪𐀫𐀬𐀭𐀮𐀯𐀰𐀱𐀲𐀳𐀴𐀵𐀶𐀷𐀸𐀹𐀺𐀼𐀽𐀿𐁀𐁁𐁂
38 𐀀𐀁𐀂𐀃𐀄𐀅𐀆𐀇𐀈𐀉𐀊𐀋𐀍𐀎𐀏𐀐𐀑𐀒𐀓𐀔𐀕𐀖𐀗𐀘𐀙𐀚𐀛𐀜𐀝𐀞𐀟𐀠𐀡𐀢𐀣𐀤𐀥𐀦𐀨𐀩𐀪𐀫𐀬𐀭𐀮𐀯𐀰𐀱𐀲𐀳𐀴𐀵𐀶𐀷𐀸𐀹𐀺𐀼𐀽𐀿𐁀𐁁𐁂
39 𐀀𐀁𐀂𐀃𐀄𐀅𐀆𐀇𐀈𐀉𐀊𐀋𐀍𐀎𐀏𐀐𐀑𐀒𐀓𐀔𐀕𐀖𐀗𐀘𐀙𐀚𐀛𐀜𐀝𐀞𐀟𐀠𐀡𐀢𐀣𐀤𐀥𐀦𐀨𐀩𐀪𐀫𐀬𐀭𐀮𐀯𐀰𐀱𐀲𐀳𐀴𐀵𐀶𐀷𐀸𐀹𐀺𐀼𐀽𐀿𐁀𐁁𐁂
40 _
41 abcdef.ghij
42 abcde.fghij
43 AUX.abcdef
44 lpT7.abcdef
45 cOm6.abcdef
46 CON_
47 aux.h
48 Lpt1.exe
49 xyz
50 nül
51 COM1.jpg.png
52 _
53 Some sanitisers try stripping out ZWSP (​), which can be used as a fingerprinting vector and has no particularly legitimate purpose in a file name; I’m not, because removing it doesn’t solve the fingerprinting risk, as you can use ZWNJ and ZWJ (‌