[sanitise-file-name] / tests / misc.realistic-length_limit-reduction.sanitised
These are miscellaneous tests of my own division.
_
hello_world
The quick brown fox jumps over the lazy doggerel.txt
well _ then.how about this
Once upon a time there was a file name sanitiser; it was a good file name sanitiser, and never exposed security vulnerabilities to the World. “It’s a dangerous place,” its gr.“If a wolf should come out of the forest, then what would you do_”
(Some Peter and the Wolf snuck in there.)
.hidden
C__WINDOWS_system32_driver_etc_hosts
%WINDIR%_system32_driver_etc_hosts
Kinda funny how Windows has a _etc_hosts.
_
_
_
_
_
_
I’m basically just typing random stuff here.
OK, time for some more serious stuff.
_
For Unicode paths, some file systems limit paths to roughly 255 UTF-8 code units, others to roughly 255 UTF-16 code units. UTF-8 is the tighter of these restrictions in all circumstances_ UTF-16 uses one code unit until U.Now then_ one-byte characters
# One-byte characters
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901
1234567890123456.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234.abcdefghijklmnopqrstuvwxyz
# Two-byte characters
áɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđé
áɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđé
áɓçđé.°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³
áɓçđéƒɠɦï.°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³°¹²³
áɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķáɓçđéƒɠɦïķá.°¹²³
# Three-byte characters
‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚
‐‑‒–.₁₂₃₄₅₆₇₈₉₀₁₂₃₄₅₆₇₈₉₀₁₂₃₄₅₆₇₈₉₀₁₂₃₄₅₆₇₈₉₀₁₂₃₄₅₆₇₈₉₀₁₂₃₄₅₆₇₈₉₀₁₂₃₄₅₆₇₈₉₀₁₂₃₄₅₆₇₈₉
‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‣․‥…‧‐.₁₂₃₄₅₆₇₈₉₀
# Four-byte characters
𐀀𐀁𐀂𐀃𐀄𐀅𐀆𐀇𐀈𐀉𐀊𐀋𐀍𐀎𐀏𐀐𐀑𐀒𐀓𐀔𐀕𐀖𐀗𐀘𐀙𐀚𐀛𐀜𐀝𐀞𐀟𐀠𐀡𐀢𐀣𐀤𐀥𐀦𐀨𐀩𐀪𐀫𐀬𐀭𐀮𐀯𐀰𐀱𐀲𐀳𐀴𐀵𐀶𐀷𐀸𐀹𐀺𐀼𐀽𐀿𐁀𐁁
𐀀𐀁𐀂𐀃𐀄𐀅.𐂀𐂁𐂂𐂃𐂄𐂅𐂆𐂇𐂈𐂉𐂊𐂋𐂌𐂍𐂀𐂁𐂂𐂃𐂄𐂅𐂆𐂇𐂈𐂉𐂊𐂋𐂌𐂍𐂀𐂁𐂂𐂃𐂄𐂅𐂆𐂇𐂈𐂉𐂊𐂋𐂌𐂍𐂀𐂁𐂂𐂃𐂄𐂅𐂆𐂇𐂈𐂉𐂊𐂋𐂌𐂍
𐀀𐀁𐀂𐀃𐀄𐀅𐀆𐀇𐀈𐀉𐀊𐀋𐀍𐀎𐀏𐀐𐀑𐀒𐀓𐀔𐀕𐀖𐀗𐀘𐀙𐀚𐀛𐀜𐀝𐀞𐀟𐀠𐀡𐀢𐀣𐀤𐀥𐀦𐀨𐀩𐀪𐀫𐀬𐀭𐀮𐀯𐀰𐀱.𐂀𐂁𐂂𐂃𐂄𐂅𐂆𐂇𐂈𐂉𐂊𐂋𐂌𐂍
_
abcdef.ghij
abcde.fghij
AUX_.abcdef
lpT7_.abcdef
cOm6_.abcdef
CON_
aux_.h
Lpt1_.exe
xyz
nül
COM1.jpg.png
_
Some sanitisers try stripping out ZWSP (​), which can be used as a fingerprinting vector and has no particularly legitimate purpose in a file name; I’m not, because removing it doesn’t solve the fingerprinting risk, as you can use ZWNJ and ZWJ.)