[mirror] HTML entity data for Zig https://github.com/kivikakk/htmlentities.zig

htmlentities.zig

The bundled entities.json is sourced from https://www.w3.org/TR/html5/entities.json.

Modelled on Philip Jackson's entities crate for Rust.

Overview

The core datatypes are:

pub const Entity = struct {
    entity: []const u8,
    codepoints: Codepoints,
    characters: []const u8,
};

pub const Codepoints = union(enum) {
    Single: u32,
    Double: [2]u32,
};
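
Because Codepoints is a tagged union, callers have to handle both the one-codepoint and two-codepoint cases. The following is a minimal sketch of doing that with a switch. It redeclares the union locally so it compiles on its own, and encodeUtf8 is a hypothetical helper for illustration, not part of the library.

```zig
const std = @import("std");

// Local copy of the union from the overview, so this sketch is self-contained.
const Codepoints = union(enum) {
    Single: u32,
    Double: [2]u32,
};

// Hypothetical helper (not provided by htmlentities.zig): encodes the
// codepoints as UTF-8 into `buf` and returns the written slice.
// `buf` should hold at least 8 bytes (two codepoints, up to 4 bytes each).
fn encodeUtf8(cps: Codepoints, buf: []u8) ![]const u8 {
    var len: usize = 0;
    switch (cps) {
        .Single => |cp| {
            len += try std.unicode.utf8Encode(@intCast(cp), buf[len..]);
        },
        .Double => |pair| {
            for (pair) |cp| {
                len += try std.unicode.utf8Encode(@intCast(cp), buf[len..]);
            }
        },
    }
    return buf[0..len];
}

pub fn main() !void {
    var buf: [8]u8 = undefined;

    // 233 is the codepoint shown in the usage output for eacute.
    const single = try encodeUtf8(.{ .Single = 233 }, &buf);
    std.debug.print("single: {s} ({d} bytes)\n", .{ single, single.len });

    // An arbitrary two-codepoint value: a base character plus a combining mark.
    const double = try encodeUtf8(.{ .Double = .{ 8770, 824 } }, &buf);
    std.debug.print("double: {s} ({d} bytes)\n", .{ double, double.len });
}
```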

The list of entities is directly exposed, as well as a binary search function:

pub const ENTITIES: [_]Entity
pub fn lookup(entity: []const u8) ?Entity
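
To illustrate what a binary search over a sorted entity table involves, here is a rough sketch. It is not the library's implementation: Entry, TABLE and find are hypothetical stand-ins, and the four-entry table is only there so the example runs without the generated data. The table must be sorted bytewise by entity name for the search to be correct.

```zig
const std = @import("std");

// Hypothetical, trimmed-down stand-in for the library's Entity type.
const Entry = struct {
    entity: []const u8,
    characters: []const u8,
};

// Tiny stand-in table, sorted bytewise by `entity`.
const TABLE = [_]Entry{
    .{ .entity = "&amp;", .characters = "&" },
    .{ .entity = "&eacute;", .characters = "é" },
    .{ .entity = "&gt;", .characters = ">" },
    .{ .entity = "&lt;", .characters = "<" },
};

// Classic half-open binary search; returns null when the name is absent.
fn find(name: []const u8) ?Entry {
    var lo: usize = 0;
    var hi: usize = TABLE.len;
    while (lo < hi) {
        const mid = lo + (hi - lo) / 2;
        switch (std.mem.order(u8, TABLE[mid].entity, name)) {
            .eq => return TABLE[mid],
            .lt => lo = mid + 1,
            .gt => hi = mid,
        }
    }
    return null;
}

pub fn main() void {
    // The result is optional, so unknown names can be handled without a crash.
    if (find("&eacute;")) |e| {
        std.debug.print("{s} -> {s}\n", .{ e.entity, e.characters });
    }
    if (find("&bogus;") == null) {
        std.debug.print("&bogus; is not in the table\n", .{});
    }
}
```

Since the real lookup also returns an optional, unwrapping with `if` or `orelse` is the safer choice for untrusted input; `.?` is only appropriate when the name is known to exist.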

Usage

build.zig:

    exe.addPackagePath("htmlentities", "vendor/htmlentities.zig/src/main.zig");

main.zig:

const std = @import("std");
const htmlentities = @import("htmlentities");

pub fn main() !void {
    const eacute = htmlentities.lookup("&eacute;").?;
    std.debug.print("eacute: {}\n", .{eacute});
}

Output:

eacute: Entity{ .entity = &eacute;, .codepoints = Codepoints{ .Single = 233 }, .characters = é }

Help wanted

Ideally we'd do the JSON parsing and struct creation at comptime. The std JSON tokeniser uses ~80GB of RAM and millions of backtracks to handle the whole entities.json at comptime, so it's not gonna happen yet. Maybe once we get a comptime allocator we can use the regular parser.

As it is, we do codegen. Ideally we'd piece together an AST and render that instead of just writing Zig directly. I did try it with a 'template' input string (see some broken WIP at 63b9393), but it's hard to do since std.zig.render expects all tokens, including string literals, to be available in the originally parsed source. At the moment we parse our generated source and format it so we can at least validate it syntactically in the build step.
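
The parse-then-format check on generated source could look roughly like the sketch below. It assumes the std.zig.Ast API as of Zig 0.12 (Ast.parse, the errors field, and render returning an allocated slice); later releases have changed parts of this API. validateAndFormat and the sample source strings are hypothetical and are not taken from this repository's build step.

```zig
const std = @import("std");

// Hypothetical helper: parses `source` as Zig, returns the canonically
// formatted text if it is syntactically valid, or null if the parser
// reported any errors. Assumes the Zig 0.12 std.zig.Ast API.
fn validateAndFormat(gpa: std.mem.Allocator, source: [:0]const u8) !?[]u8 {
    var tree = try std.zig.Ast.parse(gpa, source, .zig);
    defer tree.deinit(gpa);

    // Syntax errors are collected on the tree rather than returned as a Zig error.
    if (tree.errors.len != 0) return null;

    return try tree.render(gpa);
}

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const gpa = arena.allocator();

    // Stand-in for generated code: valid but badly spaced.
    const good = "pub const VALUES = [_]u32{ 1,2,3 };";
    if (try validateAndFormat(gpa, good)) |formatted| {
        std.debug.print("{s}", .{formatted});
    }

    // Stand-in for a codegen bug: the initialiser is missing.
    const bad = "pub const VALUES = ;";
    const ok = (try validateAndFormat(gpa, bad)) != null;
    std.debug.print("bad source valid? {}\n", .{ok});
}
```

This only checks syntax. Type errors in the generated file would still surface later, when the file is actually compiled.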