This document demonstrates the emoji removal feature in SlimJSON, which can significantly reduce token count for LLM contexts.

Why Remove Emoji?

Emoji and other non-ASCII characters typically consume several tokens each in most LLM tokenizers (exact counts vary by tokenizer):

Character          Tokens  Example
ASCII letter       1       a = 1 token
Emoji              2-4     👋 = 2-4 tokens
Chinese character  2-3     中 = 2-3 tokens
Arabic character   2-3     ع = 2-3 tokens

For large JSON payloads with emoji, this can significantly increase API costs.
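
Token counts cannot be computed without a specific tokenizer, but UTF-8 byte length is a rough proxy for why these characters cost more: many tokenizers work on UTF-8 bytes, and characters that are rare in the training data tend to be split across several tokens. The sketch below only measures bytes, not tokens; `utf8Bytes` is a hypothetical helper and not part of SlimJSON.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// utf8Bytes reports how many bytes a string occupies in UTF-8.
// Go strings are already UTF-8 encoded, so this is simply len(s).
// Hypothetical helper for illustration; not part of SlimJSON.
func utf8Bytes(s string) int {
    return len(s)
}

func main() {
    for _, s := range []string{"a", "👋", "中", "ع"} {
        fmt.Printf("%q: %d rune(s), %d byte(s)\n",
            s, utf8.RuneCountInString(s), utf8Bytes(s))
    }
    // Each sample is a single rune, encoded in 1, 4, 3 and 2 bytes respectively.
}
```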

Basic Usage

CLI

# Simple emoji removal
echo '{"message":"Hello 👋 World 🌍!"}' | slimjson -strip-emoji
# Output: {"message":"Hello  World !"}

# From file
slimjson -strip-emoji input.json > output.json

# With pretty printing
slimjson -strip-emoji -pretty data.json

Go Library

package main

import (
    "encoding/json"
    "fmt"
    "github.com/tradik/slimjson"
)

func main() {
    data := map[string]interface{}{
        "message": "Hello 👋 World 🌍!",
        "status":  "✅ Completed",
    }

    cfg := slimjson.Config{
        StripUTF8Emoji: true,
    }

    slimmer := slimjson.New(cfg)
    result := slimmer.Slim(data)

    output, _ := json.MarshalIndent(result, "", "  ")
    fmt.Println(string(output))
}

Config File

[llm-optimized]
strip-emoji=true
string-pooling=true
depth=4
list-len=15

Real-World Examples

Example 1: Social Media Post

Input:

{
  "user": "John Doe 😊",
  "post": "Just launched our new product! 🚀🎉",
  "reactions": "❤️ 👍 🔥",
  "location": "San Francisco 🌉"
}

Command:

slimjson -strip-emoji -pretty input.json

Output:

{
  "location": "San Francisco ",
  "post": "Just launched our new product! ",
  "reactions": "  ",
  "user": "John Doe "
}

Token Savings:

Example 2: Product Catalog

Input:

{
  "products": [
    {
      "name": "Coffee ☕",
      "description": "Premium coffee beans 🌱",
      "rating": "⭐⭐⭐⭐⭐",
      "price": "$19.99 💰"
    },
    {
      "name": "Tea 🍡",
      "description": "Organic green tea 🌿",
      "rating": "⭐⭐⭐⭐",
      "price": "$14.99 💵"
    }
  ]
}

Command:

slimjson -strip-emoji -list-len 2 -pretty data.json

Output:

{
  "products": [
    {
      "description": "Premium coffee beans ",
      "name": "Coffee ",
      "price": "$19.99 ",
      "rating": ""
    },
    {
      "description": "Organic green tea ",
      "name": "Tea ",
      "price": "$14.99 ",
      "rating": ""
    }
  ]
}

Token Savings:

Example 3: Chat Messages

Input:

{
  "messages": [
    {
      "user": "Alice 👩‍💻",
      "text": "Hey! How are you? 😃",
      "timestamp": "2024-01-15T10:30:00Z"
    },
    {
      "user": "Bob 👨‍💼",
      "text": "Great! Working on the new feature 💪",
      "timestamp": "2024-01-15T10:31:00Z"
    }
  ]
}

Command:

slimjson -strip-emoji -timestamp-compression -pretty messages.json

Output:

{
  "messages": [
    {
      "text": "Hey! How are you? ",
      "timestamp": 1705314600,
      "user": "Alice "
    },
    {
      "text": "Great! Working on the new feature ",
      "timestamp": 1705314660,
      "user": "Bob "
    }
  ]
}
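
The integer timestamps in this output look like Unix epoch seconds; the source does not state the unit, so treat that as an assumption. Under that assumption, the conversion can be reproduced with the Go standard library alone, which is handy for checking expected values. `toUnix` is a hypothetical helper, not a SlimJSON function.

```go
package main

import (
    "fmt"
    "time"
)

// toUnix parses an RFC 3339 timestamp and returns Unix epoch seconds.
// Hypothetical helper for illustration; not part of SlimJSON.
func toUnix(ts string) (int64, error) {
    t, err := time.Parse(time.RFC3339, ts)
    if err != nil {
        return 0, err
    }
    return t.Unix(), nil
}

func main() {
    for _, ts := range []string{"2024-01-15T10:30:00Z", "2024-01-15T10:31:00Z"} {
        sec, err := toUnix(ts)
        if err != nil {
            fmt.Println("parse error:", err)
            continue
        }
        fmt.Println(ts, "->", sec)
    }
}
```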

Example 4: Multilingual Content

Input:

{
  "title": "Welcome! 欢迎! مرحبا! 🌍",
  "description": "Global platform for everyone 🌐",
  "languages": ["English", "中文", "العربية", "日本語"],
  "status": "✅ Active"
}

Command:

slimjson -strip-emoji -pretty data.json

Output:

{
  "description": "Global platform for everyone ",
  "languages": [
    "English",
    "",
    "",
    ""
  ],
  "status": " Active",
  "title": "Welcome! ! ! "
}

Note: This removes ALL non-ASCII characters, including Chinese, Arabic, and Japanese characters. Use with caution if you need to preserve multilingual content.
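
The behavior described in the note can be approximated with the standard library alone. The sketch below is not SlimJSON's implementation: `stripNonASCII` and `stripValue` are hypothetical helpers that drop every rune above U+007F from each string value in a decoded JSON document. Whether SlimJSON also rewrites object keys is not stated in this document, so the sketch leaves keys untouched.

```go
package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// stripNonASCII keeps only runes below 128 (plain ASCII).
// Hypothetical helper for illustration; not SlimJSON's code.
func stripNonASCII(s string) string {
    var b strings.Builder
    b.Grow(len(s))
    for _, r := range s {
        if r < 128 {
            b.WriteRune(r)
        }
    }
    return b.String()
}

// stripValue walks a decoded JSON value and strips every string value.
// Object keys are left as they are (an assumption of this sketch).
func stripValue(v interface{}) interface{} {
    switch t := v.(type) {
    case string:
        return stripNonASCII(t)
    case []interface{}:
        for i, item := range t {
            t[i] = stripValue(item)
        }
        return t
    case map[string]interface{}:
        for k, item := range t {
            t[k] = stripValue(item)
        }
        return t
    default:
        return v // numbers, booleans and null pass through unchanged
    }
}

func main() {
    input := `{"title":"Welcome! 欢迎! مرحبا! 🌍","languages":["English","中文"]}`

    var data interface{}
    if err := json.Unmarshal([]byte(input), &data); err != nil {
        panic(err)
    }
    out, err := json.Marshal(stripValue(data))
    if err != nil {
        panic(err)
    }
    fmt.Println(string(out))
    // Prints: {"languages":["English",""],"title":"Welcome! ! ! "}
}
```

As in the example output above, strings that contain only non-ASCII text collapse to empty strings rather than being dropped from their array.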

Combined with Other Features

Maximum Token Reduction

slimjson \
  -strip-emoji \
  -string-pooling \
  -type-inference \
  -timestamp-compression \
  -enum-detection \
  -depth 3 \
  -list-len 10 \
  -decimal-places 2 \
  -pretty \
  data.json

LLM Context Optimization

# Use with ai-optimized profile
slimjson -profile ai-optimized -strip-emoji data.json

# Or create custom profile
cat > .slimjson << EOF
[llm-context]
strip-emoji=true
depth=4
list-len=15
string-pooling=true
type-inference=true
bool-compression=true
timestamp-compression=true
block=avatar_url,url,html_url
EOF

slimjson -profile llm-context data.json

HTTP API Usage

# Start daemon
slimjson -d -port 8080

# Use with profile
curl -X POST 'http://localhost:8080/slim?profile=llm-context' \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Hello 👋 World 🌍!",
    "status": "✅ Completed"
  }'

# Response:
# {
#   "message": "Hello  World !",
#   "status": " Completed"
# }

Performance Impact

The emoji stripping operation is cheap. The benchmark below reports roughly 250 ns and a single 128-byte allocation per call:

BenchmarkStripEmoji-8    5000000    250 ns/op    128 B/op    1 allocs/op

Best Practices

✅ DO Use When:

❌ DON’T Use When:

Character Preservation

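The feature preserves: all ASCII characters (code points 0-127), including letters, digits, punctuation, and whitespace.

The feature removes: every character above code point 127, including emoji, CJK and Arabic text, and accented Latin letters.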

Token Count Comparison

Example: GitHub API Response

Original (with emoji):

{
  "user": "octocat 🐙",
  "bio": "GitHub's mascot 🎭",
  "location": "San Francisco 🌉",
  "status": "✅ Available"
}

Tokens: ~28 tokens

After stripping:

{
  "bio": "GitHub's mascot ",
  "location": "San Francisco ",
  "status": " Available",
  "user": "octocat "
}

Tokens: ~18 tokens

Savings: 36% fewer tokens
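
The savings figure is the relative reduction, (28 - 18) / 28, which is about 35.7% and rounds to 36%. A minimal sketch of that arithmetic, using a hypothetical `savingsPercent` helper that is not part of SlimJSON:

```go
package main

import "fmt"

// savingsPercent returns the relative reduction from before to after, in percent.
// Hypothetical helper for illustration; not part of SlimJSON.
func savingsPercent(before, after int) float64 {
    if before == 0 {
        return 0 // avoid dividing by zero for empty input
    }
    return float64(before-after) / float64(before) * 100
}

func main() {
    fmt.Printf("%.0f%% fewer tokens\n", savingsPercent(28, 18))
    // Prints: 36% fewer tokens
}
```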

Integration Examples

Python

import json

import requests

data = {
    "message": "Hello 👋 World 🌍!",
    "status": "✅ Completed"
}

response = requests.post(
    'http://localhost:8080/slim?profile=llm-context',
    json=data,
    timeout=10,
)
response.raise_for_status()  # fail loudly on HTTP errors

cleaned = response.json()
print(json.dumps(cleaned, indent=2))

JavaScript

const data = {
  message: "Hello 👋 World 🌍!",
  status: "✅ Completed"
};

const response = await fetch('http://localhost:8080/slim?profile=llm-context', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(data)
});

const cleaned = await response.json();
console.log(cleaned);

Go

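package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

func main() {
    data := map[string]interface{}{
        "message": "Hello 👋 World 🌍!",
        "status":  "✅ Completed",
    }

    jsonData, err := json.Marshal(data)
    if err != nil {
        log.Fatalf("encode request: %v", err)
    }

    // Send the payload to the local SlimJSON daemon.
    resp, err := http.Post(
        "http://localhost:8080/slim?profile=llm-context",
        "application/json",
        bytes.NewReader(jsonData),
    )
    if err != nil {
        log.Fatalf("send request: %v", err)
    }
    defer resp.Body.Close()

    var result map[string]interface{}
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        log.Printf("decode response: %v", err)
        return
    }

    fmt.Printf("%+v\n", result)
}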

Troubleshooting

Issue: Too much whitespace after removal

Problem:

{"text": "Hello     World  "}

Solution: Combine with string trimming in post-processing or use additional text normalization.
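
One possible normalization step, sketched with the standard library only. `collapseSpaces` is a hypothetical post-processing helper, not a SlimJSON option. It trims both ends and collapses every run of whitespace to a single space. Note that tabs and newlines count as whitespace here too, so it is not suitable for strings where line breaks matter.

```go
package main

import (
    "fmt"
    "strings"
)

// collapseSpaces trims leading and trailing whitespace and collapses
// internal whitespace runs to a single space.
// Hypothetical helper for illustration; not part of SlimJSON.
func collapseSpaces(s string) string {
    return strings.Join(strings.Fields(s), " ")
}

func main() {
    fmt.Printf("%q\n", collapseSpaces("Hello     World  "))
    // Prints: "Hello World"
}
```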

Issue: Need to preserve some non-ASCII characters

Problem: Accented characters (é, ñ, ü) are also removed.

Solution: Currently, the feature removes ALL non-ASCII characters. To preserve accented Latin characters, you can modify the stripEmoji function in slimjson.go so that it also keeps runes in the range 128-255 (U+0080 to U+00FF, the Latin-1 Supplement block):

// Uncomment in stripEmoji function:
else if r >= 128 && r <= 255 {
    result.WriteRune(r)
}
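
For context, here is a self-contained sketch of what a function with that extra branch might look like. It is an assumption about the surrounding loop, not a copy of slimjson.go; `stripKeepLatin1` is a hypothetical name, and only the `else if` branch mirrors the snippet above.

```go
package main

import (
    "fmt"
    "strings"
)

// stripKeepLatin1 keeps ASCII (0-127) and, additionally, runes 128-255.
// Everything above U+00FF (emoji, CJK, Arabic, ...) is dropped.
// Hypothetical sketch; the real stripEmoji in slimjson.go may differ.
func stripKeepLatin1(s string) string {
    var result strings.Builder
    for _, r := range s {
        if r < 128 {
            result.WriteRune(r)
        } else if r >= 128 && r <= 255 {
            result.WriteRune(r)
        }
    }
    return result.String()
}

func main() {
    fmt.Printf("%q\n", stripKeepLatin1("Café señor über 👋 中文"))
    // Prints: "Café señor über  "
}
```

This range only covers precomposed characters. An accent written as a base letter followed by a combining mark (for example e plus U+0301) would still lose the mark, and Latin letters outside Latin-1 such as ł or ő would still be removed.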

See Also