
📋 Core Functions

The main serialization and deserialization functions: a drop-in replacement for Python's json module, plus the traditional configuration-based API.

🔄 JSON Module Drop-in Replacement

Zero migration effort - use datason exactly like Python's json module with optional enhanced features.

JSON Compatibility API

# Perfect drop-in replacement for Python's json module
import datason.json as json

# Exact same behavior as stdlib json
data = json.loads('{"timestamp": "2024-01-01T00:00:00Z", "value": 42}')
# Returns: {'timestamp': '2024-01-01T00:00:00Z', 'value': 42}

output = json.dumps({"key": "value"}, indent=2, sort_keys=True)
# All json.dumps() parameters work exactly the same
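One way to check the drop-in claim in your own environment is to compare a candidate module against the standard library on a few samples. The sketch below is a minimal illustration, not part of datason: `same_json_behavior` is a hypothetical helper, and the candidate is the stdlib module itself so the snippet runs without datason installed.

```python
import json as stdlib_json


def same_json_behavior(candidate, samples, **dumps_kwargs):
    """Return True if `candidate` matches stdlib json on dumps and loads for every sample."""
    for obj in samples:
        expected_text = stdlib_json.dumps(obj, **dumps_kwargs)
        if candidate.dumps(obj, **dumps_kwargs) != expected_text:
            return False
        if candidate.loads(expected_text) != stdlib_json.loads(expected_text):
            return False
    return True


samples = [
    {"key": "value"},
    {"timestamp": "2024-01-01T00:00:00Z", "value": 42},
    [1, 2.5, None, True],
]

# Stand-in: stdlib json compared with itself so the sketch runs anywhere.
# Replace with `import datason.json as candidate` to check the real module.
candidate = stdlib_json
print(same_json_behavior(candidate, samples, indent=2, sort_keys=True))
```

A check like this only covers the samples you give it, so include the value types your application actually serializes.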

Enhanced API with Smart Defaults

# Enhanced features with same simple API
from datetime import datetime

import datason

# Smart datetime parsing automatically enabled
data = datason.loads('{"timestamp": "2024-01-01T00:00:00Z", "value": 42}')
# Returns: {'timestamp': datetime.datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc), 'value': 42}

# Enhanced serialization with dict output
result = datason.dumps({"timestamp": datetime.now(), "data": [1, 2, 3]})
# Returns: dict (not string) with smart type handling

| Function | Purpose | Output Type | Enhanced Features |
| --- | --- | --- | --- |
| datason.loads() | JSON string parsing | dict | ✅ Smart datetime parsing |
| datason.dumps() | Object serialization | dict | ✅ Enhanced type handling |
| datason.loads_json() | JSON compatibility | dict | ❌ Exact stdlib behavior |
| datason.dumps_json() | JSON string output | str | ❌ Exact stdlib behavior |
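To make "smart datetime parsing" concrete, here is a minimal standard-library sketch of the general idea: parse the JSON, then convert string values that look like ISO 8601 timestamps. It is an illustration under stated assumptions, not datason's implementation; `parse_iso_datetime` and `loads_with_datetimes` are hypothetical names, and only direct values of JSON objects are converted.

```python
import json
from datetime import datetime, timezone


def parse_iso_datetime(value):
    """Return a datetime for ISO 8601 strings, or the value unchanged otherwise."""
    if not isinstance(value, str) or len(value) < 10:
        return value
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    candidate = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        return datetime.fromisoformat(candidate)
    except ValueError:
        return value


def loads_with_datetimes(text):
    """Parse JSON, converting ISO 8601 string values of objects to datetime."""
    def hook(obj):
        return {key: parse_iso_datetime(val) for key, val in obj.items()}

    return json.loads(text, object_hook=hook)


data = loads_with_datetimes('{"timestamp": "2024-01-01T00:00:00Z", "value": 42}')
print(data["timestamp"] == datetime(2024, 1, 1, tzinfo=timezone.utc))
```

Any heuristic of this kind can misfire on strings that merely resemble dates, which is why an exact-stdlib variant without the conversion is useful alongside it.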

🎯 Traditional API Overview

The traditional core functions provide comprehensive, configuration-based serialization with maximum control and flexibility.

| Function | Purpose | Best For |
| --- | --- | --- |
| serialize() | Main serialization function | Custom configurations |
| deserialize() | Main deserialization function | Structured data restoration |
| auto_deserialize() | Automatic type detection | Quick data exploration |
| safe_deserialize() | Error-resilient deserialization | Untrusted data sources |

📦 Detailed Function Documentation

serialize()

The primary serialization function with full configuration support.

datason.serialize(obj: Any, config: Any = None, **kwargs: Any) -> Any

Serialize an object (DEPRECATED - use dump/dumps instead).

DEPRECATION WARNING: Direct use of serialize() is discouraged. Use the clearer API functions instead:

- dump(obj, file) - write to file (like json.dump)
- dumps(obj) - convert to string (like json.dumps)
- serialize_enhanced(obj, **options) - enhanced serialization with clear options

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| obj | Any | Object to serialize | required |
| config | Any | Optional configuration | None |
| **kwargs | Any | Additional options | {} |

Returns:

| Type | Description |
| --- | --- |
| Any | Serialized object |

Source code in datason/__init__.py
def serialize(obj: Any, config: Any = None, **kwargs: Any) -> Any:
    """Serialize an object (DEPRECATED - use dump/dumps instead).

    DEPRECATION WARNING: Direct use of serialize() is discouraged.
    Use the clearer API functions instead:
    - dump(obj, file) - write to file (like json.dump)
    - dumps(obj) - convert to string (like json.dumps)
    - serialize_enhanced(obj, **options) - enhanced serialization with clear options

    Args:
        obj: Object to serialize
        config: Optional configuration
        **kwargs: Additional options

    Returns:
        Serialized object
    """
    import warnings

    warnings.warn(
        "serialize() is deprecated. Use dump/dumps for JSON compatibility or "
        "serialize_enhanced() for advanced features. Direct serialize() will be "
        "removed in a future version.",
        DeprecationWarning,
        stacklevel=2,
    )
    return _serialize_core(obj, config, **kwargs)
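Because serialize() emits a DeprecationWarning, it can help to surface those warnings while migrating. The following sketch uses only the standard `warnings` module; `legacy_serialize` is a hypothetical stand-in that warns the same way the source above does, so the snippet runs without datason installed.

```python
import warnings


def legacy_serialize(obj):
    """Hypothetical stand-in that warns the same way serialize() does."""
    warnings.warn("legacy_serialize() is deprecated.", DeprecationWarning, stacklevel=2)
    return obj


def call_and_collect_deprecations(func, *args, **kwargs):
    """Call func and return (result, list of DeprecationWarning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones the default filters would hide.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    messages = [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]
    return result, messages


result, messages = call_and_collect_deprecations(legacy_serialize, {"a": 1})
print(result, messages)
```

In a test suite, the same effect is often achieved by turning DeprecationWarning into an error, so that remaining calls to deprecated functions fail loudly.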

Configuration Example:

import datason as ds
from datetime import datetime
import pandas as pd

# Basic serialization
data = {"values": [1, 2, 3], "timestamp": datetime.now()}
result = ds.serialize(data)

# With custom configuration
config = ds.SerializationConfig(
    include_type_info=True,
    compress_arrays=True,
    date_format=ds.DateFormat.ISO_8601,
    nan_handling=ds.NanHandling.NULL
)

complex_data = {
    "dataframe": pd.DataFrame({"x": [1, 2, 3]}),
    "timestamp": datetime.now(),
    "metadata": {"version": 1.0}
}

result = ds.serialize(complex_data, config=config)

deserialize()

The primary deserialization function, with optional datetime and UUID parsing.

datason.deserialize(obj: Any, parse_dates: bool = True, parse_uuids: bool = True) -> Any

Recursively deserialize JSON-compatible data back to Python objects.

Attempts to intelligently restore datetime objects, UUIDs, and other types that were serialized to strings by the serialize function.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| obj | Any | The JSON-compatible object to deserialize | required |
| parse_dates | bool | Whether to attempt parsing ISO datetime strings back to datetime objects | True |
| parse_uuids | bool | Whether to attempt parsing UUID strings back to UUID objects | True |

Returns:

| Type | Description |
| --- | --- |
| Any | Python object with restored types where possible |

Examples:

>>> data = {"date": "2023-01-01T12:00:00", "id": "12345678-1234-5678-9012-123456789abc"}
>>> deserialize(data)
{"date": datetime(2023, 1, 1, 12, 0), "id": UUID('12345678-1234-5678-9012-123456789abc')}
Source code in datason/deserializers_new.py
def deserialize(obj: Any, parse_dates: bool = True, parse_uuids: bool = True) -> Any:
    """Recursively deserialize JSON-compatible data back to Python objects.

    Attempts to intelligently restore datetime objects, UUIDs, and other types
    that were serialized to strings by the serialize function.

    Args:
        obj: The JSON-compatible object to deserialize
        parse_dates: Whether to attempt parsing ISO datetime strings back to datetime objects
        parse_uuids: Whether to attempt parsing UUID strings back to UUID objects

    Returns:
        Python object with restored types where possible

    Examples:
        >>> data = {"date": "2023-01-01T12:00:00", "id": "12345678-1234-5678-9012-123456789abc"}
        >>> deserialize(data)
        {"date": datetime(2023, 1, 1, 12, 0), "id": UUID('12345678-1234-5678-9012-123456789abc')}
    """
    from ._profiling import stage

    # ==================================================================================
    # IDEMPOTENCY CHECKS: Prevent double deserialization
    # ==================================================================================

    with stage("eligibility_check"):
        # IDEMPOTENCY CHECK 1: Check if object is already in final deserialized form
        if _is_already_deserialized(obj):
            return obj

        if obj is None:
            return None

        # NEW: Handle type metadata for round-trip serialization
        if isinstance(obj, dict) and TYPE_METADATA_KEY in obj:
            return _deserialize_with_type_metadata(obj)

    # Handle basic types (already in correct format)
    with stage("smart_scalars"):
        if isinstance(obj, (int, float, bool)):
            return obj

        # Handle strings - attempt intelligent parsing
        if isinstance(obj, str):
            # Try to parse as UUID first (more specific pattern)
            if parse_uuids and _looks_like_uuid(obj):
                try:
                    import uuid as uuid_module  # Fresh import to avoid state issues

                    return uuid_module.UUID(obj)
                except (ValueError, ImportError):
                    # Log parsing failure but continue with string
                    warnings.warn(f"Failed to parse UUID string: {obj}", stacklevel=2)

            # Try to parse as datetime if enabled
            if parse_dates and _looks_like_datetime(obj):
                try:
                    import sys
                    from datetime import datetime as datetime_class  # Fresh import

                    # Handle 'Z' timezone suffix for Python < 3.11
                    date_str = obj.replace("Z", "+00:00") if obj.endswith("Z") and sys.version_info < (3, 11) else obj
                    return datetime_class.fromisoformat(date_str)
                except (ValueError, ImportError):
                    # Log parsing failure but continue with string
                    warnings.warn(
                        f"Failed to parse datetime string: {obj[:50]}{'...' if len(obj) > 50 else ''}",
                        stacklevel=2,
                    )

            # Return as string if no parsing succeeded
            return obj

    # Handle lists and dicts (recursive structures)
    with stage("postprocess"):
        if isinstance(obj, list):
            return [deserialize(item, parse_dates, parse_uuids) for item in obj]

        if isinstance(obj, dict):
            # PERFORMANCE OPTIMIZATION: Fast path for dictionaries with only JSON-basic values
            # This avoids calling deserialize on every simple value
            if len(obj) <= 10000 and all(
                isinstance(k, str) and isinstance(v, (str, int, float, bool, type(None))) for k, v in obj.items()
            ):
                # Check if any strings might need special parsing (UUID, datetime, etc.)
                needs_parsing = False
                if parse_dates or parse_uuids:  # Only check if parsing is enabled
                    for v in obj.values():
                        if isinstance(v, str) and (
                            len(v) > 8
                            and (parse_uuids and _looks_like_uuid(v))
                            or (parse_dates and _looks_like_datetime(v))
                        ):
                            needs_parsing = True
                            break

                if not needs_parsing:
                    # All values are simple JSON types that don't need deserialization
                    return obj

            return {k: deserialize(v, parse_dates, parse_uuids) for k, v in obj.items()}

    # For any other type, return as-is
    return obj
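The control flow above (scalars pass through, strings are tested against patterns, containers recurse) can be reduced to a small sketch that handles UUIDs only. This is an illustration under stated assumptions, not the library's code: the regular expression is a hypothetical stand-in for `_looks_like_uuid()`, whose implementation is not shown here.

```python
import re
import uuid

# Hypothetical pattern; the library's _looks_like_uuid() is not shown in this page.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def restore_uuids(obj, parse_uuids=True):
    """Recursively convert UUID-shaped strings to uuid.UUID objects."""
    if isinstance(obj, str):
        if parse_uuids and UUID_PATTERN.fullmatch(obj):
            return uuid.UUID(obj)
        return obj
    if isinstance(obj, list):
        return [restore_uuids(item, parse_uuids) for item in obj]
    if isinstance(obj, dict):
        return {key: restore_uuids(val, parse_uuids) for key, val in obj.items()}
    # Everything else, including already-restored UUID objects, passes through.
    return obj


data = {"id": "12345678-1234-5678-9012-123456789abc", "tags": ["a", "b"], "n": 1}
once = restore_uuids(data)
print(type(once["id"]).__name__)
print(restore_uuids(once) == once)  # a second pass changes nothing
```

Because non-string leaves are returned untouched, running the function on its own output is harmless, which is the same property the idempotency checks in the source aim for.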

Deserialization Example:

# Basic deserialization: ISO datetime and UUID strings are restored by default
serialized_result = {
    "timestamp": "2024-01-01T12:00:00",
    "id": "12345678-1234-5678-9012-123456789abc",
}
restored_data = ds.deserialize(serialized_result)
print(type(restored_data["timestamp"]))  # <class 'datetime.datetime'>

# Keep UUID strings as plain strings while still parsing dates
restored_data = ds.deserialize(serialized_result, parse_uuids=False)
print(type(restored_data["id"]))  # <class 'str'>

auto_deserialize()

Automatic type detection and intelligent deserialization.

datason.auto_deserialize(obj: Any, aggressive: bool = False, config: Optional[SerializationConfig] = None) -> Any

NEW: Intelligent auto-detection deserialization with heuristics.

Uses pattern recognition and heuristics to automatically detect and restore complex data types without explicit configuration.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| obj | Any | JSON-compatible object to deserialize | required |
| aggressive | bool | Whether to use aggressive type detection (may have false positives) | False |
| config | Optional[SerializationConfig] | Configuration object to control deserialization behavior | None |

Returns:

| Type | Description |
| --- | --- |
| Any | Python object with auto-detected types restored |

Examples:

>>> data = {"records": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]}
>>> auto_deserialize(data, aggressive=True)
{"records": DataFrame(...)}  # May detect as DataFrame
>>> # API-compatible UUID handling
>>> from datason.config import get_api_config
>>> auto_deserialize("12345678-1234-5678-9012-123456789abc", config=get_api_config())
"12345678-1234-5678-9012-123456789abc"  # Stays as string
Source code in datason/deserializers_new.py
def auto_deserialize(obj: Any, aggressive: bool = False, config: Optional["SerializationConfig"] = None) -> Any:
    """NEW: Intelligent auto-detection deserialization with heuristics.

    Uses pattern recognition and heuristics to automatically detect and restore
    complex data types without explicit configuration.

    Args:
        obj: JSON-compatible object to deserialize
        aggressive: Whether to use aggressive type detection (may have false positives)
        config: Configuration object to control deserialization behavior

    Returns:
        Python object with auto-detected types restored

    Examples:
        >>> data = {"records": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]}
        >>> auto_deserialize(data, aggressive=True)
        {"records": DataFrame(...)}  # May detect as DataFrame

        >>> # API-compatible UUID handling
        >>> from datason.config import get_api_config
        >>> auto_deserialize("12345678-1234-5678-9012-123456789abc", config=get_api_config())
        "12345678-1234-5678-9012-123456789abc"  # Stays as string
    """
    # ==================================================================================
    # IDEMPOTENCY CHECKS: Prevent double deserialization
    # ==================================================================================

    # IDEMPOTENCY CHECK 1: Check if object is already in final deserialized form
    if _is_already_deserialized(obj):
        return obj

    if obj is None:
        return None

    # Get default config if none provided
    if config is None and _config_available:
        config = get_default_config()

    # Handle type metadata first
    if isinstance(obj, dict) and TYPE_METADATA_KEY in obj:
        return _deserialize_with_type_metadata(obj)

    # Handle basic types
    if isinstance(obj, (int, float, bool)):
        return obj

    # Handle strings with auto-detection
    if isinstance(obj, str):
        return _auto_detect_string_type(obj, aggressive, config)

    # Handle lists with auto-detection
    if isinstance(obj, list):
        deserialized_list = [auto_deserialize(item, aggressive, config) for item in obj]

        if aggressive and pd is not None and _looks_like_series_data(deserialized_list):
            # Try to detect if this should be a pandas Series or DataFrame
            try:
                return pd.Series(deserialized_list)
            except Exception:  # nosec B110
                pass

        return deserialized_list

    # Handle dictionaries with auto-detection
    if isinstance(obj, dict):
        # Check for pandas DataFrame patterns first
        if aggressive and pd is not None and _looks_like_dataframe_dict(obj):
            try:
                return _reconstruct_dataframe(obj)
            except Exception:  # nosec B110
                pass

        # Check for pandas split format
        if pd is not None and _looks_like_split_format(obj):
            try:
                return _reconstruct_from_split(obj)
            except Exception:  # nosec B110
                pass

        # PERFORMANCE OPTIMIZATION: Fast path for dictionaries with only JSON-basic values
        # This avoids calling auto_deserialize on every simple value
        if len(obj) <= 10000 and all(
            isinstance(k, str) and isinstance(v, (str, int, float, bool, type(None))) for k, v in obj.items()
        ):
            # Check if any strings might need special parsing (UUID, datetime, etc.)
            needs_parsing = False
            for v in obj.values():
                if isinstance(v, str) and (
                    (len(v) > 8 and (_looks_like_uuid(v) or _looks_like_datetime(v)))
                    or (aggressive and _looks_like_number(v))
                ):
                    needs_parsing = True
                    break

            if not needs_parsing:
                # All values are simple JSON types that don't need deserialization
                return obj

        # Standard dictionary deserialization
        return {k: auto_deserialize(v, aggressive, config) for k, v in obj.items()}

    return obj
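The "pandas split format" checked in the source refers to the shape produced by `DataFrame.to_dict(orient="split")`: a dict with `index`, `columns`, and `data` keys. The sketch below shows what such a shape check might look like using only the standard library. Both functions are hypothetical illustrations; the library's `_looks_like_split_format()` and `_reconstruct_from_split()` are not shown in this page and may differ.

```python
def looks_like_split_format(obj):
    """Heuristic: a dict shaped like pandas' to_dict(orient="split") output."""
    if not isinstance(obj, dict):
        return False
    if not {"columns", "data"} <= set(obj):
        return False
    columns, rows = obj["columns"], obj["data"]
    if not isinstance(columns, list) or not isinstance(rows, list):
        return False
    # Every row must be a list with one value per column.
    return all(isinstance(row, list) and len(row) == len(columns) for row in rows)


def split_to_records(obj):
    """Convert split-format data to a list of row dicts (no pandas required)."""
    return [dict(zip(obj["columns"], row)) for row in obj["data"]]


split = {"index": [0, 1], "columns": ["a", "b"], "data": [[1, 2], [3, 4]]}
if looks_like_split_format(split):
    print(split_to_records(split))
```

A structural check like this is cheap, but it can match ordinary dicts that happen to use the same keys, which is one reason the source wraps reconstruction in a try/except.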

Auto-Detection Example:

import json
import numpy as np

# auto_deserialize() expects JSON-compatible objects, so parse the JSON text first
json_data = '{"timestamp": "2024-01-01T12:00:00", "values": [1, 2, 3]}'
parsed = json.loads(json_data)

# Intelligent type detection
auto_restored = ds.auto_deserialize(parsed)
print(type(auto_restored["timestamp"]))  # <class 'datetime.datetime'>

# Works with complex nested structures
complex_json = ds.serialize({
    "df": pd.DataFrame({"x": [1, 2, 3]}),
    "date": datetime.now(),
    "array": np.array([1, 2, 3])
})

auto_complex = ds.auto_deserialize(complex_json)

safe_deserialize()

Error-resilient deserialization for untrusted or malformed data.

datason.safe_deserialize(json_str: str, allow_pickle: bool = False, **kwargs: Any) -> Any

Safely deserialize a JSON string, handling parse errors gracefully.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| json_str | str | JSON string to parse and deserialize | required |
| allow_pickle | bool | Whether to allow deserialization of pickle-serialized objects | False |
| **kwargs | Any | Arguments passed to deserialize() | {} |

Returns:

| Type | Description |
| --- | --- |
| Any | Deserialized Python object, or the original string if parsing fails |

Raises:

| Type | Description |
| --- | --- |
| DeserializationSecurityError | If pickle data is detected and allow_pickle=False |

Source code in datason/deserializers_new.py
def safe_deserialize(json_str: str, allow_pickle: bool = False, **kwargs: Any) -> Any:
    """Safely deserialize a JSON string, handling parse errors gracefully.

    Args:
        json_str: JSON string to parse and deserialize
        allow_pickle: Whether to allow deserialization of pickle-serialized objects
        **kwargs: Arguments passed to deserialize()

    Returns:
        Deserialized Python object, or the original string if parsing fails

    Raises:
        DeserializationSecurityError: If pickle data is detected and allow_pickle=False
    """
    try:
        # First check for pickle data in the raw JSON string before processing
        if not allow_pickle:
            # Parse with stdlib json first to check for pickle data
            import json as stdlib_json

            raw_parsed = stdlib_json.loads(json_str)
            if _contains_pickle_data(raw_parsed):
                raise DeserializationSecurityError(
                    "Detected pickle-serialized objects which are unsafe to deserialize. "
                    "Set allow_pickle=True to override this security check."
                )

        # Parse JSON using DataSON's loads_json
        parsed = loads_json(json_str)

        return deserialize(parsed, **kwargs)
    except DeserializationSecurityError:
        # Re-raise security errors - these should not be caught
        raise
    except (ValueError, TypeError):  # Standard Python errors
        return json_str  # Return original string on error
    except Exception:  # Catch any JSON parsing errors including DataSON's JSONDecodeError
        return json_str  # Return original string on error
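The "return the original string on failure" behavior in the source can be sketched with the standard library alone. This is a simplified illustration of the pattern, not datason's code: `parse_or_return` is a hypothetical name, and the pickle check and type restoration are left out.

```python
import json


def parse_or_return(text):
    """Return the parsed JSON value, or `text` itself if it is not valid JSON."""
    try:
        return json.loads(text)
    except (ValueError, TypeError):
        # json.JSONDecodeError subclasses ValueError; TypeError covers non-string input.
        return text


valid = '{"values": [1, 2, 3]}'
malformed = '{"values": [1, 2'

print(parse_or_return(valid))
result = parse_or_return(malformed)
print(result is malformed)  # on failure the very same string object comes back
```

Since no exception is raised, callers need an explicit check (for example, whether the result is still the input string) to tell a parse failure from a successful parse.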

Safe Processing Example:

# Valid JSON: parsed, then passed through deserialize()
valid = '{"timestamp": "2024-01-01T12:00:00", "values": [1, 2, 3]}'
result = ds.safe_deserialize(valid)

# Malformed JSON: no exception is raised; the original string comes back
malformed = '{"timestamp": "2024-01-01T12:00:00", "values": [1, 2'
safe_result = ds.safe_deserialize(malformed)
if isinstance(safe_result, str):
    print("Could not parse input:", safe_result)

# Extra keyword arguments are forwarded to deserialize()
raw_result = ds.safe_deserialize(valid, parse_dates=False, parse_uuids=False)

🔧 Configuration System Integration

The core functions work seamlessly with datason's configuration system:

Preset Configurations

# Use predefined configurations for common scenarios
ml_config = ds.get_ml_config()
ml_result = ds.serialize(ml_data, config=ml_config)

api_config = ds.get_api_config()
api_result = ds.serialize(api_data, config=api_config)

strict_config = ds.get_strict_config()
strict_result = ds.serialize(data, config=strict_config)

performance_config = ds.get_performance_config()
fast_result = ds.serialize(data, config=performance_config)

Custom Configuration

# Build custom configurations
custom_config = ds.SerializationConfig(
    # Type handling
    include_type_info=True,
    strict_types=False,
    preserve_numpy_arrays=True,

    # Performance
    compress_arrays=True,
    optimize_memory=True,

    # Data handling
    date_format=ds.DateFormat.TIMESTAMP,
    nan_handling=ds.NanHandling.STRING,
    dataframe_orient=ds.DataFrameOrient.RECORDS,

    # Security
    redact_patterns=["ssn", "password"],
    max_depth=100
)

result = ds.serialize(data, config=custom_config)

🔄 Error Handling Patterns

Graceful Degradation

def robust_serialize(data):
    """Serialize with multiple fallback strategies."""
    try:
        # Try with full configuration
        return ds.serialize(data, config=ds.get_ml_config())
    except MemoryError:
        # Fall back to chunked processing
        return ds.serialize_chunked(data)
    except ds.SecurityError:
        # Fall back to safe mode
        safe_config = ds.SerializationConfig(secure_mode=True)
        return ds.serialize(data, config=safe_config)
    except Exception:
        # Last resort: retry with the default configuration
        return ds.serialize(data)

Validation and Recovery

def validate_and_deserialize(serialized_data):
    """Validate data before deserialization."""
    try:
        # First attempt: auto deserialization
        result = ds.auto_deserialize(serialized_data)
        return result
    except ValueError:
        # Second attempt: safe deserialization
        return ds.safe_deserialize(serialized_data)

📊 Performance Considerations

Function Performance Characteristics

| Function | Speed | Reliability | Features |
| --- | --- | --- | --- |
| serialize() | ⚡⚡ | 🛡️🛡️🛡️ | ⭐⭐⭐ |
| deserialize() | ⚡⚡ | 🛡️🛡️🛡️ | ⭐⭐⭐ |
| auto_deserialize() | | 🛡️🛡️ | ⭐⭐ |
| safe_deserialize() | | 🛡️🛡️🛡️🛡️ | |

Optimization Tips

# Reuse configurations for better performance
config = ds.get_ml_config()
for batch in data_batches:
    result = ds.serialize(batch, config=config)

# Use appropriate function for your needs
if data_is_trusted:
    result = ds.deserialize(data)  # Fastest
else:
    result = ds.safe_deserialize(data)  # Most reliable