Service Layer

The PackageStatsService class is the primary entry point for programmatic use of pkgdb. It provides a clean abstraction over database and API operations.

PackageStatsService

High-level service for managing package statistics.

Wraps database and API operations behind a single interface, which makes calling code easier to test, mock, and extend.

__init__

__init__(db_path: str)

Initialize the service with a database path.

Parameters:

Name Type Description Default
db_path str

Path to the SQLite database file.

required

add_package

add_package(name: str, verify: bool = True) -> bool

Add a package to tracking.

Parameters:

Name Type Description Default
name str

Package name to add.

required
verify bool

If True, verify package exists on PyPI before adding. Network errors are logged as warnings but don't block addition.

True

Returns:

Type Description
bool

True if package was added, False if it already exists.

Raises:

Type Description
ValueError

If package name is invalid or package not found on PyPI (when verify=True).

remove_package

remove_package(name: str) -> bool

Remove a package from tracking.

Parameters:

Name Type Description Default
name str

Package name to remove.

required

Returns:

Type Description
bool

True if package was removed, False if it didn't exist.

list_packages

list_packages() -> list[PackageInfo]

Get the list of tracked packages together with the date each was added.

Returns:

Type Description
list[PackageInfo]

List of PackageInfo objects.

import_packages

import_packages(
    file_path: str, verify: bool = True
) -> tuple[int, int, list[str], list[str]]

Import packages from a file.

Parameters:

Name Type Description Default
file_path str

Path to file (JSON or plain text).

required
verify bool

If True, verify each package exists on PyPI before adding.

True

Returns:

Type Description
tuple[int, int, list[str], list[str]]

Tuple of (added_count, skipped_count, invalid_names, not_found_names).

Raises:

Type Description
FileNotFoundError

If file doesn't exist.
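The four-element return value is easy to misread positionally, so unpacking it into named variables helps. A minimal sketch follows; `summarize_import` is a hypothetical helper, and the sample tuple is made-up data shaped like the documented return type.

```python
from __future__ import annotations


def summarize_import(result: tuple[int, int, list[str], list[str]]) -> str:
    """Format an import_packages-style result as a readable summary."""
    # Order follows the documented tuple:
    # (added_count, skipped_count, invalid_names, not_found_names)
    added, skipped, invalid, not_found = result
    lines = [f"added: {added}", f"skipped: {skipped}"]
    if invalid:
        lines.append("invalid names: " + ", ".join(invalid))
    if not_found:
        lines.append("not found on PyPI: " + ", ".join(not_found))
    return "\n".join(lines)


# Made-up sample shaped like the documented return value.
sample = (3, 1, ["bad name!"], ["no-such-pkg-xyz"])
print(summarize_import(sample))
```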

sync_packages_from_user

sync_packages_from_user(
    username: str, prune: bool = False
) -> SyncResult | None

Sync tracked packages with a PyPI user's current packages.

Fetches the user's packages from PyPI and adds any that aren't already being tracked. Optionally removes packages no longer associated with the user.

Parameters:

Name Type Description Default
username str

PyPI username to fetch packages from.

required
prune bool

If True, remove locally tracked packages not in user's PyPI account.

False

Returns:

Type Description
SyncResult | None

SyncResult with lists of added packages, already tracked packages, packages not on the remote, and pruned packages. Returns None if unable to fetch from PyPI.

fetch_all_stats

fetch_all_stats(
    progress_callback: Callable[
        [int, int, str, PackageStats | None], None
    ]
    | None = None,
) -> FetchResult

Fetch and store stats for all tracked packages.

Skips any package for which a fetch was already attempted within the last 24 hours. Stores results using batch commits, which performs better when many packages are written at once.

Parameters:

Name Type Description Default
progress_callback Callable[[int, int, str, PackageStats | None], None] | None

Optional callback invoked once per package with (current_index, total_count, package_name, stats_or_none).

None

Returns:

Type Description
FetchResult

FetchResult with success/failure/skipped counts and results.
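A callback matching the documented progress_callback shape might look like the sketch below. `make_progress_logger` is a hypothetical helper; the stats argument is typed as `Any` because the fields of PackageStats are not listed here, and reading `None` as "no stats obtained" is an assumption. The callback is driven by hand so the example runs without pkgdb.

```python
from __future__ import annotations

from typing import Any, Callable


def make_progress_logger(sink: list[str]) -> Callable[[int, int, str, Any], None]:
    """Build a callback that appends one progress line per package to `sink`."""

    def on_progress(current: int, total: int, name: str, stats: Any | None) -> None:
        # Assumption: None means no stats were obtained for this package.
        status = "ok" if stats is not None else "no stats"
        sink.append(f"[{current}/{total}] {name}: {status}")

    return on_progress


log: list[str] = []
callback = make_progress_logger(log)

# Simulated invocations with made-up values, in the documented argument order.
callback(1, 2, "requests", {"placeholder": True})
callback(2, 2, "no-such-pkg", None)
print("\n".join(log))
```

The returned function could then be passed as `progress_callback=callback`. Whether the index starts at 0 or 1 is not stated above, so the formatting may need adjusting.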

fetch_package_details

fetch_package_details(package: str) -> PackageDetails

Fetch detailed statistics for a single package.

Parameters:

Name Type Description Default
package str

Package name.

required

Returns:

Type Description
PackageDetails

PackageDetails with stats, Python versions, and OS breakdown.

get_stats

get_stats(
    with_growth: bool = False,
) -> list[dict[str, Any]]

Get latest stats for all packages.

Parameters:

Name Type Description Default
with_growth bool

If True, include growth metrics.

False

Returns:

Type Description
list[dict[str, Any]]

List of stats dictionaries ordered by total downloads.

get_history

get_history(
    package: str, limit: int = 30
) -> list[dict[str, Any]]

Get historical stats for a package.

Parameters:

Name Type Description Default
package str

Package name.

required
limit int

Maximum number of days to return.

30

Returns:

Type Description
list[dict[str, Any]]

List of historical stats ordered by date descending.

get_all_history

get_all_history(
    limit_per_package: int = 30,
) -> dict[str, list[dict[str, Any]]]

Get historical stats for all packages.

Parameters:

Name Type Description Default
limit_per_package int

Maximum days per package.

30

Returns:

Type Description
dict[str, list[dict[str, Any]]]

Dict mapping package names to their history.
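Because the result is a plain dict of lists, ordinary Python tools work on it directly. The sketch below counts history records per package. `history_sizes` is a hypothetical helper, and the sample uses empty placeholder dicts because the keys of each history record are not documented in this section.

```python
from __future__ import annotations

from typing import Any


def history_sizes(all_history: dict[str, list[dict[str, Any]]]) -> list[tuple[str, int]]:
    """Return (package, record_count) pairs, longest history first."""
    pairs = ((name, len(rows)) for name, rows in all_history.items())
    # Sort by descending count, then by name for a stable, readable order.
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


# Made-up data shaped like the documented return type.
sample = {"beta": [{}], "alpha": [{}, {}, {}], "gamma": []}
print(history_sizes(sample))
```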

generate_report

generate_report(
    output_file: str,
    include_env: bool = False,
    include_github: bool = False,
) -> bool

Generate HTML report for all packages.

Parameters:

Name Type Description Default
output_file str

Path to write HTML file.

required
include_env bool

If True, include Python/OS distribution summary.

False
include_github bool

If True, include GitHub stats (stars, forks, etc.) from cache. Packages without cached data are skipped.

False

Returns:

Type Description
bool

True if report was generated, False if no data available.

Raises:

Type Description
ValueError

If output path is invalid or not writable.

generate_package_report

generate_package_report(
    package: str, output_file: str
) -> bool

Generate detailed HTML report for a single package.

Parameters:

Name Type Description Default
package str

Package name.

required
output_file str

Path to write HTML file.

required

Returns:

Type Description
bool

True if report was generated.

Raises:

Type Description
ValueError

If output path is invalid or not writable.

fetch_package_releases

fetch_package_releases(
    package: str,
) -> tuple[list[PyPIRelease], list[GitHubRelease]]

Fetch PyPI and GitHub releases for a package.

Uses cached data when available (24h TTL).

Parameters:

Name Type Description Default
package str

Package name.

required

Returns:

Type Description
tuple[list[PyPIRelease], list[GitHubRelease]]

Tuple of (pypi_releases, github_releases).
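The 24h TTL mentioned above amounts to a freshness check along the lines of the sketch below. This is an illustration of the general technique, not pkgdb's actual cache code: `is_fresh` and `CACHE_TTL` are hypothetical names, and treating an entry that is exactly 24 hours old as expired is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the TTL is a fixed 24-hour window, as stated in the text above.
CACHE_TTL = timedelta(hours=24)


def is_fresh(fetched_at: datetime, now: datetime, ttl: timedelta = CACHE_TTL) -> bool:
    """Return True if an entry fetched at `fetched_at` is still within its TTL."""
    return now - fetched_at < ttl


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(is_fresh(now - timedelta(hours=23), now))  # fetched 23 hours ago
print(is_fresh(now - timedelta(hours=25), now))  # fetched 25 hours ago
```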

generate_project_report

generate_project_report(
    package: str, output_file: str
) -> bool

Generate a project view HTML report for a single package.

Shows download history with release markers, release timeline, and environment distribution.

Parameters:

Name Type Description Default
package str

Package name.

required
output_file str

Path to write HTML file.

required

Returns:

Type Description
bool

True if report was generated.

Raises:

Type Description
ValueError

If output path is invalid.

export

export(
    format: str, output_file: str | None = None
) -> str | None

Export stats in the specified format.

Parameters:

Name Type Description Default
format str

One of 'csv', 'json', 'markdown', 'md'.

required
output_file str | None

Optional path to write output. If None, returns string.

None

Returns:

Type Description
str | None

Exported string, or None if no data available.

Raises:

Type Description
ValueError

If format is unknown or output path is invalid.
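Since format is a required string, a caller that only has an output path may want to derive it from the file extension first. A minimal sketch, assuming the extension-to-format mapping shown; `format_for` and `_SUFFIX_TO_FORMAT` are hypothetical and not part of pkgdb.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to one of the documented format names.
_SUFFIX_TO_FORMAT = {
    ".csv": "csv",
    ".json": "json",
    ".md": "markdown",
    ".markdown": "markdown",
}


def format_for(output_file: str) -> str:
    """Infer an export format name from the output file's extension."""
    suffix = Path(output_file).suffix.lower()
    try:
        return _SUFFIX_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"cannot infer export format from {output_file!r}") from None


print(format_for("stats.CSV"))
print(format_for("out/report.md"))
```

The result could then be passed as the `format` argument alongside the same `output_file`.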

generate_badge

generate_badge(
    package: str,
    period: str = "total",
    color: str | None = None,
) -> str | None

Generate an SVG badge for a package's download count.

Parameters:

Name Type Description Default
package str

Package name.

required
period str

One of "total", "month", "week", "day".

'total'
color str | None

Badge color (default: auto-select based on count).

None

Returns:

Type Description
str | None

SVG string for the badge, or None if no stats available.
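Because the method returns None when no stats are available, callers should check the result before writing it to disk. The sketch below shows one way to do that; `save_badge` is a hypothetical helper, and the SVG string is a placeholder rather than real badge output.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def save_badge(svg: str | None, path: Path) -> bool:
    """Write the badge to `path`; return False (and write nothing) if svg is None."""
    if svg is None:
        return False
    path.write_text(svg, encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder SVG standing in for a generate_badge(...) result.
    placeholder_svg = '<svg xmlns="http://www.w3.org/2000/svg"></svg>'
    target = Path(tmp) / "badge.svg"
    print(save_badge(placeholder_svg, target), target.exists())
    print(save_badge(None, Path(tmp) / "missing.svg"))
```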

fetch_github_stats

fetch_github_stats(
    packages: list[str] | None = None,
    use_cache: bool = True,
) -> list[RepoResult]

Fetch GitHub repository stats for tracked packages.

Parameters:

Name Type Description Default
packages list[str] | None

Specific packages to fetch. If None, fetches all tracked.

None
use_cache bool

Whether to use cached GitHub API responses (24h TTL).

True

Returns:

Type Description
list[RepoResult]

List of RepoResult with stats or error for each package.

clear_github_cache

clear_github_cache(expired_only: bool = True) -> int

Clear GitHub API cache.

Parameters:

Name Type Description Default
expired_only bool

If True, only clear expired entries.

True

Returns:

Type Description
int

Number of entries cleared.

get_github_cache_stats

get_github_cache_stats() -> dict[str, int]

Get GitHub cache statistics.

Returns:

Type Description
dict[str, int]

Dict with 'total', 'valid', and 'expired' counts.
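The three documented keys are enough to decide when a cache cleanup is worthwhile. A minimal sketch follows; `expired_fraction`, `should_clear_expired`, and the 50% threshold are hypothetical choices for illustration, and the sample dict is made-up data.

```python
from __future__ import annotations


def expired_fraction(cache_stats: dict[str, int]) -> float:
    """Fraction of cache entries that are expired (0.0 for an empty cache)."""
    total = cache_stats["total"]
    return cache_stats["expired"] / total if total else 0.0


def should_clear_expired(cache_stats: dict[str, int], threshold: float = 0.5) -> bool:
    """Suggest a cleanup once the expired share reaches `threshold`."""
    return expired_fraction(cache_stats) >= threshold


# Made-up sample using the documented keys.
sample = {"total": 10, "valid": 4, "expired": 6}
print(expired_fraction(sample), should_clear_expired(sample))
```

When the check returns True, a call such as `clear_github_cache(expired_only=True)` would be the natural follow-up.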

cleanup

cleanup() -> tuple[int, int]

Clean up orphaned stats and return counts.

Removes stats for packages that are no longer being tracked.

Returns:

Type Description
tuple[int, int]

Tuple of (orphaned_deleted, packages_remaining).

prune

prune(days: int = 365) -> int

Remove stats older than the specified number of days.

Parameters:

Name Type Description Default
days int

Delete stats older than this many days.

365

Returns:

Type Description
int

Number of records deleted.
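To preview which records a given days value would affect, the cutoff can be computed up front. The sketch assumes the cutoff is counted in whole calendar days back from today and that only records strictly older than the cutoff are removed; neither detail is confirmed by this reference, and `prune_cutoff` and `is_prunable` are hypothetical names.

```python
from datetime import date, timedelta


def prune_cutoff(today: date, days: int = 365) -> date:
    """Date before which records would count as older than `days` days."""
    return today - timedelta(days=days)


def is_prunable(record_date: date, today: date, days: int = 365) -> bool:
    # Assumption: only records strictly older than the cutoff are deleted.
    return record_date < prune_cutoff(today, days)


today = date(2024, 3, 31)
print(prune_cutoff(today, days=30))
print(is_prunable(date(2024, 2, 29), today, days=30))
print(is_prunable(date(2024, 3, 1), today, days=30))
```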

get_database_info

get_database_info() -> DatabaseInfo

Get database statistics and metadata.

Returns:

Type Description
DatabaseInfo

DatabaseInfo with package count, record count, date range, and file size.

Data Classes

PackageInfo dataclass

Information about a tracked package.

FetchResult dataclass

Result of a fetch operation.

PackageDetails dataclass

Detailed statistics for a package.

SyncResult dataclass

Result of syncing packages from a PyPI user.