    arn:aws:iam::<account-id>:role/<role-name>

Bucket: financialreports-bucket
Region: eu-central-1

    s3://financialreports-bucket/
    ├── financialreports/media/filings/        Raw filing documents (PDF, HTML, ZIP, XBRL)
    ├── processed/markdown/{shard}/{id}.md     Processed markdown (one file per filing)
    ├── processed/html/{shard}/{id}.html       Pre-rendered HTML (one file per filing)
    └── manifests/                             Parquet catalog (your entry point)
        ├── README.md
        ├── schema.json
        ├── filings/year=YYYY/part-00000.parquet
        ├── incremental/YYYY-MM-DD.parquet
        ├── companies.parquet
        └── reference/
            ├── sources.parquet
            ├── filing_types.parquet
            ├── filing_categories.parquet
            └── languages.parquet

The manifests/ prefix is the queryable index of the entire platform. Start here: use it to discover filings, then pull the raw or processed content you need via the S3 keys in each row.

manifests/filings/year=YYYY/part-00000.parquet

| Column | Type | Description |
|---|---|---|
| filing_id | int64 | Unique filing identifier |
| company_id | int64 | FinancialReports company ID (join to companies.parquet) |
| company_name | string | Legal name of the company |
| company_tagline | string | Short company description |
| ticker | string | Primary stock ticker |
| is_listed | bool | Whether the company is currently listed |
| main_stock_exchange | string | Primary exchange URL |
| lei | string | Legal Entity Identifier (20-char) |
| country_code | string | ISO 3166-1 alpha-2 country code |
| sector_code | string | ISIC Section code |
| industry_group_code | string | ISIC Division code |
| industry_code | string | ISIC Group code |
| sub_industry_code | string | ISIC Class code |
| filing_type_id | int64 | Filing type ID (join to filing_types.parquet) |
| filing_type_code | string | Short code (e.g. 10-K, AR, RNS) |
| filing_type_name | string | Full name (e.g. Annual Report) |
| filing_category | string | Top-level disclosure category |
| source_id | int64 | Regulatory source ID (join to sources.parquet) |
| source_name | string | Regulatory source name (e.g. SEC, BaFin) |
| language_id | int64 | Language ID (join to languages.parquet) |
| language_code | string | ISO 639-1 language code |
| title | string | Filing title (if available) |
| fiscal_year | int16 | Fiscal year the filing covers |
| fiscal_period | string | Period: FY, Q1, Q2, Q3, Q4, H1, H2, 9M |
| period_ending_date | date | End date of the reported period |
| release_datetime | timestamp (UTC) | When the filing was published by the source |
| dissemination_datetime | timestamp (UTC) | When the filing was disseminated to the public |
| added_to_platform | timestamp (UTC) | When FinancialReports ingested the filing |
| updated_date | timestamp (UTC) | Last modification timestamp |
| file_extension | string | Original file format (PDF, HTML, ZIP, etc.) |
| file_size | int64 | File size in bytes |
| processing_status | string | COMPLETED, PENDING, FAILED, SKIPPED |
| filing_type_confidence | float64 | AI classification confidence (0.0–1.0) |
| language_confidence | float64 | Language detection confidence (0.0–1.0) |
| raw_document_s3_key | string | S3 key for the original document |
| markdown_s3_key | string | S3 key for the processed markdown |
| html_s3_key | string | S3 key for the pre-rendered HTML |
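
Each filings row carries what is needed to locate its content. As a minimal sketch, assuming the rows have already been loaded into Python dicts by some Parquet reader, the following keeps completed filings above a confidence threshold and builds full S3 URIs from the key column. The sample rows, the 0.9 default threshold, and the helper names `s3_uri` and `completed_markdown_uris` are hypothetical; only the column names and the bucket name come from this document.

```python
BUCKET = "financialreports-bucket"  # bucket name from this document


def s3_uri(key: str) -> str:
    """Turn a bucket-relative key from the manifest into a full S3 URI."""
    return f"s3://{BUCKET}/{key}"


def completed_markdown_uris(rows, min_confidence=0.9):
    """Return (filing_id, uri) pairs for completed, confidently classified filings."""
    result = []
    for row in rows:
        if row["processing_status"] != "COMPLETED":
            continue
        confidence = row.get("filing_type_confidence")
        if confidence is None or confidence < min_confidence:
            continue
        key = row.get("markdown_s3_key")
        if key:  # assumption: the key may be empty when no markdown exists
            result.append((row["filing_id"], s3_uri(key)))
    return result


# Hypothetical sample rows using a subset of the columns above.
sample_rows = [
    {"filing_id": 12345, "processing_status": "COMPLETED",
     "filing_type_confidence": 0.98,
     "markdown_s3_key": "processed/markdown/1/12345.md"},
    {"filing_id": 12346, "processing_status": "PENDING",
     "filing_type_confidence": 0.99,
     "markdown_s3_key": None},
    {"filing_id": 20001, "processing_status": "COMPLETED",
     "filing_type_confidence": 0.55,
     "markdown_s3_key": "processed/markdown/2/20001.md"},
]

uris = completed_markdown_uris(sample_rows)
```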

ISINs are not included in this file; join to companies.parquet via company_id.

manifests/incremental/YYYY-MM-DD.parquet

manifests/companies.parquet

| Column | Type | Description |
|---|---|---|
| company_id | int64 | Unique company identifier |
| name | string | Legal name |
| slug | string | URL-safe identifier |
| tagline | string | Short description |
| ticker | string | Primary stock ticker |
| is_listed | bool | Currently listed |
| main_stock_exchange | string | Primary exchange URL |
| lei | string | Legal Entity Identifier |
| isins | list<string> | All ISINs associated with the company |
| country_code | string | ISO 3166-1 alpha-2 |
| sector_code | string | ISIC Section |
| industry_group_code | string | ISIC Division |
| industry_code | string | ISIC Group |
| sub_industry_code | string | ISIC Class |
| homepage_link | string | Company website |
| ir_link | string | Investor relations page |
| headcount | int64 | Number of employees |
| shares_outstanding | int64 | Total shares outstanding |
| date_ipo | date | IPO date |
| date_public | date | Date public on FinancialReports |
| year_founded | date | Founding date |
| legal_status | string | ACTIVE, INACTIVE, MERGED, etc. |

| File | Rows | Key columns |
|---|---|---|
| reference/sources.parquet | ~43 | id, name, url, description |
| reference/filing_types.parquet | ~38 | id, code, name, category_name |
| reference/filing_categories.parquet | ~11 | id, name |
| reference/languages.parquet | ~186 | id, code, name, alpha3_code |
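
Because isins in companies.parquet is a list column, finding filings for a given ISIN takes a join on company_id plus a list-membership test. The following is a minimal pure-Python sketch of that logic over made-up rows; `filings_for_isin` is a hypothetical helper and the ISINs are placeholders, not real identifiers. In a SQL engine the membership test would be the engine's list-contains function instead.

```python
def filings_for_isin(filings, companies, isin):
    """Join filings to companies on company_id, keeping only companies
    whose isins list contains the requested ISIN."""
    company_ids = {
        c["company_id"]
        for c in companies
        if isin in (c.get("isins") or [])  # tolerate a missing or null list
    }
    return [f for f in filings if f["company_id"] in company_ids]


# Hypothetical sample data; the ISINs below are placeholders.
companies = [
    {"company_id": 1, "name": "Example AG", "isins": ["DE0000000001", "DE0000000002"]},
    {"company_id": 2, "name": "Sample plc", "isins": ["GB0000000003"]},
    {"company_id": 3, "name": "No ISIN SA", "isins": None},
]
filings = [
    {"filing_id": 10, "company_id": 1},
    {"filing_id": 11, "company_id": 2},
    {"filing_id": 12, "company_id": 1},
]

matches = filings_for_isin(filings, companies, "DE0000000002")
```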

manifests/schema.json

Includes manifest_version for tracking breaking changes.

The raw_document_s3_key, markdown_s3_key, and html_s3_key columns in the filings parquet are paths relative to the bucket root. To access a file:

    s3://financialreports-bucket/{key_from_parquet}

Processed files are sharded by filing_id:

    processed/markdown/{filing_id // 10000}/{filing_id}.md
    processed/html/{filing_id // 10000}/{filing_id}.html

The manifest_version field is embedded in each parquet file's metadata and in schema.json. The current version is 1.1.

The isins column has been removed from filings parquet. ISINs are available in companies.parquet and should be joined via company_id. This eliminates redundant data for companies with large numbers of ISINs (e.g. structured product issuers).

To filter by ISIN, join to companies.parquet via company_id and use list_contains(c.isins, 'YOUR_ISIN').
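
The shard pattern for processed files given above, {filing_id // 10000}, can be sketched in a few lines of Python. This is an illustration only: `processed_keys` and `s3_uri` are hypothetical helper names, and the filing ID used is made up.

```python
BUCKET = "financialreports-bucket"  # bucket name from this document
SHARD_SIZE = 10_000                 # from the {filing_id // 10000} pattern


def processed_keys(filing_id: int) -> dict:
    """Derive the bucket-relative keys of the processed files for a filing."""
    shard = filing_id // SHARD_SIZE  # integer division selects the shard folder
    return {
        "markdown": f"processed/markdown/{shard}/{filing_id}.md",
        "html": f"processed/html/{shard}/{filing_id}.html",
    }


def s3_uri(key: str) -> str:
    """Prefix a bucket-relative key with the bucket to get a full S3 URI."""
    return f"s3://{BUCKET}/{key}"


keys = processed_keys(1234567)  # made-up filing ID; shard is 1234567 // 10000 = 123
markdown_uri = s3_uri(keys["markdown"])
```

Where a filings row is available, the key columns in the manifest are the safer source; deriving keys from the ID is mainly useful as a sanity check.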