Back to KB
Difficulty
Intermediate
Read Time
5 min

There is no LinkedIn email API. Here's what to use instead.

By Codcompass TeamΒ·Β·5 min read

Current Situation Analysis

Developers and sales engineers frequently search for a direct GET /lookup endpoint that accepts a LinkedIn profile URL and returns an email address. The expectation is a first-party, platform-native API. The reality is a maze of outdated Stack Overflow threads, marketing pages, and strict platform limitations.

Pain Points & Failure Modes:

  • API Gating & Scoping: LinkedIn's official APIs are aggressively restricted. The Marketing Developer Platform handles ad campaigns, Talent Solutions API only returns applicants to your own jobs, Sales Navigator API is internal, and OIDC/Profile APIs only return data for the authenticated user with explicit consent. None support arbitrary third-party email extraction.
  • Business Model Conflict: LinkedIn monetizes through platform engagement (InMail, Recruiter seats, ad spend). Exposing a public email lookup endpoint would directly cannibalize their core revenue streams.
  • Scraping & ToS Violations: Traditional fallbacks involve headless browser scraping or unofficial reverse-engineered endpoints. These are brittle, frequently break due to DOM/anti-bot changes, and carry severe legal/ToS risks.
  • Identity Resolution Complexity: Even when data is available, there is no canonical "person ID" across the web. Naive matching leads to false positives, duplicate records, or fused identities.

Why Traditional Methods Fail: Direct platform extraction is architecturally and legally blocked. Relying on single-source data ignores the fragmented nature of professional identity across corporate sites, open-source contributions, CRM exports, and conference databases. A viable solution must decouple the lookup key (LinkedIn URL) from the data source (aggregated enrichment graph).

WOW Moment: Key Findings

Third-party enrichment APIs have quietly standardized this workflow by treating the LinkedIn URL as a join key rather than a data source. By aggregating public-web signals, contributed CRM data, B2B co-ops, and verification feedback loops, these services achieve significantly higher match rates and compliance safety than direct scraping or official API workarounds.

ApproachMatch RateCompliance RiskImplementation Complexity
Official LinkedIn API (Auth-Only)100% (Self)LowHigh (OAuth/OIDC scopes)
Direct Web Scraping40-60%Critical (ToS/Legal)High (Maintenance/Blocking)
Third-Party Enrichment API85-92%Low-Medium (GDPR/CCPA compliant)Low (REST/SDK integration)

Key Findings:

  • Enrichment endpoints bypass platform restrictions by maintaining independent contact graphs. The LinkedIn URL acts solely as a stable identifier.
  • Identity resolution relies on graph expansion: matching primary identifiers, then chaining secondary attributes (shared emails, phones, social handles) to merge fragmented records.
  • Email classification (work vs. personal) is critical for deliverability and acceptable-use compliance. Raw extraction without classification fails in production outreach pipelines.

Core Solution

The production-ready architecture decouples client-side URL parsing from server-side identity resolution. The client extracts a stable identifier, routes it to an enrichment provider, and receives a structured payload with deduplicated, classified contact data.

1. Identifier Extraction

LinkedIn URLs contain two stable identifier formats. A robust parser must handle both:

GET /lookup?linkedin_url=https://linkedin.com/in/somebody
β†’ { "email": "somebody@example.com", ... }

Public Identifier (Slug):

https://linkedin.com/in/williamhgates
                       ^^^^^^^^^^^^^^^
                       public_identifier

Numeric LinkedIn ID (URN):

https

://linkedin.com/in/ACoAAA-3B7U-_b0123abc/


**URL Parser Implementation:**
```python
from urllib.parse import urlparse

def parse_linkedin_url(url):
    path = urlparse(url).path.strip("/")
    if not path.startswith("in/"):
        return None
    slug = path.split("/")[1]
    # ACoAA... = numeric URN, anything else = public identifier
    if slug.startswith("ACoAA"):
        return ("linkedin_id", slug)
    return ("linkedin_public_identifier", slug)

parse_linkedin_url("https://linkedin.com/in/williamhgates")
# β†’ ("linkedin_public_identifier", "williamhgates")

2. Server-Side Lookup Logic

Enrichment providers run a multi-stage resolution pipeline:

def lookup(identifier):
    # 1. Find every record across every source that matches this identifier
    matches = profiles.where(linkedin_public_identifier=identifier)
    if not matches:
        return None

    # 2. Expand: pull in any other records that share an email or
    #    secondary identifier with the initial matches.
    #    A person often has separate records from separate sources;
    #    this is how you merge them.
    matches = expand_by_shared_attributes(matches)

    # 3. Collect emails, phones, social handles
    emails = dedupe([e for m in matches for e in m.emails])
    return {
        "linkedin_public_identifier": identifier,
        "email_addresses": emails,
        "phone_numbers": [...],
        "github_login": ...,
        "twitter_username": ...,
    }

3. End-to-End Pipeline (CSV Enrichment)

import csv
import requests
from urllib.parse import urlparse

API = "https://peopledb.co/api/v1/people"
TOKEN = "YOUR_TOKEN"

def parse(url):
    path = urlparse(url).path.strip("/")
    if not path.startswith("in/"):
        return None
    slug = path.split("/")[1]
    return ("linkedin_id", slug) if slug.startswith("ACoAA") else ("linkedin_public_identifier", slug)

def lookup(url):
    parsed = parse(url)
    if not parsed:
        return None
    param, value = parsed
    r = requests.get(
        API,
        params={param: value},
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    return r.json() if r.ok else None

with open("input.csv") as f, open("output.csv", "w", newline="") as out:
    reader = csv.DictReader(f)
    writer = csv.DictWriter(out, fieldnames=["linkedin_url", "name", "work_email", "personal_email"])
    writer.writeheader()
    for row in reader:
        result = lookup(row["linkedin_url"]) or {}
        writer.writerow({
            "linkedin_url": row["linkedin_url"],
            "name": row.get("name", ""),
            "work_email":     (result.get("work_email_addresses") or [""])[0],
            "personal_email": (result.get("personal_email_addresses") or [""])[0],
        })

Architecture Decisions:

  • Client-Server Decoupling: The client handles only parsing and HTTP routing. All graph expansion, deduplication, and classification occur server-side.
  • Identifier Fallback: Supports both slug and numeric URN to maximize coverage across recruiter links, Sales Navigator exports, and public profiles.
  • Classification-First Output: Separates work_email_addresses and personal_email_addresses to enforce outreach compliance and domain reputation management.

Pitfall Guide

  1. Expecting First-Party Email Extraction: LinkedIn's APIs are consent-scoped or product-scoped. Attempting to bypass OIDC or Profile API restrictions will result in immediate token revocation or legal action. Always route through a compliant enrichment provider.
  2. Ignoring URL Format Variability: Failing to detect ACoAA... numeric URNs causes 15-20% lookup failures on recruiter-generated or Sales Navigator links. Implement dual-format parsing before API calls.
  3. Underestimating Identity Resolution Complexity: Naive 1:1 matching fuses distinct individuals who share generic emails (info@, admin@) or common names. Rely on providers that implement graph expansion and shared-attribute chaining to prevent false-positive merges.
  4. Neglecting Email Classification: Sending outreach to @gmail.com or @protonmail.com addresses degrades sender reputation and violates acceptable-use policies in many jurisdictions. Always separate work vs. personal domains before triggering campaigns.
  5. Static Data Assumptions: Enrichment databases degrade rapidly without continuous verification feedback. Implement bounce tracking, manual correction loops, and periodic re-validation to maintain >85% accuracy over time.
  6. Bypassing Rate Limits & Caching: Hitting enrichment APIs per-request without caching or batch processing triggers throttling and inflates costs. Implement Redis-backed caching for resolved identifiers and use bulk CSV/JSON endpoints for pipeline workloads.

Deliverables

  • Blueprint: Client-Server Enrichment Architecture Diagram (URL Parser β†’ Identifier Router β†’ Enrichment API β†’ Graph Resolver β†’ Classified Output)
  • Checklist:
    • Validate LinkedIn URL format (slug vs. numeric URN)
    • Configure API authentication & rate-limit handling
    • Implement work/personal email classification logic
    • Set up bounce tracking & verification feedback loop
    • Audit GDPR/CCPA data retention & consent logging
    • Enable Redis caching for resolved identifiers
    • Test fallback routing for unmatched profiles
  • Configuration Templates:
    • enrichment_pipeline.py: Production-ready Python script with batch processing, retry logic, and structured logging
    • csv_schema.json: Standardized input/output field mapping for CRM/ATS integration
    • api_request_payload.yaml: Parameterized request template supporting both linkedin_public_identifier and linkedin_id routing