📑 HTML Report Template

This document shows how to use pv_evaluation to automatically generate a report on the performance of one or more disambiguations, using the pv_evaluation.templates.render_inventor_disambiguation_report() function.

This function requires:

  • A list of disambiguations saved to file (tables with a “mention_id” column and a second column representing cluster ID assignment).

  • An “inventor_not_disambiguated” file with the columns “patent_id”, “inventor_sequence”, “raw_inventor_name_first”, and “raw_inventor_name_last”. For granted patents, this should be the “g_inventor_not_disambiguated.tsv” file from PatentsView’s bulk data downloads.
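
To make the two input layouts concrete, here is a minimal sketch that checks a tab-separated file for the required header columns before it is passed to the report. The helper missing_columns and the two constants are hypothetical names, not part of pv_evaluation, and the sketch assumes each file is a TSV with a header row.

```python
import csv


def missing_columns(path, required):
    """Return the required column names absent from the header of a TSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(f, delimiter="\t"), [])
    return [name for name in required if name not in header]


# Columns named in the requirements above (constant names are illustrative).
DISAMBIGUATION_COLUMNS = ["mention_id"]
INVENTOR_COLUMNS = [
    "patent_id",
    "inventor_sequence",
    "raw_inventor_name_first",
    "raw_inventor_name_last",
]
```

An empty list from missing_columns("g_inventor_not_disambiguated.tsv", INVENTOR_COLUMNS) would indicate that all four expected columns are present.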

Below, we download “g_inventor_not_disambiguated.tsv” and prepare a set of disambiguations to evaluate.

Data Preparation

Downloading “g_inventor_not_disambiguated.tsv” and the file containing persistent inventor disambiguations:

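import os
import zipfile

import pandas as pd
import wget

if not os.path.isfile("g_inventor_not_disambiguated.tsv"):
    # Fetch "g_inventor_not_disambiguated.tsv.zip" from PatentsView's bulk data downloads
    # (for example with wget.download(<archive URL>)) before extracting it.
    with zipfile.ZipFile("g_inventor_not_disambiguated.tsv.zip", "r") as zip_ref:
        zip_ref.extractall(".")

if not os.path.isfile("g_persistent_inventor.tsv"):
    # Same as above for the persistent inventor disambiguation archive.
    with zipfile.ZipFile("g_persistent_inventor.tsv.zip", "r") as zip_ref:
        zip_ref.extractall(".")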
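
Because the same check-then-extract step is applied to both archives, it could be factored into a small helper. The sketch below is one possible way to do that with the standard library; extract_if_missing is a hypothetical name, and it assumes the zip archive is already on disk and contains a member with the given file name.

```python
import os
import zipfile


def extract_if_missing(zip_path, member, outdir="."):
    """Extract `member` from `zip_path` into `outdir` unless it already exists.

    Returns True if extraction happened, False if the file was already there.
    """
    target = os.path.join(outdir, member)
    if os.path.isfile(target):
        return False
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extract(member, path=outdir)
    return True
```

With this helper, extract_if_missing("g_persistent_inventor.tsv.zip", "g_persistent_inventor.tsv") would unpack the file on the first call and do nothing on later calls.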

Preparing a set of distinct disambiguations saved to file:

if not os.path.isfile("disambiguation_20211230.tsv") or not os.path.isfile("disambiguation_20220630.tsv"):
    g_persistent_inventor = pd.read_csv("g_persistent_inventor.tsv", sep="\t", dtype=str)
    g_persistent_inventor["mention_id"] = "US" + g_persistent_inventor.patent_id + "-" + g_persistent_inventor.sequence

    g_persistent_inventor.set_index("mention_id").disamb_inventor_id_20211230.to_csv("disambiguation_20211230.tsv", sep="\t")
    g_persistent_inventor.set_index("mention_id").disamb_inventor_id_20220630.to_csv("disambiguation_20220630.tsv", sep="\t")
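
The mention ID convention used above ("US", then the patent number, then "-", then the inventor sequence) can be illustrated on a few toy rows without pandas. The rows, the cluster IDs, and the column name "cluster_id" below are made up for illustration; make_mention_id is a hypothetical helper.

```python
import csv
import io


def make_mention_id(patent_id, sequence):
    """Build a mention ID such as 'US1234567-0' from a patent number and inventor sequence."""
    return f"US{patent_id}-{sequence}"


# Toy rows (patent_id, sequence, cluster ID); the values are illustrative only.
rows = [
    ("1234567", "0", "inventor-a"),
    ("1234567", "1", "inventor-b"),
    ("7654321", "0", "inventor-a"),
]

# Write a two-column table: mention_id, then the cluster assignment.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["mention_id", "cluster_id"])
for patent_id, sequence, cluster_id in rows:
    writer.writerow([make_mention_id(patent_id, sequence), cluster_id])

table = buffer.getvalue()
```

Here the first and third mentions share a cluster ID, which is how a disambiguation expresses that two inventor mentions refer to the same person.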

Rendering Report

We can now generate the report using the render_inventor_disambiguation_report() function. The results are saved to the current folder “.”.

To compare more disambiguations, add more files to the disambiguation_files list.
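
When many disambiguations are saved with a common naming pattern, the list of files could be collected rather than typed by hand. This is a minimal sketch assuming the "disambiguation_*.tsv" naming used in this example; find_disambiguation_files is a hypothetical helper, not part of pv_evaluation.

```python
import glob
import os


def find_disambiguation_files(folder=".", pattern="disambiguation_*.tsv"):
    """Return the paths of matching disambiguation files in `folder`, sorted by name."""
    return sorted(glob.glob(os.path.join(folder, pattern)))
```

The returned list could then be passed as the disambiguation_files argument. Sorting by name keeps date-stamped files in chronological order.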

from pv_evaluation.templates import render_inventor_disambiguation_report

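render_inventor_disambiguation_report(
    ".",
    disambiguation_files=["disambiguation_20211230.tsv", "disambiguation_20220630.tsv"],
    # Assumed keyword name for the required “inventor_not_disambiguated” file.
    inventor_not_disambiguated_file="g_inventor_not_disambiguated.tsv",
)
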
Starting python3 kernel...Done

Executing 'index.ipynb'
  Cell 1/30...Done
  Cell 2/30...Done
  Cell 3/30...Done
  Cell 4/30...Done
  Cell 5/30...Done
  Cell 6/30...Done
  Cell 7/30...Done
  Cell 8/30...Done
  Cell 9/30...Done
  Cell 10/30...Done
  Cell 11/30...Done
  Cell 12/30...Done
  Cell 13/30...Done
  Cell 14/30...Done
  Cell 15/30...Done
  Cell 16/30...Done
  Cell 17/30...Done
  Cell 18/30...Done
  Cell 19/30...Done
  Cell 20/30...Done
  Cell 21/30...Done
  Cell 22/30...Done
  Cell 23/30...Done
  Cell 24/30...Done
  Cell 25/30...Done
  Cell 26/30...Done
  Cell 27/30...Done
  Cell 28/30...Done
  Cell 29/30...Done
  Cell 30/30...Done

WARNING: Warning: diff of engine output timed out. No source lines will be available.
  to: html
  output-file: index.html
  standalone: true
  self-contained: true
  section-divs: true
  html-math-method: mathjax
  wrap: none
  default-image-extension: png
  toc: true
  toc-depth: 3
  document-css: false
  link-citations: true
  date-format: long
  lang: en
  title: Inventor Disambiguation Report
  date: today
  author: PatentsView-Evaluation
  toc-location: left
  jupyter: python3
  theme: cosmo
  fig-cap-location: margin
  code-copy: true
  code-block-border-left: '#31BAE9'
Output created: index.html


The result can be seen at https://patentsview.github.io/PatentsView-Evaluation/source/examples/templates/index.html