📑 HTML Report Template#
This document shows how to use pv_evaluation to automatically report on a disambiguation’s performance with the pv_evaluation.templates.render_inventor_disambiguation_report() function.
This function requires:
A list of disambiguations saved to file (tables with a “mention_id” column and a second column representing cluster ID assignment).
An “inventor_not_disambiguated” file with the columns “patent_id”, “inventor_sequence”, “raw_inventor_name_first”, and “raw_inventor_name_last”. For granted patents, this should be the “g_inventor_not_disambiguated.tsv” file from PatentsView’s bulk data downloads.
Below, we download “g_inventor_not_disambiguated.tsv” and prepare a set of disambiguations to evaluate.
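To make the first requirement concrete, here is a minimal sketch of what one disambiguation table could look like. The rows, the cluster column name "inventor_id", and the helper check_disambiguation_table() are hypothetical and only serve as an illustration; the "mention_id" column and the "US<patent_id>-<sequence>" format follow the data preparation code in this document.

```python
import io

import pandas as pd

# Hypothetical toy data: three inventor mentions, two of which share a cluster.
# The cluster column name "inventor_id" is an assumption made for this example.
toy_disambiguation = pd.DataFrame(
    {
        "mention_id": ["US1234567-0", "US1234567-1", "US7654321-0"],
        "inventor_id": ["cluster-a", "cluster-b", "cluster-a"],
    }
)

# Write the table as tab-separated text (in memory here, instead of a file on disk).
buffer = io.StringIO()
toy_disambiguation.to_csv(buffer, sep="\t", index=False)

# Read it back with every column as a string.
buffer.seek(0)
reloaded = pd.read_csv(buffer, sep="\t", dtype=str)


def check_disambiguation_table(df):
    """Hypothetical check: "mention_id" comes first, followed by a cluster ID column."""
    return bool(
        df.shape[1] >= 2
        and df.columns[0] == "mention_id"
        and df["mention_id"].notna().all()
    )


print(check_disambiguation_table(reloaded))
```

A check like this is not part of pv_evaluation; it is only a quick way to confirm that a file has the expected shape before rendering a report.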
Data Preparation#
Downloading “g_inventor_not_disambiguated.tsv” and the file containing persistent inventor disambiguations:
import pandas as pd
import wget
import zipfile
import os

# Download and extract the raw inventor mentions, unless already present.
if not os.path.isfile("g_inventor_not_disambiguated.tsv"):
    wget.download("https://s3.amazonaws.com/data.patentsview.org/download/g_inventor_not_disambiguated.tsv.zip")
    with zipfile.ZipFile("g_inventor_not_disambiguated.tsv.zip", 'r') as zip_ref:
        zip_ref.extractall(".")
    os.remove("g_inventor_not_disambiguated.tsv.zip")

# Download and extract the persistent inventor disambiguations, unless already present.
if not os.path.isfile("g_persistent_inventor.tsv"):
    wget.download("https://s3.amazonaws.com/data.patentsview.org/download/g_persistent_inventor.tsv.zip")
    with zipfile.ZipFile("g_persistent_inventor.tsv.zip", 'r') as zip_ref:
        zip_ref.extractall(".")
    os.remove("g_persistent_inventor.tsv.zip")
Preparing a set of distinct disambiguations saved to file:
if not os.path.isfile("disambiguation_20211230.tsv") or not os.path.isfile("disambiguation_20220630.tsv"):
    # Read every column as a string so that IDs can be concatenated.
    g_persistent_inventor = pd.read_csv("g_persistent_inventor.tsv", sep="\t", dtype=str)
    # Build the mention ID as "US<patent_id>-<sequence>".
    g_persistent_inventor["mention_id"] = "US" + g_persistent_inventor.patent_id + "-" + g_persistent_inventor.sequence
    # Save one two-column table (mention ID, cluster ID) per disambiguation date.
    g_persistent_inventor.set_index("mention_id").disamb_inventor_id_20211230.to_csv("disambiguation_20211230.tsv", sep="\t")
    g_persistent_inventor.set_index("mention_id").disamb_inventor_id_20220630.to_csv("disambiguation_20220630.tsv", sep="\t")
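The same steps can be tried on a few rows to see what ends up in each saved file. The sketch below uses hypothetical toy values in place of the real "g_persistent_inventor.tsv" contents; the column names are the ones used in the code above.

```python
import pandas as pd

# Hypothetical toy rows standing in for g_persistent_inventor.tsv.
toy = pd.DataFrame(
    {
        "patent_id": ["1234567", "1234567", "7654321"],
        "sequence": ["0", "1", "0"],
        "disamb_inventor_id_20211230": ["a", "b", "a"],
        "disamb_inventor_id_20220630": ["a", "b", "c"],
    },
    dtype=str,
)

# Same construction as above: "US" + patent ID + "-" + inventor sequence.
toy["mention_id"] = "US" + toy.patent_id + "-" + toy.sequence

# Selecting one dated column gives a Series indexed by mention ID; writing it
# with to_csv yields a two-column, tab-separated table.
snapshot = toy.set_index("mention_id").disamb_inventor_id_20211230
snapshot_tsv = snapshot.to_csv(sep="\t")
print(snapshot_tsv)
```

Each saved file therefore has "mention_id" as its first column and the cluster assignment for one disambiguation date as its second column.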
Rendering Report#
We can now generate the report using the render_inventor_disambiguation_report() function. The results are saved to the current folder “.”.
Note that, if we wish to compare more disambiguations, then we can add more files to the list disambiguation_files.
from pv_evaluation.templates import render_inventor_disambiguation_report
render_inventor_disambiguation_report(
    ".",
    disambiguation_files=["disambiguation_20211230.tsv", "disambiguation_20220630.tsv"],
    inventor_not_disambiguated_file="g_inventor_not_disambiguated.tsv",
)
Starting python3 kernel...Done
Executing 'index.ipynb'
Cell 1/30...Done
Cell 2/30...Done
Cell 3/30...Done
Cell 4/30...Done
Cell 5/30...Done
Cell 6/30...Done
Cell 7/30...Done
Cell 8/30...Done
Cell 9/30...Done
Cell 10/30...Done
Cell 11/30...Done
Cell 12/30...Done
Cell 13/30...Done
Cell 14/30...Done
Cell 15/30...Done
Cell 16/30...Done
Cell 17/30...Done
Cell 18/30...Done
Cell 19/30...Done
Cell 20/30...Done
Cell 21/30...Done
Cell 22/30...Done
Cell 23/30...Done
Cell 24/30...Done
Cell 25/30...Done
Cell 26/30...Done
Cell 27/30...Done
Cell 28/30...Done
Cell 29/30...Done
Cell 30/30...Done
WARNING: Warning: diff of engine output timed out. No source lines will be available.
pandoc
  to: html
  output-file: index.html
  standalone: true
  self-contained: true
  section-divs: true
  html-math-method: mathjax
  wrap: none
  default-image-extension: png
  toc: true
  toc-depth: 3

metadata
  document-css: false
  link-citations: true
  date-format: long
  lang: en
  title: Inventor Disambiguation Report
  date: today
  author: PatentsView-Evaluation
  toc-location: left
  jupyter: python3
  theme: cosmo
  fig-cap-location: margin
  code-copy: true
  code-block-border-left: '#31BAE9'

Output created: index.html
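When comparing more than two disambiguations, the list of files could be built programmatically instead of typed by hand. The sketch below assumes the files follow the "disambiguation_<date>.tsv" naming used in this document; the helper collect_disambiguation_files() is hypothetical and not part of pv_evaluation. It runs against empty placeholder files in a temporary folder.

```python
import tempfile
from pathlib import Path


def collect_disambiguation_files(folder, pattern="disambiguation_*.tsv"):
    """Hypothetical helper: return matching file paths as a sorted list of strings."""
    return sorted(str(path) for path in Path(folder).glob(pattern))


# Demonstrate on empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["disambiguation_20220630.tsv", "disambiguation_20211230.tsv", "notes.txt"]:
        (Path(tmp) / name).touch()
    disambiguation_files = collect_disambiguation_files(tmp)
    file_names = [Path(f).name for f in disambiguation_files]

print(file_names)
```

The resulting list could then be passed as the disambiguation_files argument of render_inventor_disambiguation_report(). Because the date is part of the file name, sorting the paths also puts the files in chronological order.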
Output#
The result can be seen at https://patentsview.github.io/PatentsView-Evaluation/source/examples/templates/index.html