🌞

19-11-27 W: similarity suggestions

GOAL: suggest similar notes based on the note I'm writing, as I'm writing it.

How close am I? Start with this doc: "Research vs prototyping: research is deeper."

ARCHITECTURE:

Link → representation (ping BERT as a service)

Get ml-box working!
Fix bugs, get link running.
Get whole text of file.
Find BERT as a service code, POST to it.
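One way the POST step could look, as a minimal sketch. It assumes the ml-box runs bert-as-service's HTTP frontend (bert-serving-start with -http_port, here a hypothetical 8125), that the /encode endpoint takes a JSON body with id, texts and is_tokenized, and that the vectors come back under a result key. The URL and helper names are hypothetical; check the field names against the server version that's actually running.

```python
import json
import urllib.request

# Hypothetical address: bert-serving-start on the ml-box with -http_port 8125.
ENCODE_URL = "http://localhost:8125/encode"


def build_encode_request(texts, request_id=1, url=ENCODE_URL):
    """Build (but don't send) a POST asking the server to embed `texts`."""
    body = json.dumps({"id": request_id, "texts": list(texts), "is_tokenized": False})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_encode_response(raw_bytes):
    """Pull the list of vectors out of the JSON reply.

    Assumes the vectors are under a "result" key; adjust if your
    server version names it differently.
    """
    payload = json.loads(raw_bytes.decode("utf-8"))
    return payload["result"]


def embed(texts, url=ENCODE_URL, timeout=30):
    """Send the request and return one vector (list of floats) per text."""
    req = build_encode_request(texts, url=url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_encode_response(resp.read())
```

With the port reachable, something like embed([open(path).read()]) (path being whatever file you're testing with) would cover "get whole text of file" and the POST in one go.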

Embed all my Notion documents

DB table with their URL + Django admin page
Add the pages I want scraped
Add "scrape" action
Reproduce that error - seems like a lot of text records are getting created?
Scrape document: get page, decide if it's a database or a page. Print the decision.
get or create Notion docs by URL... and/or ID?
get or create Text blobs by... text?
Users care about root nodes
Actually scrape everything!
Bookmarked docs: filterable, scrapeable, awesome!
Fix: web.models.NotionDocument.DoesNotExist: NotionDocument matching query does not exist.
The "all" filter should match anything
Fix: there are only 10 results for People because it's choosing the first view.
UPSERT text records
Make sure BERT works remotely!
Reinstall Python, pip, django-numpy
Can't migrate with NumPy ArrayField. Fuck it - convert to list!
Create embeddings for all text as they're being scraped/processed!
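A sketch of "UPSERT text records" plus "convert to list", using stdlib sqlite3 as a stand-in for the Django model (ON CONFLICT ... DO UPDATE needs SQLite 3.24+). It keys each text blob on a SHA-256 of its contents, which is one possible answer to "get or create Text blobs by... text?", and stores the embedding as a JSON list instead of a NumPy array. The table and column names are hypothetical.

```python
import hashlib
import json
import sqlite3


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per distinct text blob, keyed by a hash of its
    # contents, with the embedding stored as a JSON list rather than an array field.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS text_blob (
               text_hash TEXT PRIMARY KEY,
               doc_url   TEXT NOT NULL,
               body      TEXT NOT NULL,
               embedding TEXT
           )"""
    )
    return conn


def upsert_text(conn, doc_url, body, embedding=None):
    """Insert the blob, or update its URL/embedding if the same text already exists."""
    text_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    # float() also unwraps NumPy scalars, so an ndarray can be passed in directly.
    emb_json = json.dumps([float(x) for x in embedding]) if embedding is not None else None
    conn.execute(
        """INSERT INTO text_blob (text_hash, doc_url, body, embedding)
           VALUES (?, ?, ?, ?)
           ON CONFLICT(text_hash) DO UPDATE SET
               doc_url = excluded.doc_url,
               embedding = COALESCE(excluded.embedding, text_blob.embedding)""",
        (text_hash, doc_url, body, emb_json),
    )
    conn.commit()
    return text_hash


def load_embedding(conn, text_hash):
    """Return the stored embedding as a list of floats, or None if there isn't one."""
    row = conn.execute(
        "SELECT embedding FROM text_blob WHERE text_hash = ?", (text_hash,)
    ).fetchone()
    return json.loads(row[0]) if row and row[0] is not None else None
```

Re-scraping the same text under a new URL updates the existing row instead of creating a duplicate, and an upsert without an embedding keeps the one already stored.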

BOOM!!! Amazing progress. Tomorrow: make a script that returns similarity scores for a document. Maybe a BitBar plugin too?

Representation → similar (vector similarity).
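A minimal sketch of that similarity step, assuming the embeddings have already been loaded as plain Python lists keyed by document title or URL (the data layout and function names are hypothetical). It scores with cosine similarity and skips the document being written.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def most_similar(query_vec, docs, k=5, exclude=None):
    """Return the k (title, score) pairs closest to query_vec, best first.

    docs maps a document title (or URL) to its embedding; `exclude` lets the
    caller skip the document currently being written.
    """
    scored = [
        (title, cosine(query_vec, vec))
        for title, vec in docs.items()
        if title != exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```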

BertClient connects over localhost:5555, which is forwarded to the server. Connection secured.

BERT server then tries to communicate back over 5556. To where? Localhost? A return IP? I have 5556 remotely forwarded back to local 5556.

Would BertClient see it?

OK - BERT server receives a connection.

Trying to debug with https://jvns.ca/networking-zine.pdf: sudo tcpdump port 5556 -w thing.pcap, but I'm getting no packets. What am I misunderstanding? Maybe read up on tcpdump and how it interacts with SSH forwarding.

OK - so I actually am receiving a bunch of traffic on 5556. But wait - is the remote forward to 5556 blocking my local port? No, I don't think so: 5555 has a listener on it, but 5556 doesn't.
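A quicker check than tcpdump for "which local ports have a listener": try a TCP connect and see whether it's refused. is_listening is a hypothetical helper, not part of any library.

```python
import socket


def is_listening(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and a nonzero errno (e.g. ECONNREFUSED)
        # instead of raising, so a closed port just yields False.
        return sock.connect_ex((host, port)) == 0
```

is_listening(5555) and is_listening(5556) answer the question directly. With an ssh LocalForward in place, ssh itself is the local listener, so True only proves the local end of the tunnel exists, not that the server behind it is up.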

Wait, I think it's because BertClient isn't listening on 5556? What is it supposed to be doing? Go read the code and figure out what port_out does.

Next: I'm a bit stumped. Think about what could be going wrong. Seems like the server is sending things back, but they're not getting to the client.

  • RemoteForward 5556: replies still don't reach the client.
  • Neither forward: no luck either.
  • LocalForward 5556: YUP! Makes sense, because both connections (5555 and 5556) were initiated by the client. Awesome.
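The working setup, as a sketch. As far as I can tell from bert-as-service's client, BertClient defaults to localhost with port 5555 (requests out) and port_out 5556 (results back) and opens both connections itself, so both ports need a LocalForward. The helper below just builds the equivalent ssh -L command; the host alias ml-box and launching ssh from Python are assumptions, and the same thing in ~/.ssh/config is two LocalForward lines.

```python
import subprocess


def tunnel_command(host, ports=(5555, 5556), remote_host="localhost"):
    """Build an ssh command that LocalForwards each port to the same port on the server.

    -N: don't run a remote command, just hold the forwards open.
    -L local_port:remote_host:remote_port, where remote_host is resolved on the server.
    """
    cmd = ["ssh", "-N"]
    for port in ports:
        cmd += ["-L", f"{port}:{remote_host}:{port}"]
    cmd.append(host)
    return cmd


def open_tunnel(host):
    """Start the tunnel in the background; call .terminate() on the result to close it."""
    return subprocess.Popen(tunnel_command(host))
```

Usage would be roughly proc = open_tunnel("ml-box"), then BertClient() with its default localhost ports, and proc.terminate() when done.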

Well - I got something...!

WHILE WRITING A DOC - SEE WHAT'S SIMILAR!

Holy COW, I can't believe how much better cosine similarity is than a plain dot product (for NLP, at least; image interpretability research uses dot products). Cosine similarity normalizes the vectors' lengths, right?
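Yes: cosine similarity is the dot product divided by both vectors' lengths, so only direction counts. A toy example with made-up 2-D vectors (not real BERT embeddings) where the two measures pick different winners:

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    # Same as dot(), but divided by both lengths, so only direction matters.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 0.0]
candidates = {
    "short, same topic": [0.9, 0.1],  # points almost the same way as the query
    "long, off topic": [3.0, 3.0],    # big norm, 45 degrees away
}

# Raw dot product rewards the long vector; cosine rewards the aligned one.
by_dot = max(candidates, key=lambda name: dot(query, candidates[name]))
by_cos = max(candidates, key=lambda name: cosine(query, candidates[name]))
```

Scaling either vector leaves the cosine unchanged, which is exactly the length normalization in question.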

These suggestions are just so awesome.