🌞

19-11-27 W: similarity suggestions

GOAL: suggest similar notes while I'm writing one.

How close am I? Start with this doc: Research vs prototyping: research is deeper.

ARCHITECTURE:

Link → representation (ping BERT as a service)

Get ml-box working!
Fix bugs, get link running.
Get whole text of file.
Find BERT as a service code, POST to it.
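
A minimal sketch of the "POST to it" step, under assumptions: it presumes the bert-as-service server was started with its optional HTTP port and accepts JSON at /encode in the shape the project README describes (id, texts, is_tokenized). The port 8125 and the helper names build_encode_request and embed are placeholders, not taken from these notes.

```python
import json
import urllib.request

# Assumption: the server exposes HTTP on this port; 8125 is only a placeholder.
ENCODE_URL = "http://localhost:8125/encode"


def build_encode_request(texts, request_id=0, url=ENCODE_URL):
    """Build (but do not send) a JSON POST asking the server to embed `texts`."""
    payload = {"id": request_id, "texts": list(texts), "is_tokenized": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, url=ENCODE_URL, timeout=30):
    """Send the request and return one embedding (list of floats) per text.

    Assumes the response JSON carries the vectors under a "results" key.
    """
    request = build_encode_request(texts, url=url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]
```

Building the request separately from sending it makes the payload easy to inspect without a running server.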

Embed all my Notion documents

DB table with their URL + Django admin page
Add the pages I want scraped
Add "scrape" action
Reproduce that error - seems like a lot of text is getting made?
Scrape document: get page, decide if DB or page. Print decision.
get or create Notion docs by URL... and/or ID?
get or create Text blobs by... text?
Users care about root nodes
Actually scrape everything!
Bookmarked docs: filterable, scrapeable, awesome!
Fix: web.models.NotionDocument.DoesNotExist: NotionDocument matching query does not exist.
Filter all should filter to anything
Fix: there are only 10 results for People because it's choosing the first view.
UPSERT text records
Make sure BERT works remotely!
Reinstall python, pip, django-numpy
Can't migrate with Numpy ArrayField. Fuck it - convert to list!
Create embeddings for all text as they're being scraped/processed!
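
A rough sketch of the upsert and "convert to list" steps above. It is a stand-in for the Django models, not the actual code: it uses sqlite3 from the standard library so it runs on its own, and the table name text_blob, the function names, and keying each blob by a SHA-256 hash of its text are all assumptions (one possible answer to "get or create Text blobs by... text?").

```python
import hashlib
import json
import sqlite3


def open_store(path=":memory:"):
    """Open a SQLite store with one row per distinct piece of text."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS text_blob ("
        " text_hash TEXT PRIMARY KEY,"
        " text TEXT NOT NULL,"
        " embedding TEXT NOT NULL)"
    )
    return conn


def upsert_text(conn, text, embedding):
    """Insert the text, or overwrite the stored embedding if the text already exists."""
    text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # The embedding is stored as a JSON list of floats rather than an array type.
    conn.execute(
        "INSERT OR REPLACE INTO text_blob (text_hash, text, embedding) VALUES (?, ?, ?)",
        (text_hash, text, json.dumps([float(x) for x in embedding])),
    )
    conn.commit()
    return text_hash


def load_embedding(conn, text_hash):
    """Return the stored embedding as a list of floats, or None if absent."""
    row = conn.execute(
        "SELECT embedding FROM text_blob WHERE text_hash = ?", (text_hash,)
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

Scraping the same text twice then leaves a single row with the latest embedding, which is the behavior the UPSERT item asks for.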

BOOM!!! Amazing progress. Tomorrow: make a script that returns similarity for a document. Maybe a Bitbar plugin too?

Representation → similar (vector similarity).

BertClient connects over localhost:5555, which is forwarded to the server. Connection secured.

Bert server then tries to communicate back over 5556. To where? Localhost? A return IP? I have 5556 remotely forwarding back to local 5556.

Would BertClient see it?

OK - BERT server receives a connection.

Trying to debug with tcpdump (following https://jvns.ca/networking-zine.pdf): sudo tcpdump port 5556 -w thing.pcap, but getting no packets. What am I misunderstanding? Read up on tcpdump, perhaps, and how it interacts with SSH forwarding.

OK - so I actually am receiving a bunch of traffic on 5556, but wait - is remotely forwarding to 5556 blocking my port here? No, I don't think so. 5555 has a listener on it, but not 5556.

Wait, I think it's because BertClient isn't listening on 5556? What is it supposed to be doing? Go read the code and figure out what port_out does.

Next: I'm a bit stumped. Think about what could be going wrong. Seems like server is sending things back, but it's not getting to the client.

  • RemoteForward 5556.
  • Traceback (most recent call last):
      File "/Users/jasonbenn/.pyenv/versions/worldview/lib/python3.7/site-packages/bert_serving/client/__init__.py", line 211, in arg_wrapper
        return func(self, *args, **kwargs)
      File "/Users/jasonbenn/.pyenv/versions/worldview/lib/python3.7/site-packages/bert_serving/client/__init__.py", line 237, in server_config
        return jsonapi.loads(self._recv(req_id).content[1])
      File "/Users/jasonbenn/.pyenv/versions/worldview/lib/python3.7/site-packages/bert_serving/client/__init__.py", line 169, in _recv
        raise e
      File "/Users/jasonbenn/.pyenv/versions/worldview/lib/python3.7/site-packages/bert_serving/client/__init__.py", line 157, in _recv
        response = self.receiver.recv_multipart()
      File "/Users/jasonbenn/.pyenv/versions/worldview/lib/python3.7/site-packages/zmq/sugar/socket.py", line 475, in recv_multipart
        parts = [self.recv(flags, copy=copy, track=track)]
      File "zmq/backend/cython/socket.pyx", line 791, in zmq.backend.cython.socket.Socket.recv
      File "zmq/backend/cython/socket.pyx", line 827, in zmq.backend.cython.socket.Socket.recv
      File "zmq/backend/cython/socket.pyx", line 191, in zmq.backend.cython.socket._recv_copy
      File "zmq/backend/cython/socket.pyx", line 186, in zmq.backend.cython.socket._recv_copy
      File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc
    zmq.error.Again: Resource temporarily unavailable
  • Neither (no forwarding on 5556).
  • Same traceback as above, ending in zmq.error.Again: Resource temporarily unavailable
  • LocalForward 5556.
  • YUP! Makes sense, because both connections were initiated on the client. Awesome.
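
The conclusion above is that the client opens both connections itself, so 5555 and 5556 both need to be reachable from the client side (a local forward for each). A small sketch for checking that before starting BertClient; the function name port_accepts_connections is made up, and the ports are just the two from these notes.

```python
import socket


def port_accepts_connections(port, host="127.0.0.1", timeout=1.0):
    """Return True if something on `host` accepts a TCP connection on `port`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Caveat: an SSH local forward accepts the connection locally even when
    # nothing is listening on the remote end, so this only checks the local side.
    for port in (5555, 5556):
        state = "open" if port_accepts_connections(port) else "closed"
        print(f"localhost:{port} is {state}")
```

If 5556 reports closed while 5555 is open, the receive side has no local listener, which would match the timeout in the tracebacks.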

Well - I got something...!

WHILE WRITING A DOC - SEE WHAT'S SIMILAR!

Holy COW I can't believe how much better cosine similarity is than a simple dot product (for NLP - image interpretability research uses dot products). Cosine similarity normalizes their lengths, right?
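
On the question above: yes, cosine similarity divides the dot product by the product of the two vector lengths, so only direction counts. A toy sketch of why that changes rankings; the two-dimensional vectors and candidate names are invented for illustration and have nothing to do with real BERT embeddings.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by both vector lengths; 0.0 if either vector is all zeros."""
    norm_product = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm_product if norm_product else 0.0


def rank(query, candidates, score):
    """Return candidate names ordered from most to least similar under `score`."""
    return sorted(candidates, key=lambda name: score(query, candidates[name]), reverse=True)


# Made-up vectors: one points the same way as the query but is short,
# the other points elsewhere but is long.
query = [1.0, 1.0]
candidates = {
    "same direction, short": [0.5, 0.5],
    "different direction, long": [10.0, 0.0],
}

print(rank(query, candidates, dot))                # the long vector wins
print(rank(query, candidates, cosine_similarity))  # the same-direction vector wins
```

With a raw dot product, a vector can score highly just by being long; cosine removes that effect, which may be part of why the suggestions improved.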

These suggestions are just so awesome.