    19-11-27 W: similarity suggestions

    GOAL: suggest similar notes while I'm writing one.

    How close am I? Start with this doc: "Research vs prototyping: research is deeper."

    ARCHITECTURE:

    Link → representation (ping BERT as a service)

    Get ml-box working!
    Fix bugs, get link running.
    Get whole text of file.
    Find BERT as a service code, POST to it.
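    A minimal sketch of the "get whole text, send it to BERT" step, assuming the encoder is any function that takes a batch of strings and returns one vector per string (BertClient().encode from bert-serving-client has that shape). Splitting on blank lines, the 20-word cap, and averaging the chunk vectors into one document vector are all assumptions for illustration, not necessarily how this project does it; they exist because the server truncates each input to its -max_seq_len tokens, so one giant string would lose most of a long note.

```python
from typing import Callable, List, Sequence

# Anything that maps a batch of strings to one vector per string.
# BertClient().encode from bert-serving-client has this shape.
Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def split_into_chunks(text: str, max_words: int = 20) -> List[str]:
    """Split on blank lines, then cap each chunk at max_words whitespace-separated words."""
    chunks: List[str] = []
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        for start in range(0, len(words), max_words):
            chunks.append(" ".join(words[start:start + max_words]))
    return chunks


def embed_document(text: str, encode: Encoder, max_words: int = 20) -> List[float]:
    """Encode all chunks in one batch, then mean-pool them into one document vector."""
    chunks = split_into_chunks(text, max_words)
    if not chunks:
        raise ValueError("document has no text to embed")
    vectors = encode(chunks)
    dim = len(vectors[0])
    return [sum(float(v[i]) for v in vectors) / len(vectors) for i in range(dim)]
```

    With a server reachable through the tunnel, pass the real client's method: embed_document(whole_text, BertClient(ip="localhost", port=5555, port_out=5556).encode). Words are only a rough proxy for BERT tokens, so the cap should stay well under the server's max_seq_len.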

    Embed all my Notion documents

    DB table with their URL + Django admin page
    Add the pages I want scraped
    Add "scrape" action
    Reproduce that error: it seems like a lot of text records are getting created?
    Scrape document: get the page, decide whether it's a database or a page, and print the decision.
    get or create Notion docs by URL... and/or ID?
    get or create Text blobs by... text?
    Users care about root nodes
    Actually scrape everything!
    Bookmarked docs: filterable, scrapeable, awesome!
    Fix: web.models.NotionDocument.DoesNotExist: NotionDocument matching query does not exist.
    Filter all should filter to anything
    Fix: there are only 10 results for People because it's choosing the first view.
    UPSERT text records
    Make sure BERT works remotely!
    Reinstall python, pip, django-numpy
    Can't migrate with Numpy ArrayField. Fuck it - convert to list!
    Create embeddings for all text as they're being scraped/processed!
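    The "get or create Text blobs by text", "UPSERT text records", "convert to list", and "embed as they're scraped" items fit together roughly like this. A minimal in-memory sketch: TextStore is a hypothetical stand-in for the Django table, and keying rows on a SHA-256 of the text (rather than the raw text) is an assumption, chosen because long text makes an awkward unique key. Each blob is embedded only the first time it is seen, and the vector is stored as plain Python floats so it fits a list/JSON column instead of a NumPy field.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

# Hypothetical encoder type: a batch of strings in, one vector per string out
# (BertClient().encode has this shape).
Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def text_key(text: str) -> str:
    """Stable key for a text blob: the SHA-256 hex digest of its UTF-8 bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class TextStore:
    """Hypothetical in-memory stand-in for the Django table of text records."""

    def __init__(self, encode: Encoder) -> None:
        self.encode = encode
        self.rows: Dict[str, dict] = {}
        self.encode_calls = 0

    def upsert(self, text: str, source_url: str) -> dict:
        """Insert the text once, embedding it on first sight; later calls only add the URL."""
        key = text_key(text)
        row = self.rows.get(key)
        if row is None:
            vector = self.encode([text])[0]
            self.encode_calls += 1
            row = {
                "text": text,
                # Plain floats, so the vector fits a list/JSON column instead of a NumPy field.
                "embedding": [float(x) for x in vector],
                "urls": set(),
            }
            self.rows[key] = row
        row["urls"].add(source_url)
        return row
```

    Against a real table, Django's QuerySet.get_or_create(..., defaults={...}) on a unique hash column gives the same "insert once" behavior; the model and field names would be whatever the project actually uses.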

    BOOM!!! Amazing progress. Tomorrow: make a script that returns similarity for a document. Maybe a Bitbar plugin too?
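    If the script does end up as a BitBar plugin: BitBar runs an executable on a timer and turns its stdout into a menu. The first line shows in the menu bar, lines after a --- separator go in the dropdown, and a trailing "| href=URL" makes an item clickable. A minimal sketch of just the output formatting, assuming the script already has (name, url, score) triples; bitbar_menu and the example URL are hypothetical.

```python
from typing import Iterable, Tuple


def bitbar_menu(matches: Iterable[Tuple[str, str, float]], title: str = "Similar") -> str:
    """Render (name, url, score) triples in BitBar's stdout menu format."""
    lines = [title, "---"]  # first line: menu bar text; after "---": dropdown items
    for name, url, score in matches:
        label = name.replace("|", "/")  # "|" would start BitBar's parameter section
        lines.append(f"{score:.2f}  {label} | href={url}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical data; a real plugin would compute these from stored embeddings.
    print(bitbar_menu([("Research vs prototyping", "https://example.com/research-vs-prototyping", 0.91)]))
```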

    Representation → similar (vector similarity).
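    The representation → similar step can be as simple as scoring every stored embedding against the current note's embedding and keeping the top k. A minimal pure-Python sketch, assuming embeddings are kept as plain lists keyed by a document ID; most_similar and the exclude parameter (to skip the note being written) are hypothetical names.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    corpus: Dict[str, Sequence[float]],
    k: int = 5,
    exclude: Optional[str] = None,
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs closest to query, best first."""
    scored = (
        (doc_id, cosine(query, vec))
        for doc_id, vec in corpus.items()
        if doc_id != exclude  # e.g. skip the note that is currently being written
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

    For example, most_similar([1.0, 0.1], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}, k=2) ranks "a" first and "c" second. A linear scan like this is fine for a personal note collection; an approximate-nearest-neighbor index only starts to matter at much larger scale.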

    BertClient connects over localhost:5555, which is forwarded to the server. Connection secured.

    The BERT server then tries to communicate back over 5556. To where? Localhost? A return IP? I have 5556 remotely forwarded back to local 5556.

    Would BertClient see it?

    OK - BERT server receives a connection.

    Trying to debug with tcpdump, following https://jvns.ca/networking-zine.pdf: sudo tcpdump port 5556 -w thing.pcap. But I'm getting no packets. What am I misunderstanding? Maybe read up on tcpdump and how it interacts with SSH forwarding.

    OK, so I actually am receiving a bunch of traffic on 5556. But wait: is remotely forwarding to 5556 blocking my local port? No, I don't think so. 5555 has a listener on it, but 5556 doesn't.

    Wait, maybe it's because BertClient isn't listening on 5556? What is it supposed to be doing there? Go read the code and figure out what port_out does.

    Next: I'm a bit stumped. Think about what could be going wrong. It seems like the server is sending things back, but they're not reaching the client.

    • RemoteForward 5556.
    • Neither.
    • LocalForward 5556.
    • YUP! Makes sense, because both connections were initiated on the client. Awesome.
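    The LocalForward result matches how bert-as-service splits its traffic, as far as the bert-serving-client source shows: BertClient itself connects out to both ports (it sends requests to port and receives results from port_out), while the server binds and listens on both. Since both connections start on the laptop, both need LocalForward; RemoteForward would only help if the server were connecting back to a listener on the laptop. A minimal sketch that builds the equivalent ssh command line, assuming the hypothetical host alias ml-box and the default ports 5555/5556 (the same effect as LocalForward 5555 localhost:5555 plus LocalForward 5556 localhost:5556 in ~/.ssh/config):

```python
import shlex
from typing import List, Sequence


def ssh_forward_command(host: str, ports: Sequence[int] = (5555, 5556)) -> List[str]:
    """Build an ssh argv that LocalForwards each port to the same port on the remote host."""
    cmd = ["ssh", "-N"]  # -N: no remote command, just hold the forwards open
    for port in ports:
        # -L local_port:host:remote_port, where "localhost" is resolved on the remote side
        cmd += ["-L", f"{port}:localhost:{port}"]
    cmd.append(host)
    return cmd


if __name__ == "__main__":
    cmd = ssh_forward_command("ml-box")  # hypothetical SSH host alias
    print(" ".join(shlex.quote(part) for part in cmd))
    # Passing cmd to subprocess.run would start the tunnel in the foreground.
```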

    Well - I got something...!

    WHILE WRITING A DOC - SEE WHAT'S SIMILAR!

    Holy COW, I can't believe how much better cosine similarity is than a simple dot product (for NLP; image interpretability research uses dot products). Cosine similarity normalizes the vectors' lengths, right?
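    Yes: cosine similarity is the dot product of the two vectors after each is scaled to length 1, so a vector can't score high just by being long. A minimal sketch with toy 2-D vectors (not real BERT output) that shows the raw dot product preferring a long, off-topic vector while cosine prefers the one pointing the same way:

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def unit(v: Sequence[float]) -> List[float]:
    """Scale v to length 1."""
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity = dot product of the length-normalized vectors."""
    return dot(unit(a), unit(b))


query = [1.0, 0.0]
short_on_topic = [0.5, 0.0]   # same direction as the query, small magnitude
long_off_topic = [3.0, 3.0]   # 45 degrees away, large magnitude

print(dot(query, short_on_topic), dot(query, long_off_topic))        # 0.5 3.0: dot prefers the long one
print(cosine(query, short_on_topic), cosine(query, long_off_topic))  # 1.0, then about 0.707: cosine prefers direction
```

    Scaling short_on_topic by 10 multiplies its dot product by 10 but leaves its cosine at 1.0, which is exactly the length normalization in question.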

    These suggestions are just so awesome.

    Traceback (most recent call last):
      File "/Users/jasonbenn/.pyenv/versions/worldview/lib/python3.7/site-packages/bert_serving/client/__init__.py", line 211, in arg_wrapper
        return func(self, *args, **kwargs)
      File "/Users/jasonbenn/.pyenv/versions/worldview/lib/python3.7/site-packages/bert_serving/client/__init__.py", line 237, in server_config
        return jsonapi.loads(self._recv(req_id).content[1])
      File "/Users/jasonbenn/.pyenv/versions/worldview/lib/python3.7/site-packages/bert_serving/client/__init__.py", line 169, in _recv
        raise e
      File "/Users/jasonbenn/.pyenv/versions/worldview/lib/python3.7/site-packages/bert_serving/client/__init__.py", line 157, in _recv
        response = self.receiver.recv_multipart()
      File "/Users/jasonbenn/.pyenv/versions/worldview/lib/python3.7/site-packages/zmq/sugar/socket.py", line 475, in recv_multipart
        parts = [self.recv(flags, copy=copy, track=track)]
      File "zmq/backend/cython/socket.pyx", line 791, in zmq.backend.cython.socket.Socket.recv
      File "zmq/backend/cython/socket.pyx", line 827, in zmq.backend.cython.socket.Socket.recv
      File "zmq/backend/cython/socket.pyx", line 191, in zmq.backend.cython.socket._recv_copy
      File "zmq/backend/cython/socket.pyx", line 186, in zmq.backend.cython.socket._recv_copy
      File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc
    zmq.error.Again: Resource temporarily unavailable