Parse UTF-16 surrogates pairs for calculating pattern's position (#11915)

<!-- Enter a brief description/summary of your PR here. What does it fix/what does it change/how was it tested (even manually, if necessary)? -->
## Summary of the Pull Request

Properly handle UTF-16 surrogates when calculating the position of matched pattern.

Fix #8709

<!-- Other than the issue solved, is this relevant to any other issues/existing PRs? --> 
## References
b88ffb21b0/src/buffer/out/search.cpp (L335-L339)

<!-- Please review the items on the PR checklist before submitting-->
## PR Checklist
* [ ] Closes #8709
* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [ ] Tests added/passed
* [ ] Documentation updated. If checked, please file a pull request on [our docs repo](https://github.com/MicrosoftDocs/terminal) and link it here: #xxx
* [ ] Schema updated.
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

<!-- Provide a more detailed description of the PR, other things fixed or any additional comments/features here -->
## Detailed Description of the Pull Request / Additional comments
use `Utf16Parser::Parse` to handle code points from U+010000 to U+10FFFF in UTF-16.

<!-- Describe how you validated the behavior. Add automated tests wherever possible, but list manual validation steps taken as well -->
## Validation Steps Performed

![image](https://user-images.githubusercontent.com/1068203/145421736-c842c7d4-0136-42d0-ad72-f004f58d9e3b.png)

also the case by @mas90  https://github.com/microsoft/terminal/issues/8709#issuecomment-884915485:

![image](https://user-images.githubusercontent.com/1068203/145420264-3fe220b4-42c5-44ac-aa94-4e604b164ed3.png)
This commit is contained in:
Comzyh 2021-12-10 02:42:12 +08:00 committed by GitHub
parent 509ecb1050
commit a2d96d6b1f
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -8,6 +8,7 @@
#include "../types/inc/utils.hpp"
#include "../types/inc/convert.hpp"
#include "../../types/inc/Utf16Parser.hpp"
#include "../../types/inc/GlyphWidth.hpp"
#pragma hdrstop
@ -2585,16 +2586,17 @@ PointTree TextBuffer::GetPatterns(const size_t firstRow, const size_t lastRow) c
// match and the previous match, so we use the size of the prefix
// along with the size of the match to determine the locations
size_t prefixSize = 0;
for (const auto ch : i->prefix().str())
for (const std::vector<wchar_t> parsedGlyph : Utf16Parser::Parse(i->prefix().str()))
{
prefixSize += IsGlyphFullWidth(ch) ? 2 : 1;
const std::wstring_view glyph{ parsedGlyph.data(), parsedGlyph.size() };
prefixSize += IsGlyphFullWidth(glyph) ? 2 : 1;
}
const auto start = lenUpToThis + prefixSize;
size_t matchSize = 0;
for (const auto ch : i->str())
for (const std::vector<wchar_t> parsedGlyph : Utf16Parser::Parse(i->str()))
{
matchSize += IsGlyphFullWidth(ch) ? 2 : 1;
const std::wstring_view glyph{ parsedGlyph.data(), parsedGlyph.size() };
matchSize += IsGlyphFullWidth(glyph) ? 2 : 1;
}
const auto end = start + matchSize;
lenUpToThis = end;